Fine-grained Contrastive Learning for Definition Generation

Hengyuan Zhang1*, Dawei Li2*, Shiping Yang3, Yanran Li4†
1Shenzhen International Graduate School, Tsinghua University
2Halıcıoğlu Data Science Institute, University of California, San Diego
3School of Computer Science, Beijing University of Posts and Telecommunications
4Independent Researcher
zhang-hy22@mails.tsinghua.edu.cn, dal034@ucsd.edu,
yangshiping@bupt.edu.cn, yanranli.summer@gmail.com
Abstract
Recently, pre-trained transformer-based models have achieved great success in the task of definition generation (DG). However, previous encoder-decoder models lack effective representation learning to capture the full semantic components of the given word, which leads to generating under-specific definitions. To address this problem, we propose a novel contrastive learning method, encouraging the model to capture more detailed semantic representations from the definition sequence encoding. According to both automatic and manual evaluation, the experimental results on three mainstream benchmarks demonstrate that the proposed method generates more specific and higher-quality definitions than several state-of-the-art models.
1 Introduction
When readers encounter unfamiliar expressions while reading a text, machines can help. The task of Definition Generation (DG) aims to generate a textual definition for a given word or phrase (the target), according to its surrounding context (the local context) (Ni and Wang, 2017). In addition to assisting readers in comprehending expressions, DG is also useful for producing definitions when building dictionaries.
Recently, pre-trained encoder-decoder models have achieved great success on this task (Huang et al., 2021; Kong et al., 2022). Despite their success, the definitions produced by these pre-trained models often contain several types of errors (Noraset et al., 2017; Huang et al., 2021). According to Table 1, the under-specific problem is the most frequent error type: the generated definition conforms to the general semantics of the target word but loses certain parts of its meaning.
*Equal contribution
†Corresponding author
Error Type               Ratio
Under-specified          9.0%
Over-specified           5.5%
Self-reference           3.0%
Wrong part-of-speech     1.0%
Opposite                 1.0%

Table 1: Ratio of each error type among the definitions generated in Huang et al. (2021).
Word        double
Reference   twice as great or many
Generated   characterized by two equal parts or components

Table 2: The definition of the word “double”, where Reference is taken from the WordNet dictionary and Generated is produced by the T5-Base model of Huang et al. (2021).
As presented in Table 2, the definition produced by the T5 model is under-specific, as it omits the meaning of great in the word “double” under the context “ate a double portion”. The under-specific problem harms the accuracy of the generated definitions and in turn limits the application of definition generation techniques in many scenarios.
This problem can be partially attributed to the decoder’s inability to fully extract the semantic components from the word encoding (Li et al., 2020a). Pre-trained encoder-decoder models focus on restoring and denoising whole texts during the pre-training stage, rather than learning fine-grained semantic representations of a single word or phrase (Lewis et al., 2019; Bi et al., 2020; Shao et al., 2021). In other words, pre-trained encoder-decoder models are ineffective at capturing the rich semantic components of the given word, which leads to generating under-specific definitions.
To remedy the under-specific problem in pre-trained definition generation models, we draw inspiration from contrastive learning methods (Radford et al., 2021; Li et al., 2020b) and propose a novel definition generation method based on a designed contrastive objective. Conceptually, definition generation transforms the encoding of the target word into its textual interpretation. Accordingly, the encoding and the decoding of the target word can be regarded as two views of the same underlying semantics. Our idea is to leverage these two representations in the definition generation model and encourage them to align with each other to capture fine-grained semantics. Specifically, we treat the target word representation and the definition representation as a positive pair and feed them into a contrastive learning objective. This contrastive loss is naturally complementary to the language generation loss and can be seamlessly incorporated into existing pre-trained encoder-decoder models.
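To make the idea concrete, below is a minimal sketch of such a contrastive objective in PyTorch, assuming mean-pooled encoder and decoder hidden states as the two views and an InfoNCE-style loss with in-batch negatives; the pooling, temperature, and loss weight are illustrative assumptions rather than the exact formulation used in this paper (see Section 3.3).

```python
import torch
import torch.nn.functional as F

def contrastive_loss(word_repr: torch.Tensor,
                     defn_repr: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """word_repr, defn_repr: (batch, hidden) pooled target-word / definition representations."""
    w = F.normalize(word_repr, dim=-1)
    d = F.normalize(defn_repr, dim=-1)
    # Similarity matrix: diagonal entries are positive pairs,
    # off-diagonal entries act as in-batch negatives.
    logits = w @ d.t() / temperature
    targets = torch.arange(w.size(0), device=w.device)
    # Symmetric InfoNCE: match each word to its definition and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Joint objective (lambda_cl is a hypothetical weighting hyper-parameter):
# total_loss = generation_loss + lambda_cl * contrastive_loss(word_repr, defn_repr)
```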
To validate the effectiveness of our proposal, we conduct a series of experiments on three publicly available datasets. Both automatic and manual evaluation results suggest that our method generates more specific definitions and effectively addresses the under-specific problem in the task of definition generation. In general, our contributions can be summarized as follows:

• We tackle the under-specific problem for pre-trained definition generation models by developing a novel fine-grained contrastive learning objective.

• We validate the effectiveness of the proposed method by comparing with several SOTA models on three popular datasets using both automatic and manual judgments.¹

• We analyze the details of our method through ablation studies and demonstrate its effect in addressing the under-specific problem based on case studies.
2 Related Work
2.1 Definition Generation
The task of Definition Generation was first proposed by Noraset et al. (2017). They used word embeddings to generate corresponding definitions, and utilized definition generation as an auxiliary task for reverse dictionary construction and word embedding training.
¹Our code can be found at https://github.com/rattlesnakey/Definition-Gneration-Contrastive
Some later works explore more application scenarios and model architectures for definition generation. Ni and Wang (2017) propose a dual-encoder model to generate the proper definition of a given word under a specific context, and use it to explain emerging words on the Internet. Gadetsky et al. (2018) use both local and global information of the words in their model for word disambiguation. Following them, Ishiwatari et al. (2019) design gate mechanisms to fuse multi-source information about the word and its context. Furthermore, some works attempt to utilize other information about the target word. Washio et al. (2019) model the relation between defined and defining words using word pair embeddings (Joshi et al., 2018). Different from former works that use distributed representations of target words, Yang et al. (2019) introduce target words’ concepts from HowNet (Dong and Dong, 2003) as fine-grained knowledge for Chinese definition modeling. There are also works that learn the target words with refined methods: both Li et al. (2020a) and Reid et al. (2020) decompose the meaning of the target word into a group of latent variables and rely on variational inference for estimation.
Recently, pre-trained encoder-decoder models have been applied to definition generation and achieved great success. Bevilacqua et al. (2020) use special tokens to mark the target word in the context and feed them into a BART model (Lewis et al., 2019). Huang et al. (2021) fine-tune a T5 model and re-rank all the candidate results from the T5 model to obtain definitions with proper specificity. Kong et al. (2022) design a multi-task framework based on MASS to generate simple definitions in an unsupervised manner. Despite their promising performance on definition generation, the under-specific problem has been less investigated. Although Huang et al. (2021) design a scoring mechanism that measures definitions’ specificity, we argue that the fundamental reason for the under-specific problem lies in the lack of fine-grained semantic learning in pre-trained encoder-decoder models, which we leverage contrastive learning to address in this work.
2.2 Contrastive Learning in Semantic Representation
Contrastive learning has been widely used to enhance semantic information for various NLP tasks. For example, Gao et al. (2021) use a dropout trick to derive positive samples at the embedding level, and then apply both supervised and self-supervised methods to acquire better sentence embeddings. Radford et al. (2021) use contrastive learning to pre-train a vision-language model that aligns images with their corresponding text. Li et al. (2022) combine masked language modeling and contrastive learning for multi-task pre-training, and demonstrate that contrastive learning helps connect word glosses with their corresponding vectors. Li et al. (2020b) and Srivastava and Vemulapati (2022) implement contrastive learning as an auxiliary task to encourage the transformer encoder to better capture semantic alignment.

In this work, we borrow the idea of using contrastive methods for semantic representation learning. For a given target word, there are two representations in the task of definition generation: the word representation generated by the encoder, and the definition representation produced by the decoder. These two kinds of representations can be regarded as two views of the semantics of the target word to be explained. By aligning the representation spaces of the encoder and the decoder using contrastive learning, we force the model to pay more attention to fine-grained semantic information during representation learning. In this way, the under-specific problem is mitigated when using pre-trained encoder-decoder models to generate definitions.
3 Method
In this section, we present our method of using contrastive learning to enhance the target word’s representation for definition generation. Specifically, we first formulate the definition generation task and introduce the notation (Section 3.1). Then we provide a preliminary description of the definition generation process based on T5 (Section 3.2). Finally, we explain how to apply the contrastive loss during training to solve the under-specific problem and improve generation quality (Section 3.3). Figure 1 depicts the overall pipeline of our method.
3.1 Task Formulation
Given a word or phrase $W = \{w_i, \ldots, w_j\}$ and its surrounding context $C = \{w_0, \ldots, w_k\}$ $(0 < i < j < k)$, the task of definition generation is to generate the definition $D = \{d_0, \ldots, d_T\}$ to explain the meaning of $W$ under $C$. This process can be formulated as:

$$P(D \mid W, C) = \prod_{t=0}^{T} p(d_t \mid d_{<t}, W, C) \tag{1}$$
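In practice, this factorization is typically trained by minimizing the token-level negative log-likelihood. A generic form of this generation loss is sketched below; the paper’s complete training objective, which additionally includes the contrastive term, is detailed in Section 3.3.

$$\mathcal{L}_{\mathrm{gen}} = -\sum_{t=0}^{T} \log p\left(d_t \mid d_{<t}, W, C\right)$$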
3.2 Definition Generation with T5
Our work aims at addressing the under-specific problem when using pre-trained encoder-decoder models for definition generation. Without loss of generality, we take T5 (Raffel et al., 2020) as our backbone model, which is a transformer-based encoder-decoder model trained on a large-scale corpus and has demonstrated its effectiveness on the definition generation task (Huang et al., 2021).

To apply T5 to definition generation, we first concatenate the target word and the given context behind the prefix prompts “word:” and “context:”, respectively. The concatenated input is then fed to the T5 encoder, which consists of $L_E$ layers of encoder blocks $\mathrm{E\_Block}$, and we take the last hidden state $H^{L_E}$, which contains the semantic information of the target word and the local context:

$$H^0 = \mathrm{Emb}(\mathrm{Splice}(W, C)) \tag{2}$$
$$H^l = \mathrm{E\_Block}(H^{l-1}), \quad l \in [1, L_E] \tag{3}$$

Here $W$ stands for the target word, $C$ for the given context, and $\mathrm{Splice}$ is the operation that concatenates the target word and the given context with their corresponding prefixes. $\mathrm{Emb}$ is the embedding layer that converts the input tokens into embedding vectors.
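As a concrete illustration of this encoding step, the sketch below builds the prefixed input and obtains the encoder’s last hidden state using the Hugging Face transformers library; the t5-base checkpoint and the example word/context pair are illustrative assumptions, not necessarily the authors’ exact setup.

```python
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

word = "double"
context = "ate a double portion"

# Splice(W, C): concatenate the target word and context behind their prefix prompts.
source = f"word: {word} context: {context}"
inputs = tokenizer(source, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# H^{L_E}: one hidden vector per input token, shape (1, seq_len, d_model).
h_le = outputs.last_hidden_state
```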
After encoding, the T5 decoder learns to generate an appropriate definition conditioned on the encoding $H^{L_E}$ and the previously generated tokens. During decoding, the teacher-forcing mechanism is applied to guarantee that the preceding ground-truth tokens are attended to at the current step $t$:

$$G^0_t = \mathrm{Emb}(D_t) \tag{4}$$
$$G^l_t = \mathrm{D\_Block}(H^{L_E}, G^{l-1}_t), \quad l \in [1, L_D] \tag{5}$$

Here $D_t$ represents the $t$-th token in the definition sequence. After passing through $L_D$ layers of decoder blocks $\mathrm{D\_Block}$, we obtain the decoder’s last hidden state $G^{L_D}$.

Finally, a softmax function is applied on top of a linear head to transform $G^{L_D}$ into a prediction distribution matrix $V \in \mathbb{R}^{|V| \times |D|}$, where $|V|$ and $|D|$ denote the vocabulary size and the length of the definition sequence, respectively.
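Putting the decoding and prediction steps together, the sketch below runs a teacher-forced forward pass of T5 and reads off the decoder’s last hidden state, the prediction distribution, and the standard generation loss; as before, the checkpoint and inputs are illustrative assumptions rather than the authors’ actual training code.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

source = "word: double context: ate a double portion"
definition = "twice as great or many"

enc = tokenizer(source, return_tensors="pt")
labels = tokenizer(definition, return_tensors="pt").input_ids

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    labels=labels,                  # teacher forcing: gold definition tokens
    output_hidden_states=True,
)

g_ld = outputs.decoder_hidden_states[-1]       # G^{L_D}: decoder's last hidden state
probs = torch.softmax(outputs.logits, dim=-1)  # per-position distribution over the vocabulary
gen_loss = outputs.loss                        # cross-entropy generation loss
```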