Fine-grained Contrastive Learning for Definition Generation

Hengyuan Zhang1*, Dawei Li2*, Shiping Yang3, Yanran Li4†
1Shenzhen International Graduate School, Tsinghua University
2Halıcıoğlu Data Science Institute, University of California, San Diego
3School of Computer Science, Beijing University of Posts and Telecommunications
4Independent Researcher
zhang-hy22@mails.tsinghua.edu.cn, dal034@ucsd.edu,
yangshiping@bupt.edu.cn, yanranli.summer@gmail.com
Abstract
Recently, pre-trained transformer-based models have achieved great success in the task of definition generation (DG). However, previous encoder-decoder models lack effective representation learning to capture the full semantic components of the given word, which leads to generating under-specific definitions. To address this problem, we propose a novel contrastive learning method, encouraging the model to capture more detailed semantic representations from the definition sequence encoding. According to both automatic and manual evaluation, the experimental results on three mainstream benchmarks demonstrate that the proposed method generates more specific and higher-quality definitions than several state-of-the-art models.
1 Introduction
When readers encounter unfamiliar expressions while reading a text, machines can help. The task of Definition Generation (DG) aims to generate a textual definition for a given word or phrase (the target), according to its surrounding context (the local context) (Ni and Wang, 2017). In addition to assisting readers in comprehending expressions, DG is also useful for producing definitions when building dictionaries.
Recently, pre-trained encoder-decoder models have achieved great success on this task (Huang et al., 2021; Kong et al., 2022). Despite their success, the definitions produced by these pre-trained models often contain several types of errors (Noraset et al., 2017; Huang et al., 2021). According to Table 1, the under-specific problem is the most frequent error type: the generated definition conforms to the general semantics of the target word but loses certain parts of its meaning.
*Equal contribution
†Corresponding author
Error Type               Ratio
Under-specified          9.0%
Over-specified           5.5%
Self-reference           3.0%
Wrong part-of-speech     1.0%
Opposite                 1.0%

Table 1: Ratio of each error type among the definitions generated in Huang et al. (2021).
Word        double
Reference   twice as great or many
Generated   characterized by two equal parts or components

Table 2: The definition of the word “double”, where Reference is taken from the WordNet dictionary and Generated is produced by the T5-Base model of Huang et al. (2021).
As presented in Table 2, the definition produced by the T5 model is under-specific, as it omits the meaning of great in the word “double” under the context “ate a double portion”. The under-specific problem harms the accuracy of the generated definitions and in turn limits the application of definition generation techniques in many scenarios.
This problem can be partially attributed to the decoder’s inability to fully extract the semantic components from the word encoding (Li et al., 2020a). Pre-trained encoder-decoder models focus on restoring and denoising whole texts during the pre-training stage, rather than learning fine-grained semantic representations of a single word or phrase (Lewis et al., 2019; Bi et al., 2020; Shao et al., 2021). In other words, pre-trained encoder-decoder models are ineffective at capturing the rich semantic components of the given word, which leads to generating under-specific definitions.
To remedy the under-specific problem in pre-trained definition generation models, we draw inspiration from contrastive learning methods (Radford et al., 2021; Li et al., 2020b) and propose a novel definition generation method based on a designed contrastive objective. Conceptually, definition generation transforms the encoding of the target word into its textual interpretation. Accordingly, the encoding and the decoding of the target word can be regarded as two views of the same underlying semantics. Our idea is to leverage these two representations in the definition generation model and encourage them to align with each other to capture fine-grained semantics. Specifically, we treat the target word representation and the definition representation as a positive pair and feed them into a contrastive learning objective. This contrastive loss is naturally complementary to the language generation loss and can be seamlessly incorporated into existing pre-trained encoder-decoder models.
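To make the idea concrete, below is a minimal sketch of such a contrastive objective in PyTorch, assuming mean-pooled encoder and decoder hidden states as the two views and an InfoNCE-style loss with in-batch negatives; the pooling, temperature, and loss weight are illustrative assumptions rather than the exact formulation used in this paper (see Section 3.3).

```python
import torch
import torch.nn.functional as F

def contrastive_loss(word_repr: torch.Tensor,
                     defn_repr: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """word_repr, defn_repr: (batch, hidden) pooled target-word / definition representations."""
    w = F.normalize(word_repr, dim=-1)
    d = F.normalize(defn_repr, dim=-1)
    # Similarity matrix: diagonal entries are positive pairs,
    # off-diagonal entries act as in-batch negatives.
    logits = w @ d.t() / temperature
    targets = torch.arange(w.size(0), device=w.device)
    # Symmetric InfoNCE: match each word to its definition and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Joint objective (lambda_cl is a hypothetical weighting hyper-parameter):
# total_loss = generation_loss + lambda_cl * contrastive_loss(word_repr, defn_repr)
```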
To validate the effectiveness of our proposal, we conduct a series of experiments on three publicly available datasets. Both automatic and manual evaluation results suggest that our method generates more specific definitions and effectively addresses the under-specific problem in the task of definition generation. In general, our contributions can be summarized as follows:

• We tackle the under-specific problem for pre-trained definition generation models by developing a novel fine-grained contrastive learning objective.

• We validate the effectiveness of the proposed method by comparing with several SOTA models on three popular datasets using both automatic and manual judgments.¹

• We analyze the details of our method through ablation studies and demonstrate its effect in addressing the under-specific problem based on case studies.
2 Related Work
2.1 Definition Generation
The task of Definition Generation was first proposed by Noraset et al. (2017). They used word embeddings to generate corresponding definitions, and utilized definition generation as an auxiliary task for reverse dictionary construction and word embedding training.
¹Our code can be found at https://github.com/rattlesnakey/Definition-Gneration-Contrastive
Some later works explore more application scenarios and model architectures for definition generation. Ni and Wang (2017) propose a dual-encoder model to generate the proper definition of a given word under a specific context, and use it to explain emerging words on the Internet. Gadetsky et al. (2018) use both local and global information of the words in their model for word disambiguation. Following them, Ishiwatari et al. (2019) design gate mechanisms to fuse multi-source information about the word and its context. Furthermore, some works attempt to utilize other information about the target word. Washio et al. (2019) model the relation between defined and defining words using word pair embeddings (Joshi et al., 2018). Different from former works that use distributed representations of target words, Yang et al. (2019) introduce target words’ concepts from HowNet (Dong and Dong, 2003) as fine-grained knowledge for Chinese definition modeling. There are also works that learn the target words with refined methods: both Li et al. (2020a) and Reid et al. (2020) decompose the meaning of the target word into a group of latent variables and rely on variational inference for estimation.
Recently, pre-trained encoder-decoder models have been applied to definition generation and achieved great success. Bevilacqua et al. (2020) use special tokens to mark the target word in the context and feed them into a BART model (Lewis et al., 2019). Huang et al. (2021) fine-tune a T5 model and re-rank all the candidate results from the T5 model to obtain definitions with proper specificity. Kong et al. (2022) design a multi-task framework based on MASS to generate simple definitions in an unsupervised manner. Despite their promising performance on definition generation, the under-specific problem has been less investigated. Although Huang et al. (2021) design a scoring mechanism that measures definitions’ specificity, we argue that the fundamental reason for the under-specific problem lies in the lack of fine-grained semantic learning in pre-trained encoder-decoder models, which we leverage contrastive learning to address in this work.
2.2 Contrastive Learning in Semantic Representation
Contrastive learning has been widely used to enhance semantic information for various NLP tasks. For example, Gao et al. (2021) use a dropout trick to derive positive samples at the embedding level, and then apply both supervised and self-supervised methods to acquire better sentence embeddings. Radford et al. (2021) use contrastive learning to pre-train a vision-language model that aligns images with their corresponding text. Li et al. (2022) combine masked language modeling and contrastive learning for multi-task pre-training, and demonstrate that contrastive learning helps connect word glosses with their corresponding vectors. Li et al. (2020b) and Srivastava and Vemulapati (2022) implement contrastive learning as an auxiliary task to encourage the transformer encoder to better capture semantic alignment.

In this work, we borrow the idea of using contrastive methods for semantic representation learning. For a given target word, there are two representations in the task of definition generation: the word representation generated by the encoder, and the definition representation produced by the decoder. These two kinds of representations can be regarded as two views of the semantics of the target word to be explained. By aligning the representation spaces of the encoder and the decoder using contrastive learning, we force the model to pay more attention to fine-grained semantic information during representation learning. In this way, the under-specific problem is mitigated when using pre-trained encoder-decoder models to generate definitions.
3 Method
In this section, we present our method of using contrastive learning to enhance the target word’s representation for definition generation. Specifically, we first formulate the definition generation task and introduce the notation (Section 3.1). Then we provide a preliminary description of the definition generation process based on T5 (Section 3.2). Finally, we explain how to apply the contrastive loss during training to solve the under-specific problem and improve generation quality (Section 3.3). Figure 1 depicts the overall pipeline of our method.
3.1 Task Formulation
Given a word or phrase $W = \{w_i, \ldots, w_j\}$ and its surrounding context $C = \{w_0, \ldots, w_k\}$ $(0 < i < j < k)$, the task of definition generation is to generate the definition $D = \{d_0, \ldots, d_T\}$ to explain the meaning of $W$ under $C$. This process can be formulated as:

$$P(D \mid W, C) = \prod_{t=0}^{T} p(d_t \mid d_{<t}, W, C) \tag{1}$$
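In practice, this factorization is typically trained by minimizing the token-level negative log-likelihood. A generic form of this generation loss is sketched below; the paper’s complete training objective, which additionally includes the contrastive term, is detailed in Section 3.3.

$$\mathcal{L}_{\mathrm{gen}} = -\sum_{t=0}^{T} \log p\left(d_t \mid d_{<t}, W, C\right)$$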
3.2 Definition Generation with T5
Our work aims at addressing the under-specific problem when using pre-trained encoder-decoder models for definition generation. Without loss of generality, we take T5 (Raffel et al., 2020) as our backbone model, which is a transformer-based encoder-decoder model trained on a large-scale corpus and has demonstrated its effectiveness on the definition generation task (Huang et al., 2021).

To apply T5 to definition generation, we first concatenate the target word and the given context behind the prefix prompts “word:” and “context:”, respectively. The concatenated input is then fed to the T5 encoder, which consists of $L_E$ layers of encoder blocks $\mathrm{E\_Block}$, and we take the last hidden state $H^{L_E}$, which contains the semantic information of the target word and the local context:

$$H^0 = \mathrm{Emb}(\mathrm{Splice}(W, C)) \tag{2}$$
$$H^l = \mathrm{E\_Block}(H^{l-1}), \quad l \in [1, L_E] \tag{3}$$

Here $W$ stands for the target word, $C$ for the given context, and $\mathrm{Splice}$ is the operation that concatenates the target word and the given context with their corresponding prefixes. $\mathrm{Emb}$ is the embedding layer that converts the input tokens into embedding vectors.
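As a concrete illustration of this encoding step, the sketch below builds the prefixed input and obtains the encoder’s last hidden state using the Hugging Face transformers library; the t5-base checkpoint and the example word/context pair are illustrative assumptions, not necessarily the authors’ exact setup.

```python
import torch
from transformers import T5Tokenizer, T5EncoderModel

tokenizer = T5Tokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

word = "double"
context = "ate a double portion"

# Splice(W, C): concatenate the target word and context behind their prefix prompts.
source = f"word: {word} context: {context}"
inputs = tokenizer(source, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# H^{L_E}: one hidden vector per input token, shape (1, seq_len, d_model).
h_le = outputs.last_hidden_state
```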
After encoding, the T5 decoder learns to generate an appropriate definition conditioned on the encoding $H^{L_E}$ and the previously generated tokens. During decoding, the teacher-forcing mechanism is applied to guarantee that the preceding ground-truth tokens are attended to at the current step $t$:

$$G^0_t = \mathrm{Emb}(D_t) \tag{4}$$
$$G^l_t = \mathrm{D\_Block}(H^{L_E}, G^{l-1}_t), \quad l \in [1, L_D] \tag{5}$$

Here $D_t$ represents the $t$-th token in the definition sequence. After passing through $L_D$ layers of decoder blocks $\mathrm{D\_Block}$, we obtain the decoder’s last hidden state $G^{L_D}$.

Finally, a softmax function is applied on top of a linear head to transform $G^{L_D}$ into a prediction distribution matrix $V \in \mathbb{R}^{|V| \times |D|}$, where $|V|$ and $|D|$ denote the vocabulary size and the length of the definition sequence, respectively.
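Putting the decoding and prediction steps together, the sketch below runs a teacher-forced forward pass of T5 and reads off the decoder’s last hidden state, the prediction distribution, and the standard generation loss; as before, the checkpoint and inputs are illustrative assumptions rather than the authors’ actual training code.

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

source = "word: double context: ate a double portion"
definition = "twice as great or many"

enc = tokenizer(source, return_tensors="pt")
labels = tokenizer(definition, return_tensors="pt").input_ids

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    labels=labels,                  # teacher forcing: gold definition tokens
    output_hidden_states=True,
)

g_ld = outputs.decoder_hidden_states[-1]       # G^{L_D}: decoder's last hidden state
probs = torch.softmax(outputs.logits, dim=-1)  # per-position distribution over the vocabulary
gen_loss = outputs.loss                        # cross-entropy generation loss
```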