Generative Prompt Tuning for Relation Classification
Jiale Han1, Shuai Zhao1, Bo Cheng1, Shengkun Ma1 and Wei Lu2
1State Key Laboratory of Networking and Switching Technology,
Beijing University of Posts and Telecommunications
2StatNLP Research Group, Singapore University of Technology and Design
{hanjl,zhaoshuaiby,chengbo,mashengkun}@bupt.edu.cn, luwei@sutd.edu.sg
Abstract
Using prompts to explore the knowledge contained within pre-trained language models for downstream tasks has become an active topic. Current prompt tuning methods mostly convert downstream tasks to masked language modeling problems by adding cloze-style phrases and mapping all labels to verbalizations of fixed length, which has proven effective for tasks with simple label spaces. However, when applied to relation classification, which exhibits a complex label space, vanilla prompt tuning methods may struggle with label verbalizations of arbitrary lengths due to rigid prompt restrictions. Inspired by the text infilling task used to pre-train generative models, which can flexibly predict missing spans, we propose a novel generative prompt tuning method that reformulates relation classification as an infilling problem. This frees our approach from the limitations of current prompt-based approaches and thus fully exploits the rich semantics of entity and relation types. In addition, we design entity-guided decoding and discriminative relation scoring to generate and align relations effectively and efficiently during inference. Extensive experiments under fully supervised and low-resource settings demonstrate the effectiveness of our approach.
1 Introduction
Relation classification (RC) is a fundamental task in natural language processing (NLP), aiming to detect the relations between the entities contained in a sentence. With the rise of a series of pre-trained language models (PLMs) (Devlin et al., 2019; Liu et al., 2019; Lewis et al., 2020; Raffel et al., 2020), fine-tuning PLMs has become a dominant approach to RC (Joshi et al., 2020; Xue et al., 2021; Zhou and Chen, 2021). However, the significant objective gap between pre-training and fine-tuning may hinder the full potential of pre-trained knowledge for such a downstream task.
Figure 1: Examples of prompt tuning applied to (a) text classification and (b) relation classification. Existing prompt-based approaches are effective when the label space is simple, but struggle in cases where labels require more complex and elaborate descriptions. Figure 1(b) shows that different classes have label tokens of arbitrary lengths, and it may not always be easy to map them to verbalizations of the same length without losing semantic information.
To this end, prompt tuning (Brown et al., 2020; Schick and Schütze, 2021a,b; Liu et al., 2021a) has recently been proposed and proven effective, especially for low-resource scenarios (Gao et al., 2021; Scao and Rush, 2021). The core idea is to bring the objective of the downstream task closer to that of the pre-training tasks, by designing a template that reformulates input examples as cloze-style phrases and a verbalizer that maps labels to candidate words. By predicting the mask token, we can determine the label of the input example.

One disadvantage of prompt tuning is the rigid template restriction, in which the number and position of masked tokens are typically fixed. As presented in Figure 1, when the label space is simple, downstream tasks can easily adapt to this paradigm (Hambardzumyan et al., 2021; Lester et al., 2021), which predicts one verbalization token at one masked position. However, when applying prompt tuning to RC with a complex label space that conveys rich semantic information, vanilla prompt tuning methods may struggle to handle label verbalizations of varying lengths. As an attempt to resolve this issue, Han et al. (2021c) abridge different labels into verbalizations of fixed length, which, however, may lead to the loss of important semantic information. Sainz et al. (2021) convert RC to an entailment problem with hand-crafted verbalizations as hypotheses. Such an approach requires expert effort, making it difficult to adapt to new datasets and tasks.
We argue that the fundamental reason for this limitation is that existing prompt tuning methods imitate masked language modeling (MLM), which predicts only one token at each masked position. Different from MLM, the text infilling task (Zhu et al., 2019) used to pre-train generative models (Lewis et al., 2020; Raffel et al., 2020) appears to be more compatible with RC. The task drops consecutive spans of tokens and learns to predict not only which but also how many tokens are missing from each snippet. Following this paradigm allows the model to generate an arbitrary number of tokens at multiple prediction slots.
This paper proposes a novel Generative Prompt Tuning method (GenPT), which reformulates RC as a text infilling task to eliminate the rigid prompt restrictions and thus fully exploit the label semantics. Entity type information is further injected thanks to the flexible task format, which is crucial for RC (Zhou and Chen, 2021). Specifically, we construct a multi-slot continuous prompt, in which the template converts input sentences to infilling-style phrases by leveraging three sentinel tokens as placeholders, to be filled in with the label verbalizations of the head entity type, tail entity type, and relation, respectively. Trainable continuous prompt embeddings are employed to avoid manual prompt engineering. In addition, how to efficiently determine the final class label is a practical problem when applying generative models to discriminative tasks. We design entity-guided decoding and relation scoring strategies to align the generated sequences with the pre-defined set of labels, making the prediction process more effective and efficient.
Extensive experiments are conducted on four widely used relation classification datasets under fully supervised and low-resource settings. Compared to a series of strong discriminative and generative baselines, our method achieves better performance, especially in cases where relations are rarely seen during training, demonstrating the effectiveness of our approach. Our code is available at https://github.com/hanjiale/GenPT. Our main contributions are summarized as follows:
• We reformulate RC as a text infilling task and propose a novel generative prompt tuning method, which eliminates the rigid prompt restrictions and makes full use of the semantic information of entity types and relation labels.
• We design entity-guided decoding and discriminative relation scoring strategies to predict relations effectively and efficiently.
• Experiments on four datasets demonstrate the effectiveness of our model in both fully supervised and low-resource settings.
2 Background
2.1 MLM and Text Infilling
Masked language modeling (Taylor, 1953) is widely adopted as a pre-training task to obtain a bidirectional pre-trained model (Devlin et al., 2019; Liu et al., 2019; Conneau and Lample, 2019). Generally speaking, a masked language model (MLM) randomly masks out some tokens from the input sentences, where each [MASK] corresponds to one token. The objective is to predict the masked word based on the rest of the tokens (see Figure 2(a)). Different from MLM, which predicts only one token per [MASK], the text infilling task for pre-training seq2seq models (Raffel et al., 2020; Lewis et al., 2020) can flexibly recover spans of different lengths. As shown in Figure 2(b), the text infilling task samples a number of text spans of different lengths from the original sentence, and each span is replaced with a single sentinel token. The encoder is fed the corrupted sequence, and the decoder sequentially produces the consecutive tokens of the dropped-out spans, delimited by sentinel tokens. This task is more flexible and more compatible with some complex downstream tasks, but has so far been largely overlooked.
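To make the source/target format concrete, here is a minimal sketch of this span corruption in Python; the corrupt helper and its argument names are our own illustrative choices, not part of any pre-training library.

```python
def corrupt(tokens, spans):
    """Build a text-infilling (source, target) pair: each dropped span is
    replaced by one sentinel token in the source, and the target lists the
    dropped spans, each preceded by its sentinel. `spans` holds sorted,
    non-overlapping (start, end) index pairs."""
    sentinels = ["[X]", "[Y]", "[Z]", "[W]"]
    source, target, prev = [], [], 0
    for i, (s, e) in enumerate(spans):
        source += tokens[prev:s] + [sentinels[i]]
        target += [sentinels[i]] + tokens[s:e]
        prev = e
    source += tokens[prev:]
    target += [sentinels[len(spans)]]  # final sentinel terminates the target
    return " ".join(source), " ".join(target)

# Reproduces Figure 2(b): dropping "B" and "D E" from "A B C D E F"
src, tgt = corrupt(list("ABCDEF"), [(1, 2), (3, 5)])
assert src == "A [X] C [Y] F"
assert tgt == "[X] B [Y] D E [Z]"
```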
2.2 Prompt-Tuning of PLMs

For standard fine-tuning of classification, the input instance x is converted to a token sequence x̃ = [CLS] x [SEP].
Figure 2: An illustration of (a) MLM pre-training, (b) text infilling pre-training, and (c) our proposed generative prompt tuning approach for RC.
The model predicts the output classes by feeding the input sequence into the PLM and adding a classifier on top of the [CLS] representation, which introduces extra parameters and makes it hard to generalize well, especially in low-resource settings. To this end, prompt tuning is proposed to make the downstream task consistent with the pre-training task. Current prompt-tuning approaches mainly cast tasks as cloze-style questions to imitate MLM. Formally, a prompt consists of two key components (Schick and Schütze, 2021a): a template and a verbalizer. The template T(·) reformulates the original input x as a cloze-style phrase T(x) by adding a set of additional tokens and one [MASK] token. The verbalizer φ : R → V maps task labels R to textual tokens V, where V refers to a set of label words in the vocabulary of a language model M. In this way, a classification task is transformed into an MLM task:

P(r ∈ R | x) = P([MASK] = φ(r) | T(x))

Most prompt tuning methods include one mask token in the template and map each label to one verbalization token to predict classes. Although effective, this makes it hard to handle tasks with complex label spaces involving labels of varying lengths.
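As a concrete illustration of this formula on the sentiment example from Figure 1(a), the sketch below scores the two labels with an off-the-shelf MLM; the model choice and the toy verbalizer are our own assumptions, not from the paper.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base")

verbalizer = {"positive": " great", "negative": " bad"}   # φ: R → V
template = f"Best pizza ever! It was {tok.mask_token}."   # T(x)

inputs = tok(template, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tok.mask_token_id).nonzero().item()
with torch.no_grad():
    probs = mlm(**inputs).logits[0, mask_pos].softmax(-1)

# P(r | x) = P([MASK] = φ(r) | T(x)), restricted to the label words
scores = {r: probs[tok.convert_tokens_to_ids(tok.tokenize(w))[0]].item()
          for r, w in verbalizer.items()}
print(max(scores, key=scores.get))  # -> "positive"
```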
3 Approach

As presented in Figure 2(c), this paper treats relation classification as a text-infilling-style task, which takes the sequence T(x) processed by the template as source input and outputs a target sequence y to predict relations. The problem definition is formally given in Section 3.1. We first introduce how to construct entity-oriented prompts in Section 3.2, and then present the model and training objective in Section 3.3. The inference details, including entity-guided decoding and relation scoring, are discussed in Section 3.4.
3.1 Problem Definition

Formally, given an instance x = [x_1, x_2, ..., x_|x|] with head and tail entity mentions e_h and e_t spanning several tokens in the sequence, as well as entity types t_h and t_t, the relation classification task is required to predict the relation r ∈ R between the entities, where R is the set of possible relations and r̄ represents the corresponding label verbalization. Take the sentence x = "Christina is the Washington National Opera's director" with relation r = "org:top_members/employees" as an example: e_h and e_t are "Washington National Opera" and "Christina", their entity types are "organization" and "person" respectively, and the relation label verbalization is r̄ = "top members or employees".²

² The relation label verbalization r̄ is derived from the label r by removing the attribute prefix "org:", discarding the "_" symbols, and replacing "/" with "or".
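This derivation (footnote 2) is mechanical enough to state as code; a minimal sketch, with a function name of our own choosing:

```python
def verbalize(label: str) -> str:
    """Derive a label verbalization, e.g.
    "org:top_members/employees" -> "top members or employees"."""
    label = label.split(":", 1)[-1]     # drop the attribute prefix ("org:", "per:")
    label = label.replace("_", " ")     # underscores separate words
    return label.replace("/", " or ")   # "/" reads as "or"

assert verbalize("org:top_members/employees") == "top members or employees"
assert verbalize("per:date_of_birth") == "date of birth"
```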
3.2 Entity-Oriented Prompt Construction

We design an entity-oriented continuous template T(·) combining entity mentions and type information, which uses a series of learnable continuous tokens (Liu et al., 2021b) as prompts rather than handcrafted token phrases. Specifically, for an input sentence x with two marked entities e_h and e_t, instead of utilizing a template with discrete tokens like "x. The relation between [X] e_h and [Y] e_t is [Z].", which is hand-crafted and for which it is hard to find the optimal prompt, we leverage a few learnable continuous tokens to serve as prompts that can be optimized by gradient descent:

T(x) = x . [v_{0:n_0−1}] [X] e_h [v_{n_0:n_1−1}] [Y] e_t [v_{n_1:n_2−1}] [Z] .
where [v_i] ∈ R^d refers to the i-th continuous token in the template, and n_0, n_1 − n_0, and n_2 − n_1 are the lengths of the three token phrases. We add three sentinel tokens to the template, where [X] and [Y] in front of the entity mentions denote the type information of the head and tail entities, and [Z] represents the relation label tokens. The target sequence then consists of the head and tail entity types and the label verbalization, delimited by the sentinel tokens used in the input plus a final sentinel token [W]:

y = [X] t_h [Y] t_t [Z] r̄ [W]

Our prompt can flexibly handle predicted tokens of arbitrary lengths at arbitrary positions, benefiting from the generative text infilling format.
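For the running example, the source/target pair could be rendered as follows. This is our own sketch: the literal [v_i] strings merely stand in for the trainable continuous embeddings, which are vectors rather than text.

```python
def build_prompt(x, e_h, e_t, verbalization, t_h, t_t, n=(3, 3, 3)):
    """Render the source/target pair of the entity-oriented template.
    n gives the lengths of the three continuous-token phrases."""
    v = lambda a, b: " ".join(f"[v{i}]" for i in range(a, b))
    n0, n1, n2 = n[0], n[0] + n[1], n[0] + n[1] + n[2]
    source = (f"{x} {v(0, n0)} [X] {e_h} {v(n0, n1)} [Y] {e_t} "
              f"{v(n1, n2)} [Z] .")
    target = f"[X] {t_h} [Y] {t_t} [Z] {verbalization} [W]"
    return source, target

src, tgt = build_prompt(
    "Christina is the Washington National Opera's director.",
    "Washington National Opera", "Christina",
    "top members or employees", "organization", "person")
# tgt == "[X] organization [Y] person [Z] top members or employees [W]"
```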
3.3 Model and Training

Given a PLM M and a template T(x) as input, we map T(x) into embeddings in which the continuous tokens are mapped to a sequence of continuous vectors:

e(x), h_0, ..., h_{n_0−1}, e([X]), e(e_h), h_{n_0}, ..., h_{n_1−1}, e([Y]), e(e_t), h_{n_1}, ..., h_{n_2−1}, e([Z])

where e(·) is the embedding layer of M, h_i ∈ R^d are trainable embedding tensors with random initialization, d is the embedding dimension of M, and 0 ≤ i < n_2. We feed the input embeddings to the encoder of the model and obtain hidden representations h of the sentence:

h = Enc(T(x))

At the j-th step of the decoder, the model attends to the previously generated tokens y_<j and the encoder output h, and then predicts the probability of the next token:

p(y_j | y_<j, T(x)) = Dec(y_<j, h)

We train our model by minimizing the negative log-likelihood of the label text y given T(x) as input:

L_gen = − Σ_{j=1}^{|y|} log p(y_j | y_<j, T(x))
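A minimal training-step sketch under simplifying assumptions: we use BART (one of the seq2seq PLMs cited above), add the sentinel and placeholder prompt tokens as ordinary vocabulary items instead of continuous embeddings, and rely on the built-in cross-entropy, which corresponds to L_gen.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Sentinels (and our stand-in prompt tokens) are not in BART's vocabulary.
tok.add_tokens(["[X]", "[Y]", "[Z]", "[W]"] + [f"[v{i}]" for i in range(9)])
model.resize_token_embeddings(len(tok))

src, tgt = build_prompt(                      # from the sketch in Section 3.2
    "Christina is the Washington National Opera's director.",
    "Washington National Opera", "Christina",
    "top members or employees", "organization", "person")

enc = tok(src, return_tensors="pt")
labels = tok(tgt, return_tensors="pt").input_ids
loss = model(**enc, labels=labels).loss       # token-level NLL, i.e. L_gen
loss.backward()
```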
Figure 3: Entity-guided decoding and relation scoring.
3.4 Entity-Guided Decoding and Scoring

We propose a simple yet effective entity-guided decoding strategy, which exploits entity type information to implicitly influence the choice of possible candidate relations. As shown in Figure 3, at the beginning of decoding, instead of feeding only the start-of-sequence token <s> to the decoder, we also append the entity type tokens. With ŷ = <s> [X] t_h [Y] t_t [Z] as the initial decoder input serving as a "preamble", the model iteratively predicts the subsequent tokens:

p_{y_j} = p(y_j | ŷ, y_<j, T(x))

where p_{y_j} is the probability of token y_j at the j-th prediction step. We sum the predicted probabilities of the tokens in a label verbalization and normalize by the verbalization length to obtain the prediction score of the corresponding relation. Formally, for each relation r ∈ R with label verbalization r̄, the prediction score s_r is calculated as follows:

s_r = (1 / |r̄|) Σ_{j=1}^{|r̄|} p_{r̄_j}

where p_{r̄_j} represents the probability of token r̄_j at the j-th step of decoding. In this simple way, we can easily align the generated sequences with the label set. Following the work of Sainz et al. (2021), we discard relations that do not match the entity types of the instance, and the sentence is classified into the relation with the highest score.
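Continuing the assumptions of the Section 3.3 sketch (same model, tok, and src), the scoring step might look as follows: each candidate verbalization is teacher-forced after the entity-type preamble, and its per-token probabilities are averaged to obtain s_r. The type-based filtering of candidates is omitted here, and the function name is our own.

```python
import torch

@torch.no_grad()
def score_relations(model, tok, src, t_h, t_t, verbalizations):
    """Return the relation whose verbalization gets the highest mean
    token probability after the preamble <s> [X] t_h [Y] t_t [Z]."""
    enc = tok(src, return_tensors="pt")
    pre = tok(f"[X] {t_h} [Y] {t_t} [Z]", add_special_tokens=False).input_ids
    scores = {}
    for r, verb in verbalizations.items():
        v_ids = tok(" " + verb, add_special_tokens=False).input_ids
        dec_in = torch.tensor(
            [[model.config.decoder_start_token_id] + pre + v_ids[:-1]])
        probs = model(**enc, decoder_input_ids=dec_in).logits[0].softmax(-1)
        # logits at position len(pre) + j predict verbalization token v_ids[j]
        scores[r] = sum(probs[len(pre) + j, t]
                        for j, t in enumerate(v_ids)).item() / len(v_ids)
    return max(scores, key=scores.get)

pred = score_relations(
    model, tok, src, "organization", "person",
    {"org:top_members/employees": "top members or employees",
     "per:date_of_birth": "date of birth"})
```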
4 Experimental Setup

4.1 Datasets and Setups

Following the work of Han et al. (2021c), Chen et al. (2022), and Zhou and Chen (2021), we conduct experiments on the popular RC datasets TACRED (Zhang et al., 2017), TACREV (Alt et al., 2020), and Re-TACRED (Stoica et al., 2021). Wiki80 (Han et al.,