Generative Prompt Tuning for Relation Classification
Jiale Han1, Shuai Zhao1, Bo Cheng1, Shengkun Ma1 and Wei Lu2
1State Key Laboratory of Networking and Switching Technology,
Beijing University of Posts and Telecommunications
2StatNLP Research Group, Singapore University of Technology and Design
{hanjl,zhaoshuaiby,chengbo,mashengkun}@bupt.edu.cn, luwei@sutd.edu.sg
Abstract
Using prompts to explore the knowledge contained within pre-trained language models for downstream tasks has become an active topic. Current prompt tuning methods mostly convert downstream tasks to masked language modeling problems by adding cloze-style phrases and mapping all labels to verbalizations of fixed length, which has proven effective for tasks with simple label spaces. However, when applied to relation classification, which exhibits a complex label space, vanilla prompt tuning methods may struggle with label verbalizations of arbitrary lengths due to rigid prompt restrictions. Inspired by the text infilling task used to pre-train generative models, which can flexibly predict missing spans, we propose a novel generative prompt tuning method that reformulates relation classification as an infilling problem. This frees our approach from the limitations of current prompt-based approaches and thus fully exploits the rich semantics of entity and relation types. In addition, we design entity-guided decoding and discriminative relation scoring to generate and align relations effectively and efficiently during inference. Extensive experiments under fully supervised and low-resource settings demonstrate the effectiveness of our approach.
1 Introduction
Relation classification (RC) is a fundamental task in natural language processing (NLP), aiming to detect the relations between the entities contained in a sentence. With the rise of a series of pre-trained language models (PLMs) (Devlin et al., 2019; Liu et al., 2019; Lewis et al., 2020; Raffel et al., 2020), fine-tuning PLMs has become a dominant approach to RC (Joshi et al., 2020; Xue et al., 2021; Zhou and Chen, 2021). However, the significant objective gap between pre-training and fine-tuning may hinder the full potential of pre-trained knowledge for such a downstream task.
Figure 1: Examples of prompt tuning applied to (a) text classification and (b) relation classification. Existing prompt-based approaches are effective when the label space is simple, but struggle in cases where labels require more complex and elaborate descriptions. Figure 1(b) shows that different classes have label tokens of arbitrary lengths, and it may not always be easy to map them to verbalizations of the same length without losing semantic information.
To this end, prompt tuning (Brown et al., 2020; Schick and Schütze, 2021a,b; Liu et al., 2021a) has recently been proposed and proven effective, especially for low-resource scenarios (Gao et al., 2021; Scao and Rush, 2021). The core idea is to bring the objective of the downstream task closer to that of the pre-training tasks, by designing a template that reformulates input examples as cloze-style phrases and a verbalizer that maps labels to candidate words. By predicting the mask token, we can determine the label of the input example.

One disadvantage of prompt tuning is the rigid template restriction, in which the number and position of masked tokens are typically fixed. As presented in Figure 1, when the label space is simple, downstream tasks can easily adapt to this paradigm (Hambardzumyan et al., 2021; Lester et al., 2021), which predicts one verbalization token at one masked position. However, when applying prompt tuning to RC with a complex label space that conveys rich semantic information, vanilla prompt tuning methods may struggle to handle label verbalizations of varying lengths. As an attempt to resolve this issue, Han et al. (2021c) abridge different labels into verbalizations of fixed length, which, however, may lead to the loss of important semantic information. Sainz et al. (2021) convert RC to an entailment problem with hand-crafted verbalizations as hypotheses. Such an approach requires expert effort, making it difficult to adapt to new datasets and tasks.
We argue that the fundamental reason for this limitation is that existing prompt tuning methods imitate masked language modeling (MLM), which predicts only one token at each masked position. Different from MLM, the text infilling task (Zhu et al., 2019) used to pre-train generative models (Lewis et al., 2020; Raffel et al., 2020) appears to be more compatible with RC. The task drops consecutive spans of tokens and learns to predict not only which but also how many tokens are missing from each snippet. Following this paradigm allows the model to generate an arbitrary number of tokens at multiple prediction slots.
This paper proposes a novel Generative Prompt Tuning method (GenPT), which reformulates RC as a text infilling task to eliminate the rigid prompt restrictions and thus fully exploit the label semantics. Entity type information is further injected thanks to the flexible task format, which is crucial for RC (Zhou and Chen, 2021). Specifically, we construct a multi-slot continuous prompt, in which the template converts input sentences to infilling-style phrases by leveraging three sentinel tokens as placeholders, to be filled in with the label verbalizations of the head entity type, tail entity type, and relation, respectively. Trainable continuous prompt embeddings are employed to avoid manual prompt engineering. In addition, how to efficiently determine the final class label is a practical problem when applying generative models to discriminative tasks. We design entity-guided decoding and relation scoring strategies to align the generated sequences with the pre-defined set of labels, making the prediction process more effective and efficient.
Extensive experiments are conducted on four widely used relation classification datasets under fully supervised and low-resource settings. Compared to a series of strong discriminative and generative baselines, our method achieves better performance, especially in cases where relations are rarely seen during training, demonstrating the effectiveness of our approach. Our code is available at https://github.com/hanjiale/GenPT. Our main contributions are summarized as follows:
• We reformulate RC as a text infilling task and propose a novel generative prompt tuning method, which eliminates the rigid prompt restrictions and makes full use of the semantic information of entity types and relation labels.
• We design entity-guided decoding and discriminative relation scoring strategies to predict relations effectively and efficiently.
• Experiments on four datasets demonstrate the effectiveness of our model in both fully supervised and low-resource settings.
2 Background
2.1 MLM and Text Infilling
Masked language modeling (Taylor, 1953) is widely adopted as a pre-training task to obtain a bidirectional pre-trained model (Devlin et al., 2019; Liu et al., 2019; Conneau and Lample, 2019). Generally speaking, a masked language model (MLM) randomly masks out some tokens from the input sentences, where each [MASK] corresponds to one token. The objective is to predict the masked word based on the rest of the tokens (see Figure 2(a)). Different from MLM, which predicts only one token per [MASK], the text infilling task for pre-training seq2seq models (Raffel et al., 2020; Lewis et al., 2020) can flexibly recover spans of different lengths. As shown in Figure 2(b), the text infilling task samples a number of text spans of different lengths from the original sentence, and each span is replaced with a single sentinel token. The encoder is fed the corrupted sequence, and the decoder sequentially produces the consecutive tokens of the dropped-out spans, delimited by sentinel tokens. This task is more flexible and more compatible with some complex downstream tasks, but has so far been largely overlooked.
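To make the source/target format concrete, here is a minimal sketch of this span corruption in Python; the corrupt helper and its argument names are our own illustrative choices, not part of any pre-training library.

```python
def corrupt(tokens, spans):
    """Build a text-infilling (source, target) pair: each dropped span is
    replaced by one sentinel token in the source, and the target lists the
    dropped spans, each preceded by its sentinel. `spans` holds sorted,
    non-overlapping (start, end) index pairs."""
    sentinels = ["[X]", "[Y]", "[Z]", "[W]"]
    source, target, prev = [], [], 0
    for i, (s, e) in enumerate(spans):
        source += tokens[prev:s] + [sentinels[i]]
        target += [sentinels[i]] + tokens[s:e]
        prev = e
    source += tokens[prev:]
    target += [sentinels[len(spans)]]  # final sentinel terminates the target
    return " ".join(source), " ".join(target)

# Reproduces Figure 2(b): dropping "B" and "D E" from "A B C D E F"
src, tgt = corrupt(list("ABCDEF"), [(1, 2), (3, 5)])
assert src == "A [X] C [Y] F"
assert tgt == "[X] B [Y] D E [Z]"
```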
2.2 Prompt-Tuning of PLMs

For standard fine-tuning of classification, the input instance x is converted to a token sequence x̃ = [CLS] x [SEP].
Figure 2: An illustration of (a) MLM pre-training, (b) text infilling pre-training, and (c) our proposed generative prompt tuning approach for RC.
The model predicts the output classes by feeding the input sequence into the PLM and adding a classifier on top of the [CLS] representation, which introduces extra parameters and makes it hard to generalize well, especially in low-resource settings. To this end, prompt tuning is proposed to make the downstream task consistent with the pre-training task. Current prompt-tuning approaches mainly cast tasks as cloze-style questions to imitate MLM. Formally, a prompt consists of two key components (Schick and Schütze, 2021a): a template and a verbalizer. The template T(·) reformulates the original input x as a cloze-style phrase T(x) by adding a set of additional tokens and one [MASK] token. The verbalizer φ : R → V maps task labels R to textual tokens V, where V refers to a set of label words in the vocabulary of a language model M. In this way, a classification task is transformed into an MLM task:

P(r ∈ R | x) = P([MASK] = φ(r) | T(x))

Most prompt tuning methods include one mask token in the template and map each label to one verbalization token to predict classes. Although effective, this makes it hard to handle tasks with complex label spaces involving labels of varying lengths.
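As a concrete illustration of this formula on the sentiment example from Figure 1(a), the sketch below scores the two labels with an off-the-shelf MLM; the model choice and the toy verbalizer are our own assumptions, not from the paper.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base")

verbalizer = {"positive": " great", "negative": " bad"}   # φ: R → V
template = f"Best pizza ever! It was {tok.mask_token}."   # T(x)

inputs = tok(template, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tok.mask_token_id).nonzero().item()
with torch.no_grad():
    probs = mlm(**inputs).logits[0, mask_pos].softmax(-1)

# P(r | x) = P([MASK] = φ(r) | T(x)), restricted to the label words
scores = {r: probs[tok.convert_tokens_to_ids(tok.tokenize(w))[0]].item()
          for r, w in verbalizer.items()}
print(max(scores, key=scores.get))  # -> "positive"
```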
3 Approach

As presented in Figure 2(c), this paper treats relation classification as a text-infilling-style task, which takes the sequence T(x) processed by the template as source input and outputs a target sequence y to predict relations. The problem definition is formally given in Section 3.1. We first introduce how to construct entity-oriented prompts in Section 3.2, and then present the model and training objective in Section 3.3. The inference details, including entity-guided decoding and relation scoring, are discussed in Section 3.4.
3.1 Problem Definition

Formally, given an instance x = [x_1, x_2, ..., x_|x|] with head and tail entity mentions e_h and e_t spanning several tokens in the sequence, as well as entity types t_h and t_t, the relation classification task is required to predict the relation r ∈ R between the entities, where R is the set of possible relations and r̄ represents the corresponding label verbalization. Take the sentence x = "Christina is the Washington National Opera's director" with relation r = "org:top_members/employees" as an example: e_h and e_t are "Washington National Opera" and "Christina", their entity types are "organization" and "person" respectively, and the relation label verbalization is r̄ = "top members or employees".²

² The relation label verbalization r̄ is derived from the label r by removing the attribute prefix "org:", discarding the "_" symbols, and replacing "/" with "or".
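This derivation (footnote 2) is mechanical enough to state as code; a minimal sketch, with a function name of our own choosing:

```python
def verbalize(label: str) -> str:
    """Derive a label verbalization, e.g.
    "org:top_members/employees" -> "top members or employees"."""
    label = label.split(":", 1)[-1]     # drop the attribute prefix ("org:", "per:")
    label = label.replace("_", " ")     # underscores separate words
    return label.replace("/", " or ")   # "/" reads as "or"

assert verbalize("org:top_members/employees") == "top members or employees"
assert verbalize("per:date_of_birth") == "date of birth"
```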
3.2 Entity-Oriented Prompt Construction

We design an entity-oriented continuous template T(·) combining entity mentions and type information, which uses a series of learnable continuous tokens (Liu et al., 2021b) as prompts rather than handcrafted token phrases. Specifically, for an input sentence x with two marked entities e_h and e_t, instead of utilizing a template with discrete tokens like "x. The relation between [X] e_h and [Y] e_t is [Z].", which is hand-crafted and for which it is hard to find the optimal prompt, we leverage a few learnable continuous tokens to serve as prompts that can be optimized by gradient descent:

T(x) = x . [v_{0:n_0−1}] [X] e_h [v_{n_0:n_1−1}] [Y] e_t [v_{n_1:n_2−1}] [Z] .
where [v_i] ∈ R^d refers to the i-th continuous token in the template, and n_0, n_1 − n_0, and n_2 − n_1 are the lengths of the three token phrases. We add three sentinel tokens to the template, where [X] and [Y] in front of the entity mentions denote the type information of the head and tail entities, and [Z] represents the relation label tokens. The target sequence then consists of the head and tail entity types and the label verbalization, delimited by the sentinel tokens used in the input plus a final sentinel token [W]:

y = [X] t_h [Y] t_t [Z] r̄ [W]

Our prompt can flexibly handle predicted tokens of arbitrary lengths at arbitrary positions, benefiting from the generative text infilling format.
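For the running example, the source/target pair could be rendered as follows. This is our own sketch: the literal [v_i] strings merely stand in for the trainable continuous embeddings, which are vectors rather than text.

```python
def build_prompt(x, e_h, e_t, verbalization, t_h, t_t, n=(3, 3, 3)):
    """Render the source/target pair of the entity-oriented template.
    n gives the lengths of the three continuous-token phrases."""
    v = lambda a, b: " ".join(f"[v{i}]" for i in range(a, b))
    n0, n1, n2 = n[0], n[0] + n[1], n[0] + n[1] + n[2]
    source = (f"{x} {v(0, n0)} [X] {e_h} {v(n0, n1)} [Y] {e_t} "
              f"{v(n1, n2)} [Z] .")
    target = f"[X] {t_h} [Y] {t_t} [Z] {verbalization} [W]"
    return source, target

src, tgt = build_prompt(
    "Christina is the Washington National Opera's director.",
    "Washington National Opera", "Christina",
    "top members or employees", "organization", "person")
# tgt == "[X] organization [Y] person [Z] top members or employees [W]"
```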
3.3 Model and Training

Given a PLM M and a template T(x) as input, we map T(x) into embeddings in which the continuous tokens are mapped to a sequence of continuous vectors:

e(x), h_0, ..., h_{n_0−1}, e([X]), e(e_h), h_{n_0}, ..., h_{n_1−1}, e([Y]), e(e_t), h_{n_1}, ..., h_{n_2−1}, e([Z])

where e(·) is the embedding layer of M, h_i ∈ R^d are trainable embedding tensors with random initialization, d is the embedding dimension of M, and 0 ≤ i < n_2. We feed the input embeddings to the encoder of the model and obtain hidden representations h of the sentence:

h = Enc(T(x))

At the j-th step of the decoder, the model attends to the previously generated tokens y_<j and the encoder output h, and then predicts the probability of the next token:

p(y_j | y_<j, T(x)) = Dec(y_<j, h)

We train our model by minimizing the negative log-likelihood of the label text y given T(x) as input:

L_gen = − Σ_{j=1}^{|y|} log p(y_j | y_<j, T(x))
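A minimal training-step sketch under simplifying assumptions: we use BART (one of the seq2seq PLMs cited above), add the sentinel and placeholder prompt tokens as ordinary vocabulary items instead of continuous embeddings, and rely on the built-in cross-entropy, which corresponds to L_gen.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Sentinels (and our stand-in prompt tokens) are not in BART's vocabulary.
tok.add_tokens(["[X]", "[Y]", "[Z]", "[W]"] + [f"[v{i}]" for i in range(9)])
model.resize_token_embeddings(len(tok))

src, tgt = build_prompt(                      # from the sketch in Section 3.2
    "Christina is the Washington National Opera's director.",
    "Washington National Opera", "Christina",
    "top members or employees", "organization", "person")

enc = tok(src, return_tensors="pt")
labels = tok(tgt, return_tensors="pt").input_ids
loss = model(**enc, labels=labels).loss       # token-level NLL, i.e. L_gen
loss.backward()
```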
Figure 3: Entity-guided decoding and relation scoring.
3.4 Entity-Guided Decoding and Scoring

We propose a simple yet effective entity-guided decoding strategy, which exploits entity type information to implicitly influence the choice of possible candidate relations. As shown in Figure 3, at the beginning of decoding, instead of feeding only the start-of-sequence token <s> to the decoder, we also append the entity type tokens. With ŷ = <s> [X] t_h [Y] t_t [Z] as the initial decoder input serving as a "preamble", the model iteratively predicts the subsequent tokens:

p_{y_j} = p(y_j | ŷ, y_<j, T(x))

where p_{y_j} is the probability of token y_j at the j-th prediction step. We sum the predicted probabilities of the tokens in a label verbalization and normalize by the verbalization length to obtain the prediction score of the corresponding relation. Formally, for each relation r ∈ R with label verbalization r̄, the prediction score s_r is calculated as follows:

s_r = (1 / |r̄|) Σ_{j=1}^{|r̄|} p_{r̄_j}

where p_{r̄_j} represents the probability of token r̄_j at the j-th step of decoding. In this simple way, we can easily align the generated sequences with the label set. Following the work of Sainz et al. (2021), we discard relations that do not match the entity types of the instance, and the sentence is classified into the relation with the highest score.
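Continuing the assumptions of the Section 3.3 sketch (same model, tok, and src), the scoring step might look as follows: each candidate verbalization is teacher-forced after the entity-type preamble, and its per-token probabilities are averaged to obtain s_r. The type-based filtering of candidates is omitted here, and the function name is our own.

```python
import torch

@torch.no_grad()
def score_relations(model, tok, src, t_h, t_t, verbalizations):
    """Return the relation whose verbalization gets the highest mean
    token probability after the preamble <s> [X] t_h [Y] t_t [Z]."""
    enc = tok(src, return_tensors="pt")
    pre = tok(f"[X] {t_h} [Y] {t_t} [Z]", add_special_tokens=False).input_ids
    scores = {}
    for r, verb in verbalizations.items():
        v_ids = tok(" " + verb, add_special_tokens=False).input_ids
        dec_in = torch.tensor(
            [[model.config.decoder_start_token_id] + pre + v_ids[:-1]])
        probs = model(**enc, decoder_input_ids=dec_in).logits[0].softmax(-1)
        # logits at position len(pre) + j predict verbalization token v_ids[j]
        scores[r] = sum(probs[len(pre) + j, t]
                        for j, t in enumerate(v_ids)).item() / len(v_ids)
    return max(scores, key=scores.get)

pred = score_relations(
    model, tok, src, "organization", "person",
    {"org:top_members/employees": "top members or employees",
     "per:date_of_birth": "date of birth"})
```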
4 Experimental Setup

4.1 Datasets and Setups

Following the work of Han et al. (2021c), Chen et al. (2022), and Zhou and Chen (2021), we conduct experiments on the popular RC datasets TACRED (Zhang et al., 2017), TACREV (Alt et al., 2020), and Re-TACRED (Stoica et al., 2021). Wiki80 (Han et al.,