Better Few-Shot Relation Extraction with Label Prompt Dropout
Peiyuan Zhang and Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
peiyuan_zhang@sutd.edu.sg, luwei@sutd.edu.sg
Abstract

Few-shot relation extraction aims to learn to identify the relation between two entities based on very limited training examples. Recent efforts found that textual labels (i.e., relation names and relation descriptions) could be extremely useful for learning class representations, which benefits the few-shot learning task. However, how to best leverage such label information in the learning process remains an important research question. Existing works largely assume such textual labels are always present during both learning and prediction. In this work, we argue that such approaches may not always lead to optimal results. Instead, we present a novel approach called label prompt dropout, which randomly removes label descriptions in the learning process. Our experiments show that our approach leads to improved class representations, yielding significantly better results on the few-shot relation extraction task.¹
1 Introduction

Enabling machines to comprehend sentences and extract relations between entities has been a crucial task in Natural Language Processing (NLP). Conventional methods frame this task as a multi-class classification problem, trying to solve it through large-scale supervised training with LSTM (Hochreiter and Schmidhuber, 1997) or BERT (Devlin et al., 2019) as the backbone (Zhou et al., 2016; Zhang et al., 2017; Yamada et al., 2020). Such an approach has shown great effectiveness. However, one problem left unsolved is how to identify novel relations with only a handful of training examples. Therefore, recent studies (Han et al., 2018; Gao et al., 2019b) introduce the task of few-shot relation extraction (FSRE) to study this data scarcity problem.
Aligned with the success of few-shot learning in Computer Vision (Sung et al., 2018; Satorras and Estrach, 2018), most attempts in FSRE adopt a meta-learning framework (Santoro et al., 2016; Vinyals et al., 2016) that randomly samples episodes with different label sets from the training data to mimic the few-shot scenario in the testing phase. As a meta-learning approach, the prototypical network (Snell et al., 2017) aims to learn a class-agnostic metric space. A query instance is classified as the class that has the nearest prototype during inference.

¹ Code available at https://github.com/jzhang38/LPD

Figure 1: An example of 2-way-1-shot learning using label prompt dropout (LPD). Top: Instead of assuming textual labels are always present for support instances, LPD randomly drops out such textual labels. Here the textual label "country of origin" for the second instance is dropped out. Bottom: LPD directly concatenates the textual label and the context sentence. The textual label serves as a prompt to guide BERT to derive a better class prototype. Note that for simplicity we use the relation names here, while in our implementation we use relation descriptions, which are lengthier and more complex.

While BERT-based prototypical networks (Baldini Soares et al., 2019; Peng et al., 2020a) have shown impressive performance on FSRE, the class prototypes are only constructed through the average representation of the support instances of each class, neglecting the textual labels that may provide additional useful information. Therefore, recent efforts try to modify the prototypical network such that it can use the label information as well. Yang et al. (2020) insert both entity type information and relation descriptions into the model. Dong et al. (2021) use a relation encoder to generate relation representations besides the sentence encoder. Han et al. (2021a) propose a hybrid prototypical network that can generate hybrid prototypes from context sentences and relation descriptions. Nonetheless, these methods largely assume that every support instance is provided with a corresponding textual label in the support set during both learning and prediction. We argue that injecting textual labels into all support instances may render the training task unchallenging: the model can largely rely on the textual labels during training, which results in poor performance during testing when faced with unseen relations and textual labels. Ideally, textual labels should be treated as an additional source of information, such that the model can work with or without them, as shown in the top part of Figure 1.
In this work, we propose a novel approach called Label Prompt Dropout (LPD). We directly concatenate the textual label and the context sentence, and feed them together to the Transformer encoder (Vaswani et al., 2017). The textual label serves as a label prompt² to guide and regularize the Transformer encoder to output a label-aware relation representation through self-attention. During training, we randomly drop out the prompt tokens to create a more challenging scenario, such that the model has to learn to work with and without the relation descriptions.

² In this work, we use the terms label prompt, relation description, and textual label interchangeably. However, our method differs from conventional prompt-based models, in which a verbalizer (Schick and Schütze, 2021) is needed. We use the relation description to construct a natural language sentence for each instance to better make use of the implicit knowledge acquired by language models during pre-training. This goal is similar to that of conventional prompt-based methods, which is why we call our method label prompt dropout.
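A minimal sketch of this dropout mechanism is given below, assuming a string-level concatenation with a simple separator; the function name, separator, and tokenization details are illustrative rather than our exact implementation:

```python
import random

def build_support_input(relation_description: str, sentence: str,
                        alpha: float) -> str:
    """Prepend the relation description as a label prompt, dropping it
    with probability alpha (the label prompt dropout rate)."""
    if random.random() < alpha:
        return sentence  # prompt dropped: the model sees context only
    # prompt kept: the description is a prefix the encoder can attend to
    return relation_description + " : " + sentence
```

Setting alpha to zero at test time keeps every support instance paired with its label prompt, while query instances are always encoded without one.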
Experiments show that our approach achieves significant improvement on two standard FSRE datasets, and extensive ablation studies demonstrate its effectiveness. Furthermore, we highlight a potential issue with the evaluation setup of previous research efforts, in which the pre-training data contains relation types that actually overlap with those in the test set. We argue that this may not be a desirable setup for few-shot learning, and show that the performance gain of existing efforts may be partly due to this "knowledge leakage" issue. We propose to filter out all the overlapping relation types in the pre-training data and conduct a more rigorous few-shot evaluation. In summary, we make the following contributions:
• We present LPD, a novel label prompt dropout approach that makes better use of the textual labels in FSRE. This simple design has significantly outperformed previous attempts that fuse the textual label and the context sentence using complex network structures.

• We identify the limitation of the previous experimental setup in the literature and propose a stricter setup for evaluation in FSRE. For both setups, we show strong improvements over the previous state of the art.
2 Related Work

2.1 Few-Shot Relation Extraction

Few-shot relation extraction (FSRE) aims to train a model that can classify instances into novel relations with only a handful of training examples. Han et al. (2018) are the first to introduce a large-scale benchmark for FSRE, in which they evaluate a model in N-way-K-shot settings. Gao et al. (2019a) propose a hybrid attention-based prototypical network to handle the diversity and noise problem of text data. Qu et al. (2020) model the relationship between different relations via Bayesian meta-learning on relation graphs. Han et al. (2021a) apply an adaptive focal loss and hybrid networks to model the different difficulties of different relations.
Another line of work focuses on further training pre-trained language models (PLMs) on the task of relation extraction (RE). Based on the hypothesis that sentences with the same entity pairs are likely to express the same relation, Baldini Soares et al. (2019) collect a large-scale pre-training dataset and propose a "matching the blanks" pre-training paradigm. Peng et al. (2020a) present an entity-masked contrastive pre-training framework for relation extraction. Dong et al. (2021) introduce a semantic mapping approach to include relation descriptions in the pre-training phase. Inspired by these works, we propose contrastive pre-training with label prompt dropout, which uses relation descriptions during pre-training while creating a more difficult setup by dropping them out.
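As a hedged illustration of what such a contrastive objective can look like, the sketch below pulls together encoder outputs for instance pairs that share the same distantly annotated relation and pushes apart the rest of the batch (an InfoNCE-style loss); the pairing scheme, similarity function, and temperature are common choices, not necessarily the exact ones used in our implementation:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss over a batch: row i of `anchors` and row i of
    `positives` encode two instances of the same relation; every other
    row in the batch serves as a negative."""
    a = F.normalize(anchors, dim=-1)    # (B, d) unit-norm embeddings
    p = F.normalize(positives, dim=-1)  # (B, d)
    logits = a @ p.T / temperature      # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal = positive
    return F.cross_entropy(logits, targets)
```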
Figure 2: The framework of LPD. We prepend the label prompt at the front of context sentences, and drop out the label prompt with probability α (α_pre-train, α_train, α_test for the pre-training, training, and testing stages, respectively). Top: we follow Peng et al. (2020b) in using a knowledge graph to distantly annotate the pre-training corpus. Bottom left: during training, the label prompt in the support set is randomly dropped out, while there is no label prompt for the query instance. Bottom right: during testing, α_test is set to zero, meaning that all support instances are equipped with label prompts.
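To make the stage-wise rates concrete, one possible configuration is sketched below; only α_test = 0 is fixed by the setup above, while the pre-training and training rates are tunable hyperparameters (the values shown are hypothetical):

```python
# Stage-wise label prompt dropout rates; only "test" = 0.0 is fixed by
# the evaluation protocol, the other two values are hypothetical.
ALPHA = {
    "pretrain": 0.5,  # assumed value for alpha_pre-train
    "train": 0.5,     # assumed value for alpha_train
    "test": 0.0,      # all support instances keep their label prompts
}
```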
2.2 Prompt-Based Fine-Tuning

Prompt-based models have shown promising performance in few-shot and zero-shot learning in many recent studies (Brown et al., 2020; Schick and Schütze, 2021; Shin et al., 2020). Models in this line of research try to align the downstream fine-tuning task with the pre-training masked language modeling objective (Devlin et al., 2019) to better use the pre-trained language model's latent knowledge. Han et al. (2021b) use prompt tuning with rules to perform relation classification. Liu et al. (2022) introduce "Multi-Choice Matching Networks", which construct prompts by concatenating multiple relation descriptions.
However, unlike many other tasks in NLP where the label semantics are straightforward, such as "positive/negative" in binary sentiment analysis, the relation types in relation extraction can be quite complex, often requiring lengthy sentences as their descriptions. For example, relation P2094 in FewRel is described as "official classification by a regulating body under which the subject (events, teams, participants, or equipment) qualifies for inclusion". Prompt-based models struggle in this case because they require the template to be fixed (e.g., the number of [MASK] tokens in the prompt template has to be fixed). Previous approaches had to rely on manually designed prompt templates and use relation names instead of relation descriptions. To tackle this problem, we propose to directly use the entire relation description as the prompt, without any mask tokens. While in conventional prompt-based models, prompts are used to create natural descriptions such that the model can perform better predictions at the [MASK] positions, the label prompt used in this work uses natural descriptions to help regularize the model to output a better class representation.
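A hedged sketch of this description-as-prompt encoding is shown below; it uses an off-the-shelf BERT checkpoint, an invented example sentence, and simple [CLS] pooling for brevity, whereas the actual pooling strategy (e.g., entity-marker states) may differ:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Relation description prepended as the prompt; no [MASK] token anywhere.
text = ("country of origin : The Crown is a drama series produced "
        "in the United Kingdom .")
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
relation_repr = hidden[:, 0]  # [CLS] vector as a label-aware representation
```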
3 Task Definition

For an FSRE task, each instance $(x, e, y)$ is composed of a context sentence $x = \{x_1, x_2, x_3, \ldots, x_m\}$, where $x_i$ stands for the input token at position $i$; entity positions $e = \{e_{head}, e_{tail}\}$, where $e_{head}$ refers to the head entity span and $e_{tail}$ refers to the tail entity span; and a label $y = \{y_{text}, y_{num}\}$, where $y_{text}$ is the textual label and $y_{num}$ is the numerical label.
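For concreteness, such an instance could be represented as the following data structure; this is an illustrative sketch, and the class and field names are our own rather than from the released code:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FSREInstance:
    tokens: List[str]            # x = {x_1, ..., x_m}
    head_span: Tuple[int, int]   # e_head: head entity span
    tail_span: Tuple[int, int]   # e_tail: tail entity span
    label_text: Optional[str]    # y_text (absent when the prompt is dropped)
    label_id: int                # y_num
```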
Let $\mathcal{E}_{train}$, $\mathcal{E}_{val}$, $\mathcal{E}_{test}$ be the training, validation, and test datasets with mutually exclusive label sets. Under the meta-learning paradigm, each dataset consists of multiple episodes, each with a support set $S$ and a query set $Q$. For $N$-way-$K$-shot learning, the support set $S = \{s_k^n;\ n = 1, \ldots, N,\ k = 1, \ldots, K\}$ contains $N$ different classes, with $K$ different support instances inside each class. Our job is to predict the correct label $y \in \{y_1, \ldots, y_N\}$ for each query instance $q$ in the query set.
In this work, we will follow the continued pre-training setup (Peng et al., 2020a), so there is another dataset, $\mathcal{E}_{pretrain}$, used for the pre-training stage.