Better Few-Shot Relation Extraction with Label Prompt Dropout
Peiyuan Zhang and Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
peiyuan_zhang@sutd.edu.sg, luwei@sutd.edu.sg
Abstract

Few-shot relation extraction aims to learn to identify the relation between two entities based on very limited training examples. Recent efforts found that textual labels (i.e., relation names and relation descriptions) could be extremely useful for learning class representations, which benefits the few-shot learning task. However, how to best leverage such label information in the learning process remains an important research question. Existing works largely assume such textual labels are always present during both learning and prediction. In this work, we argue that such approaches may not always lead to optimal results. Instead, we present a novel approach called label prompt dropout, which randomly removes label descriptions in the learning process. Our experiments show that our approach leads to improved class representations, yielding significantly better results on the few-shot relation extraction task.¹
1 Introduction

Enabling machines to comprehend sentences and extract relations between entities has been a crucial task in Natural Language Processing (NLP). Conventional methods frame this task as a multi-class classification problem, trying to solve it through large-scale supervised training with LSTM (Hochreiter and Schmidhuber, 1997) or BERT (Devlin et al., 2019) as the backbone (Zhou et al., 2016; Zhang et al., 2017; Yamada et al., 2020). Such an approach has shown great effectiveness. However, one problem left unsolved is how to identify novel relations with only a handful of training examples. Therefore, recent studies (Han et al., 2018; Gao et al., 2019b) introduce the task of few-shot relation extraction (FSRE) to study this data scarcity problem.
Aligned with the success of few-shot learning in Computer Vision (Sung et al., 2018; Satorras and Estrach, 2018), most attempts in FSRE adopt a meta-learning framework (Santoro et al., 2016; Vinyals et al., 2016) that randomly samples episodes with different label sets from the training data to mimic the few-shot scenario in the testing phase. As a meta-learning approach, the prototypical network (Snell et al., 2017) aims to learn a class-agnostic metric space. A query instance is classified as the class that has the nearest prototype during inference.

¹ Code available at https://github.com/jzhang38/LPD

Figure 1: An example of 2-way-1-shot learning using label prompt dropout (LPD). Top: Instead of assuming textual labels are always present for support instances, LPD randomly drops out such textual labels. Here the textual label "country of origin" for the second instance is dropped out. Bottom: LPD directly concatenates the textual label and the context sentence. The textual label serves as a prompt to guide BERT to derive a better class prototype. Note that for simplicity we use the relation names here, while in our implementation we use relation descriptions, which are lengthier and more complex.

While BERT-based prototypical networks (Baldini Soares et al., 2019; Peng et al., 2020a) have shown impressive performance on FSRE, the class prototypes are only constructed through the average representation of the support instances of each class, neglecting the textual labels that may provide additional useful information. Therefore, recent efforts try to modify the prototypical network such that it can use the label information as well. Yang et al. (2020) insert both entity type information and relation descriptions into the model. Dong et al. (2021) use a relation encoder to generate relation representations besides the sentence encoder. Han et al. (2021a) propose a hybrid prototypical network that can generate hybrid prototypes from context sentences and relation descriptions. Nonetheless, these methods largely assume that every support instance is provided with a corresponding textual label in the support set during both learning and prediction. We argue that injecting textual labels into all support instances may render the training task unchallenging: the model can largely rely on the textual labels during training, which results in poor performance during testing when faced with unseen relations and textual labels. Ideally, textual labels should be treated as an additional source of information, such that the model can work with or without them, as shown in the top part of Figure 1.
In this work, we propose a novel approach called Label Prompt Dropout (LPD). We directly concatenate the textual label and the context sentence, and feed them together to the Transformer encoder (Vaswani et al., 2017). The textual label serves as a label prompt² to guide and regularize the Transformer encoder to output a label-aware relation representation through self-attention. During training, we randomly drop out the prompt tokens to create a more challenging scenario, such that the model has to learn to work with and without the relation descriptions.

² In this work, we use the terms label prompt, relation description, and textual label interchangeably. However, our method differs from conventional prompt-based models, in which a verbalizer (Schick and Schütze, 2021) is needed. We use the relation description to construct a natural language sentence for each instance to better make use of the implicit knowledge acquired by language models during pre-training. This goal is similar to that of conventional prompt-based methods, which is why we call our method label prompt dropout.
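A minimal sketch of this dropout mechanism is given below, assuming a string-level concatenation with a simple separator; the function name, separator, and tokenization details are illustrative rather than our exact implementation:

```python
import random

def build_support_input(relation_description: str, sentence: str,
                        alpha: float) -> str:
    """Prepend the relation description as a label prompt, dropping it
    with probability alpha (the label prompt dropout rate)."""
    if random.random() < alpha:
        return sentence  # prompt dropped: the model sees context only
    # prompt kept: the description is a prefix the encoder can attend to
    return relation_description + " : " + sentence
```

Setting alpha to zero at test time keeps every support instance paired with its label prompt, while query instances are always encoded without one.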
Experiments show that our approach achieves significant improvement on two standard FSRE datasets, and extensive ablation studies demonstrate its effectiveness. Furthermore, we highlight a potential issue with the evaluation setup of previous research efforts, in which the pre-training data contains relation types that actually overlap with those in the test set. We argue that this may not be a desirable setup for few-shot learning, and show that the performance gain of existing efforts may be partly due to this "knowledge leakage" issue. We propose to filter out all the overlapping relation types in the pre-training data and conduct a more rigorous few-shot evaluation. In summary, we make the following contributions:
• We present LPD, a novel label prompt dropout approach that makes better use of the textual labels in FSRE. This simple design has significantly outperformed previous attempts that fuse the textual label and the context sentence using complex network structures.

• We identify the limitation of the previous experimental setup in the literature and propose a stricter setup for evaluation in FSRE. For both setups, we show strong improvements over the previous state of the art.
2 Related Work

2.1 Few-Shot Relation Extraction

Few-shot relation extraction (FSRE) aims to train a model that can classify instances into novel relations with only a handful of training examples. Han et al. (2018) are the first to introduce a large-scale benchmark for FSRE, in which they evaluate a model in N-way-K-shot settings. Gao et al. (2019a) propose a hybrid attention-based prototypical network to handle the diversity and noise problem of text data. Qu et al. (2020) model the relationship between different relations via Bayesian meta-learning on relation graphs. Han et al. (2021a) apply an adaptive focal loss and hybrid networks to model the different difficulties of different relations.
Another line of work focuses on further training pre-trained language models (PLMs) on the task of relation extraction (RE). Based on the hypothesis that sentences with the same entity pairs are likely to express the same relation, Baldini Soares et al. (2019) collect a large-scale pre-training dataset and propose a "matching the blanks" pre-training paradigm. Peng et al. (2020a) present an entity-masked contrastive pre-training framework for relation extraction. Dong et al. (2021) introduce a semantic mapping approach to include relation descriptions in the pre-training phase. Inspired by these works, we propose contrastive pre-training with label prompt dropout, which uses relation descriptions during pre-training while creating a more difficult setup by dropping them out.
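As a hedged illustration of what such a contrastive objective can look like, the sketch below pulls together encoder outputs for instance pairs that share the same distantly annotated relation and pushes apart the rest of the batch (an InfoNCE-style loss); the pairing scheme, similarity function, and temperature are common choices, not necessarily the exact ones used in our implementation:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss over a batch: row i of `anchors` and row i of
    `positives` encode two instances of the same relation; every other
    row in the batch serves as a negative."""
    a = F.normalize(anchors, dim=-1)    # (B, d) unit-norm embeddings
    p = F.normalize(positives, dim=-1)  # (B, d)
    logits = a @ p.T / temperature      # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal = positive
    return F.cross_entropy(logits, targets)
```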
Figure 2: The framework of LPD. We prepend the label prompt at the front of context sentences, and drop out the label prompt with probability α (α_pre-train, α_train, α_test for the pre-training, training, and testing stages, respectively). Top: we follow Peng et al. (2020b) in using a knowledge graph to distantly annotate the pre-training corpus. Bottom left: during training, the label prompt in the support set is randomly dropped out, while there is no label prompt for the query instance. Bottom right: during testing, α_test is set to zero, meaning that all support instances are equipped with label prompts.
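To make the stage-wise rates concrete, one possible configuration is sketched below; only α_test = 0 is fixed by the setup above, while the pre-training and training rates are tunable hyperparameters (the values shown are hypothetical):

```python
# Stage-wise label prompt dropout rates; only "test" = 0.0 is fixed by
# the evaluation protocol, the other two values are hypothetical.
ALPHA = {
    "pretrain": 0.5,  # assumed value for alpha_pre-train
    "train": 0.5,     # assumed value for alpha_train
    "test": 0.0,      # all support instances keep their label prompts
}
```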
2.2 Prompt-Based Fine-Tuning

Prompt-based models have shown promising performance in few-shot and zero-shot learning in many recent studies (Brown et al., 2020; Schick and Schütze, 2021; Shin et al., 2020). Models in this line of research try to align the downstream fine-tuning task with the pre-training masked language modeling objective (Devlin et al., 2019) to better use the pre-trained language model's latent knowledge. Han et al. (2021b) use prompt tuning with rules to perform relation classification. Liu et al. (2022) introduce "Multi-Choice Matching Networks", which construct prompts by concatenating multiple relation descriptions.
However, unlike many other tasks in NLP where the label semantics are straightforward, such as "positive/negative" in binary sentiment analysis, the relation types in relation extraction can be quite complex, often requiring lengthy sentences as their descriptions. For example, relation P2094 in FewRel is described as "official classification by a regulating body under which the subject (events, teams, participants, or equipment) qualifies for inclusion". Prompt-based models struggle in this case because they require the template to be fixed (e.g., the number of [MASK] tokens in the prompt template has to be fixed). Previous approaches had to rely on manually designed prompt templates and use relation names instead of relation descriptions. To tackle this problem, we propose to directly use the entire relation description as the prompt, without any mask tokens. While in conventional prompt-based models, prompts are used to create natural descriptions such that the model can perform better predictions at the [MASK] positions, the label prompt used in this work uses natural descriptions to help regularize the model to output a better class representation.
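A hedged sketch of this description-as-prompt encoding is shown below; it uses an off-the-shelf BERT checkpoint, an invented example sentence, and simple [CLS] pooling for brevity, whereas the actual pooling strategy (e.g., entity-marker states) may differ:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Relation description prepended as the prompt; no [MASK] token anywhere.
text = ("country of origin : The Crown is a drama series produced "
        "in the United Kingdom .")
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
relation_repr = hidden[:, 0]  # [CLS] vector as a label-aware representation
```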
3 Task Definition

For an FSRE task, each instance $(x, e, y)$ is composed of a context sentence $x = \{x_1, x_2, x_3, \ldots, x_m\}$, where $x_i$ stands for the input token at position $i$; entity positions $e = \{e_{head}, e_{tail}\}$, where $e_{head}$ refers to the head entity span and $e_{tail}$ refers to the tail entity span; and a label $y = \{y_{text}, y_{num}\}$, where $y_{text}$ is the textual label and $y_{num}$ is the numerical label.
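For concreteness, such an instance could be represented as the following data structure; this is an illustrative sketch, and the class and field names are our own rather than from the released code:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class FSREInstance:
    tokens: List[str]            # x = {x_1, ..., x_m}
    head_span: Tuple[int, int]   # e_head: head entity span
    tail_span: Tuple[int, int]   # e_tail: tail entity span
    label_text: Optional[str]    # y_text (absent when the prompt is dropped)
    label_id: int                # y_num
```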
Let $\mathcal{E}_{train}$, $\mathcal{E}_{val}$, $\mathcal{E}_{test}$ be the training, validation, and test datasets with mutually exclusive label sets. Under the meta-learning paradigm, each dataset consists of multiple episodes, each with a support set $S$ and a query set $Q$. For $N$-way-$K$-shot learning, the support set $S = \{s_k^n;\ n = 1, \ldots, N,\ k = 1, \ldots, K\}$ contains $N$ different classes, with $K$ different support instances inside each class. Our job is to predict the correct label $y \in \{y_1, \ldots, y_N\}$ for each query instance $q$ in the query set.
In this work, we will follow the continued pre-training setup (Peng et al., 2020a), so there is another dataset, $\mathcal{E}_{pretrain}$, used for the pre-training stage.