Learning Robust Representations for Continual Relation
Extraction via Adversarial Class Augmentation
Peiyi Wang1, Yifan Song1, Tianyu Liu2, Binghuai Lin2,
Yunbo Cao2, Sujian Li1, Zhifang Sui1
1MOE Key Laboratory of Computational Linguistics, Peking University, China
2Tencent Cloud Xiaowei
wangpeiyi9979@gmail.com; {yfsong, lisujian, szf}@pku.edu.cn
{rogertyliu, binghuailin, yunbocao}@tencent.com
* Equal contribution.
Abstract
Continual relation extraction (CRE) aims to continually learn new relations from a class-incremental data stream. CRE models usually suffer from the catastrophic forgetting problem, i.e., the performance on old relations seriously degrades when the model learns new relations. Most previous work attributes catastrophic forgetting to the corruption of the learned representations as new relations arrive, with an implicit assumption that the CRE models have adequately learned the old relations. In this paper, through empirical studies we argue that this assumption may not hold, and that an important reason for catastrophic forgetting is that the learned representations are not robust against the appearance of analogous relations in the subsequent learning process. To address this issue, we encourage the model to learn more precise and robust representations through a simple yet effective adversarial class augmentation mechanism (ACA), which is easy to implement and model-agnostic. Experimental results show that ACA can consistently improve the performance of state-of-the-art CRE models on two popular benchmarks. Our code is available at https://github.com/Wangpeiyi9979/ACA.
1 Introduction
Relation extraction (RE) aims to detect the relation between two given entities in a sentence. Traditional RE models are trained on a fixed dataset with a predefined relation set, and thus cannot handle the real-life situation where new relations constantly emerge. To this end, continual relation extraction (CRE) (Wang et al., 2019; Han et al., 2020; Cui et al., 2021; Zhao et al., 2022; Wang et al., 2022) is introduced. As shown in Figure 1, CRE is formulated as a class-incremental problem, which trains the model on a sequence of tasks. In each task, the model needs to learn some new relations and is evaluated on all seen relations. Like other continual learning systems, CRE models also suffer from catastrophic forgetting, i.e., the performance on previously learned relations seriously degrades when learning new relations.

Figure 1: A demonstration of continual relation extraction with three tasks, where each task introduces two new relations (task 1: R1 "child", R2 "architecture"; task 2: R3 "location", R4 "employee"; task 3: R5 "father", R6 "characters") and the test set covers all relations seen so far. The representations learned from the easy training tasks cannot handle the hard test data, which contains analogous relations that are inherently hard to distinguish, e.g., "child" and "father".
The mainstream research in CRE (Han et al., 2020; Cui et al., 2021; Zhao et al., 2022; Wang et al., 2022) mainly attributes catastrophic forgetting to the corruption of the learned knowledge as new tasks arrive. To this end, a variety of sophisticated rehearsal-based mechanisms have been introduced to better retain or recover the knowledge, such as relation prototypes (Han et al., 2020; Cui et al., 2021), curriculum-meta learning (Wu et al., 2021), and contrastive replay with knowledge distillation (Zhao et al., 2022). All these methods implicitly assume that the model has adequately learned old relations. However, in this paper, we find that this assumption may not hold.

With a series of empirical studies, we observe that catastrophic forgetting mostly happens on some specific relations, and that significant performance degradation tends to occur when their analogous relations appear. Based on these observations, we identify another reason for catastrophic forgetting, i.e., CRE models do not learn sufficiently robust representations of relations in the first place due to the relatively easy training task. Taking "child" in Figure 1 as an example, because of the absence of hard negative classes in task 1, the CRE model tends to rely on spurious shortcuts, such as entity types, to identify "child". Although the learned imprecise representations can handle the test sets of task 1 and task 2, they are not robust enough to distinguish "child" from its analogous relation ("father") in task 3. Therefore, the performance on "child" decreases significantly when "father" appears. In contrast, relations such as "architecture" still perform well in task 3 because their analogous relations have not yet appeared.

Recently, adversarial data augmentation has emerged as a strong baseline to prevent models from learning spurious shortcuts from easy datasets (Volpi et al., 2018; Zhao et al., 2020; Hendrycks et al., 2020). Inspired by this line of work, we introduce a simple yet effective Adversarial Class Augmentation (ACA) mechanism to improve the robustness of CRE models. Concretely, ACA utilizes two class augmentation methods, namely hybrid-class augmentation and reversed-class augmentation, to build hard negative classes for new tasks. When a task arrives, ACA jointly trains new relations with the adversarial augmented classes to learn robust representations. Note that our method is orthogonal to all previous work: ACA focuses on learning the knowledge of newly emerging relations better, while previous methods are proposed to retain or recover the learned knowledge of old relations.¹ Therefore, incorporating ACA into previous CRE models can further improve their performance.
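To make the joint-training idea concrete, here is a minimal, hypothetical sketch of how the label space of a new task could be extended with adversarial augmented classes. The bodies of hybrid_augment and reversed_augment (concatenating two instances, swapping head and tail entities) are assumptions made purely for illustration, not the authors' exact construction.

```python
import random


def hybrid_augment(inst_a, inst_b):
    """Placeholder: fuse two instances of different relations into one instance
    of a new synthetic class (an assumption for illustration only)."""
    return {"text": inst_a["text"] + " " + inst_b["text"]}


def reversed_augment(inst):
    """Placeholder: swap head and tail entities to form a 'reversed' class
    (again an assumption for illustration only)."""
    return {"text": inst["text"], "head": inst["tail"], "tail": inst["head"]}


def build_task_training_set(new_relations, train_data):
    """Extend the label space of the current task with adversarial augmented
    classes that act as hard negatives during training.

    `train_data` maps each new relation to a list of instance dicts with
    "text", "head" and "tail" keys (an assumed data layout).
    """
    label_of = {r: i for i, r in enumerate(new_relations)}
    examples = []
    for r in new_relations:
        examples += [(x, label_of[r]) for x in train_data[r]]

    # Hybrid-class augmentation: one synthetic class per pair of new relations.
    for r1, r2 in zip(new_relations[0::2], new_relations[1::2]):
        label_of[("hybrid", r1, r2)] = len(label_of)
        for a, b in zip(train_data[r1], train_data[r2]):
            examples.append((hybrid_augment(a, b), label_of[("hybrid", r1, r2)]))

    # Reversed-class augmentation: one synthetic class per new relation.
    for r in new_relations:
        label_of[("reversed", r)] = len(label_of)
        examples += [(reversed_augment(x), label_of[("reversed", r)])
                     for x in train_data[r]]

    random.shuffle(examples)
    return examples, label_of  # train the classifier over this extended label set
```

Since the augmented classes only serve as hard negatives for the current task, they would presumably be discarded once the task is finished.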
We summarize our contributions as follows: 1) we conduct a series of empirical studies on two strong CRE methods and observe that catastrophic forgetting is strongly related to the existence of analogous relations; 2) we identify an important reason for catastrophic forgetting in CRE that is overlooked in all previous work: CRE models learn shortcuts to identify new relations, which are not robust against the appearance of their analogous relations; 3) we propose an adversarial class augmentation mechanism to help CRE models learn more robust representations. Experimental results on two benchmarks show that our method can consistently improve the performance of two state-of-the-art methods.

¹ The proposed method can be viewed as a "precaution" taken in the current task to mitigate catastrophic forgetting on analogous relations in subsequent tasks, while prior work is more like a "remedy" applied in the current task to recall the knowledge already learned in past tasks.
2 Related Work

Relation Extraction
Conventional relation extraction (RE) focuses on extracting the predefined relation between two given entities in a sentence. Recently, a variety of deep neural network (DNN) approaches have been proposed for RE, mainly including: 1) convolutional or recurrent neural network (CNN or RNN) based methods (dos Santos et al., 2015; Wang et al., 2016; Xiao and Liu, 2016; Liu et al., 2019), which can effectively extract textual features; 2) graph neural network (GNN) based methods (Xu et al., 2015, 2016; Cai et al., 2016; Mandya et al., 2020), which jointly encode the sentence with lexical features; 3) pre-trained language model (PLM) based methods (Baldini Soares et al., 2019; Peng et al., 2020), which achieve state-of-the-art results on the RE task.
Continual Learning
Continual learning (CL) aims to continually accumulate knowledge from a sequence of tasks (De Lange et al., 2019). A major challenge of CL is catastrophic forgetting, i.e., the performance on previously learned tasks seriously drops when learning new tasks. To this end, prior CL methods can be roughly divided into three groups: 1) rehearsal-based methods (Rebuffi et al., 2017; Wu et al., 2019), which maintain a memory to save instances of previous tasks and replay them during the training of new tasks; 2) regularization-based methods (Kirkpatrick et al., 2017; Aljundi et al., 2018), which add constraints on the weights important to old tasks; 3) architecture-based methods (Mallya and Lazebnik, 2018; Qin et al., 2021), which dynamically change model architectures to learn new tasks while preventing forgetting of old tasks. Recently, rehearsal-based methods have been shown to be the most effective for NLP tasks, including relation extraction. We therefore focus on rehearsal-based methods for CRE in this paper.
Shortcut Learning Phenomenon
The shortcut learning phenomenon refers to the tendency of DNN models to learn unreliable shortcuts in datasets, leading to poor generalization in real-world applications (Lai et al., 2021). Recently, researchers have revealed shortcut learning in different kinds of language tasks, such as natural language inference (He et al., 2019), information extraction (Wang et al., 2021), reading comprehension (Lai et al., 2021), and question answering (Mudrakarta et al., 2018). Geirhos et al. (2020) point out that shortcut learning happens because of the "Principle of Least Effort" (Kingsley, 1972), i.e., people (and also animals and machines) naturally minimize the amount of effort required to solve tasks. Recently, data augmentation (Tu et al., 2020) and adversarial training (Stacey et al., 2020) have been used to alleviate shortcut learning with synthesized data. To the best of our knowledge, ours is the first work to analyze catastrophic forgetting in CRE from the perspective of shortcut learning, and to propose an adversarial data augmentation method to alleviate it.
3 Task Formulation

In CRE, the model is trained on a sequence of tasks $(T_1, T_2, \ldots, T_k)$. Each task $T_i$ can be represented as a triplet $(R_i, D_i, Q_i)$, where $R_i$ is the set of new relations, and $D_i$ and $Q_i$ are the training and test sets, respectively. Every instance $(x_j, y_j) \in D_i \cup Q_i$ belongs to a specific relation $y_j \in R_i$. The goal of CRE is to continually train the model on new tasks to learn new relations while avoiding forgetting previously learned ones. More formally, in the $i$-th task, the model learns new relations $R_i$ from $D_i$, and should be able to identify all seen relations, i.e., the model is evaluated on the union of all seen test sets $\bigcup_{j=1}^{i} Q_j$. To alleviate catastrophic forgetting in CRE, previous work (Cui et al., 2021; Han et al., 2020; Zhao et al., 2022; Wang et al., 2022) adopts a memory to store a few typical instances (e.g., 10) for each old relation. In the subsequent training process, the instances in the memory are replayed to alleviate catastrophic forgetting.
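This rehearsal-based protocol can be summarized with a short sketch. The code below is a simplified illustration of the task stream, the episodic memory, and the evaluation on the union of all seen test sets; train_model and evaluate are hypothetical placeholders for a concrete CRE model, not part of any released implementation.

```python
from typing import Callable, Dict, List, Tuple

Instance = Tuple[str, str]  # (sentence, relation label)


def run_cre(tasks: List[Dict],
            train_model: Callable[[List[Instance]], None],
            evaluate: Callable[[List[Instance]], float],
            mem_size: int = 10) -> List[float]:
    """Sketch of the rehearsal-based CRE protocol.

    Each task is a dict with keys "relations", "train", "test",
    mirroring the triplet (R_i, D_i, Q_i) above.
    """
    memory: List[Instance] = []     # a few stored instances per old relation
    seen_test: List[Instance] = []  # union of all seen test sets Q_1..Q_i
    scores: List[float] = []

    for task in tasks:
        # 1) Learn the new relations while replaying the memory of old relations.
        train_model(task["train"] + memory)

        # 2) Store a few instances for each new relation (real CRE methods pick
        #    representative ones, e.g., closest to the relation prototype;
        #    here we simply keep the first `mem_size`).
        for rel in task["relations"]:
            memory.extend([x for x in task["train"] if x[1] == rel][:mem_size])

        # 3) Evaluate on all relations seen so far.
        seen_test.extend(task["test"])
        scores.append(evaluate(seen_test))

    return scores
```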
4 Catastrophic Forgetting in CRE: Characteristics and Cause

In this section, we conduct a series of empirical studies on two state-of-the-art CRE models, namely EMAR (Han et al., 2020) and RP-CRE (Cui et al., 2021), and two benchmarks, namely FewRel and TACRED, to analyze catastrophic forgetting in CRE. Please refer to Section 6.1 for details of the two benchmarks and the two CRE models.
4.1 Characteristics of Catastrophic Forgetting

We use the Forgetting Rate (FR) (Chaudhry et al., 2018a,b) to measure the average forgetting of a relation. Assuming that relation $r$ appears in task $i$, the FR of $r$ after the model has finished the task sequence $(T_1, \ldots, T_i, \ldots, T_k)$ is defined as:

$$FR_r = \frac{1}{k-i} \sum_{j=i+1}^{k} pd_r^j, \tag{1}$$

$$pd_r^j = \max_{l \in \{i, \ldots, j-1\}} F1_r^l - F1_r^j, \tag{2}$$

where $pd_r^j$ and $F1_r^j$ are the performance degradation and the F1 score of $r$ after the model has trained on task $j$, respectively. The sequence length $k$ is 10 for both FewRel and TACRED.

Model    | Group | FR (%) | MS   | F1   | F1*  | Δ
---------|-------|--------|------|------|------|------
EMAR     | G1    | 1.3    | 0.42 | 95.4 | 97.4 | 2.0
EMAR     | G2    | 4.5    | 0.53 | 84.6 | 90.9 | 6.3
EMAR     | G3    | 9.4    | 0.62 | 69.8 | 81.5 | 11.7
RP-CRE   | G1    | 1.2    | 0.42 | 95.5 | 97.4 | 1.9
RP-CRE   | G2    | 4.7    | 0.53 | 83.8 | 90.8 | 7.0
RP-CRE   | G3    | 9.9    | 0.63 | 69.5 | 81.5 | 12.0

Table 1: Relations of FewRel divided into three groups according to their forgetting rate (FR). "MS" is short for max similarity. F1 and F1* are the Macro-F1 scores of EMAR/RP-CRE and of the supervised model trained on all data together, respectively. Δ is the performance gap between the two CRE models and the supervised model.
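As a quick illustration of Equations (1) and (2), the following sketch computes FR from per-task F1 scores of a single relation; the data layout (a list of F1 values covering the tasks from the relation's first appearance to the end of the sequence) is an assumption made for this example.

```python
def forgetting_rate(f1_scores):
    """Compute FR for one relation.

    f1_scores[m] is the F1 of the relation after training on task i+m, where i
    is the task in which the relation first appears; the list therefore covers
    tasks i, i+1, ..., k.
    """
    k_minus_i = len(f1_scores) - 1
    if k_minus_i == 0:
        return 0.0  # the relation only appears in the last task
    degradation = 0.0
    for j in range(1, len(f1_scores)):
        # pd^j = best F1 on tasks i..j-1 minus F1 after task j (Eq. 2)
        degradation += max(f1_scores[:j]) - f1_scores[j]
    return degradation / k_minus_i  # Eq. 1


# Toy usage: a relation learned early whose F1 drops when an analogous
# relation appears later in the sequence (numbers are illustrative only).
print(forgetting_rate([0.95, 0.94, 0.80]))  # ≈ 0.08
```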
We divide all relations into three equal-sized groups based on their FR, from smallest to largest. As shown in Table 1, relations in G1 hardly suffer from forgetting, as their FR is only 1.3% and 1.2% for EMAR and RP-CRE, respectively. In contrast, relations in G3 suffer from catastrophic forgetting, with an FR close to 10% for both models. A similar tendency is observed on TACRED (please refer to Appendix A for details). To explore why FR varies so widely among different relations, we dive into the results of the two CRE models and ask two questions.
Where does catastrophic forgetting happen?
With a careful comparison between G1 and G3, we find that relations in G3 tend to have analogous relations in the dataset. For example, "mother" belongs to G3, and the dataset contains relations semantically analogous to it, such as "spouse". To confirm this finding, we first define the similarity of a pair of relations as the cosine similarity of their prototypes, i.e., the mean vanilla BERT sentence embeddings of all corresponding instances. Then, for each relation, we compute its max similarity (MS) to all other relations in the dataset. As shown in Table 1, the MS of G3 is significantly greater than that of G1.
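A minimal sketch of how the prototype-based max similarity (MS) could be computed is shown below; it assumes a sentence_embedding helper that returns a vanilla BERT sentence embedding for an instance (hypothetical, e.g., the [CLS] vector or mean-pooled token states), since the exact encoding is not specified in this excerpt.

```python
import numpy as np


def relation_prototypes(instances_by_relation, sentence_embedding):
    """Prototype of a relation = mean BERT sentence embedding of its instances."""
    return {
        rel: np.mean([sentence_embedding(x) for x in instances], axis=0)
        for rel, instances in instances_by_relation.items()
    }


def max_similarity(prototypes):
    """MS of a relation = max cosine similarity to any other relation's prototype."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    ms = {}
    for rel, proto in prototypes.items():
        ms[rel] = max(
            cosine(proto, other)
            for other_rel, other in prototypes.items()
            if other_rel != rel
        )
    return ms
```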