Learning Robust Representations for Continual Relation
Extraction via Adversarial Class Augmentation
Peiyi Wang1, Yifan Song1, Tianyu Liu2, Binghuai Lin2,
Yunbo Cao2, Sujian Li1, Zhifang Sui1
1MOE Key Laboratory of Computational Linguistics, Peking University, China
2Tencent Cloud Xiaowei
wangpeiyi9979@gmail.com; {yfsong, lisujian, szf}@pku.edu.cn
{rogertyliu, binghuailin, yunbocao}@tencent.com
* Equal contribution.
Abstract
Continual relation extraction (CRE) aims to continually learn new relations from a class-incremental data stream. CRE models usually suffer from the catastrophic forgetting problem, i.e., the performance on old relations seriously degrades when the model learns new relations. Most previous work attributes catastrophic forgetting to the corruption of the learned representations as new relations arrive, with an implicit assumption that the CRE models have adequately learned the old relations. In this paper, through empirical studies we argue that this assumption may not hold, and that an important reason for catastrophic forgetting is that the learned representations are not robust against the appearance of analogous relations in the subsequent learning process. To address this issue, we encourage the model to learn more precise and robust representations through a simple yet effective adversarial class augmentation mechanism (ACA), which is easy to implement and model-agnostic. Experimental results show that ACA can consistently improve the performance of state-of-the-art CRE models on two popular benchmarks. Our code is available at https://github.com/Wangpeiyi9979/ACA.
1 Introduction
Relation extraction (RE) aims to detect the relation between two given entities in a sentence. Traditional RE models are trained on a fixed dataset with a predefined relation set, and thus cannot handle the real-life situation where new relations constantly emerge. To this end, continual relation extraction (CRE) (Wang et al., 2019; Han et al., 2020; Cui et al., 2021; Zhao et al., 2022; Wang et al., 2022) is introduced. As shown in Figure 1, CRE is formulated as a class-incremental problem, which trains the model on a sequence of tasks. In each task, the model needs to learn some new relations and is evaluated on all seen relations. Like other continual learning systems, CRE models also suffer from catastrophic forgetting, i.e., the performance on previously learned relations seriously degrades when learning new relations.

Figure 1: A demonstration of continual relation extraction with three tasks, where each task introduces two new relations (task 1: R1 "child", R2 "architecture"; task 2: R3 "location", R4 "employee"; task 3: R5 "father", R6 "characters") and the test set covers all relations seen so far. The representations learned from the easy training tasks cannot handle the hard test data, which contains analogous relations that are inherently hard to distinguish, e.g., "child" and "father".
The mainstream research in CRE (Han et al., 2020; Cui et al., 2021; Zhao et al., 2022; Wang et al., 2022) mainly attributes catastrophic forgetting to the corruption of the learned knowledge as new tasks arrive. To this end, a variety of sophisticated rehearsal-based mechanisms have been introduced to better retain or recover the knowledge, such as relation prototypes (Han et al., 2020; Cui et al., 2021), curriculum-meta learning (Wu et al., 2021), and contrastive replay with knowledge distillation (Zhao et al., 2022). All these methods implicitly assume that the model has adequately learned old relations. However, in this paper, we find that this assumption may not hold.

With a series of empirical studies, we observe that catastrophic forgetting mostly happens on some specific relations, and that significant performance degradation tends to occur when their analogous relations appear. Based on these observations, we identify another reason for catastrophic forgetting, i.e., CRE models do not learn sufficiently robust representations of relations in the first place due to the relatively easy training task. Taking "child" in Figure 1 as an example, because of the absence of hard negative classes in task 1, the CRE model tends to rely on spurious shortcuts, such as entity types, to identify "child". Although the learned imprecise representations can handle the test sets of task 1 and task 2, they are not robust enough to distinguish "child" from its analogous relation ("father") in task 3. Therefore, the performance on "child" decreases significantly when "father" appears. In contrast, relations such as "architecture" still perform well in task 3 because their analogous relations have not yet appeared.

Recently, adversarial data augmentation has emerged as a strong baseline to prevent models from learning spurious shortcuts from easy datasets (Volpi et al., 2018; Zhao et al., 2020; Hendrycks et al., 2020). Inspired by this line of work, we introduce a simple yet effective Adversarial Class Augmentation (ACA) mechanism to improve the robustness of CRE models. Concretely, ACA utilizes two class augmentation methods, namely hybrid-class augmentation and reversed-class augmentation, to build hard negative classes for new tasks. When a task arrives, ACA jointly trains new relations with the adversarial augmented classes to learn robust representations. Note that our method is orthogonal to all previous work: ACA focuses on learning the knowledge of newly emerging relations better, while previous methods are proposed to retain or recover the learned knowledge of old relations.¹ Therefore, incorporating ACA into previous CRE models can further improve their performance.
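To make the joint-training idea concrete, here is a minimal, hypothetical sketch of how the label space of a new task could be extended with adversarial augmented classes. The bodies of hybrid_augment and reversed_augment (concatenating two instances, swapping head and tail entities) are assumptions made purely for illustration, not the authors' exact construction.

```python
import random


def hybrid_augment(inst_a, inst_b):
    """Placeholder: fuse two instances of different relations into one instance
    of a new synthetic class (an assumption for illustration only)."""
    return {"text": inst_a["text"] + " " + inst_b["text"]}


def reversed_augment(inst):
    """Placeholder: swap head and tail entities to form a 'reversed' class
    (again an assumption for illustration only)."""
    return {"text": inst["text"], "head": inst["tail"], "tail": inst["head"]}


def build_task_training_set(new_relations, train_data):
    """Extend the label space of the current task with adversarial augmented
    classes that act as hard negatives during training.

    `train_data` maps each new relation to a list of instance dicts with
    "text", "head" and "tail" keys (an assumed data layout).
    """
    label_of = {r: i for i, r in enumerate(new_relations)}
    examples = []
    for r in new_relations:
        examples += [(x, label_of[r]) for x in train_data[r]]

    # Hybrid-class augmentation: one synthetic class per pair of new relations.
    for r1, r2 in zip(new_relations[0::2], new_relations[1::2]):
        label_of[("hybrid", r1, r2)] = len(label_of)
        for a, b in zip(train_data[r1], train_data[r2]):
            examples.append((hybrid_augment(a, b), label_of[("hybrid", r1, r2)]))

    # Reversed-class augmentation: one synthetic class per new relation.
    for r in new_relations:
        label_of[("reversed", r)] = len(label_of)
        examples += [(reversed_augment(x), label_of[("reversed", r)])
                     for x in train_data[r]]

    random.shuffle(examples)
    return examples, label_of  # train the classifier over this extended label set
```

Since the augmented classes only serve as hard negatives for the current task, they would presumably be discarded once the task is finished.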
We summarize our contributions as follows: 1) we conduct a series of empirical studies on two strong CRE methods and observe that catastrophic forgetting is strongly related to the existence of analogous relations; 2) we identify an important reason for catastrophic forgetting in CRE that is overlooked in all previous work: CRE models learn shortcuts to identify new relations, which are not robust against the appearance of their analogous relations; 3) we propose an adversarial class augmentation mechanism to help CRE models learn more robust representations. Experimental results on two benchmarks show that our method can consistently improve the performance of two state-of-the-art methods.

¹ The proposed method can be viewed as a "precaution" taken in the current task to mitigate catastrophic forgetting on analogous relations in subsequent tasks, while prior work is more like a "remedy" applied in the current task to recall the knowledge already learned in past tasks.
2 Related Work

Relation Extraction
Conventional relation extraction (RE) focuses on extracting the predefined relation between two given entities in a sentence. Recently, a variety of deep neural network (DNN) approaches have been proposed for RE, mainly including: 1) convolutional or recurrent neural network (CNN or RNN) based methods (dos Santos et al., 2015; Wang et al., 2016; Xiao and Liu, 2016; Liu et al., 2019), which can effectively extract textual features; 2) graph neural network (GNN) based methods (Xu et al., 2015, 2016; Cai et al., 2016; Mandya et al., 2020), which jointly encode the sentence with lexical features; 3) pre-trained language model (PLM) based methods (Baldini Soares et al., 2019; Peng et al., 2020), which achieve state-of-the-art results on the RE task.
Continual Learning
Continual learning (CL) aims to continually accumulate knowledge from a sequence of tasks (De Lange et al., 2019). A major challenge of CL is catastrophic forgetting, i.e., the performance on previously learned tasks seriously drops when learning new tasks. To this end, prior CL methods can be roughly divided into three groups: 1) rehearsal-based methods (Rebuffi et al., 2017; Wu et al., 2019), which maintain a memory to save instances of previous tasks and replay them during the training of new tasks; 2) regularization-based methods (Kirkpatrick et al., 2017; Aljundi et al., 2018), which add constraints on the weights important to old tasks; 3) architecture-based methods (Mallya and Lazebnik, 2018; Qin et al., 2021), which dynamically change model architectures to learn new tasks while preventing forgetting of old tasks. Recently, rehearsal-based methods have been shown to be the most effective for NLP tasks, including relation extraction. We therefore focus on rehearsal-based methods for CRE in this paper.
Shortcut Learning Phenomenon
The shortcut learning phenomenon refers to the tendency of DNN models to learn unreliable shortcuts in datasets, leading to poor generalization in real-world applications (Lai et al., 2021). Recently, researchers have revealed shortcut learning in different kinds of language tasks, such as natural language inference (He et al., 2019), information extraction (Wang et al., 2021), reading comprehension (Lai et al., 2021), and question answering (Mudrakarta et al., 2018). Geirhos et al. (2020) point out that shortcut learning happens because of the "Principle of Least Effort" (Kingsley, 1972), i.e., people (and also animals and machines) naturally minimize the amount of effort required to solve tasks. Recently, data augmentation (Tu et al., 2020) and adversarial training (Stacey et al., 2020) have been used to alleviate shortcut learning with synthesized data. To the best of our knowledge, ours is the first work to analyze catastrophic forgetting in CRE from the perspective of shortcut learning, and to propose an adversarial data augmentation method to alleviate it.
3 Task Formulation

In CRE, the model is trained on a sequence of tasks $(T_1, T_2, \ldots, T_k)$. Each task $T_i$ can be represented as a triplet $(R_i, D_i, Q_i)$, where $R_i$ is the set of new relations, and $D_i$ and $Q_i$ are the training and test sets, respectively. Every instance $(x_j, y_j) \in D_i \cup Q_i$ belongs to a specific relation $y_j \in R_i$. The goal of CRE is to continually train the model on new tasks to learn new relations while avoiding forgetting previously learned ones. More formally, in the $i$-th task, the model learns new relations $R_i$ from $D_i$, and should be able to identify all seen relations, i.e., the model is evaluated on the union of all seen test sets $\bigcup_{j=1}^{i} Q_j$. To alleviate catastrophic forgetting in CRE, previous work (Cui et al., 2021; Han et al., 2020; Zhao et al., 2022; Wang et al., 2022) adopts a memory to store a few typical instances (e.g., 10) for each old relation. In the subsequent training process, the instances in the memory are replayed to alleviate catastrophic forgetting.
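This rehearsal-based protocol can be summarized with a short sketch. The code below is a simplified illustration of the task stream, the episodic memory, and the evaluation on the union of all seen test sets; train_model and evaluate are hypothetical placeholders for a concrete CRE model, not part of any released implementation.

```python
from typing import Callable, Dict, List, Tuple

Instance = Tuple[str, str]  # (sentence, relation label)


def run_cre(tasks: List[Dict],
            train_model: Callable[[List[Instance]], None],
            evaluate: Callable[[List[Instance]], float],
            mem_size: int = 10) -> List[float]:
    """Sketch of the rehearsal-based CRE protocol.

    Each task is a dict with keys "relations", "train", "test",
    mirroring the triplet (R_i, D_i, Q_i) above.
    """
    memory: List[Instance] = []     # a few stored instances per old relation
    seen_test: List[Instance] = []  # union of all seen test sets Q_1..Q_i
    scores: List[float] = []

    for task in tasks:
        # 1) Learn the new relations while replaying the memory of old relations.
        train_model(task["train"] + memory)

        # 2) Store a few instances for each new relation (real CRE methods pick
        #    representative ones, e.g., closest to the relation prototype;
        #    here we simply keep the first `mem_size`).
        for rel in task["relations"]:
            memory.extend([x for x in task["train"] if x[1] == rel][:mem_size])

        # 3) Evaluate on all relations seen so far.
        seen_test.extend(task["test"])
        scores.append(evaluate(seen_test))

    return scores
```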
4 Catastrophic Forgetting in CRE: Characteristics and Cause

In this section, we conduct a series of empirical studies on two state-of-the-art CRE models, namely EMAR (Han et al., 2020) and RP-CRE (Cui et al., 2021), and two benchmarks, namely FewRel and TACRED, to analyze catastrophic forgetting in CRE. Please refer to Section 6.1 for details of the two benchmarks and the two CRE models.
4.1 Characteristics of Catastrophic Forgetting

We use the Forgetting Rate (FR) (Chaudhry et al., 2018a,b) to measure the average forgetting of a relation. Assuming that relation $r$ appears in task $i$, the FR of $r$ after the model has finished the task sequence $(T_1, \ldots, T_i, \ldots, T_k)$ is defined as:

$$FR_r = \frac{1}{k-i} \sum_{j=i+1}^{k} pd_r^j, \tag{1}$$

$$pd_r^j = \max_{l \in \{i, \ldots, j-1\}} F1_r^l - F1_r^j, \tag{2}$$

where $pd_r^j$ and $F1_r^j$ are the performance degradation and the F1 score of $r$ after the model has trained on task $j$, respectively. The sequence length $k$ is 10 for both FewRel and TACRED.

Model    | Group | FR (%) | MS   | F1   | F1*  | Δ
---------|-------|--------|------|------|------|------
EMAR     | G1    | 1.3    | 0.42 | 95.4 | 97.4 | 2.0
EMAR     | G2    | 4.5    | 0.53 | 84.6 | 90.9 | 6.3
EMAR     | G3    | 9.4    | 0.62 | 69.8 | 81.5 | 11.7
RP-CRE   | G1    | 1.2    | 0.42 | 95.5 | 97.4 | 1.9
RP-CRE   | G2    | 4.7    | 0.53 | 83.8 | 90.8 | 7.0
RP-CRE   | G3    | 9.9    | 0.63 | 69.5 | 81.5 | 12.0

Table 1: Relations of FewRel divided into three groups according to their forgetting rate (FR). "MS" is short for max similarity. F1 and F1* are the Macro-F1 scores of EMAR/RP-CRE and of the supervised model trained on all data together, respectively. Δ is the performance gap between the two CRE models and the supervised model.
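As a quick illustration of Equations (1) and (2), the following sketch computes FR from per-task F1 scores of a single relation; the data layout (a list of F1 values covering the tasks from the relation's first appearance to the end of the sequence) is an assumption made for this example.

```python
def forgetting_rate(f1_scores):
    """Compute FR for one relation.

    f1_scores[m] is the F1 of the relation after training on task i+m, where i
    is the task in which the relation first appears; the list therefore covers
    tasks i, i+1, ..., k.
    """
    k_minus_i = len(f1_scores) - 1
    if k_minus_i == 0:
        return 0.0  # the relation only appears in the last task
    degradation = 0.0
    for j in range(1, len(f1_scores)):
        # pd^j = best F1 on tasks i..j-1 minus F1 after task j (Eq. 2)
        degradation += max(f1_scores[:j]) - f1_scores[j]
    return degradation / k_minus_i  # Eq. 1


# Toy usage: a relation learned early whose F1 drops when an analogous
# relation appears later in the sequence (numbers are illustrative only).
print(forgetting_rate([0.95, 0.94, 0.80]))  # ≈ 0.08
```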
We divide all relations into three equal-sized groups based on their FR, from smallest to largest. As shown in Table 1, relations in G1 hardly suffer from forgetting, as their FR is only 1.3% and 1.2% for EMAR and RP-CRE, respectively. In contrast, relations in G3 suffer from catastrophic forgetting, with an FR close to 10% for both models. A similar tendency is observed on TACRED (please refer to Appendix A for details). To explore why FR varies so widely among different relations, we dive into the results of the two CRE models and ask two questions.
Where does catastrophic forgetting happen?
With a careful comparison between G1 and G3, we find that relations in G3 tend to have analogous relations in the dataset. For example, "mother" belongs to G3, and the dataset contains relations semantically analogous to it, such as "spouse". To confirm this finding, we first define the similarity of a pair of relations as the cosine similarity of their prototypes, i.e., the mean vanilla BERT sentence embeddings of all corresponding instances. Then, for each relation, we compute its max similarity (MS) to all other relations in the dataset. As shown in Table 1, the MS of G3 is significantly greater than that of G1.
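A minimal sketch of how the prototype-based max similarity (MS) could be computed is shown below; it assumes a sentence_embedding helper that returns a vanilla BERT sentence embedding for an instance (hypothetical, e.g., the [CLS] vector or mean-pooled token states), since the exact encoding is not specified in this excerpt.

```python
import numpy as np


def relation_prototypes(instances_by_relation, sentence_embedding):
    """Prototype of a relation = mean BERT sentence embedding of its instances."""
    return {
        rel: np.mean([sentence_embedding(x) for x in instances], axis=0)
        for rel, instances in instances_by_relation.items()
    }


def max_similarity(prototypes):
    """MS of a relation = max cosine similarity to any other relation's prototype."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    ms = {}
    for rel, proto in prototypes.items():
        ms[rel] = max(
            cosine(proto, other)
            for other_rel, other in prototypes.items()
            if other_rel != rel
        )
    return ms
```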