
i.e., CRE models do not learn sufficiently robust
representations of relations in the first place due
to the relatively easy training task. Taking “child”
in Figure 1 as an example, because of the absence
of hard negative classes in task 1, the CRE model
tends to rely on spurious shortcuts, such as entity
types, to identify “child”. Although the learned
imprecise representations can handle the test sets
of task 1 and task 2, they are not robust enough
to distinguish “child” from its analogous relation
(“father”) in task 3. Therefore, the performance
of “child” will decrease significantly when “father”
appears. In contrast, relations such as “architecture”
still perform well in task 3 because their analogous
relations have not yet appeared.
Recently, adversarial data augmentation has emerged as a strong baseline for preventing models from learning spurious shortcuts from easy datasets (Volpi et al., 2018; Zhao et al., 2020; Hendrycks et al., 2020). Inspired by this line of work, we introduce a simple yet effective Adversarial Class Augmentation (ACA) mechanism to improve the robustness of CRE models. Concretely, ACA utilizes two class augmentation methods, namely hybrid-class augmentation and reversed-class augmentation, to build hard negative classes for new tasks. When a task arrives, ACA jointly trains the new relations with the adversarially augmented classes to learn robust representations. Note that our method is orthogonal to all previous work: ACA focuses on better learning the knowledge of newly emerging relations, while previous methods are proposed to retain or recover the learned knowledge of old relations.¹ Therefore, incorporating ACA into previous CRE models can further improve their performance.
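For intuition, the minimal sketch below illustrates one plausible way such class augmentation could be realized; the function names and the exact construction of the augmented instances are illustrative assumptions rather than the exact procedure used by ACA.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass
class Instance:
    tokens: List[str]      # sentence tokens
    head: Tuple[int, int]  # span of the head entity
    tail: Tuple[int, int]  # span of the tail entity
    label: int             # relation id

def reversed_class_augmentation(instances: List[Instance],
                                new_label: int) -> List[Instance]:
    """Build a hard negative class by swapping head and tail entities of an
    asymmetric relation, e.g. turning "child"-style instances into
    "parent"-style ones (illustrative assumption)."""
    return [replace(x, head=x.tail, tail=x.head, label=new_label)
            for x in instances]

def hybrid_class_augmentation(a: List[Instance], b: List[Instance],
                              new_label: int) -> List[Instance]:
    """Build a hard negative class by mixing instances of two different new
    relations, so the surface cues of either relation alone no longer
    determine the label (the splicing scheme here is only one possibility)."""
    augmented = []
    for x, y in zip(a, b):
        tokens = x.tokens + ["[SEP]"] + y.tokens
        augmented.append(Instance(tokens, x.head, x.tail, new_label))
    return augmented

# When a new task arrives, the new relations and these augmented hard
# negative classes would be trained jointly, as described above.
```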
We summarize our contributions as follows: 1) we conduct a series of empirical studies on two strong CRE methods and observe that catastrophic forgetting is strongly related to the existence of analogous relations; 2) we find an important reason for catastrophic forgetting in CRE that is overlooked in all previous work: CRE models learn shortcuts to identify new relations, and these shortcuts are not robust against the later appearance of analogous relations; 3) we propose an adversarial class augmentation mechanism to help CRE models learn more robust representations. Experimental results on two benchmarks show that our method consistently improves the performance of two state-of-the-art methods.

¹ The proposed method can be viewed as a “precaution” taken in the current task to mitigate catastrophic forgetting on analogous relations in subsequent tasks, whereas prior work is more like a “remedy” applied in the current task to recall knowledge already learned in past tasks.
2 Related Work
Relation Extraction
Conventional Relation Extraction (RE) focuses on extracting the predefined relation between two given entities in a sentence. Recently, a variety of deep neural network (DNN) methods have been proposed for RE, mainly including: 1) Convolutional or recurrent neural network (CNN or RNN) based methods (dos Santos et al., 2015; Wang et al., 2016; Xiao and Liu, 2016; Liu et al., 2019), which can effectively extract textual features. 2) Graph neural network (GNN) based methods (Xu et al., 2015, 2016; Cai et al., 2016; Mandya et al., 2020), which jointly encode the sentence with lexical features. 3) Pre-trained language model (PLM) based methods (Baldini Soares et al., 2019; Peng et al., 2020), which achieve state-of-the-art results on the RE task.
Continual Learning
Continual Learning (CL) aims to continually accumulate knowledge from a sequence of tasks (De Lange et al., 2019). A major challenge of CL is catastrophic forgetting, i.e., the performance on previously learned tasks drops severely when learning new tasks. To address this challenge, prior CL methods can be roughly divided into three groups: 1) Rehearsal-based methods (Rebuffi et al., 2017; Wu et al., 2019), which maintain a memory of instances from previous tasks and replay them during training on new tasks. 2) Regularization-based methods (Kirkpatrick et al., 2017; Aljundi et al., 2018), which add constraints on the weights important to old tasks. 3) Architecture-based methods (Mallya and Lazebnik, 2018; Qin et al., 2021), which dynamically change model architectures to learn new tasks while preventing forgetting of old tasks. Recently, rehearsal-based methods have proven to be the most effective for NLP tasks, including relation extraction. We therefore focus on rehearsal-based methods for CRE in this paper.
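For concreteness, a minimal sketch of the rehearsal idea is given below; the memory size, sampling policy, and the `train_step` interface are generic assumptions rather than the scheme of any particular method.

```python
import random

def make_batches(examples, size=16):
    """Split a list of (input, label) pairs into mini-batches."""
    for i in range(0, len(examples), size):
        yield examples[i:i + size]

class EpisodicMemory:
    """Fixed-size buffer holding a few stored instances per finished task."""
    def __init__(self, per_task=10):
        self.per_task = per_task
        self.buffer = []

    def add_task(self, task_examples):
        # keep a small random sample of the task that just finished
        k = min(self.per_task, len(task_examples))
        self.buffer.extend(random.sample(task_examples, k))

    def replay_batch(self, k=8):
        return random.sample(self.buffer, min(k, len(self.buffer)))

def train_on_task(model, task_examples, memory, train_step):
    """Learn the new task while replaying memorized examples from old tasks."""
    for batch in make_batches(task_examples):
        train_step(model, batch)                      # fit the new task
        if memory.buffer:
            train_step(model, memory.replay_batch())  # rehearse old tasks
    memory.add_task(task_examples)                    # remember this task
```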
Shortcut Learning Phenomenon
The shortcut learning phenomenon refers to the tendency of DNN models to learn unreliable shortcuts in datasets, leading to poor generalization in real-world applications (Lai et al., 2021). Recently, researchers have revealed the shortcut learning phenomenon in different kinds of language tasks, such as natural language inference (He et al., 2019), information