distinguishable is crucial to achieve better CRE.
To address the above issue, in this paper we propose a novel Continual Relation Extraction framework with Contrastive Learning, namely CRECL,
which is built with a classification network and a
contrastive network. In order to fully leverage the
information of negative relations to make the data
distributions of all tasks more distinguishable, we
design a prototypical contrastive learning scheme.
Specifically, in the contrastive network of CRECL,
a given instance is contrasted with the prototype of
each candidate relation stored in the memory mod-
ule. Such sufficient comparisons ensure the align-
ment and uniformity between the data distributions
of old and new tasks. Therefore, the catastrophic
forgetting in CRECL is alleviated more thoroughly,
resulting in enhanced CRE performance. In addition, unlike the classification over a fixed (relation) class set as in (Han et al., 2020; Cui et al., 2021), CRECL achieves class-incremental learning of CRE, which is better suited to real-world CRE scenarios.
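To make the prototypical contrastive scheme above concrete, below is a minimal, hypothetical PyTorch sketch of contrasting a given instance with the prototype of every candidate relation stored in the memory module; the function name, temperature value and tensor shapes are illustrative assumptions, not CRECL's actual implementation.

```python
import torch
import torch.nn.functional as F

def prototypical_contrastive_loss(instance_emb, prototypes, positive_idx, temperature=0.1):
    """Contrast one instance embedding against the prototypes of all
    candidate relations kept in the memory module (illustrative sketch).

    instance_emb: (d,) encoding of the given instance
    prototypes:   (R, d) one prototype per stored relation
    positive_idx: index of the instance's true relation among the prototypes
    """
    # Cosine similarity between the instance and every relation prototype.
    instance_emb = F.normalize(instance_emb, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    logits = prototypes @ instance_emb / temperature          # (R,)

    # InfoNCE-style objective: pull the instance towards its own relation's
    # prototype and push it away from all other (negative) relations.
    target = torch.tensor(positive_idx)
    return F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))

# Example: 10 candidate relations with 768-dim prototypes drawn from memory.
loss = prototypical_contrastive_loss(torch.randn(768), torch.randn(10, 768), positive_idx=3)
```

Contrasting against all stored relation prototypes, rather than only the relations of the current task, is what lets the negative relations shape the data distributions of old and new tasks jointly.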
Our contributions in this paper are summarized
as follows:
1. We propose a novel CRE framework CRECL
that combines a classification network and a pro-
totypical contrastive network to fully alleviate the
problem of catastrophic forgetting.
2. With the contrasting-based mechanism,
our CRECL can effectively achieve class-incremental learning, which is more practical in real-world CRE scenarios.
3. Our extensive experiments demonstrate CRECL's advantage over the state-of-the-art (SOTA) models on two benchmark datasets, TACRED and FewRel. Furthermore, we provide deep insights into the reasons for the compared models' differing performance.
2 Related Work
In this section, we briefly introduce continual learn-
ing and contrastive learning, which are both related to our work.
Continual learning (Delange et al.,2021;Parisi
et al., 2019) focuses on learning from a continuous stream of data. The models of continual
learning are able to accumulate knowledge across
different tasks without retraining from scratch. The
major challenge in continual learning is to alleviate catastrophic forgetting, i.e., the phenomenon that the performance on previous tasks declines significantly over time as new tasks come in. To overcome catastrophic forgetting, most recent works can be divided into three categories. 1) Regularization-based methods impose constraints
on the update of parameters. For example, the LwF approach (Li and Hoiem, 2016) enforces the network for previously learned tasks to be similar to the network for the current task via knowledge distillation. However, LwF depends heavily on the data of the new task and its relatedness to prior tasks. EWC (Kirk-
patrick et al.,2016) adopts a quadratic penalty on
the difference between the parameters for old and
new tasks. It models the parameter relevance with
respect to training data as a posterior distribution,
which is estimated by Laplace approximation with
the precision determined by the Fisher Information
Matrix. WA (Zhao et al., 2020) maintains discrimination and fairness among the new and old tasks by adjusting the parameters of the last layer. 2) Dynamic
architecture methods change models’ architectural
properties upon new data by dynamically accom-
modating new neural resources, such as increased
number of neurons. For example, PackNet (Mallya
and Lazebnik,2017) iteratively assigns parameter
subsets to consecutive tasks by constructing pruning masks, which fixes each task's parameter subset
for future tasks. DER (Yan et al.,2021) proposes
a novel two-stage learning approach to get more
effective dynamically expandable representation.
3) Memory-based methods explicitly retrain the
models on a limited subset of stored samples dur-
ing the training on new tasks. For example, iCaRL
(Rebuffi et al.,2017) focuses on learning in a class-
incremental way, which selects and stores the samples closest to the feature mean of each class.
During training, a distillation loss between the targets obtained from previous and current model predictions is added to the overall loss to preserve previously
learned knowledge. RP-CRE (Cui et al.,2021)
introduces a novel pluggable attention-based mem-
ory module to automatically calculate old tasks’
weights when learning new tasks.
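As an illustration of the regularization-based category, the snippet below sketches the EWC quadratic penalty described above, i.e., $\frac{\lambda}{2}\sum_i F_i(\theta_i-\theta_i^*)^2$, where $F_i$ is the diagonal Fisher information of parameter $i$ and $\theta_i^*$ its value after the previous task. This is a simplified, hedged sketch, not the reference implementation; the function name and arguments are assumptions.

```python
import torch

def ewc_penalty(model, old_params, fisher_diag, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_i*)^2.

    old_params:  dict of parameter tensors saved after the previous task
    fisher_diag: dict of diagonal Fisher estimates for the same parameters
    """
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in old_params:
            # Penalize deviation from the old parameters, weighted by how
            # important each parameter was for the previous task.
            penalty = penalty + (fisher_diag[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# During training on a new task (hypothetical usage):
# loss = task_loss + ewc_penalty(model, old_params, fisher_diag, lam=100.0)
```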
Since classification-based approaches require the relation schema in the classification layer, such models have a non-negligible drawback in class-incremental learning. Many researchers leverage metric learning to address this problem. (Wang et al., 2019; Wu et al., 2021) utilize sentence alignment models based on the Margin Ranking Loss (Nayyeri et al., 2019), while lacking the intrinsic ability to perform hard positive/negative