Improving Continual Relation Extraction through
Prototypical Contrastive Learning
Chengwei Hu1, Deqing Yang1, Haoliang Jin1, Zhen Chen1, Yanghua Xiao2
1School of Data Science, Fudan University, Shanghai, China
2School of Computer Science, Fudan University, Shanghai, China
{cwhu20, yangdeqing, shawyh}@fudan.edu.cn
{hljin21, zhenchen21}@m.fudan.edu.cn
Abstract
Continual relation extraction (CRE) aims to extract relations from data that arrives continuously and iteratively, of which the major challenge is the catastrophic forgetting of old tasks. In order to alleviate this critical problem for enhanced CRE performance, we propose a novel Continual Relation Extraction framework with Contrastive Learning, namely CRECL, which is built with a classification network and a prototypical contrastive network to achieve the class-incremental learning of CRE. Specifically, in the contrastive network a given instance is contrasted with the prototype of each candidate relation stored in the memory module. Such a contrastive learning scheme makes the data distributions of all tasks more distinguishable, so as to further alleviate catastrophic forgetting. Our experimental results not only demonstrate CRECL's advantage over the state-of-the-art baselines on two public datasets, but also verify the effectiveness of CRECL's contrastive learning in improving CRE performance.
1 Introduction
In some scenarios of relation extraction (RE), massive new data including new relations emerges continuously, which cannot be handled by traditional RE methods. To handle this situation, continual relation extraction (CRE) (Wang et al., 2019) was proposed. Due to limited storage and computing resources, it is impractical to store all training data of previous tasks. As new tasks are learned and new relations emerge constantly, the model tends to forget the existing knowledge about old relations. Therefore, the problem of catastrophic forgetting damages CRE performance severely (Hassabis et al., 2017; Thrun and Mitchell, 1995).
This work is supported by Shanghai Science and Technology Innovation Action Plan (China) No. 21511100401.

Figure 1: The data distribution map (better viewed in color) after training a classification model on an old task and then a new task. Data of many different relations (different colors) from the old (dots) and new (crosses) tasks are mixed due to catastrophic forgetting, making it hard to distinguish the new task's relations from the old task's relations.

In recent years, some efforts have focused on alleviating catastrophic forgetting in CRE, which can be divided into consolidation-based methods (Kirkpatrick et al., 2017), dynamic architecture methods (Chen et al., 2015; Fernando et al., 2017) and memory-based methods (Chaudhry et al., 2018; Han et al., 2020; Cui et al., 2021). Despite
these methods' effectiveness on CRE, most of them have not taken full advantage of the negative relation information in all tasks to alleviate catastrophic forgetting more thoroughly, resulting in suboptimal CRE performance.
Through our empirical studies, we found that
the catastrophic forgetting of a model results in the
indistinguishability between the data (instances)
distributions of all tasks, making it hard to distin-
guish the relations of all tasks. We illustrate it with
the data distribution map after training a relation
classification model for a new task, as shown in
Figure 1, where the dots and crosses represent the
data of the old and new task respectively, and dif-
ferent colors represent different relations. It shows
that the data points of different colors in either
dot group (old task) or cross group (new task) are
distinguishable. However, many dots and crosses
are mixed, making it hard to discriminate the new
task’s relations from the old task’s relations. There-
fore, making the data distributions of all tasks more
distinguishable is crucial to achieve better CRE.
To address the above issue, in this paper we propose a novel Continual Relation Extraction framework with Contrastive Learning, namely CRECL,
which is built with a classification network and a
contrastive network. In order to fully leverage the
information of negative relations to make the data
distributions of all tasks more distinguishable, we
design a prototypical contrastive learning scheme.
Specifically, in the contrastive network of CRECL,
a given instance is contrasted with the prototype of
each candidate relation stored in the memory mod-
ule. Such sufficient comparisons ensure the align-
ment and uniformity between the data distributions
of old and new tasks. Therefore, the catastrophic
forgetting in CRECL is alleviated more thoroughly,
resulting in enhanced CRE performance. In addition, different from the classification over a fixed (relation) class set as in (Han et al., 2020; Cui et al., 2021), CRECL achieves class-incremental learning of CRE, which is more feasible in real-world CRE scenarios.
Our contributions in this paper are summarized
as follows:
1. We propose a novel CRE framework CRECL
that combines a classification network and a pro-
totypical contrastive network to fully alleviate the
problem of catastrophic forgetting.
2. With the contrast-based mechanism, our CRECL can effectively achieve class-incremental learning, which is more practical in real-world CRE scenarios.
3. Our extensive experiments justify CRECL's advantage over the state-of-the-art (SOTA) models on two benchmark datasets, TACRED and FewRel. Furthermore, we provide deep insights into the reasons for the compared models' distinct performance.
2 Related Work
In this section, we briefly introduce continual learn-
ing and contrastive learning which are both related
to our work.
Continual learning (Delange et al., 2021; Parisi et al., 2019) focuses on learning from a continuous stream of data. Continual learning models are able to accumulate knowledge across different tasks without retraining from scratch. The major challenge in continual learning is to alleviate catastrophic forgetting, i.e., the phenomenon that performance on previous tasks declines significantly over time as new tasks come in. To overcome catastrophic forgetting, most recent works can be divided into three categories.
1) Regularization-based methods impose constraints on the update of parameters. For example, the LwF approach (Li and Hoiem, 2016) enforces the network for previously learned tasks to stay similar to the network for the current task by knowledge distillation. However, LwF depends heavily on the data of the new task and its relatedness to prior tasks. EWC (Kirkpatrick et al., 2016) adopts a quadratic penalty on the difference between the parameters for old and new tasks (see the penalty form after this paragraph). It models the parameter relevance with respect to the training data as a posterior distribution, which is estimated by a Laplace approximation with the precision determined by the Fisher Information Matrix. WA (Zhao et al., 2020) maintains discrimination and fairness between the new and old tasks by adjusting the parameters of the last layer. 2) Dynamic architecture methods change models' architectural properties upon new data by dynamically accommodating new neural resources, such as an increased number of neurons. For example, PackNet (Mallya and Lazebnik, 2017) iteratively assigns parameter subsets to consecutive tasks by constituting pruning masks, which fixes the task parameter subset for future tasks. DER (Yan et al., 2021) proposes a novel two-stage learning approach to obtain a more effective dynamically expandable representation. 3) Memory-based methods explicitly retrain the model on a limited subset of stored samples during the training on new tasks. For example, iCaRL (Rebuffi et al., 2017) focuses on learning in a class-incremental way; it selects and stores the samples closest to the feature mean of each class. During training, a distillation loss between the targets obtained from the previous and current model predictions is added to the overall loss to preserve previously learned knowledge. RP-CRE (Cui et al., 2021) introduces a novel pluggable attention-based memory module to automatically calculate old tasks' weights when learning new tasks.
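For concreteness, the EWC penalty mentioned in category 1) is commonly written in the following standard form; this is the generic formulation from the EWC paper, not a CRE-specific derivation:

\[
\mathcal{L}(\theta) = \mathcal{L}_{new}(\theta) + \sum_{i} \frac{\lambda}{2} F_i \left(\theta_i - \theta^{*}_{old,i}\right)^2,
\]

where $\mathcal{L}_{new}$ is the loss on the current task, $\theta^{*}_{old}$ are the parameters learned on previous tasks, $F_i$ is the $i$-th diagonal entry of the Fisher Information Matrix, and $\lambda$ controls how strongly old knowledge is preserved.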
Since classification-based approaches require a fixed relation schema in the classification layer, they have a non-negligible drawback in class-incremental learning. Many researchers leverage metric learning to solve this problem. (Wang et al., 2019; Wu et al., 2021) utilize sentence alignment models based on the Margin Ranking Loss (Nayyeri et al., 2019), but lack the intrinsic ability to perform hard positive/negative mining, resulting in poor performance.
[Figure 2 diagram omitted: it depicts a shared BERT-based encoding layer with a dropout layer over an input instance (e.g., "[E11] Charlie Chaplin [E12] was born in [E21] London [E22]"), a classification network (Linear, GELU, Softmax) and a contrastive network (a projector with Linear, Norm and GELU layers, K-means over typical instances, relation prototypes drawn from the memory, and instance-prototype similarity), with the overall loss formed as a weighted combination of the two networks' losses.]
Figure 2: The overall structure of our proposed CRECL. The framework is built with a shared encoding layer, a
classification network and a contrastive network.
Recently, contrastive learning has been widely adopted in self-supervised learning frameworks in many fields, including computer vision and natural language processing. Contrastive learning is a discriminative scheme that aims to pull similar samples closer together and push dissimilar samples farther apart. (Wang and Liu, 2021) proves that contrastive learning can promote the alignment and stability of data distributions, and (Khosla et al., 2020) verifies that using modern batch contrastive approaches, such as the InfoNCE loss (Oord et al., 2018), outperforms traditional contrastive losses, such as the margin ranking loss, and also achieves good results in supervised contrastive learning tasks.
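For reference, the InfoNCE objective mentioned above is commonly written as follows (the standard formulation of Oord et al., 2018, not necessarily the exact loss used in CRECL):

\[
\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_i^{+}) / \tau\right)}{\sum_{j=1}^{N} \exp\left(\mathrm{sim}(z_i, z_j) / \tau\right)},
\]

where $z_i^{+}$ is the positive sample for the anchor $z_i$, the denominator sums over the positive and $N-1$ negatives, $\mathrm{sim}(\cdot, \cdot)$ is typically the cosine similarity, and $\tau$ is a temperature hyperparameter.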
3 Methodology
3.1 Task Formalization
The CRE task aims to identify the relation between two entities expressed by one sentence in the task sequence. Formally, given a sequence of $K$ tasks $\{T_1, T_2, \ldots, T_K\}$, suppose $D_k$ and $R_k$ denote the instance set and relation class set of the $k$-th task $T_k$, respectively. $D_k$ contains $N_k$ instances $\{(x_1, t_1, y_1), \ldots, (x_{N_k}, t_{N_k}, y_{N_k})\}$, where instance $(x_i, t_i, y_i), 1 \le i \le N_k$, represents that the relation of entity pair $t_i$ in sentence $x_i$ is $y_i \in R_k$. One CRE model should perform well on all historical tasks up to $T_k$, denoted as $\tilde{T}_k = \bigcup_{i=1}^{k} T_i$, of which the relation class set is $\tilde{R}_k = \bigcup_{i=1}^{k} R_i$. We also adopt an episodic memory module $M_r = \{(x_1, t_1, r), \ldots, (x_L, t_L, r)\}$ to store typical instances of relation $r$, similar to (Han et al., 2020; Cui et al., 2021), where $L$ is the memory size (number of typical instances). The overall episodic memory for the observed relations in all tasks is $\tilde{M}_k = \bigcup_{r \in \tilde{R}_k} M_r$.
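As a minimal illustration of the notation above, the episodic memory can be organized as a mapping from relation labels to their $L$ stored typical instances. The following Python sketch is only illustrative; the class and method names are ours, not the authors' code:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# An instance is (sentence x_i, entity pair t_i, relation label y_i).
Instance = Tuple[str, Tuple[str, str], str]

@dataclass
class EpisodicMemory:
    """Hypothetical sketch of the memory module: M_r holds up to L typical instances of relation r."""
    memory_size: int  # L, the number of typical instances kept per relation
    store: Dict[str, List[Instance]] = field(default_factory=dict)

    def add_relation(self, relation: str, typical_instances: List[Instance]) -> None:
        # Keep at most L typical instances for relation r.
        self.store[relation] = typical_instances[: self.memory_size]

    def observed_relations(self) -> List[str]:
        # \tilde{R}_k: all relation classes observed so far.
        return list(self.store.keys())

    def all_instances(self) -> List[Instance]:
        # \tilde{M}_k: the union of M_r over all observed relations.
        return [inst for insts in self.store.values() for inst in insts]
```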
3.2 Framework Overview
The overall structure of our CRECL is depicted
in Figure 2, which has two major components,
i.e., a classification network and a contrastive net-
work. The procedure of learning the current task in
CRECL is described in Alg. 1.
Suppose the current task is $T_k$. First, the representation of each instance in $T_k$ is obtained through the encoder and dropout layer shared by the two networks. In the classification network, each instance's relation is predicted based on its representation (lines 1-3). Then, we apply the K-means algorithm over the instance representations to select $L$ typical instances for each relation in $T_k$, which are used to generate the relation prototypes and stored into the memory $\tilde{M}_k$ for the subsequent contrast (lines 4-13). There are two training processes in the contrastive network. The first is to compare the current task's instances with the stored relation prototypes of $\tilde{T}_k$ (lines 14-17). The second is to compare each typical instance with all relation prototypes, which are both stored in $\tilde{M}_k$ (lines 18-24). These two training procedures ensure that each compared instance keeps its distance from sufficient negative relations in $\tilde{R}_k$. Therefore, the data distributions of $\tilde{R}_k$ are distinguishable enough to alleviate CRECL's catastrophic forgetting of old tasks. Next, we detail the operations in CRECL.
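To make the selection and prototype steps concrete, below is a minimal Python sketch (assuming scikit-learn and already-encoded instance embeddings; the function name and the mean-based prototype are our assumptions, not necessarily the paper's exact procedure) of how $L$ typical instances of one relation could be chosen with K-means and turned into a relation prototype:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_typical_and_prototype(embeddings: np.ndarray, L: int):
    """Pick L typical instances of one relation via K-means and build its prototype.

    embeddings: (N, d) array of encoder representations for instances of this relation.
    Returns (indices of the selected typical instances, prototype vector).
    Illustrative sketch only; duplicate selections across clusters are not handled.
    """
    n_clusters = min(L, len(embeddings))
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)

    typical_indices = []
    for center in kmeans.cluster_centers_:
        # Keep the instance closest to each cluster centre as a "typical" instance.
        idx = int(np.argmin(np.linalg.norm(embeddings - center, axis=1)))
        typical_indices.append(idx)

    # A simple prototype: the mean of the selected typical instances' embeddings.
    prototype = embeddings[typical_indices].mean(axis=0)
    return typical_indices, prototype
```

The stored typical instances and prototypes can then be compared against new instances in the contrastive network, as described in the two training processes above.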