Learning "O" Helps for Learning More:
Handling the Unlabeled Entity Problem for Class-incremental NER
Ruotian Ma1
, Xuanting Chen1
, Lin Zhang1, Xin Zhou1,
Junzhe Wang1, Tao Gui2, Qi Zhang1, Xiang Gao3, Yunwen Chen3
1School of Computer Science, Fudan University, Shanghai, China
2Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China
3DataGrand Information Technology (Shanghai) Co., Ltd.
{rtma19,xuantingchen21,tgui,qz}@fudan.edu.cn
Abstract
As the categories of named entities rapidly increase, deployed NER models are required to keep updating toward recognizing more entity types, creating a demand for class-incremental learning for NER. Considering privacy concerns and storage constraints, the standard paradigm for class-incremental NER updates the models with training data annotated only with the new classes, so entities from other entity classes are unlabeled and regarded as "Non-entity" (or "O"). In this work, we conduct an empirical study on this "Unlabeled Entity Problem" and find that it leads to severe confusion between "O" and entities, decreasing the class discrimination of old classes and declining the model's ability to learn new classes. To solve the Unlabeled Entity Problem, we propose a novel representation learning method to learn discriminative representations for the entity classes and "O". Specifically, we propose an entity-aware contrastive learning method that adaptively detects entity clusters in "O". Furthermore, we propose two effective distance-based relabeling strategies for better learning the old classes. We introduce a more realistic and challenging benchmark for class-incremental NER, on which the proposed method achieves up to a 10.62% improvement over the baseline methods.
1 Introduction
Existing Named Entity Recognition (NER) systems are typically trained on a large-scale dataset with predefined entity classes, then deployed for entity recognition on the test data without further adaptation or refinement (Li et al., 2020; Wang et al., 2022; Liu et al., 2021; Ma et al., 2022a). In practice, the newly-arriving test data may include new entity classes, and the user's required entity class set might keep expanding. Therefore, it is in demand that the NER model can be incrementally updated to recognize new entity classes. However, one challenge is that the training data of old entity classes may not be available due to privacy concerns or memory limitations (Li and Hoiem, 2017; Zhang et al., 2020). It is also expensive and time-consuming to re-annotate all the old entity classes whenever we update the model (Delange et al., 2021; Bang et al., 2021). To solve the problem, Monaikul et al. (2021) propose to incrementally update the model with new datasets covering only the new entity classes, which has been adopted by following studies as the standard class-incremental NER paradigm.

Figure 1: Problems of class-incremental NER. In each incremental step, the data is only labeled with the current classes, so the "O" class actually contains entities from old classes and entities from potential classes.
However, as NER is a sequence labeling task, annotating only the new classes means entities from other entity classes are regarded as "Non-entity" (or "O") in the dataset. For example, in step 2 in Fig. 1, the training data for model updating is only annotated with "LOC" and "DATE", while the entities from "PER" and "FILM" are unlabeled and regarded as "O" during training. We refer to this problem as the "Unlabeled Entity Problem" in class-incremental NER, which involves two types of unlabeled entities: (1) old entity classes (e.g., "PER" in step 2) that the model learned in previous steps are unlabeled in the current step, causing the model to catastrophically forget these old classes (Lopez-Paz and Ranzato, 2017; Castro et al., 2018); (2) potential entity classes that are not annotated as of the current step, yet might be required in a future step. For example, the "FILM" class is not annotated in step 2, yet is required in step K.
In this work, we conduct an empirical study to demonstrate the significance of the "Unlabeled Entity Problem" in class-incremental NER. We observe that: (1) the majority of prediction errors come from the confusion between entities and "O"; (2) being mislabeled as "O" reduces the class discrimination of old entities during incremental learning; (3) the model's ability to learn new classes also declines as the potential classes are unlabeled during incremental training. These problems account for the serious performance drop of incremental learning as the number of steps increases.
To tackle the Unlabeled Entity Problem, we propose a novel representation learning method for learning discriminative representations for the unlabeled entity classes and "O". Specifically, we propose an entity-aware contrastive learning approach, which adaptively detects entity clusters from "O" and learns discriminative representations for these entity clusters. To further maintain the class discrimination of old classes, we propose two distance-based relabeling strategies. By relabeling the entities from old classes with high accuracy, this practice not only maintains the performance of old classes, but also benefits the model's ability to separate new classes from "O".
We also argue that the experimental setting of previous work (Monaikul et al., 2021) is less realistic. Specifically, they introduce only one or two entity classes in each incremental step, and the number of total steps is limited. In real-world applications, it is more common that a set of new categories is introduced in each step (e.g., a set of product types), and the incremental learning steps can keep increasing. In this work, we provide a more realistic and challenging benchmark based on the Few-NERD dataset (Ding et al., 2021), following the settings of previous studies (Rebuffi et al., 2017; Li and Hoiem, 2017). We conduct intensive experiments on the proposed methods and other comparable baselines, verifying the effectiveness of the proposed method.1

1Our code is publicly available at https://github.com/rtmaww/O_CILNER.
To summarize the contributions of this work:
• We conduct an empirical study to demonstrate the significance of the "Unlabeled Entity Problem" in class-incremental NER.
• Based on our observations, we propose a novel representation learning approach for better learning the unlabeled entities and "O", and verify the effectiveness of our method with extensive experiments.
• We provide a more realistic and challenging benchmark for class-incremental NER.
2 Class-incremental NER
In this work, we focus on class-incremental learning for NER. Formally, there are $N$ incremental steps, corresponding to a series of tasks $\{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_N\}$. Here, $\mathcal{T}_t = (\mathcal{D}^{tr}_t, \mathcal{D}^{dev}_t, \mathcal{D}^{test}_t, \mathcal{C}_{t,new}, \mathcal{C}_{t,old})$ is the task at the $t$-th step. $\mathcal{C}_{t,new}$ is the label set of the current task, containing only the new classes introduced in the current step (e.g., {"LOC", "DATE"} in Fig. 1, step 2). $\mathcal{C}_{t,old} = \bigcup_{i=1}^{t-1} \mathcal{C}_{i,new} \cup \{O\}$ is the label set of old classes, containing all classes in previous tasks and the class "O" (e.g., {"PER", "O"} in Fig. 1, step 2). $\mathcal{D}^{tr}_t = \{X^j_t, Y^j_t\}_{j=1}^{n}$ is the training set of task $t$, where each sentence $X^j_t = \{x^{j,1}_t, \ldots, x^{j,l}_t\}$ is annotated only with the new classes, i.e., $Y^j_t = \{y^{j,1}_t, \ldots, y^{j,l}_t\}$ with $y^{j,k}_t \in \mathcal{C}_{t,new}$. In each step $t$, the model $\mathcal{A}_{t-1}$ from the last step needs to be updated with only the data $\mathcal{D}^{tr}_t$ from the current step, and is expected to perform well on the test set covering all learnt entity types $\mathcal{C}^{all}_t = \mathcal{C}_{t,new} \cup \mathcal{C}_{t,old}$.
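To make the formulation concrete, the following minimal Python sketch mirrors the task structure above. The dataclass and helper functions are our own illustration under the paper's notation, not code from the authors.

```python
# Minimal sketch of the class-incremental NER task structure (illustrative names).
from dataclasses import dataclass
from typing import List, Set, Tuple

Sentence = List[str]   # X_t^j = {x_t^{j,1}, ..., x_t^{j,l}}
Labels = List[str]     # Y_t^j, where each y_t^{j,k} is in C_{t,new} or "O"

@dataclass
class Task:
    train: List[Tuple[Sentence, Labels]]  # D_t^tr, annotated with new classes only
    new_classes: Set[str]                 # C_{t,new}

def old_classes(tasks: List[Task], t: int) -> Set[str]:
    """C_{t,old}: the union of all earlier steps' new classes, plus 'O'."""
    old = {"O"}
    for i in range(t):
        old |= tasks[i].new_classes
    return old

def all_classes(tasks: List[Task], t: int) -> Set[str]:
    """C_t^all = C_{t,new} ∪ C_{t,old}: the label set evaluated at step t."""
    return tasks[t].new_classes | old_classes(tasks, t)
```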
3 The Importance of the Unlabeled Entity Problem in Class-incremental NER
In this section, we demonstrate the importance of the Unlabeled Entity Problem in class-incremental NER with empirical studies. We conduct experiments on a challenging dataset, Few-NERD, to investigate the problems in class-incremental NER. We experiment with two existing methods: (1) iCaRL (Rebuffi et al., 2017), a typical and well-performing method in class-incremental image classification; and (2) Continual NER (Monaikul et al., 2021), the previous state-of-the-art method in class-incremental NER. More details of the dataset and the baseline methods can be found in Section 5.
Figure 2: Distributions of prediction errors of different models in step 6: (a) iCaRL; (b) Continual NER. The first row represents the number of samples belonging to the "O" class wrongly recognized as entities, which shows the severe confusion between "O" and entity classes.
Figure 3: Visualization of the representation variation of the old classes during incremental learning: (a) Continual NER, step 2; (b) Continual NER, step 5. The class discrimination seriously decreases by step 5.
Observation 1: The majority of prediction errors come from the confusion between entities and "O". In Fig. 2, we show the distributions of prediction errors of different models in step 6, where the y-axis denotes samples belonging to "O" or to the classes of different tasks, and the x-axis denotes whether the samples are wrongly predicted as "O" or as classes from different tasks. Each number in a grid denotes the number of error predictions. From the results, we can see that the majority of error predictions are samples belonging to "O" wrongly predicted as entities (the first row of each model), indicating serious confusion between "O" and entity classes, especially the old entity classes. As explained in Section 1, the training data of each new task is only annotated with the new entity classes, and the entities from old classes are labeled as "O". As training proceeds, the class variance between the true "O" and the old entity classes decreases, leading to serious confusion of their representations.
Observation 2: Old entity classes become less discriminative during incremental learning. We further investigate the representation variation of old classes during incremental learning. As shown in Fig. 3, we select similar classes from step 0 and step 1, and visualize their representations after step 2 and step 5. The results show that the representations of these classes are discriminative enough in step 2. However, after a series of incremental steps, the representations of these old classes become less discriminative, leading to decreasing performance on old classes. This phenomenon also indicates the influence of the unlabeled entity problem on the unlabeled old classes.
Steps      0     1     2     3     4     5     6
Full Data  72.7  69.2  68.3  67.0  67.3  69.1  68.8
iCaRL      71.3  56.9  52.6  48.8  53.4  48.1  39.6
Con. NER   72.4  63.5  56.9  52.5  56.8  51.8  42.2
Table 1: Performance on the new classes on dev sets (containing only the new classes) keeps decreasing during incremental learning. Here, Full Data is the model trained with datasets labeled with both old and new classes.
Observation 3: The model's ability to learn new classes declines during incremental learning. Finally, we conduct an experiment to investigate the model's ability to learn new classes. In Table 1, we report the results on the new classes in each step, evaluated on dev sets that contain only these new classes. Here, Full Data is a baseline trained on datasets in which both old and new classes are annotated. Surprisingly, we find that the performance on the new classes of iCaRL and Continual NER keeps decreasing during incremental learning, in contrast to the stable performance of Full Data. This phenomenon is also related to the Unlabeled Entity Problem. As explained in the introduction, the potential entity classes (i.e., the entity classes that might be needed in a future step) are also unlabeled and regarded as "O" during incremental learning. As a result, the representations of these classes become less separable from similar old classes (also labeled as "O"), thus hindering the model's ability to learn new classes.
Conclusion to the Observations: Based on the above observations, we propose that appropriate representation learning is required to tackle the Unlabeled Entity Problem. The representations of entities and "O" are expected to meet the following requirements: (1) the "O" representations should be distinct from the entity representations, so as to reduce the confusion between "O" and entities (Observation 1); (2) the representations of old entity classes should remain discriminative despite being labeled as "O" (Observation 2); (3) the potential entity classes should be detected and separated from "O", and should also be discriminative from the other entity classes (Observation 3). These observations and conclusions motivate the proposed method.

Figure 4: Overview of the proposed representation learning method: (1) We propose an entity-aware contrastive learning method to adaptively detect entity clusters from "O" and learn discriminative representations for these entities (Section 4.1). (2) We propose two distance-based relabeling strategies (prototype-based and nearest neighbor-based) to further maintain the performance of old classes (Section 4.2).
4 Handling the Unlabeled Entity Problem
In order to learn discriminative representations for the unlabeled entity classes and the true "O" (connected to Observations 1, 2, 3), we propose entity-aware contrastive learning, which adaptively detects entity clusters in "O" during contrastive learning. To further maintain the class discrimination of old classes (connected to Observation 2), we propose two distance-based relabeling strategies to relabel the unlabeled entities from old classes in "O". Additionally, we propose the use of the Nearest Class Mean (NCM) classifier based on the learnt representations, in order to avoid the prediction bias of a linear classifier; a sketch of NCM classification is given below.
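The following is a minimal PyTorch sketch of NCM classification, assuming L2-normalized token representations $z = F(h)$ and cosine similarity; the function names and shapes are illustrative, not the paper's code.

```python
# Sketch of a Nearest Class Mean (NCM) classifier over normalized features.
import torch

def class_means(feats: torch.Tensor, labels: torch.Tensor, n_classes: int) -> torch.Tensor:
    """Compute one mean (prototype) per class from exemplar features.
    feats: (n, d) L2-normalized representations; labels: (n,) class ids."""
    means = torch.zeros(n_classes, feats.size(1))
    for c in range(n_classes):
        mask = labels == c
        if mask.any():
            means[c] = feats[mask].mean(dim=0)
    return torch.nn.functional.normalize(means, dim=-1)

def ncm_predict(feats: torch.Tensor, means: torch.Tensor) -> torch.Tensor:
    """Assign each token to the class whose mean has the highest cosine
    similarity (dot product, since all vectors are normalized)."""
    return (feats @ means.T).argmax(dim=-1)
```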
Rehearsal-based task formulation. To better learn representations for entities and "O", we follow the memory replay (rehearsal) setting adopted by most previous works (Rebuffi et al., 2017; Mai et al., 2021; Verwimp et al., 2021). Formally, we retain a set of exemplars $M_c = \{x^i_c, y^i_c, X^i_c\}_{i=1}^{K}$ for each class $c$, where $x^i_c$ refers to one token $x$ labeled as class $c$, and $X$ is the context of $x$, labeled as "O". In all our experiments, we set $K = 5$.2

2We set a small $K$ because the class number can be large.
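As a reference, here is a hedged sketch of building the exemplar memory $M_c$. The paper's excerpt does not specify how the $K$ exemplars are chosen, so random sampling below is an assumption.

```python
# Sketch of exemplar memory M_c = {(x_c^i, y_c^i, X_c^i)}_{i=1}^K.
# NOTE: random selection is an assumption; the selection rule is not
# specified in this excerpt.
import random
from collections import defaultdict

def build_memory(dataset, K=5):
    """dataset: iterable of (tokens, labels) sentences.
    Returns {class: [(token_index, class, sentence), ...]} with at most K
    exemplars per class; the retained sentence serves as the context X,
    whose other tokens are labeled 'O'."""
    by_class = defaultdict(list)
    for tokens, labels in dataset:
        for i, y in enumerate(labels):
            if y != "O":
                by_class[y].append((i, y, tokens))
    return {c: random.sample(ex, min(K, len(ex))) for c, ex in by_class.items()}
```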
4.1 Entity-aware Contrastive Learning
In this section, we introduce entity-aware contrastive learning, which dynamically learns entity clusters in "O". To this aim, we first learn an entity-oriented feature space, where the representations of entities are distinct from "O". This entity-oriented feature space is learnt through contrastive learning on the labeled entity classes in the first $M$ epochs of each step. Based on the entity-oriented feature space, we further conduct contrastive learning on "O", with the anchors and positive samples dynamically selected based on an entity threshold, as sketched below.
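The exact entity threshold ($Th_{entity}$, Eq. 6) is defined later in the paper and is not shown in this excerpt; the sketch below assumes it is a scalar cosine-similarity cutoff and treats sufficiently similar pairs of "O" tokens as anchor-positive pairs. This is one plausible reading, not the authors' implementation.

```python
# Hedged sketch: select anchor-positive pairs among "O" tokens whose mutual
# cosine similarity exceeds an entity threshold (assumed to be a scalar).
import torch

def select_o_pairs(o_feats: torch.Tensor, th_entity: float) -> torch.Tensor:
    """o_feats: (m, d) L2-normalized representations of tokens labeled 'O'.
    Returns an (m, m) boolean mask: True where two 'O' tokens are similar
    enough to be treated as an anchor-positive pair in contrastive learning."""
    sim = o_feats @ o_feats.T   # pairwise cosine similarities
    sim.fill_diagonal_(-1.0)    # a token is never its own positive
    return sim > th_entity
```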
Learning an Entity-oriented Feature Space. Firstly, we learn an entity-oriented feature space, where the distance between representations reflects entity semantic similarity, i.e., representations from the same entity class have higher similarity while keeping their distance from other classes. This feature space is realized by learning a non-linear mapping $F(\cdot)$ on the output representations $h$ of the PLM. We adopt cosine similarity as the similarity metric and train with the Supervised Contrastive Loss (Khosla et al., 2020):

$$\mathcal{L}_{SCL} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{e^{s(z_i, z_p)}}{\sum_{a \in A(i)} e^{s(z_i, z_a)}} \quad (1)$$

where $z = F(h)$ denotes the representation after the mapping and $s(\cdot)$ is the cosine similarity. Here, we apply contrastive learning only on the entity classes, thus we define:

$$I = \{i \mid i \in \mathrm{Index}(\mathcal{D}^{tr}_t),\ y_i \neq \text{``O''}\}$$
$$A(i) = \{j \mid j \in \mathrm{Index}(\mathcal{D}^{tr}_t),\ j \neq i\}$$
$$P(i) = \{p \mid p \in A(i),\ y_p = y_i\} \quad (2)$$
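A minimal PyTorch sketch of Eqs. (1)-(2) follows, assuming representations are L2-normalized so the dot product equals cosine similarity; the temperature commonly used in SupCon does not appear in this excerpt and is omitted, and all names are illustrative.

```python
# Sketch of the supervised contrastive loss of Eq. (1) with the index
# sets of Eq. (2): anchors I are entity tokens only.
import torch

def supcon_entity_loss(z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """z: (n, d) L2-normalized representations z = F(h); labels: (n,) class
    ids with 0 reserved for 'O'. A(i) is every token but i; P(i) is the set
    of other tokens sharing anchor i's class."""
    n = z.size(0)
    sim = z @ z.T                                   # s(z_i, z_a), cosine
    not_self = ~torch.eye(n, dtype=torch.bool)      # A(i): exclude i itself
    # Row-wise log( e^{s_ip} / sum_{a in A(i)} e^{s_ia} )
    masked = sim.masked_fill(~not_self, float("-inf"))
    log_prob = sim - torch.logsumexp(masked, dim=1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self  # P(i)
    anchors = torch.nonzero(labels != 0).flatten()  # I: tokens with y_i != 'O'
    losses = []
    for i in anchors:
        p = positives[i]
        if p.any():                                 # skip anchors with |P(i)| = 0
            losses.append(-log_prob[i][p].mean())   # (-1/|P(i)|) * sum_p log(...)
    return torch.stack(losses).sum() if losses else z.sum() * 0.0
```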