
steps are unlabeled in the current step, causing the
model to catastrophically forget these old classes
(Lopez-Paz and Ranzato, 2017; Castro et al., 2018);
(2) potential entity classes that are not annotated in
the current step, yet might be required in a future
step. For example, the "FILM" class is not annotated
at step 2, yet is required at step K.
In this work, we conduct an empirical study to
demonstrate the significance of the "Unlabeled Entity
Problem" in class-incremental NER. We observe that:
(1) the majority of prediction errors come from the
confusion between entities and "O"; (2) mislabeling
old entity classes as "O" reduces the class
discrimination of old entities during incremental
learning; (3) the model's ability to learn new classes
also declines when the potential classes are unlabeled
during incremental training. These problems contribute
to the severe performance drop of incremental learning
as the number of steps increases.
To tackle the Unlabeled Entity Problem, we propose a
novel method for learning discriminative representations
for the unlabeled entity classes and "O". Specifically,
we propose an entity-aware contrastive learning
approach, which adaptively detects entity clusters
from "O" and learns discriminative representations
for these entity clusters. To further maintain the
class discrimination of old classes, we propose two
distance-based relabeling strategies. By relabeling
the entities of old classes with high accuracy, this
practice not only maintains the performance on old
classes, but also benefits the model's ability to
separate new classes from "O".
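As a concrete illustration of distance-based relabeling, the following is a minimal sketch that relabels "O" tokens lying close to a prototype of an old class. The prototype dictionary, the Euclidean distance metric, and the fixed threshold are illustrative assumptions and need not match the exact strategies proposed in this paper.

```python
# A minimal sketch of distance-based relabeling of "O" tokens.
# The prototype dictionary, Euclidean distance, and fixed threshold are
# illustrative assumptions, not necessarily the exact strategies in this paper.
import torch

def relabel_o_tokens(token_reprs, labels, prototypes, o_id=0, threshold=5.0):
    """Relabel "O" tokens whose representation is close to an old-class prototype.

    token_reprs: (num_tokens, hidden) encoder outputs for a batch of tokens
    labels:      (num_tokens,) current-step labels, where "O" has id `o_id`
    prototypes:  dict {old_class_id: (hidden,) mean representation from earlier steps}
    """
    new_labels = labels.clone()
    if not prototypes:
        return new_labels
    class_ids = torch.tensor(list(prototypes.keys()), dtype=labels.dtype)
    proto_mat = torch.stack(list(prototypes.values()))        # (num_old, hidden)

    o_mask = labels == o_id
    if o_mask.any():
        dists = torch.cdist(token_reprs[o_mask], proto_mat)   # (num_O, num_old)
        min_dist, nearest = dists.min(dim=-1)
        confident = min_dist < threshold                      # relabel only confident tokens
        o_idx = o_mask.nonzero(as_tuple=True)[0]
        new_labels[o_idx[confident]] = class_ids[nearest[confident]]
    return new_labels
```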
We also argue that the experimental setting of
previous work (Monaikul et al., 2021) is less
realistic. Specifically, they introduce only one or
two entity classes in each incremental step, and the
total number of steps is limited. In real-world
applications, it is more common that a set of new
categories is introduced in each step (e.g., a set of
product types), and the number of incremental steps
can keep growing. In this work, we provide a more
realistic and challenging benchmark based on the
Few-NERD dataset (Ding et al., 2021), following the
settings of previous studies (Rebuffi et al., 2017;
Li and Hoiem, 2017). We conduct intensive experiments
on the proposed methods and other comparable baselines,
verifying the effectiveness of the proposed method.¹

¹ Our code is publicly available at https://github.com/rtmaww/O_CILNER.
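To make the multiple-classes-per-step setting concrete, here is a minimal sketch of how a pool of entity types could be partitioned into consecutive incremental steps. The type names and the number of classes per step are illustrative placeholders, not the actual Few-NERD splits used in our benchmark.

```python
# A minimal sketch of grouping entity types into incremental steps.
# The type names and step size below are illustrative, not the actual splits.
def make_steps(entity_types, classes_per_step):
    """Partition a list of entity types into consecutive incremental steps."""
    return [entity_types[i:i + classes_per_step]
            for i in range(0, len(entity_types), classes_per_step)]

# Few-NERD-style fine-grained type names (examples only)
types = ["person-actor", "location-park", "product-car",
         "art-film", "event-war", "building-hotel"]
print(make_steps(types, classes_per_step=2))
# [['person-actor', 'location-park'], ['product-car', 'art-film'],
#  ['event-war', 'building-hotel']]  -> several new categories per step
```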
To summarize, the contributions of this work are:

• We conduct an empirical study to demonstrate the significance of the "Unlabeled Entity Problem" in class-incremental NER.

• Based on our observations, we propose a novel representation learning approach for better learning the unlabeled entities and "O", and verify the effectiveness of our method with extensive experiments.

• We provide a more realistic and challenging benchmark for class-incremental NER.
2 Class-incremental NER
In this work, we focus on class-incremental learning
for NER. Formally, there are $N$ incremental steps,
corresponding to a series of tasks
$\{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_N\}$.
Here, $\mathcal{T}_t = (\mathcal{D}^{tr}_t, \mathcal{D}^{dev}_t, \mathcal{D}^{test}_t, C_{t,new}, C_{t,old})$
is the task at the $t$-th step. $C_{t,new}$ is the label
set of the current task, containing only the new classes
introduced in the current step (e.g., {"LOC", "DATE"} in
Fig. 1, step 2). $C_{t,old} = \bigcup_{i=1}^{t-1} C_{i,new} \cup \{\text{"O"}\}$
is the label set of old classes, containing all classes
in previous tasks and the class "O" (e.g., {"PER", "O"}
in Fig. 1, step 2). $\mathcal{D}^{tr}_t = \{X^j_t, Y^j_t\}_{j=1}^{n}$
is the training set of task $t$, where each sentence
$X^j_t = \{x^{j,1}_t, \ldots, x^{j,l}_t\}$ is annotated
with only the new classes, i.e.,
$Y^j_t = \{y^{j,1}_t, \ldots, y^{j,l}_t\}$ with
$y^{j,k}_t \in C_{t,new}$. In each step $t$, the model
$\mathcal{A}_{t-1}$ from the last step needs to be updated
with only the data $\mathcal{D}^{tr}_t$ from the current
step, and is expected to perform well on a test set
covering all learnt entity types
$C^{all}_t = C_{t,new} \cup C_{t,old}$.
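To make the notation concrete, below is a minimal sketch of the label-set bookkeeping implied by this formulation. The per-step class lists are illustrative examples mirroring Fig. 1, not the actual benchmark splits.

```python
# A minimal sketch of the label-set definitions above.
# The per-step class lists are illustrative examples, not the benchmark splits.
from typing import List, Set, Tuple

def label_sets(new_per_step: List[Set[str]], t: int) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (C_{t,new}, C_{t,old}, C^all_t) for a 1-indexed step t."""
    c_new = new_per_step[t - 1]                            # classes introduced at step t
    c_old = set().union(*new_per_step[:t - 1]) | {"O"}     # all previous classes plus "O"
    return c_new, c_old, c_new | c_old                     # C^all_t = C_{t,new} U C_{t,old}

steps = [{"PER"}, {"LOC", "DATE"}, {"ORG"}]                # e.g., Fig. 1-style steps
for t in range(1, len(steps) + 1):
    c_new, c_old, c_all = label_sets(steps, t)
    print(f"step {t}: new={sorted(c_new)}, old={sorted(c_old)}, all={sorted(c_all)}")
```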
3 The Importance of the Unlabeled Entity Problem in Class-incremental NER

In this section, we demonstrate the importance of the
Unlabeled Entity Problem in class-incremental NER with
empirical studies. We conduct experiments on a
challenging dataset, Few-NERD, to investigate the
problems in class-incremental NER, using two existing
methods: (1) iCaRL (Rebuffi et al., 2017), a classic
and well-performing method for class-incremental image
classification, and (2) Continual NER (Monaikul et al.,
2021), the previous state-of-the-art method for
class-incremental NER. More details of the dataset and
the baseline methods can be found in Section 5.