Learning "O" Helps for Learning More:
Handling the Unlabeled Entity Problem for Class-incremental NER
Ruotian Ma1
, Xuanting Chen1
, Lin Zhang1, Xin Zhou1,
Junzhe Wang1, Tao Gui2, Qi Zhang1, Xiang Gao3, Yunwen Chen3
1School of Computer Science, Fudan University, Shanghai, China
2Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China
3DataGrand Information Technology (Shanghai) Co., Ltd.
{rtma19,xuantingchen21,tgui,qz}@fudan.edu.cn
Abstract
As the categories of named entities rapidly increase, deployed NER models are required to keep updating toward recognizing more entity types, creating a demand for class-incremental learning for NER. Considering privacy concerns and storage constraints, the standard paradigm for class-incremental NER updates the models with training data annotated only with the new classes, so entities from other entity classes are unlabeled and regarded as "Non-entity" (or "O"). In this work, we conduct an empirical study on this "Unlabeled Entity Problem" and find that it leads to severe confusion between "O" and entities, decreasing the class discrimination of old classes and declining the model's ability to learn new classes. To solve the Unlabeled Entity Problem, we propose a novel representation learning method to learn discriminative representations for the entity classes and "O". Specifically, we propose an entity-aware contrastive learning method that adaptively detects entity clusters in "O". Furthermore, we propose two effective distance-based relabeling strategies for better learning the old classes. We introduce a more realistic and challenging benchmark for class-incremental NER, on which the proposed method achieves up to a 10.62% improvement over the baseline methods.
1 Introduction
Existing Named Entity Recognition (NER) systems are typically trained on a large-scale dataset with predefined entity classes, then deployed for entity recognition on the test data without further adaptation or refinement (Li et al., 2020; Wang et al., 2022; Liu et al., 2021; Ma et al., 2022a). In practice, the newly-arriving test data may include new entity classes, and the user's required entity class set might keep expanding. Therefore, it is in demand that the NER model can be incrementally updated to recognize new entity classes. However, one challenge is that the training data of old entity classes may not be available due to privacy concerns or memory limitations (Li and Hoiem, 2017; Zhang et al., 2020). It is also expensive and time-consuming to re-annotate all the old entity classes whenever we update the model (Delange et al., 2021; Bang et al., 2021). To solve the problem, Monaikul et al. (2021) propose to incrementally update the model with new datasets covering only the new entity classes, which has been adopted by following studies as the standard class-incremental NER paradigm.

Figure 1: Problems of class-incremental NER. In each incremental step, the data is only labeled with the current classes, so the "O" class actually contains entities from old classes and entities from potential classes.
However, as NER is a sequence labeling task, annotating only the new classes means entities from other entity classes are regarded as "Non-entity" (or "O") in the dataset. For example, in step 2 in Fig. 1, the training data for model updating is only annotated with "LOC" and "DATE", while the entities from "PER" and "FILM" are unlabeled and regarded as "O" during training. We refer to this problem as the "Unlabeled Entity Problem" in class-incremental NER, which involves two types of unlabeled entities: (1) old entity classes (e.g., "PER" in step 2) that the model learned in previous steps are unlabeled in the current step, causing the model to catastrophically forget these old classes (Lopez-Paz and Ranzato, 2017; Castro et al., 2018); (2) potential entity classes that are not annotated as of the current step, yet might be required in a future step. For example, the "FILM" class is not annotated in step 2, yet is required in step K.
In this work, we conduct an empirical study to demonstrate the significance of the "Unlabeled Entity Problem" in class-incremental NER. We observe that: (1) the majority of prediction errors come from the confusion between entities and "O"; (2) being mislabeled as "O" reduces the class discrimination of old entities during incremental learning; (3) the model's ability to learn new classes also declines as the potential classes are unlabeled during incremental training. These problems account for the serious performance drop of incremental learning as the number of steps increases.
To tackle the Unlabeled Entity Problem, we propose a novel representation learning method for learning discriminative representations for the unlabeled entity classes and "O". Specifically, we propose an entity-aware contrastive learning approach, which adaptively detects entity clusters from "O" and learns discriminative representations for these entity clusters. To further maintain the class discrimination of old classes, we propose two distance-based relabeling strategies. By relabeling the entities from old classes with high accuracy, this practice not only maintains the performance of old classes, but also benefits the model's ability to separate new classes from "O".
We also argue that the experimental setting of previous work (Monaikul et al., 2021) is less realistic. Specifically, they introduce only one or two entity classes in each incremental step, and the number of total steps is limited. In real-world applications, it is more common that a set of new categories is introduced in each step (e.g., a set of product types), and the incremental learning steps can keep increasing. In this work, we provide a more realistic and challenging benchmark based on the Few-NERD dataset (Ding et al., 2021), following the settings of previous studies (Rebuffi et al., 2017; Li and Hoiem, 2017). We conduct intensive experiments on the proposed methods and other comparable baselines, verifying the effectiveness of the proposed method.1

1Our code is publicly available at https://github.com/rtmaww/O_CILNER.
To summarize the contributions of this work:
• We conduct an empirical study to demonstrate the significance of the "Unlabeled Entity Problem" in class-incremental NER.
• Based on our observations, we propose a novel representation learning approach for better learning the unlabeled entities and "O", and verify the effectiveness of our method with extensive experiments.
• We provide a more realistic and challenging benchmark for class-incremental NER.
2 Class-incremental NER
In this work, we focus on class-incremental learning for NER. Formally, there are $N$ incremental steps, corresponding to a series of tasks $\{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_N\}$. Here, $\mathcal{T}_t = (\mathcal{D}^{tr}_t, \mathcal{D}^{dev}_t, \mathcal{D}^{test}_t, \mathcal{C}_{t,new}, \mathcal{C}_{t,old})$ is the task at the $t$-th step. $\mathcal{C}_{t,new}$ is the label set of the current task, containing only the new classes introduced in the current step (e.g., {"LOC", "DATE"} in Fig. 1, step 2). $\mathcal{C}_{t,old} = \bigcup_{i=1}^{t-1} \mathcal{C}_{i,new} \cup \{O\}$ is the label set of old classes, containing all classes in previous tasks and the class "O" (e.g., {"PER", "O"} in Fig. 1, step 2). $\mathcal{D}^{tr}_t = \{X^j_t, Y^j_t\}_{j=1}^{n}$ is the training set of task $t$, where each sentence $X^j_t = \{x^{j,1}_t, \ldots, x^{j,l}_t\}$ is annotated only with the new classes, i.e., $Y^j_t = \{y^{j,1}_t, \ldots, y^{j,l}_t\}$ with $y^{j,k}_t \in \mathcal{C}_{t,new}$. In each step $t$, the model $\mathcal{A}_{t-1}$ from the last step needs to be updated with only the data $\mathcal{D}^{tr}_t$ from the current step, and is expected to perform well on the test set covering all learnt entity types $\mathcal{C}^{all}_t = \mathcal{C}_{t,new} \cup \mathcal{C}_{t,old}$.
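To make the formulation concrete, the following minimal Python sketch mirrors the task structure above. The dataclass and helper functions are our own illustration under the paper's notation, not code from the authors.

```python
# Minimal sketch of the class-incremental NER task structure (illustrative names).
from dataclasses import dataclass
from typing import List, Set, Tuple

Sentence = List[str]   # X_t^j = {x_t^{j,1}, ..., x_t^{j,l}}
Labels = List[str]     # Y_t^j, where each y_t^{j,k} is in C_{t,new} or "O"

@dataclass
class Task:
    train: List[Tuple[Sentence, Labels]]  # D_t^tr, annotated with new classes only
    new_classes: Set[str]                 # C_{t,new}

def old_classes(tasks: List[Task], t: int) -> Set[str]:
    """C_{t,old}: the union of all earlier steps' new classes, plus 'O'."""
    old = {"O"}
    for i in range(t):
        old |= tasks[i].new_classes
    return old

def all_classes(tasks: List[Task], t: int) -> Set[str]:
    """C_t^all = C_{t,new} ∪ C_{t,old}: the label set evaluated at step t."""
    return tasks[t].new_classes | old_classes(tasks, t)
```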
3 The Importance of the Unlabeled Entity Problem in Class-incremental NER
In this section, we demonstrate the importance of the Unlabeled Entity Problem in class-incremental NER with empirical studies. We conduct experiments on a challenging dataset, Few-NERD, to investigate the problems in class-incremental NER. We experiment with two existing methods: (1) iCaRL (Rebuffi et al., 2017), a typical and well-performing method in class-incremental image classification; and (2) Continual NER (Monaikul et al., 2021), the previous state-of-the-art method in class-incremental NER. More details of the dataset and the baseline methods can be found in Section 5.
Figure 2: Distributions of prediction errors of different models in step 6: (a) iCaRL; (b) Continual NER. The first row represents the number of samples belonging to the "O" class wrongly recognized as entities, which shows the severe confusion between "O" and entity classes.
Figure 3: Visualization of the representation variation of the old classes during incremental learning: (a) Continual NER, step 2; (b) Continual NER, step 5. The class discrimination seriously decreases by step 5.
Observation 1: The majority of prediction errors come from the confusion between entities and "O". In Fig. 2, we show the distributions of prediction errors of different models in step 6, where the y-axis denotes samples belonging to "O" or to the classes of different tasks, and the x-axis denotes whether the samples are wrongly predicted as "O" or as classes from different tasks. Each number in a grid denotes the number of error predictions. From the results, we can see that the majority of error predictions are samples belonging to "O" wrongly predicted as entities (the first row of each model), indicating serious confusion between "O" and entity classes, especially the old entity classes. As explained in Section 1, the training data of each new task is only annotated with the new entity classes, and the entities from old classes are labeled as "O". As training proceeds, the class variance between the true "O" and the old entity classes decreases, leading to serious confusion of their representations.
Observation 2: Old entity classes become less discriminative during incremental learning. We further investigate the representation variation of old classes during incremental learning. As shown in Fig. 3, we select similar classes from step 0 and step 1, and visualize their representations after step 2 and step 5. The results show that the representations of these classes are discriminative enough in step 2. However, after a series of incremental steps, the representations of these old classes become less discriminative, leading to decreasing performance on old classes. This phenomenon also indicates the influence of the unlabeled entity problem on the unlabeled old classes.
Steps      0     1     2     3     4     5     6
Full Data  72.7  69.2  68.3  67.0  67.3  69.1  68.8
iCaRL      71.3  56.9  52.6  48.8  53.4  48.1  39.6
Con. NER   72.4  63.5  56.9  52.5  56.8  51.8  42.2
Table 1: Performance on the new classes on dev sets (containing only the new classes) keeps decreasing during incremental learning. Here, Full Data is the model trained with datasets labeled with both old and new classes.
Observation 3: The model's ability to learn new classes declines during incremental learning. Finally, we conduct an experiment to investigate the model's ability to learn new classes. In Table 1, we report the results on the new classes in each step, evaluated on dev sets that contain only these new classes. Here, Full Data is a baseline trained on datasets in which both old and new classes are annotated. Surprisingly, we find that the performance on the new classes of iCaRL and Continual NER keeps decreasing during incremental learning, in contrast to the stable performance of Full Data. This phenomenon is also related to the Unlabeled Entity Problem. As explained in the introduction, the potential entity classes (i.e., the entity classes that might be needed in a future step) are also unlabeled and regarded as "O" during incremental learning. As a result, the representations of these classes become less separable from similar old classes (also labeled as "O"), thus hindering the model's ability to learn new classes.
Conclusion to the Observations: Based on the above observations, we propose that appropriate representation learning is required to tackle the Unlabeled Entity Problem. The representations of entities and "O" are expected to meet the following requirements: (1) the "O" representations should be distinct from the entity representations, so as to reduce the confusion between "O" and entities (Observation 1); (2) the representations of old entity classes should remain discriminative despite being labeled as "O" (Observation 2); (3) the potential entity classes should be detected and separated from "O", and should also be discriminative from the other entity classes (Observation 3). These observations and conclusions motivate the proposed method.

Figure 4: Overview of the proposed representation learning method: (1) We propose an entity-aware contrastive learning method to adaptively detect entity clusters from "O" and learn discriminative representations for these entities (Section 4.1). (2) We propose two distance-based relabeling strategies (prototype-based and nearest neighbor-based) to further maintain the performance of old classes (Section 4.2).
4 Handling the Unlabeled Entity Problem
In order to learn discriminative representations for the unlabeled entity classes and the true "O" (connected to Observations 1, 2, 3), we propose entity-aware contrastive learning, which adaptively detects entity clusters in "O" during contrastive learning. To further maintain the class discrimination of old classes (connected to Observation 2), we propose two distance-based relabeling strategies to relabel the unlabeled entities from old classes in "O". Additionally, we propose the use of the Nearest Class Mean (NCM) classifier based on the learnt representations, in order to avoid the prediction bias of a linear classifier; a sketch of NCM classification is given below.
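The following is a minimal PyTorch sketch of NCM classification, assuming L2-normalized token representations $z = F(h)$ and cosine similarity; the function names and shapes are illustrative, not the paper's code.

```python
# Sketch of a Nearest Class Mean (NCM) classifier over normalized features.
import torch

def class_means(feats: torch.Tensor, labels: torch.Tensor, n_classes: int) -> torch.Tensor:
    """Compute one mean (prototype) per class from exemplar features.
    feats: (n, d) L2-normalized representations; labels: (n,) class ids."""
    means = torch.zeros(n_classes, feats.size(1))
    for c in range(n_classes):
        mask = labels == c
        if mask.any():
            means[c] = feats[mask].mean(dim=0)
    return torch.nn.functional.normalize(means, dim=-1)

def ncm_predict(feats: torch.Tensor, means: torch.Tensor) -> torch.Tensor:
    """Assign each token to the class whose mean has the highest cosine
    similarity (dot product, since all vectors are normalized)."""
    return (feats @ means.T).argmax(dim=-1)
```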
Rehearsal-based task formulation. To better learn representations for entities and "O", we follow the memory replay (rehearsal) setting adopted by most previous works (Rebuffi et al., 2017; Mai et al., 2021; Verwimp et al., 2021). Formally, we retain a set of exemplars $M_c = \{x^i_c, y^i_c, X^i_c\}_{i=1}^{K}$ for each class $c$, where $x^i_c$ refers to one token $x$ labeled as class $c$, and $X$ is the context of $x$, labeled as "O". In all our experiments, we set $K = 5$.2

2We set a small $K$ because the class number can be large.
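As a reference, here is a hedged sketch of building the exemplar memory $M_c$. The paper's excerpt does not specify how the $K$ exemplars are chosen, so random sampling below is an assumption.

```python
# Sketch of exemplar memory M_c = {(x_c^i, y_c^i, X_c^i)}_{i=1}^K.
# NOTE: random selection is an assumption; the selection rule is not
# specified in this excerpt.
import random
from collections import defaultdict

def build_memory(dataset, K=5):
    """dataset: iterable of (tokens, labels) sentences.
    Returns {class: [(token_index, class, sentence), ...]} with at most K
    exemplars per class; the retained sentence serves as the context X,
    whose other tokens are labeled 'O'."""
    by_class = defaultdict(list)
    for tokens, labels in dataset:
        for i, y in enumerate(labels):
            if y != "O":
                by_class[y].append((i, y, tokens))
    return {c: random.sample(ex, min(K, len(ex))) for c, ex in by_class.items()}
```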
4.1 Entity-aware Contrastive Learning
In this section, we introduce entity-aware contrastive learning, which dynamically learns entity clusters in "O". To this aim, we first learn an entity-oriented feature space, where the representations of entities are distinct from "O". This entity-oriented feature space is learnt through contrastive learning on the labeled entity classes in the first $M$ epochs of each step. Based on the entity-oriented feature space, we further conduct contrastive learning on "O", with the anchors and positive samples dynamically selected based on an entity threshold, as sketched below.
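The exact entity threshold ($Th_{entity}$, Eq. 6) is defined later in the paper and is not shown in this excerpt; the sketch below assumes it is a scalar cosine-similarity cutoff and treats sufficiently similar pairs of "O" tokens as anchor-positive pairs. This is one plausible reading, not the authors' implementation.

```python
# Hedged sketch: select anchor-positive pairs among "O" tokens whose mutual
# cosine similarity exceeds an entity threshold (assumed to be a scalar).
import torch

def select_o_pairs(o_feats: torch.Tensor, th_entity: float) -> torch.Tensor:
    """o_feats: (m, d) L2-normalized representations of tokens labeled 'O'.
    Returns an (m, m) boolean mask: True where two 'O' tokens are similar
    enough to be treated as an anchor-positive pair in contrastive learning."""
    sim = o_feats @ o_feats.T   # pairwise cosine similarities
    sim.fill_diagonal_(-1.0)    # a token is never its own positive
    return sim > th_entity
```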
Learning an Entity-oriented Feature Space. Firstly, we learn an entity-oriented feature space, where the distance between representations reflects entity semantic similarity, i.e., representations from the same entity class have higher similarity while keeping their distance from other classes. This feature space is realized by learning a non-linear mapping $F(\cdot)$ on the output representations $h$ of the PLM. We adopt cosine similarity as the similarity metric and train with the Supervised Contrastive Loss (Khosla et al., 2020):

$$\mathcal{L}_{SCL} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{e^{s(z_i, z_p)}}{\sum_{a \in A(i)} e^{s(z_i, z_a)}} \quad (1)$$

where $z = F(h)$ denotes the representation after the mapping and $s(\cdot)$ is the cosine similarity. Here, we apply contrastive learning only on the entity classes, thus we define:

$$I = \{i \mid i \in \mathrm{Index}(\mathcal{D}^{tr}_t),\ y_i \neq \text{``O''}\}$$
$$A(i) = \{j \mid j \in \mathrm{Index}(\mathcal{D}^{tr}_t),\ j \neq i\}$$
$$P(i) = \{p \mid p \in A(i),\ y_p = y_i\} \quad (2)$$
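A minimal PyTorch sketch of Eqs. (1)-(2) follows, assuming representations are L2-normalized so the dot product equals cosine similarity; the temperature commonly used in SupCon does not appear in this excerpt and is omitted, and all names are illustrative.

```python
# Sketch of the supervised contrastive loss of Eq. (1) with the index
# sets of Eq. (2): anchors I are entity tokens only.
import torch

def supcon_entity_loss(z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """z: (n, d) L2-normalized representations z = F(h); labels: (n,) class
    ids with 0 reserved for 'O'. A(i) is every token but i; P(i) is the set
    of other tokens sharing anchor i's class."""
    n = z.size(0)
    sim = z @ z.T                                   # s(z_i, z_a), cosine
    not_self = ~torch.eye(n, dtype=torch.bool)      # A(i): exclude i itself
    # Row-wise log( e^{s_ip} / sum_{a in A(i)} e^{s_ia} )
    masked = sim.masked_fill(~not_self, float("-inf"))
    log_prob = sim - torch.logsumexp(masked, dim=1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & not_self  # P(i)
    anchors = torch.nonzero(labels != 0).flatten()  # I: tokens with y_i != 'O'
    losses = []
    for i in anchors:
        p = positives[i]
        if p.any():                                 # skip anchors with |P(i)| = 0
            losses.append(-log_prob[i][p].mean())   # (-1/|P(i)|) * sum_p log(...)
    return torch.stack(losses).sum() if losses else z.sum() * 0.0
```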