weakly-supervised image segmentation, i.e., seed, expand and constrain (Kolesnikov and Lampert, 2016), we seed with relatively high-quality unigrams and bigrams in the texts, then expand them to extract the candidate spans as accurately as possible. Secondly, we cast span classification as textual
entailment to naturally incorporate the entity type
information. For example, to determine whether “J. K. Rowling” in “J. K. Rowling is a British author.” is a PERSON entity or a non-entity, we treat “J. K. Rowling is a British author.” as the premise, then construct “J. K. Rowling is a person.” and “J. K. Rowling is not an entity.” as hypotheses. In this way, span classification is converted into determining which hypothesis is true. Moreover, this conversion enlarges the training data, which is beneficial in few-shot settings.
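The conversion can be sketched as follows; the hypothesis templates, type inventory, and function name are illustrative assumptions rather than the paper's exact verbalizers.

```python
# Minimal sketch: turning span classification into textual entailment.
# Templates and entity types are illustrative assumptions; the paper's
# actual verbalizers may differ.

TYPE_HYPOTHESES = {
    "PERSON": "{span} is a person.",
    "ORG": "{span} is an organization.",
    "LOC": "{span} is a location.",
}
NONE_HYPOTHESIS = "{span} is not an entity."

def build_entailment_pairs(sentence, span):
    """Return one (premise, hypothesis) pair per entity type,
    plus a non-entity hypothesis for the candidate span."""
    pairs = [(sentence, tpl.format(span=span))
             for tpl in TYPE_HYPOTHESES.values()]
    pairs.append((sentence, NONE_HYPOTHESIS.format(span=span)))
    return pairs

for premise, hypothesis in build_entailment_pairs(
        "J. K. Rowling is a British author.", "J. K. Rowling"):
    print(f"{premise} => {hypothesis}")
```

Each candidate span thus yields multiple (premise, hypothesis) training examples, which is how the conversion enlarges the training data.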
In this paper, we propose SEE-Few, a novel
multi-task learning framework (Seed, Expand and
Entail) for Few-shot NER. The seeding and expanding modules are responsible for providing candidate spans that are as accurate as possible for the entailing module. Specifically, the seed selector chooses unigrams and bigrams as seeds based on metrics such as the Intersection over Foreground. The expanding module takes a seed and the window around it into account and expands the seed into a candidate span. Compared with enumerating all possible n-gram spans, seeding and expanding can significantly reduce the number of candidate spans and alleviate the impact of negative spans in the subsequent span classification stage. The
entailing module reformulates a span classification
task as a textual entailment task, leveraging contex-
tual clues and entity type information to determine
whether a candidate span is an entity and what
type of entity it is. All three modules share the same text encoder and are jointly learned. Experiments were conducted on four NER datasets under the training-from-scratch few-shot setting. Experimental results show that the proposed approach outperforms several state-of-the-art baselines.
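A rough sketch of the candidate generation step is given below; the window size, threshold, scoring interface, and the hand-written expansion rule are assumptions made for illustration, whereas in SEE-Few the seed selector and expander are learned modules that share the text encoder.

```python
# Rough sketch of seed-and-expand candidate generation. Window size,
# threshold, and the placeholder expansion rule are assumptions; the
# actual modules are neural and trained jointly with the encoder.

def intersection_over_foreground(seed, entity):
    """IoF of a seed (start, end) against a gold entity span: the
    fraction of the seed's tokens that fall inside the entity
    (a plausible training signal for the seed selector)."""
    overlap = max(0, min(seed[1], entity[1]) - max(seed[0], entity[0]))
    return overlap / (seed[1] - seed[0])

def generate_candidates(tokens, seed_scorer, window=2, threshold=0.5):
    """Score every unigram/bigram; keep high-scoring seeds and expand
    each one within a local window to form a candidate span."""
    candidates = []
    for start in range(len(tokens)):
        for end in (start + 1, start + 2):  # unigrams and bigrams
            if end > len(tokens) or seed_scorer(tokens, start, end) < threshold:
                continue
            # A learned expander would predict boundary offsets from the
            # window's context; here we only expose its search space.
            left = max(0, start - window)
            right = min(len(tokens), end + window)
            candidates.append((left, right))
    return candidates
```

Because only seeds that pass the score threshold are expanded, the candidate set stays far smaller than the full set of enumerated n-grams.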
The main contributions can be summarized as
follows:
• A novel multi-task learning framework (Seed, Expand and Entail), SEE-Few, is proposed for few-shot NER without using source domain data. Specifically, the seeding and expanding modules provide candidate spans that are as accurate as possible for the entailing module. The entailing module reformulates span classification as a textual entailment task, leveraging contextual clues and entity type information.
• Experiments were conducted on four NER datasets in the training-from-scratch few-shot setting. Experimental results show that the proposed approach outperforms the state-of-the-art baselines by significant margins.
2 Related Work
2.1 Few-shot NER
Few-shot NER aims at recognizing entities based
on only a few labeled instances from each category.
A few approaches have been proposed for few-
shot NER. Methods based on prototypical networks (Snell et al., 2017) require complex episode training (Fritzler et al., 2019; Hou et al., 2020). Yang
and Katiyar (2020) abandon the complex meta-
training and propose NNShot, a distance-based
method with a simple nearest neighbor classifier.
Huang et al. (2021) investigate three orthogonal
schemes to improve the model generalization abil-
ity for few-shot NER. TemplateNER (Cui et al.,
2021) enumerates all possible text spans in input
text as candidate spans and classifies each span
based on its corresponding template score. Ma
et al. (2021) propose a template-free method to re-
formulate NER tasks as language modeling (LM)
problems without any templates. Tong et al. (2021)
propose to mine the undefined classes from miscel-
laneous other-class words, which also benefits few-
shot NER. Ding et al. (2021) present Few-NERD, a
large-scale human-annotated few-shot NER dataset
to facilitate the research.
However, most of these studies follow the manner of episode training (Fritzler et al., 2019; Hou et al., 2020; Tong et al., 2021; Ding et al., 2021) or assume an available rich-resource source domain (Yang and Katiyar, 2020; Cui et al., 2021), which is in contrast to real-world application scenarios in which only very limited labeled data is available for training and validation (Ma et al., 2021). EntLM (Ma et al., 2021) is implemented in the training-from-scratch few-shot setting, but still
needs distant supervision datasets for label word
searching. The construction of distant supervi-
sion datasets requires additional expert knowledge.
Some works study generating NER datasets au-
tomatically to reduce labeling costs (Kim et al.,
2021; Li et al., 2021b). In this paper, we focus on
the few-shot setting without source domain data