SEE-Few: Seed, Expand and Entail for Few-shot Named Entity
Recognition
Zeng Yang and Linhai Zhang and Deyu Zhou
School of Computer Science and Engineering, Key Laboratory of Computer Network
and Information Integration, Ministry of Education, Southeast University, China
{yangzeng, lzhang472, d.zhou}@seu.edu.cn
arXiv:2210.05632v1 [cs.CL] 11 Oct 2022
Abstract
Few-shot named entity recognition (NER) aims at identifying named entities based on only a few labeled instances. Current few-shot NER methods focus on leveraging existing datasets in rich-resource domains, which might fail in a training-from-scratch setting where no source-domain data is used. To tackle the training-from-scratch setting, it is crucial to make full use of the annotation information (the boundaries and entity types). Therefore, in this paper, we propose a novel multi-task (Seed, Expand and Entail) learning framework, SEE-Few, for Few-shot NER without using source domain data. The seeding and expanding modules are responsible for providing as accurate candidate spans as possible for the entailing module. The entailing module reformulates span classification as a textual entailment task, leveraging both contextual clues and entity type information. All three modules share the same text encoder and are jointly learned. Experimental results on four benchmark datasets under the training-from-scratch setting show that the proposed method outperformed state-of-the-art few-shot NER methods by a large margin. Our code is available at https://github.com/unveiled-the-red-hat/SEE-Few.
1 Introduction
Named entity recognition (NER), focusing on identifying mention spans in text inputs and classifying them into pre-defined entity categories, is a fundamental task in natural language processing and is widely used in downstream tasks (Wang et al., 2019; Zhou et al., 2021; Peng et al., 2022). Supervised NER has been intensively studied and has yielded significant progress, especially with the aid of pre-trained language models (Devlin et al., 2019; Li et al., 2020; Mengge et al., 2020; Yu et al., 2020; Shen et al., 2021; Li et al., 2021a; Chen and Kong, 2021). However, supervised NER relies on plenty of training data, which makes it unsuitable for scenarios where only few training instances are available.
Few-shot NER, aiming at recognizing entities based on a few labeled instances, has attracted much attention in the research field. Approaches for few-shot NER can be roughly divided into two categories: span-based and sequence-labeling-based methods. Span-based approaches enumerate text spans in input texts and classify each span based on its corresponding template score (Cui et al., 2021). Sequence-labeling-based approaches treat NER as a sequence labeling problem which assigns a tag to each token using the BIO or IO tagging scheme (Yang and Katiyar, 2020; Hou et al., 2020; Huang et al., 2021). Most of these span-based and sequence-labeling-based methods focus on leveraging existing datasets in rich-resource domains to improve performance in low-resource domains. Unfortunately, the gap between the source domains and the target domains may hinder the performance of these methods (Pan and Yang, 2009; Cui et al., 2021). Moreover, these approaches might fail under the training-from-scratch setting, where no source domain data is available.
Therefore, it is crucial to make full use of the in-domain annotations, which consist of two types of information: boundary information and entity type information. However, most of the approaches mentioned above fail to fully utilize this information. (1) Most span-based methods simply enumerate all possible spans, ignoring the boundary information of named entities. As a large number of negative spans are generated, these approaches suffer from a bias: the tendency to classify named entities as non-entities. (2) Most sequence-labeling-based methods simply employ one-hot vectors to represent entity types, ignoring the prior knowledge of entity types.
To overcome the disadvantages mentioned above, firstly, inspired by the three principles for weakly-supervised image segmentation, i.e., seed, expand and constrain (Kolesnikov and Lampert, 2016), we seed with relatively high-quality unigrams and bigrams in the texts, then expand them to extract the candidate spans as accurately as possible. Secondly, we cast span classification as textual entailment to naturally incorporate the entity type information. For example, to determine whether “J. K. Rowling” in “J. K. Rowling is a British author.” is a PERSON entity or a non-entity, we treat “J. K. Rowling is a British author.” as a premise, then construct “J. K. Rowling is a person.” and “J. K. Rowling is not an entity.” as hypotheses. In this way, span classification is converted into determining which hypothesis is true. Moreover, this conversion increases the size of the training data, which is beneficial in few-shot settings.
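The premise–hypothesis construction above can be sketched as follows. This is a minimal illustration of the reformulation; the template wording and the function name are assumptions for this sketch, not necessarily the paper's exact templates.

```python
# Sketch: convert span classification into textual entailment.
# One premise (the original sentence) is paired with one hypothesis
# per entity type, plus a "not an entity" hypothesis.

def build_entailment_pairs(sentence, span_text, entity_types):
    """Return (premise, hypothesis) pairs for one candidate span."""
    hypotheses = [f"{span_text} is a {t}." for t in entity_types]
    hypotheses.append(f"{span_text} is not an entity.")
    return [(sentence, h) for h in hypotheses]

pairs = build_entailment_pairs(
    "J. K. Rowling is a British author.",
    "J. K. Rowling",
    ["person", "location"],
)
for premise, hypothesis in pairs:
    print(hypothesis)
```

Note that each candidate span yields multiple training pairs, which is how the conversion enlarges the effective training set.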
In this paper, we propose SEE-Few, a novel multi-task learning framework (Seed, Expand and Entail) for Few-shot NER. The seeding and expanding modules are responsible for providing as accurate candidate spans as possible for the entailing module. Specifically, the seed selector chooses some unigrams and bigrams as seeds based on some metrics, e.g., the Intersection over Foreground. The expanding module takes a seed and the window around it into account and expands the seed to a candidate span. Compared with enumerating all possible n-gram spans, seeding and expanding can significantly reduce the number of candidate spans and alleviate the impact of negative spans in the subsequent span classification stage. The entailing module reformulates span classification as a textual entailment task, leveraging contextual clues and entity type information to determine whether a candidate span is an entity and what type of entity it is. All three modules share the same text encoder and are jointly learned. Experiments were conducted on four NER datasets under the training-from-scratch few-shot setting. Experimental results show that the proposed approach outperforms several state-of-the-art baselines.
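The reduction in candidate spans from seeding can be quantified with a back-of-the-envelope comparison: enumerating all spans of a length-n sentence yields n(n+1)/2 candidates, while seeding with unigrams and bigrams yields only 2n - 1 (before expansion).

```python
# Sketch: candidate-count comparison between full span enumeration
# and unigram/bigram seeding for a sentence of n tokens.

def num_all_spans(n):
    # every (l, r) with 1 <= l <= r <= n
    return n * (n + 1) // 2

def num_seeds(n):
    # n unigrams + (n - 1) bigrams
    return n + (n - 1)

for n in (10, 50, 100):
    print(n, num_all_spans(n), num_seeds(n))
# e.g. for n = 100: 5050 enumerated spans vs. 199 seeds
```

The gap grows quadratically with sentence length, which is why seeding mitigates the flood of negative spans that full enumeration produces.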
The main contributions can be summarized as follows:
• A novel multi-task learning framework (Seed, Expand and Entail), SEE-Few, is proposed for few-shot NER without using source domain data. Specifically, the seeding and expanding modules provide as accurate candidate spans as possible for the entailing module. The entailing module reformulates span classification as a textual entailment task, leveraging contextual clues and entity type information.
• Experiments were conducted on four NER datasets in the training-from-scratch few-shot setting. Experimental results show that the proposed approach outperforms the state-of-the-art baselines by significant margins.
2 Related Work
2.1 Few-shot NER
Few-shot NER aims at recognizing entities based on only a few labeled instances from each category. A few approaches have been proposed for few-shot NER. Methods based on the prototypical network (Snell et al., 2017) require complex episode training (Fritzler et al., 2019; Hou et al., 2020). Yang and Katiyar (2020) abandon the complex meta-training and propose NNShot, a distance-based method with a simple nearest neighbor classifier. Huang et al. (2021) investigate three orthogonal schemes to improve model generalization ability for few-shot NER. TemplateNER (Cui et al., 2021) enumerates all possible text spans in the input text as candidate spans and classifies each span based on its corresponding template score. Ma et al. (2021) propose a template-free method to reformulate NER tasks as language modeling (LM) problems without any templates. Tong et al. (2021) propose to mine undefined classes from miscellaneous other-class words, which also benefits few-shot NER. Ding et al. (2021) present Few-NERD, a large-scale human-annotated few-shot NER dataset, to facilitate the research.
However, most of these studies follow the manner of episode training (Fritzler et al., 2019; Hou et al., 2020; Tong et al., 2021; Ding et al., 2021) or assume an available rich-resource source domain (Yang and Katiyar, 2020; Cui et al., 2021), which is in contrast to real-world application scenarios where only very limited labeled data is available for training and validation (Ma et al., 2021). EntLM (Ma et al., 2021) is implemented under the training-from-scratch few-shot setting, but still needs distant supervision datasets for label word searching. The construction of distant supervision datasets requires additional expert knowledge. Some works study generating NER datasets automatically to reduce labeling costs (Kim et al., 2021; Li et al., 2021b). In this paper, we focus on
the few-shot setting without source domain data, which makes minimal assumptions about available resources.

Figure 1: The architecture of the proposed approach, SEE-Few, which consists of three main modules: seeding, expanding, and entailing.
2.2 Three Principles for Weakly-Supervised
Image Segmentation
Semantic image segmentation is a computer vision technique which aims at assigning a semantic class label to each pixel of an image. Kolesnikov and Lampert (2016) introduce three guiding principles for weakly-supervised semantic image segmentation: to seed with weak localization cues, to expand objects based on the information of possible classes in the image, and to constrain the segmentation with object boundaries.
3 Methodologies
3.1 Problem Setting
We decompose NER into two subtasks: span extraction and span classification. Given an input text $X = \{x_1, \dots, x_n\}$ as a sequence of tokens, a span starting from $x_l$ and ending with $x_r$ (i.e., $\{x_l, \dots, x_r\}$) can be denoted as $s = (l, r)$, where $1 \le l \le r \le n$. The span extraction task is to obtain a candidate span set $C = \{c_1, \dots, c_m\}$ from the input text. Given an entity type set $T^{+} = \{t_1, \dots, t_{v-1}\}$ and the candidate span set $C$ produced by span extraction, the target of span classification is to assign an entity category $t \in T^{+}$ or the non-entity category to each candidate span. For convenience, we denote the entity type set including the non-entity type as $T = \{t_1, \dots, t_{v-1}, t_{\mathrm{none}}\}$, where $t_{\mathrm{none}}$ represents the non-entity type and $v$ is the size of $T$.
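The span convention above can be made concrete with a small sketch. The token sequence and helper name here are illustrative, not from the paper; spans are (l, r) index pairs, 1-based and inclusive, as in the problem setting.

```python
# Sketch of the problem setting: spans as 1-based inclusive (l, r)
# index pairs over a token sequence X, with span classification
# assigning a type from T+ or the non-entity category.

X = ["Rui", "Gomez", "Pereira", "in", "his", "book"]
T_plus = ["PER", "LOC", "ORG", "MISC"]
T = T_plus + ["none"]  # |T| = v, including the non-entity type

def span_tokens(X, span):
    """Return the tokens covered by span = (l, r), 1 <= l <= r <= n."""
    l, r = span
    return X[l - 1:r]

C = [(1, 3), (6, 6)]  # candidate spans from span extraction
print(span_tokens(X, C[0]))  # ['Rui', 'Gomez', 'Pereira']
```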
3.2 The Architecture
Figure 1 illustrates the architecture of the proposed approach, SEE-Few, which consists of three main modules: seeding, expanding, and entailing. The input text is first sent to the seeding module to generate informative seeds; the seeds are then expanded to candidate spans in the expanding module; finally, the candidate spans are classified via an entailment task in the entailing module. We discuss the details of each module in the following sections.
3.2.1 Seeding
Given an input text $X = \{x_1, \dots, x_n\}$ consisting of $n$ tokens, a unigram consists of one token and a bigram consists of two consecutive tokens. We denote the set of unigrams and bigrams in the input text as $S = \{s_1, \dots, s_{2n-1}\}$, where $s_i = (l_i, r_i)$ denotes the $i$-th span, and $l_i$ and $r_i$ denote the left and right boundaries of the span, respectively.
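The seed candidate set $S$ can be generated as a simple sketch; the function name is illustrative, but the boundary convention and the $2n - 1$ count follow the definition above.

```python
# Sketch: enumerate the seed candidate set S of all unigrams and
# bigrams as (l, r) boundary pairs (1-based, inclusive), giving
# |S| = 2n - 1 candidates for an n-token input.

def unigram_bigram_spans(n):
    unigrams = [(i, i) for i in range(1, n + 1)]        # n spans
    bigrams = [(i, i + 1) for i in range(1, n)]         # n - 1 spans
    return unigrams + bigrams

S = unigram_bigram_spans(4)
print(len(S))  # 2 * 4 - 1 = 7
```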
Seeding aims to find the unigrams and bigrams that overlap with entities and have the potential to be expanded to named entities, which is important for the subsequent seed expansion. This can be accomplished by constructing a seeding model and predicting a seed score for each candidate unigram