TYPE-SUPERVISED SEQUENCE LABELING BASED ON THE
HETEROGENEOUS STAR GRAPH FOR NAMED ENTITY
RECOGNITION
Xueru Wen
College of Computer Science and Technology
Jilin University
Changchun
wenxr2119@mails.jlu.edu.cn
Changjiang Zhou
College of Computer Science and Technology
Jilin University
Changchun
Haotian Tang
College of Computer Science and Technology
Jilin University
Changchun
Luguang Liang
College of Computer Science and Technology
Jilin University
Changchun
Yu Jiang
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education
Jilin University
jiangyu2011@jlu.edu.cn
Hong Qi
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education
Jilin University
ABSTRACT
Named entity recognition is a fundamental task in natural language processing that identifies the span and category of entities in unstructured text. The traditional sequence labeling methodology ignores nested entities, i.e., entities contained within other entity mentions. Many approaches attempt to address this scenario, but most rely on complex structures or suffer from high computational complexity. This paper investigates representation learning on a heterogeneous star graph containing text nodes and type nodes. In addition, we revise the graph attention mechanism into a hybrid form to address its shortcomings in specific topologies. After updating the nodes in the graph, the model performs type-supervised sequence labeling. The annotation scheme extends single-layer sequence labeling and can cope with the vast majority of nested entities. Extensive experiments on public NER datasets show that our model is effective at extracting both flat and nested entities, achieving state-of-the-art performance on both flat and nested datasets. The significant improvement in accuracy reflects the superiority of the multi-layer labeling strategy.
Keywords: Named Entity Recognition · Sequence Labeling · Heterogeneous Graph
arXiv:2210.10240v2 [cs.CL] 21 Oct 2022
1 Introduction
Named Entity Recognition is an essential task in natural
language processing that aims to recognize the boundaries
and types of entities with specific meanings in the text,
including names of people, places, institutions, etc. The
Named Entity Recognition task is not only a vital tool for
information extraction, but also a crucial component in
many downstream tasks, such as text understanding [1].
Named entity recognition is usually modeled as a sequence labeling problem and can be efficiently solved by an RNN-based approach [2]. The sequence labeling formulation simplifies the problem under the assumption that entities are never nested within each other. However, entities may be overlapping or deeply nested in real-world language, as in Figure 1. More and more studies are exploring modified models to deal with this more complex situation.
ME: Chronic diseases identified: Hypertension.
NDT: Cytomegalovirus modulates interleukin-6 gene expression.
NST: Characterization of the human elk-1 promoter.
Figure 1: Examples of entity nesting from GENIA [3] and the Chilean Waiting List [4]. The colored arrows (with entity-type labels such as DIS, ABBR, DNA, and PRO) indicate the category and span of each entity; the bolded black abbreviations denote the type of entity nesting.
Some works, such as [5], employ a layered model to handle entity nesting, iteratively feeding the result of the previous layer into further annotation until a maximum number of iterations is reached or no new entities are generated. Nevertheless, these models suffer from interlayer disarrangement: the model may output a nested entity from the wrong layer and propagate the error to subsequent iterations. The main reason for this phenomenon is that the target layer for a nested entity is determined by its nesting level rather than by its semantics or structure.
Other works, such as [6, 7], identify nested entities by enumerating entity proposals. Although these methods are theoretically exhaustive, they still face difficulties in model training, high computational complexity, and an abundance of negative samples. These obstacles stem from the fact that the enumeration approach does not take the a priori structure of nested entities into account.
In recent years, graph neural networks have received much attention. Most early graph neural networks, such as [8], operate on homogeneous graphs, whereas the graphs encountered in practical applications are generally heterogeneous, with nodes and edges of multiple types. An increasing number of studies apply graph models to NLP tasks. Among them, [9] introduces a heterogeneous document-entity graph for multi-hop reading comprehension that contains information at multiple granularities, and [10] proposes a neural network for summary extraction based on heterogeneous graphs with semantic nodes of different granularity levels, including sentences.
In this paper, we design a multi-layer decoder for the NER task. To address interlayer disarrangement, the model groups entities directly by their categories instead of by their nesting depth, with each layer independently recognizing entities of one category. This method extends traditional sequence labeling and eases the nested entity problem to a certain extent. Meanwhile, this annotation method can recognize multi-label entities, which are overlooked by most models targeting the nested NER task; this nesting scenario was first mentioned in [11] and is very common in datasets such as [12]. In addition, to deal with nested entities of the same type, this paper designs an extended labeling and decoding scheme that recognizes nested entities within a single recognition layer. The proposed type-supervised sequence labeling model combines naturally with a heterogeneous graph; for this purpose, we propose a heterogeneous star graph model.
In summary, the contributions of our work are as follows:

- To the best of our knowledge, we are the first to apply a heterogeneous graph to the NER task. The proposed graph network efficiently learns node representations and can be smoothly incorporated with the type-supervised sequence labeling method. Our model achieves state-of-the-art performance on flat and nested datasets.¹
- We design a stacked star graph topology with type nodes as the center and text nodes as the planetary nodes. It greatly facilitates the exchange of local and global information and implicitly represents position information. This graph structure also reduces the computational complexity of general attention mechanisms from $O(n^2)$ to $O(tn)$, where $t$ is the number of type nodes and $n$ the sentence length; a toy sketch follows below.
- We propose a graph attention mechanism that addresses specific scenarios in which traditional graph attention mechanisms fail. Its favorable properties naturally express edge orientation.
- The proposed type-supervised labeling method and the corresponding decoding algorithm not only recognize the vast majority of nested entities but also cope with cases neglected by most nested entity recognition models.

¹ Access the code at https://github.com/Rosenberg37/GraphNER
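The $O(tn)$ claim can be made concrete with a toy sketch of the star topology's edge set: every central type node connects to every planetary text node in both directions. Here `build_star_edges` is our illustrative name, not from the paper, and the paper's actual graph may add further local connections between adjacent text nodes.

```python
# Hypothetical sketch of the star topology's edge set: the number of
# attended node pairs grows as t * n rather than the n * n of full
# token-to-token self-attention.

def build_star_edges(num_types, num_tokens):
    """Return directed edges as (src_kind, src_idx, dst_kind, dst_idx)."""
    edges = []
    for t in range(num_types):
        for i in range(num_tokens):
            edges.append(("type", t, "text", i))  # hub -> planet
            edges.append(("text", i, "type", t))  # planet -> hub
    return edges

edges = build_star_edges(num_types=4, num_tokens=50)
assert len(edges) == 2 * 4 * 50  # O(tn) pairs vs. 50 * 50 = 2500 for full attention
```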
2 Related Work
2.1 Named Entity Recognition
In recent years, named entity recognition models based on deep learning have been the main direction of research. Deep learning approaches enhance a model's feature representation and data-fitting ability by automatically mining hidden features without human intervention. Models based on recurrent neural networks and conditional random fields, such as [13], have become the dominant baselines.
The Transformer proposed in [14] builds an encoder-decoder framework entirely from attention mechanisms and shows strong performance on many NLP tasks. The Star-Transformer presented in [15] discards the fully connected structure of the original construction, achieving low computational complexity and an implicit representation of position information. It is applied to the downstream Chinese NER task in [16] with outstanding results. In our work, we extend the star-connection topology to construct a heterogeneous graph.
As classical flat NER has become comparatively mature, nested entity recognition has gradually become a research hotspot. Some works, such as [17], handle nested entities with layered models, predicting entities in inside-to-outside order via dynamically stacked LSTM-CRF layers. Nevertheless, layered models are burdened with error propagation caused by identifying entities at the wrong layer. Region-based methods such as [18] identify nested entities by enumerating all possible spans in the text and classifying them; however, these methods suffer from high computational complexity and difficulties in model training. In this paper, we propose a type-supervised sequence labeling scheme to resolve these problems.
2.2 Graph Neural Network
Graph neural networks such as [19] capture dependencies by passing messages between nodes of a graph. Driven by the needs of real-world scenarios, the design and application of heterogeneous graph neural networks have attracted extensive interest: [20] proposes a graph neural network based on heterogeneous graph iterations to resolve relation extraction in the presence of overlap, and [21] combines a lexicon with a GNN and applies it to Chinese NER.
The employment of graph neural networks in NLP tasks has been widely explored. In this paper, entity types are modeled as nodes on the graph to construct the heterogeneous graph, and we further utilize them in the subsequent sequence labeling. In particular, the specifically designed topology of the graph reduces computational complexity and improves the interaction between global and local messages.
3 Task Definition
The goal of the named entity recognition task is to identify all possible entities in an input sentence. For a given input sentence $S = [w_1, w_2, \ldots, w_L]$, where $L$ is the length of the sentence, an entity $x$ is defined as a triple $(s, e, t)$, where $s, e \in [1, L]$ denote the start and end indices of the entity and $t$ stands for a predefined entity category. With this definition, the NER task can be formally expressed as recognizing the entity set $X = \{x_1, x_2, \ldots, x_M\}$ existing in the sentence $S$. We extend the definitions of nested entities in [4] as follows:
Multi-label Entities (ME)  For two entities $x_1$ and $x_2$, we call them multi-label entities if $(s_1 = s_2) \wedge (e_1 = e_2) \wedge (t_1 \neq t_2)$, as in Figure 1.
Nested Entities of Same Type (NST)  For two entities $x_1$ and $x_2$, we call them nested entities of the same type if $(e_1 \geq e_2 \geq s_2 \geq s_1) \wedge (t_1 = t_2)$, as in Figure 1. In particular, if $(e_1 = e_2) \wedge (s_2 = s_1) \wedge (t_1 = t_2)$, they are just one entity.
Nested Entities of Different Type (NDT)  For two entities $x_1$ and $x_2$, if $(s_1 \leq s_2 \leq e_2 \leq e_1) \wedge (t_1 \neq t_2)$, we call them nested entities of different type, as in Figure 1. However, if $(s_1 = s_2) \wedge (e_1 = e_2)$, it is actually ME.
Overlapping Entities of Same Type (OST)  For the case $(e_1 > e_2 \geq s_1 > s_2) \wedge (t_1 = t_2)$, we call them overlapping entities of the same type, which is not a case addressed in this paper.
Overlapping Entities of Different Type (ODT)  For the case $(e_1 > e_2 \geq s_2 > s_1) \wedge (t_1 \neq t_2)$, we call them overlapping entities of different type. Although our model does not target this scenario, it is implicitly handled, since the decoding procedure is separated between different entity types.
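To make the five pairwise relations concrete, the following sketch classifies a pair of entity triples according to the definitions above. The function names are ours, and the overlap test is a plain span-crossing check rather than a transcription of the paper's exact inequalities.

```python
# Hedged sketch of the pairwise entity relations defined above. An entity
# is a triple (s, e, t): start index, end index, and type. "SAME" means the
# two triples denote a single entity; "NONE" means the spans are disjoint.

def contains(a, b):
    """True if span a contains span b (boundaries may coincide)."""
    return a[0] <= b[0] and b[1] <= a[1]

def relation(x1, x2):
    (s1, e1, t1), (s2, e2, t2) = x1, x2
    if (s1, e1) == (s2, e2):
        return "SAME" if t1 == t2 else "ME"   # identical span, multiple labels
    if contains(x1, x2) or contains(x2, x1):
        return "NST" if t1 == t2 else "NDT"   # one span nested in the other
    if max(s1, s2) <= min(e1, e2):            # spans cross without nesting
        return "OST" if t1 == t2 else "ODT"
    return "NONE"

# Illustrative indices for Figure 1's ME example ("Hypertension" as DIS and ABBR).
assert relation((4, 4, "DIS"), (4, 4, "ABBR")) == "ME"
assert relation((1, 6, "DNA"), (3, 4, "DNA")) == "NST"
assert relation((1, 4, "DNA"), (3, 6, "PRO")) == "ODT"
```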
In this paper, two entities $x_1$ and $x_2$ are considered to be nested entities only when they are ME, NST, or NDT. We model the NER task as a type-supervised sequence labeling task and perform it with the fusion of the type nodes and text nodes generated by the heterogeneous graph neural network.
4 Methodology
This section details our model. The overall framework is shown in Figure 2 and consists of three main parts:

Figure 2: Overall architecture. In the figure and below, text nodes are represented by blue circles and type nodes by green circles. Different colors in the Emission module and the BIOES annotations indicate the recognition of the corresponding entity classes.

- Node Representation  Given the input sentence, a recurrent neural network fuses character, token, word, and part-of-speech embeddings to produce the final context representation. The initial representations of the text nodes and type nodes are then generated from the context representation by a linear transformation and a pooling operation.
- Heterogeneous Graph  The nodes are updated through iterations of the heterogeneous star attention graph network. We alter the concatenation-based graph attention mechanism and take edge direction into consideration.
- Entity Extraction  After obtaining the representation of each node, the text nodes are combined with the type nodes to produce text representations under the various types. To predict the entity collection in the input sentence, we deploy a conditional random field to perform BIOES sequence labeling on each text representation; the union of the per-type predicted entity sets is the ultimate collection of predicted entities (see the sketch below).
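As a minimal illustration of this per-type decoding, the sketch below decodes one BIOES tag sequence per entity type and unions the results. `decode_bioes` is our name, and this simplified decoder omits the paper's extension for same-type nesting within one layer.

```python
# Sketch of type-supervised BIOES decoding: one tag sequence per entity
# type, decoded independently, with the union of the per-type entity sets
# as the final prediction.

def decode_bioes(tags):
    """Decode one BIOES tag sequence into (start, end) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "S":                 # single-token entity
            spans.append((i, i))
        elif tag == "B":               # entity begins
            start = i
        elif tag == "E" and start is not None:
            spans.append((start, i))   # entity ends
            start = None
        elif tag == "O":
            start = None
    return spans

def decode_all(tag_sequences):
    """tag_sequences: dict mapping entity type -> BIOES tags of the sentence."""
    entities = set()
    for etype, tags in tag_sequences.items():
        entities |= {(s, e, etype) for s, e in decode_bioes(tags)}
    return entities

# Illustrative tags for a five-token sentence with two entity types.
pred = decode_all({
    "DNA": ["O", "O", "B", "E", "O"],
    "PRO": ["S", "O", "O", "O", "O"],
})
assert pred == {(2, 3, "DNA"), (0, 0, "PRO")}
```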
4.1 Node Representation
The initialization of each node representation is required
before the iteration of the graph neural network. The het-
erogeneous graph in our paper consists of two kinds of
nodes: type nodes and text nodes. The following describes
how to initialize each node’s representation.
4.1.1 Hybrid Embedding
Before initializing the nodes, it is necessary to create the
hidden representation of the context. We use a multi-
granularity hybrid embedding model to produce the context
representation.
Character  The embedded representation of characters can be formalized as follows:

$$[h^c_1, h^c_2, \ldots, h^c_D] = E_c([c_1, c_2, \ldots, c_D]) \quad (1)$$

where $c_i$ is the one-hot code of a character in the word, $D$ is the number of characters constituting the word, and $h^c_i$ is the embedding corresponding to $c_i$. The characters' representations are then combined using a recurrent neural network and an average pooling operation:

$$h^C_i = \mathrm{AvgPool}(\mathrm{BiGRU}([h^c_1, h^c_2, \ldots, h^c_D])) \quad (2)$$

where GRU [22] is the gated recurrent unit and $h^C_i \in \mathbb{R}^{d_C}$ is the character-level hidden representation for $w_i$.
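Equations (1) and (2) amount to an embedding lookup, a bidirectional GRU, and average pooling over the character axis. A minimal PyTorch sketch, with illustrative sizes and variable names of our own choosing:

```python
import torch
import torch.nn as nn

# Sketch of Eqs. (1)-(2): embed the D characters of one word, run a BiGRU,
# and average-pool over the character axis to obtain h^C_i for that word.
char_vocab, d_char = 128, 50          # illustrative vocabulary and embedding sizes
char_embed = nn.Embedding(char_vocab, d_char)                  # E_c in Eq. (1)
bigru = nn.GRU(d_char, d_char // 2, bidirectional=True, batch_first=True)

char_ids = torch.randint(0, char_vocab, (1, 7))  # one word with D = 7 characters
h_c = char_embed(char_ids)                       # (1, D, d_char), Eq. (1)
out, _ = bigru(h_c)                              # (1, D, d_char), BiGRU of Eq. (2)
h_C = out.mean(dim=1)                            # (1, d_char), AvgPool of Eq. (2)
```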
Token  The token-level representation is generated by the pre-trained language model BERT [23], which uses WordPiece tokenization [24] to convert tokens into subtokens. The subtokens' representations are then averaged to obtain the token-level representation.
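The subtoken averaging can be sketched as follows, assuming per-subtoken hidden states from BERT and a token-to-subtoken alignment (in practice obtainable from the tokenizer's word IDs); all sizes here are illustrative:

```python
import torch

# Sketch of subtoken-to-token pooling: BERT emits one vector per WordPiece
# subtoken; each original token's representation is the mean of its pieces.
hidden = torch.randn(6, 768)                 # 6 subtokens, BERT-base hidden size
token_to_pieces = [[0], [1, 2], [3, 4, 5]]   # hypothetical alignment for 3 tokens

token_reprs = torch.stack(
    [hidden[torch.tensor(idx)].mean(dim=0) for idx in token_to_pieces]
)  # (3, 768): one averaged vector per original token
```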