TYPE-SUPERVISED SEQUENCE LABELING BASED ON THE HETEROGENEOUS STAR GRAPH FOR NAMED ENTITY RECOGNITION



Xueru Wen

College of Computer Science and Technology

Jilin University

Changchun

wenxr2119@mails.jlu.edu.cn

Changjiang Zhou

College of Computer Science and Technology

Jilin University

Changchun

Haotian Tang

College of Computer Science and Technology

Jilin University

Changchun

Luguang Liang

College of Computer Science and Technology

Jilin University

Changchun

Yu Jiang

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education

Jilin University

jiangyu2011@jlu.edu.cn

Hong Qi

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education

Jilin University

ABSTRACT

Named entity recognition is a fundamental task in natural language processing that identifies the span and category of entities in unstructured text. The traditional sequence labeling methodology ignores nested entities, i.e., entities contained within other entity mentions. Many approaches attempt to address this scenario, but most of them rely on complex structures or have high computational complexity. This paper investigates representation learning on a heterogeneous star graph containing text nodes and type nodes. In addition, we revise the graph attention mechanism into a hybrid form to address its shortcomings in specific topologies. The model performs type-supervised sequence labeling after updating the nodes in the graph. The annotation scheme is an extension of single-layer sequence labeling and is able to cope with the vast majority of nested entities. Extensive experiments on public NER datasets reveal the effectiveness of our model in extracting both flat and nested entities. The method achieved state-of-the-art performance on both flat and nested datasets. The significant improvement in accuracy reflects the superiority of the multi-layer labeling strategy.

Keywords: Named Entity Recognition · Sequence Labeling · Heterogeneous Graph

arXiv:2210.10240v2 [cs.CL] 21 Oct 2022


1 Introduction

Named Entity Recognition (NER) is an essential task in natural language processing that aims to recognize the boundaries and types of entities with specific meanings in text, including names of people, places, institutions, etc. NER is not only a vital tool for information extraction but also a crucial component of many downstream tasks, such as text understanding [1]. Named entity recognition is usually modeled as a sequence labeling problem and can be efficiently solved by an RNN-based approach [ ]. The sequence labeling approach simplifies the problem by assuming that entities are never nested within each other. However, entities may overlap or be deeply nested in real-world language, as in Figure 1. A growing number of studies explore modified models to deal with this more complex situation.

[Figure 1: image not recovered. Example sentences shown: ME: "Chronic diseases identified: Hypertension."; NDT: "Cytomegalovirus modulates interleukin-6 gene expression."; NST: "Characterization of the human elk-1 promoter." Entity labels shown: DIS, ABBR, DNA, PRO.]

Figure 1: Example of entity nesting from GENIA [ ] and Chilean Waiting List [ ]. The colored arrows indicate the category and span of the entities. The bolded black abbreviations denote the type of entity nesting.

Some works, like [ ], employ a layered model to handle entity nesting: the result of each layer is iteratively passed on for further annotation until the maximum number of iterations is reached or no new entities are generated. Nevertheless, these models suffer from the problem of interlayer disarrangement, that is, the model may output a nested entity from the wrong layer and pass the error on to subsequent iterations. The main reason for this phenomenon is that the target layer that generates a nested entity is determined by its nesting level rather than by its semantics or structure.

Other works, like [ ], identify nested entities by enumerating entity proposals. Although these methods are theoretically perfect, they still face difficulties with model training, high complexity, and negative samples. These obstacles stem from the fact that the enumeration approach does not take into account the a priori structural nature of nested entities.

In recent years, graph neural networks have received a lot of attention. Most early graph neural networks, like [ ], operate on homogeneous graphs. However, the graphs encountered in practical applications are generally heterogeneous, with nodes and edges of multiple types. An increasing number of studies are dedicated to applying graph models to NLP tasks. Among them, [ ] introduces a heterogeneous document-entity graph containing information at multiple granularities for multi-hop reading comprehension, and [ ] proposes a neural network for summary extraction based on heterogeneous graphs with semantic nodes of different granularity levels, including sentences.

In this paper, we design a multi-layer decoder for the NER task. To address interlayer disarrangement, the model groups entities directly by category instead of by nesting depth. Each layer individually recognizes entities of a single category. This method extends the traditional sequence labeling method and eases the problem of nested entities to a certain extent. Meanwhile, this annotation method can recognize multi-label entities, which are overlooked by most models targeting the nested NER task. This nesting scenario is first mentioned in [ ] and is very common in some datasets, like [ ]. In addition, to deal with nested entities of the same type, this paper designs an extended labeling and decoding scheme that further recognizes nested entities within a single recognition layer. The proposed type-supervised sequence labeling model combines naturally with a heterogeneous graph. For this purpose, we propose a heterogeneous star graph model.
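As a rough illustration of grouping entities by category, the sketch below builds one BIOES tag sequence per entity type from a list of `(start, end, type)` triples. It is a minimal sketch under assumptions: the helper name `encode_by_type`, the 1-based inclusive indices, and the example spans are hypothetical, and it covers only the plain per-type case, not the extended scheme for same-type nesting.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]  # (start, end, type), 1-based inclusive indices


def encode_by_type(length: int, entities: List[Entity], types: List[str]) -> Dict[str, List[str]]:
    """Build one BIOES tag sequence per entity type (hypothetical helper)."""
    layers = {t: ["O"] * length for t in types}
    for start, end, etype in entities:
        tags = layers[etype]
        if start == end:
            tags[start - 1] = "S"          # single-token entity
        else:
            tags[start - 1] = "B"          # first token
            for i in range(start, end - 1):
                tags[i] = "I"              # interior tokens
            tags[end - 1] = "E"            # last token
    return layers


if __name__ == "__main__":
    # Two labels on the same span (a multi-label entity) plus one longer span.
    example = [(1, 1, "DIS"), (1, 1, "ABBR"), (3, 5, "DNA")]
    for etype, tags in encode_by_type(5, example, ["DIS", "ABBR", "DNA"]).items():
        print(etype, tags)
```

Because each type owns its own tag sequence, two entities that share a span but differ in type never compete for the same tag position.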

In summary, the contributions of our work are as follows:

• To the best of our knowledge, we are the first to apply a heterogeneous graph to the NER task. The proposed graph network efficiently learns the representation of nodes, which can be smoothly incorporated into the type-supervised sequence labeling method. Our model achieved state-of-the-art performance on flat and nested datasets.

• We design a stacked star graph topology with type nodes as the center and text nodes as the planetary nodes. It greatly facilitates the exchange of local and global information and implicitly represents location information. This graph structure also significantly reduces the computational complexity to $O(tn)$ from the $O(n^2)$ of general attention mechanisms.

• Our graph attention mechanism is proposed to address specific scenarios in which traditional graph attention mechanisms fail. The favorable properties of our attention mechanism allow it to express edge orientation naturally.

• The proposed type-supervised labeling method and the corresponding decoding algorithm not only recognize the vast majority of nested entities but also cope with cases neglected by most nested entity recognition models.
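To make the complexity comparison concrete, the following sketch counts the edges of a star layout, in which every type node connects to every text node, against the token pairs scored by fully connected attention. It assumes that $t$ is the number of type nodes and $n$ the number of text nodes, and it ignores any additional edges the actual model may use; the function names are hypothetical.

```python
from typing import List, Tuple


def star_edges(num_types: int, num_tokens: int) -> List[Tuple[str, str]]:
    """Edges between every type node and every text node (assumed star layout)."""
    return [(f"type{j}", f"text{i}") for j in range(num_types) for i in range(num_tokens)]


def full_attention_pairs(num_tokens: int) -> int:
    """Number of ordered token pairs scored by fully connected attention."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    t, n = 4, 50
    print(len(star_edges(t, n)))     # t * n = 200
    print(full_attention_pairs(n))   # n * n = 2500
```

With a fixed set of types, the edge count grows linearly in the sentence length rather than quadratically.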

Access the code at https://github.com/Rosenberg37/GraphNER


2 Related Work

2.1 Named Entity Recognition

In recent years, named entity recognition models based on deep learning have been the main direction of relevant research. Deep learning approaches enhance a model's feature representation and data fitting abilities by automatically mining hidden features without human intervention. Models like [ ], based on recurrent neural networks and conditional random fields, have become the dominant baseline models.

The Transformer proposed in [ ] makes comprehensive use of the attention mechanism to construct an encoder-decoder framework and shows satisfactory performance in many NLP tasks. The Star Transformer presented in [ ] discards the fully connected structure of the original construction and achieves low computational complexity and an implicit representation of position information. It is applied to the downstream Chinese NER task in [ ] and obtains outstanding results. In our work, we extend the star-connection topology to construct a heterogeneous graph.

Since classical NER has become comparatively mature, nested entity recognition has gradually become a research hotspot. Some works, like [ ], deal with nested entities using layered models. They predict entities in an inside-to-outside order with dynamically stacked LSTM-CRF layers. Nevertheless, layered models are burdened with error propagation caused by identifying entities at the wrong layer. Region-based methods such as [ ] identify nested entities by enumerating all possible spans in the text and classifying them. However, these methods suffer from high computational complexity and difficulties in model training. In this paper, we propose a type-supervised sequence labeling scheme to resolve these problems.

2.2 Graph Neural Network

Graph neural networks like [ ] can capture dependencies by passing messages between nodes on the graph. Due to the needs of real-world scenarios, the design and application of heterogeneous graph neural networks have attracted extensive interest. [ ] proposes a graph neural network based on heterogeneous graph iterations to resolve the problem of relation extraction in the presence of overlap. [ ] combines a lexicon with a GNN and applies it to Chinese NER.

The use of graph neural networks in NLP tasks has been widely explored. In this paper, the types of entities are modeled as nodes on the graph to construct the heterogeneous graph. We further utilize them in the subsequent sequence labeling. In particular, the specifically designed topology of the graph reduces computational complexity and improves the interaction between global and local messages.

3 Task Definition

The goal of the named entity recognition task is to identify all possible entities in the input sentence. Consider an input sentence $S = [w_1, w_2, \ldots, w_L]$, where $L$ is the length of the sentence. An entity $x$ is defined as a triple $(s, e, t)$, where $s, e \in [1, L]$ denote the start and end indices of the entity and $t$ stands for the predefined entity category. With this definition, the NER task can be expressed formally as recognizing the entity set $X = \{x_1, x_2, \ldots, x_M\}$ existing in the sentence $S$. We develop the definition of nested entities in [4] as follows:

Multi-label Entities (ME). For two entities $x_1$ and $x_2$, we call them multi-label entities if $(s_1 = s_2) \wedge (e_1 = e_2) \wedge (t_1 \neq t_2)$, as in Figure 1.

Nested Entities of Same Type (NST). For two entities $x_1$ and $x_2$, we call them nested entities of same type if $(e_1 \geq e_2 \geq s_2 \geq s_1) \wedge (t_1 = t_2)$, as in Figure 1. In particular, if $(e_1 = e_2 = s_2 = s_1) \wedge (t_1 = t_2)$, then they are just one entity.

Nested Entities of Different Type (NDT). For two entities $x_1$ and $x_2$, if $(e_1 \geq e_2 \geq s_2 \geq s_1) \wedge (t_1 \neq t_2)$, we call them nested entities of different type, as in Figure 1. However, if $(s_1 = s_2) \wedge (e_1 = e_2)$, it is actually ME.

Overlapping Entities of Same Type (OST). For the case $(e_1 > e_2 \geq s_1 > s_2) \wedge (t_1 = t_2)$, we call them overlapping entities of same type, which is not a case addressed in this paper.

Overlapping Entities of Different Type (ODT). For the case $(e_1 > e_2 \geq s_1 > s_2) \wedge (t_1 \neq t_2)$, we call them overlapping entities of different type. Although our model does not target this scenario, it is implicitly solved because the decoding procedure is separated between different entity types.

In this paper, two entities $x_1$ and $x_2$ are considered to be nested entities only when they are NST or NDT. We model the NER task as a type-supervised sequence labeling task and perform it with the fusion of type nodes and text nodes generated by the heterogeneous graph neural network.
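The case distinctions above can be checked mechanically. The following is a minimal sketch that classifies a pair of `(start, end, type)` triples, assuming that nesting means one span contains the other, that overlapping means the spans cross without containment, and that either entity may play the role of the outer one. The function name `relation` and the extra labels `SAME` and `NONE` are hypothetical conveniences, not part of the definitions.

```python
from typing import Tuple

Entity = Tuple[int, int, str]  # (start, end, type)


def relation(x1: Entity, x2: Entity) -> str:
    """Classify a pair of entities as ME, NST, NDT, OST, ODT, SAME, or NONE."""
    (s1, e1, t1), (s2, e2, t2) = x1, x2
    same_type = t1 == t2
    if (s1, e1) == (s2, e2):
        # Identical span: one entity if the types agree, multi-label otherwise.
        return "SAME" if same_type else "ME"
    # One span lies entirely inside the other.
    contains = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    if contains:
        return "NST" if same_type else "NDT"
    # The spans share tokens but neither contains the other.
    crosses = (s2 < s1 <= e2 < e1) or (s1 < s2 <= e1 < e2)
    if crosses:
        return "OST" if same_type else "ODT"
    return "NONE"


if __name__ == "__main__":
    print(relation((1, 1, "DIS"), (1, 1, "ABBR")))
    print(relation((1, 5, "DNA"), (2, 3, "PRO")))
    print(relation((3, 6, "DNA"), (1, 4, "DNA")))
```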

4 Methodology

This section details our model. The general framework is shown in Figure 2 and consists of three main parts:

• Node Representation. Given the input sentence, a recurrent neural network is used to fuse character, token, word, and part-of-speech annotation embeddings to produce the final context representation. The initial representations of the text nodes and type nodes are then generated from the context representation by linear transformation and pooling operations.

• Heterogeneous Graph. The nodes are updated through iterations of the star heterogeneous attention graph network. In this paper, we alter the concatenation-based graph attention mechanism and take edge direction into consideration.

• Entity Extraction. After obtaining the representation of each node, the text nodes are combined with the type nodes to produce the text representation under each type. To predict the entity collection in the input sentence, we deploy a conditional random field to perform BIOES sequence labeling on each text representation. The union of the predicted entity sets is the final collection of predicted entities.

[Figure 2: image not recovered. Components shown: Char, Token, Word, and POS Embedding; BiGRU; Context Generator; Types Generator; Heterogeneous Star Graph Network; Emission modules; Conditional Random Field; BIOES tag outputs.]

Figure 2: Overall architecture. In the figure and below, text nodes are represented using blue circles and type nodes are represented using green circles. Different colors in the Emission module and BIOES annotations indicate the recognition of corresponding classes of entities.
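The entity extraction step described above can be sketched as follows, assuming the conditional random field has already produced one best BIOES tag sequence per entity type. The helper names `decode_bioes` and `decode_all` are hypothetical, indices are 1-based and inclusive, and the sketch handles plain BIOES tags only, not the extended decoding for same-type nesting.

```python
from typing import Dict, List, Optional, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, type), 1-based inclusive indices


def decode_bioes(tags: List[str], etype: str) -> Set[Entity]:
    """Decode one flat BIOES sequence into (start, end, type) triples."""
    entities: Set[Entity] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags, start=1):
        if tag == "S":
            entities.add((i, i, etype))
            start = None
        elif tag == "B":
            start = i
        elif tag == "E":
            if start is not None:      # ignore an E with no opening B
                entities.add((start, i, etype))
            start = None
        elif tag == "O":
            start = None
        # tag == "I" keeps the current span open
    return entities


def decode_all(layers: Dict[str, List[str]]) -> Set[Entity]:
    """Union of the entities decoded from each type-specific tag sequence."""
    result: Set[Entity] = set()
    for etype, tags in layers.items():
        result |= decode_bioes(tags, etype)
    return result


if __name__ == "__main__":
    layers = {
        "DIS": ["S", "O", "O", "O", "O"],
        "ABBR": ["S", "O", "O", "O", "O"],
        "DNA": ["O", "O", "B", "I", "E"],
    }
    print(sorted(decode_all(layers)))
```

Since every type is decoded independently, entities of different types that share or cross spans are recovered without any special handling.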

4.1 Node Representation

The representation of each node must be initialized before the graph neural network iterates. The heterogeneous graph in our paper consists of two kinds of nodes: type nodes and text nodes. The following describes how to initialize each node's representation.

4.1.1 Hybrid Embedding

Before initializing the nodes, it is necessary to create the hidden representation of the context. We use a multi-granularity hybrid embedding model to produce the context representation.

Character. The embedded representation of characters can be formalized as follows.

$$[h^c_1, h^c_2, \ldots, h^c_D] = E^c([c_1, c_2, \ldots, c_D]) \tag{1}$$

where $c_k$ is the one-hot code of the $k$-th character forming the word, $D$ is the number of characters constituting the word, and $h^c_k$ is the embedding corresponding to $c_k$. The characters' representations are then combined using a recurrent neural network and an average pooling operation as follows:

$$h^{C}_i = \mathrm{AvgPool}(\mathrm{BiGRU}([h^c_1, h^c_2, \ldots, h^c_D])) \tag{2}$$

where GRU [ ] is the gated recurrent unit and $h^{C}_i \in \mathbb{R}^{d_C}$ is the character-level hidden representation for $w_i$.
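Equations (1) and (2) can be traced with a small dependency-free sketch: look up one embedding per character, run a forward and a backward GRU over the sequence, concatenate the two directions at each position, and average over positions. Everything here is an assumption made for illustration: the weights are random and untrained, bias terms are omitted, the gate convention is one of several in use, and the names `GRU`, `rand_matrix`, and `char_representation` are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRU:
    """Single-direction GRU with random weights and no bias terms."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random) -> None:
        self.hid_dim = hid_dim
        # One input and one recurrent matrix per gate: update, reset, candidate.
        self.W = [rand_matrix(hid_dim, in_dim, rng) for _ in range(3)]
        self.U = [rand_matrix(hid_dim, hid_dim, rng) for _ in range(3)]

    def run(self, inputs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hid_dim
        outputs = []
        for x in inputs:
            z = [sigmoid(a + b) for a, b in zip(matvec(self.W[0], x), matvec(self.U[0], h))]
            r = [sigmoid(a + b) for a, b in zip(matvec(self.W[1], x), matvec(self.U[1], h))]
            rh = [ri * hi for ri, hi in zip(r, h)]
            cand = [math.tanh(a + b) for a, b in zip(matvec(self.W[2], x), matvec(self.U[2], rh))]
            # Interpolate between the previous state and the candidate state.
            h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
            outputs.append(h)
        return outputs


def char_representation(char_ids: List[int], table: Matrix, fwd: GRU, bwd: GRU) -> Vector:
    """Eq. (1): embedding lookup. Eq. (2): BiGRU followed by average pooling."""
    embedded = [table[c] for c in char_ids]  # a one-hot code selects one row
    forward = fwd.run(embedded)
    backward = bwd.run(embedded[::-1])[::-1]
    states = [f + b for f, b in zip(forward, backward)]  # concatenate directions
    return [sum(col) / len(states) for col in zip(*states)]


rng = random.Random(0)
table = rand_matrix(30, 8, rng)             # 30 characters, 8-dim embeddings
fwd, bwd = GRU(8, 4, rng), GRU(8, 4, rng)   # 4 hidden units per direction
vec = char_representation([3, 7, 1], table, fwd, bwd)
print(len(vec))                             # 2 * 4 = 8
```

In this sketch the output dimension is twice the per-direction hidden size, so $d_C$ would correspond to `2 * hid_dim`; a trained model would normally use a library GRU rather than hand-written loops.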

Token. The token-level representation is generated by the pre-trained language model BERT [ ], which uses WordPiece partitioning [ ] to convert the tokens into subtokens. The subtokens' representations are average
