
[Figure 1 shows the example sentence "Jack taught at Harvard University and the National War College ." with token-level labels B-PER O O B-ORG I-ORG O O B-ORG I-ORG I-ORG O, entity spans Jack (PER), Harvard University (ORG), and National War College (ORG), and two WORK relations.]
Figure 1  A span-based joint extraction example containing three gold entities and two gold relations. Shaded tokens are span examples; PER and ORG are entity types; WORK is a relation type. The text is also annotated with token-level labels (e.g., B-PER, B-ORG) via the sequence tagging mechanism.
extra data annotations such as event annotations, which are unavailable in most datasets for the task, e.g.,
SciERC [6], DocRED [13], TACRED [14], NYT [15], WebNLG [16], SemEval [17], CoNLL04 [18], and ADE [19].
Previous sequence tagging-based joint models [2, 4, 20, 21] demonstrate that token-level labels convey
critical information, which can compensate for span-level semantic representations. For example, if a
span-based model knows beforehand that "Jack" is a person entity (labeled PER) and "Harvard University"
is an organization entity (labeled ORG), it can readily infer that they have a WORK relation. Unfortunately,
to the best of our knowledge, existing span-based models neglect this critical information because they
cannot produce token-level labels. Moreover, existing sequence tagging-based models establish a
unidirectional information flow from NER to RE by using token-level label information in relation
classification, thereby enhancing information sharing. Lacking token-level labels, previous span-based
models cannot build even such a unidirectional information flow, let alone a more effective bi-directional
information interaction.
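To make this intuition concrete, the following minimal Python sketch (purely illustrative; not part of our model) decodes the token-level BIO labels of the Figure 1 sentence back into typed entities, showing how much entity-type information these labels carry before any relation classification happens:

```python
# Illustrative only: decoding BIO labels for the Figure 1 sentence.
tokens = ["Jack", "taught", "at", "Harvard", "University", "and",
          "the", "National", "War", "College", "."]
labels = ["B-PER", "O", "O", "B-ORG", "I-ORG", "O",
          "O", "B-ORG", "I-ORG", "I-ORG", "O"]

def decode_bio(tokens, labels):
    """Recover (entity text, entity type) pairs from BIO labels."""
    entities, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):          # a new entity starts
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current:  # entity continues
            current.append(tok)
        else:                             # outside any entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

print(decode_bio(tokens, labels))
# [('Jack', 'PER'), ('Harvard University', 'ORG'),
#  ('National War College', 'ORG')]
```

A span-based model with access to such labels would thus know the types of candidate spans in advance, which is precisely the signal that existing span-based models discard.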
In this paper, we explore using token-level label information to improve span-based joint extraction.
To this end, we propose a Sequence Tagging augmented Span-based Network (STSN), whose core module
is a carefully designed neural architecture built by deeply stacking multiple attention layers. Specifically,
the core architecture first learns three types of semantic representations: label representations for
classifying token-level labels, and two types of token representations for span-based NER and RE,
respectively; it then establishes information interactions among the three learned representations. As a
result, the two types of token representations fully incorporate label information, and the span
representations constructed from them are likewise enriched with label information. Additionally, the
core architecture enables our model to build an effective bi-directional information interaction between
NER and RE.
For these purposes, each attention layer of the core architecture consists of three basic attention
units, as sketched after this paragraph: (1) Entity&Relation to Label Attention (E&R-L-A) enables label
representations to attend to the two types of token representations. This serves two goals: it allows
label representations to effectively incorporate task-specific information, and it is essential for
constructing the bi-directional information interaction between NER and RE. (2) Label to Entity
Attention (L-E-A) enables token representations for NER to attend to label representations, enriching
them with label information. (3) Label to Relation Attention (L-R-A) does the same for the token
representations for RE. In addition, by taking the label representations as a medium, we enable the two
types of token representations to attend to each other, which establishes the bi-directional information
interaction; we validate its effectiveness in Section 4.4.2. Moreover, to enable STSN to use token-level
label information of overlapping entities, we extend the BIO tagging scheme, with details discussed in
Section 4.1.2.
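The following sketch illustrates one such attention layer, assuming a PyTorch implementation with standard multi-head attention; the module and variable names (STSNAttentionLayer, er_to_label, etc.) are our expository assumptions, not the exact design:

```python
import torch
import torch.nn as nn

class STSNAttentionLayer(nn.Module):
    """One stacked layer: three attention units exchanging information
    among label, NER-token, and RE-token representations."""
    def __init__(self, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # E&R-L-A: labels attend to both token streams.
        self.er_to_label = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # L-E-A: NER tokens attend to label representations.
        self.label_to_ent = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # L-R-A: RE tokens attend to label representations.
        self.label_to_rel = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, h_label, h_ent, h_rel):
        # Labels gather task-specific information from both streams (E&R-L-A).
        ctx = torch.cat([h_ent, h_rel], dim=1)
        h_label, _ = self.er_to_label(h_label, ctx, ctx)
        # Token streams read the label information back (L-E-A, L-R-A).
        h_ent, _ = self.label_to_ent(h_ent, h_label, h_label)
        h_rel, _ = self.label_to_rel(h_rel, h_label, h_label)
        return h_label, h_ent, h_rel
```

Because the label representations first attend to both token streams and the streams then read the labels back, information flows between NER and RE in both directions with the labels as a medium; stacking several such layers deepens this interaction.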
In STSN, to learn token-level label information in a supervised manner, we add a sequence tagging-based
NER decoder to the span-based model, while model performance is evaluated on the entities and relations
extracted by the span-based model. Experimental results on ACE05, CoNLL04, and ADE demonstrate that
STSN consistently outperforms the strongest baselines in terms of F1, achieving new state-of-the-art
performance.2)
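One plausible training objective for such a multi-task setup (our assumption for exposition; the exact formulation is a design choice of the model) sums the three supervised losses:

$$\mathcal{L} \;=\; \mathcal{L}^{\mathrm{span}}_{\mathrm{NER}} + \mathcal{L}^{\mathrm{span}}_{\mathrm{RE}} + \mathcal{L}^{\mathrm{seq}}_{\mathrm{tag}},$$

where the first two terms supervise the span-based NER and RE classifiers and the third supervises the sequence tagging-based NER decoder.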
In summary, our contributions are as follows: (1) We propose an effective method to augment the
span-based joint entity and relation extraction model with the sequence tagging mechanism. (2) We
carefully design deep-stacked attention layers that enable the span-based model to use token-level label
information and to establish a bi-directional information interaction between NER and RE. (3) Experimental
results on three datasets demonstrate that STSN achieves new state-of-the-art results.
2) For reproducibility, our code for this paper will be publicly available at https://github.com/jibin/STSN.