SCIENCE CHINA
Information Sciences
. RESEARCH PAPER .
Span-based joint entity and relation extraction
augmented with sequence tagging mechanism
Bin JI, Shasha LI, Hao XU*, Jie YU, Jun MA, Huijun LIU* & Jing YANG
College of Computer, National University of Defense Technology, Changsha 410073, China
Abstract Span-based joint extraction simultaneously conducts named entity recognition (NER) and re-
lation extraction (RE) in text span form. However, since previous span-based models rely on span-level
classifications, they cannot benefit from token-level label information, which has been proven advantageous
for the task. In this paper, we propose a Sequence Tagging augmented Span-based Network (STSN), a
span-based joint model that can make use of token-level label information. In STSN, we construct a core
neural architecture by deep stacking multiple attention layers, each of which consists of three basic attention
units. On the one hand, the core architecture enables our model to learn token-level label information via the
sequence tagging mechanism and then uses the information in the span-based joint extraction; on the other
hand, it establishes a bi-directional information interaction between NER and RE. Experimental results on
three benchmark datasets show that STSN consistently outperforms the strongest baselines in terms of F1,
creating new state-of-the-art results.
Keywords joint extraction, named entity recognition, relation extraction, span, sequence tagging mecha-
nism
1 Introduction
The joint entity and relation extraction task extracts both entities and semantic relations between entities
from raw texts. It acts as a stepping stone for a variety of downstream NLP tasks [1], such as question
answering. Based on their classification granularity, we divide existing models for the task into two
categories: sequence tagging-based models [2–5] and span-based models [6–10]. The former is based on the
the sequence tagging mechanism and performs token-level classifications. The latter is based on the span-
based paradigm and performs span-level classifications. Since the sequence tagging mechanism and the
span-based paradigm are considered to be distinct methodologies, existing joint extraction models permit
the use of just one of them. Specifically, the span-based paradigm consists of three typical steps: it first
splits raw texts into text spans (a.k.a. candidate entities), such as “Jack” and “Harvard University”
in Figure 1; it then constructs ordered span pairs (a.k.a. candidate relation tuples), such as <“Jack”,
“Harvard University”> and <“Harvard University”, “Jack”>; and finally, it jointly classifies spans and
span pairs. For example, it classifies “Jack” and “Harvard University” into PER and ORG, respectively,
and it classifies <“Jack”, “Harvard University”> and <“Harvard University”, “Jack”> into WORK
and NoneType, respectively.1)
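For concreteness, the following is a minimal sketch (ours, not the authors' implementation) of the first two steps, span enumeration and ordered span-pair construction; the maximum span length and the helper names are illustrative assumptions.

from itertools import permutations

def enumerate_spans(tokens, max_span_len=4):
    """Enumerate all text spans (candidate entities) up to a length limit."""
    spans = []
    for start in range(len(tokens)):
        for length in range(1, max_span_len + 1):
            if start + length <= len(tokens):
                spans.append((start, start + length))  # token indices [start, end)
    return spans

def enumerate_span_pairs(spans):
    """Construct ordered span pairs (candidate relation tuples)."""
    return list(permutations(spans, 2))

tokens = ["Jack", "taught", "at", "Harvard", "University", "."]
spans = enumerate_spans(tokens)      # e.g. (0, 1) -> "Jack", (3, 5) -> "Harvard University"
pairs = enumerate_span_pairs(spans)  # both ((0, 1), (3, 5)) and ((3, 5), (0, 1)) are candidates;
                                     # the joint classifier labels the first WORK and the second NoneType
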
The majority of span-based models [7,8,10] use Pre-trained Language Models (PLMs) as their encoders
directly, relying heavily on the encoding ability of PLMs, which results in insufficient span semantic
representations and poor model performance. To alleviate this problem, some span-based models [11,12] make
attempts to incorporate other related NLP tasks into this task, such as event detection and coreference
resolution. By using carefully designed neural architectures, these models enable span semantic represen-
tation to incorporate information shared from the added tasks. However, these additional tasks require
* Corresponding author (email: xuhao@nudt.edu.cn, liuhuijun@nudt.edu.cn)
Bin Ji and Shasha Li contributed equally to this work.
1) The span-based paradigm assigns the NoneType to spans that are not entities, as well as span pairs that do not hold relations.
[Figure 1 here: the example sentence “Jack taught at Harvard University and the National War College.” shown with its token-level labels (B-PER O O B-ORG I-ORG O O B-ORG I-ORG I-ORG O), entity types PER and ORG, and two WORK relations.]
Figure 1 A span-based joint extraction example, which contains three gold entities and two gold relations. Tokens in shade are span examples, PER and ORG are entity types, and WORK is a relation type. We also label the text with token-level labels via the sequence tagging mechanism, such as B-PER, B-ORG, etc.
extra data annotations such as event annotations, which are inaccessible in most datasets for the task,
e.g., SciERC [6], DocRED [13], TACRED [14], NYT [15], WebNLG [16], SemEval [17], CoNLL04 [18],
and ADE [19].
Previous sequence tagging-based joint models [2, 4, 20, 21] demonstrate that token-level labels convey
critical information, which can be used to compensate for span-level semantic representations. For ex-
ample, if a span-based model is aware beforehand that “Jack” is a person entity (labeled with the PER label)
and “Harvard University” is an organization entity (labeled with the ORG label), it may
readily infer that they have a WORK relation. Unfortunately, as far as we know, existing span-based models
neglect this critical information due to their inability to produce token-level labels. Additionally, existing
sequence tagging-based models establish a unidirectional information flow from NER to RE by using the
token-level label information in the relation classification, hence enhancing information sharing. Due to
the lack of token-level labels, previous span-based models are unable to build such an information flow,
let alone a more effective bi-directional information interaction.
In this paper, we explore using token-level label information in span-based joint extraction, aiming
to improve its performance. To this end, we propose a Sequence
Tagging augmented Span-based Network (STSN) where the core module is a carefully designed neural ar-
chitecture, which is achieved by deep stacking multiple attention layers. Specifically, the core architecture
first learns three types of semantic representations: label representations for classifying token-level la-
bels, and token representations for span-based NER and RE, respectively; it then establishes information
interactions among the three learned representations. As a result, the two types of token representations
can fully incorporate label information. Thus, span representations constructed with the above token
representations are also enriched with label information. Additionally, the core architecture enables our
model to build an effective bi-directional information interaction between NER and RE.
For the above purposes, each attention layer of the core architecture consists of three basic attention
units: (1) Entity&Relation to Label Attention (E&R-L-A) enables label representations to attend to the
two types of token representations. The reason for doing this is two-fold: one is that E&R-L-A enables
label representations to incorporate task-specific information effectively; the other is that E&R-L-A is
essential to construct the bi-directional information interaction between NER and RE. (2) Label to Entity
Attention (L-E-A) enables token representations for NER to attend to label representations with the
goal of enriching the token representations with label information. (3) Label to Relation Attention (L-
R-A) enables token representations for RE to attend to label representations with the goal of enriching
the token representations with label information. In addition, we establish the bi-directional information
interaction by taking the label representation as a medium, enabling the two types of token representations
to attend to each other. We have validated the effectiveness of the bi-directional information interaction
in Section 4.4.2. Moreover, to enable STSN to use token-level label information of overlapping entities,
we extend the BIO tagging scheme and discuss more details in Section 4.1.2.
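To make the layer structure concrete, the sketch below is a minimal illustration of one such attention layer, composing E&R-L-A, L-E-A, and L-R-A from standard multi-head attention. The hidden size, the number of heads, the concatenation of the two token representations as the attention memory, and the use of PyTorch's nn.MultiheadAttention are our assumptions, not the authors' exact design.

import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    """One attention layer of the core architecture: three basic attention units (illustrative)."""
    def __init__(self, hid=768, heads=8):
        super().__init__()
        # E&R-L-A: label representations attend to the two types of token representations.
        self.er_to_label = nn.MultiheadAttention(hid, heads, batch_first=True)
        # L-E-A: token representations for NER attend to label representations.
        self.label_to_ent = nn.MultiheadAttention(hid, heads, batch_first=True)
        # L-R-A: token representations for RE attend to label representations.
        self.label_to_rel = nn.MultiheadAttention(hid, heads, batch_first=True)

    def forward(self, h_label, h_ent, h_rel):
        # Label representations query the NER and RE token representations; because both feed
        # the label representation, it acts as the medium of the bi-directional information
        # interaction between NER and RE.
        er = torch.cat([h_ent, h_rel], dim=1)
        h_label, _ = self.er_to_label(h_label, er, er)
        # Token representations for NER and RE are then enriched with label information.
        h_ent, _ = self.label_to_ent(h_ent, h_label, h_label)
        h_rel, _ = self.label_to_rel(h_rel, h_label, h_label)
        return h_label, h_ent, h_rel

# Deep stacking: layers = nn.ModuleList([AttentionLayer() for _ in range(num_layers)])
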
In STSN, aiming to learn token-level label information in a supervised way, we add a sequence tagging-
based NER decoder to the span-based model, and we use the entities and relations extracted by the
span-based model to evaluate model performance. Experimental results on ACE05, CoNLL04, and ADE
demonstrate that STSN consistently outperforms the strongest baselines in terms of F1, creating new
state-of-the-art performance.2)
Our contributions are summarized as follows: (1) We propose an effective method to augment
the span-based joint entity and relation extraction model with the sequence tagging mechanism. (2) We
carefully design the deep-stacked attention layers, enabling the span-based model to use token-level label
information and establish a bi-directional information interaction between NER and RE. (3) Experimental
results on three datasets demonstrate that STSN creates new state-of-the-art results.
2) For reproducibility, our code for this paper will be publicly available at https://github.com/jibin/STSN.
2 Related work
2.1 Span-based joint extraction
Models for span-based joint entity and relation extraction have been widely studied. Luan et al. [6]
propose one of the first published span-based models, which draws on two models for coreference
resolution [22] and semantic role labeling [23], respectively. With the advent of Pre-trained Language
Models (PLMs), span-based models directly take PLMs as their encoders: Dixit and Al-Onaizan [7]
propose a span-based model that takes ELMo [24] as the encoder; Eberts and Ulges [8] propose SpERT,
which takes BERT [25] as the encoder; and Zhong and Chen [10] propose PURE, which takes ALBERT [26] as
the encoder. However, these models rely heavily on the encoding ability of PLMs, leading to insufficient
span semantic representations and finally resulting in poor model performance. Some models [11,12] make
attempts to alleviate this issue by incorporating additional NLP tasks, such as coreference resolution
or event detection. These models enable span semantic representations to incorporate information derived
from the added tasks through complicated neural architectures. However, the added tasks need extra data
annotations (e.g., event annotations), which are unavailable in most joint entity and relation extraction
datasets. Compared to these models, our model enriches span semantic representations
with token-level label information without additional data annotations.
2.2 Token-level label
Numerous works have demonstrated that token-level label information greatly benefits the joint extraction
task. For example, the models reported in the literature [2–4, 20] train fixed-size semantic representations
for token-level labels and use them in relation classification by concatenating them to relation semantic
representations, delivering promising performance gains. However, Zhao et al. [21] demonstrate that
the above shallow semantic concatenation cannot make full use of the label information. Therefore,
they carefully design a deep neural architecture to capture fine-grained token-label interactions and
deeply infuse token-level label information into token semantic representations, delivering more promising
performance gains. Unfortunately, previous span-based joint extraction models cannot benefit from the
token-level label information since they completely give up the sequence tagging mechanism. In contrast,
we propose a sequence tagging augmented span-based joint extraction model, which generates token-level
label information via the sequence tagging mechanism and further infuses the information into token
semantic representations via deep infusion.
3 Approach
In this section, we will describe the Sequence Tagging augmented Span-based Network (STSN) in detail.
As Figure 2 shows, STSN consists of three components: a BERT-based embedding layer, an encoder
composed of deep-stacked attention layers, and three separate linear decoders for sequence tagging-based
NER, span-based NER and span-based RE, respectively.
3.1 Embedding layer
In STSN, we use BERT [25] as the default embedding generator. For a given text $T = (t_1, t_2, t_3, \ldots, t_n)$,
where $t_i$ denotes the $i$-th token, BERT first tokenizes it with the WordPiece vocabulary [27] to obtain
an input sequence. For each element of the sequence, its representation is the element-wise addition of
its WordPiece embedding, positional embedding, and segment embedding. Then a list of input embeddings
$H \in \mathbb{R}^{len \times hid}$ is obtained, where $len$ is the sequence length and $hid$ is the size of hidden units. A series
of pre-trained Transformer [28] blocks is then used to project $H$ into a BERT embedding sequence
(denoted as $E_T$):
$$E_T = \{e_1, e_2, e_3, \ldots, e_{len}\}. \quad (1)$$
BERT may tokenize one token into several sub-tokens to alleviate the Out-of-Vocabulary (OOV) problem,
so $T$ may not align with $E_T$, i.e., $n \neq len$. To achieve alignment, we propose an Align
Module, which applies the max-pooling function to the BERT embeddings of tokenized sub-tokens to
obtain token embeddings. We denote the aligned embedding sequence for $T$ as:
$$\hat{E}_T = \{\hat{e}_1, \hat{e}_2, \hat{e}_3, \ldots, \hat{e}_n\}, \quad (2)$$
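As a rough illustration of such an Align Module, the sketch below max-pools the BERT embeddings of each token's WordPiece sub-tokens back to one embedding per token. The Hugging Face tokenizer/model, the checkpoint name, and the function name align_embeddings are our assumptions, not the authors' released code.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased")

def align_embeddings(tokens):
    """Max-pool the BERT embeddings of each token's sub-tokens into one embedding per token."""
    # Tokenize each original token separately so we know how many sub-tokens it yields.
    pieces = [tokenizer.tokenize(t) for t in tokens]
    input_ids = tokenizer.convert_tokens_to_ids(
        ["[CLS]"] + [p for ps in pieces for p in ps] + ["[SEP]"])
    with torch.no_grad():
        e_t = bert(torch.tensor([input_ids])).last_hidden_state[0]  # (len, hid), i.e. E_T
    aligned, offset = [], 1  # offset 1 skips the [CLS] embedding
    for ps in pieces:
        # One aligned token embedding = element-wise max over its sub-token embeddings.
        aligned.append(e_t[offset:offset + len(ps)].max(dim=0).values)
        offset += len(ps)
    return torch.stack(aligned)  # (n, hid), aligned with the n original tokens

# e.g. align_embeddings(["Jack", "taught", "at", "Harvard", "University", "."]) -> shape (6, 768)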