
[Figure 1 shows the example sentence "Jack taught at Harvard University and the National War College ." with token-level labels B-PER O O B-ORG I-ORG O O B-ORG I-ORG I-ORG O, entity spans Jack (PER), Harvard University (ORG), and National War College (ORG), and two WORK relations.]
Figure 1  A span-based joint extraction example containing three gold entities and two gold relations. Shaded tokens are span examples; PER and ORG are entity types; WORK is a relation type. The text is also annotated with token-level labels (e.g., B-PER, B-ORG) via the sequence tagging mechanism.
extra data annotations such as event annotations, which are unavailable in most datasets for the task, e.g.,
SciERC [6], DocRED [13], TACRED [14], NYT [15], WebNLG [16], SemEval [17], CoNLL04 [18], and ADE [19].
Previous sequence tagging-based joint models [2, 4, 20, 21] demonstrate that token-level labels convey
critical information, which can compensate for span-level semantic representations. For example, if a
span-based model knows beforehand that "Jack" is a person entity (labeled PER) and "Harvard University"
is an organization entity (labeled ORG), it can readily infer that they have a WORK relation. Unfortunately,
to the best of our knowledge, existing span-based models neglect this critical information because they
cannot produce token-level labels. Moreover, existing sequence tagging-based models establish a
unidirectional information flow from NER to RE by using token-level label information in relation
classification, thereby enhancing information sharing. Lacking token-level labels, previous span-based
models cannot build even such a unidirectional information flow, let alone a more effective bi-directional
information interaction.
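To make this intuition concrete, the following minimal Python sketch (purely illustrative; not part of our model) decodes the token-level BIO labels of the Figure 1 sentence back into typed entities, showing how much entity-type information these labels carry before any relation classification happens:

```python
# Illustrative only: decoding BIO labels for the Figure 1 sentence.
tokens = ["Jack", "taught", "at", "Harvard", "University", "and",
          "the", "National", "War", "College", "."]
labels = ["B-PER", "O", "O", "B-ORG", "I-ORG", "O",
          "O", "B-ORG", "I-ORG", "I-ORG", "O"]

def decode_bio(tokens, labels):
    """Recover (entity text, entity type) pairs from BIO labels."""
    entities, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):          # a new entity starts
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current:  # entity continues
            current.append(tok)
        else:                             # outside any entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

print(decode_bio(tokens, labels))
# [('Jack', 'PER'), ('Harvard University', 'ORG'),
#  ('National War College', 'ORG')]
```

A span-based model with access to such labels would thus know the types of candidate spans in advance, which is precisely the signal that existing span-based models discard.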
In this paper, we explore using token-level label information to improve span-based joint extraction.
To this end, we propose a Sequence Tagging augmented Span-based Network (STSN), whose core module
is a carefully designed neural architecture built by deeply stacking multiple attention layers. Specifically,
the core architecture first learns three types of semantic representations: label representations for
classifying token-level labels, and two types of token representations for span-based NER and RE,
respectively; it then establishes information interactions among the three learned representations. As a
result, the two types of token representations fully incorporate label information, and the span
representations constructed from them are likewise enriched with label information. Additionally, the
core architecture enables our model to build an effective bi-directional information interaction between
NER and RE.
For these purposes, each attention layer of the core architecture consists of three basic attention
units, as sketched after this paragraph: (1) Entity&Relation to Label Attention (E&R-L-A) enables label
representations to attend to the two types of token representations. This serves two goals: it allows
label representations to effectively incorporate task-specific information, and it is essential for
constructing the bi-directional information interaction between NER and RE. (2) Label to Entity
Attention (L-E-A) enables token representations for NER to attend to label representations, enriching
them with label information. (3) Label to Relation Attention (L-R-A) does the same for the token
representations for RE. In addition, by taking the label representations as a medium, we enable the two
types of token representations to attend to each other, which establishes the bi-directional information
interaction; we validate its effectiveness in Section 4.4.2. Moreover, to enable STSN to use token-level
label information of overlapping entities, we extend the BIO tagging scheme, with details discussed in
Section 4.1.2.
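The following sketch illustrates one such attention layer, assuming a PyTorch implementation with standard multi-head attention; the module and variable names (STSNAttentionLayer, er_to_label, etc.) are our expository assumptions, not the exact design:

```python
import torch
import torch.nn as nn

class STSNAttentionLayer(nn.Module):
    """One stacked layer: three attention units exchanging information
    among label, NER-token, and RE-token representations."""
    def __init__(self, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # E&R-L-A: labels attend to both token streams.
        self.er_to_label = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # L-E-A: NER tokens attend to label representations.
        self.label_to_ent = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # L-R-A: RE tokens attend to label representations.
        self.label_to_rel = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, h_label, h_ent, h_rel):
        # Labels gather task-specific information from both streams (E&R-L-A).
        ctx = torch.cat([h_ent, h_rel], dim=1)
        h_label, _ = self.er_to_label(h_label, ctx, ctx)
        # Token streams read the label information back (L-E-A, L-R-A).
        h_ent, _ = self.label_to_ent(h_ent, h_label, h_label)
        h_rel, _ = self.label_to_rel(h_rel, h_label, h_label)
        return h_label, h_ent, h_rel
```

Because the label representations first attend to both token streams and the streams then read the labels back, information flows between NER and RE in both directions with the labels as a medium; stacking several such layers deepens this interaction.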
In STSN, to learn token-level label information in a supervised manner, we add a sequence tagging-based
NER decoder to the span-based model, while model performance is evaluated on the entities and relations
extracted by the span-based model. Experimental results on ACE05, CoNLL04, and ADE demonstrate that
STSN consistently outperforms the strongest baselines in terms of F1, achieving new state-of-the-art
performance.2)
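One plausible training objective for such a multi-task setup (our assumption for exposition; the exact formulation is a design choice of the model) sums the three supervised losses:

$$\mathcal{L} \;=\; \mathcal{L}^{\mathrm{span}}_{\mathrm{NER}} + \mathcal{L}^{\mathrm{span}}_{\mathrm{RE}} + \mathcal{L}^{\mathrm{seq}}_{\mathrm{tag}},$$

where the first two terms supervise the span-based NER and RE classifiers and the third supervises the sequence tagging-based NER decoder.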
In summary, our contributions are as follows: (1) We propose an effective method to augment the
span-based joint entity and relation extraction model with the sequence tagging mechanism. (2) We
carefully design deep-stacked attention layers that enable the span-based model to use token-level label
information and to establish a bi-directional information interaction between NER and RE. (3) Experimental
results on three datasets demonstrate that STSN achieves new state-of-the-art results.
2) For reproducibility, our code for this paper will be publicly available at https://github.com/jibin/STSN.