
mentions. Our study continues this line of work, incorporating modern neural methods to measure semantic similarity and adding novel heuristics to improve candidate filtering and collective disambiguation.
We introduce NICE (NER¹-enhanced Iterative Combination of Entities), a combined entity disambiguation algorithm designed to tackle the challenge of entity overshadowing by focusing on three aspects of context-based information: entity types, entity-context similarity and entity coherence. The pipeline of NICE includes a NER-enhanced candidate filtering module designed to improve robustness on overshadowed entities (Section 2.1), a pre-scoring module that calculates semantic similarity between a candidate entity and a mention in context, and an unsupervised iterative disambiguation algorithm that maximises entity coherence (Section 2.3), combining the relatedness scores between candidate entities with the scores of the semantic similarity module (Sections 2.3-2.4). To the best of our knowledge, our study is the first attempt to build an entity disambiguation method designed specifically to tackle the problem of entity overshadowing.
We perform a systematic evaluation of the NICE
method, and use our experimental results to answer
the following research questions:
RQ1: Does focusing on context information improve ED performance on overshadowed entities?
RQ2: Does focusing on context information instead of relying on mention-entity priors in ED allow us to maintain competitive performance on more frequent entities?
RQ3: In what ways do the different aspects of context information contribute to ED performance on overshadowed entities?
We hope that our work will encourage further studies concerning overshadowed entities. The source code of the NICE method is provided as supplementary material and will be released publicly upon acceptance.
2 The NICE method
Our method is based on the assumption that the main challenge in disambiguating overshadowed entities stems from over-relying on entity commonness, and therefore switching the focus to the context (entity relatedness) can improve the performance. We consider three main ways of extracting information from the context: (1) using mention-entity similarity to predict entity types and improve candidate filtering, (2) using word embeddings enhanced with entity types to measure semantic similarity between an entity and its context, and (3) using entity-entity similarity to make sure that the entity disambiguation decisions within one document are coherent (collective disambiguation).
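To make the interplay of these three signals concrete, the sketch below shows one plausible way to wire them together. It is not the released NICE implementation: the scoring functions (mention type prediction, entity typing, context similarity, entity relatedness) are assumed to be supplied by the modules described in Sections 2.1-2.4, and the greedy iterative update is only an illustrative stand-in for the actual coherence maximisation.

```python
from typing import Callable, Dict, List

def disambiguate_document(
    mentions: List[str],
    candidates: Dict[str, List[str]],         # mention -> list of candidate entities
    mention_types: Callable[[str], set],      # (1) predicted NER types for a mention
    entity_type: Callable[[str], str],        #     type of a candidate entity
    context_sim: Callable[[str, str], float], # (2) mention-in-context vs. entity similarity
    relatedness: Callable[[str, str], float], # (3) entity-entity relatedness
    max_iters: int = 10,
) -> Dict[str, str]:
    # (1) Candidate filtering: keep candidates whose type matches a predicted mention type
    #     (falling back to all candidates if filtering removes everything; illustrative choice).
    filtered = {
        m: [e for e in candidates[m] if entity_type(e) in mention_types(m)] or candidates[m]
        for m in mentions
    }
    # (2) Pre-scoring: semantic similarity between each remaining candidate and the context.
    sim = {m: {e: context_sim(m, e) for e in filtered[m]} for m in mentions}
    # Initial assignment: best candidate by context similarity alone.
    assignment = {m: max(sim[m], key=sim[m].get) for m in mentions}
    # (3) Iterative collective step: re-pick each entity so that context similarity plus
    #     relatedness to the other mentions' current entities is maximised.
    for _ in range(max_iters):
        changed = False
        for m in mentions:
            others = [assignment[o] for o in mentions if o != m]
            best = max(filtered[m],
                       key=lambda e: sim[m][e] + sum(relatedness(e, o) for o in others))
            if best != assignment[m]:
                assignment[m], changed = best, True
        if not changed:
            break
    return assignment
```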
2.1 Candidate filtering
Adding the step of filtering candidate entities before disambiguation brings the benefits of reduced inference time and potential improvements in accuracy. To perform this step in the NICE method, we follow the work of Tedeschi et al. (2021) by using entity type information. Given an entity mention $m$ surrounded by textual context $(cont_{left}, cont_{right})$ and a list of candidate entities $cands = \{e_1, \ldots, e_n\}$, we use a NER classifier to predict the top-$k$ possible entity types of $m$. Then, we discard all candidate entities that have an entity type not matching any of these $k$ classes:

$$cands_{filtered} = \{e_i \in cands : type(e_i) \in \hat{T}\},$$

where $\hat{T}$ is the set of top-$k$ predicted entity types. If the confidence score of the NER classifier is above a threshold value $t$, only one class is used instead of $k$. In the current setup of the NICE method, the number of top predicted classes is $k = 3$ and the confidence threshold value is $t = 1$, which means that the classifier always outputs the top-3 entity classes. Figure 2 shows an example of NER-based candidate filtering.
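A minimal sketch of this filtering rule is given below. The NER classifier and the type dictionary are represented by plain inputs (a list of (type, confidence) predictions and an entity-to-type mapping); the function and variable names are illustrative, not the actual interfaces used in NICE, and the toy example at the end is invented for demonstration.

```python
from typing import Dict, List, Tuple

def filter_candidates(
    ner_predictions: List[Tuple[str, float]],  # (entity type, confidence), sorted by confidence
    candidates: List[str],                     # candidate entities for the mention
    entity_types: Dict[str, str],              # candidate entity -> entity type (e.g. from Wiki2NER)
    k: int = 3,                                # number of top predicted types to keep
    t: float = 1.0,                            # confidence threshold for using a single type
) -> List[str]:
    """Keep only candidates whose type matches the predicted mention types."""
    top_type, top_conf = ner_predictions[0]
    if top_conf > t:
        # High-confidence prediction: restrict to the single top type.
        allowed = {top_type}
    else:
        # Otherwise use the set T-hat of top-k predicted types.
        allowed = {typ for typ, _ in ner_predictions[:k]}
    return [e for e in candidates if entity_types.get(e) in allowed]

# With t = 1.0 the confidence never exceeds the threshold, so the top-3 types
# are always used, as in the current NICE setup.
preds = [("PER", 0.62), ("ORG", 0.21), ("LOC", 0.10), ("MISC", 0.07)]
cands = ["Michael_Jordan", "Michael_Jordan_(footballer)", "Air_Jordan"]
types = {"Michael_Jordan": "PER", "Michael_Jordan_(footballer)": "PER", "Air_Jordan": "MISC"}
print(filter_candidates(preds, cands, types))  # ['Michael_Jordan', 'Michael_Jordan_(footballer)']
```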
To obtain the entity types for the candidates, we use the Wiki2NER dictionary provided by Tedeschi et al. (2021)². Then, instead of using the NER classifier as provided by Tedeschi et al. (2021), which has been trained only on the AIDA training set and therefore may be biased towards frequent entities as well, we introduce a refined version of it, which is more robust to overshadowing. Specifically, we filter the training set of BLINK (Wu et al., 2020)³ by discarding the entries where the ground-truth answer has the highest popularity score among all candidate entities. Then, we use the 2M remaining data entries to fine-tune the classifier.
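This data filtering step can be sketched as follows; the entry fields and the popularity lookup are illustrative assumptions about how a BLINK-style entry might be represented, not the actual preprocessing code.

```python
from typing import Dict, List

def keep_overshadowed_entries(
    entries: List[Dict],           # BLINK-style entries: {"gold": entity, "candidates": [entities]}
    popularity: Dict[str, float],  # entity -> popularity (commonness) score
) -> List[Dict]:
    """Discard entries whose gold entity is the most popular among its candidates."""
    kept = []
    for entry in entries:
        scores = [popularity.get(e, 0.0) for e in entry["candidates"]]
        gold_score = popularity.get(entry["gold"], 0.0)
        if scores and gold_score < max(scores):
            # The gold entity is overshadowed by a more popular candidate: keep the entry.
            kept.append(entry)
    return kept
```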
The motivation behind fine-tuning the classifier rather than training it from scratch is to achieve an improvement in recognising overshadowed entities without
¹ Named Entity Recognition (Yadav and Bethard, 2018)
² https://github.com/Babelscape/ner4el
³ BLINK is a dataset for ED consisting of 9M entries extracted from Wikipedia.