Entity Disambiguation with Entity Definitions

Luigi Procopio¹, Simone Conia¹, Edoardo Barba¹, Roberto Navigli²
Sapienza NLP Group
Sapienza University of Rome
¹{lastname}@di.uniroma1.it
²navigli@diag.uniroma1.it
Abstract

Local models have recently attained astounding performances in Entity Disambiguation (ED), with generative and extractive formulations being the most promising research directions. However, previous works have limited their studies to using, as the textual representation of each candidate, only its Wikipedia title. Although certainly effective, this strategy presents a few critical issues, especially when titles are not sufficiently informative or distinguishable from one another. In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it. We thoroughly evaluate our approach against standard benchmarks in ED and find extractive formulations to be particularly well-suited to these representations: we report a new state of the art on 2 out of the 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns. We release our code, data and model checkpoints at https://github.com/SapienzaNLP/extend.
1 Introduction

Being able to pair a mention in a given text with its correct entity out of a set of candidates is a crucial problem in Natural Language Processing (NLP), referred to as Entity Disambiguation (Bunescu and Paşca, 2006, ED). Indeed, since ED enables the identification of the actors involved in human language, it is often considered a necessary building block for a wide range of downstream applications, including Information Extraction (Ji and Grishman, 2011; Guo et al., 2013), Question Answering (Yin et al., 2016) and Semantic Parsing (Bevilacqua et al., 2021; Procopio et al., 2021).

ED generally occurs as the last step in an Entity Linking pipeline (Broscheit, 2019), preceded by Mention Detection and Candidate Generation, and its approaches have traditionally been divided into two groups, depending on whether co-occurring mentions are disambiguated independently (local methods; Shahbazi et al. (2019); Wu et al. (2020); Tedeschi et al. (2021)) or not (global methods; Hoffart et al. (2011); Moro et al. (2014); Yamada et al. (2016); Yang et al. (2018)).
Despite the limiting operational hypothesis of independence between co-occurring mentions, local methods have nowadays achieved performances that are either on par with or above those attained by their global counterparts, mainly thanks to the advent of large pre-trained language models. In particular, among these methods, generative (De Cao et al., 2021) and extractive (Barba et al., 2022) formulations are arguably the most promising directions, having resulted in large performance improvements across multiple benchmarks. Regardless of their modeling differences, the key idea behind these methods is to move away from previous classification-based approaches and, instead, adopt formulations that better leverage the original pre-training of the underlying language models. On the one hand, generative formulations tackle ED as a text generation problem and train neural architectures to auto-regressively generate, given a mention and its context, a textual representation of the correct entity. On the other hand, extractive approaches frame ED as extractive question answering: they first concatenate a textual representation of each entity candidate to the original input and then train a model to extract the span corresponding to the correct entity.

Although both these formulations have admittedly attained great improvements, both in- and out-of-domain, to the best of our knowledge, previous works have limited their studies to a single type of textual representation for entities, that is, their title in Wikipedia. However, this strategy presents a number of issues (Barba et al., 2022) and, in particular, often results in representations that are either insufficiently informative or even virtually indistinguishable from one another.
In contrast to this trend, we address this limitation and explore the effect of more expressive textual representations on state-of-the-art local methods. To this end, we propose to complement Wikipedia titles with their descriptions in Wikidata so that, for instance, the candidates for Ronaldo in "Ronaldo scored two goals for Portugal" would be Cristiano Ronaldo: Portuguese association football player and Ronaldo: Brazilian association football player, rather than the less informative Cristiano Ronaldo and Ronaldo. We test our novel representations on generative and extractive formulations, and evaluate against standard benchmarks in ED, both in and out of domain, reporting statistically significant improvements for the latter group.
2 Method

We now formally introduce ED and the textual representation strategy we put forward. Then, we describe the two formulations with which we implement and test our proposal.
ED with Entity Definitions
Given a mention $m$ occurring in a context $c_m$, Entity Disambiguation is formally defined as the task of identifying, out of a set of candidates $e_1, \dots, e_n$, the correct entity $e$ that $m$ refers to. In generative and extractive formulations, each candidate $e$ is additionally associated with a textual representation $\hat{e}$, which is a string describing its meaning. Whereas previous works have used as $\hat{e}$ the title that $e$ has in Wikipedia, here we focus on more expressive alternatives and leverage Wikidata to achieve this objective. In particular, we first retrieve the Wikidata description of $e$. Then, we define as the new representation of $e$ the colon-separated concatenation of its Wikipedia title and its Wikidata description, e.g., Ronaldo: Brazilian association football player.
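As a minimal sketch of this strategy, the snippet below builds the colon-separated representation. The `wikidata_descriptions` mapping is a hypothetical stand-in for an actual Wikipedia-to-Wikidata alignment, and the fallback to the bare title on a failed lookup is our assumption (cf. the failure counts in Table 1):

```python
def entity_representation(wikipedia_title: str,
                          wikidata_descriptions: dict[str, str]) -> str:
    """Return the colon-separated "Title: description" representation.

    Falls back to the bare Wikipedia title when no Wikidata description
    is available (our assumption; cf. the failure counts in Table 1).
    """
    description = wikidata_descriptions.get(wikipedia_title)
    return f"{wikipedia_title}: {description}" if description else wikipedia_title

# Hypothetical alignment; in practice this would be derived from the
# Wikipedia-to-Wikidata mapping.
descriptions = {
    "Cristiano Ronaldo": "Portuguese association football player",
    "Ronaldo": "Brazilian association football player",
}
print(entity_representation("Ronaldo", descriptions))
# -> Ronaldo: Brazilian association football player
```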
Generative Modeling
In our first formulation, we follow De Cao et al. (2021) and frame ED as a text generation problem. Starting from a mention $m$ and its context $c_m$, we first wrap the occurrence of $m$ in $c_m$ between two special symbols, namely <s> and </s>; we denote this modified sequence by $\tilde{c}_m$. Then, we train a sequence-to-sequence model to generate the textual representation $\hat{e}$ of the correct entity $e$ by learning the following probability:

$$p(\hat{e} \mid \tilde{c}_m) = \prod_{j=1}^{|\hat{e}|} p(\hat{e}_j \mid \hat{e}_{1:j-1}, \tilde{c}_m)$$
Dataset        Instances   Candidates            Failures
AIDA
  Train         18,448     905,916 / 79,561      5038 / 682
  Validation     4791      236,193 / 43,339      1360 / 296
  Test           4485      231,595 / 46,660      1395 / 323
OOD
  MSNBC            656      17,895 / 8336          149 / 72
  AQUAINT          727      23,917 / 16,948        142 / 121
  ACE2004          257      12,292 / 8045           66 / 50
  CWEB          11,154     462,423 / 119,781     3642 / 1265
  WIKI            6821     222,870 / 105,440     1216 / 719

Table 1: Number of instances, candidates and failures to map a Wikipedia title to its Wikidata definition in the AIDA-CoNLL (top) and out-of-domain (bottom) datasets. For candidates and failures, we report both their total and unique counts (total / unique).
In the formula above, $\hat{e}_j$ denotes the $j$-th token of $\hat{e}$ and $\hat{e}_0$ is a special start symbol. The purpose of <s> and </s> is to signal to the model that $m$ is the token we are interested in disambiguating. As in the reference work, we use BART (Lewis et al., 2020) as our sequence-to-sequence architecture for our experiments and, most importantly, adopt constrained decoding over the candidate set at inference time. Indeed, applying standard decoding methods such as beam search might result in outputs that do not match any of the original candidates; thus, to obtain only valid sequences, at each generation step we constrain the set of tokens that can be generated according to a prefix tree (Cormen et al., 2009) built over the candidate set.
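To make the prefix-tree constraint concrete, here is a minimal sketch under our own assumptions (a nested-dict trie over already-tokenized candidates; this is illustrative, not the authors' implementation):

```python
# A minimal sketch of prefix-tree constrained decoding, assuming each
# candidate representation has already been tokenized into a list of
# token ids.

def build_trie(candidates: list[list[int]]) -> dict:
    """Build a nested-dict trie over the tokenized candidate set."""
    trie: dict = {}
    for token_ids in candidates:
        node = trie
        for token_id in token_ids:
            node = node.setdefault(token_id, {})
    return trie

def allowed_next_tokens(trie: dict, prefix: list[int]) -> list[int]:
    """Return the token ids that can extend `prefix` towards a valid candidate."""
    node = trie
    for token_id in prefix:
        if token_id not in node:
            return []  # the prefix no longer matches any candidate
        node = node[token_id]
    return list(node.keys())

# Toy example: two candidates tokenized as id sequences.
trie = build_trie([[5, 7, 9], [5, 8]])
assert allowed_next_tokens(trie, [5]) == [7, 8]
```

In practice, a function of this kind can be plugged into Hugging Face beam search through the `prefix_allowed_tokens_fn` argument of `generate`, restricting the decoder to sequences that spell out one of the candidates.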
Extractive Modeling
Additionally, we also consider the formulation recently presented by Barba et al. (2022), which frames ED as extractive question answering. Here, $\tilde{c}_m$, defined analogously to the previous paragraph, represents the query, whereas the context is built by concatenating a textual representation of each candidate $e_1, \dots, e_n$. A model is then trained to extract the text span that corresponds to $e$. Following the efficiency reasoning of the authors, we use the Longformer (Beltagy et al., 2020) as our underlying model, whose linear attention scales better to this type of long-input formulation. Compared to the above generative method, the benefits of this approach lie in i) dropping the need for a potentially slow auto-regressive decoding process and ii) enabling full joint contextualization both between context and candidates and across the candidates themselves.
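As an illustration of how such an input could be assembled, the following sketch uses plain string concatenation; the marker symbols, separator and first-occurrence mention matching are our simplifying assumptions, not necessarily the exact scheme of Barba et al. (2022):

```python
def build_extractive_input(context: str, mention: str,
                           candidate_reprs: list[str]) -> str:
    """Concatenate the mention-marked query with all candidate representations."""
    # Mark the (first occurrence of the) mention, as in the generative setup.
    query = context.replace(mention, f"<s> {mention} </s>", 1)
    # The candidate representations form the passage from which a span is extracted.
    return f"{query} {' '.join(candidate_reprs)}"

inp = build_extractive_input(
    "Ronaldo scored two goals for Portugal",
    "Ronaldo",
    ["Cristiano Ronaldo: Portuguese association football player",
     "Ronaldo: Brazilian association football player"],
)
# A span-extraction model (e.g., a Longformer with a QA head) is then
# trained to select the span of the correct candidate inside `inp`.
```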