
contrast to this trend, we address this limitation and explore the effect of more expressive textual representations on state-of-the-art local methods. To this end, we propose to complement Wikipedia titles with their Wikidata descriptions so that, for instance, the candidates for Ronaldo in "Ronaldo scored two goals for Portugal" would be Cristiano Ronaldo: Portuguese association football player and Ronaldo: Brazilian association football player, rather than the less informative Cristiano Ronaldo and Ronaldo. We test our novel representations on generative and extractive formulations, and evaluate them on standard ED benchmarks, both in and out of domain, reporting statistically significant improvements for the latter group.
2 Method
We now formally introduce ED and the textual
representation strategy we put forward. Then, we
describe the two formulations with which we im-
plement and test our proposal.
ED with Entity Definitions
Given a mention $m$ occurring in a context $c_m$, Entity Disambiguation is formally defined as the task of identifying, out of a set of candidates $e_1, \dots, e_n$, the correct entity $e^*$ that $m$ refers to. In generative and extractive formulations, each candidate $e$ is additionally associated with a textual representation $\hat{e}$, i.e., a string describing its meaning. Whereas previous works have considered the title that $e$ has in Wikipedia as $\hat{e}$, here we focus on more expressive alternatives and leverage Wikidata to achieve this objective. In particular, we first retrieve the Wikidata description of $e$. Then, we define the new representation of $e$ as the colon-separated concatenation of its Wikipedia title and its Wikidata description, e.g., Ronaldo: Brazilian association football player.
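
To make the construction concrete, the following is a minimal sketch in Python; the wikidata_descriptions lookup is a hypothetical precomputed mapping from Wikipedia titles to Wikidata descriptions (e.g., derived from a Wikidata dump), not a component of any specific release.

```python
from typing import Dict, Optional


def entity_representation(
    wikipedia_title: str,
    wikidata_descriptions: Dict[str, str],
) -> str:
    """Return the colon-separated concatenation of a Wikipedia title
    and its Wikidata description, falling back to the title alone
    when no description can be retrieved (a 'failure' in Table 1)."""
    description: Optional[str] = wikidata_descriptions.get(wikipedia_title)
    if description is None:
        return wikipedia_title
    return f"{wikipedia_title}: {description}"


# Example (hypothetical lookup table):
# entity_representation("Ronaldo", {"Ronaldo": "Brazilian association football player"})
# -> "Ronaldo: Brazilian association football player"
```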
Generative Modeling
In our first formulation, we follow De Cao et al. (2021) and frame ED as a text generation problem. Starting from a mention $m$ and its context $c_m$, we first wrap the location of $m$ in $c_m$ between two special symbols, namely <s> and </s>; we denote this modified sequence by $\tilde{c}_m$. Then, we train a sequence-to-sequence model to generate the textual sequence $\hat{e}^*$ of the correct entity $e^*$ by learning the following probability:

$$
p(\hat{e}^* \mid \tilde{c}_m) = \prod_{j=1}^{|\hat{e}^*|} p\left(\hat{e}^*_j \mid \hat{e}^*_{1:j-1}, \tilde{c}_m\right)
$$
Dataset        Instances   Candidates           Failures
AIDA
  Train          18,448    905,916 / 79,561     5038 / 682
  Validation      4791     236,193 / 43,339     1360 / 296
  Test            4485     231,595 / 46,660     1395 / 323
OOD
  MSNBC             656     17,895 / 8336         149 / 72
  AQUAINT           727     23,917 / 16,948       142 / 121
  ACE2004           257     12,292 / 8045          66 / 50
  CWEB           11,154    462,423 / 119,781     3642 / 1265
  WIKI            6821     222,870 / 105,440     1216 / 719

Table 1: Number of instances, candidates, and failures to map a Wikipedia title to its Wikidata definition in the AIDA-CoNLL (top) and out-of-domain (bottom) datasets. For candidates and failures, we report both their total and unique number (total / unique).
where $\hat{e}^*_j$ denotes the $j$-th token of $\hat{e}^*$ and $\hat{e}^*_0$ is a special start symbol. The purpose of <s> and </s> is to signal to the model that $m$ is the token we are interested in disambiguating. As in the reference work, we use BART (Lewis et al., 2020) as the sequence-to-sequence architecture in our experiments and, most importantly, adopt constrained decoding over the candidate set at inference time. Indeed, applying standard decoding methods such as beam search might result in outputs that do not match any of the original candidates; thus, to obtain only valid sequences, at each generation step we constrain the set of tokens that can be generated according to a prefix tree (Cormen et al., 2009) built over the candidate set.
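
As an illustration of this constrained decoding scheme, the sketch below builds a prefix tree over the candidates' token sequences and plugs it into the Hugging Face generate interface via prefix_allowed_tokens_fn; the checkpoint name is a placeholder and a model fine-tuned for this task is assumed, so the snippet shows the mechanism rather than the exact implementation.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Placeholder checkpoint: in practice, a BART model fine-tuned for ED is assumed.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Candidate textual representations (Wikipedia title: Wikidata description).
candidates = [
    "Cristiano Ronaldo: Portuguese association football player",
    "Ronaldo: Brazilian association football player",
]

# Build a prefix tree (trie) over the token ids of every candidate.
trie = {}
for cand in candidates:
    node = trie
    for token_id in tokenizer(cand, add_special_tokens=True)["input_ids"]:
        node = node.setdefault(token_id, {})


def allowed_tokens(batch_id, generated_ids):
    """Follow the tokens generated so far down the trie and return the
    token ids that keep the output a prefix of some candidate."""
    node = trie
    for token_id in generated_ids.tolist():
        if token_id in node:
            node = node[token_id]
        # Tokens outside the trie (e.g., BART's decoder start token) are skipped.
    return list(node.keys()) or [tokenizer.eos_token_id]


# The mention is wrapped between the special symbols described above.
source = "<s> Ronaldo </s> scored two goals for Portugal"
inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    num_beams=5,
    prefix_allowed_tokens_fn=allowed_tokens,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```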
Extractive Modeling
In addition, we consider the formulation recently presented by Barba et al. (2022), which frames ED as extractive question answering. Here, $\tilde{c}_m$, defined analogously to the previous paragraph, represents the query, whereas the context is built by concatenating a textual representation of each candidate $e_1, \dots, e_n$. A model is then trained to extract the text span that corresponds to $e^*$. Following the efficiency reasoning of the authors, we use as our underlying model the Longformer (Beltagy et al., 2020), whose linear attention scales better to this type of long-input formulation. Compared to the above generative method, the benefits of this approach lie in i) dropping the need for a potentially slow auto-regressive decoding process and ii) enabling full joint contextualization both between context and candidates and across the candidates themselves.
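
As a rough illustration of this formulation, the sketch below assembles the query and the candidate context and scores spans with a Longformer question-answering head; the checkpoint and the simple argmax span selection are illustrative only, since in practice a fine-tuned model and a restriction of predictions to candidate spans would be required.

```python
import torch
from transformers import LongformerForQuestionAnswering, LongformerTokenizer

# Placeholder checkpoint: a model fine-tuned for extractive ED is assumed.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerForQuestionAnswering.from_pretrained("allenai/longformer-base-4096")

# Query: the context with the mention wrapped between the special markers.
query = "<s> Ronaldo </s> scored two goals for Portugal"

# Context: concatenation of the candidates' textual representations.
candidates = [
    "Cristiano Ronaldo: Portuguese association football player",
    "Ronaldo: Brazilian association football player",
]
context = " ".join(candidates)

inputs = tokenizer(query, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely start/end positions and map them back to a string.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
span_ids = inputs["input_ids"][0, start : end + 1]
print(tokenizer.decode(span_ids))
```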