
practically, whether CT may be a useful inductive bias for
neural coreference resolution systems.
In this paper, we attempt to answer these questions through a careful analysis of neural coreference models using various discourse metrics (referred to as centering metrics) and several statistical tests. Because CT, at its core, is a linguistic theory and not a computational one, we first provide a computational operationalization of CT that we can directly implement (§2). Our operationalization requires us to concretely specify the linguistic notions present in the original work (Grosz et al., 1995; Poesio et al., 2004), and it allows us to draw conclusions about how well neural coreference resolvers accord with CT.
In a series of systematic analyses (§5), we first show that neural coreference resolution models achieve relatively high scores under centering metrics, indicating that they do contain some information about discourse coherence, even though they are not trained with any CT signals. In addition, as shown in Fig. 2, there is a non-trivial relationship between CT and coreference, which we quantify via the mutual information between the performance of a coreference resolver and our various CT operationalizations (Chambers and Smyth, 1998; Gordon and Hendrick, 1998). However, the centering scores taper off as coreference models become more accurate (i.e., achieve higher CoNLL F1): the dependence between CT and coreference performance decreases once CoNLL F1 exceeds 50%. This interval, unfortunately, is where all modern coreference resolution models lie. This indicates that entity coherence information is no longer helpful in improving current neural coreference resolution systems.
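The dependence measure used above can be made concrete. The following is a minimal plug-in estimator of the mutual information between two paired score arrays (e.g., a model's CoNLL F1 and a centering metric), using equal-width binning; the function and the synthetic data below are our own illustrative sketch, not the paper's implementation or results.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys, bins=10):
    """Plug-in estimate of I(X; Y) in nats: discretize each variable
    into equal-width bins and compare the empirical joint distribution
    to the product of its marginals."""
    def binned(vals):
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0   # guard against zero range
        return [min(int((v - lo) / width), bins - 1) for v in vals]
    bx, by = binned(xs), binned(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

# Synthetic illustration: a noisy score that tracks a hypothetical F1
# score shows clearly higher MI with it than an unrelated score does.
random.seed(0)
f1 = [random.uniform(0, 100) for _ in range(5000)]
tracking = [v / 100 + random.gauss(0, 0.1) for v in f1]
unrelated = [random.random() for _ in f1]
print(mutual_information(f1, tracking) > mutual_information(f1, unrelated))  # → True
```

Because the plug-in estimate is a KL divergence between the empirical joint and the product of its marginals, it is always nonnegative; the bin count trades off resolution against estimation bias.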
Next, we turn to the question: Where in their architecture do neural coreference systems capture this CT information? Our experiments on the well-known C2F coreference model with SpanBERT embeddings (Joshi et al., 2020) (§5.3) reveal that the contextualized SpanBERT embeddings contain much of the coherence information, which explains why incorporating elements of CT yields only minor improvements to a neural coreference system.
Finally, we explore the question: What information required for coreference resolution is not captured by CT? We show that CT does not capture factors such as recency bias and world knowledge (§6), which might be required for the task of coreference resolution. To explore the role of recency bias, we extend our CT formulation to account for this bias by controlling the salience of centers. We show that this reformulation of CT captures coreference information better than vanilla CT at the same centering score level. We end with a summary of takeaways from our work.
2 Coreference and Centering Theory
In this section, we overview the necessary background on coreference and centering theory in our own notation. We define a discourse $D = [U_1, \ldots, U_N]$ of length $N$ as a sequence of $N$ utterances, each denoted $U_n$. We take an utterance $U_n$ of length $M$ to be a string of tokens $t_1 \cdots t_M$, where each token $t_m$ is taken from a vocabulary $V$.¹ Let $\mathcal{M}(U_n) = \{m_1, m_2, \ldots\}$ be the set of mentions in the utterance $U_n$. A mention is a subsequence of the tokens that comprise $U_n = t_1 \cdots t_M$. Mentions could be pronouns, repeated noun phrases, and so forth, and are often called anaphoric devices in the discourse literature.
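To make the notation concrete, the following sketch (our own encoding, not code from the paper) represents a discourse as a list of token lists and a mention as a token span within one utterance:

```python
from dataclasses import dataclass

# A minimal encoding of the notation above: a discourse D is a list of
# utterances U_n, each a list of tokens, and a mention m in M(U_n) is a
# token subsequence of U_n, identified here by its span offsets.

@dataclass(frozen=True)
class Mention:
    utterance_idx: int   # which U_n the mention occurs in
    start: int           # token offset of the span start (inclusive)
    end: int             # token offset of the span end (exclusive)

def mentions_of(n, spans):
    """M(U_n): the mention set for utterance U_n, built from given spans."""
    return {Mention(n, s, e) for (s, e) in spans}

discourse = [["Mike", "saw", "his", "dog"], ["He", "fed", "it"]]
m_u1 = mentions_of(0, [(0, 1), (2, 4)])   # "Mike", "his dog"
m_u2 = mentions_of(1, [(0, 1), (2, 3)])   # "He", "it"
print(len(m_u1 | m_u2))  # → 4
```

The union of the per-utterance sets corresponds to $\mathcal{M}(D)$, the full mention set of the discourse.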
2.1 Coreference
Next, let $\mathcal{E}$ be the set of entities in the world. A coreference resolver $f : \mathcal{M}(D) \to \mathcal{E}$ implements a function from the set of mentions onto the set of entities (henceforth also referred to as the mention–entity mapping $f$).² In Table 1, $\llbracket \cdot \rrbracket$ denotes $f(\cdot)$ for illustration, i.e., a mention, e.g., Mike, is mapped to the entity $\llbracket \text{Mike} \rrbracket_i$. Here we reuse the notation $\mathcal{M}(\cdot)$, where $\mathcal{M}(D) \stackrel{\text{def}}{=} \bigcup_{U_n \in D} \mathcal{M}(U_n)$. Rule-based or feature-based coreference resolvers (Hobbs, 1978; Sidner, 1979; Brennan et al., 1987; Kong et al., 2009) resolve coreference by explicitly combining CT constraints or syntactic constraints. Current state-of-the-art coreference resolvers are end-to-end neural models (Lee et al., 2017; Joshi et al., 2020; Wu et al., 2020).
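With mentions given, the mapping $f$ can be sketched as a plain dictionary from mention spans to entity ids; inverting it recovers the familiar coreference clusters. The mentions and entity ids below are hypothetical illustrations, not data from the paper:

```python
# A sketch of the mention-entity mapping f: each (given) mention is
# assigned an entity id, so coreferent mentions share an id, which is
# equivalent to partitioning M(D) into entity clusters.

def to_clusters(f):
    """Invert f: M(D) -> E into entity clusters, i.e. f^{-1}(e) per entity e."""
    clusters = {}
    for mention, entity in f.items():
        clusters.setdefault(entity, set()).add(mention)
    return clusters

# Hypothetical resolution over a two-utterance discourse:
# "Mike" and "He" map to entity 0; "his dog" and "it" to entity 1.
f = {("U1", "Mike"): 0, ("U1", "his dog"): 1,
     ("U2", "He"): 0, ("U2", "it"): 1}
print(sorted(len(c) for c in to_clusters(f).values()))  # → [2, 2]
```

This view makes explicit that, once mentions are assumed given, $f$ reduces to the entity-linking step of coreference resolution.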
2.2 Centering Theory
Centering theory (CT) offers a theoretical expla-
nation of local discourse structure that models the
interaction of referential continuity and the salience
¹ This definition of an utterance could be understood as a textual unit as short as a clause or as long as multiple paragraphs; we have left it intentionally open-ended and will revisit this point in §4.
² In general, coreference resolution includes a mention detection step. In our analysis, we assume the mentions to be given. Thus, $f$ can essentially be thought of as an implementation of the entity-linking step in coreference resolution.