Investigating the Role of Centering Theory in the Context of Neural Coreference Resolution Systems

Yuchen Eleanor Jiang   Ryan Cotterell   Mrinmaya Sachan

{yuchen.jiang,mrinmaya.sachan,ryan.cotterell}@inf.ethz.ch
Abstract

Centering theory (CT; Grosz et al., 1995) provides a linguistic analysis of the structure of discourse. According to the theory, local coherence of discourse arises from the manner and extent to which successive utterances make reference to the same entities. In this paper, we investigate the connection between centering theory and modern coreference resolution systems. We provide an operationalization of centering and systematically investigate whether neural coreference resolvers adhere to the rules of centering theory by defining various discourse metrics and developing a search-based methodology. Our information-theoretic analysis reveals a positive dependence between coreference and centering, but also shows that high-quality neural coreference resolvers may not benefit much from explicitly modeling centering ideas. Our analysis further shows that contextualized embeddings contain much of the coherence information, which helps explain why CT provides only small gains to modern neural coreference resolvers that make use of pretrained representations. Finally, we discuss factors that contribute to coreference but are not modeled by CT, such as world knowledge and recency bias. We formulate a version of CT that also models recency and show that it captures coreference information better than vanilla CT.

https://github.com/EleanorJiang/ct-coref
1 Introduction
Centering theory (CT; Grosz et al., 1995) is a well-known theory of discourse that provides an account of the coherence of a piece of text through the manner in which successive utterances refer to the same discourse entity. CT has served as a theoretical foundation for many NLP applications such as coreference resolution, machine translation, text generation, and summarization. Among them, CT has been most well-studied in the context of coreference, the task of linking referring expressions to the entity they refer to in the text (Sidner, 1979; Brennan et al., 1987; Iida et al., 2003; Beaver, 2004; Kong et al., 2009; Kehler and Rohde, 2013).

Figure 1: An example of anaphora resolution and coherence. Mention spans are underlined, and colors represent entity clusters. An account of coherence is closely related to the question: why is U_4 better than U′_4?

U_1: John has been having a lot of trouble arranging his vacation.
U_2: He cannot find anyone to take over his responsibilities.
U_3: He called up Mike yesterday to work out a plan.
U_4: Mike has annoyed John a lot recently.
U′_4: He has annoyed John a lot recently.
U_5: He called John at 5 AM on Friday last week.
Previous work has shown that there are deep connections between CT and coreference. Referring expressions often show a preference for certain linguistic forms to indicate a reference relation to their antecedents. For example, pronouns are often used to refer to preceding named entities, but reintroduction of a named entity leads to the use of its nominal form. These referring expressions can thereby be seen to connect the various utterances in the text and contribute to the coherence of the overall text. Thus, it has long been believed that coherence can impose constraints on referential accessibility. See Fig. 1 for an example.
Early coreference resolution models indeed exploited this connection (Brennan et al., 1987; Sidner, 1979; Iida et al., 2003; Beaver, 2004; Kong et al., 2009), arguing that the constraints proposed by CT can serve as a useful guide for coreference resolution models (Elango, 2005; van Deemter and Kibble, 2000; Chai and Strube, 2022). However, modern coreference systems are primarily based on neural networks and are trained end to end without any explicit linguistic bias. A natural question is, then, whether these neural coreference resolvers work in a similar way as CT suggests and, more practically, whether CT may be a useful inductive bias for neural coreference resolution systems.

arXiv:2210.14678v1 [cs.CL] 26 Oct 2022
In this paper, we attempt to answer these questions through a careful analysis of neural coreference models using various discourse metrics (referred to as centering metrics) and conducting several statistical tests. Because CT, at its core, is a linguistic theory and not a computational one, we first provide a computational operationalization of CT that we can directly implement (§2). Our operationalization requires us to concretely specify the linguistic notions present in the original work (Grosz et al., 1995; Poesio et al., 2004) and lets us draw conclusions about how well neural coreference resolvers accord with CT.
In a series of systematic analyses (§5), we first show that neural coreference resolution models achieve relatively high scores under centering metrics, indicating that they do contain some information about discourse coherence, even though they are not trained with any CT signals. In addition, as shown in Fig. 2, there is a non-trivial relationship between CT and coreference, which we quantify by the mutual information between the performance of a coreference resolver and our various CT operationalizations (Chambers and Smyth, 1998; Gordon and Hendrick, 1998). However, the centering scores taper off as coreference models become more accurate (i.e., achieve higher CoNLL F1): the dependence between CT and coreference performance decreases once CoNLL F1 rises above 50%. This interval, unfortunately, is where all modern coreference resolution models lie. This indicates that entity coherence information is no longer helpful in improving current neural coreference resolution systems.
Next, we turn to the question: where in their architecture do neural coreference systems capture this CT information? Our experiments on the well-known C2F coreference model with SpanBERT embeddings (Joshi et al., 2020) (§5.3) reveal that the contextualized SpanBERT embeddings contain much of the coherence information, which explains why incorporating elements of CT yields only minor improvements to a neural coreference system.
Finally, we explore the question: what information required for coreference resolution is not captured by CT? We show that CT does not capture factors such as recency bias and world knowledge (§6), which might be required for the task of coreference resolution. To explore the role of recency bias, we extend our CT formulation to account for this bias by controlling the salience of centers in the formulation. We show that this reformulation of CT captures coreference information better than vanilla CT at the same centering score level. We end with a summary of takeaways from our work.
2 Coreference and Centering Theory
In this section, we overview the necessary background on coreference and centering theory in our own notation. We define a discourse D = [U_1, ..., U_N] of length N as a sequence of N utterances, each denoted U_n. We take an utterance U_n of length M to be a string of tokens t_1 ··· t_M, where each token t_m is taken from a vocabulary V.¹ Let M(U_n) = {m_1, m_2, ...} be the set of mentions in the utterance U_n. A mention is a subsequence of the tokens that comprise U_n = t_1 ··· t_M. Mentions can be pronouns, repeated noun phrases, and so forth, and are often called anaphoric devices in the discourse literature.
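To make this notation concrete, the following is a minimal Python sketch of a discourse, an utterance, and a mention as defined above; the class and field names are our own illustrative choices, not from the paper's codebase.

```python
from dataclasses import dataclass, field

# Sketch of the paper's notation: a discourse D = [U_1, ..., U_N] is a list
# of utterances; an utterance is a string of tokens t_1 ... t_M; a mention is
# a token subsequence of its utterance, identified here by a (start, end) span.

@dataclass(frozen=True)
class Mention:
    utterance_idx: int  # index of the utterance U_n containing the mention
    start: int          # token span [start, end) within U_n
    end: int

@dataclass
class Utterance:
    tokens: list                                   # t_1 ... t_M, tokens from V
    mentions: list = field(default_factory=list)   # M(U_n)

    def mention_text(self, m: Mention) -> str:
        """Recover the surface string of a mention from its token span."""
        return " ".join(self.tokens[m.start:m.end])

# A discourse D is simply a list of utterances: D = [U_1, ..., U_N].
```

For instance, `Utterance(tokens="John has been having a lot of trouble".split())` together with `Mention(0, 0, 1)` represents the mention "John" in U_1.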
2.1 Coreference

Next, let E be the set of entities in the world. A coreference resolver f : M(D) → E implements a function from the set of mentions onto the set of entities (henceforth also referred to as the MENTION-ENTITY MAPPING f).² In Table 1, ⟦·⟧ denotes f(·) for illustration; i.e., a mention, e.g., Mike, is mapped to the entity ⟦Mike⟧_i. Here we reuse the notation M(·), where M(D) := ∪_{U_n ∈ D} M(U_n). Rule-based or feature-based coreference resolvers (Hobbs, 1978; Sidner, 1979; Brennan et al., 1987; Kong et al., 2009) resolve coreference by explicitly incorporating CT constraints or syntactic constraints. Current state-of-the-art coreference resolvers are end-to-end neural models (Lee et al., 2017; Joshi et al., 2020; Wu et al., 2020).
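The mention-entity mapping f and the set M(D) can be sketched in a few lines of Python. The spans, entity names, and helper function below are illustrative stand-ins (mentions are given, as in the paper's analysis setup), not the authors' implementation.

```python
# Sketch of f : M(D) -> E, with mentions given (no mention detection step).
# Mentions are (utterance_index, start, end) token spans; entities are opaque
# identifiers such as "John_i". All names here are for illustration only.

def mentions_of_discourse(discourse):
    """M(D) = union over U_n in D of M(U_n)."""
    return {m for utt in discourse for m in utt["mentions"]}

# The first two utterances of Table 1, with f specified explicitly.
discourse = [
    {"text": "John has been having a lot of trouble arranging his vacation.",
     "mentions": [(0, 0, 1), (0, 7, 8), (0, 9, 11)]},  # John; trouble; his vacation
    {"text": "He cannot find anyone to take over his responsibilities.",
     "mentions": [(1, 0, 1), (1, 7, 8), (1, 7, 9)]},   # He; his; his responsibilities
]

f = {  # both "He" and "his" map to [John]_i, as in Table 1
    (0, 0, 1): "John_i", (0, 7, 8): "trouble_j", (0, 9, 11): "vacation_k",
    (1, 0, 1): "John_i", (1, 7, 8): "John_i", (1, 7, 9): "responsibilities_l",
}

all_mentions = mentions_of_discourse(discourse)  # M(D)
entities = {f[m] for m in all_mentions}          # the image of f: entity clusters
```

Representing f as a plain dictionary makes the entity clusters simply the fibers of the mapping, which matches the paper's view of f as the entity-linking step of coreference resolution.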
2.2 Centering Theory

Centering theory (CT) offers a theoretical explanation of local discourse structure that models the interaction of referential continuity and the salience of discourse entities in the internal organization of a text. It was one of the first formal treatments of discourse, and remains one of the most influential.

Table 1: An example describing centering theory with the weighting function w being GRAMMATICAL ROLE. Here, ⟦·⟧_i denotes the entity e_i. For each utterance, a set of mentions M is detected with weights, then mapped to a set of entities C_f (both He and his map to ⟦John⟧_i in U_2). We sort the entities in C_f by their weights for illustration. C_p is the most highly weighted element in C_f (⟦John⟧_i is a more important entity than ⟦responsibilities⟧_l in U_2). C_b is chosen from the C_f of the previous utterance.

| Utterance (mentions, elements of M, are underlined) | C_f | C_p | C_b | Transition |
| U_1: John has been having a lot of trouble arranging his vacation. | ⟦John⟧_i, ⟦trouble⟧_j, ⟦vacation⟧_k | ⟦John⟧_i | ε | |
| U_2: He cannot find anyone to take over his responsibilities. | ⟦John⟧_i, ⟦responsibilities⟧_l | ⟦John⟧_i | ⟦John⟧_i | CONTINUE |
| U_3: He called up Mike yesterday to work out a plan. | ⟦John⟧_i, ⟦Mike⟧_m, ⟦plan⟧_n | ⟦John⟧_i | ⟦John⟧_i | CONTINUE |
| U_4: Mike has annoyed John a lot recently. | ⟦Mike⟧_m, ⟦John⟧_i | ⟦Mike⟧_m | ⟦John⟧_i | RETAIN |
| U_5: He called John at 5 AM on Friday last week. | ⟦Mike⟧_m, ⟦John⟧_i | ⟦Mike⟧_m | ⟦Mike⟧_m | SMOOTH-SHIFT |

¹ This definition of an utterance could be understood as a textual unit as short as a clause, but it could also be understood as a textual unit as long as multiple paragraphs; we have left it intentionally open-ended and will revisit this point in §4.

² In general, coreference resolution includes a mention detection step. In our analysis, we assume the mentions to be given. Thus, f can essentially be thought of as an implementation of the entity-linking step in coreference resolution.
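As a concrete illustration of how the C_b and Transition columns of Table 1 can be derived from the ranked C_f lists, here is a small Python sketch. This excerpt does not spell out the transition types, so the rules below follow the standard scheme of Grosz et al. (1995) and should be read as an assumption rather than the paper's exact operationalization.

```python
# Cf(U_n): ranked list of entities realized in U_n (highest weight first).
# Cp(U_n): the first element of Cf(U_n).
# Cb(U_n): the highest-ranked entity of Cf(U_{n-1}) also realized in U_n.
# Transition labels follow the standard Grosz et al. (1995) scheme (assumed).

def backward_center(cf_prev, cf_curr):
    for e in cf_prev:          # cf_prev is ranked, so the first hit wins
        if e in cf_curr:
            return e
    return None                # epsilon: no backward-looking center

def transition(cb_prev, cb_curr, cp_curr):
    if cb_curr is None:
        return None
    same_cb = cb_prev is None or cb_curr == cb_prev
    if cb_curr == cp_curr:
        return "CONTINUE" if same_cb else "SMOOTH-SHIFT"
    return "RETAIN" if same_cb else "ROUGH-SHIFT"

# Ranked Cf lists for the five utterances of Table 1.
cf = [["John", "trouble", "vacation"],
      ["John", "responsibilities"],
      ["John", "Mike", "plan"],
      ["Mike", "John"],
      ["Mike", "John"]]

cb_prev, labels = None, []
for n in range(1, len(cf)):
    cb = backward_center(cf[n - 1], cf[n])
    labels.append(transition(cb_prev, cb, cf[n][0]))
    cb_prev = cb
# labels reproduces Table 1: CONTINUE, CONTINUE, RETAIN, SMOOTH-SHIFT
```

Running the loop on the Table 1 discourse reproduces its Transition column, which is a useful sanity check that the ranking-based definitions above are internally consistent.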
As the name suggests, CT revolves around the
notion of
centering
, which is, informally, the
shifting of focus from one entity to another during
the discourse. A
center
is then defined as an entity
in
E
that is in the focus at a certain point in the
discourse. CT describes some preferences on: a)
the nature of the shift of the center from one entity
to another, and b) linguistic properties of mentions
referring to the center (e.g., mentions that attach
to the center are typically subjects and are prefer-
entially pronominalized compared to others). We
offer a more formal treatment later in the section.
As an example of centering theory in action, consider the discourse given in Table 1: D = [U_1, ..., U_5]. Now, consider replacing U_4 with:

(1) U′_4: He has annoyed John a lot recently.

Note that the resulting discourses D = [U_1, ..., U_4, U_5] and D′ = [U_1, ..., U′_4, U_5] differ only by one utterance. CT argues that D′ is not as felicitous as D. This is because, in the utterance U_3, the discourse entity ⟦John⟧_i is the center and not ⟦Mike⟧_m, and given a preference for pronominalizing the center of attention, ⟦John⟧_i should be pronominalized as well if ⟦Mike⟧_m is pronominalized. We will now formally define the key notions of centering theory.
Weighting function over Mentions.  Let weight : U_n × M(U_n) → R be a weighting function on the set of mentions in utterance U_n. Mentions that are assigned a higher weight are more likely to link to a center, i.e., an entity in focus, in the given context. For example, in U_1 in Table 1, John is assigned the highest weight since it is the subject of the sentence and, thus, is more likely to link to the center.³
Weighting function over Entities.  Now we turn from weighting mentions to weighting entities. Given an utterance U_n, let f⁻¹_{U_n}(e) be the pre-image of e ∈ E:

  f⁻¹_{U_n}(e) = { m | m ∈ M(U_n), f(m) = e }    (1)

which maps an entity e back to the set of mentions that link to it. Now we may lift the weighting function weight to an entity by aggregating the weights attached to the mentions that link to the entity, i.e.,

  weight(U_n, e) = ⊕_{m ∈ f⁻¹_{U_n}(e)} weight(U_n, m)    (2)

where ⊕ is a generic aggregator over mentions; obvious choices are ⊕ = max or ⊕ = Σ.
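Eqs. (1) and (2) translate directly into Python. The grammatical-role weights below are illustrative stand-ins rather than the paper's exact GRAMMATICAL ROLE weighting function.

```python
# Sketch of eqs. (1)-(2): lift a weighting function over mentions to one over
# entities via the pre-image f^{-1}_{U_n}(e) and a generic aggregator (max or
# sum). Mention spans, weights, and entity names are illustrative assumptions.

def preimage(f, mentions, e):
    """f^{-1}_{U_n}(e) = {m in M(U_n) : f(m) = e}   (eq. 1)"""
    return [m for m in mentions if f[m] == e]

def entity_weight(f, mentions, mention_weight, e, aggregate=max):
    """weight(U_n, e) = aggregate of weight(U_n, m) over the pre-image   (eq. 2)"""
    return aggregate(mention_weight[m] for m in preimage(f, mentions, e))

# U_2 of Table 1: "He cannot find anyone to take over his responsibilities."
mentions = ["He", "his", "his responsibilities"]
f = {"He": "John_i", "his": "John_i", "his responsibilities": "responsibilities_l"}
mention_weight = {"He": 3.0, "his": 1.0, "his responsibilities": 2.0}  # subject > object > possessive (assumed)

w_john = entity_weight(f, mentions, mention_weight, "John_i")  # max(3.0, 1.0)
w_resp = entity_weight(f, mentions, mention_weight, "responsibilities_l", aggregate=sum)
```

With these (assumed) weights, ⟦John⟧_i outweighs ⟦responsibilities⟧_l in U_2, mirroring the ranking shown in Table 1, and swapping `aggregate` between `max` and `sum` switches between the two obvious choices of ⊕.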
The weighting function weight is arguably the most important component of centering. Previous
³ It is worth noting that the original presentation of Grosz et al. (1995), in contrast, specifies a ranking of the entities.