arXiv:2210.02526v1 [cs.CL] 5 Oct 2022
“No, they did not”: Dialogue response dynamics in pre-trained language
models
Sanghee J. Kim1, Lang Yu2, Allyson Ettinger1
1Department of Linguistics, University of Chicago
2Meta
{sangheekim,aettinger}@uchicago.edu,langyu@fb.com
Abstract
A critical component of competence in lan-
guage is being able to identify relevant compo-
nents of an utterance and reply appropriately.
In this paper we examine the extent of such
dialogue response sensitivity in pre-trained
language models, conducting a series of ex-
periments with a particular focus on sensi-
tivity to dynamics involving phenomena of
at-issueness and ellipsis. We find that models
show clear sensitivity to a distinctive role of
embedded clauses, and a general preference
for responses that target main clause content of
prior utterances. However, the results indicate
mixed and generally weak trends with respect
to capturing the full range of dynamics in-
volved in targeting at-issue versus not-at-issue
content. Additionally, models show fundamen-
tal limitations in grasp of the dynamics gov-
erning ellipsis, and response selections show
clear interference from superficial factors that
outweigh the influence of principled discourse
constraints.
1 Introduction
Competence in language involves understanding
complex principles governing relevance of previous
content and dynamics of referring back to that con-
tent. Certain parts of an utterance are more central
and more likely to receive a response than others,
and the pragmatic and grammatical rules govern-
ing responses in dialogue interact with the nature
of the content being responded to. Humans are
highly sensitive to these distinctions, and we can
expect these sensitivities to be critical for robust
models in NLP, and especially for dialogue. Here
we examine sensitivity to these dialogue response
dynamics in pre-trained language models (PLMs).
PLMs are now used as a foundation for nearly every downstream NLP task, including dialogue applications (e.g., Upadhye et al., 2020; Koto et al., 2021). The impressive downstream performance
enabled by these models has raised important ques-
tions about what types of linguistic competence
are being learned during pre-training—and though
there is a growing body of work answering aspects
of this question, topics of pragmatic and dialogue
competence have been relatively understudied. In
this paper we focus on addressing this gap, and
in particular on understanding the extent to which
PLMs develop sensitivity to dynamics governing
responses in dialogue. Though these PLMs are
not trained to engage in dialogue per se, they can
be expected to encounter dialogue during training
(in, for instance, novels), so it is not unreasonable
to expect that they may learn about such dialogue
dynamics along with other linguistic competences.
The strength of these models’ sensitivity to such
dynamics has important implications for robust-
ness in dialogue applications, since a strong grasp
of dialogue dynamics in standard PLMs stands to
reduce fine-tuning needs and enable more robust
downstream behaviors.
We begin with the notion of at-issueness. A com-
ponent of an utterance is considered at-issue if it is
part of the “main point” of the utterance—this is
to be contrasted with side comments or mentions
of background knowledge, which are not the main
focus of the sentence. As we lay out in Section 3.1,
the distinction between at-issue and not-at-issue
content of an utterance is reflected directly in the
nature of responses to that utterance. We thus exam-
ine models’ preferences for different responses, to
assess whether the preferences reflect understand-
ing of at-issueness and how to respond to it. We
find that models show consistent preference to tar-
get at-issue (main clause) content, but mixed and
overall fairly weak sensitivity when it comes to the
full range of dynamics involved with at-issueness.
These assessments of at-issueness sensitivity are
also critically reliant on another aspect of dialogue
response dynamics: ellipsis. We thus additionally
make a closer examination of the extent to which
constraints from context dictate models’ selection of auxiliary verbs (such as did, does, would) in ellipsis constructions. We find that although models
often favor an auxiliary verb that targets the main
clause, they also make frequent errors, and they
very rarely favor both of the auxiliary forms that
align with the prior context. These results further-
more raise the important possibility that models
are highly sensitive to preferences for particular
auxiliary verb types, and that this could drive the
at-issueness results as well. With this in mind we
revisit the at-issueness experiments, and find that,
indeed, there are substantial differences in mod-
els’ preferences depending on the identity of the
particular verb that targets the relevant content.
Overall, our results suggest that PLMs have
non-trivial gaps in their understanding of response
dynamics in dialogue. Our results also indicate
certain differences between models: BERT and
RoBERTa show strong bias toward selecting re-
sponses that target the most recent and/or main
clause content, while other models show more re-
liance on individual auxiliary verb properties. In
all cases the results indicate that these PLMs have
not yet achieved ideal sensitivity to response dy-
namics involving at-issueness and ellipsis, and that
effectiveness in dialogue will benefit from addi-
tional training approaches. We make all datasets
and code available for further testing.1
2 Related work
Recent years have seen extensive work on analy-
sis of PLMs. Methodologically, some of the most
popular analysis paradigms targeting model embeddings have included classification-based probing (e.g., Kim et al., 2019; Zhang et al., 2019) and
correlation with similarity judgments (Finkelstein
et al., 2001; Gerz et al., 2016; Conneau and Kiela,
2018). Other work has analyzed PLMs by eliciting and analyzing output predictions (Linzen et al., 2016; Goldberg, 2019). Our work here focuses primarily on the latter methodology, examining and
comparing model output probabilities—however,
our analysis in Section 5.4 uses classification-based
probing. Our work also builds on approaches im-
plementing specialized sentence generation sys-
tems that produce large annotated datasets (Ettinger
et al., 2018; McCoy et al., 2019).
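The output-prediction methodology described above can be sketched in a few lines. The following is a minimal illustration, not the paper’s released code: it compares candidate responses by mean token log-probability, with hypothetical numbers standing in for scores that a real pre-trained LM would assign.

```python
def preferred_response(candidates):
    """Return the candidate response with the highest mean token
    log-probability. The per-token log-probabilities here would come
    from a pre-trained LM in a real experiment; averaging normalizes
    for response length."""
    def mean_logprob(logprobs):
        return sum(logprobs) / len(logprobs)
    return max(candidates, key=lambda c: mean_logprob(candidates[c]))

# Hypothetical per-token log-probabilities for two elliptical responses.
scores = {
    "No, he did not.":  [-1.2, -0.4, -2.1, -1.0],  # targets main clause
    "No, he does not.": [-1.2, -0.4, -3.5, -1.0],  # targets embedded content
}
print(preferred_response(scores))  # "No, he did not."
```

A model judged “sensitive” under this paradigm is one whose preferences over such candidates pattern with the human judgments described in Section 3.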
Analyses of PLMs have targeted a variety of
types of linguistic competence. In particular, a
1 https://github.com/sangheek16/dialogue-response-dynamics
large body of work has studied the extent to which
PLMs capture syntactic and semantic information (Linzen et al., 2016; Peters et al., 2018; Bacon and Regier, 2019; Hewitt and Manning, 2019; Tenney et al., 2019). Less work has addressed the
extent to which PLMs show sensitivity to prag-
matic and discourse information, as we focus on
in this paper. Kurfalı and Östling (2021) study
multilingual models in various discourse tasks via
zero-shot learning. Pandia et al. (2021) investigate
LMs’ pragmatic competence to predict discourse
connectives. Pitler and Nenkova (2009) report that
a supervised classifier is able to identify discourse
relations given syntactic features along with con-
nectives. Patterson and Kehler (2013) implement
a similar idea and show that classifiers are able
to predict the presence of a connective based on
shallow linguistic cues. Koto et al. (2021) explore
pre-trained language models’ capability in captur-
ing discourse level relations. We complement this
existing work by branching into new areas of prag-
matic and discourse knowledge, examining models’
sensitivity to dialogue response dynamics.
Another closely related literature is that in which
PLMs, especially transformer LMs, are used for
building dialogue systems directly. Le et al. (2019)
propose Multimodal Transformer Networks (MTN)
for visual-grounded dialogue tasks. Other work
investigates topic-driven language models for emotion detection in dialogues (Zhu et al., 2021).
Oluwatobi and Mueller (2020) report state-of-
the-art performance on dialogue generation using
transformer-based models. There are also language
models designed for and trained on dialogue or
conversation, such as TransferTransfo (Wolf et al.,
2019), PLATO (Bao et al., 2020), ConveRT (Henderson et al., 2020), TOD-BERT (Wu et al., 2020), DialoGPT (Zhang et al., 2020), DialogBERT (Gu et al., 2021), and LaMDA (Thoppilan et al., 2022).
Here we focus on clarifying the extent to which
PLMs pre-trained in the standard paradigm can de-
velop knowledge of dialogue dynamics prior to any
specialized dialogue training. This line of inquiry
serves to broaden our general understanding of lin-
guistic competence of standard PLMs, and also
has implications for use of these standard PLMs as
a foundation for further dialogue-specific training.
3 Background
3.1 At-issueness
Our analyses focus on the dynamics that govern
responses in dialogue, and aspects of prior utter-
ances that they target. The first notion that we
test for in PLMs is sensitivity to “at-issueness.”
At-issueness refers to content’s status as the main
point of the utterance—to be contrasted with not-at-
issue content, such as side comments and assumed
knowledge (see Potts (2005) for a comprehensive
overview). Humans are sensitive to which content
in an utterance is “at-issue” and which content is
not—and this sensitivity is reflected in dialogue
response dynamics. Consider the utterance in (1).
(1) The nurse, who has interest in French cuisine, adopted a rescue dog.
If a listener responds to (1) with “No” or “That’s
not true,” they would most likely be objecting to
the claim that the nurse adopted a rescue dog, since
this is the main point (at-issue content) of (1). It
is less likely that they would be objecting to the
side comment about French cuisine. As a result, a
response of “No, he didn’t (adopt a rescue dog),”
would be natural, while “No, he doesn’t (have in-
terest in French cuisine)” would be less so.
This intuition drives a key diagnostic used to distinguish at-issue and not-at-issue content, known as the Rejection & Peripherality Test (or the Assent/Dissent Test) (Amaral et al., 2007; Koev, 2013; Syrett and Koev, 2015). The “rejection” component of this test is illustrated in (2). Speaker B1 replies to Speaker A’s utterance with a rejection (“No”), and uses the elliptical verb phrase (“did not”) that targets the (at-issue) content of the main clause (“The nurse adopted a rescue dog.”), for a natural and appropriate response. In contrast, Speaker B2 rejects the (not-at-issue) content inside the appositive relative clause (ARC), which is less natural (indicated with ‘#’).
(2) a. Speaker A: “The nurse, who has interest in French cuisine, adopted a rescue dog.”
    b. Speaker B1: “No, he did not.” [Targeting at-issue content]
    c. Speaker B2: #“No, he does not.” [Targeting not-at-issue content]
There is, however, a more natural way to object to not-at-issue content: pausing the dialogue to question a side comment or assumption. This is highlighted in the peripherality test, which uses phrases like “Hey, wait a minute” (von Fintel, 2004; Amaral et al., 2007), or “Wait, this is peripheral to your point but...” (Koev, 2018) in order to make targeting not-at-issue content more acceptable. We show an example in (3).
(3) a. Speaker A: “The nurse, who has interest in French cuisine, adopted a rescue dog.”
    b. Speaker B: “Wait no, he does not (have interest in French cuisine).” [Targeting not-at-issue content]
Human sensitivity to this pattern of relationship
between at-issueness and “No” versus “Wait no”
response types has been well attested in psycholin-
guistic experiments. Syrett and Koev (2015) in
their Experiment 1 find that when selecting be-
tween responses that target not-at-issue content in
an embedded clause of a prior utterance, humans
are much more likely to choose a response of type
“Wait no” (77%) than of type “No” (23%).2 By
contrast, when selecting between responses that
target at-issue content in a main clause of a prior
utterance, humans’ rate of selection of these two
response types is roughly even. In their Experi-
ment 2, Syrett and Koev (2015) furthermore show
that when selecting among “No” type responses,
humans have a strong preference for choosing
those that target at-issue content of prior utterances
(73.9%) compared to not-at-issue content (26.1%).
Leveraging this knowledge of human sensitivi-
ties, we make use of diagnostics modeled after the
Rejection & Peripherality Test to examine whether
PLMs are also sensitive to these discourse dynam-
ics involving at-issueness and response type. For
structuring not-at-issue content, we focus on ARCs
as used in the examples above.
3.2 Ellipsis
The examples above make critical use of the gram-
matical phenomenon of ellipsis: use of abbrevi-
ated verb phrases that refer back to previous verb
phrases. In ellipsis, typically an auxiliary verb (like
did, does, would) remains as the verb in the elided
verb phrase—for instance: “No, he didn’t” is an
elided form that could refer back to “The nurse
adopted a rescue dog,” standing in for the longer
phrase “No, he didn’t adopt a rescue dog.” Ellipsis
is another critical component of forming responses
in dialogue, and it plays an important prerequisite
role in assessing at-issueness. For these reasons,
we also test models’ grasp of ellipsis in dialogue.
2 The specific wordings in this experiment were “Hey, wait a minute,” and “That’s not true.”
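The ellipsis diagnostic can be made concrete with a small sketch. This is an illustrative function, not the paper’s actual code: it asks whether a model puts more probability on the auxiliary congruent with the main clause (“did,” matching past-tense “adopted”) than on the one congruent with the ARC (“does,” matching present-tense “has”). The probabilities below are hypothetical placeholders for real model outputs.

```python
def favors_main_clause(aux_probs, main_aux="did", arc_aux="does"):
    """Return True if the model assigns more probability to the
    auxiliary that elides the main-clause verb phrase than to the
    one that elides the verb phrase of the appositive relative clause."""
    return aux_probs[main_aux] > aux_probs[arc_aux]

# Hypothetical model probabilities for "No, he ___ not." following
# "The nurse, who has interest in French cuisine, adopted a rescue dog."
probs = {"did": 0.41, "does": 0.12, "would": 0.05}
print(favors_main_clause(probs))  # True
```

An ideally sensitive model would pass this check while also assigning non-negligible probability to both context-licensed auxiliaries, since each has an antecedent verb phrase in the prior utterance.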