
of auxiliary verbs (such as did, does, would) in ellipsis constructions. We find that although models often favor an auxiliary verb that targets the main clause, they also make frequent errors, and they very rarely favor both of the auxiliary forms that align with the prior context. These results furthermore raise the important possibility that models are highly sensitive to preferences for particular auxiliary verb types, and that this could drive the at-issueness results as well. With this in mind we revisit the at-issueness experiments, and find that, indeed, there are substantial differences in models’ preferences depending on the identity of the particular verb that targets the relevant content.
Overall, our results suggest that PLMs have non-trivial gaps in their understanding of response dynamics in dialogue. The results also reveal certain differences between models: BERT and RoBERTa show a strong bias toward selecting responses that target the most recent and/or main clause content, while other models rely more on properties of individual auxiliary verbs. In all cases, the results indicate that these PLMs have not yet achieved ideal sensitivity to response dynamics involving at-issueness and ellipsis, and that their effectiveness in dialogue will benefit from additional training approaches. We make all datasets and code available for further testing (https://github.com/sangheek16/dialogue-response-dynamics).
2 Related work
Recent years have seen extensive work on analysis of PLMs. Methodologically, some of the most popular analysis paradigms targeting model embeddings have included classification-based probing (e.g., Kim et al., 2019; Zhang et al., 2019) and correlation with similarity judgments (Finkelstein et al., 2001; Gerz et al., 2016; Conneau and Kiela, 2018). Other work has analyzed PLMs by eliciting and analyzing output predictions (Linzen et al., 2016; Goldberg, 2019). Our work focuses primarily on the latter methodology, examining and comparing model output probabilities; however, our analysis in Section 5.4 uses classification-based probing. We also build on approaches that implement specialized sentence generation systems to produce large annotated datasets (Ettinger et al., 2018; McCoy et al., 2019).
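To make the output-probability paradigm concrete, the sketch below scores candidate auxiliary verbs at a masked position in a dialogue response. It is a minimal illustration, not the paper’s released code: the model choice (bert-base-uncased), the example item, and the candidate set are all placeholder assumptions.

```python
# Minimal sketch of the output-probability paradigm: compare a masked
# LM's probabilities for candidate auxiliary verbs at a masked position
# in a dialogue response. The model and example item are illustrative
# assumptions, not the paper's actual stimuli.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Hypothetical item: the response auxiliary reveals which clause it
# targets -- "did" picks up the relative clause ("quit her job"),
# while "is" picks up the main clause ("is mad").
dialogue = "Sue, who quit her job, is mad. Yes, she [MASK]."
candidates = ["did", "is", "does", "would"]  # single-token auxiliaries

inputs = tokenizer(dialogue, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary at the masked position.
probs = torch.softmax(logits[0, mask_index], dim=-1)
for aux in candidates:
    aux_id = tokenizer.convert_tokens_to_ids(aux)
    print(f"P({aux}) = {probs[aux_id].item():.4f}")
```

Comparing, e.g., P(did) against P(is) indicates which clause the model prefers the response to target; restricting candidates to single wordpieces keeps the comparison straightforward under a single-mask setup.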
Analyses of PLMs have targeted a variety of types of linguistic competence. In particular, a large body of work has studied the extent to which PLMs capture syntactic and semantic information (Linzen et al., 2016; Peters et al., 2018; Bacon and Regier, 2019; Hewitt and Manning, 2019; Tenney et al., 2019). Less work has addressed the extent to which PLMs show sensitivity to pragmatic and discourse information, which is the focus of this paper. Kurfalı and Östling (2021) study multilingual models on various discourse tasks via zero-shot learning. Pandia et al. (2021) investigate LMs’ pragmatic competence in predicting discourse connectives. Pitler and Nenkova (2009) report that a supervised classifier can identify discourse relations given syntactic features along with connectives. Patterson and Kehler (2013) implement a similar idea and show that classifiers can predict the presence of a connective from shallow linguistic cues. Koto et al. (2021) explore pre-trained language models’ ability to capture discourse-level relations. We complement this existing work by branching into new areas of pragmatic and discourse knowledge, examining models’ sensitivity to dialogue response dynamics.
Another closely related literature uses PLMs, especially transformer LMs, to build dialogue systems directly. Le et al. (2019) propose Multimodal Transformer Networks (MTN) for visually grounded dialogue tasks. Other work investigates topic-driven language models for emotion detection in dialogues (Zhu et al., 2021). Oluwatobi and Mueller (2020) report state-of-the-art performance on dialogue generation using transformer-based models. There are also language models designed for and trained on dialogue or conversation, such as TransferTransfo (Wolf et al., 2019), PLATO (Bao et al., 2020), ConveRT (Henderson et al., 2020), TOD-BERT (Wu et al., 2020), DialoGPT (Zhang et al., 2020), DialogBERT (Gu et al., 2021), and LaMDA (Thoppilan et al., 2022).
Here we focus on clarifying the extent to which PLMs pre-trained in the standard paradigm can develop knowledge of dialogue dynamics prior to any specialized dialogue training. This line of inquiry serves to broaden our general understanding of the linguistic competence of standard PLMs, and also has implications for the use of these standard PLMs as a foundation for further dialogue-specific training.
3 Background
3.1 At-issueness
Our analyses focus on the dynamics that govern
responses in dialogue, and aspects of prior utter-