single answer selection task, but this score drops to
20% on the multiple answer selection task. The
difficulty of this task means that models need to
encode rich commonsense knowledge to solve it.
In this work, we attempt to encode commonsense
knowledge into a large pre-trained language model,
T5-Large, by continuing to train it on a dialogue-level
commonsense dataset, CICERO (Ghosal et al., 2022),
with a set of commonsense-aware pre-training objectives.
Large pre-trained language models, such as
GPT-2 (Radford et al., 2019) and T5 (Raffel
et al., 2020b), are attractive frameworks for solving
the contextual commonsense inference task. Through
fine-tuning, these models have achieved state-of-the-art
results on several natural language understanding
benchmarks, such as SuperGLUE (Wang et al., 2019).
Additionally, being trained on several hundred
gigabytes of text may have endowed these models
with substantial commonsense knowledge (Petroni et al., 2019).
However, fine-tuning alone may not suffice for
tasks with limited training samples. Previous
work (Gururangan et al., 2020; Zhou et al., 2021a)
has shown that pre-training with objectives catered
to the target task, prior to fine-tuning, may improve
performance on such tasks. Following this intuition,
we propose a set of self-supervised pre-training
objectives to adapt language models to the contextual
commonsense inference task, specifically addressing
the multi-choice answer selection task.
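Concretely, if the pre-training objectives are cast in T5's text-to-text format, the continued training step reduces to standard sequence-to-sequence learning. The snippet below is a minimal sketch assuming HuggingFace Transformers; the source/target strings are illustrative placeholders, not our actual objectives, which are defined later in the paper.

```python
# Minimal sketch: continuing to train T5-Large on text-to-text examples.
# The (source, target) pair stands in for an instance produced by one of
# the commonsense-aware pre-training objectives (illustrative only).
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tok = T5TokenizerFast.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

source = "dialogue: A: I know I've put on weight this winter. ... cause: <extra_id_0>"
target = "<extra_id_0> lack of physical activity during the winter"

enc = tok(source, return_tensors="pt", truncation=True)
labels = tok(target, return_tensors="pt", truncation=True).input_ids
loss = model(**enc, labels=labels).loss  # standard cross-entropy seq2seq loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```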
Thus, our contribution in this paper is twofold:
i) we curate CICEROv2, containing multiple distinct
contextual commonsense inferences per dimension,
and ii) we propose a set of pre-training objectives
for contextual commonsense inference that improves
over vanilla fine-tuning by about 1.9% on the
multi-choice answer selection task, defined on both
the CICERO and CICEROv2 datasets.
2 Primer on CICERO
The dialogues in CICERO (Ghosal et al., 2022)
are sourced from three datasets: DailyDialog
(Li et al., 2017), MuTual (Cui et al., 2020),
and DREAM (Sun et al., 2019). All dialogues
are dyadic, and their nature is particularly
conducive to qualitatively rich utterance-level
inferences. The annotated inferences are
categorized into five dimensions: cause, subsequent
event, prerequisite, motivation, and emotional
reaction. The tasks proposed on these inferences
require contextual understanding, multi-utterance
reasoning, and commonsense knowledge.
In addition to introducing CICERO, Ghosal et al.
(2022) also define a multi-choice answer selection
task (MCQ), in which the original annotation is
considered the primary correct answer. The
candidates for the remaining correct and incorrect
answers are generated using fine-tuned T5 models
(Raffel et al., 2020a). Adversarial filtering
(Zellers et al., 2018a) is applied to these
candidates to identify hard-to-distinguish answers,
which are then manually labeled as correct or incorrect.
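To make the filtering step concrete, the following is a minimal sketch of an adversarial filtering loop in the spirit of Zellers et al. (2018a); it is not the exact procedure used to build CICERO. The TF-IDF/logistic-regression discriminator, the function name, and the swap heuristic are our illustrative choices.

```python
# Minimal sketch of adversarial filtering: repeatedly train a discriminator
# on the current answer set and swap distractors it separates easily for
# candidate-pool answers it finds harder to distinguish from correct ones.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def adversarial_filter(correct, distractors, pool, rounds=5):
    for _ in range(rounds):
        texts = correct + distractors
        labels = [1] * len(correct) + [0] * len(distractors)
        vec = TfidfVectorizer().fit(texts + pool)
        clf = LogisticRegression(max_iter=1000).fit(vec.transform(texts), labels)
        # P(correct): low for an easy distractor, high for a confusable one.
        p_dis = clf.predict_proba(vec.transform(distractors))[:, 1]
        p_pool = clf.predict_proba(vec.transform(pool))[:, 1]
        easiest, hardest = int(np.argmin(p_dis)), int(np.argmax(p_pool))
        if p_pool[hardest] > p_dis[easiest]:  # swap only if it raises difficulty
            distractors[easiest], pool[hardest] = pool[hardest], distractors[easiest]
    return distractors
```

In CICERO, the surviving hard candidates are then manually labeled, which is how secondary correct answers enter the dataset.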
Drawbacks of CICERO.
The automatically generated answers that are labeled
as correct are the only source of secondary correct
answers in the CICERO dataset. In total, close to
15% of the instances contain multiple correct answers
(inferences). We empirically analyzed these instances
and found that the adversarial filtering algorithm
favors alternate answers that are lexically close to
the primary correct answer. As such, both correct and
incorrect answers bear a relatively high degree of
token-level and semantic similarity to each other,
as indicated in Table 2 in terms of BLEU, ROUGE-L,
CIDEr, and semantic-similarity metrics. This belies
the multiview nature of commonsense-based explanations,
where multiple independent or related explanations of
the same event may exist. This is demonstrated in
Fig. 1, where the target utterance “I don’t think so. I
know I’ve put on weight this winter.” can be a
consequence of multiple possible events. In particular,
the weight gain can be caused by a lack of physical
activity and exercise, by an unhealthy diet, or by both.
Myriad other factors, such as disease, may also
contribute to the weight gain, but those multitudes of
possibilities or views are not captured in CICERO.
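For reference, the token-level and semantic similarity measurements can be reproduced with standard off-the-shelf tools. The snippet below is a minimal sketch assuming the nltk, rouge-score, and sentence-transformers packages; the two answer strings are invented examples, and CIDEr is omitted because it requires corpus-level statistics.

```python
# Minimal sketch: pairwise similarity between a primary and an alternate
# answer, using BLEU, ROUGE-L, and embedding cosine similarity.
# The example answers below are illustrative, not taken from CICERO.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

primary = "The speaker gained weight due to a lack of exercise in winter."
alternate = "The speaker put on weight because they did not exercise."

bleu = sentence_bleu([primary.split()], alternate.split(),
                     smoothing_function=SmoothingFunction().method1)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
    primary, alternate)["rougeL"].fmeasure
encoder = SentenceTransformer("all-MiniLM-L6-v2")
emb = encoder.encode([primary, alternate], convert_to_tensor=True)
cosine = util.cos_sim(emb[0], emb[1]).item()
print(f"BLEU={bleu:.3f}  ROUGE-L={rouge_l:.3f}  cosine={cosine:.3f}")
```

Averaged over answer pairs, such scores quantify how lexically and semantically interchangeable the correct and incorrect candidates are.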
3 CICEROv2
To address the drawbacks highlighted above, we
introduce CICEROv2, which aims to improve the
generalization ability of models trained on this data.
CICEROv2 contains commonsense inferences over
target utterances of dyadic dialogues sampled from
CICERO. A human annotator is given a dialogue with
a target utterance and is asked a question about that
utterance. The annotator writes multiple distinct
correct answers and two or more incorrect answers
to the question.
We start by sampling (dialogue, target, question)
triplets from CICERO. For these instances, we show
the annotators the original correct answer from
CICERO to avoid duplication. The annotators