Multiview Contextual Commonsense Inference: A New Dataset and Task
Siqi Shen  Deepanway Ghosal  Navonil Majumder  Henry Lim  Rada Mihalcea  Soujanya Poria
University of Michigan, USA
DeCLaRe Lab, Singapore University of Technology and Design, Singapore
{shensq,mihalcea}@umich.edu
deepanway_ghosal@mymail.sutd.edu.sg
{navonil_majumder@,henry_lim@,sporia@}sutd.edu.sg
CICEROv2 is available at: https://declare-lab.github.io/CICERO
Abstract
Multiview contextual commonsense inference is the task of determining commonsense explanations around the events in a dyadic dialogue, where multiview refers to the characteristic that there can be multiple plausible but independent inferences. Producing a coherent and non-trivial explanation requires awareness of the dialogue's structure and how an event is grounded in the context, yet there is a lack of high-quality resources dedicated to the task. In this work, we create CICEROv2, a dataset consisting of 8,351 instances from 2,379 dialogues, containing multiple human-written answers for each contextual commonsense inference question, representing a type of explanation on cause, subsequent event, motivation, and emotional reaction. We show that the inferences in CICEROv2 are of higher semantic diversity than other contextual commonsense inference datasets. In addition, we propose a collection of pretraining objectives, including concept denoising and utterance sorting, to help adapt language models for the multiview contextual commonsense inference task. Evaluation results show the effectiveness of the pretraining stage, as there is a universal improvement in accuracy for all inference types.
1 Introduction
Perhaps unwittingly, commonsense is a key part of daily conversations. Rather than being explicit, interlocutors usually rely on shared context and commonsense knowledge to make sense of the inbound utterances and respond as succinctly as possible to maximize information flow (Grice, 1975). The scope of this shared context, however, quite often extends beyond the given conversation. Understanding the various dimensions of such conversations is thus rather challenging for NLP systems without the aid of commonsense-based reasoning. Some of the useful dimensions, such as the cause, subsequent events, and motivation behind a given utterance, can be extracted from the explicit context. Otherwise, the broader context that fits the explicit context must be imagined. Either way, commonsense knowledge must be employed with the context in mind to broaden the context if necessary and arrive at a fitting explanation. Inferring such explanations for various dimensions with the context and commonsense-based reasoning is called contextual commonsense inference.

An accurate understanding of dialogues achieved through contextual commonsense inference can assist in meaningful indexing, filtering, and searching of the copious amount of conversational content available on the internet. Tasks like affect analysis and relation extraction in dialogues may also benefit from such explanations.
To this end, the CICERO dataset (Ghosal et al., 2022) collects five dimensions of contextual commonsense inferences for utterances in dialogues. However, for each present dimension-utterance pair, only one human-annotated explanation is collected. The remaining explanations, if any, are picked using adversarial filtering (Zellers et al., 2018a) from a set of explanations generated by fine-tuned language models. These auto-generated explanations are both lexically and semantically very close to the human-annotated explanation. This contradicts the intuitive multiview nature of these explanations, where multiple disparate explanations for the same event may exist (see Fig. 1). CICEROv2 seeks to address this issue by collecting multiple distinct human-annotated explanations, leading to the enrichment of downstream models for the contextual commonsense inference task.
The availability of multiple correct answers brings the need for methods that can simultaneously select multiple correct answers from a mixture of correct and incorrect answers given a context. Ghosal et al. (2022) show that, given a context, selecting two correct answers is harder than selecting just one. On CICERO, T5-Large attains an Exact Match (EM) score of 95% on the single answer selection task, but this score drops to 20% on the multiple answer selection task. This hardness means that models need to encode rich commonsense knowledge to solve the task. In this work, we attempt to instill commonsense knowledge into a large pre-trained language model, T5-Large, by continuing to train it on the dialogue-level commonsense dataset CICERO (Ghosal et al., 2022) using a set of commonsense-aware pre-training objectives.
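As a point of reference, the following is a minimal sketch, not the authors' evaluation code, of how exact match can be computed for multiple answer selection: an instance counts as correct only when the predicted set of options equals the gold set. Function and option names are illustrative.

```python
from typing import List, Set

def exact_match(predicted: List[Set[str]], gold: List[Set[str]]) -> float:
    """Fraction of instances where the predicted option set equals the gold set.

    In single answer selection each set holds one option; in multiple answer
    selection a prediction counts only if every correct option is chosen and
    no incorrect one is, which is why EM drops sharply on that task.
    """
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Illustrative option labels, not actual dataset instances.
preds = [{"A", "C"}, {"B"}]
golds = [{"A", "C"}, {"B", "D"}]
print(exact_match(preds, golds))  # 0.5: the second prediction misses option "D"
```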
Large pre-trained language models, such as GPT-2 (Radford et al., 2019) and T5 (Raffel et al., 2020b), seem attractive frameworks for solving the contextual commonsense inference task. Through fine-tuning, these models have become state of the art in several natural language understanding tasks, such as SuperGLUE (Wang et al., 2019). Additionally, being trained on several hundreds of gigabytes of text may have endowed these models with much commonsense knowledge (Petroni et al., 2019).

However, the fine-tuning approach may not suffice for tasks with limited training samples. Nonetheless, previous work (Gururangan et al., 2020; Zhou et al., 2021a) has shown that, prior to fine-tuning, pre-training with objectives catered to the target tasks may improve performance on such tasks. Following this intuition, we propose a set of self-supervised pre-training objectives to adapt language models for the contextual commonsense inference task, specifically addressing the multi-choice answer selection setup.
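To make the idea of task-adaptive objectives concrete, the snippet below sketches one plausible way to cast an utterance sorting objective in T5's text-to-text format: the dialogue's utterances are shuffled and the model must output the permutation that restores the original order. The prompt wording and target encoding are assumptions for illustration, not the exact scheme used in this work.

```python
import random
from typing import List, Tuple

def utterance_sorting_pair(utterances: List[str], seed: int = 0) -> Tuple[str, str]:
    """Build one (source, target) training pair for an utterance sorting objective.

    The source shows the utterances in shuffled order with indices; the target
    lists, for each original position, the index of that utterance in the
    shuffled source. Formatting details are illustrative.
    """
    rng = random.Random(seed)
    order = list(range(len(utterances)))
    rng.shuffle(order)  # shuffled[i] = utterances[order[i]]
    shuffled = [f"({i}) {utterances[j]}" for i, j in enumerate(order)]
    source = "sort utterances: " + " ".join(shuffled)
    target = " ".join(str(order.index(k)) for k in range(len(utterances)))
    return source, target

dialogue = [
    "What's that smell?",
    "Are you making a chocolate cake?",
    "No, I'm making chocolate banana cookies.",
]
src, tgt = utterance_sorting_pair(dialogue)
print(src)
print(tgt)  # indices in the shuffled source that restore the original order
```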
Thus, our contribution in this paper is twofold: i) we curate CICEROv2, containing multiple distinct contextual commonsense inferences per dimension, and ii) we propose a set of pre-training objectives for contextual commonsense inference that improves over vanilla fine-tuning by about 1.9% on the multi-choice answer selection task, defined on both the CICERO and CICEROv2 datasets.
2 Primer on CICERO
The dialogues in CICERO (Ghosal et al., 2022) are sourced from three different datasets: DailyDialog (Li et al., 2017), MuTual (Cui et al., 2020), and DREAM (Sun et al., 2019). All dialogues are dyadic, and their inherent nature is particularly conducive to qualitatively rich utterance-level inferences. These annotated inferences are categorized into five dimensions: cause, subsequent event, prerequisite, motivation, and emotional reaction. The tasks proposed on these inferences require contextual understanding, multi-utterance reasoning, and commonsense knowledge.
In addition to introducing CICERO, Ghosal et al. (2022) also define a multi-choice answer selection task (MCQ), where the original annotation is considered the primary correct answer. The candidates for the remaining correct and incorrect answers are generated using fine-tuned T5 models (Raffel et al., 2020a). Adversarial filtering (Zellers et al., 2018a) is applied to these candidates to identify the hard-to-distinguish answers, which are then manually labeled as correct or incorrect.
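For context, adversarial filtering (Zellers et al., 2018a) roughly works as sketched below: a discriminator scores how plausible each machine-generated candidate looks, the candidates it confidently rejects are replaced with fresh generations, and the loop repeats so that only hard-to-distinguish candidates survive. The interfaces and constants here are simplified placeholders, not the exact setup used to build CICERO's MCQ options.

```python
from typing import Callable, List

def adversarial_filtering(
    context: str,
    candidates: List[str],
    score: Callable[[str, str], float],         # discriminator: how "correct" a candidate looks
    generate: Callable[[str, int], List[str]],  # produces fresh machine-written candidates
    n_keep: int = 3,
    n_rounds: int = 3,
) -> List[str]:
    """Keep the candidates a discriminator finds hardest to reject.

    Each round, the lowest-scoring (easiest) candidates are swapped for newly
    generated ones. Simplified: a real setup retrains the discriminator on the
    full dataset between rounds rather than reusing a fixed scorer.
    """
    pool = list(candidates)
    for _ in range(n_rounds):
        pool.sort(key=lambda c: score(context, c), reverse=True)
        n_easy = len(pool) - n_keep
        pool = pool[:n_keep]
        if n_easy > 0:
            pool += generate(context, n_easy)
    pool.sort(key=lambda c: score(context, c), reverse=True)
    return pool[:n_keep]
```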
Drawbacks of CICERO. The automatically-generated and labeled-as-correct answers are the only sources of secondary correct answers in the CICERO dataset. In total, close to 15% of the instances contain multiple correct answers (inferences). We empirically analyzed these instances and found that the adversarial filtering algorithm favors the selection of alternate answers that are lexically close to the primary correct answer. As such, both correct and incorrect answers bear a relatively high degree of token-level and semantic similarity to each other, as indicated in Table 2 in terms of BLEU, ROUGE-L, CIDEr, and semantic-similarity metrics. This belies the multiview nature of commonsense-based explanations, where multiple independent or related explanations of the same event may exist. This is demonstrated in Fig. 1, where the target utterance "I don't think so. I know I've put on weight this winter." can be a consequence of multiple possible events. In particular, the weight gain can be caused by a lack of physical activity and exercise, by an unhealthy diet, or perhaps both. There are myriad other possible factors that may contribute to the weight gain, such as disease, but those multitudes of possibilities or views are not captured in CICERO.
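As one concrete way to quantify such closeness, the snippet below sketches how the lexical and semantic similarity of alternate answers to the primary answer could be measured; the specific sentence-embedding model named here is an assumption for illustration and not necessarily the one behind Table 2.

```python
# pip install nltk sentence-transformers
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
from sentence_transformers import SentenceTransformer, util

def answer_similarity(primary: str, alternates: list) -> dict:
    """Average token-level (BLEU) and semantic (cosine) similarity of the
    alternate answers to the primary answer; high values mean near-duplicates."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    smooth = SmoothingFunction().method1
    bleu = sum(
        sentence_bleu([primary.split()], alt.split(), smoothing_function=smooth)
        for alt in alternates
    ) / len(alternates)
    embeddings = model.encode([primary] + alternates, convert_to_tensor=True)
    cosine = util.cos_sim(embeddings[:1], embeddings[1:]).mean().item()
    return {"bleu": bleu, "semantic": cosine}

print(answer_similarity(
    "The speaker was making banana cookies.",
    ["The speaker is making banana cookies at home."],
))
```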
3 CICEROv2
To address the drawbacks highlighted earlier, we introduce CICEROv2, which aims to improve the generalization ability of models trained on this data. CICEROv2 contains commonsense inferences for target utterances of dyadic dialogues sampled from CICERO. A human annotator is given a dialogue with a target utterance and asked a question about the target utterance. The annotator writes multiple distinct correct answers and two or more incorrect answers for the question.
Figure 1: Demonstration of multiple possible contextual explanations through multiple commonsense-based mechanisms. (The figure shows a dialogue in which Linda says she has put on weight this winter, together with commonsense chains that support multiple views: causes such as "Linda didn't exercise regularly during the winter" and "Linda was following an unhealthy diet", subsequent events such as "Linda starts a diet and tries to lose weight", and emotional reactions such as the listener being confused by Linda's concern about her physique.)

We start by sampling (dialogue, target, question) triplets from CICERO. For these instances, we show the original correct answer from CICERO to the annotators to avoid duplication. The annotators write at least one more correct answer and at least two incorrect answers that are semantically distinct from each other and from the answer in CICERO. This original answer and the newly written answer(s) constitute the set of answers for these instances. We also sample new (dialogue, target, question) triplets, not present in CICERO. The annotators write at least two correct answers and two incorrect answers for these instances.

The above strategy ensures that all instances in CICEROv2 have at least two correct and two incorrect answers.
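To make the resulting structure concrete, here is a hypothetical instance in the spirit of the above description, built from the Fig. 1 dialogue; the field names and the incorrect answers are illustrative only and do not reflect the released data format or annotations.

```python
# A hypothetical CICEROv2-style instance; field names and the incorrect
# answers are illustrative, not the released schema or annotations.
instance = {
    "dialogue": [
        "A: Linda, would you care for some candies or cookies?",
        "B: No, don't try to tempt me. I'm becoming chubby and I have to slender down.",
        "A: You are not really chubby. You are actually thin enough.",
        "B: I don't think so. I know I've put on weight this winter.",
    ],
    "target": "I don't think so. I know I've put on weight this winter.",
    "question": "What is the event that directly causes or could cause the target?",
    "correct_answers": [
        "Linda didn't exercise regularly during the winter.",
        "Linda was following an unhealthy diet.",
    ],
    "incorrect_answers": [
        "Linda exercised intensely every day throughout the winter.",
        "Linda lost a noticeable amount of weight over the winter.",
    ],
}
print(len(instance["correct_answers"]), len(instance["incorrect_answers"]))  # at least two of each
```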
3.1 Annotation Instructions
Guidelines for Writing Correct Answers. We instruct the annotators to write context-congruent correct answers that are grammatically sound and concise sentences. The answers may contain some important terms from the context and must be commonsense-based, factual, and plausible.

Guidelines for Writing Incorrect Answers. The incorrect answers should also be grammatically correct and concise, but must contradict some information in the dialogue. Incorrect answers should contain some important terms from the context and must be commonsense-based and factual. Annotators were instructed not to write incorrect answers that are clearly outlandish in the given context.

We also ask the annotators to write sufficiently diverse and distinct correct and incorrect answers. This diversity may stem from token-level differences, semantic differences, or various likely speculative scenarios around the given context. Human-written, diverse incorrect answers are a major contribution of CICEROv2 and are absent from CICERO. We discuss the diversity of answers in CICERO and CICEROv2 in more detail in §3.3.
We collect inferences across four different dimensions in CICEROv2: subsequent event, cause, motivation, and emotional reaction w.r.t. the target. The prerequisite dimension from CICERO is skipped, as the annotators found it difficult to distinguish from cause during annotation training. The annotators are asked to write correct and incorrect answer(s) to the questions representing each of the four inference dimensions. We expand on the annotation instructions outlined by Ghosal et al. (2022) for answer writing. Both correct and incorrect answers may describe either an overt or a speculative scenario, as illustrated in CICERO. An overt answer is explicitly or implicitly present in the dialogue context. However, when a dialogue does not explicitly or implicitly hold the answer to a question about a particular target, the answer is speculated within a dialogue context that is imagined and broadened using commonsense and world knowledge.

The following illustrates the questions and possible correct and incorrect answer(s) for the (dialogue, target) pair shown in Fig. 2.
Figure 2: A (dialogue, target) pair; the utterance with the red border is the target. (The dialogue: "What's that smell?" / "Are you making a chocolate cake?" / "No, I'm making chocolate banana cookies." / "I smell something different, pears?" / "At first I was going to use the oranges, but I think these will taste better.")
Q1. What subsequent event happens (overt) or could happen (speculative) following the Target? The annotators write about the event that happens or could happen following the target. They are also made aware that at times such subsequent events could be triggered by the target itself.

CICERO Correct Answer: The speaker made delicious banana cookies. Incorrect Answers: i) The speaker is making a chocolate cake. ii) The speaker was baking a cake.

CICEROv2 Correct Answer: The speaker threw the leftover oranges into the rubbish bin. Incorrect Answers: i) The listener requests to taste the orange cookies. ii) The listener started to make orange chocolate cookies.
Q2. What is the event that directly causes (overt) or could cause (speculative) the Target? The annotators consider the events antecedent to the target that cause or likely cause the target.

CICERO Correct Answer: The speaker was making banana cookies. Incorrect Answers: i) The speaker is making a chocolate cake. ii) The speaker was baking a cake.

CICEROv2 Correct Answers: i) It is too difficult to process the orange pulp. ii) The orange smell doesn't match well with chocolate. Incorrect Answers: i) The orange smell matches much better with chocolate compared with banana. ii) The speaker loves the taste of orange and the texture of its pulp.
Q3. What is the emotion or basic human drive that motivates or could motivate the Target? We ask the annotators to consider the basic human drives and needs of the speaker of the target utterance. The basic human drives include food, water, clothing, rest, safety, friends, relationships, enjoyment, etc. Do (or may) any of these human drives, states of mind, or emotional feelings motivate the target?

CICERO Answers: Instance not present.

CICEROv2 Correct Answers: i) The speaker wants the cookies to be delicious. ii) The oranges were not sweet enough for the cookies. Incorrect Answers: i) The speaker prefers spicy cookies. ii) The speaker wants to use the leftover pears before they go bad.
Q4. What is the possible emotional reaction of the listener: A (or B)? What could be the possible emotional reaction of the listener to the target? The annotators capture the appropriate emotion of the listener using the emotion terms listed in Table 7 in the Appendix, either verbatim or with related words (e.g., anxious, confused, interested).

CICERO Correct Answer: The listener is excited to eat the cookies. Incorrect Answers: i) The listener is excited to eat the salad. ii) The listener is excited to eat the muffins instead.

CICEROv2 Correct Answer: The listener feels pity that they cannot have orange cookies. Incorrect Answers: i) The listener is happy to taste orange cookies. ii) The listener is annoyed by the banana smell.
3.2 Sampling of Dialogues and Targets
From the (dialogue,target,question) triplets in CI-
CERO, the following criteria is used to subsample
a set of triplets for annotation:
The target utterance must contain at least one
non-stop verb word and more non-stop words than
stop words.
If the dialogue is from DailyDialog, then the
dialogue-act label of the target utterance must ei-
ther be directive or commissive (Li et al.,2017).
These sampled target utterances often describe
some action or activity, which the annotators found
easier to annotate across the four question types.
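The following is a rough sketch of the first criterion under assumed tooling (spaCy for POS tags and stopwords; the paper does not state its implementation); the function and model names are illustrative.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed tokenizer/tagger, for illustration

def is_valid_target(utterance: str) -> bool:
    """Check the target-utterance criterion: at least one non-stopword verb,
    and more non-stopwords than stopwords (punctuation ignored)."""
    tokens = [t for t in nlp(utterance) if not t.is_punct and not t.is_space]
    non_stop = [t for t in tokens if not t.is_stop]
    has_content_verb = any(t.pos_ == "VERB" for t in non_stop)
    return has_content_verb and len(non_stop) > len(tokens) - len(non_stop)

for utt in ["At first I was going to use the oranges, but I think these will taste better.",
            "Yes, it is."]:
    print(utt, "->", is_valid_target(utt))
```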
Overall, 17% of the correct answer annotations in CICEROv2 also appear in CICERO. However, there is no overlap between the incorrect answers in the two datasets. Crucially, CICEROv2 contains a fully manually annotated and semantically diverse set of commonsense-based correct and incorrect answers that capture distinct perspectives or views. We expand upon the diversity of the answers next.
3.3 Diversity of Answers
Answers in CICEROv2 are significantly more diverse than those in CICERO. We observe this trend among both correct and incorrect answers. As such, CICEROv2 provides much richer and more diversified answers.