EMNLP23 Are All Steps Equally Important Benchmarking Essentiality Detection in Event Processes

EMNLP’23
Are All Steps Equally Important?
Benchmarking Essentiality Detection in Event Processes
Haoyu Wang1, Hongming Zhang1, Yueguan Wang2,3, Yuqian Deng1,
Muhao Chen3, Dan Roth1
1Department of Computer and Information Science, UPenn
2Department of Electronic Engineering, THU
3Department of Computer Science, USC
{why16gzl,hzhangal,yuqiand,danroth}@seas.upenn.edu,
wangyuel18@mails.tsinghua.edu.cn, muhaoche@usc.edu
Abstract
Natural language expresses events with vary-
ing granularities, where coarse-grained events
(goals) can be broken down into finer-grained
event sequences (steps). A critical yet over-
looked aspect of understanding event processes
is recognizing that not all step events hold equal
importance toward the completion of a goal. In
this paper, we address this gap by examining
the extent to which current models comprehend
the essentiality of step events in relation to a
goal event. Cognitive studies suggest that such
capability enables machines to emulate human
commonsense reasoning about preconditions
and necessary efforts of everyday tasks. We
contribute a high-quality corpus of (goal, step)
pairs gathered from the community guideline
website WikiHow, with steps manually anno-
tated for their essentiality concerning the goal
by experts. The high inter-annotator agreement
demonstrates that humans possess a consistent
understanding of event essentiality. However,
after evaluating multiple statistical and large-
scale pre-trained language models, we find that
existing approaches considerably underperform
compared to humans. This observation highlights
the need for further exploration into this
critical and challenging task.¹
1 Introduction
As a fundamental semantic primitive unit in human language
(Jackendoff, 1992), events play a pivotal role in facilitating
efficient communication among humans and safe interactions with the
world. Recently, the natural language processing (NLP) community has
made significant strides in helping machines comprehend events through
various directions, such as event extraction (Grishman et al., 2005;
Lin et al., 2020), event relation extraction (Ning et al., 2018a;
Wang et al., 2020a), event schema induction (Chambers, 2013; Dror
et al., 2023), and event-centric knowledge graph construction (Tandon
et al., 2015; Zhang et al., 2021a). However, most of these studies
primarily concentrate on modeling horizontal relationships between
events, neglecting the internal components of an event (i.e., how an
individual perceives an event mention).

1 The dataset and code are available at
http://cogcomp.org/page/publication_view/1023.

Figure 1: Illustration of steps to obtain a Ph.D. degree, with
essential steps marked by red stars. Successfully achieving the
overall goal typically necessitates the completion of these crucial
steps.
Computational and cognitive studies (Schank and Abelson, 1977; Zacks
and Tversky, 2001) indicate that humans can deconstruct a goal event
into a discrete representation of finer-grained step events,
ultimately facilitating the hierarchical organization of event-related
knowledge. As illustrated in Figure 1, when discussing the goal event
of “obtaining a Ph.D. degree”, we understand that several steps may
occur along the way. For instance, one might receive the offer, pass
the qualification exam, complete internships, publish papers, and
defend the dissertation. Among these steps, some are deemed essential
to the goal, while others are not. For instance, passing the
qualification exam is crucial for earning a Ph.D. degree, whereas
securing an internship is often not a requirement. This ability to
discern the essentiality of steps pertaining to various goals equips
humans with the commonsense needed to address problems and carry out
daily tasks. Similarly, understanding which steps are essential can
profoundly benefit numerous NLP applications. For instance, event
schema induction (Dror et al., 2023) relies on event-centric
information extraction to derive graphical representations of events
from text. In this context, understanding essentiality can enhance the
quality of induced schemas by eliminating hallucinations and
suggesting the addition of missing crucial events. Moreover, grasping
essentiality can potentially benefit intelligent systems for QA tasks
(Bisk et al., 2020) and task-oriented dialogue processing (Madotto
et al., 2020).

arXiv:2210.04074v3 [cs.CL] 28 Oct 2023
In this paper, we aim to assess the depth of un-
derstanding that current NLU models possess re-
garding events in comparison to human cognition.
To accomplish this, we introduce a new cognitively
inspired problem of detecting essential step events
in goal event processes and establish a novel bench-
mark, Essential Step Detection (ESD), to promote
research in this area. Specifically, we gather goals
and their corresponding steps from WikiHow² and
manually annotate the essentiality of various steps
in relation to the goal. Our experimental findings
reveal that although humans consistently perceive
event essentiality, current models still have a long
way to go to match this level of understanding.
2 Task and Data
The essential step detection task is defined as follows: for each
goal G and one of its sub-steps S, the objective is to predict whether
the failure of S will result in the failure of G. In our formulation,
G and S are presented as natural language sentences.
The construction of ESD includes two steps: (1)
Data Preparation and (2) Essentiality Annotation.
Details of these steps are provided below.
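
As a minimal sketch of the task definition above, one (goal, step) instance can be represented as a pair of sentences with a binary label, and a model as any function from (G, S) to a yes/no prediction. The names `GoalStepPair`, `Predictor`, `accuracy`, and `always_essential` are hypothetical and are not taken from the released code. Accuracy is used here only for illustration; this excerpt does not state which metric the benchmark reports. The two example pairs come from the Ph.D. illustration in the introduction.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class GoalStepPair:
    goal: str        # goal event G, a natural language sentence
    step: str        # step event S, a natural language sentence
    essential: bool  # True if the failure of S would cause the failure of G

# A predictor maps (goal, step) to a predicted essentiality label.
Predictor = Callable[[str, str], bool]

def accuracy(predict: Predictor, pairs: Iterable[GoalStepPair]) -> float:
    """Fraction of pairs whose predicted label matches the gold label."""
    pairs = list(pairs)
    correct = sum(predict(p.goal, p.step) == p.essential for p in pairs)
    return correct / len(pairs)

def always_essential(goal: str, step: str) -> bool:
    """Trivial baseline (an assumption for illustration): every step is essential."""
    return True

# Examples taken from the Ph.D. illustration in the introduction.
EXAMPLES = [
    GoalStepPair("obtain a Ph.D. degree", "pass the qualification exam", True),
    GoalStepPair("obtain a Ph.D. degree", "complete internships", False),
]

if __name__ == "__main__":
    print(accuracy(always_essential, EXAMPLES))
```

Keeping the predictor as a plain function makes it easy to swap in a statistical or pre-trained model without changing the evaluation loop.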
2.1 Data Preparation
WikiHow is a widely-used and well-structured resource for exploring
the relationship between goal-oriented processes and their
corresponding steps (Koupaee and Wang, 2018; Zhang et al., 2020b). To
the best of our knowledge, it is the most appropriate resource for the
purpose of our research. Consequently, we begin by collecting 1,000
random goal-oriented processes from WikiHow. To avoid oversimplified
and overly complex processes, we only retain those with three to ten
steps. Furthermore, given that all WikiHow processes and their
associated steps are carefully crafted by humans, the majority of the
steps mentioned are essential. To achieve balance in the dataset, we
enlist crowdsourcing workers to contribute optional steps (i.e., those
that could occur as part of the process but are not essential)³. We
employ three annotators from Amazon Mechanical Turk⁴, who are native
English speakers, to provide optional steps for each goal. To ensure
high-quality annotations, we require annotators to hold the “Master
annotator” title. The average cost and time for supplying annotations
are 0.1 USD and 32 seconds per instance (approximately 12 USD per
hour).

2 WikiHow is a community website featuring extensive collections of
step-by-step guidelines.

                       Essential   Non-essential   Total
Number of instances        1,118             397   1,515
Average step length         17.1            17.4    17.2

Table 1: Dataset statistics of ESD. The average step length
represents the mean number of tokens per step.
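
The sampling and length filter described in Section 2.1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the input format (a dictionary from goal sentence to its list of steps), the function name `sample_and_filter`, and the fixed random seed are all hypothetical.

```python
import random
from typing import Dict, List

def sample_and_filter(
    processes: Dict[str, List[str]],
    sample_size: int = 1000,
    min_steps: int = 3,
    max_steps: int = 10,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Randomly sample processes, then keep those with 3 to 10 steps."""
    rng = random.Random(seed)
    goals = sorted(processes)  # sort first so the sample is reproducible
    sampled = rng.sample(goals, min(sample_size, len(goals)))
    return {
        goal: processes[goal]
        for goal in sampled
        if min_steps <= len(processes[goal]) <= max_steps
    }

if __name__ == "__main__":
    toy = {
        "goal with two steps": ["a", "b"],
        "goal with three steps": ["a", "b", "c"],
        "goal with eleven steps": [str(i) for i in range(11)],
    }
    print(sorted(sample_and_filter(toy)))
```

Sampling before filtering follows the order in which the text describes the two operations; the number of processes that survive the filter is not stated in this excerpt.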
2.2 Essentiality Annotation
Given that our task necessitates a profound understanding of the
events and careful consideration, we ensure annotation quality by
employing three well-trained research assistants from our department
rather than ordinary annotators to conduct the essentiality
annotations. For each goal-step pair, annotators are asked to rate it
as 0 (non-essential), 1 (essential), or -1 (the step is not a valid
step for the target goal, or the goal/step contains confidential or
hostile information)⁵. Since all annotators are well-trained and fully
comprehend our task, we discard any pair that is deemed invalid (i.e.,
-1) by at least one annotator. This results in 1,515 pairs being
retained. We determine the final label based on majority voting. The
dataset statistics can be found in Table 1. Altogether, we compile
1,118 essential and 397 non-essential "goal-step" pairs. The
inter-annotator agreement, measured by Fleiss’s Kappa⁶, is 0.611,
signifying the high quality of ESD.
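
The aggregation rule and the agreement statistic described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the kappa implementation follows the standard Fleiss formula rather than the tool cited in the footnote, and computing kappa over the two retained labels (0 and 1) is an assumption, since the text does not say whether invalid ratings were included.

```python
from collections import Counter
from typing import List, Optional, Sequence

def aggregate(ratings: Sequence[int]) -> Optional[int]:
    """Discard a pair if any annotator rated it -1; otherwise take the majority label."""
    if -1 in ratings:
        return None
    return Counter(ratings).most_common(1)[0][0]

def fleiss_kappa(rows: List[Sequence[int]], categories: Sequence[int] = (0, 1)) -> float:
    """Fleiss' kappa for items that are each rated by the same number of raters."""
    n_items = len(rows)
    n_raters = len(rows[0])
    # counts[i][j]: number of raters who assigned category j to item i
    counts = [[list(row).count(c) for c in categories] for row in rows]
    # Observed agreement, averaged over items
    p_items = [
        (sum(c * c for c in item) - n_raters) / (n_raters * (n_raters - 1))
        for item in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall category proportions
    p_cat = [
        sum(item[j] for item in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)

if __name__ == "__main__":
    toy_ratings = [(1, 1, 1), (0, 0, 0), (1, 1, 0), (1, 0, 0)]
    print([aggregate(r) for r in toy_ratings])
    print(fleiss_kappa(toy_ratings))
```

With three annotators and two valid labels, a majority always exists, so no tie-breaking rule is needed once the pairs containing -1 have been dropped.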
3 Experiments
Recently, large-scale pre-trained language models
have exhibited impressive language understanding
capabilities. To assess the extent to which these
models truly understand events, we evaluate them
3 The survey template is shown in Appendix Figure 2.
4 https://www.mturk.com/
5 The survey template is shown in Appendix Figure 3.
6 We utilize tools from https://github.com/Shamya/FleissKappa.