
numerous NLP applications. For instance, event schema induction (Dror et al., 2023) relies on event-centric information extraction to derive graphical representations of events from text. In this context, understanding essentiality can enhance the quality of induced schemas by eliminating hallucinations and suggesting the addition of missing crucial events. Moreover, grasping essentiality can potentially benefit intelligent systems for QA tasks (Bisk et al., 2020) and task-oriented dialogue processing (Madotto et al., 2020).
In this paper, we aim to assess the depth of understanding that current NLU models possess regarding events in comparison to human cognition. To accomplish this, we introduce a new cognitively inspired problem of detecting essential step events in goal event processes and establish a novel benchmark, Essential Step Detection (ESD), to promote research in this area. Specifically, we gather goals and their corresponding steps from WikiHow² and manually annotate the essentiality of various steps in relation to the goal. Our experimental findings reveal that although humans consistently perceive event essentiality, current models still have a long way to go to match this level of understanding.
2 Task and Data
The essential step detection task is defined as follows: for each goal G and one of its sub-steps S, the objective is to predict whether the failure of S will result in the failure of G. In our formulation, G and S are presented as natural language sentences.
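As an illustration, a goal-step pair can be cast as a yes/no query to a model. The following minimal sketch shows one possible framing; the prompt wording is our own illustration, not a template taken from the evaluation itself.

```python
def make_prompt(goal: str, step: str) -> str:
    # Illustrative wording only; the actual evaluation prompt may differ.
    return (
        f"Goal: {goal}\n"
        f"Step: {step}\n"
        "Question: if this step fails, will the goal fail? Answer yes or no."
    )

print(make_prompt("Make a cup of tea", "Boil water"))
```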
The construction of ESD includes two steps: (1)
Data Preparation and (2) Essentiality Annotation.
Details of these steps are provided below.
2.1 Data Preparation
WikiHow is a widely-used and well-structured
resource for exploring the relationship between
goal-oriented processes and their corresponding
steps (Koupaee and Wang, 2018; Zhang et al., 2020b). To the best of our knowledge, it is the most appropriate resource for the purpose of our research. Consequently, we begin by collecting 1,000 random goal-oriented processes from WikiHow. To avoid oversimplified and overly complex processes,
we only retain those with three to ten steps.

                      Essential  Non-essential  Total
Number of instances       1,118            397  1,515
Average step length        17.1           17.4   17.2

Table 1: Dataset statistics of ESD. The average step length represents the mean number of tokens per step.

² WikiHow is a community website featuring extensive collections of step-by-step guidelines.

Furthermore, given that all WikiHow processes and their associated steps are carefully crafted by humans, the majority of the steps mentioned are essential.
To achieve balance in the dataset, we enlist crowdsourcing workers to contribute optional steps (i.e., those that could occur as part of the process but are not essential)³. We employ three annotators from Amazon Mechanical Turk⁴, who are native English speakers, to provide optional steps for each goal. To ensure high-quality annotations, we require annotators to hold the "Master annotator" title. The average cost and time for supplying annotations are 0.1 USD and 32 seconds per instance (approximately 12 USD per hour).
2.2 Essentiality Annotation
Given that our task necessitates a profound understanding of the events and careful consideration, we ensure annotation quality by employing three well-trained research assistants from our department rather than ordinary annotators to conduct the essentiality annotations. For each goal-step pair, annotators are asked to rate it as 0 (non-essential), 1 (essential), or -1 (the step is not a valid step for the target goal, or the goal/step contains confidential or hostile information)⁵. Since all annotators are well-trained and fully comprehend our task, we discard any pair that is deemed invalid (i.e., -1) by at least one annotator. This results in 1,515 pairs being retained. We determine the final label based on majority voting. The dataset statistics can be found in Table 1. Altogether, we compile 1,118 essential and 397 non-essential "goal-step" pairs. The inter-annotator agreement, measured by Fleiss's Kappa⁶, is 0.611, signifying the high quality of ESD.
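The aggregation procedure described above can be sketched in a few lines. This is an illustrative re-implementation, not the authors' code (the agreement score itself was computed with the Shamya/FleissKappa tool): `aggregate` drops any pair rated -1 by at least one annotator and majority-votes the rest, and `fleiss_kappa` implements the standard formula over an item-by-category count table.

```python
from collections import Counter

def aggregate(ratings):
    """Per-pair annotator labels (e.g., [1, 1, 0]) -> final dataset labels."""
    labels = []
    for r in ratings:
        if -1 in r:
            continue  # invalid for at least one annotator: discard the pair
        labels.append(Counter(r).most_common(1)[0][0])
    return labels

def fleiss_kappa(table):
    """Fleiss' kappa; table[i][j] = number of raters giving item i category j."""
    n_items = len(table)
    n_raters = sum(table[0])
    # mean observed agreement across items
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items
    # chance agreement from marginal category proportions
    totals = [sum(row[j] for row in table) for j in range(len(table[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

With three annotators and binary labels, majority voting never ties, so `Counter.most_common` is safe here.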
3 Experiments
Recently, large-scale pre-trained language models
have exhibited impressive language understanding
capabilities. To assess the extent to which these
models truly understand events, we evaluate them
³ The survey template is shown in Appendix Figure 2.
⁴ https://www.mturk.com/
⁵ The survey template is shown in Appendix Figure 3.
⁶ We utilize tools from https://github.com/Shamya/FleissKappa.