Rethinking the Event Coding Pipeline with Prompt Entailment
Clément Lefebvre
Swiss Data Science Center
clement.lefebvre@datascience.ch
Niklas Stoehr
ETH Zürich
niklas.stoehr@inf.ethz.ch
Abstract
For monitoring crises, political events are extracted from the news. The large amount of unstructured full-text event descriptions makes a case-by-case analysis unmanageable, particularly for low-resource humanitarian aid organizations. This creates a demand to classify events into event types, a task referred to as event coding. Typically, domain experts craft an event type ontology, annotators label a large dataset and technical experts develop a supervised coding system. In this work, we propose PR-ENT [1], a new event coding approach that is more flexible and resource-efficient, while maintaining competitive accuracy: first, we extend an event description such as "Military injured two civilians" by a template, e.g. "People were [Z]", and prompt a pre-trained (cloze) language model to fill the slot Z. Second, we select suitable answer candidates Z = {"injured", "hurt", ...} by treating the event description as premise and the filled templates as hypotheses in a textual entailment task. In a final step, the selected answer candidate can be mapped to its corresponding event type. This allows domain experts to draft the codebook directly as labeled prompts and interpretable answer candidates. This human-in-the-loop process is guided by our codebook design tool [2]. We show that our approach is robust through several checks: perturbing the event description and prompt template, restricting the vocabulary and removing contextual information.
1 Introduction
Decision-makers in politics and humanitarian aid report a growing demand for comprehensive and structured overviews of socio-political events (Lepuschitz and Stoehr, 2021). For this purpose, newspapers are automatically screened for event mentions, a task referred to as event detection and
* Authors contributed equally.
[1] https://huggingface.co/spaces/clef/PRENT-Demo
[2] https://huggingface.co/spaces/clef/PRENT-Codebook
[Figure 1: (A) the conventional event coding pipeline; (B) our approach, Prompt Entailment (PR-ENT)]
Figure 1: (A) The conventional event coding pipeline involves many hand-overs between involved stakeholders and is strictly tailored to the event ontology. (B) Our approach combines prompting and textual entailment to perform flexible, unsupervised event coding.
extraction. The sheer amount of extracted, full-text event descriptions produced day-to-day is impossible for humans to parse, especially when limited by scarce financial and computational resources.
Event coding seeks to automatically classify event descriptions into pre-defined event types. Event coding is conventionally approached via a multi-step pipeline as shown in Fig. 1A. It incurs large costs in terms of human labor and time. We sketch out this pipeline expressed in human intelligence tasks (HITs) [3] (ul Hassan et al., 2013).
arXiv:2210.05257v2 [cs.CL] 5 May 2023
As a first step, an event ontology is defined in terms of a codebook. Codebook development requires multiple domain experts (Goldstein, 1992) spending up to 200 HITs. The widely-used Conflict and Mediation Event Observations (CAMEO) codebook (Schrodt, 2012), for instance, reports a 3-year initial development phase. Next, context-relevant event descriptions need to be collected to serve as training data. This often requires paid access to online newspaper distribution services and data collection infrastructure, estimated at 200 HITs. Next, human annotators need to be recruited and trained to annotate data according to the codebook, accounting for another 200 HITs. Finally, a machine-based coding system needs to be developed, trained and validated, costing another 200 HITs. In earlier days, systems were dictionary- and pattern-based (King and Lowe, 2003; Norris et al., 2017), while more recently machine learning-based approaches have gained momentum (Piskorski and Jacquet, 2020; Olsson et al., 2020; Hürriyetoğlu, 2021).
In total, the conventional event coding pipeline amounts to roughly 800 HITs. This development cost is often not bearable by non-profit / non-governmental organizations in the humanitarian aid sector. Moreover, the process requires multiple hand-overs between workers of different backgrounds, which leads to errors, misunderstandings and delays. It is also important to highlight that the developed coding system is specifically tailored to a fixed event ontology. Any post-hoc change of event types or even a different dataset incurs huge costs. In practice, event types frequently change and even vary widely between different divisions of the same organization.
To address these shortcomings, we present a new paradigm for highly adaptive event coding. Based on our method illustrated in Fig. 1B, domain experts are able to work directly with an interactive coding tool to design a codebook. They express event types by means of prompt templates and single-token answer candidates. For automated coding, a pre-trained language model is prompted to fill in those answer candidates, taking a full-text event description as an input. Since prompting can be noisy (Gao et al., 2021), we propose filtering answer candidates based on textual entailment. Specifically, our contributions are as follows: (1) We propose a methodology combining prompting (§3.1) and textual entailment (§3.2) for event coding, termed PR-ENT. (2) We thoroughly evaluate this paradigm based on three aspects: accuracy (§4.1), flexibility (§4.2) and efficiency (§4.3). (3) We present two online dashboards: (a) a demo of the PR-ENT coding tool and (b) an interactive codebook design tool that guides the codebook design by presenting accuracy validation in a human-in-the-loop manner (§6).
[3] In our formulation, one HIT corresponds to roughly one hour of low-skill work by a single person, such as reading and labeling single-sentence event descriptions. Our estimations are based on practical experience in working with domain experts and human annotators in the field of political event coding and serve the purpose of providing a very approximate quantification of required resources and labour.
2 Event Data and Types
We consider a subset of the Armed Conflict Location and Event Data (ACLED) (Raleigh et al., 2010) dataset. It is widely-used and has large coverage of political violence and protest events around the world. Each event is human-annotated with a short description, its event type and additional details such as the number of fatalities and actors and targets. The event types are based on ACLED's own event ontology, which distinguishes 6 higher-level and 25 lower-level event types. Some event types are easily separable (e.g. protests vs. battles), while others are harder to distinguish semantically (e.g. protests vs. riots) (see Fig. 9 in the appendix).
We sample 4000 ACLED events (3000 for training, 1000 for testing) in the African region while maintaining the event type distribution of the full dataset (see Fig. 9). We remove empty event descriptions and annotator notes (e.g. "[size: no report]"). In Fig. 8 in the appendix, we present statistics of the test set, showing different aspects of linguistic complexity. In §4.2, we consider the Global Terrorism Dataset (GTD) (LaFree and Dugan, 2007) to study the effect of domain shift.
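The sampling and cleaning steps above can be sketched as follows. This is our illustrative reconstruction, not the authors' released code; the column names `event_type` and `description` are assumptions, and the toy frame merely stands in for the ACLED subset, which requires data access.

```python
import re

import pandas as pd


def sample_stratified(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Draw roughly n events while preserving the event type distribution
    (proportional stratified sampling per event type)."""
    frac = n / len(df)
    return df.groupby("event_type", group_keys=False).sample(frac=frac, random_state=seed)


def clean_description(text: str) -> str:
    """Strip bracketed annotator notes such as '[size: no report]'."""
    return re.sub(r"\[[^\]]*\]", "", text).strip()


# Toy stand-in for the ACLED subset.
df = pd.DataFrame({
    "event_type": ["protests"] * 6 + ["battles"] * 4,
    "description": ["A march took place [size: no report]"] * 10,
})
sample = sample_stratified(df, n=5)  # keeps the 60/40 protests/battles split
```

`GroupBy.sample(frac=...)` draws the same fraction from every event type, so the sample's type distribution matches the full frame's up to rounding.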
3 Entailment-based Prompt Selection
Our approach, PR-ENT, represents a real-world use case of prompting and textual entailment to code event descriptions e ∈ E into event types y ∈ Y, as shown in Fig. 1B.
3.1 Prompting
Methodological Approach. In traditional supervised learning, a model is trained to learn a mapping between the input e and the output class y. Prompting (Liu et al., 2021) is a learning paradigm making use of (cloze) language models that have been trained to predict masked tokens within text. [4] Prompt-based learning transfers this capability to perform classification in the following way: We extend each event description e ∈ E by a template t ∈ T to form the input ⟨e, t⟩ ∈ E × T. Each template contains a masked slot [Z], e.g. "This event involves [Z]", "People were [Z]". [5] The language model takes ⟨e, t⟩ as input and returns an output distribution of probabilities over the answer vocabulary Z. Each token z_{e,t} ∈ Z can serve as a potential slot filler Z = z_{e,t}. However, we only consider the top K most probable answer candidates z^k_{e,t} ∈ Z^K_{e,t}. Z can be a constrained subset Z_t that only features a template-related answer vocabulary to increase interpretability, as pointed out in §5. We discuss how to map answer candidates to event types in §4.1.
Implementation Details. We discuss the design of templates and constrained answer vocabularies resulting in a codebook (Tab. 7) in §6. In particular, we prompt DistilBERT-base-uncased (Sanh et al., 2020), a distilled version of the BERT model which is more computationally efficient at the cost of a small performance decrease. For each prompt, we consider the K = 30 most probable tokens as the set of answer candidates Z^K_{e,t}. Ideally, we would select a larger set, but performance gains are minimal while computational costs increase in subsequent steps.
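The prompting step can be sketched as below. The helper that splices the event description and template is straightforward; the fill-mask call uses the Hugging Face `transformers` pipeline API with the model named above, but the exact pre- and post-processing are our assumptions, not the authors' released code.

```python
def build_prompt(event: str, template: str, mask_token: str = "[MASK]") -> str:
    """Extend an event description e by a template t, replacing the
    slot [Z] with the language model's mask token."""
    return f"{event} {template.replace('[Z]', mask_token)}"


def top_k_candidates(event: str, template: str, k: int = 30):
    """Return the K most probable single-token slot fillers for <e, t>.
    Calling this requires `transformers` and downloads DistilBERT."""
    from transformers import pipeline  # imported lazily to keep the sketch light

    fill = pipeline("fill-mask", model="distilbert-base-uncased", top_k=k)
    prompt = build_prompt(event, template, fill.tokenizer.mask_token)
    return [(r["token_str"].strip(), r["score"]) for r in fill(prompt)]


# Example input handed to the model:
# build_prompt("Military injured two civilians.", "People were [Z].")
# -> "Military injured two civilians. People were [MASK]."
```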
3.2 Textual Entailment
Limitations of Prompting. Prompting yields event-related tokens for event coding, but comes with challenges. There is no guarantee that a prompted answer candidate z^k_{e,t} ∈ Z^K_{e,t} is suited to represent an event. Answer candidates may be semantically unrelated, as shown in Fig. 2. To address this shortcoming, we propose filtering Z^K_{e,t} via textual entailment. Textual entailment, or natural language inference (NLI) (Fyodorov et al., 2000; Bowman et al., 2015), can be framed as the following task: given a "premise", verify whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral). It has been evaluated as a popular method for performing text classification (Wang et al., 2021).
[4] "Cloze" pertains to filling in missing tokens, not necessarily unidirectionally left-to-right, but anywhere in a string.
[5] The first prompt template is intended to provide a one-word summary of the event. For the second template, we expect a verb describing the actions undertaken by the actor or a verb that describes what happened to the target.
Figure 2: Given the event description "Several demonstrators were injured." and two templates (A) and (B), prompting alone can yield tokens that fit syntactically but not semantically (blue bar). In contrast, filtering prompted answer candidates via textual entailment leaves us with tokens more closely related to the event (orange bar). To this end, we treat the event description as premise and the filled template as hypothesis.
Selecting Entailed Answer Candidates. We consider the event description e as premise and the template t filled with a prompted answer candidate as hypothesis. For example, given the premise "Two bombs detonated...", we automatically construct hypotheses "This event involves [z^k_{e,t}]" with z^k_{e,t} ∈ Z^K_{e,t} = {explosives, civilians, ...}; see Tab. 1. We pass the concatenation of the premise and hypothesis to RoBERTa-large-mnli (Liu et al., 2019). If the model finds premise and hypothesis to be entailed, then the prompted answer candidate z^k_{e,t} is considered an entailed answer candidate z*_{e,t} (e.g. z*_{e,t} = explosives). We combine the categories "neutral" and "contradiction" into one, since we are interested in a hypothesis being entailed or not.
This means PR-ENT has two hyperparameters: the top K answer candidate tokens yielded by the prompting step and the acceptance threshold in the entailment step that governs whether an answer candidate is kept. We empirically analyse the effect of both hyperparameters on the final F1 classification score in Fig. 5. In Fig. 5A, we verify that considering the top 30 answer candidate tokens leads to good performance on average. Further, we find a suitable threshold of 0.5 on the entailment model's output probability in Fig. 5B.
Event Description + Template ⟨e, t⟩ | Answer Candidates z^k_{e,t} | Entailed Answer Candidates z*_{e,t}
"Several demonstrators were injured." + "People were [Z]." | arrested, killed, hospitalized, injured, evacuated, wounded, shot, homeless, hurt, detained | injured, wounded, hurt
"Several demonstrators were injured." + "This event involves [Z]." | fireworks, demonstrations, protests, violence, suicide, bicycles, shooting, strikes, motorcycles, cycling | demonstrations, protests, violence
"The sponsorship deal between the shoes brand and the soccer team was confirmed." + "This event involves [Z]." | sponsorship, nike, sponsors, fundraising, cycling, advertising, charity, donations, concerts, competitions | sponsorship, sponsors, advertising, competitions

Table 1: We prompt a language model based on an event description e and template t with slot [Z]. We keep only those prompted answer candidates z^k_{e,t} ∈ Z^K_{e,t} entailed in a subsequent textual entailment task, z*_{e,t} ∈ Z*_{e,t}.
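As a rough sketch, the entailment filter can be implemented as follows. The thresholding logic mirrors §3.2 (keep a candidate if its entailment probability reaches the threshold of 0.5); the scoring function assumes the Hugging Face `transformers` text-classification pipeline and the `roberta-large-mnli` label names, details the paper does not spell out.

```python
def filter_entailed(entailment_scores: dict, threshold: float = 0.5) -> list:
    """Keep answer candidates whose filled-template hypothesis is entailed
    by the event description with probability >= threshold."""
    return [z for z, p in entailment_scores.items() if p >= threshold]


def score_candidates(event: str, template: str, candidates: list) -> dict:
    """Score each filled template against the event description with an NLI model.
    Calling this requires `transformers` and downloads roberta-large-mnli."""
    from transformers import pipeline  # imported lazily to keep the sketch light

    nli = pipeline("text-classification", model="roberta-large-mnli", top_k=None)
    scores = {}
    for z in candidates:
        hypothesis = template.replace("[Z]", z)
        result = nli({"text": event, "text_pair": hypothesis})  # premise, hypothesis
        scores[z] = next(d["score"] for d in result if d["label"] == "ENTAILMENT")
    return scores
```

Usage would chain the two steps: `filter_entailed(score_candidates(e, t, top_k_candidates))`, with "neutral" and "contradiction" implicitly collapsed because only the entailment probability is thresholded.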
4 Evaluation: Event Classification
We compare PR-ENT against the conventional event coding pipeline in an evaluation along three dimensions: accuracy, flexibility and efficiency.
4.1 Accuracy
So far we have not discussed how to map entailed answer candidates z*_{e,t} ∈ Z*_{e,t} onto event types y ∈ Y. We choose to do hard prompting, as opposed to soft prompting. This means tokens in Z*_{e,t} are mapped onto event types y via a simple linear transform y = f(z*_{e,t}). When f is the identity function, no additional mapping is needed (§4.2). Hard prompting allows defining event types, i.e. an event ontology, in terms of interpretable answer candidates. As an example, we present an interpretable event ontology in Tab. 7 in the appendix. We use it to classify "lethal" and "non-lethal" events, as explained in §4.2. Generally, we observe a trade-off between accuracy and interpretability. We want different sets of entailed answer candidates to uniquely define different event types at a high accuracy. At the same time, we require each set to be limited to a few interpretable tokens that are highly representative of the event type. In the following, we learn a shallow mapping between Z*_{e,t} and the 6 high-level event types Y provided by the ACLED event ontology as ground truth.
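In the hard-prompting setting, coding an event reduces to a lookup: each event type is defined by a small set of labeled answer candidates, and an event is assigned the type whose set its entailed candidates overlap most. A minimal sketch, with a lethal/non-lethal ontology whose candidate sets we chose for illustration (the actual codebook is in Tab. 7 of the paper):

```python
def code_event(entailed: set, ontology: dict, default: str = "other") -> str:
    """Map a set of entailed answer candidates z*_{e,t} onto the event type
    whose labeled candidate set they overlap most."""
    best, best_overlap = default, 0
    for event_type, candidates in ontology.items():
        overlap = len(entailed & candidates)
        if overlap > best_overlap:
            best, best_overlap = event_type, overlap
    return best


# Illustrative ontology: event type -> labeled answer candidates.
ontology = {
    "lethal": {"killed", "died", "dead"},
    "non-lethal": {"injured", "wounded", "hurt"},
}
print(code_event({"injured", "wounded", "protests"}, ontology))  # -> non-lethal
```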
Baselines and Ceilings. As baselines, we consider bag-of-words (BoW) and GloVe (Pennington et al., 2014) embeddings of event descriptions. Embeddings are mapped onto event types via logistic regression (LR). Further, we contrast our PR-ENT with a prompting-only (PR) approach, also using logistic regression as a classification layer. As a ceiling model, we consider DistilBERT fine-tuned in a sequence classification task.

Model | Accuracy | F1 Score
BoW + LR | 80.5 | 77.1
GloVe + LR | 78.5 | 74.6
Random Tokens + BoW + LR | 77.1 | 72.2
PR + BoW + LR | 82.9 | 80.8
PR-ENT + BoW + LR | 85.1 | 83.7
DistilBERT | 87.1 | 86.0

Table 2: Classification of 6 event types in the ACLED dataset. As expected, DistilBERT performs best as it is fine-tuned specifically on this classification task. Our approach PR-ENT is more ad-hoc and does not fall far behind. The additional entailment step reduces noise compared to the prompting-only approach PR. On top of the two standard baselines using BoW and GloVe, we introduce an additional baseline where we select 10 random tokens from Z^K_{e,t} for each ⟨e, t⟩. Compared to all baselines, PR-ENT performs better.
Our Approach PR-ENT. To evaluate our approach, we only consider the template "This event involves [Z]" and construct a BoW feature matrix by extending the event descriptions e with the entailed answer candidates z*_{e,t}. The resulting feature matrix serves as input to logistic regression. We report classification results in Tab. 2 and find that PR-ENT is only outperformed by the supervised, fine-tuned DistilBERT ceiling, but performs better than all baselines.
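The feature construction for this experiment can be sketched with scikit-learn: append the entailed candidates to each event description, vectorize with bag-of-words, and fit a logistic regression. This is our reconstruction under assumed data formats, not the authors' exact setup, and the toy examples are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def extend(description: str, entailed: list) -> str:
    """Append the entailed answer candidates z*_{e,t} to the description e."""
    return description + " " + " ".join(entailed)


# Toy training data standing in for ACLED descriptions plus entailed candidates.
texts = [
    extend("Military injured two civilians.", ["injured", "wounded", "hurt"]),
    extend("Protesters marched downtown.", ["demonstrations", "protests"]),
    extend("Soldiers wounded a man.", ["injured", "hurt"]),
    extend("A crowd rallied peacefully.", ["protests", "demonstrations"]),
]
labels = ["violence", "protest", "violence", "protest"]

vectorizer = CountVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

# Unseen event: its entailed candidates pull the BoW features toward "violence".
pred = clf.predict(vectorizer.transform([extend("Police hurt bystanders.", ["injured", "hurt"])]))[0]
```

The entailed candidates act as model-generated keywords, which is why the resulting BoW features outperform BoW over the raw descriptions alone (Tab. 2).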