Rethinking the Event Coding Pipeline with Prompt Entailment
Clément Lefebvre
Swiss Data Science Center
clement.lefebvre@datascience.ch
Niklas Stoehr
ETH Zürich
niklas.stoehr@inf.ethz.ch
Abstract
For monitoring crises, political events are extracted from the news. The large amount of unstructured full-text event descriptions makes a case-by-case analysis unmanageable, particularly for low-resource humanitarian aid organizations. This creates a demand to classify events into event types, a task referred to as event coding. Typically, domain experts craft an event type ontology, annotators label a large dataset and technical experts develop a supervised coding system. In this work, we propose PR-ENT [1], a new event coding approach that is more flexible and resource-efficient, while maintaining competitive accuracy: first, we extend an event description such as "Military injured two civilians" by a template, e.g. "People were [Z]", and prompt a pre-trained (cloze) language model to fill the slot Z. Second, we select suitable answer candidates Z = {"injured", "hurt", ...} by treating the event description as premise and the filled templates as hypotheses in a textual entailment task. In a final step, the selected answer candidate can be mapped to its corresponding event type. This allows domain experts to draft the codebook directly as labeled prompts and interpretable answer candidates. This human-in-the-loop process is guided by our codebook design tool [2]. We show that our approach is robust through several checks: perturbing the event description and prompt template, restricting the vocabulary and removing contextual information.
1 Introduction
Decision-makers in politics and humanitarian aid report a growing demand for comprehensive and structured overviews of socio-political events (Lepuschitz and Stoehr, 2021). For this purpose, newspapers are automatically screened for event mentions, a task referred to as event detection and
* Authors contributed equally.
[1] https://huggingface.co/spaces/clef/PRENT-Demo
[2] https://huggingface.co/spaces/clef/PRENT-Codebook
[Figure 1: (A) the conventional event coding pipeline; (B) our approach, Prompt Entailment (PR-ENT)]
Figure 1: (A) The conventional event coding pipeline involves many hand-overs between involved stakeholders and is strictly tailored to the event ontology. (B) Our approach combines prompting and textual entailment to perform flexible, unsupervised event coding.
extraction. The sheer amount of extracted, full-text event descriptions produced day-to-day is impossible for humans to parse, especially when limited by scarce financial and computational resources.
Event coding seeks to automatically classify event descriptions into pre-defined event types. Event coding is conventionally approached via a multi-step pipeline as shown in Fig. 1A. It incurs large costs in terms of human labor and time. We sketch out this pipeline expressed in human intelligence tasks (HITs) [3] (ul Hassan et al., 2013).
arXiv:2210.05257v2 [cs.CL] 5 May 2023
As a first step, an event ontology is defined in terms of a codebook. Codebook development requires multiple domain experts (Goldstein, 1992) spending up to 200 HITs. The widely-used Conflict and Mediation Event Observations (CAMEO) codebook (Schrodt, 2012), for instance, reports a 3-year initial development phase. Next, context-relevant event descriptions need to be collected to serve as training data. This often requires paid access to online newspaper distribution services and data collection infrastructure, estimated at 200 HITs. Next, human annotators need to be recruited and trained to annotate data according to the codebook, accounting for another 200 HITs. Finally, a machine-based coding system needs to be developed, trained and validated, costing another 200 HITs. In earlier days, systems were dictionary- and pattern-based (King and Lowe, 2003; Norris et al., 2017), while more recently machine learning-based approaches have gained momentum (Piskorski and Jacquet, 2020; Olsson et al., 2020; Hürriyetoğlu, 2021).
In total, the conventional event coding pipeline amounts to roughly 800 HITs. This development cost is often not bearable by non-profit / non-governmental organizations in the humanitarian aid sector. Moreover, the process requires multiple hand-overs between workers of different backgrounds, which leads to errors, misunderstandings and delays. It is also important to highlight that the developed coding system is specifically tailored to a fixed event ontology. Any post-hoc change of event types or even a different dataset incurs huge costs. In practice, event types frequently change and even vary widely between different divisions of the same organization.
To address these shortcomings, we present a new paradigm for highly adaptive event coding. Based on our method illustrated in Fig. 1B, domain experts are able to work directly with an interactive coding tool to design a codebook. They express event types by means of prompt templates and single-token answer candidates. For automated coding, a pre-trained language model is prompted to fill in those answer candidates, taking a full-text event description as an input. Since prompting can be noisy (Gao et al., 2021), we propose filtering answer candidates based on textual entailment. Specifically, our contributions are as follows: (1) We propose a methodology combining prompting (§3.1) and textual entailment (§3.2) for event coding, termed PR-ENT. (2) We thoroughly evaluate this paradigm based on three aspects: accuracy (§4.1), flexibility (§4.2) and efficiency (§4.3). (3) We present two online dashboards: (a) a demo of the PR-ENT coding tool and (b) an interactive codebook design tool that guides the codebook design by presenting accuracy validation in a human-in-the-loop manner (§6).
[3] In our formulation, one HIT corresponds to roughly one hour of low-skill work by a single person, such as reading and labeling single-sentence event descriptions. Our estimations are based on practical experience in working with domain experts and human annotators in the field of political event coding and serve the purpose of providing a very approximate quantification of required resources and labour.
2 Event Data and Types
We consider a subset of the Armed Conflict Location and Event Data (ACLED) (Raleigh et al., 2010) dataset. It is widely-used and has large coverage of political violence and protest events around the world. Each event is human-annotated with a short description, its event type and additional details such as the number of fatalities and actors and targets. The event types are based on ACLED's own event ontology, which distinguishes 6 higher-level and 25 lower-level event types. Some event types are easily separable (e.g. protests vs. battles), while others are harder to distinguish semantically (e.g. protests vs. riots) (see Fig. 9 in the appendix).
We sample 4000 ACLED events (3000 for training, 1000 for testing) in the African region while maintaining the event type distribution of the full dataset (see Fig. 9). We remove empty event descriptions and annotator notes (e.g. "[size: no report]"). In Fig. 8 in the appendix, we present statistics of the test set, showing different aspects of linguistic complexity. In §4.2, we consider the Global Terrorism Dataset (GTD) (LaFree and Dugan, 2007) to study the effect of domain shift.
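The sampling and cleaning steps above can be sketched as follows. This is our illustrative reconstruction, not the authors' released code; the column names `event_type` and `description` are assumptions, and the toy frame merely stands in for the ACLED subset, which requires data access.

```python
import re

import pandas as pd


def sample_stratified(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Draw roughly n events while preserving the event type distribution
    (proportional stratified sampling per event type)."""
    frac = n / len(df)
    return df.groupby("event_type", group_keys=False).sample(frac=frac, random_state=seed)


def clean_description(text: str) -> str:
    """Strip bracketed annotator notes such as '[size: no report]'."""
    return re.sub(r"\[[^\]]*\]", "", text).strip()


# Toy stand-in for the ACLED subset.
df = pd.DataFrame({
    "event_type": ["protests"] * 6 + ["battles"] * 4,
    "description": ["A march took place [size: no report]"] * 10,
})
sample = sample_stratified(df, n=5)  # keeps the 60/40 protests/battles split
```

`GroupBy.sample(frac=...)` draws the same fraction from every event type, so the sample's type distribution matches the full frame's up to rounding.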
3 Entailment-based Prompt Selection
Our approach, PR-ENT, represents a real-world use case of prompting and textual entailment to code event descriptions e ∈ E into event types y ∈ Y, as shown in Fig. 1B.
3.1 Prompting
Methodological Approach. In traditional supervised learning, a model is trained to learn a mapping between the input e and the output class y. Prompting (Liu et al., 2021) is a learning paradigm making use of (cloze) language models that have been trained to predict masked tokens within text. [4] Prompt-based learning transfers this capability to perform classification in the following way: We extend each event description e ∈ E by a template t ∈ T to form the input ⟨e, t⟩ ∈ E × T. Each template contains a masked slot [Z], e.g. "This event involves [Z]", "People were [Z]". [5] The language model takes ⟨e, t⟩ as input and returns an output distribution of probabilities over the answer vocabulary Z. Each token z_{e,t} ∈ Z can serve as a potential slot filler Z = z_{e,t}. However, we only consider the top K most probable answer candidates z^k_{e,t} ∈ Z^K_{e,t}. Z can be a constrained subset Z_t that only features a template-related answer vocabulary to increase interpretability, as pointed out in §5. We discuss how to map answer candidates to event types in §4.1.
Implementation Details. We discuss the design of templates and constrained answer vocabularies resulting in a codebook (Tab. 7) in §6. In particular, we prompt DistilBERT-base-uncased (Sanh et al., 2020), a distilled version of the BERT model which is more computationally efficient at the cost of a small performance decrease. For each prompt, we consider the K = 30 most probable tokens as the set of answer candidates Z^K_{e,t}. Ideally, we would select a larger set, but performance gains are minimal while computational costs increase in subsequent steps.
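The prompting step can be sketched as below. The helper that splices the event description and template is straightforward; the fill-mask call uses the Hugging Face `transformers` pipeline API with the model named above, but the exact pre- and post-processing are our assumptions, not the authors' released code.

```python
def build_prompt(event: str, template: str, mask_token: str = "[MASK]") -> str:
    """Extend an event description e by a template t, replacing the
    slot [Z] with the language model's mask token."""
    return f"{event} {template.replace('[Z]', mask_token)}"


def top_k_candidates(event: str, template: str, k: int = 30):
    """Return the K most probable single-token slot fillers for <e, t>.
    Calling this requires `transformers` and downloads DistilBERT."""
    from transformers import pipeline  # imported lazily to keep the sketch light

    fill = pipeline("fill-mask", model="distilbert-base-uncased", top_k=k)
    prompt = build_prompt(event, template, fill.tokenizer.mask_token)
    return [(r["token_str"].strip(), r["score"]) for r in fill(prompt)]


# Example input handed to the model:
# build_prompt("Military injured two civilians.", "People were [Z].")
# -> "Military injured two civilians. People were [MASK]."
```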
3.2 Textual Entailment
Limitations of Prompting. Prompting yields event-related tokens for event coding, but comes with challenges. There is no guarantee that a prompted answer candidate z^k_{e,t} ∈ Z^K_{e,t} is suited to represent an event. Answer candidates may be semantically unrelated, as shown in Fig. 2. To address this shortcoming, we propose filtering Z^K_{e,t} via textual entailment. Textual entailment, or natural language inference (NLI) (Fyodorov et al., 2000; Bowman et al., 2015), can be framed as the following task: given a "premise", verify whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral). It has been evaluated as a popular method for performing text classification (Wang et al., 2021).
[4] "Cloze" pertains to filling in missing tokens, not necessarily unidirectionally left-to-right, but anywhere in a string.
[5] The first prompt template is intended to provide a one-word summary of the event. For the second template, we expect a verb describing the actions undertaken by the actor or a verb that describes what happened to the target.
Figure 2: Given the event description "Several demonstrators were injured." and two templates (A) and (B), prompting alone can yield tokens that fit syntactically but not semantically (blue bar). In contrast, filtering prompted answer candidates via textual entailment leaves us with tokens more closely related to the event (orange bar). To this end, we treat the event description as premise and the filled template as hypothesis.
Selecting Entailed Answer Candidates. We consider the event description e as premise and the template t filled with a prompted answer candidate as hypothesis. For example, given the premise "Two bombs detonated...", we automatically construct hypotheses "This event involves [z^k_{e,t}]" with z^k_{e,t} ∈ Z^K_{e,t} = {explosives, civilians, ...}; see Tab. 1. We pass the concatenation of the premise and hypothesis to RoBERTa-large-mnli (Liu et al., 2019). If the model finds premise and hypothesis to be entailed, then the prompted answer candidate z^k_{e,t} is considered an entailed answer candidate z*_{e,t} (e.g. z*_{e,t} = explosives). We combine the categories "neutral" and "contradiction" into one, since we are interested in a hypothesis being entailed or not.
This means PR-ENT has two hyperparameters: the top K answer candidate tokens yielded by the prompting step and the acceptance threshold in the entailment step that governs whether an answer candidate is kept. We empirically analyse the effect of both hyperparameters on the final F1 classification score in Fig. 5. In Fig. 5A, we verify that considering the top 30 answer candidate tokens leads to good performance on average. Further, we find a suitable threshold of 0.5 on the entailment model's output probability in Fig. 5B.
Event Description + Template ⟨e, t⟩ | Answer Candidates z^k_{e,t} | Entailed Answer Candidates z*_{e,t}
"Several demonstrators were injured." + "People were [Z]." | arrested, killed, hospitalized, injured, evacuated, wounded, shot, homeless, hurt, detained | injured, wounded, hurt
"Several demonstrators were injured." + "This event involves [Z]." | fireworks, demonstrations, protests, violence, suicide, bicycles, shooting, strikes, motorcycles, cycling | demonstrations, protests, violence
"The sponsorship deal between the shoes brand and the soccer team was confirmed." + "This event involves [Z]." | sponsorship, nike, sponsors, fundraising, cycling, advertising, charity, donations, concerts, competitions | sponsorship, sponsors, advertising, competitions

Table 1: We prompt a language model based on an event description e and template t with slot [Z]. We keep only those prompted answer candidates z^k_{e,t} ∈ Z^K_{e,t} entailed in a subsequent textual entailment task, z*_{e,t} ∈ Z*_{e,t}.
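As a rough sketch, the entailment filter can be implemented as follows. The thresholding logic mirrors §3.2 (keep a candidate if its entailment probability reaches the threshold of 0.5); the scoring function assumes the Hugging Face `transformers` text-classification pipeline and the `roberta-large-mnli` label names, details the paper does not spell out.

```python
def filter_entailed(entailment_scores: dict, threshold: float = 0.5) -> list:
    """Keep answer candidates whose filled-template hypothesis is entailed
    by the event description with probability >= threshold."""
    return [z for z, p in entailment_scores.items() if p >= threshold]


def score_candidates(event: str, template: str, candidates: list) -> dict:
    """Score each filled template against the event description with an NLI model.
    Calling this requires `transformers` and downloads roberta-large-mnli."""
    from transformers import pipeline  # imported lazily to keep the sketch light

    nli = pipeline("text-classification", model="roberta-large-mnli", top_k=None)
    scores = {}
    for z in candidates:
        hypothesis = template.replace("[Z]", z)
        result = nli({"text": event, "text_pair": hypothesis})  # premise, hypothesis
        scores[z] = next(d["score"] for d in result if d["label"] == "ENTAILMENT")
    return scores
```

Usage would chain the two steps: `filter_entailed(score_candidates(e, t, top_k_candidates))`, with "neutral" and "contradiction" implicitly collapsed because only the entailment probability is thresholded.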
4 Evaluation: Event Classification
We compare PR-ENT against the conventional event coding pipeline in an evaluation along three dimensions: accuracy, flexibility and efficiency.
4.1 Accuracy
So far we have not discussed how to map entailed answer candidates z*_{e,t} ∈ Z*_{e,t} onto event types y ∈ Y. We choose to do hard prompting, as opposed to soft prompting. This means tokens in Z*_{e,t} are mapped onto event types y via a simple linear transform y = f(z*_{e,t}). When f is the identity function, no additional mapping is needed (§4.2). Hard prompting allows defining event types, i.e. an event ontology, in terms of interpretable answer candidates. As an example, we present an interpretable event ontology in Tab. 7 in the appendix. We use it to classify "lethal" and "non-lethal" events, as explained in §4.2. Generally, we observe a trade-off between accuracy and interpretability. We want different sets of entailed answer candidates to uniquely define different event types at a high accuracy. At the same time, we require each set to be limited to a few interpretable tokens that are highly representative of the event type. In the following, we learn a shallow mapping between Z*_{e,t} and the 6 high-level event types Y provided by the ACLED event ontology as ground truth.
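In the hard-prompting setting, coding an event reduces to a lookup: each event type is defined by a small set of labeled answer candidates, and an event is assigned the type whose set its entailed candidates overlap most. A minimal sketch, with a lethal/non-lethal ontology whose candidate sets we chose for illustration (the actual codebook is in Tab. 7 of the paper):

```python
def code_event(entailed: set, ontology: dict, default: str = "other") -> str:
    """Map a set of entailed answer candidates z*_{e,t} onto the event type
    whose labeled candidate set they overlap most."""
    best, best_overlap = default, 0
    for event_type, candidates in ontology.items():
        overlap = len(entailed & candidates)
        if overlap > best_overlap:
            best, best_overlap = event_type, overlap
    return best


# Illustrative ontology: event type -> labeled answer candidates.
ontology = {
    "lethal": {"killed", "died", "dead"},
    "non-lethal": {"injured", "wounded", "hurt"},
}
print(code_event({"injured", "wounded", "protests"}, ontology))  # -> non-lethal
```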
Baselines and Ceilings. As baselines, we consider bag-of-words (BoW) and GloVe (Pennington et al., 2014) embeddings of event descriptions. Embeddings are mapped onto event types via logistic regression (LR). Further, we contrast our PR-ENT with a prompting-only (PR) approach, also using logistic regression as a classification layer. As a ceiling model, we consider DistilBERT fine-tuned in a sequence classification task.

Model | Accuracy | F1 Score
BoW + LR | 80.5 | 77.1
GloVe + LR | 78.5 | 74.6
Random Tokens + BoW + LR | 77.1 | 72.2
PR + BoW + LR | 82.9 | 80.8
PR-ENT + BoW + LR | 85.1 | 83.7
DistilBERT | 87.1 | 86.0

Table 2: Classification of 6 event types in the ACLED dataset. As expected, DistilBERT performs best as it is fine-tuned specifically on this classification task. Our approach PR-ENT is more ad-hoc and does not fall far behind. The additional entailment step reduces noise compared to the prompting-only approach PR. On top of the two standard baselines using BoW and GloVe, we introduce an additional baseline where we select 10 random tokens from Z^K_{e,t} for each ⟨e, t⟩. Compared to all baselines, PR-ENT performs better.
Our Approach PR-ENT. To evaluate our approach, we only consider the template "This event involves [Z]" and construct a BoW feature matrix by extending the event descriptions e with the entailed answer candidates z*_{e,t}. The resulting feature matrix serves as input to logistic regression. We report classification results in Tab. 2 and find that PR-ENT is only outperformed by the supervised, fine-tuned DistilBERT ceiling, but performs better than all baselines.
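The feature construction for this experiment can be sketched with scikit-learn: append the entailed candidates to each event description, vectorize with bag-of-words, and fit a logistic regression. This is our reconstruction under assumed data formats, not the authors' exact setup, and the toy examples are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def extend(description: str, entailed: list) -> str:
    """Append the entailed answer candidates z*_{e,t} to the description e."""
    return description + " " + " ".join(entailed)


# Toy training data standing in for ACLED descriptions plus entailed candidates.
texts = [
    extend("Military injured two civilians.", ["injured", "wounded", "hurt"]),
    extend("Protesters marched downtown.", ["demonstrations", "protests"]),
    extend("Soldiers wounded a man.", ["injured", "hurt"]),
    extend("A crowd rallied peacefully.", ["protests", "demonstrations"]),
]
labels = ["violence", "protest", "violence", "protest"]

vectorizer = CountVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

# Unseen event: its entailed candidates pull the BoW features toward "violence".
pred = clf.predict(vectorizer.transform([extend("Police hurt bystanders.", ["injured", "hurt"])]))[0]
```

The entailed candidates act as model-generated keywords, which is why the resulting BoW features outperform BoW over the raw descriptions alone (Tab. 2).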