
gence tasks (HITs)3 (ul Hassan et al., 2013).
As a first step, an event ontology is defined in terms of a codebook. Codebook development requires multiple domain experts (Goldstein, 1992) spending up to 200 HITs. The widely-used Conflict and Mediation Event Observations (CAMEO) codebook (Schrodt, 2012), for example, reports a 3-year initial development phase. Next, context-relevant event descriptions need to be collected to serve as training data. This often requires paid access to online newspaper distribution services and data collection infrastructure, estimated at 200 HITs. Next, human annotators need to be recruited and trained to annotate data according to the codebook, accounting for another 200 HITs. Finally, a machine-based coding system needs to be developed, trained and validated, costing another 200 HITs. In earlier days, systems were dictionary- and pattern-based (King and Lowe, 2003; Norris et al., 2017), while more recently machine learning-based approaches have gained momentum (Piskorski and Jacquet, 2020; Olsson et al., 2020; Hürriyetoğlu, 2021).
In total, the conventional event coding pipeline amounts to roughly 800 HITs. This development cost is often not bearable by non-profit and non-governmental organizations in the humanitarian aid sector. Moreover, the process requires multiple hand-overs between workers of different backgrounds, which leads to errors, misunderstandings and delays. It is also important to highlight that the developed coding system is specifically tailored to a fixed event ontology. Any post-hoc change of event types, or even a different dataset, incurs huge costs. In practice, event types frequently change and even vary widely between different divisions of the same organization.
To address these shortcomings, we present a new paradigm for highly adaptive event coding. Based on our method, illustrated in Fig. 1B, domain experts are able to work directly with an interactive coding tool to design a codebook. They express event types by means of prompt templates and single-token answer candidates. For automated coding, a pre-trained language model is prompted to fill in those answer candidates, taking a full-text event description as an input. Since prompting can be noisy (Gao et al., 2021), we propose filtering answer candidates based on textual entailment.

3 In our formulation, one HIT corresponds to roughly one hour of low-skill work by a single person, such as reading and labeling single-sentence event descriptions. Our estimations are based on practical experience in working with domain experts and human annotators in the field of political event coding and serve the purpose of providing a very approximate quantification of the required resources and labour.
Specifically, our contributions are as follows: (1) We propose a methodology combining prompting (§3.1) and textual entailment (§3.2) for event coding, termed PR-ENT. (2) We thoroughly evaluate this paradigm based on three aspects: accuracy (§4.1), flexibility (§4.2) and efficiency (§4.3). (3) We present two online dashboards: (a) a demo of the PR-ENT coding tool; (b) an interactive codebook design tool that guides codebook design by presenting accuracy validation in a human-in-the-loop manner (§6).
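Conceptually, the prompt-then-entail coding step can be sketched as follows. The `fill_mask` and `entails` helpers below are toy stand-ins (simple string heuristics) for a pre-trained masked language model and an NLI model, and the prompt template and candidate words are illustrative assumptions, not taken from an actual PR-ENT codebook.

```python
# Sketch of prompt-then-entail event coding (PR-ENT style).
# fill_mask and entails are toy stand-ins for a masked LM and an
# NLI (textual entailment) model; a real system would call
# pre-trained transformer models instead.

TEMPLATE = "This event involves {}."  # hypothetical prompt template


def fill_mask(event: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Toy LM: score each single-token answer candidate for the prompt."""
    scores = {c: (0.9 if c in event.lower() else 0.1) for c in candidates}
    return sorted(scores.items(), key=lambda kv: -kv[1])


def entails(premise: str, hypothesis: str) -> bool:
    """Toy entailment check: accept if the candidate word appears verbatim."""
    return hypothesis.split()[-1].rstrip(".") in premise.lower()


def code_event(event: str, candidates: list[str], k: int = 3) -> list[str]:
    # Step 1: prompt the (toy) LM for the top-k answer candidates.
    top = [c for c, _ in fill_mask(event, candidates)[:k]]
    # Step 2: keep only candidates whose filled-in prompt is entailed
    # by the event description, filtering out noisy completions.
    return [c for c in top if entails(event, TEMPLATE.format(c))]


event = "Protesters marched through the capital demanding reforms."
print(code_event(event, ["protest", "battle", "riot"]))  # → ['protest']
```

The second stage is what distinguishes this setup from plain prompting: a candidate survives only if the event description actually entails the filled-in template, which discards plausible-sounding but unsupported completions.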
2 Event Data and Types
We consider a subset of the Armed Conflict Location and Event Data (ACLED) dataset (Raleigh et al., 2010). It is widely used and has large coverage of political violence and protest events around the world. Each event is human-annotated with a short description, its event type and additional details such as the number of fatalities and the actors and targets involved. The event types are based on ACLED's own event ontology, which distinguishes 6 higher-level and 25 lower-level event types. Some event types are easily separable (e.g. protests vs. battles), while others are harder to distinguish semantically (e.g. protests vs. riots) (see Fig. 9 in the appendix).
We sample 4000 ACLED events (3000 for training, 1000 for testing) in the African region while maintaining the event type distribution of the full dataset (see Fig. 9). We remove empty event descriptions and annotator notes (e.g. "[size: no report]"). In Fig. 8 in the appendix, we present statistics of the test set, showing different aspects of linguistic complexity. In §4.2, we consider the Global Terrorism Dataset (GTD) (LaFree and Dugan, 2007) to study the effect of domain shift.
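A split that preserves the event type distribution, as described above, can be sketched with a per-label (stratified) split; the labels and sizes below are illustrative toy data, not the actual ACLED sample.

```python
# Sketch of a stratified train/test split that preserves the label
# distribution. Toy data; the real sampling draws 4000 ACLED events.
import random
from collections import defaultdict


def stratified_split(events, labels, test_frac=0.25, seed=0):
    """Split events so each label keeps (roughly) its overall share."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ev, lab in zip(events, labels):
        by_label[lab].append(ev)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy usage: 4 "protest" and 4 "battle" events, 25% held out per type.
events = [f"event {i}" for i in range(8)]
labels = ["protest"] * 4 + ["battle"] * 4
train, test = stratified_split(events, labels)
print(len(train), len(test))  # → 6 2
```

Splitting within each label group, rather than over the pooled data, is what keeps rare event types represented in both the training and test sets.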
3 Entailment-based Prompt Selection
Our approach, PR-ENT, represents a real-world use case of prompting and textual entailment to code event descriptions e ∈ E into event types y ∈ Y, as shown in Fig. 1B.
3.1 Prompting
Methodological Approach. In traditional supervised learning, a model is trained to learn a mapping between the input e and the output class y.
Prompting (Liu et al.,2021) is a learning paradigm