An Ordinal Latent Variable Model of Conflict Intensity
Niklas Stoehr (ETH Zürich), Lucas Torroba Hennigen (MIT), Josef Valvoda (University of Cambridge),
Robert West (EPFL), Ryan Cotterell (ETH Zürich), Aaron Schein (The University of Chicago)
niklas.stoehr@inf.ethz.ch lucastor@mit.edu jv406@cam.ac.uk
robert.west@epfl.ch ryan.cotterell@inf.ethz.ch schein@uchicago.edu
Abstract
Measuring the intensity of events is crucial for monitoring and tracking armed conflict. Advances in automated event extraction have yielded massive data sets of “who did what to whom” micro-records that enable data-driven approaches to monitoring conflict. The Goldstein scale is a widely-used expert-based measure that scores events on a conflictual–cooperative scale. It is based only on the action category (“what”) and disregards the subject (“who”) and object (“to whom”) of an event, as well as contextual information, like associated casualty count, that should contribute to the perception of an event’s “intensity”. To address these shortcomings, we take a latent variable-based approach to measuring conflict intensity. We introduce a probabilistic generative model that assumes each observed event is associated with a latent intensity class. A novel aspect of this model is that it imposes an ordering on the classes, such that higher-valued classes denote higher levels of intensity. The ordinal nature of the latent variable is induced from naturally ordered aspects of the data (e.g., casualty counts) where higher values naturally indicate higher intensity. We evaluate the proposed model both intrinsically and extrinsically, showing that it obtains good held-out predictive performance.
https://github.com/niklasstoehr/ordinal-conflict-intensity
1 Introduction
On a scale from -10 for conflictual to +10 for cooperative, which of the following events should be considered more “intense”: “Soldiers injured two civilians” or “Rebels detained fifty soldiers”? Measuring the intensity of events is crucial for monitoring and tracking armed conflict. Advances in the automated collection and coding of events have produced massive and systematized data sets of micro-records that enable data-driven approaches to monitoring conflict. While the “intensity” of a given event has traditionally been assessed by human expert raters, the tremendous quantity of events collected every day makes case-by-case analysis unmanageable. As a consequence, there is a strong demand for automated and model-based methods to aggregate events into meaningful “conflict intensity” measures.

CAMEO code   Action name             Goldstein value   Avg. # casualties
19           fight                   -10.0             9.31
20           mass violence           -10.0             42.20
18           assault                 -9.0              11.47
15           force posture           -7.2              0.13
17           coerce                  -7.0              1.44
14           protest                 -6.5              2.06
13           threaten                -6.0              0.13
10           demand                  -5.0              0.01
12           reject                  -4.0              0.00
16           reduce relations        -4.0              0.00
9            investigate             -2.0              0.04
11           disapprove              -2.0              0.03
1            public statement        0.0               0.19
4            consult                 1.0               0.03
2            appeal                  3.0               0.00
5            diplom cooperation      3.5               0.03
3            intent cooperate        4.0               0.05
8            yield                   5.0               0.10
6            material cooperation    6.0               0.01
7            provide aid             7.0               0.00
Table 1: The Goldstein scale is an expert-based intensity ranking of the 20 CAMEO action categories ranging from -10.0 (conflictual) to +10.0 (cooperative). The scale disregards casualty counts that are typically considered in conflict assessment. Here, we display casualty counts as reported in the NAVCO dataset.
One of the most frequently used measures is the Goldstein scale (Goldstein, 1992). Major event datasets like IDEA (Bond et al., 2003), KEDS (Schrodt, 2008), GDELT (Leetaru and Schrodt, 2013), ICEWS (Boschee et al., 2015), Phoenix (Beieler, 2016) and NAVCO (Lewis et al., 2016) all rely on it. The Goldstein scale assigns intensity scores between -10.0 and +10 on a conflictual–cooperative scale to the action categories defined by the Conflict and Mediation Event Observations (CAMEO) event coding scheme (Schrodt, 2012). CAMEO specifies 204 low-level event types which are summarized into 20 high-level action categories. The Goldstein scale ranks “use unconventional mass violence” and “fight” as the most conflictual of the 20 high-level action categories (-10.0) and “provide aid” (+7.0) as the most cooperative; see Tab. 1.
arXiv:2210.03971v2 [cs.LG] 4 Jun 2023
Despite its widespread use, the Goldstein scale has many well-known shortcomings (King and Lowe, 2003; Schrodt, 2019). In particular, it applies only to action categories and does not account for any contextual information about a given event, such as which actors are involved or how many fatalities resulted, among other bits of context that should contribute to the perception of an event’s “intensity”.
This paper takes a latent variable-based approach to measuring conflict intensity. We introduce a probabilistic generative model that assumes each observed event $n$ is associated with a latent intensity class $z_n$. A novel aspect of this model is that it imposes an ordering on the classes, such that higher values of $z_n$ denote higher levels of intensity. The ordinal nature of $z_n$ is induced from naturally ordered aspects of the data (e.g., casualty counts) where higher values naturally indicate higher intensity. The model effectively learns to interpolate the ordered (i.e., cardinal or ordinal) elements of the data while inferring correlation structure with the non-ordered (e.g., categorical) elements of the data (e.g., actor types).
We start with a discussion of the Goldstein scale and introduce a political event dataset annotated with Goldstein values in §2 and §3. Then, we propose our model with an ordinal latent variable in §4. We evaluate the performance of the model intrinsically (§5) and extrinsically (§6) and find that it improves over measures based on the original Goldstein scale or heuristics based on the raw data.
2 Limitations of the Goldstein Scale
The Goldstein scale is a widely-used measure of the conflictual versus cooperative nature of interactions between countries (Goldstein, 1992). The scale was created by a panel of international relations experts who ranked descriptions of interactions. It was initially created to score action categories in the WEIS event ontology (McClelland, 1984) and was later adapted to CAMEO (Schrodt, 2012).
The Goldstein scale applies only to the action category of an event (e.g., “fight” or “trade”). Thus, two different “fight” events, which might involve two different pairs of actors, occur at different times, or differ dramatically with respect to the number of associated fatalities, will still be assigned the same Goldstein value. The Goldstein scale is thus a poor measure of a conflict’s perceived “intensity”, as it ignores much of the information that contributes to that perception. Recent work in conflict studies, for instance, operationalizes “intensity” primarily using casualty counts (Chaudoin et al., 2017; Zhong et al., 2023), which the Goldstein scale ignores entirely.
In Tab. 1, we show the empirical distribution of assigned Goldstein values alongside the empirical distribution of casualty counts in a dataset of conflict events. The Goldstein scale is very coarse-grained; while it ostensibly ranges between -10.0 and +10.0, only a small number of discrete values ever occur, with many action categories assigned the same value. For the purpose of measuring conflict intensity, a finer-grained and more contextual scale is desirable.
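The coarseness described above can be checked directly against Tab. 1. The following is a minimal sketch that only re-enters the table rows; the names `TABLE_1` and `group_by_goldstein` are illustrative and not part of the paper or its code release.

```python
from collections import defaultdict

# (action name, Goldstein value, avg. # casualties), copied from Tab. 1.
TABLE_1 = [
    ("fight", -10.0, 9.31),
    ("mass violence", -10.0, 42.20),
    ("assault", -9.0, 11.47),
    ("force posture", -7.2, 0.13),
    ("coerce", -7.0, 1.44),
    ("protest", -6.5, 2.06),
    ("threaten", -6.0, 0.13),
    ("demand", -5.0, 0.01),
    ("reject", -4.0, 0.00),
    ("reduce relations", -4.0, 0.00),
    ("investigate", -2.0, 0.04),
    ("disapprove", -2.0, 0.03),
    ("public statement", 0.0, 0.19),
    ("consult", 1.0, 0.03),
    ("appeal", 3.0, 0.00),
    ("diplom cooperation", 3.5, 0.03),
    ("intent cooperate", 4.0, 0.05),
    ("yield", 5.0, 0.10),
    ("material cooperation", 6.0, 0.01),
    ("provide aid", 7.0, 0.00),
]


def group_by_goldstein(rows):
    """Map each Goldstein value to the (action, avg. casualties) pairs that share it."""
    groups = defaultdict(list)
    for action, value, casualties in rows:
        groups[value].append((action, casualties))
    return dict(groups)


groups = group_by_goldstein(TABLE_1)
# Goldstein values that are assigned to more than one action category.
ties = {value: members for value, members in groups.items() if len(members) > 1}

print(f"{len(TABLE_1)} categories, {len(groups)} distinct Goldstein values")
for value, members in sorted(ties.items()):
    print(value, members)
```

In this grouping, “fight” and “mass violence” share the value -10.0 although their average casualty counts in Tab. 1 (9.31 and 42.20) differ by more than a factor of four, which is the kind of context the scale cannot express.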
3 Conflict Event Data
This paper considers the publicly available Nonviolent and Violent Campaigns and Outcomes (NAVCO) data collection (Chenoweth et al., 2018), specifically, the latest release NAVCO 3.0 from November 2019 which comprises $N = 112{,}089$ events between December 1990 and December 2012. An exemplary event description is “On 19 May 2012, soldiers injured two civilians in Afghanistan”. Each part of this description has been parsed by human coders into standardized, structural features. We color-code the features that correspond to the semantic roles subject, predicate, quantifier, object, which are the focus of our modeling approach. Each data point $n$ thus consists of a four-element tuple $\{s_n, p_n, q_n, o_n\}$. We note that events are further coded for their location (in this case, Afghanistan) and time (19 May 2012), among other bits of contextual information. Let us discuss each feature in more detail:
Subject $s_n$. NAVCO contains columns termed “actor3”, “actor6” and “actor9” which code for the subject (or agent) of a given action. The actor types are defined by the CAMEO actor codebook. We first merge the higher-level categories “actor3” and “actor6”, resulting in 33 different actor types, and then map all actor types into one of $S = 4$ classes: $s_n \in \{\text{civilian}, \text{military}, \text{governmental}, \text{political}\}$. We present our 4-class actor type mapping in Tab. 3 of the appendix.

[Figure 1 panels: subject type distribution; predicate: Goldstein value distribution (horizontal axis: Goldstein values); quantifier: casualty count distribution (horizontal axis: victim counts); object type distribution. Vertical axis: counts.]
Figure 1: Data distributions of NAVCO 3.0 dataset (Chenoweth et al., 2018). Goldstein values and casualty counts have intrinsic intensity orderings. Goldstein values are reversed and transformed, so that 1.0 represents the most conflictual. We model the subject $s_n$ as Categorical, the predicate $p_n$ as Beta, the quantifier $q_n$ as Zero-inflated Geometric and the object $o_n$ as Categorical.
Predicate $p_n$ (Goldstein values). NAVCO codes¹ each event description into one of the 20 CAMEO action categories in the column “verb10”, which is by extension associated with a Goldstein value $p_n$. Throughout, we refer to the Goldstein value $p_n$ as an action’s “predicate”, since each action category is assigned exactly one Goldstein value. We scale Goldstein values $p_n$ to a $[0, 1]$ range and invert them (i.e., $p_n \leftarrow 1 - p_n$) so that higher values close to 1 represent more conflictual action categories.

¹ NAVCO features a 21st action category which we exclude since it is not specified by the CAMEO taxonomy and thus has no Goldstein value.

Quantifier $q_n$ (casualty counts). Each event description is annotated with human-verified fatality and wounded counts. We add the two and refer to the resulting value $q_n \in \mathbb{N}_0^{+}$ as an event’s “quantifier” or its “casualty count”. In Tab. 1, we give the average number of casualties associated with each action alongside its Goldstein value; as intuition might suggest, actions that Goldstein scores as more conflictual (e.g., “fight” (-10.0)) coincide with more casualties on average.
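The two preprocessing steps for the predicate and the quantifier can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes min-max scaling over the nominal range [-10, 10] (the text does not say whether the nominal or the observed range is used), and the small clipping margin `eps` is an added assumption to keep values strictly inside (0, 1) for a Beta likelihood. The function names are hypothetical.

```python
# Assumption: rescale over the nominal range of the Goldstein scale.
GOLDSTEIN_MIN, GOLDSTEIN_MAX = -10.0, 10.0


def predicate_from_goldstein(goldstein, eps=1e-3):
    """Rescale a Goldstein value to [0, 1] and invert it, so that 1 is most conflictual."""
    scaled = (goldstein - GOLDSTEIN_MIN) / (GOLDSTEIN_MAX - GOLDSTEIN_MIN)
    inverted = 1.0 - scaled
    # Assumption: clip away from the endpoints 0 and 1 (not stated in the text).
    return min(max(inverted, eps), 1.0 - eps)


def quantifier_from_counts(fatalities, wounded):
    """Casualty count: the sum of fatality and wounded counts, a non-negative integer."""
    if fatalities < 0 or wounded < 0:
        raise ValueError("counts must be non-negative")
    return fatalities + wounded


# "Soldiers injured two civilians": no fatalities, two wounded.
print(quantifier_from_counts(0, 2))
# "fight" (-10.0) maps close to 1, "provide aid" (+7.0) maps close to 0.
print(predicate_from_goldstein(-10.0), predicate_from_goldstein(7.0))
```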
Object $o_n$. As with the subject, NAVCO codes for the direct object or “target” of a conflict action using the CAMEO coding scheme; these codes are found in the columns “target3” and “target6”. We map those into $O = 4$ classes so that $o_n \in \{\text{civilian}, \text{military}, \text{governmental}, \text{political}\}$.
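The mapping of subject and object codes into the four classes amounts to a lookup followed by an integer encoding. The sketch below is purely illustrative: the actual mapping is the one given in Tab. 3 of the appendix, which is not reproduced here, so the four code-to-class entries in `ACTOR_CLASS` are hypothetical placeholders, as are the function names.

```python
# The four classes shared by subjects and objects.
CLASSES = ("civilian", "military", "governmental", "political")

# HYPOTHETICAL mapping from CAMEO-style actor codes to classes, for illustration only.
# The real mapping covers all 33 merged actor types (Tab. 3 of the appendix).
ACTOR_CLASS = {
    "CVL": "civilian",
    "MIL": "military",
    "GOV": "governmental",
    "OPP": "political",
}


def map_actor(code):
    """Return the class name for a raw actor code; raises KeyError for unmapped codes."""
    return ACTOR_CLASS[code.strip().upper()]


def encode_event_actors(subject_code, target_code):
    """Encode subject and object as zero-based class indices (s_n, o_n)."""
    s = CLASSES.index(map_actor(subject_code))
    o = CLASSES.index(map_actor(target_code))
    return s, o


# A military subject acting on a civilian target.
print(encode_event_actors("MIL", "CVL"))
```

Raising on unmapped codes, rather than silently assigning a default class, is a design choice of this sketch so that gaps in the lookup table surface early.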
Contextual information: location and time.
Finally, each event is further annotated with a
timestamp and location, which we use to design
extrinsic evaluation tasks in §6.
4 Ordinal Latent Variable Model
We operationalize conflict intensity as a latent variable that expresses the association between the observed variables subject ($s_n$), predicate ($p_n$), quantifier ($q_n$) and object ($o_n$). Each data point is a tuple $\{s_n, p_n, q_n, o_n\}$ representing an event. Our Bayesian latent variable model is depicted in Fig. 2. We assume the following generative story. For each event $n$, we assume that its event intensity class $z_n \in \{1, \dots, C\}$ is a Categorical random variable

$$z_n \sim \mathrm{Categorical}(\pi^{(z)}) \tag{1}$$

where $\pi^{(z)}$ is a $C$-dimensional discrete distribution over latent intensity classes. We place a Dirichlet prior over $\pi^{(z)}$

$$\pi^{(z)} \sim \mathrm{Dirichlet}(\alpha^{(z)}) \tag{2}$$

with concentration parameter $\alpha^{(z)} \in \mathbb{R}_{+}^{C}$. Conditioned on $z_n$, we assume each of the observed sites per event tuple $s_n$, $p_n$, $q_n$ and $o_n$ is then drawn as

$$s_n \mid z_n \sim \mathrm{Categorical}(\pi^{(s)}_{z_n}) \tag{3}$$
$$p_n \mid z_n \sim \mathrm{Beta}(\omega^{(p)}_{z_n}, \kappa^{(p)}_{z_n}) \tag{4}$$
$$q_n \mid z_n \sim \text{Zero-infl. Geom.}(\delta^{(q)}_{z_n}, b^{(q)}_{z_n}) \tag{5}$$
$$o_n \mid z_n \sim \mathrm{Categorical}(\pi^{(o)}_{z_n}) \tag{6}$$

Here $\pi^{(s)}_{z_n}$ and $\pi^{(o)}_{z_n}$ are the discrete distributions for class $z_n$ over $S$ subject and $O$ object types, respectively. They are given as row vectors in the matrices $\Pi^{(s)} \in (0,1)^{C \times S}$ and $\Pi^{(o)} \in (0,1)^{C \times O}$ that $z_n$
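The generative story in Eqs. (1) to (6) can be sketched as ancestral sampling. This is a minimal illustration using only the Python standard library, and it rests on several assumptions that the text above does not settle: the Beta distribution is read as a mean ($\omega$) and concentration ($\kappa$) parameterization; the Zero-inflated Geometric emits a structural zero with probability $\delta$ and otherwise a Geometric($b$) count on $\{0, 1, \dots\}$; classes are indexed from zero; and the parameter values are invented so that higher classes look more intense. How the model itself enforces the ordering is not shown here.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw a zero-based index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_zero_inflated_geometric(delta, b, rng):
    """Assumed form: zero with probability delta, else Geometric(b) on {0, 1, ...}."""
    if rng.random() < delta:
        return 0
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    return int(math.floor(math.log(u) / math.log(1.0 - b)))


def sample_event(params, rng):
    """Sample one event tuple (z, s, p, q, o) following Eqs. (1) and (3)-(6)."""
    z = sample_categorical(params["pi_z"], rng)                    # Eq. (1)
    s = sample_categorical(params["pi_s"][z], rng)                 # Eq. (3)
    omega, kappa = params["omega"][z], params["kappa"][z]
    # Assumption: mean/concentration form, Beta(omega * kappa, (1 - omega) * kappa).
    p = rng.betavariate(omega * kappa, (1.0 - omega) * kappa)      # Eq. (4)
    q = sample_zero_inflated_geometric(params["delta"][z], params["b"][z], rng)  # Eq. (5)
    o = sample_categorical(params["pi_o"][z], rng)                 # Eq. (6)
    return z, s, p, q, o


rng = random.Random(0)
C = 3  # number of latent intensity classes (illustrative)
params = {
    "pi_z": sample_dirichlet([1.0] * C, rng),  # Eq. (2) with a flat prior
    # Rows index classes; columns index the four actor classes (illustrative values).
    "pi_s": [[0.4, 0.2, 0.2, 0.2], [0.2, 0.4, 0.2, 0.2], [0.1, 0.6, 0.2, 0.1]],
    "pi_o": [[0.2, 0.2, 0.3, 0.3], [0.3, 0.3, 0.2, 0.2], [0.6, 0.2, 0.1, 0.1]],
    "omega": [0.3, 0.6, 0.9],    # Beta means increase with class
    "kappa": [10.0, 10.0, 10.0],
    "delta": [0.9, 0.5, 0.1],    # fewer structural zeros in higher classes
    "b": [0.8, 0.4, 0.1],        # heavier casualty tails in higher classes
}

events = [sample_event(params, rng) for _ in range(5)]
for event in events:
    print(event)
```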