EventGraph at CASE 2021 Task 1: A General Graph-based Approach to
Protest Event Extraction
Huiling You,1 David Samuel,1 Samia Touileb,2 and Lilja Øvrelid1
1 University of Oslo
2 University of Bergen
{huiliny, davisamu, liljao}@ifi.uio.no
samia.touileb@uib.no
Abstract
This paper presents our submission to the
2022 edition of the CASE 2021 shared task
1, subtask 4. The EventGraph system adapts
an end-to-end, graph-based semantic parser
to the task of Protest Event Extraction,
more specifically to subtask 4 on event trigger
and argument extraction. We experiment with
various graphs, encoding the events as
either “labeled-edge” or “node-centric” graphs.
We show that the “node-centric” approach
yields the best results overall, performing well
across the three languages of the task, namely
English, Spanish, and Portuguese. EventGraph
is ranked 3rd for English and Portuguese,
and 4th for Spanish. Our code is available at:
https://github.com/huiling-y/eventgraph_at_case
1 Introduction
The automated extraction of socio-political event
information from text constitutes an important
NLP task, with a number of application areas for
social scientists, policy makers, etc. The task
involves analysis at different levels of
granularity: document-level, sentence-level, and the
fine-grained extraction of event triggers and
arguments within a sentence. The CASE 2022 Shared Task
1 on Multilingual Protest Event Detection extends
the 2021 shared task (Hürriyetoğlu et al., 2021a)
with additional data in the evaluation phase and
features four subtasks: (i) document classification,
(ii) sentence classification, (iii) event sentence
coreference, and (iv) event extraction.
The task of event extraction involves the
detection of explicit event triggers and corresponding
arguments in text. Current classification-based
approaches either model the task as a
pipeline of classifiers (Ji and Grishman, 2008;
Li et al., 2013; Liu et al., 2020; Du and Cardie, 2020;
Li et al., 2020) or use joint modeling approaches
(Yang and Mitchell, 2016; Nguyen et al., 2016;
Liu et al., 2018; Wadden et al., 2019; Lin et al., 2020).
In this paper, we present the EventGraph
system and its application to Task 1 Subtask 4 in the
2022 edition of the CASE 2021 shared task.
EventGraph is a joint framework for event extraction,
which encodes events as graphs and solves event
extraction as semantic graph parsing. We show
that it is beneficial to model the relation between
event triggers and arguments and approach event
extraction via structured prediction instead of
sequence labelling. Our system performs well on the
three languages, achieving competitive results and
consistently ranked among the top four systems.
In the following, we briefly describe the data
supplied by the shared task organizers and present
Subtask 4 in some more detail. We then go on
to present an overview of the EventGraph system
focusing on the encoding of the data to semantic
graphs and the model architecture. We experiment
with several different graph encodings and provide
a more detailed analysis of the results.
2 Data and task
Our contribution is to subtask 4, which falls under
shared task 1 on the detection and extraction of
socio-political and crisis events. While most subtasks of
shared task 1 have sentence-level annotations,
subtask 4 has been annotated at the token level, with
the annotators given access to the document-level
context. Subtask 4 focuses on the extraction of event
triggers and event arguments related to contentious
politics and riots (Hürriyetoğlu et al., 2021a). This
subtask has previously been approached as a
sequence labeling problem combining various
methods of fine-tuning pre-trained language models
(Hürriyetoğlu et al., 2021a).
The data supplied for Subtask 4 is identical to
that of the 2021 edition of the task, as presented
in Hürriyetoğlu et al. (2021a). The data is part of
the multilingual extension of the GLOCON dataset
(Hürriyetoğlu et al., 2021b) with data from
English, Portuguese, and Spanish. The source of the
data is protest event coverage in news articles from
specific countries: China and South Africa
(English), Brazil (Portuguese), and Argentina
(Spanish). The data has been doubly annotated by
graduate students in political science with token-level
information regarding event triggers and arguments.
Hürriyetoğlu et al. (2021a) report the token-level
inter-annotator agreement to be between 0.35 and
0.60. Disagreements between annotators were
subsequently resolved by an annotation supervisor.
Table 1 shows the number of news articles for each
of the languages in the task, distributed over the
training, development, and test sets. The majority
of the data is in English: after the development
split, the training set contains 732 English articles
against 29 each for Portuguese and Spanish.
arXiv:2210.09770v1 [cs.CL] 18 Oct 2022
[Figure 1, three panels:
Labeled-edge representation: <root> --trigger--> "chased, hacked to death";
  arguments attached by labeled edges: target "Chale", participant "group",
  participant "people".
Node-centric representation: labeled nodes anchored to text: trigger
  "chased, hacked to death"; participant "group"; target "Chale";
  participant "people".
Node-centric-split representation: as node-centric, but with separate
  trigger nodes for "chased" and "hacked to death".]
Figure 1: Graph representations of sentence “Chale was allegedly chased by a group of about 30 people and was
hacked to death with pangas, axes and spears.”
Relevant statistics for the different event
component annotations for Subtask 4 are presented in
Table 1, detailing the number of triggers, participants,
and various other types of argument components,
such as place, target, organizer, etc. Once again,
the table also illustrates the comparative imbalance
in data across the three languages.
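The imbalance can be made concrete with a small calculation over the counts in Table 1. The following is only an illustrative sketch: the variable and function names are ours, and the numbers are copied from the training rows of the table.

```python
# Counts copied from Table 1 (Subtask 4 training split).
train_articles = {"English": 732, "Portuguese": 29, "Spanish": 29}
train_triggers = {"English": 4595, "Portuguese": 122, "Spanish": 157}


def shares(counts):
    """Return each language's percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {lang: round(100 * n / total, 1) for lang, n in counts.items()}


article_shares = shares(train_articles)
trigger_shares = shares(train_triggers)

for lang in train_articles:
    print(f"{lang}: {article_shares[lang]}% of articles, "
          f"{trigger_shares[lang]}% of triggers")
```

Under these counts, English accounts for roughly 93 percent of the training articles and 94 percent of the annotated triggers.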
             English       Portuguese   Spanish
train        732 (2,925)   29 (78)      29 (91)
dev          76 (323)      4 (9)        1 (15)
test         179 (311)     50 (190)     50 (192)

trigger      4,595         122          157
participant  2,663         73           88
place        1,570         61           15
target       1,470         32           64
organizer    1,261         19           25
etime        1,209         41           40
fname        1,201         48           49
Table 1: Top: Number of articles (sentences) for the
different languages in Subtask 4 (Hürriyetoğlu et al.,
2021a). About 10 percent (in terms of sentences) of the
official training data is used as the development split.
Bottom: Counts for the different event components in
Subtask 4 training data for English, Portuguese, and
Spanish (Hürriyetoğlu et al., 2021a).
3 System overview
We use our system, EventGraph, which adapts an
end-to-end graph-based semantic parser to solve
the task of extracting socio-political events. In
what follows, we give more details about the graph
representation and the model architecture of our
system.
3.1 Graph representations
We represent each sentence as an event graph,
which contains event trigger(s) and arguments as
nodes. In an event graph, edges are restricted to
connections between the trigger(s) and their
corresponding arguments. However, since our system
can take as input graphs in a general sense, the
precise graph representation that works best for
this task must be determined empirically. We here
explore two different graph encoding methods, where
the labels for triggers and arguments are represented
either as edge labels or node labels, namely
“labeled-edge” and “node-centric”. Since sentences in the
data may contain information about several events
with arguments shared across these, we also
experiment with a version of the “node-centric” approach
where multiple triggers give rise to separate nodes
in the graph. The intuition behind this is that it is
easier for the model to predict a node anchoring to
a single span than to several disjoint spans.
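The encodings described above can be sketched in a few lines of Python. This is a minimal illustration and not the EventGraph implementation: the `Graph` class and the function names `labeled_edge` and `node_centric` are hypothetical, and two modelling choices are assumptions made for the example, namely that node-centric edges carry no label and that, in the split variant, every argument is attached to every trigger node. The input is the event annotation of the example sentence in Figure 1.

```python
from dataclasses import dataclass, field

ROOT = "<root>"


@dataclass
class Graph:
    # Each node is (label, spans): a label (or None) plus the text spans it anchors to.
    nodes: list = field(default_factory=list)
    # Each edge is (source_index, target_index, label); the label may be None.
    edges: list = field(default_factory=list)

    def add_node(self, label, spans=()):
        self.nodes.append((label, tuple(spans)))
        return len(self.nodes) - 1

    def add_edge(self, source, target, label=None):
        self.edges.append((source, target, label))


def labeled_edge(triggers, arguments):
    """Labels live on edges; all triggers are merged into one node."""
    g = Graph()
    root = g.add_node(ROOT)
    trigger = g.add_node(None, triggers)
    g.add_edge(root, trigger, "trigger")
    for role, span in arguments:
        g.add_edge(trigger, g.add_node(None, [span]), role)
    return g


def node_centric(triggers, arguments, split=False):
    """Labels live on nodes; with split=True each trigger gets its own node."""
    g = Graph()
    root = g.add_node(ROOT)
    groups = [[t] for t in triggers] if split else [list(triggers)]
    trigger_nodes = [g.add_node("trigger", spans) for spans in groups]
    for node in trigger_nodes:
        g.add_edge(root, node)
    for role, span in arguments:
        arg = g.add_node(role, [span])
        for node in trigger_nodes:  # assumption: arguments are shared by all triggers
            g.add_edge(node, arg)
    return g


# Event annotation read off the example sentence in Figure 1.
triggers = ["chased", "hacked to death"]
arguments = [("target", "Chale"), ("participant", "group"), ("participant", "people")]

for name, graph in [
    ("labeled-edge", labeled_edge(triggers, arguments)),
    ("node-centric", node_centric(triggers, arguments)),
    ("node-centric-split", node_centric(triggers, arguments, split=True)),
]:
    print(name, len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

In the split variant every trigger node anchors to exactly one span, whereas the merged trigger node of the other two encodings anchors to two disjoint spans.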
Labeled-edge: labels for event trigger(s) and
arguments are represented as edge labels;
multiple triggers are merged into one node, as
shown by the first graph of Figure 1.
Node-centric: labels for event trigger(s) and
arguments are represented as node labels;