Event Extraction: A Survey
Viet Dac Lai
Department of Computer Science
University of Oregon
Eugene, OR 97403, USA
vietl@cs.uoregon.edu
Abstract
Extracting the reported events from text is one of the key research themes in natural language processing. This process includes several tasks such as event detection, argument extraction, and role labeling. As one of the most important topics in natural language processing and natural language understanding, the applications of event extraction span a wide range of domains such as newswire, the biomedical domain, history and the humanities, and cyber security. This report presents a comprehensive survey of event extraction from textual documents. We provide the task definition, the evaluation method, the benchmark datasets, and a taxonomy of methodologies for event extraction. We also present our vision of future research directions in event detection.
1 Introduction
Event Extraction (EE) is an essential task in Information Extraction (IE) in Natural Language Processing (NLP). An event is an occurrence of an activity that happens at a particular time and place, or it might be described as a change of state (LDC, 2005). The first task of event extraction is to detect events in the text (i.e., event detection) and then sort them into classes of interest (i.e., event classification). The second task involves detecting the event participants (i.e., argument extraction) and their attributes (i.e., argument role labeling). In short, event extraction structures unstructured text by answering the WH questions of an event (i.e., what, who, when, where, why, and how).
Event extraction plays an important role in various natural language processing applications. For instance, extracted events can be used to construct knowledge bases on which people can easily perform logical queries (Ge et al., 2018). Many domains benefit from the development of event extraction research. In the biomedical domain, event extraction can be used to extract interactions between biomolecules (e.g., protein-protein interactions) described in the biomedical literature (Kim et al., 2009). In the economic domain, events reported on social media and social networks can be used to measure socio-economic indicators (Min and Zhao, 2019). Recently, event extraction has been adopted in many other domains such as literature (Sims et al., 2019), cyber security (Man Duc Trong et al., 2020), history (Sprugnoli and Tonelli, 2019), and the humanities (Lai et al., 2021b).
Even though event extraction has been studied for decades, it remains a very challenging task. To perform event extraction, a system needs to understand the text's semantics, resolve its ambiguity, and organize the extracted information into structures (LDC, 2005).
Event extraction closely connects with other natural language processing tasks such as named entity recognition (NER), entity linking (EL), and dependency parsing. Although these tasks can boost the development of event extraction (McClosky et al., 2011), they might also have an adverse impact on the performance of event extraction systems (Zhang et al., 2018), depending on how their output is exploited.
Last but not least, the lack of training data is a fundamental obstacle to expanding event extraction to new domains because traditional classification models require a large amount of training data (Huang et al., 2018b). Therefore, extracting events with a substantially small amount of training data is a new and challenging problem.
2 The event extraction task
2.1 Definition
Event extraction aims to detect the appearance of an event in text (e.g., a sentence or document) and its related information. In some cases, a predefined structure of the event, such as its participants and their relations to the event, is provided to formulate the event. This is called closed-domain event extraction, as the expected structure is provided according to a certain application. In contrast, open-domain event extraction does not require any predefined structure. In this report, we focus on closed-domain event extraction.
ACE-2005 (LDC, 2005) defines an event schema whose terminology has been widely used in event extraction:

- Event extent: the sentence within which an event is expressed.
- Event trigger: the word or phrase that most clearly expresses the occurrence of the event. In many cases, the event trigger is the main verb of the sentence expressing the event.
- Event arguments: the entities that take part in the event. They include participants and attributes.
- Argument role: the relationship between an event and its arguments.
Based on this terminology, Ahn (2006) proposed to divide event extraction into four sub-tasks: trigger detection, trigger classification, argument detection, and argument classification.
Earlier documents in the case have included embarrassing details about perks Welch received as part of his retirement package from GE at a time when corporate scandals were sparking outrage.

Trigger: retirement
Event type: Personnel:End-Position
Person-Arg: Welch
Entity-Arg: GE
Position-Arg: -
Time-Arg: -
Place-Arg: -

Table 1: Example of a sample in ACE-2005.
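To make the schema concrete, the Table 1 example can be encoded as a simple record. The following Python sketch is purely illustrative; the class and field names are hypothetical, not part of any official ACE tooling.

```python
# A minimal sketch of an ACE-style event record; class and field
# names are illustrative, not from any official ACE toolkit.
from dataclasses import dataclass, field

@dataclass
class Argument:
    role: str          # e.g., "Person-Arg"
    text: str          # surface form of the entity mention
    entity_type: str   # e.g., "PER", "ORG"

@dataclass
class Event:
    trigger: str                   # word/phrase evoking the event
    event_type: str                # e.g., "Personnel:End-Position"
    arguments: list = field(default_factory=list)

# The Table 1 example encoded in this structure:
event = Event(
    trigger="retirement",
    event_type="Personnel:End-Position",
    arguments=[
        Argument(role="Person-Arg", text="Welch", entity_type="PER"),
        Argument(role="Entity-Arg", text="GE", entity_type="ORG"),
    ],
)
```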
2.2 Corpora
The development of event extraction has mostly been promoted by the availability of data offered by public evaluation programs such as the Message Understanding Conference (MUC), Automatic Content Extraction (ACE), and Knowledge Base Population (TAC-KBP).
Event type | Event subtypes
Life | Be-Born, Marry, Divorce, Injure, Die
Movement | Transport
Transaction | Transfer-Ownership, Transfer-Money
Business | Start-Org, Merge-Org, Declare-Bankruptcy, End-Org
Conflict | Attack, Demonstrate
Contact | Meet, Phone-Write
Personnel | Start-Position, End-Position, Nominate, Elect
Justice | Arrest-Jail, Release-Parole, Trial-Hearing, Charge-Indict, Sue, Convict, Sentence, Fine, Execute, Extradite, Acquit, Appeal, Pardon

Table 2: List of event types and event subtypes covered in ACE-2005.
Automatic Content Extraction (ACE-2005) is the most widely used corpus in event extraction for English, Arabic, and Chinese. It annotates entities, events, relations, and time (LDC, 2005). There are 7 categories of entities in ACE-2005, i.e., person, organization, location, geopolitical entity, facility, vehicle, and weapon. ACE-2005 defines 8 event types and 33 event subtypes, as presented in Table 2. The dataset annotates 599 documents from various sources, e.g., weblogs, broadcast news, newsgroups, and broadcast conversations.
TAC-KBP datasets aim to promote the extraction of information from unstructured text so that it fits a knowledge base. The datasets include annotations for event detection, event coreference, event linking, argument extraction, and argument linking (Ellis et al., 2015). The event taxonomy in TAC-KBP is mostly derived from ACE-2005, with 9 event types and 38 event subtypes. The dataset is annotated from 360 documents, of which 158 are used for training and 202 for testing. TAC-KBP 2015 contains documents for English only (Ellis et al., 2015), whereas TAC-KBP 2016 includes Chinese and Spanish documents (Ji et al., 2016).
Many corpora for specific domains have been published for public use. The MUC corpora annotate events for various domains such as fleet operation, terrorism, and semiconductor production (Grishman and Sundheim, 1996). GENIA is an event detection corpus for the biomedical domain, compiled from scientific documents from PubMed by the BioNLP Shared Task (Kim et al., 2009). TimeBank annotates 183 English news articles with events, temporal annotations, and their links (Pustejovsky et al., 2003). Recently, event detection has expanded to many other fields, with corpora such as CASIE and CyberED for cyber security (Satyapanich et al., 2020; Trong et al., 2020), LitBank for literature (Sims et al., 2019), and a corpus for music (Ding et al., 2011). However, these corpora are both small in the number of data samples and narrow in terms of domain. Consequently, this limits the ability of pre-trained models to perform tasks in a new domain in real applications.
On the other hand, a general-domain dataset for event detection is a good fit for real applications because it offers a much more comprehensive range of domains and topics. However, manually creating a large-scale general-domain dataset for ED is too costly for anyone to attempt. Instead, general-domain datasets for event detection have been produced at a large scale by exploiting a knowledge base and unlabeled text. Distant supervision and learning models are the two main methods that have been employed to generate large-scale ED datasets.
Distant supervision (Mintz et al., 2009) is the most widely used, drawing on facts derived from existing knowledge bases such as WordNet (Miller, 1995), FrameNet (Baker et al., 1998), and Freebase (Bollacker et al., 2008); a toy sketch of the idea follows.
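The following Python sketch illustrates the distant-supervision idea in miniature: trigger words from a hypothetical knowledge-base-derived lexicon are projected onto unlabeled sentences to produce noisy training labels. The lexicon entries and the BIO-style label scheme are invented for illustration; real pipelines use far larger lexicons and denoising steps.

```python
# Toy illustration of distant supervision for event detection:
# project trigger words from a (hypothetical) knowledge-base-derived
# lexicon onto unlabeled sentences to produce noisy training labels.
TRIGGER_LEXICON = {
    "marry": "Life:Marry",
    "divorce": "Life:Divorce",
    "attack": "Conflict:Attack",
}

def distant_label(tokens):
    """Return one (noisy) BIO-style label per token."""
    labels = []
    for tok in tokens:
        event_type = TRIGGER_LEXICON.get(tok.lower())
        labels.append(f"B-{event_type}" if event_type else "O")
    return labels

print(distant_label("They will marry in June".split()))
# ['O', 'O', 'B-Life:Marry', 'O', 'O']
```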
Chen et al. (2017) propose an approach that aligns key arguments of an event using Freebase. These arguments are then used to automatically detect the event and its trigger word, and the data is further denoised using FrameNet (Baker et al., 1998). Similarly, Wang et al. (2020b) construct the MAVEN dataset from Wikipedia text and FrameNet. This dataset also offers a tree-like event schema structure rooted in the word sense hierarchy of FrameNet.
Similarly, Le and Nguyen (2021) create FedSemcor from WordNet and a word sense disambiguation dataset. A subset of WordNet synsets that are more likely to be eventive is collected and grouped into event detection classes with similar meanings. SemCor is a word sense disambiguation dataset whose tokens are labeled with WordNet synsets. As such, to create the event detection dataset, the text from SemCor is realigned with the collected ED classes.

Table 3 presents a summary of the existing event extraction datasets for English.
3 Supervised Learning Models
3.1 Feature-based models
In the early stage of event extraction, most methods utilized a large set of features (i.e., feature engineering) for statistical classifiers. The features can be derived from constituent parsers (Ahn, 2006), dependency parsers (Ahn, 2006), POS taggers, unsupervised topic features (Liao and Grishman, 2011), and contextual features (Patwardhan and Riloff, 2009). These models employ statistical classifiers such as nearest neighbor (Ahn, 2006), maximum-entropy classifiers (Liao and Grishman, 2011), and conditional random fields (Majumder and Ekbal, 2015).
Ahn (2006) employed a rich feature set of lexical features, dependency features, and entity features. The lexical features consist of the word and its lemma, lowercase form, and part-of-speech (POS) tag. The dependency features include the depth of the word in the dependency tree, the dependency relation of the trigger, and the POS tags of the connected nodes. The context features consist of the left/right context, such as lowercase forms, POS tags, and entity types. The entity features include the number of dependents, their labels, constituent headwords, the number of entities along a dependency path, and the length of the path to the closest entity.
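A few of these features can be approximated with a modern dependency parser. The sketch below uses spaCy to compute lexical and dependency features for a candidate trigger; it illustrates the style of this feature engineering rather than reproducing the exact feature set of Ahn (2006).

```python
# Sketch of a few Ahn (2006)-style lexical/dependency features using
# spaCy; an approximation of the idea, not the paper's exact features.
import spacy

nlp = spacy.load("en_core_web_sm")

def depth(token):
    """Depth of a token in the dependency tree (root has depth 0)."""
    d = 0
    while token.head is not token:  # spaCy roots are their own head
        token = token.head
        d += 1
    return d

def trigger_features(doc, i):
    tok = doc[i]
    return {
        "word": tok.text,
        "lemma": tok.lemma_,
        "lower": tok.text.lower(),
        "pos": tok.pos_,
        "dep_relation": tok.dep_,
        "dep_depth": depth(tok),
        "left_context": doc[i - 1].text if i > 0 else "<S>",
        "right_context": doc[i + 1].text if i < len(doc) - 1 else "</S>",
    }

doc = nlp("Welch retired from GE last year.")
print(trigger_features(doc, 1))  # features for the candidate "retired"
```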
Ji and Grishman (2008) further introduced cross-sentence and cross-document rules to mandate consistency in the classification of triggers and their arguments in a document. In particular, they enforce (1) the consistency of word senses across sentences in related documents and (2) the consistency of roles and entity types across different mentions of related events.
Patwardhan and Riloff (2009) suggested using contextual features such as the lexical head of the candidate, the semantic class of the lexical head, and the lexico-semantic patterns surrounding the candidate. This information provides rich contextual features of the words surrounding the candidate and its lexically connected words, foreshadowing the success of convolutional neural networks and graph convolutional neural networks based on the dependency graph in recent studies.
Liao and Grishman (2011) showed that global topic features can help improve EE performance on test data, especially for a balanced corpus. An unsupervised topic model trained on a large untagged corpus can provide underlying relations between event and entity types. Therefore, it can reduce the bias introduced by an imbalanced corpus (e.g., the ACE-2005 dataset).
Majumder and Ekbal (2015) extracted various features for biomedical event extraction, such as the dependency path and the distance to the nearest protein entity. Since terminology in the biomedical domain follows particular rules, the suffixes and prefixes of words provide substantial semantic information about the terms.
Even though tremendous effort has been poured into feature engineering, feature-based models with statistical classifiers hinder the application of event extraction models in practical situations for two reasons. The first is the need to manually design the feature set, which requires research expertise in both linguistics and the target-specific domain. Second, since feature extraction tools are imperfect, their incorrectly extracted features can harm the statistical models. Hence, a model that can learn features automatically would significantly boost the applicability of event extraction.
3.2 Neural-based models
As mentioned in the previous section, crafting a diverse set of lexical, syntactic, semantic, and topic features requires both linguistic and domain expertise. This might hinder the adaptability of a model to real applications where such expertise is scarce. Therefore, instead of manually designing linguistic features, extracting features automatically is more practical in virtually every NLP task, and it has revolutionized the common practice of NLP research. Toward this end, deep neural networks are a perfect match because of their ability to capture features from text automatically.
Deep neural networks employing multiple layers of a large number of artificial neurons have been adapted to various classification and generation tasks. In an artificial neural network, a layer takes as input the output of the layer below it and transforms it into a more abstract representation, with two exceptions: the lowest layer takes as input a vector generated from the data sample, and the highest layer usually outputs a score for each of the classification classes. These scores are used to predict the label.
3.2.1 Distributed word embedding
Distributed word embeddings are among the most impactful tools for most NLP tasks, including event extraction. Word embeddings played a vital role in the transition from feature-based to neural-based modeling. The representation obtained from a word embedding captures a rich set of syntactic features, semantic features, and knowledge learned from a large amount of text (Mikolov et al., 2013).

Technically, a distributed word embedding is a matrix that can be viewed as a list of low-dimensional continuous float vectors (Bengio et al., 2003). The embedding maps each word in its dictionary to a single vector; hence, a sentence can be encoded into a list of vectors, which are fed into the neural network. Among tens of variants, Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) are the most popular word embeddings. These word embeddings were later called context-free embeddings to distinguish them from contextualized word embeddings, which were invented a few years afterward.
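A minimal numpy sketch of context-free lookup is shown below; the toy vocabulary and the randomly initialized matrix stand in for pretrained Word2Vec or GloVe vectors.

```python
# Minimal sketch of context-free embedding lookup: each word maps to a
# fixed vector, so a sentence becomes a (length x dim) matrix. A real
# system would load pretrained Word2Vec/GloVe vectors, not random ones.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"welch": 0, "retired": 1, "from": 2, "ge": 3, "<unk>": 4}
embedding = rng.normal(size=(len(vocab), 50))    # |V| x d matrix

def encode(tokens):
    ids = [vocab.get(t.lower(), vocab["<unk>"]) for t in tokens]
    return embedding[ids]                        # one row per token

matrix = encode("Welch retired from GE".split())
print(matrix.shape)  # (4, 50) - same vector for a word in any context
```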
Contextualized word embedding is one of the most important recent inventions in NLP. Contrary to context-free word embeddings, a contextualized embedding dynamically encodes a word based on the context in which it appears (Peters et al., 2018). In addition, contextualized embeddings are usually trained on a large text corpus; hence, they encode a substantial amount of knowledge from the text. Together, these properties have improved virtually every model in NLP. There have been many variants of contextualized word embeddings: for general English text, e.g., BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019b); for multilingual text, e.g., mBERT (Devlin et al., 2019) and XLM-RoBERTa (Ruder et al., 2019); for scientific documents, SciBERT (Beltagy et al., 2019); and for text generation, e.g., GPT-2 (Radford et al., 2019).
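As an illustration, the following sketch uses the HuggingFace transformers library to obtain contextual vectors from a BERT checkpoint; any of the checkpoints above could be substituted.

```python
# Sketch of obtaining contextualized word embeddings with the
# HuggingFace transformers library (BERT shown; any checkpoint works).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sent = "Welch retired from GE."
inputs = tokenizer(sent, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One vector per (sub)word token, conditioned on the whole sentence;
# the same word gets different vectors in different contexts.
print(outputs.last_hidden_state.shape)   # (1, num_subwords, 768)
```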
3.2.2 Convolutional Neural Networks
[Figure 1: The convolutional neural network model for event detection in (Nguyen and Grishman, 2015).]

Nguyen and Grishman (2015) employed a convolutional neural network, inspired by CNNs in computer vision (LeCun et al., 1998) and NLP (Kalchbrenner et al., 2014), that automatically learns features from the text and minimizes the effort spent on feature extraction. Instead of producing a large feature vector for each sample, i.e., tens of thousands of dimensions, this model employs three much smaller embedding vectors with just a few hundred dimensions. As shown in Figure 1, given a sentence with marked entities, each word in the sentence is represented by a low-dimensional vector concatenated from (1) the word embedding, (2) the relative position embedding, and (3) the entity type embedding. The vectors of the words then form a matrix that serves as the representation of the sentence. The matrix is fed to multiple stacks of a convolutional layer, a max-pooling layer, and a fully connected layer. The model is trained using the gradient descent algorithm with a cross-entropy loss. Some regularization techniques are applied to improve the model, such as mini-batch training, an adaptive learning rate optimizer, and weight normalization.
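A compact PyTorch sketch of this architecture is given below; all hyperparameters (embedding sizes, filter counts, label inventory) are illustrative rather than the paper's exact settings.

```python
# PyTorch sketch of the CNN event detector described above: word +
# relative-position + entity-type embeddings are concatenated,
# convolved, max-pooled, and classified. Hyperparameters illustrative.
import torch
import torch.nn as nn

class CNNEventDetector(nn.Module):
    def __init__(self, vocab=20000, n_pos=101, n_ent=8, n_types=34,
                 d_word=300, d_pos=50, d_ent=50, n_filters=150, width=3):
        super().__init__()
        self.word = nn.Embedding(vocab, d_word)
        self.pos = nn.Embedding(n_pos, d_pos)   # distance to candidate
        self.ent = nn.Embedding(n_ent, d_ent)   # entity-type tags
        d_in = d_word + d_pos + d_ent
        self.conv = nn.Conv1d(d_in, n_filters, width, padding=width // 2)
        self.out = nn.Linear(n_filters, n_types)  # 33 subtypes + "None"

    def forward(self, words, positions, entities):
        # (batch, seq_len) -> (batch, seq_len, d_in)
        x = torch.cat([self.word(words), self.pos(positions),
                       self.ent(entities)], dim=-1)
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, F, seq)
        x = x.max(dim=2).values                       # global max-pool
        return self.out(x)                            # class scores
```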
Many efforts have introduced different pooling techniques to extract the information most relevant to event extraction from what is provided in the sentence. Chen et al. (2015) improved the CNN model by using dynamic multi-pooling (DMCNN) instead of vanilla max-pooling. In this model, the sentence is split into multiple parts by either the candidate event trigger or the given entity markers, and the pooling layer is applied separately to each part of the sentence. Zhang et al. (2016) proposed skip-window convolutional neural networks (S-CNNs) to extract global structured features; the model effectively captures the global dependencies of every token in the sentence. Li et al. (2018) proposed a parallel multi-pooling convolutional neural network (PMCNN) that applies multiple pooling not only for the candidate event trigger and entities but also for every other trigger and argument that appears in the sentence, which helps capture the compositional semantic features of the sentence.
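The core of dynamic multi-pooling can be sketched in a few lines. The snippet below shows the two-way split around a candidate trigger, in the spirit of the trigger-classification stage of DMCNN; the split convention and tensor shapes are illustrative assumptions.

```python
# Sketch of the dynamic multi-pooling idea (Chen et al., 2015):
# max-pool the convolution output separately on each side of the
# candidate trigger, then concatenate, so the pooled vector keeps
# information about where the trigger sits in the sentence.
import torch

def dynamic_multi_pool(features, trigger_idx):
    """features: (n_filters, seq_len) conv output -> (2*n_filters,)."""
    left = features[:, : trigger_idx + 1].max(dim=1).values
    right = features[:, trigger_idx:].max(dim=1).values
    return torch.cat([left, right])

feats = torch.randn(150, 20)          # e.g., 150 filters, 20 tokens
pooled = dynamic_multi_pool(feats, trigger_idx=7)
print(pooled.shape)                   # torch.Size([300])
```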
Kodelja et al. (2019) integrated a global representation of contexts beyond the sentence level into the convolutional neural network. To generate the global representation in connection with the target event detection task, they label the whole given document using a bootstrapping model based on the usual CNN model; the predictions for every token are then aggregated into the global representation.
Even though CNNs, together with distributed word representations, can automatically capture local features, EE models based on CNNs are not successful at capturing long-range dependencies between words, because a CNN can only model short-range dependencies within the window of its kernel. Moreover, a large amount of information is lost in the pooling operations (e.g., max-pooling). As such, a more sophisticated neural network design is needed to model the long-range dependencies between words in long sentences and documents without sacrificing information.