an event detection corpus for the biomedical domain. It is compiled from scientific documents from PubMed by the BioNLP Shared Task (Kim et al., 2009).
TimeBank annotates 183 English news articles with event and temporal annotations and the links between them (Pustejovsky et al., 2003). Recently,
event detection has expanded to many other fields, such as CASIE and CyberED for cyber-security (Satyapanich et al., 2020; Trong et al., 2020), Litbank for literature (Sims et al., 2019), and music (Ding et al., 2011). However, these corpora are both small in the number of data samples and narrow in domain. Consequently, this limits the ability of pre-trained models to perform tasks in a new domain in real applications.
On the other hand, a general-domain dataset for event detection is a good fit for real applications because it offers a much more comprehensive range of domains and topics. However, manually creating a large-scale general-domain dataset for ED is prohibitively costly. Instead, general-domain datasets for event detection have been produced at a large scale by exploiting a knowledge base and unlabeled text. Distant supervision and learning models are the two main methods that have been employed to generate large-scale ED datasets. Distant supervision (Mintz et al., 2009) is the most widely used, with facts derived from existing knowledge bases such as WordNet (Miller, 1995), FrameNet (Baker et al., 1998), and Freebase (Bollacker et al., 2008).
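The idea behind distant supervision can be sketched as follows: trigger words linked to event types in a knowledge base are matched against unlabeled text to produce labeled training pairs automatically. The lexicon below is a hypothetical toy example, not actual knowledge-base content.

```python
# Minimal sketch of distant supervision for event detection:
# a trigger-to-event-type lexicon (derived, in practice, from a KB
# such as Freebase) is projected onto unlabeled sentences.
# The lexicon entries here are illustrative assumptions.
EVENT_LEXICON = {
    "married": "Life.Marry",
    "attacked": "Conflict.Attack",
    "elected": "Personnel.Elect",
}

def distant_label(sentence):
    """Pair each token with an event type if it matches the lexicon, else 'O'."""
    labels = []
    for token in sentence.split():
        word = token.lower().strip(".,")
        labels.append((token, EVENT_LEXICON.get(word, "O")))
    return labels

print(distant_label("The rebels attacked the village."))
# → [('The', 'O'), ('rebels', 'O'), ('attacked', 'Conflict.Attack'),
#    ('the', 'O'), ('village.', 'O')]
```

Real pipelines add denoising (e.g., with FrameNet, as described above), since naive string matching labels many false triggers.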
Chen et al. (2017) propose an approach to align key arguments of an event by using Freebase. These arguments are then used to detect the event and its trigger word automatically. The data is further denoised using FrameNet (Baker et al., 1998). Similarly, Wang et al. (2020b) construct the MAVEN dataset from Wikipedia text and FrameNet. This dataset also offers a tree-like event schema structure rooted in the word sense hierarchy of FrameNet.
Similarly, Le and Nguyen (2021) create FedSemcor from WordNet and a Word Sense Disambiguation (WSD) dataset. A subset of WordNet synsets that are likely to be eventive is collected and grouped into event detection classes with similar meanings. Semcor is a word sense disambiguation dataset whose tokens are labeled with WordNet synsets. As such, to create the event detection dataset, the text from Semcor is realigned with the collected ED classes.
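The realignment step can be illustrated with a small sketch: sense-tagged tokens are relabeled with an event class when their synset belongs to a collected set of eventive synsets, and with "O" otherwise. The synset IDs and class names below are illustrative assumptions, not the actual FedSemcor inventory.

```python
# Sketch of a FedSemcor-style realignment: map (token, synset) pairs
# from a sense-tagged corpus to event detection labels.
# The synset-to-class grouping is a hypothetical example.
SYNSET_TO_EVENT_CLASS = {
    "run.v.01": "Motion",
    "travel.v.01": "Motion",
    "buy.v.01": "Transaction",
}

def realign(sense_tagged_tokens):
    """Relabel sense-tagged tokens with ED classes; non-eventive tokens get 'O'."""
    return [
        (tok, SYNSET_TO_EVENT_CLASS.get(synset, "O"))
        for tok, synset in sense_tagged_tokens
    ]

sample = [("She", None), ("bought", "buy.v.01"), ("a", None), ("car", "car.n.01")]
print(realign(sample))
# → [('She', 'O'), ('bought', 'Transaction'), ('a', 'O'), ('car', 'O')]
```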
Table 3 presents a summary of the existing event extraction datasets for English.
3 Supervised Learning Models
3.1 Feature-based models
In the early stage of event extraction, most methods utilized a large set of features (i.e., feature engineering) for statistical classifiers. The features can be derived from constituency parsers (Ahn, 2006), dependency parsers (Ahn, 2006), POS taggers, unsupervised topic features (Liao and Grishman, 2011), and contextual features (Patwardhan and Riloff, 2009). These models employ statistical classifiers such as the nearest neighbor (Ahn, 2006), the maximum-entropy classifier (Liao and Grishman, 2011), and the conditional random field (Majumder and Ekbal, 2015).
Ahn (2006) employed a rich feature set of lexical features, dependency features, and entity features. The lexical features consist of the word, its lemma, its lowercased form, and its Part-of-Speech (POS) tag. The dependency features include the depth of the word in the dependency tree, the dependency relation of the trigger, and the POS tags of the connected nodes. The context features consist of the left/right context, such as lowercased forms, POS tags, and entity types. The entity features include the number of dependants, labels, constituent headwords, the number of entities along a dependency path, and the length of the path to the closest entity.
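A minimal sketch of this style of feature extraction is given below, over a hand-built toy dependency parse (in practice a parser such as spaCy or Stanford CoreNLP would supply the tree). The data structure and feature names are assumptions for illustration, not Ahn's original implementation.

```python
# Hedged sketch of lexical/dependency feature extraction for a
# candidate trigger, in the spirit of Ahn (2006).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: str
    lemma: str
    pos: str
    head: Optional["Token"] = None   # dependency head; None for the root
    deprel: str = "root"

def depth(token):
    """Depth of a token in the dependency tree (root = 0)."""
    d = 0
    while token.head is not None:
        token = token.head
        d += 1
    return d

def trigger_features(token):
    """Lexical and dependency features for one candidate token."""
    return {
        "word": token.text,
        "lemma": token.lemma,
        "lower": token.text.lower(),
        "pos": token.pos,
        "dep_depth": depth(token),
        "deprel": token.deprel,
        "head_pos": token.head.pos if token.head else None,
    }

# Toy parse of "The rebels attacked ...": 'rebels' is nsubj of 'attacked'.
attacked = Token("attacked", "attack", "VBD")
rebels = Token("rebels", "rebel", "NNS", head=attacked, deprel="nsubj")
print(trigger_features(rebels))
```

Feature dictionaries like this were typically fed to sparse linear classifiers after one-hot encoding.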
Ji and Grishman (2008) further introduced cross-sentence and cross-document rules to enforce the consistency of trigger and argument classification within a document. In particular, they include (1) the consistency of word senses across sentences in related documents and (2) the consistency of roles and entity types across different mentions of related events.
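One simple instantiation of such a consistency rule is majority-vote relabeling: if the same trigger word receives different event labels within a document, all of its mentions are relabeled with the majority label. This is a simplified illustration of the idea, not Ji and Grishman's exact rule set.

```python
# Sketch of a cross-sentence consistency rule: harmonize per-mention
# trigger labels by majority vote over the document.
from collections import Counter

def enforce_consistency(predictions):
    """predictions: list of (trigger_word, event_label) pairs for one document."""
    by_word = {}
    for word, label in predictions:
        by_word.setdefault(word, []).append(label)
    majority = {
        word: Counter(labels).most_common(1)[0][0]
        for word, labels in by_word.items()
    }
    return [(word, majority[word]) for word, _ in predictions]

doc = [("fired", "Attack"), ("fired", "End-Position"), ("fired", "Attack")]
print(enforce_consistency(doc))
# → [('fired', 'Attack'), ('fired', 'Attack'), ('fired', 'Attack')]
```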
Patwardhan and Riloff (2009) suggest using contextual features such as the lexical head of the candidate, the semantic class of the lexical head, and the lexico-semantic patterns surrounding the candidate. This information provides rich contextual features of the words surrounding the candidate and its lexically connected words, which foreshadows the success of convolutional neural networks and graph convolutional neural networks based on the dependency graph in recent studies.
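The window-based part of such contextual features can be sketched as follows; the function and feature names are assumptions for illustration, and the semantic-class and pattern features of the original work are omitted.

```python
# Illustrative sketch of surface contextual features for a candidate:
# the candidate word itself plus a fixed window of left/right neighbors.
def context_features(tokens, i, window=2):
    """Features for the candidate at index i: head word and nearby context."""
    feats = {"head": tokens[i].lower()}
    for off in range(1, window + 1):
        feats[f"left_{off}"] = tokens[i - off].lower() if i - off >= 0 else "<pad>"
        feats[f"right_{off}"] = tokens[i + off].lower() if i + off < len(tokens) else "<pad>"
    return feats

toks = ["A", "bomb", "exploded", "near", "the", "embassy"]
print(context_features(toks, 2))
# → {'head': 'exploded', 'left_1': 'bomb', 'right_1': 'near',
#    'left_2': 'a', 'right_2': 'the'}
```

Such fixed-window context is exactly what convolutional architectures later learned to aggregate automatically.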
Liao and Grishman (2011) show that global topic features can help improve EE performance on test data, especially for a balanced corpus. The