an event detection corpus for the biomedical domain. It is compiled from scientific documents from PubMed by the BioNLP Shared Task (Kim et al., 2009).
TimeBank annotates 183 English news articles with event and temporal annotations and the links between them (Pustejovsky et al., 2003). Recently,
event detection has expanded to many other fields, such as CASIE and CyberED for cyber-security (Satyapanich et al., 2020; Trong et al., 2020), Litbank for literature (Sims et al., 2019), and music (Ding et al., 2011). However, these corpora are both small in the number of data samples and narrow in domain. Consequently, this limits the ability of pre-trained models to perform tasks in a new domain in real applications.
On the other hand, a general-domain dataset for event detection is a good fit for real applications because it offers a much more comprehensive range of domains and topics. However, manually creating a large-scale general-domain dataset for ED is prohibitively costly. Instead, general-domain datasets for event detection have been produced at a large scale by exploiting a knowledge base and unlabeled text. Distant supervision and learning models are the two main methods that have been employed to generate large-scale ED datasets. Distant supervision (Mintz et al., 2009) is the most widely used, with facts derived from existing knowledge bases such as WordNet (Miller, 1995), FrameNet (Baker et al., 1998), and Freebase (Bollacker et al., 2008).
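The idea behind distant supervision can be sketched as follows: trigger words linked to event types in a knowledge base are matched against unlabeled text to produce labeled training pairs automatically. The lexicon below is a hypothetical toy example, not actual knowledge-base content.

```python
# Minimal sketch of distant supervision for event detection:
# a trigger-to-event-type lexicon (derived, in practice, from a KB
# such as Freebase) is projected onto unlabeled sentences.
# The lexicon entries here are illustrative assumptions.
EVENT_LEXICON = {
    "married": "Life.Marry",
    "attacked": "Conflict.Attack",
    "elected": "Personnel.Elect",
}

def distant_label(sentence):
    """Pair each token with an event type if it matches the lexicon, else 'O'."""
    labels = []
    for token in sentence.split():
        word = token.lower().strip(".,")
        labels.append((token, EVENT_LEXICON.get(word, "O")))
    return labels

print(distant_label("The rebels attacked the village."))
# → [('The', 'O'), ('rebels', 'O'), ('attacked', 'Conflict.Attack'),
#    ('the', 'O'), ('village.', 'O')]
```

Real pipelines add denoising (e.g., with FrameNet, as described above), since naive string matching labels many false triggers.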
Chen et al. (2017) propose an approach to align key arguments of an event by using Freebase. These arguments are then used to detect the event and its trigger word automatically. The data is further denoised using FrameNet (Baker et al., 1998). Similarly, Wang et al. (2020b) construct the MAVEN dataset from Wikipedia text and FrameNet. This dataset also offers a tree-like event schema structure rooted in the word sense hierarchy of FrameNet.
Similarly, Le and Nguyen (2021) create FedSemcor from WordNet and a Word Sense Disambiguation (WSD) dataset. A subset of WordNet synsets that are likely to be eventive is collected and grouped into event detection classes with similar meanings. Semcor is a word sense disambiguation dataset whose tokens are labeled with WordNet synsets. As such, to create the event detection dataset, the text from Semcor is realigned with the collected ED classes.
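The realignment step can be illustrated with a small sketch: sense-tagged tokens are relabeled with an event class when their synset belongs to a collected set of eventive synsets, and with "O" otherwise. The synset IDs and class names below are illustrative assumptions, not the actual FedSemcor inventory.

```python
# Sketch of a FedSemcor-style realignment: map (token, synset) pairs
# from a sense-tagged corpus to event detection labels.
# The synset-to-class grouping is a hypothetical example.
SYNSET_TO_EVENT_CLASS = {
    "run.v.01": "Motion",
    "travel.v.01": "Motion",
    "buy.v.01": "Transaction",
}

def realign(sense_tagged_tokens):
    """Relabel sense-tagged tokens with ED classes; non-eventive tokens get 'O'."""
    return [
        (tok, SYNSET_TO_EVENT_CLASS.get(synset, "O"))
        for tok, synset in sense_tagged_tokens
    ]

sample = [("She", None), ("bought", "buy.v.01"), ("a", None), ("car", "car.n.01")]
print(realign(sample))
# → [('She', 'O'), ('bought', 'Transaction'), ('a', 'O'), ('car', 'O')]
```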
Table 3 presents a summary of the existing event extraction datasets for English.
3 Supervised Learning Models
3.1 Feature-based models
In the early stage of event extraction, most methods utilized a large set of features (i.e., feature engineering) for statistical classifiers. The features can be derived from constituency parsers (Ahn, 2006), dependency parsers (Ahn, 2006), POS taggers, unsupervised topic features (Liao and Grishman, 2011), and contextual features (Patwardhan and Riloff, 2009). These models employ statistical classifiers such as the nearest neighbor (Ahn, 2006), the maximum-entropy classifier (Liao and Grishman, 2011), and the conditional random field (Majumder and Ekbal, 2015).
Ahn (2006) employed a rich feature set of lexical features, dependency features, and entity features. The lexical features consist of the word, its lemma, its lowercased form, and its Part-of-Speech (POS) tag. The dependency features include the depth of the word in the dependency tree, the dependency relation of the trigger, and the POS tags of the connected nodes. The context features consist of the left/right context, such as lowercased forms, POS tags, and entity types. The entity features include the number of dependants, labels, constituent headwords, the number of entities along a dependency path, and the length of the path to the closest entity.
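A minimal sketch of this style of feature extraction is given below, over a hand-built toy dependency parse (in practice a parser such as spaCy or Stanford CoreNLP would supply the tree). The data structure and feature names are assumptions for illustration, not Ahn's original implementation.

```python
# Hedged sketch of lexical/dependency feature extraction for a
# candidate trigger, in the spirit of Ahn (2006).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    text: str
    lemma: str
    pos: str
    head: Optional["Token"] = None   # dependency head; None for the root
    deprel: str = "root"

def depth(token):
    """Depth of a token in the dependency tree (root = 0)."""
    d = 0
    while token.head is not None:
        token = token.head
        d += 1
    return d

def trigger_features(token):
    """Lexical and dependency features for one candidate token."""
    return {
        "word": token.text,
        "lemma": token.lemma,
        "lower": token.text.lower(),
        "pos": token.pos,
        "dep_depth": depth(token),
        "deprel": token.deprel,
        "head_pos": token.head.pos if token.head else None,
    }

# Toy parse of "The rebels attacked ...": 'rebels' is nsubj of 'attacked'.
attacked = Token("attacked", "attack", "VBD")
rebels = Token("rebels", "rebel", "NNS", head=attacked, deprel="nsubj")
print(trigger_features(rebels))
```

Feature dictionaries like this were typically fed to sparse linear classifiers after one-hot encoding.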
Ji and Grishman (2008) further introduced cross-sentence and cross-document rules to enforce the consistency of trigger and argument classification within a document. In particular, they include (1) the consistency of word senses across sentences in related documents and (2) the consistency of roles and entity types across different mentions of related events.
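One simple instantiation of such a consistency rule is majority-vote relabeling: if the same trigger word receives different event labels within a document, all of its mentions are relabeled with the majority label. This is a simplified illustration of the idea, not Ji and Grishman's exact rule set.

```python
# Sketch of a cross-sentence consistency rule: harmonize per-mention
# trigger labels by majority vote over the document.
from collections import Counter

def enforce_consistency(predictions):
    """predictions: list of (trigger_word, event_label) pairs for one document."""
    by_word = {}
    for word, label in predictions:
        by_word.setdefault(word, []).append(label)
    majority = {
        word: Counter(labels).most_common(1)[0][0]
        for word, labels in by_word.items()
    }
    return [(word, majority[word]) for word, _ in predictions]

doc = [("fired", "Attack"), ("fired", "End-Position"), ("fired", "Attack")]
print(enforce_consistency(doc))
# → [('fired', 'Attack'), ('fired', 'Attack'), ('fired', 'Attack')]
```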
Patwardhan and Riloff (2009) suggest using contextual features such as the lexical head of the candidate, the semantic class of the lexical head, and the lexico-semantic patterns surrounding the candidate. This information provides rich contextual features of the words surrounding the candidate and its lexically connected words, which foreshadows the success of convolutional neural networks and graph convolutional neural networks based on the dependency graph in recent studies.
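The window-based part of such contextual features can be sketched as follows; the function and feature names are assumptions for illustration, and the semantic-class and pattern features of the original work are omitted.

```python
# Illustrative sketch of surface contextual features for a candidate:
# the candidate word itself plus a fixed window of left/right neighbors.
def context_features(tokens, i, window=2):
    """Features for the candidate at index i: head word and nearby context."""
    feats = {"head": tokens[i].lower()}
    for off in range(1, window + 1):
        feats[f"left_{off}"] = tokens[i - off].lower() if i - off >= 0 else "<pad>"
        feats[f"right_{off}"] = tokens[i + off].lower() if i + off < len(tokens) else "<pad>"
    return feats

toks = ["A", "bomb", "exploded", "near", "the", "embassy"]
print(context_features(toks, 2))
# → {'head': 'exploded', 'left_1': 'bomb', 'right_1': 'near',
#    'left_2': 'a', 'right_2': 'the'}
```

Such fixed-window context is exactly what convolutional architectures later learned to aggregate automatically.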
Liao and Grishman (2011) show that global topic features can help improve EE performance on test data, especially for a balanced corpus. The