CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction
and Summarization
Hossein Rajaby Faghihi1, Bashar Alhafni2, Ke Zhang3, Shihao Ran3,
Joel Tetreault3, Alejandro Jaimes3
1Michigan State University, 2New York University Abu Dhabi,
3Dataminr, Inc.
rajabyfa@msu.edu, alhafni@nyu.edu,
{kzhang,sran,jtetreault,ajaimes}@dataminr.com
Abstract
Social media has increasingly played a key role in emergency response: first responders can use public posts to better react to ongoing crisis events and deploy the necessary resources where they are most needed. Timeline extraction and abstractive summarization are critical technical tasks to leverage large numbers of social media posts about events. Unfortunately, there are few datasets for benchmarking technical approaches for those tasks. This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date. CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms. We built CrisisLTLSum using a semi-automated cluster-then-refine approach to collect data from the public Twitter stream. Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks. Our dataset, code, and models are publicly available.1
1 Introduction
We present CrisisLTLSum, the first dataset on extraction and summarization of local crisis event timelines from Twitter. An example of an annotated timeline in CrisisLTLSum is shown in Figure 1. A timeline is a chronologically sorted set of posts, where each brings in new information or updates about an ongoing event (such as a fire, storm, or traffic incident). CrisisLTLSum supports two complex downstream tasks: timeline extraction and timeline summarization. As shown in Figure 1, the timeline extraction task is formalized as: given a seed tweet as the initial mention of a crisis event, extract relevant tweets with updates on the same event from the incoming noisy tweet stream. This task is crucial for real-time event tracking.
Work done as Research Interns at Dataminr, Inc.
1 https://github.com/CrisisLTLSum/CrisisTimelines
June 21, 00:04 ✓ #BREAKING Smoke from vegetation fire near Friant and Rice Road fills skies of Fresno amid strong winds (Seed Tweet)
June 21, 00:18 ✓ #BREAKING A large fire burning in northeast Fresno near Woodward Lake Thursday sent plumes of smoke into the air above the city.
June 21, 00:24 ✗ (Repetitive) The season is upon us. Strong winds and fire. Stay safe and stay inside when you smell/see the smoke and dust. Word is that this is a grass fire at Friant and Rice.
June 21, 00:40 ✗ (Repetitive) A large fire is burning along Friant Road. You can watch @VanessaABC30 giving an update on what we know right now.
June 21, 01:08 ✓ Winds have picked up immensely. Flames have reached Friant Rd. Windy conditions making it difficult for firefighters to contain the blaze.
June 21, 01:18 ✓ UPDATE: @VanessaABC30 talking to CAL FIRE about the fire burning along Rice Rd. & Friant Rd. Officials say the fire started as a commercial fire. Officials say there is a shelter in place order, but no evacuations at this point.
June 21, 01:31 ✓ Kevin Larrivee captured this video of a grass fire burning right now on Friant Rd, north of Woodward Park. The wind is crazy today. As you can see, that wind is pushing the fire quickly.
June 21, 01:33 ✓ Fire crews are working a vegetation fire near Friant and Rice roads. Please use caution when driving in the area and follow all directions from emergency personnel.

Summary 1: Cal Fire officials have issued a shelter in place for residents northeast of Fresno near Woodward lake due to a vegetation fire. Strong winds have made it difficult for firefighters as it pushes it closer to Friant and Rice roads and officials urge motorists to use caution when driving in the area.

Summary 2: A vegetation fire northeast of Fresno combined with windy conditions to cause firefighters problems. There was a shelter in place order given but no evacuation orders were given.

Figure 1: A sample annotated timeline from CrisisLTLSum. The noisy timeline is the set of tweets sorted chronologically; the first tweet is the seed of the event. ✓ means that the tweet is annotated as part of the timeline and ✗ indicates that the tweet is excluded, with the reason for the exclusion written next to the mark.
The timeline summarization task aims to generate abstractive summaries of evolving events by aggregating important details from temporal and incremental information.
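The extraction task can be pictured as an online filtering loop over the stream, deciding tweet by tweet whether a new post continues the seeded event. The sketch below is a toy stand-in, not the paper's model: the token-overlap scorer and the names `is_update` and `extract_timeline` are our illustrative choices.

```python
# Toy sketch of timeline extraction as online filtering (illustrative;
# not the paper's model). Each incoming tweet is accepted if it overlaps
# sufficiently with the timeline collected so far.

def tokens(text):
    """Lowercased word tokens, with leading '#' and punctuation stripped."""
    return {w.strip("#.,!?").lower() for w in text.split() if w.strip("#.,!?")}

def is_update(tweet, timeline, threshold=0.2):
    """Jaccard overlap between the candidate tweet and all timeline text."""
    seen = set()
    for t in timeline:
        seen |= tokens(t)
    cand = tokens(tweet)
    return len(cand & seen) / max(1, len(cand | seen)) >= threshold

def extract_timeline(seed, stream, threshold=0.2):
    """Scan a chronologically ordered stream and keep relevant updates."""
    timeline = [seed]
    for tweet in stream:
        if is_update(tweet, timeline, threshold):
            timeline.append(tweet)
    return timeline

seed = "Smoke from vegetation fire near Friant and Rice Road fills skies of Fresno"
stream = [
    "Large fire burning near Friant Road sends smoke over Fresno",
    "Great coffee this morning downtown",  # unrelated noise, should be dropped
]
print(extract_timeline(seed, stream))
```

A real system would replace the overlap scorer with the entity- and location-aware similarity described in Section 3.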
CrisisLTLSum can facilitate research in two directions: 1) NLP for Social Good (crisis domain), and 2) natural language inference and generation, i.e., the timeline extraction and summarization tasks. Here, we discuss the importance and the differences of CrisisLTLSum compared to previous work along both of these aspects. Toward the first direction, the extraction of real-time crisis-relevant information from microblogs (Zhang and Eick, 2019; Mamo et al., 2021) plays a vital role in providing time-sensitive information to help first responders understand ongoing situations and plan relief efforts accordingly (Sarter and Woods, 1991). CrisisLTLSum goes beyond the task of categorizing each single crisis-relevant post independently (Imran et al., 2013; Olteanu et al., 2014; Imran et al., 2016; Alam et al., 2018; Wiegmann et al., 2020; Alam et al., 2021a,b) and enables a more challenging task: extracting new updates on an ongoing crisis event from incoming posts and summarizing them with respect to the important event details. This can help provide time-sensitive updates while avoiding missing critical information in the bulk of posts on microblogs due to the high volume of redundant and noisy information (Alam et al., 2021a). To the best of our knowledge, this is the first annotated dataset for such an extraction task, although the problem has been tackled before in unsupervised settings (Zhang et al., 2018).

arXiv:2210.14190v1 [cs.CL] 25 Oct 2022
Moreover, we focus on the extraction of local crisis events. The term “local” indicates that an event is bound to an exact location, such as a building, a street, or a county, and usually lasts for a short period. Building a corpus of local crisis events is particularly useful for first responders but also challenging, because the timelines of these events are often not captured in existing knowledge sources. This means one has to design mechanisms for automatically detecting and tracking events directly from the Twitter stream, which is especially hard for existing clustering methods (Guille and Favre, 2015; Asgari-Chenaghlu et al., 2021) given the low number of available tweets for each local event.
For the second point, CrisisLTLSum enables NLP research on the complex tasks of timeline extraction and abstractive summarization. These tasks are particularly challenging in the context of social media. First, the process of identifying and extracting relevant updates for a specific event has to contend with a large volume of noise (Alam et al., 2021a) and an informal tone (Rudra et al., 2018) compared to other domains such as news. Additionally, summarizing an ongoing event supports a quick and accurate understanding of its progress; this requires a good level of abstraction, with important details covered and properly presented (e.g., the temporal order of event evolution). CrisisLTLSum is the first dataset to provide human-written timeline summaries to support research in this direction.
CrisisLTLSum is developed through a two-step semi-automated process to create 1,000 local crisis timelines from the public Twitter stream. To the best of our knowledge, this is the first timeline dataset focusing on “local” crisis events, and it covers the largest number of unique events. The contributions of this paper are as follows:

- We propose CrisisLTLSum, the largest dataset of local crisis event timelines. Notably, this is the first benchmark for abstractive timeline summarization in the crisis domain or on Twitter.

- We develop strong baselines for both tasks, and our experiments show a considerable gap between these models and human performance, indicating the importance of this dataset for enabling future research on extracting timelines of crisis event updates and summarizing them.
2 Related Work
Our work is related to two main directions: crisis domain datasets for NLP and timeline summarization.
Crisis Datasets for NLP: Prior research has investigated generating datasets from online social media (e.g., Twitter) on large-scale crisis events, providing labels for event categories (Wiegmann et al., 2020; Imran et al., 2013), humanitarian types and sub-types (Olteanu et al., 2014; Imran et al., 2016; Alam et al., 2018; Wiegmann et al., 2020; Arachie et al., 2020; Alam et al., 2021a,b), actionable information (McCreadie et al., 2019), or witness levels (Zahra et al., 2020) of each crisis-related post. While existing datasets on crisis event timelines (Binh Tran et al., 2013; Tran et al., 2015; Pasquali et al., 2021) are limited to a small set of large-scale events, CrisisLTLSum covers a thousand timelines, compared to only tens of events covered by each of the existing datasets. Additionally, we go beyond simple tweet categorization by enabling the extraction of information that includes updates on the events' progress.
Timeline Summarization: Timeline summarization (TLS) was first proposed in Allan et al. (2001), which extracts a single sentence from the news stream of an event topic. In general, the TLS task aims to summarize a target's evolution (e.g., a topic or an entity) in a timeline (Martschat and Markert, 2018; Ghalandari and Ifrim, 2020). Existing approaches to TLS are mainly based on extractive methods, which are often grouped into several categories. For instance, Update Summarization (Dang et al., 2008; Li et al., 2009) aims to update the previous summary given new information at a later time, while Timeline Generation (Yan et al., 2011; Tran et al., 2015; Martschat and Markert, 2018) aims to generate itemized summaries as the timeline, where each item is extracted by finding important temporal points (e.g., spikes, changes, or clusters) and selecting representative sentences. Another category, Temporal Summarization, was first proposed in the TREC shared task (Aslam et al., 2013) with follow-up work (Kedzie et al., 2015); it targets extracting sentences from a large volume of news streams and social media posts as updates for large events. Temporal Summarization is close to the first task (Timeline Extraction) proposed in CrisisLTLSum.
There have been a few recent works on abstractive timeline summarization across different domains, e.g., biography (Chen et al., 2019), narratives (Barros et al., 2019), and news headlines (Steen and Markert, 2019), where the human-written summaries are directly collected from the web. The goal of abstractive summarization is to generate a set of sentences summarizing the context of interest without taking the exact words or phrases from the original text, but rather by combining them and summarizing the important content. To the best of our knowledge, CrisisLTLSum is the first to provide human-written summaries for crisis event timelines collected from a noisy social media stream. Recent research (Nguyen et al., 2018) has also investigated the summarization task based on tweets in other domains, which essentially does not reflect the challenges of summarizing an evolving event.
3 CrisisLTLSum Collection
This section presents our semi-automated approach to collecting CrisisLTLSum. We first extract clusters of tweets as noisy timelines and then refine them via human annotation to get clean timelines that only include non-redundant, informative, and relevant tweets.
3.1 Noisy Timeline Collection
Figure 2 shows the process for generating a set of noisy timelines, starting from the Twitter stream and followed by pre-processing and knowledge enhancement steps, the online clustering method, and post-processing & cleaning steps.
Location, Time, and Keywords Filtering  We limit the incoming tweets to specific geographical areas, periods, and domains of interest.
[Figure 2: The process of noisy timeline collection. The Twitter stream passes through a filtering mechanism (location & date filtering, keywords filtering), entity extraction & augmentation (categories, entities, hashtags, and geocoded locations from AllenNLP and OpenIE), online clustering into candidate timelines, and finally merging and noise/duplicate removal. The output of this step is a set of noisy clusters used to create the dataset.]
The location filtering relies on a list of location candidates created by gathering cities, towns, and famous neighborhoods in a large area of interest. A tweet is considered relevant to our area of interest if 1) the text mentions one of the candidates, 2) the geo-tag matches the area of interest, or 3) the user location matches the area of interest. To limit the tweets to a specified crisis domain, we curate domain-specific keywords and only select tweets with phrases matching one of the keywords. This approach is not comprehensive or exhaustive but somewhat representative of each crisis domain; improving it to be more encompassing is an area for future research. The combinations of (area a, domain d, time t) are manually selected so that events of type d at location a are more frequent during time period t. For instance, wildfire events are most likely to happen in California from May through August, while the same type of event is more likely from December to March in Victoria (Australia). More details, with examples of curated keywords, can be found in Appendix A.
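The (area, domain, time) filtering above can be sketched as three simple checks. The keyword list, place names, and function name below are illustrative assumptions, not the paper's curated resources.

```python
# Minimal sketch of the (area, domain, time) filtering step.
# DOMAIN_KEYWORDS and AREA_PLACES are illustrative stand-ins for the
# curated lists described in Appendix A of the paper.

from datetime import datetime

DOMAIN_KEYWORDS = {"wildfire": {"wildfire", "vegetation fire", "brush fire"}}
AREA_PLACES = {"fresno", "friant", "woodward park"}

def passes_filters(tweet_text, tweet_time, domain, start, end):
    """Keep a tweet only if it is in the time window, mentions a
    domain keyword, and mentions a place from the area of interest."""
    text = tweet_text.lower()
    in_window = start <= tweet_time <= end
    has_keyword = any(kw in text for kw in DOMAIN_KEYWORDS[domain])
    in_area = any(place in text for place in AREA_PLACES)
    return in_window and has_keyword and in_area

t = datetime(2021, 6, 21, 0, 4)
print(passes_filters(
    "Smoke from vegetation fire near Friant Road fills skies of Fresno",
    t, "wildfire",
    datetime(2021, 5, 1), datetime(2021, 8, 31)))  # True
```

A production version would also check the tweet's geo-tag and the author's profile location, as the paper describes.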
Entity Extraction  This step aims to extract entity mentions from the tweet text and provide additional information that can be used to help identify related tweets. We use three different modules. First, we use a pre-trained neural model from AllenNLP (Gardner et al., 2018), trained on CoNLL03 (Tjong Kim Sang and De Meulder, 2003), to extract entities of the types person, location, and organization. Although this module extracts some important entities in the text, it fails to extract uncommon entities or special mentions such as the name of a wildfire. To address this, similar to prior research (Zheng and Kordjamshidi, 2020), we further exploit the extractions from OpenIE (Stanovsky et al., 2018) and select the noun arguments with fewer than ten characters as entities. Lastly, we add the tweet's hashtags to the entity set. Since location mentions are crucial for extracting local events and existing models perform poorly at detecting them in noisy tweet text, we further developed a BERT-based NER model tuned on Twitter data to detect location mentions.
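The merging of the three entity sources can be sketched as below. The NER and OpenIE outputs are stubbed as plain inputs here (the paper uses AllenNLP, OpenIE, and a Twitter-tuned location tagger); `entity_set` is our hypothetical name.

```python
# Sketch of building a tweet's entity set by merging NER output, short
# OpenIE noun arguments, and hashtags. NER/OpenIE results are passed in
# as precomputed sets rather than produced by real models.

import re

def hashtags(text):
    """Extract hashtag words from the tweet text."""
    return set(re.findall(r"#(\w+)", text))

def entity_set(text, ner_entities, openie_noun_args, max_len=10):
    # Keep only short OpenIE noun arguments, following the paper's
    # fewer-than-ten-characters heuristic for special mentions.
    short_args = {a for a in openie_noun_args if len(a) < max_len}
    return set(ner_entities) | short_args | hashtags(text)

ents = entity_set(
    "#BREAKING Smoke from vegetation fire near Friant Road",
    ner_entities={"Friant Road"},
    openie_noun_args={"smoke", "vegetation fire near Friant Road"},
)
print(sorted(ents))  # ['BREAKING', 'Friant Road', 'smoke']
```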
Location Augmentation  We use the OpenStreetMap API to map location mentions to physical addresses.2 This step provides complementary information about each location while reducing the noise introduced by the entity extraction module, by removing location mentions that are wrongly detected or are not located in the area of interest. This is especially important since our focus is on local events happening at specific locations.
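The pruning of out-of-area mentions can be sketched as a bounding-box check on geocoded coordinates. The `GEOCODE_STUB` table stands in for the OpenStreetMap lookup, and the Fresno bounding box is an illustrative assumption.

```python
# Sketch of dropping geocoded location mentions outside the area of
# interest. GEOCODE_STUB replaces a real OpenStreetMap (Nominatim)
# lookup; coordinates and the bounding box are illustrative.

GEOCODE_STUB = {  # mention -> (lat, lon), as a geocoder might return
    "Friant Road, Fresno": (36.84, -119.76),
    "Woodward Park": (36.86, -119.77),
    "Paris": (48.86, 2.35),
}

FRESNO_BBOX = (36.6, 37.0, -120.1, -119.5)  # (lat_min, lat_max, lon_min, lon_max)

def in_area(mention, bbox=FRESNO_BBOX):
    coords = GEOCODE_STUB.get(mention)
    if coords is None:
        return False  # not geocodable -> treat as a false positive
    lat, lon = coords
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

mentions = ["Friant Road, Fresno", "Paris", "Woodward Park"]
print([m for m in mentions if in_area(m)])  # ['Friant Road, Fresno', 'Woodward Park']
```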
Online Clustering  This step aims to mimic the real-life scenario where tweets are sequentially fed into a clustering algorithm (Wang et al., 2015). We further choose this method since it is much faster than retrospective (all data available at the same time) clustering methods for a large pool of input data. Here, the clustering objective is to group tweets related to the same local event, such as a “fire in building A” or a “wildfire in a specific area”. The online clustering method utilizes a custom similarity metric that combines the similarity of the entities, the closeness of locations in the real world, and the existence of shared hashtags. Algorithm 1 shows the similarity computation between two tweets. The smallest_distance function computes the minimum physical distance between location mentions given their augmented real-world locations (the output of the location augmentation step). As the distance between higher-level location mentions such as state/city/country is always zero, we simply ignore those location types. The find_matching_entities function follows the ideas in Faghihi et al. (2020) on creating a unique matching matrix, which we use for extracting the top matching pairs of entities from tweets. Here, each entity can only be paired once, with the highest matching-score entity from the other tweet. The min_dist, max_dist, s_hashtag, and s_dist values are hyper-parameters of the clustering algorithm. We have only used heuristics and a small set of executions to tune these hyper-parameters.

Algorithm 1: Find the similarity of tweets t1 and t2
  similarity = 0
  if t1.hashtags ∩ t2.hashtags ≠ ∅ then
      similarity = similarity + s_hashtag
  end if
  dist_min = smallest_distance(t1, t2)
  if dist_min ≥ max_dist then
      return 0
  else if dist_min ≤ min_dist then
      similarity = similarity + s_dist
  end if
  top_pairs = find_matching_entities(t1, t2)
  norm_factor = min(len(t1.entities), len(t2.entities), len(top_pairs))
  s_entity = (Σ_{p ∈ top_pairs} p.similarity) / norm_factor
  similarity = similarity + s_entity
  return similarity

2 https://www.openstreetmap.org/
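Algorithm 1 can be rendered as runnable Python. The hyper-parameter values, the haversine distance, and the greedy one-to-one entity matching below are our illustrative stand-ins for the paper's tuned values and its matching-matrix procedure.

```python
# Runnable sketch of the Algorithm 1 similarity score (illustrative
# thresholds; greedy matching stands in for the matching matrix).

from math import radians, sin, cos, asin, sqrt

S_HASHTAG, S_DIST = 0.3, 0.3           # bonus scores (illustrative)
MIN_DIST_KM, MAX_DIST_KM = 1.0, 20.0   # distance thresholds (illustrative)

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def entity_sim(a, b):
    """Toy string similarity between two entity mentions."""
    a, b = a.lower(), b.lower()
    return 1.0 if a == b else 0.5 if (a in b or b in a) else 0.0

def similarity(t1, t2):
    score = 0.0
    if t1["hashtags"] & t2["hashtags"]:
        score += S_HASHTAG
    dists = [haversine_km(p, q) for p in t1["locs"] for q in t2["locs"]]
    if dists:
        d = min(dists)
        if d > MAX_DIST_KM:
            return 0.0        # too far apart: cannot be the same local event
        if d <= MIN_DIST_KM:
            score += S_DIST
    # Greedy one-to-one entity matching: each entity pairs at most once.
    pairs, used = [], set()
    for e1 in t1["entities"]:
        best = max(((entity_sim(e1, e2), e2) for e2 in t2["entities"] - used),
                   default=(0.0, None))
        if best[1] is not None and best[0] > 0:
            pairs.append(best[0])
            used.add(best[1])
    norm = min(len(t1["entities"]), len(t2["entities"]), len(pairs)) or 1
    return score + sum(pairs) / norm

t1 = {"hashtags": {"fresnofire"}, "locs": [(36.84, -119.76)],
      "entities": {"vegetation fire", "Friant Road"}}
t2 = {"hashtags": {"fresnofire"}, "locs": [(36.85, -119.76)],
      "entities": {"fire", "Friant Road"}}
print(similarity(t1, t2))
```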
The pre-processed set of tweets is passed to the online clustering method, one tweet at a time. For each new tweet, similarity scores are computed between the new tweet and all cluster heads. The new tweet is added to the highest matching-score cluster whose similarity score is higher than sim_threshold and where the time elapsed between the new tweet and the last update of the cluster is less than time_threshold. If these criteria are met for none of the clusters, a new cluster is created based on the new tweet. During this process, we remove inactive clusters whose last update was at least expiration_threshold minutes ago and which have fewer than tweet_threshold tweets available. A cluster head is always the tweet with the most entity mentions; in case of a tie, the more recent tweet becomes the head of the cluster. The hyper-parameters of this method are noted in Appendix A.
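The assignment loop can be sketched as follows. The threshold values and the `assign` name are illustrative, the similarity function is a word-overlap stub, and (unlike the paper) this sketch keeps the first tweet as the cluster head rather than re-electing the tweet with the most entities.

```python
# Sketch of the online clustering assignment loop: each incoming tweet
# joins the best-matching live cluster or starts a new one; stale,
# small clusters expire. Thresholds are illustrative.

SIM_THRESHOLD = 0.5
TIME_THRESHOLD = 120      # minutes since a cluster's last update
EXPIRE_AFTER = 240        # minutes before a small cluster is dropped
MIN_TWEETS = 2

def assign(tweet, clusters, similarity, now):
    # Drop inactive clusters that never accumulated enough tweets.
    clusters[:] = [c for c in clusters
                   if not (now - c["last_update"] >= EXPIRE_AFTER
                           and len(c["tweets"]) < MIN_TWEETS)]
    best, best_score = None, 0.0
    for c in clusters:
        if now - c["last_update"] > TIME_THRESHOLD:
            continue  # cluster is live but too old to extend
        s = similarity(tweet, c["head"])
        if s > best_score:
            best, best_score = c, s
    if best is not None and best_score > SIM_THRESHOLD:
        best["tweets"].append(tweet)
        best["last_update"] = now
    else:
        clusters.append({"head": tweet, "tweets": [tweet], "last_update": now})
    return clusters

def word_overlap(a, b):
    """Stub similarity: fraction of the new tweet's words seen in the head."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa), 1)

clusters = []
for minute, tweet in [(0, "fire at Friant Road"),
                      (30, "fire near Friant Road spreading"),
                      (40, "flooding downtown")]:
    assign(tweet, clusters, word_overlap, now=minute)
print(len(clusters))  # 2
```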
Cluster Post-Processing  We apply three post-processing steps to improve the quality of the generated clusters. First, we manually merge pairs of clusters with a cluster-head similarity higher than a threshold head_min. This step compensates for some of the errors from missing entities in the pre-processing step, which affect the intermediate similarity scores in the clustering algorithm. Second, we use a simple fuzzy sequence-matching technique to remove identical or similar tweets inside each cluster. Third, we train a BERT-based (Devlin et al., 2019) binary classifier to detect informative content, which can be used to prune out the noisy tweets that do not include crisis-relevant information. This classifier is trained on the available labeled data on tweets' informativeness (Alam et al., 2021a). Since most of the available tweets in Alam et al. (2021a) are specific to the storm and wildfire domains and there are no representative subsets for our other domains of interest (traffic, local fire), we only apply this step to the clusters generated for the storm and wildfire categories. These post-processing steps aim not to prune out all the noisy information but rather to provide a better starting point for our next steps.
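The fuzzy deduplication in the second step can be sketched with the standard library's SequenceMatcher; the 0.9 cutoff and the `dedupe` name are our illustrative choices, not the paper's exact setting.

```python
# Sketch of fuzzy near-duplicate removal inside a cluster, using
# difflib.SequenceMatcher from the standard library (illustrative cutoff).

from difflib import SequenceMatcher

def dedupe(tweets, cutoff=0.9):
    """Keep a tweet only if it is not too similar to any kept tweet."""
    kept = []
    for t in tweets:
        if all(SequenceMatcher(None, t.lower(), k.lower()).ratio() < cutoff
               for k in kept):
            kept.append(t)
    return kept

cluster = [
    "Fire crews are working a vegetation fire near Friant and Rice roads.",
    "Fire crews are working a vegetation fire near Friant and Rice roads!",
    "Winds have picked up immensely along Friant Rd.",
]
print(dedupe(cluster))  # drops the near-identical second tweet
```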
3.2 CrisisLTLSum Human Annotation
Taking the noisy timelines generated from the previous step, we leverage human annotation to refine them into clean timelines and to summarize them. We, the authors of this work, manually selected 1,000 clusters that contain enough tweets describing how a crisis event evolves, while specifying the “seed tweet” (i.e., the first observed post that describes the ongoing event) of each timeline. The detailed process is presented in Appendix B. The selected clusters cover events mainly from four crisis domains: wildfire, local fire, storm, and traffic. More data statistics are shared in Section 4.
Procedure  We use the Amazon Mechanical Turk (MTurk) platform to label and refine the noisy clusters into clean timelines and to collect the summaries. We split the annotation into multiple batches of Human Intelligence Tasks (HITs), where each batch contains timelines from the same domain. Each HIT contains three noisy timelines, and we collect annotations from three different workers on each. The workers are given the seed tweet and the subsequent tweets sorted by time; they were asked to read the tweets one by one and answer i) whether the tweet should be part of the timeline, and ii) if not, what the reason is.
A tweet is labeled as part of the timeline only if it satisfies all three of the following conditions:

- relevant: talks about the same event indicated in the seed tweet
- informative: provides facts about the event and does not only contain personal points of view
- not repetitive: brings in new information about the ongoing event

After reviewing all the tweets, the worker is finally asked to write a concise summary describing how the event progresses over time. Detailed instructions and annotation workflows are presented as Figures 8-14 in Appendix E.

Domain    Timelines  Tweets   ✓      ✗
Wildfire  423        4,829    1,961  2,868
Traffic   287        2,340    831    1,509
Fire      155        1,469    640    829
Storm     109        1,767    789    978
Other     26         205      82     123
Total     1,000      10,610   4,303  6,307

Table 1: Data statistics across different crisis domains in terms of the number of timelines and tweets. ✓ indicates tweets that are part of the timeline and ✗ indicates tweets that are not part of the timeline.
Annotation Workflow & Quality Control  Following prior quality control practices (Briakou et al., 2021), we use multiple quality control (QC) steps to ensure the recruitment of high-quality annotators. First, we use a location restriction (QC1) to limit the pool of workers to countries where native English speakers are most likely to be found. Next, we recruit annotators who pass our qualification test (QC2), in which we ask them to annotate three timelines. We run several small pilot tasks, each with a replication factor of nine. We check annotators' performance on the timeline extraction task against experts' labels and have experts manually review (QC3) the quality of annotators' summaries. Only workers passing all the quality control steps contribute to the final task. During the final task, we perform regular quality checks (QC4) and only use workers who consistently perform well.
Compensation  We compensate the workers at a rate of $3 per HIT. Each batch of tasks is followed by a one-time bonus that brings the final rate to over $10 per hour.
4 CrisisLTLSum Statistics & Analysis
In this section, we cover comprehensive statistics and analysis of CrisisLTLSum to further elaborate on the statistical characteristics of our dataset.

4.1 Dataset Statistics
Out of the 1,000 annotated timelines (10,610 tweets) in CrisisLTLSum, 423 (42%) are about