CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction
and Summarization
Hossein Rajaby Faghihi1, Bashar Alhafni2, Ke Zhang3, Shihao Ran3,
Joel Tetreault3, Alejandro Jaimes3
1Michigan State University, 2New York University Abu Dhabi,
3Dataminr, Inc.
rajabyfa@msu.edu, alhafni@nyu.edu,
{kzhang,sran,jtetreault,ajaimes}@dataminr.com
Abstract
Social media has increasingly played a key role in emergency response: first responders can use public posts to better react to ongoing crisis events and deploy the necessary resources where they are most needed. Timeline extraction and abstractive summarization are critical technical tasks to leverage large numbers of social media posts about events. Unfortunately, there are few datasets for benchmarking technical approaches for those tasks. This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date. CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms. We built CrisisLTLSum using a semi-automated cluster-then-refine approach to collect data from the public Twitter stream. Our initial experiments indicate a significant gap between the performance of strong baselines and human performance on both tasks. Our dataset, code, and models are publicly available.1
1 Introduction
We present CrisisLTLSum, the first dataset on extraction and summarization of local crisis event timelines from Twitter. An example of an annotated timeline in CrisisLTLSum is shown in Figure 1. A timeline is a chronologically sorted set of posts, where each brings in new information or updates about an ongoing event (such as a fire, storm, or traffic incident). CrisisLTLSum supports two complex downstream tasks: timeline extraction and timeline summarization. As shown in Figure 1, the timeline extraction task is formalized as: given a seed tweet as the initial mention of a crisis event, extract relevant tweets with updates on the same event from the incoming noisy tweet stream. This task is crucial for real-time event tracking.
Work done as Research Interns at Dataminr, Inc.
1 https://github.com/CrisisLTLSum/CrisisTimelines
June 21, 00:04 ✓ #BREAKING Smoke from vegetation fire near Friant and Rice Road fills skies of Fresno amid strong winds (Seed Tweet)
June 21, 00:18 ✓ #BREAKING A large fire burning in northeast Fresno near Woodward Lake Thursday sent plumes of smoke into the air above the city.
June 21, 00:24 ✗ (Repetitive) The season is upon us. Strong winds and fire. Stay safe and stay inside when you smell/see the smoke and dust. Word is that this is a grass fire at Friant and Rice.
June 21, 00:40 ✗ (Repetitive) A large fire is burning along Friant Road. You can watch @VanessaABC30 giving an update on what we know right now.
June 21, 01:08 ✓ Winds have picked up immensely. Flames have reached Friant Rd. Windy conditions making it difficult for firefighters to contain the blaze.
June 21, 01:18 ✓ UPDATE: @VanessaABC30 talking to CAL FIRE about the fire burning along Rice Rd. & Friant Rd. Officials say the fire started as a commercial fire. Officials say there is a shelter in place order, but no evacuations at this point.
June 21, 01:31 ✓ Kevin Larrivee captured this video of a grass fire burning right now on Friant Rd, north of Woodward Park. The wind is crazy today. As you can see, that wind is pushing the fire quickly.
June 21, 01:33 ✓ Fire crews are working a vegetation fire near Friant and Rice roads. Please use caution when driving in the area and follow all directions from emergency personnel.

Summary 1: Cal Fire officials have issued a shelter in place for residents northeast of Fresno near Woodward lake due to a vegetation fire. Strong winds have made it difficult for firefighters as it pushes it closer to Friant and Rice roads and officials urge motorists to use caution when driving in the area.

Summary 2: A vegetation fire northeast of Fresno combined with windy conditions to cause firefighters problems. There was a shelter in place order given but no evacuation orders were given.

Figure 1: A sample annotated timeline from CrisisLTLSum. The noisy timeline is the set of tweets sorted chronologically; the first tweet is the seed of the event. ✓ means that the tweet is annotated as part of the timeline and ✗ indicates that the tweet is excluded, with the reason for the exclusion written next to the mark.
The timeline summarization task aims to generate abstractive summaries of evolving events by aggregating important details from temporal and incremental information.
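The extraction task can be pictured as an online filtering loop over the stream, deciding tweet by tweet whether a new post continues the seeded event. The sketch below is a toy stand-in, not the paper's model: the token-overlap scorer and the names `is_update` and `extract_timeline` are our illustrative choices.

```python
# Toy sketch of timeline extraction as online filtering (illustrative;
# not the paper's model). Each incoming tweet is accepted if it overlaps
# sufficiently with the timeline collected so far.

def tokens(text):
    """Lowercased word tokens, with leading '#' and punctuation stripped."""
    return {w.strip("#.,!?").lower() for w in text.split() if w.strip("#.,!?")}

def is_update(tweet, timeline, threshold=0.2):
    """Jaccard overlap between the candidate tweet and all timeline text."""
    seen = set()
    for t in timeline:
        seen |= tokens(t)
    cand = tokens(tweet)
    return len(cand & seen) / max(1, len(cand | seen)) >= threshold

def extract_timeline(seed, stream, threshold=0.2):
    """Scan a chronologically ordered stream and keep relevant updates."""
    timeline = [seed]
    for tweet in stream:
        if is_update(tweet, timeline, threshold):
            timeline.append(tweet)
    return timeline

seed = "Smoke from vegetation fire near Friant and Rice Road fills skies of Fresno"
stream = [
    "Large fire burning near Friant Road sends smoke over Fresno",
    "Great coffee this morning downtown",  # unrelated noise, should be dropped
]
print(extract_timeline(seed, stream))
```

A real system would replace the overlap scorer with the entity- and location-aware similarity described in Section 3.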
CrisisLTLSum can facilitate research in two directions: 1) NLP for Social Good (crisis domain), and 2) natural language inference and generation, i.e., the timeline extraction and summarization tasks. Here, we discuss the importance and the differences of CrisisLTLSum compared to previous work along both of these aspects. Toward the first direction, the extraction of real-time crisis-relevant information from microblogs (Zhang and Eick, 2019; Mamo et al., 2021) plays a vital role in providing time-sensitive information to help first responders understand ongoing situations and plan relief efforts accordingly (Sarter and Woods, 1991). CrisisLTLSum goes beyond the task of categorizing each single crisis-relevant post independently (Imran et al., 2013; Olteanu et al., 2014; Imran et al., 2016; Alam et al., 2018; Wiegmann et al., 2020; Alam et al., 2021a,b) and enables a more challenging task: extracting new updates on an ongoing crisis event from incoming posts and summarizing them with respect to the important event details. This can help provide time-sensitive updates while avoiding missing critical information in the bulk of posts on microblogs due to the high volume of redundant and noisy information (Alam et al., 2021a). To the best of our knowledge, this is the first annotated dataset for such an extraction task, although the problem has been tackled before in unsupervised settings (Zhang et al., 2018).

arXiv:2210.14190v1 [cs.CL] 25 Oct 2022
Moreover, we focus on the extraction of local crisis events. The term “local” indicates that an event is bound to an exact location, such as a building, a street, or a county, and usually lasts for a short period. Building a corpus of local crisis events is particularly useful for first responders but also challenging, because the timelines of these events are often not captured in existing knowledge sources. This means one has to design mechanisms for automatically detecting and tracking events directly from the Twitter stream, which is especially hard for existing clustering methods (Guille and Favre, 2015; Asgari-Chenaghlu et al., 2021) given the low number of available tweets for each local event.
For the second point, CrisisLTLSum enables NLP research on the complex tasks of timeline extraction and abstractive summarization. These tasks are particularly challenging in the context of social media. First, the process of identifying and extracting relevant updates for a specific event has to contend with a large volume of noise (Alam et al., 2021a) and an informal tone (Rudra et al., 2018) compared to other domains such as news. Additionally, summarizing an ongoing event supports a quick and accurate understanding of its progress; this requires a good level of abstraction, with important details covered and properly presented (e.g., the temporal order of event evolution). CrisisLTLSum is the first dataset to provide human-written timeline summaries to support research in this direction.
CrisisLTLSum is developed through a two-step semi-automated process to create 1,000 local crisis timelines from the public Twitter stream. To the best of our knowledge, this is the first timeline dataset focusing on “local” crisis events, and it covers the largest number of unique events. The contributions of this paper are as follows:

- We propose CrisisLTLSum, the largest dataset of local crisis event timelines. Notably, this is the first benchmark for abstractive timeline summarization in the crisis domain or on Twitter.

- We develop strong baselines for both tasks, and our experiments show a considerable gap between these models and human performance, indicating the importance of this dataset for enabling future research on extracting timelines of crisis event updates and summarizing them.
2 Related Work
Our work is related to two main directions: crisis domain datasets for NLP and timeline summarization.
Crisis Datasets for NLP: Prior research has investigated generating datasets from online social media (e.g., Twitter) on large-scale crisis events, providing labels for event categories (Wiegmann et al., 2020; Imran et al., 2013), humanitarian types and sub-types (Olteanu et al., 2014; Imran et al., 2016; Alam et al., 2018; Wiegmann et al., 2020; Arachie et al., 2020; Alam et al., 2021a,b), actionable information (McCreadie et al., 2019), or witness levels (Zahra et al., 2020) of each crisis-related post. While existing datasets on crisis event timelines (Binh Tran et al., 2013; Tran et al., 2015; Pasquali et al., 2021) are limited to a small set of large-scale events, CrisisLTLSum covers a thousand timelines, compared to only tens of events covered by each of the existing datasets. Additionally, we go beyond simple tweet categorization by enabling the extraction of information that includes updates on the events' progress.
Timeline Summarization: Timeline summarization (TLS) was first proposed in Allan et al. (2001), which extracts a single sentence from the news stream of an event topic. In general, the TLS task aims to summarize a target's evolution (e.g., a topic or an entity) in a timeline (Martschat and Markert, 2018; Ghalandari and Ifrim, 2020). Existing approaches to TLS are mainly based on extractive methods, which are often grouped into several categories. For instance, Update Summarization (Dang et al., 2008; Li et al., 2009) aims to update the previous summary given new information at a later time, while Timeline Generation (Yan et al., 2011; Tran et al., 2015; Martschat and Markert, 2018) aims to generate itemized summaries as the timeline, where each item is extracted by finding important temporal points (e.g., spikes, changes, or clusters) and selecting representative sentences. Another category, Temporal Summarization, was first proposed in the TREC shared task (Aslam et al., 2013) with follow-up work (Kedzie et al., 2015); it targets extracting sentences from a large volume of news streams and social media posts as updates for large events. Temporal Summarization is close to the first task (Timeline Extraction) proposed in CrisisLTLSum.
There have been a few recent works on abstractive timeline summarization across different domains, e.g., biography (Chen et al., 2019), narratives (Barros et al., 2019), and news headlines (Steen and Markert, 2019), where the human-written summaries are directly collected from the web. The goal of abstractive summarization is to generate a set of sentences summarizing the context of interest without taking the exact words or phrases from the original text, but rather by combining them and summarizing the important content. To the best of our knowledge, CrisisLTLSum is the first to provide human-written summaries for crisis event timelines collected from a noisy social media stream. Recent research (Nguyen et al., 2018) has also investigated the summarization task based on tweets in other domains, which essentially does not reflect the challenges of summarizing an evolving event.
3 CrisisLTLSum Collection
This section presents our semi-automated approach to collecting CrisisLTLSum. We first extract clusters of tweets as noisy timelines and then refine them via human annotation to get clean timelines that only include non-redundant, informative, and relevant tweets.
3.1 Noisy Timeline Collection
Figure 2 shows the process for generating a set of noisy timelines, starting from the Twitter stream and followed by pre-processing and knowledge enhancement steps, the online clustering method, and post-processing & cleaning steps.
Location, Time, and Keywords Filtering  We limit the incoming tweets to specific geographical areas, periods, and domains of interest.
[Figure 2: The process of noisy timeline collection. The Twitter stream passes through a filtering mechanism (location & date filtering, keywords filtering), entity extraction & augmentation (categories, entities, hashtags, and geocoded locations from AllenNLP and OpenIE), online clustering into candidate timelines, and finally merging and noise/duplicate removal. The output of this step is a set of noisy clusters used to create the dataset.]
The location filtering relies on a list of location candidates created by gathering cities, towns, and famous neighborhoods in a large area of interest. A tweet is considered relevant to our area of interest if 1) the text mentions one of the candidates, 2) the geo-tag matches the area of interest, or 3) the user location matches the area of interest. To limit the tweets to a specified crisis domain, we curate domain-specific keywords and only select tweets with phrases matching one of the keywords. This approach is not comprehensive or exhaustive but somewhat representative of each crisis domain; improving it to be more encompassing is an area for future research. The combinations of (area a, domain d, time t) are manually selected so that events of type d at location a are more frequent during time period t. For instance, wildfire events are most likely to happen in California from May through August, while the same type of event is more likely from December to March in Victoria (Australia). More details, with examples of curated keywords, can be found in Appendix A.
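The (area, domain, time) filtering above can be sketched as three simple checks. The keyword list, place names, and function name below are illustrative assumptions, not the paper's curated resources.

```python
# Minimal sketch of the (area, domain, time) filtering step.
# DOMAIN_KEYWORDS and AREA_PLACES are illustrative stand-ins for the
# curated lists described in Appendix A of the paper.

from datetime import datetime

DOMAIN_KEYWORDS = {"wildfire": {"wildfire", "vegetation fire", "brush fire"}}
AREA_PLACES = {"fresno", "friant", "woodward park"}

def passes_filters(tweet_text, tweet_time, domain, start, end):
    """Keep a tweet only if it is in the time window, mentions a
    domain keyword, and mentions a place from the area of interest."""
    text = tweet_text.lower()
    in_window = start <= tweet_time <= end
    has_keyword = any(kw in text for kw in DOMAIN_KEYWORDS[domain])
    in_area = any(place in text for place in AREA_PLACES)
    return in_window and has_keyword and in_area

t = datetime(2021, 6, 21, 0, 4)
print(passes_filters(
    "Smoke from vegetation fire near Friant Road fills skies of Fresno",
    t, "wildfire",
    datetime(2021, 5, 1), datetime(2021, 8, 31)))  # True
```

A production version would also check the tweet's geo-tag and the author's profile location, as the paper describes.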
Entity Extraction  This step aims to extract entity mentions from the tweet text and provide additional information that can be used to help identify related tweets. We use three different modules. First, we use a pre-trained neural model from AllenNLP (Gardner et al., 2018), trained on CoNLL03 (Tjong Kim Sang and De Meulder, 2003), to extract entities of the types person, location, and organization. Although this module extracts some important entities in the text, it fails to extract uncommon entities or special mentions such as the name of a wildfire. To address this, similar to prior research (Zheng and Kordjamshidi, 2020), we further exploit the extractions from OpenIE (Stanovsky et al., 2018) and select the noun arguments with fewer than ten characters as entities. Lastly, we add the tweet's hashtags to the entity set. Since location mentions are crucial for extracting local events and existing models perform poorly at detecting them in noisy tweet text, we further developed a BERT-based NER model tuned on Twitter data to detect location mentions.
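The merging of the three entity sources can be sketched as below. The NER and OpenIE outputs are stubbed as plain inputs here (the paper uses AllenNLP, OpenIE, and a Twitter-tuned location tagger); `entity_set` is our hypothetical name.

```python
# Sketch of building a tweet's entity set by merging NER output, short
# OpenIE noun arguments, and hashtags. NER/OpenIE results are passed in
# as precomputed sets rather than produced by real models.

import re

def hashtags(text):
    """Extract hashtag words from the tweet text."""
    return set(re.findall(r"#(\w+)", text))

def entity_set(text, ner_entities, openie_noun_args, max_len=10):
    # Keep only short OpenIE noun arguments, following the paper's
    # fewer-than-ten-characters heuristic for special mentions.
    short_args = {a for a in openie_noun_args if len(a) < max_len}
    return set(ner_entities) | short_args | hashtags(text)

ents = entity_set(
    "#BREAKING Smoke from vegetation fire near Friant Road",
    ner_entities={"Friant Road"},
    openie_noun_args={"smoke", "vegetation fire near Friant Road"},
)
print(sorted(ents))  # ['BREAKING', 'Friant Road', 'smoke']
```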
Location Augmentation  We use the OpenStreetMap API to map location mentions to physical addresses.2 This step provides complementary information about each location while reducing the noise introduced by the entity extraction module, by removing location mentions that are wrongly detected or are not located in the area of interest. This is especially important since our focus is on local events happening at specific locations.
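The pruning of out-of-area mentions can be sketched as a bounding-box check on geocoded coordinates. The `GEOCODE_STUB` table stands in for the OpenStreetMap lookup, and the Fresno bounding box is an illustrative assumption.

```python
# Sketch of dropping geocoded location mentions outside the area of
# interest. GEOCODE_STUB replaces a real OpenStreetMap (Nominatim)
# lookup; coordinates and the bounding box are illustrative.

GEOCODE_STUB = {  # mention -> (lat, lon), as a geocoder might return
    "Friant Road, Fresno": (36.84, -119.76),
    "Woodward Park": (36.86, -119.77),
    "Paris": (48.86, 2.35),
}

FRESNO_BBOX = (36.6, 37.0, -120.1, -119.5)  # (lat_min, lat_max, lon_min, lon_max)

def in_area(mention, bbox=FRESNO_BBOX):
    coords = GEOCODE_STUB.get(mention)
    if coords is None:
        return False  # not geocodable -> treat as a false positive
    lat, lon = coords
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

mentions = ["Friant Road, Fresno", "Paris", "Woodward Park"]
print([m for m in mentions if in_area(m)])  # ['Friant Road, Fresno', 'Woodward Park']
```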
Online Clustering  This step aims to mimic the real-life scenario where tweets are sequentially fed into a clustering algorithm (Wang et al., 2015). We further choose this method since it is much faster than retrospective (all data available at the same time) clustering methods for a large pool of input data. Here, the clustering objective is to group tweets related to the same local event, such as a “fire in building A” or a “wildfire in a specific area”. The online clustering method utilizes a custom similarity metric that combines the similarity of the entities, the closeness of locations in the real world, and the existence of shared hashtags. Algorithm 1 shows the similarity computation between two tweets. The smallest_distance function computes the minimum physical distance between location mentions given their augmented real-world locations (the output of the location augmentation step). As the distance between higher-level location mentions such as state/city/country is always zero, we simply ignore those location types. The find_matching_entities function follows the ideas in Faghihi et al. (2020) on creating a unique matching matrix, which we use for extracting the top matching pairs of entities from tweets. Here, each entity can only be paired once, with the highest matching-score entity from the other tweet. The min_dist, max_dist, s_hashtag, and s_dist values are hyper-parameters of the clustering algorithm. We have only used heuristics and a small set of executions to tune these hyper-parameters.

Algorithm 1: Find the similarity of tweets t1 and t2
  similarity = 0
  if t1.hashtags ∩ t2.hashtags ≠ ∅ then
      similarity = similarity + s_hashtag
  end if
  dist_min = smallest_distance(t1, t2)
  if dist_min ≥ max_dist then
      return 0
  else if dist_min ≤ min_dist then
      similarity = similarity + s_dist
  end if
  top_pairs = find_matching_entities(t1, t2)
  norm_factor = min(len(t1.entities), len(t2.entities), len(top_pairs))
  s_entity = (Σ_{p ∈ top_pairs} p.similarity) / norm_factor
  similarity = similarity + s_entity
  return similarity

2 https://www.openstreetmap.org/
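Algorithm 1 can be rendered as runnable Python. The hyper-parameter values, the haversine distance, and the greedy one-to-one entity matching below are our illustrative stand-ins for the paper's tuned values and its matching-matrix procedure.

```python
# Runnable sketch of the Algorithm 1 similarity score (illustrative
# thresholds; greedy matching stands in for the matching matrix).

from math import radians, sin, cos, asin, sqrt

S_HASHTAG, S_DIST = 0.3, 0.3           # bonus scores (illustrative)
MIN_DIST_KM, MAX_DIST_KM = 1.0, 20.0   # distance thresholds (illustrative)

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def entity_sim(a, b):
    """Toy string similarity between two entity mentions."""
    a, b = a.lower(), b.lower()
    return 1.0 if a == b else 0.5 if (a in b or b in a) else 0.0

def similarity(t1, t2):
    score = 0.0
    if t1["hashtags"] & t2["hashtags"]:
        score += S_HASHTAG
    dists = [haversine_km(p, q) for p in t1["locs"] for q in t2["locs"]]
    if dists:
        d = min(dists)
        if d > MAX_DIST_KM:
            return 0.0        # too far apart: cannot be the same local event
        if d <= MIN_DIST_KM:
            score += S_DIST
    # Greedy one-to-one entity matching: each entity pairs at most once.
    pairs, used = [], set()
    for e1 in t1["entities"]:
        best = max(((entity_sim(e1, e2), e2) for e2 in t2["entities"] - used),
                   default=(0.0, None))
        if best[1] is not None and best[0] > 0:
            pairs.append(best[0])
            used.add(best[1])
    norm = min(len(t1["entities"]), len(t2["entities"]), len(pairs)) or 1
    return score + sum(pairs) / norm

t1 = {"hashtags": {"fresnofire"}, "locs": [(36.84, -119.76)],
      "entities": {"vegetation fire", "Friant Road"}}
t2 = {"hashtags": {"fresnofire"}, "locs": [(36.85, -119.76)],
      "entities": {"fire", "Friant Road"}}
print(similarity(t1, t2))
```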
The pre-processed set of tweets is passed to the online clustering method, one tweet at a time. For each new tweet, similarity scores are computed between the new tweet and all cluster heads. The new tweet is added to the highest matching-score cluster whose similarity score is higher than sim_threshold and where the time elapsed between the new tweet and the last update of the cluster is less than time_threshold. If these criteria are met for none of the clusters, a new cluster is created based on the new tweet. During this process, we remove inactive clusters whose last update was at least expiration_threshold minutes ago and which have fewer than tweet_threshold tweets available. A cluster head is always the tweet with the most entity mentions; in case of a tie, the more recent tweet becomes the head of the cluster. The hyper-parameters of this method are noted in Appendix A.
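The assignment loop can be sketched as follows. The threshold values and the `assign` name are illustrative, the similarity function is a word-overlap stub, and (unlike the paper) this sketch keeps the first tweet as the cluster head rather than re-electing the tweet with the most entities.

```python
# Sketch of the online clustering assignment loop: each incoming tweet
# joins the best-matching live cluster or starts a new one; stale,
# small clusters expire. Thresholds are illustrative.

SIM_THRESHOLD = 0.5
TIME_THRESHOLD = 120      # minutes since a cluster's last update
EXPIRE_AFTER = 240        # minutes before a small cluster is dropped
MIN_TWEETS = 2

def assign(tweet, clusters, similarity, now):
    # Drop inactive clusters that never accumulated enough tweets.
    clusters[:] = [c for c in clusters
                   if not (now - c["last_update"] >= EXPIRE_AFTER
                           and len(c["tweets"]) < MIN_TWEETS)]
    best, best_score = None, 0.0
    for c in clusters:
        if now - c["last_update"] > TIME_THRESHOLD:
            continue  # cluster is live but too old to extend
        s = similarity(tweet, c["head"])
        if s > best_score:
            best, best_score = c, s
    if best is not None and best_score > SIM_THRESHOLD:
        best["tweets"].append(tweet)
        best["last_update"] = now
    else:
        clusters.append({"head": tweet, "tweets": [tweet], "last_update": now})
    return clusters

def word_overlap(a, b):
    """Stub similarity: fraction of the new tweet's words seen in the head."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa), 1)

clusters = []
for minute, tweet in [(0, "fire at Friant Road"),
                      (30, "fire near Friant Road spreading"),
                      (40, "flooding downtown")]:
    assign(tweet, clusters, word_overlap, now=minute)
print(len(clusters))  # 2
```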
Cluster Post-Processing  We apply three post-processing steps to improve the quality of the generated clusters. First, we manually merge pairs of clusters with a cluster-head similarity higher than a threshold head_min. This step compensates for some of the errors from missing entities in the pre-processing step, which affect the intermediate similarity scores in the clustering algorithm. Second, we use a simple fuzzy sequence-matching technique to remove identical or similar tweets inside each cluster. Third, we train a BERT-based (Devlin et al., 2019) binary classifier to detect informative content, which can be used to prune out the noisy tweets that do not include crisis-relevant information. This classifier is trained on the available labeled data on tweets' informativeness (Alam et al., 2021a). Since most of the available tweets in Alam et al. (2021a) are specific to the storm and wildfire domains and there are no representative subsets for our other domains of interest (traffic, local fire), we only apply this step to the clusters generated for the storm and wildfire categories. These post-processing steps aim not to prune out all the noisy information but rather to provide a better starting point for our next steps.
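The fuzzy deduplication in the second step can be sketched with the standard library's SequenceMatcher; the 0.9 cutoff and the `dedupe` name are our illustrative choices, not the paper's exact setting.

```python
# Sketch of fuzzy near-duplicate removal inside a cluster, using
# difflib.SequenceMatcher from the standard library (illustrative cutoff).

from difflib import SequenceMatcher

def dedupe(tweets, cutoff=0.9):
    """Keep a tweet only if it is not too similar to any kept tweet."""
    kept = []
    for t in tweets:
        if all(SequenceMatcher(None, t.lower(), k.lower()).ratio() < cutoff
               for k in kept):
            kept.append(t)
    return kept

cluster = [
    "Fire crews are working a vegetation fire near Friant and Rice roads.",
    "Fire crews are working a vegetation fire near Friant and Rice roads!",
    "Winds have picked up immensely along Friant Rd.",
]
print(dedupe(cluster))  # drops the near-identical second tweet
```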
3.2 CrisisLTLSum Human Annotation
Taking the noisy timelines generated from the previous step, we leverage human annotation to refine them into clean timelines and to summarize them. We, the authors of this work, manually selected 1,000 clusters that contain enough tweets describing how a crisis event evolves, while specifying the “seed tweet” (i.e., the first observed post that describes the ongoing event) of each timeline. The detailed process is presented in Appendix B. The selected clusters cover events mainly from four crisis domains: wildfire, local fire, storm, and traffic. More data statistics are shared in Section 4.
Procedure  We use the Amazon Mechanical Turk (MTurk) platform to label and refine the noisy clusters into clean timelines and to collect the summaries. We split the annotation into multiple batches of Human Intelligence Tasks (HITs), where each batch contains timelines from the same domain. Each HIT contains three noisy timelines, and we collect annotations from three different workers on each. The workers are given the seed tweet and the subsequent tweets sorted by time; they were asked to read the tweets one by one and answer i) whether the tweet should be part of the timeline, and ii) if not, what the reason is.
A tweet is labeled as part of the timeline only if it satisfies all three of the following conditions:

- relevant: talks about the same event indicated in the seed tweet
- informative: provides facts about the event and does not only contain personal points of view
- not repetitive: brings in new information about the ongoing event

After reviewing all the tweets, the worker is finally asked to write a concise summary describing how the event progresses over time. Detailed instructions and annotation workflows are presented as Figures 8-14 in Appendix E.

Domain    Timelines  Tweets   ✓      ✗
Wildfire  423        4,829    1,961  2,868
Traffic   287        2,340    831    1,509
Fire      155        1,469    640    829
Storm     109        1,767    789    978
Other     26         205      82     123
Total     1,000      10,610   4,303  6,307

Table 1: Data statistics across different crisis domains in terms of the number of timelines and tweets. ✓ indicates tweets that are part of the timeline and ✗ indicates tweets that are not part of the timeline.
Annotation Workflow & Quality Control  Following prior quality control practices (Briakou et al., 2021), we use multiple quality control (QC) steps to ensure the recruitment of high-quality annotators. First, we use a location restriction (QC1) to limit the pool of workers to countries where native English speakers are most likely to be found. Next, we recruit annotators who pass our qualification test (QC2), in which we ask them to annotate three timelines. We run several small pilot tasks, each with a replication factor of nine. We check annotators' performance on the timeline extraction task against experts' labels and have experts manually review (QC3) the quality of annotators' summaries. Only workers passing all the quality control steps contribute to the final task. During the final task, we perform regular quality checks (QC4) and only use workers who consistently perform well.
Compensation  We compensate the workers at a rate of $3 per HIT. Each batch of tasks is followed by a one-time bonus that brings the final rate to over $10 per hour.
4 CrisisLTLSum Statistics & Analysis
In this section, we cover comprehensive statistics and analysis of CrisisLTLSum to further elaborate on the statistical characteristics of our dataset.

4.1 Dataset Statistics
Out of the 1,000 annotated timelines (10,610 tweets) in CrisisLTLSum, 423 (42%) are about