understand ongoing situations and plan relief ef-
forts accordingly (Sarter and Woods,1991). Cri-
sisLTLSum goes beyond the task of categorizing
each single crisis-relevant post independently (Im-
ran et al.,2013;Olteanu et al.,2014;Imran et al.,
2016;Alam et al.,2018;Wiegmann et al.,2020;
Alam et al.,2021a,b) and enables the more challenging
task of extracting new updates of an ongoing
crisis event from incoming posts and summarizing
them with respect to the important event details.
This can help provide time-sensitive updates while
avoiding missing critical information in the bulk
of the posts in microblogs due to the high volume
of redundant and noisy information (Alam et al.,
2021a). To the best of our knowledge, this is the
first annotated dataset for such an extraction task,
although this problem has been tackled before in
unsupervised settings (Zhang et al.,2018).
Moreover, we focus on the extraction of local crisis
events. The term “local” indicates that an event
is bound to an exact location, such as a building,
a street, or a county, and usually lasts for a short
period. Building a corpus of local crisis events is
particularly useful for first responders but also chal-
lenging because the timelines of these events are
often not captured in existing knowledge sources.
This means one has to design mechanisms for au-
tomatically detecting and tracking events directly
from the Twitter stream, which is especially hard
for existing clustering methods (Guille and Favre,
2015;Asgari-Chenaghlu et al.,2021) given the low
number of available tweets for each local event.
For the second point, CrisisLTLSum enables
NLP research on the complex tasks of timeline
extraction and abstractive summarization. These
tasks are particularly challenging in the context of
social media. First, the process of identifying and
extracting relevant updates for a specific event has
to contend with the large volume of noise (Alam
et al.,2021a) and informal tone (Rudra et al.,2018)
compared to other domains such as news. Additionally,
summarizing an ongoing event supports a quicker and
better understanding of its progress. This
requires a good level of abstraction with important
details covered and properly presented (e.g., the
temporal order of event evolution). CrisisLTLSum
is the first dataset to provide human-written time-
line summaries to support research in this direction.
CrisisLTLSum is developed through a two-step
semi-automated process to create 1,000 local crisis
timelines from the public Twitter stream. To the
best of our knowledge, this is the first timeline dataset
focusing on “local” crisis events, and it contains the largest
number of unique events. The contributions of this
paper are as follows:
• We propose CrisisLTLSum, which is the
largest dataset of local crisis event timelines.
Notably, this is the first benchmark for ab-
stractive timeline summarization in the crisis
domain or on Twitter.
• We develop strong baselines for both tasks,
and our experiments show a considerable
gap between these models and human per-
formance, indicating the importance of this
dataset for enabling future research on extract-
ing timelines of crisis event updates and sum-
marizing them.
2 Related Work
Our work is related to two main directions:
crisis domain datasets for NLP and timeline
summarization.
Crisis Datasets for NLP: Prior research has
investigated generating datasets from online social
media (e.g., Twitter) on large-scale crisis events,
while providing labels for event categories (Wieg-
mann et al.,2020;Imran et al.,2013), humanitarian
types and sub-types (Olteanu et al.,2014;Imran
et al.,2016;Alam et al.,2018;Wiegmann et al.,
2020;Arachie et al.,2020;Alam et al.,2021a,b),
actionable information (McCreadie et al.,2019),
or witness levels (Zahra et al.,2020) of each
crisis-related post. While existing datasets on crisis
event timelines (Binh Tran et al.,2013;Tran et al.,
2015;Pasquali et al.,2021) are limited to a small
set of large-scale events, CrisisLTLSum covers a
thousand timelines compared to only tens of events
covered by each of the existing datasets. Additionally,
we go beyond simple tweet categorization
by enabling the extraction of information
that includes updates on the events’ progress.
Timeline Summarization: Timeline summarization
(TLS) was first proposed by Allan et al.
(2001), whose approach extracts a single sentence from the
news stream of an event topic. In general, the
TLS task aims to summarize the target’s evolution
(e.g., a topic or an entity) in a timeline (Martschat
and Markert,2018;Ghalandari and Ifrim,2020).
Existing approaches to TLS are mainly based on
extractive methods, which are often grouped into
several categories. For instance, Update Summa-
rization (Dang et al.,2008;Li et al.,2009) aims to