ers is the way the summaries are written. Instead of containing well-formed sentences, the articles contain telegram-style bullet points precisely capturing the important metrics discussed in the earnings calls. A sample reference summary from our dataset, corresponding to the second-quarter 2022 earnings call of Apple, is shown in Table 1.
There are several other factors that make ECTSum a challenging dataset. First, the document-to-summary compression ratio of 103.67 is the highest among existing long document summarization datasets with comparable document lengths (Table 2). Hence, in order to do well, trained models need to be highly precise in capturing the most relevant facts discussed in the ECTs in as few words as possible.
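To make a ratio of roughly 103.67 concrete, a document-to-summary compression ratio can be computed as the mean source length divided by the mean summary length, measured in words. The sketch below uses this assumed definition and toy data; the paper's exact computation may differ (e.g., it may average per-pair ratios instead).

```python
def compression_ratio(docs, summaries):
    """Mean document word count divided by mean summary word count.
    (Assumed definition; averaging per-pair ratios is also common.)"""
    avg_doc = sum(len(d.split()) for d in docs) / len(docs)
    avg_sum = sum(len(s.split()) for s in summaries) / len(summaries)
    return avg_doc / avg_sum

# Toy pair: a 104-word "transcript" compressed to a 4-word bullet
doc = " ".join(["word"] * 104)
bullet = "q2 revenue up 8%"
print(compression_ratio([doc], [bullet]))  # -> 26.0
```

At ECTSum's scale, the same arithmetic means a ~2.9K-word transcript must be distilled into only a few dozen words.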
Second, existing long document summarization datasets such as Arxiv/PubMed (Cohan et al., 2018), BigPatent (Sharma et al., 2019), FNS (El-Haj et al., 2020), and GovReport (Huang et al., 2021) have fixed document layouts. ECTs, on the other hand, are free-form documents with salient information spread throughout the text (please refer to Section 3.3). Hence, models can no longer take advantage of learning any stylistic signals (Kryściński et al., 2021). Third, the average length of ECTs is around 2.9K words (before tokenization). On the other hand, neural models employing BERT (Devlin et al., 2019), T5 (Raffel et al., 2020), or BART (Lewis et al., 2020) as document encoders cannot process documents longer than 512/1024 tokens. Hence, despite achieving state-of-the-art performance on short-document summarization datasets such as CNN/DM (Nallapati et al., 2016), Newsroom (Grusky et al., 2018), and XSum (Narayan et al., 2018), such models cannot be readily applied to effectively summarize ECTs.
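To make the length mismatch concrete, a back-of-the-envelope estimate shows how little of an average ECT fits in these encoder windows. The 1.3 subword-tokens-per-word ratio below is an assumption for illustration; the actual ratio varies by tokenizer.

```python
# Rough estimate of encoder-window coverage for an average ECT.
# The 1.3 tokens-per-word ratio is an assumption, not a measured value.
avg_ect_words = 2900
est_tokens = int(avg_ect_words * 1.3)  # ~3770 subword tokens
for model, window in [("BERT", 512), ("BART", 1024)]:
    pct = round(window / est_tokens * 100, 1)
    print(f"{model} window of {window} tokens covers ~{pct}% of the call")
```

Under this assumption, a 512-token encoder sees under 14% of an average transcript, so naive truncation discards most of the source.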
We benchmark the performance of several representative supervised and unsupervised summarizers on our newly proposed dataset (Section 5.1). Among supervised methods, we select state-of-the-art extractive, abstractive, and long document summarization approaches. Given the pattern of source transcripts and target summaries, we then present ECT-BPS, a simple yet effective pipeline approach for the task of ECT summarization (Section 4). It consists of an extractive summarization module followed by a paraphrasing module. While the former is trained to identify salient sentences from the source ECT, the latter is trained to paraphrase ECT sentences into short, abstractive, telegram-style bullet points that precisely capture the numerical values and facts discussed in the calls.
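The extract-then-paraphrase design can be sketched as below. The scoring and paraphrasing functions here are toy stand-ins for the trained modules, not the actual ECT-BPS implementation.

```python
from typing import Callable, List

def ect_summarize(sentences: List[str],
                  score: Callable[[str], float],
                  paraphrase: Callable[[str], str],
                  top_k: int = 5) -> List[str]:
    """Two-stage pipeline: select the top_k salient sentences
    (extractive module), then rewrite each as a telegram-style
    bullet (paraphrasing module)."""
    ranked = sorted(sentences, key=score, reverse=True)[:top_k]
    # Keep the selected sentences in original transcript order
    selected = [s for s in sentences if s in set(ranked)]
    return [paraphrase(s) for s in selected]

# Toy stand-ins: score by digit count, "paraphrase" by normalizing
toy_score = lambda s: sum(ch.isdigit() for ch in s)
toy_para = lambda s: s.lower().rstrip(".")
bullets = ect_summarize(
    ["We had a great quarter.", "Revenue grew 8% to $97.3 billion."],
    toy_score, toy_para, top_k=1)
print(bullets)  # -> ['revenue grew 8% to $97.3 billion']
```

The split lets each stage specialize: the extractor only has to find number-dense salient sentences, while the paraphraser only has to compress a single sentence at a time.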
In order to demonstrate the challenges of the proposed ECTSum dataset, competing methods are evaluated on several metrics that assess the content quality and factual consistency of the model-generated summaries. These metrics are discussed in Section 5.2. We discuss the comparative results of all considered methods against automatic evaluation metrics in Section 5.4. Given the complex nuances of financial reporting, we further conduct a human evaluation experiment (survey results reported in Section 5.5) where we hire a team of financial experts to manually assess and compare the summaries generated by ECT-BPS and those of our strongest baseline. Overall, both automatic and manual evaluation results show ECT-BPS to outperform strong state-of-the-art baselines, which demonstrates the advantage of a simple approach.
Our contributions can be summarized as follows:
• We present ECTSum, the first long document summarization dataset in the finance domain that requires models to process long unstructured earnings call transcripts and summarize them in a few words while capturing crucial metrics and maintaining factual consistency.
• We propose ECT-BPS, a simple approach to effectively summarize ECTs while ensuring factual correctness of the generated content. We establish its better efficacy against strong summarization baselines across all considered metrics evaluating the content quality and factual correctness of model-generated summaries.
• Our dataset and code are publicly available at https://github.com/rajdeep345/ECTSum
2 Related Works
Automatic text summarization, whether extractive (Nallapati et al., 2017; Zhong et al., 2020), abstractive (Zhang et al., 2019; Lewis et al., 2020), or over long documents (Zaheer et al., 2020; Beltagy et al., 2020), has seen tremendous progress over the years (Huang et al., 2020). Several works also exist on controllable summarization (Mukherjee et al., 2020; Amplayo et al., 2021) and on summarization in specific domains such as disaster (Mukherjee et al., 2022) and legal (Shukla et al., 2022). However, the field of financial data summarization remains largely unexplored, primarily due to the unavailability of suitable datasets. Passali et al. (2021) have recently compiled a financial news summarization dataset consisting of around 2K Bloomberg