ECTSum: A New Benchmark Dataset For Bullet Point Summarization of
Long Earnings Call Transcripts
Rajdeep Mukherjee1, Abhinav Bohra1, Akash Banerjee1, Soumya Sharma1,
Manjunath Hegde2, Afreen Shaikh2, Shivani Shrivastava2, Koustuv Dasgupta2,
Niloy Ganguly1,3, Saptarshi Ghosh1, Pawan Goyal1
1Department of Computer Science and Engineering, IIT Kharagpur, India
2Goldman Sachs Data Science and Machine Learning Group, India
3Leibniz University of Hannover, Germany

Corresponding author: rajdeep1989@iitkgp.ac.in. Mentors’ names are presented in alphabetical order.
Abstract
Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel at summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports. Efficient techniques to summarize financial documents, including facts and figures, have largely been unexplored, mainly due to the unavailability of suitable datasets. Here, we present ECTSum, a new dataset with transcripts of earnings calls (ECTs), hosted by public companies, as documents, and short expert-written telegram-style bullet-point summaries derived from corresponding Reuters articles. ECTs are long unstructured documents without any prescribed length limit or format. We benchmark ECTSum with state-of-the-art summarizers across various metrics evaluating the content quality and factual consistency of the generated summaries. Finally, we present a simple yet effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.
1 Introduction
Earnings Calls, typically a teleconference or a webcast, are hosted by publicly traded companies to discuss important aspects of their quarterly (10-Q) or annual (10-K) earnings reports, along with current trends and future goals that help financial analysts and investors to review their price targets and trade decisions (Givoly and Lakonishok, 1980; Richard Frankel and Skinner, 1999; Bowen et al., 2002; Keith and Stent, 2019). The corresponding call transcripts (called Earnings Call Transcripts, abbreviated as ECTs) are typically in the form of long unstructured documents consisting of thousands of words. Hence, it requires a great deal of time and effort, even on the part of trained analysts, to quickly summarize the key facts covered in these transcripts. Given the importance of these calls, they are often summarized by media houses such as Reuters and BusinessWire. The scale of such effort, however, calls for the development of efficient methods to automate this task, which in turn necessitates the creation of a benchmark dataset.

QUARTERLY EARNINGS PER SHARE $1.52.
QUARTERLY TOTAL NET SALES $97.28 BILLION VERSUS $89.58 BILLION REPORTED LAST YEAR.
BOARD OF DIRECTORS AUTHORIZED AN INCREASE OF $90 BILLION TO THE EXISTING SHARE REPURCHASE PROGRAM.
QUARTERLY IPHONE REVENUE $50.57 BILLION VERSUS $47.94 BILLION REPORTED LAST YEAR.

Table 1: ECTSum: Excerpt from the Reuters article1 corresponding to the ECT2 for Apple Q2 2022.
Towards this goal, we present ECTSum, a new benchmark dataset for bullet-point summarization of long ECTs. As discussed in Section 3.2, we first crawled around 7.4K ECTs from The Motley Fool3, posted between January 2019 and April 2022, corresponding to the Russell 3000 Index companies4. Reuters was chosen as the source of our target summaries, in consultation with domain experts, since the expert-written articles posted on Reuters effectively capture the key takeaways from earnings calls. However, pairing the collected ECTs with corresponding Reuters articles was non-trivial, since not all calls are tracked. After carefully cleaning the data and addressing pairing issues, we arrive at a total of 2,425 document-summary pairs in the dataset.
1 https://tinyurl.com/yc3z9sbj
2 https://tinyurl.com/uyby3vh4
3 https://www.fool.com/earnings-call-transcripts/
4 https://www.investopedia.com/terms/r/russell_3000.asp
What makes ECTSum truly different from others is the way the summaries are written. Instead of containing well-formed sentences, the articles contain telegram-style bullet points precisely capturing the important metrics discussed in the earnings calls. A sample reference summary from our dataset, corresponding to the second-quarter 2022 earnings call of Apple, is shown in Table 1. Several other factors make ECTSum a challenging dataset. First, its document-to-summary compression ratio of 103.67 is the highest among existing long document summarization datasets with comparable document lengths (Table 2). Hence, in order to do well, trained models need to be highly precise in capturing the most relevant facts discussed in the ECTs in as few words as possible.
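For concreteness, such a corpus-level compression ratio can be computed as sketched below; averaging per-pair word-count ratios is our assumption about the aggregation, and the reported value of 103.67 comes from the paper, not from this snippet:

```python
# Hedged sketch: corpus-level document-to-summary compression ratio.
# Averaging per-pair word-count ratios is an assumption; the exact aggregation
# behind the reported 103.67 may differ.
def compression_ratio(pairs):
    """pairs: iterable of (ect_text, summary_text) string pairs."""
    ratios = []
    for doc, summary in pairs:
        doc_words = len(doc.split())          # whitespace word count of the ECT
        summary_words = len(summary.split())  # word count of the bullet-point summary
        if summary_words > 0:
            ratios.append(doc_words / summary_words)
    return sum(ratios) / len(ratios)
```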
Second, existing long document summarization datasets such as Arxiv/PubMed (Cohan et al., 2018), BigPatent (Sharma et al., 2019), FNS (El-Haj et al., 2020), and GovReport (Huang et al., 2021) have fixed document layouts. ECTs, on the other hand, are free-form documents with salient information spread throughout the text (please refer to Section 3.3). Hence, models can no longer take advantage of learning any stylistic signals (Kryściński et al., 2021). Third, the average length of ECTs is around 2.9K words (before tokenization). On the other hand, neural models employing BERT (Devlin et al., 2019), T5 (Raffel et al., 2020), or BART (Lewis et al., 2020) as document encoders cannot process documents longer than 512/1024 tokens. Hence, despite achieving state-of-the-art performance on short-document summarization datasets such as CNN/DM (Nallapati et al., 2016), Newsroom (Grusky et al., 2018), and XSum (Narayan et al., 2018), such models cannot be readily applied to effectively summarize ECTs.
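To illustrate the length mismatch, here is a minimal sketch using the Hugging Face transformers library; the choice of the public facebook/bart-large checkpoint and the synthetic stand-in text are our assumptions, not part of the paper's setup:

```python
# Sketch: a ~2.9K-word ECT exceeds BART's 1024-token limit, so a vanilla encoder truncates it.
# Assumes the Hugging Face "transformers" library and the public facebook/bart-large checkpoint.
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")

ect_text = " ".join(["revenue"] * 2900)  # stand-in for a ~2,900-word Prepared Remarks section

full_ids = tokenizer(ect_text, truncation=False)["input_ids"]
clipped_ids = tokenizer(ect_text, truncation=True, max_length=1024)["input_ids"]

print(len(full_ids), len(clipped_ids))  # roughly 2,900+ tokens vs. 1024: most of the call is dropped
```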
We benchmark the performance of several representative supervised and unsupervised summarizers on our newly proposed dataset (Section 5.1). Among supervised methods, we select state-of-the-art extractive, abstractive, and long document summarization approaches. Given the pattern of source transcripts and target summaries, we then present ECT-BPS, a simple yet effective pipeline approach for the task of ECT summarization (Section 4). It consists of an extractive summarization module followed by a paraphrasing module. While the former is trained to identify salient sentences from the source ECT, the latter is trained to paraphrase ECT sentences into short abstractive telegram-style bullet points that precisely capture the numerical values and facts discussed in the calls.
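For intuition, the overall flow of such an extract-then-paraphrase pipeline can be sketched as below; the interfaces, helper names (extractor.score, paraphraser.rewrite), and the top-k selection rule are illustrative assumptions, not the authors' exact implementation, which is available in the released code:

```python
# Illustrative sketch of a two-stage extract-then-paraphrase pipeline in the spirit of ECT-BPS.
# Concrete models, thresholds, and helper names below are assumptions for exposition only.
def summarize_ect(ect_sentences, extractor, paraphraser, top_k=5):
    # Stage 1: the extractive module scores every ECT sentence for salience
    # and keeps the top-k sentences in their original document order.
    scores = [extractor.score(sent) for sent in ect_sentences]
    ranked = sorted(range(len(ect_sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    salient = [ect_sentences[i] for i in sorted(ranked)]

    # Stage 2: the paraphrasing module rewrites each salient sentence into a short,
    # telegram-style bullet point while preserving the numbers it mentions.
    return [paraphraser.rewrite(sent) for sent in salient]
```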
In order to demonstrate the challenges of the proposed ECTSum dataset, competing methods are evaluated on several metrics that assess the content quality and factual consistency of the model-generated summaries. These metrics are discussed in Section 5.2. We discuss the comparative results of all considered methods against automatic evaluation metrics in Section 5.4. Given the complex nuances of financial reporting, we further conduct a human evaluation experiment (survey results reported in Section 5.5) where we hire a team of financial experts to manually assess and compare the summaries generated by ECT-BPS and those of our strongest baseline. Overall, both automatic and manual evaluation results show ECT-BPS to outperform strong state-of-the-art baselines, which demonstrates the advantage of a simple approach.
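As a purely generic illustration of the content-quality side of such an evaluation, ROUGE scores can be computed with the open-source rouge-score package; the paper's actual metric suite and configuration are those described in Section 5.2, not this snippet:

```python
# Generic illustration of a content-quality metric (ROUGE); the hypothetical
# prediction string below is made up for demonstration purposes.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"], use_stemmer=True)
reference = "QUARTERLY EARNINGS PER SHARE $1.52."
prediction = "Quarterly EPS was $1.52."
print(scorer.score(reference, prediction))  # dict of precision/recall/F1 per ROUGE variant
```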
Our contributions can be summarized as follows:
• We present ECTSum, the first long document summarization dataset in the finance domain that requires models to process long unstructured earnings call transcripts and summarize them in a few words while capturing crucial metrics and maintaining factual consistency.
• We propose ECT-BPS, a simple approach to effectively summarize ECTs while ensuring factual correctness of the generated content. We establish its better efficacy against strong summarization baselines across all considered metrics evaluating the content quality and factual correctness of model-generated summaries.
Our dataset and codes are publicly available at https://github.com/rajdeep345/ECTSum.
2 Related Works
Automatic text summarization, extractive (Nallapati et al., 2017; Zhong et al., 2020), abstractive (Zhang et al., 2019; Lewis et al., 2020), as well as long document summarization (Zaheer et al., 2020; Beltagy et al., 2020), has seen tremendous progress over the years (Huang et al., 2020). Several works also exist on controllable summarization (Mukherjee et al., 2020; Amplayo et al., 2021) and in specific domains, such as disaster (Mukherjee et al., 2022) and legal (Shukla et al., 2022). However, the field of financial data summarization remains largely unexplored, primarily due to the unavailability of suitable datasets. Passali et al. (2021) have recently compiled a financial news summarization dataset consisting of around 2K Bloomberg articles with corresponding human-written summaries. However, similar to other popular newswire datasets such as CNN/DM (Nallapati et al., 2016), Newsroom (Grusky et al., 2018), and XSum (Narayan et al., 2018), the documents (news articles) themselves are only a few hundred words long, hence limiting the practical importance of model-generated summaries (Kryściński et al., 2021).
To the best of our knowledge, FNS (El-Haj et al., 2020) is the only available financial summarization dataset, released as part of the 2020 Financial Narrative Summarization Shared Task5. In FNS, annual reports of UK firms constitute the documents, and a subset of narrative sections from the reports are given verbatim as reference summaries. However, ECTSum differs from FNS on several accounts. First, our target summaries consist of a small set of telegram-style bullet points, whereas the ones in FNS are large extractive portions of the respective source documents. Second, ECTSum has a very high document-to-summary compression ratio (refer to Section 3.3), because of which models are expected to generate extremely concise summaries of around 50 words from lengthy unstructured ECTs around 2.9K words long. In contrast, the expected length of model-generated summaries on FNS is around 1,000 words. Finally, the models developed on FNS are specifically trained to identify and summarize the narrative sections, while completely ignoring other sections containing the facts and figures that reflect the firm's annual financial performance. Excluding these key performance indicators from summaries limits their practical utility to stakeholders. Models trained on ECTSum, on the other hand, are specifically expected to capture salient financial metrics such as sales, revenues, and current trends in as few words as possible.
Previously, Cardinaels et al. (2018) attempted to summarize earnings calls using standard unsupervised approaches. We are, however, the first to propose and exhaustively benchmark a large-scale financial long document summarization dataset involving earnings call transcripts.
3 Dataset
This section describes our dataset, ECTSum, including the data sources and the steps taken to sanitize the data in order to obtain the document-summary pairs. Finally, we conduct an in-depth analysis of the dataset and report its statistics.

5 http://wp.lancs.ac.uk/cfie/fns2020/
3.1 Data Collection
ECTs of listed companies are publicly hosted on The Motley Fool6. We crawled the web pages corresponding to all available ECTs for the Russell 3000 Index companies7 posted between January 2019 and April 2022. In the process, we obtained a total of 7,389 ECTs. The HTML web pages were parsed using the BeautifulSoup8 library. ECTs typically consist of two broad sections: Prepared Remarks, where the company's financial results for the given reporting period are presented; and Questions and Answers, where call participants ask questions regarding the presented results. We only consider the unstructured text corresponding to the Prepared Remarks section to form the source documents.
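A hedged sketch of this step is shown below; the assumed page structure (an h2 heading introducing the Questions and Answers section and p tags holding the remarks) is an illustrative assumption about Motley Fool pages rather than a documented layout:

```python
# Sketch: keep only the Prepared Remarks portion of a transcript page.
# The heading text and tag layout assumed here are illustrative; real pages may differ.
from bs4 import BeautifulSoup

def prepared_remarks(html):
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for elem in soup.find_all(["h2", "p"]):
        if elem.name == "h2" and "question" in elem.get_text(strip=True).lower():
            break  # stop once the Questions and Answers section begins
        if elem.name == "p":
            paragraphs.append(elem.get_text(" ", strip=True))
    return "\n".join(paragraphs)
```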
Collecting expert-written summaries corresponding to these ECTs was a far more challenging task. Reuters9 hosts a huge repository of financial news articles from around the world. Among these are articles, written by analysts, that summarize earnings call events in the form of a few bulleted points (see Table 1). After manually going through several such articles, and after consulting experts from Goldman Sachs, India, we understood that these articles precisely capture the key takeaways10 from earnings calls. Accordingly, using the company codes and dates of the earnings call events corresponding to the collected ECTs, we crawled Reuters web pages to search for relevant articles. We obtained 3,013 Reuters articles in the process.
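This search-and-match step can be sketched as follows; the field names and the exact-match rule on (company code, call date) are assumptions made for illustration and may differ from the actual crawler logic:

```python
# Illustrative pairing of crawled ECTs with Reuters articles by (company code, call date).
# Field names and the exact-match rule are assumptions made for this sketch.
def pair_ects_with_articles(ects, articles):
    index = {(a["ticker"], a["date"]): a for a in articles}
    pairs = []
    for ect in ects:
        article = index.get((ect["ticker"], ect["date"]))
        if article is not None:
            pairs.append((ect, article))
    return pairs
```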
3.2 Data Cleaning and Pairing
Cleaning the ECTs: Almost all earnings calls (and hence the corresponding transcripts) begin with an introduction by the call moderator/operator. We remove these statements since they do not relate to the financial results discussed thereafter. Some calls directly start with the Questions and Answers, in which case we exclude them from the collection.
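A minimal sketch of these two cleaning rules is given below; the speaker-label heuristic for spotting the operator's introduction and the tuple-based transcript representation are assumptions for illustration only:

```python
# Hedged sketch of the two ECT-cleaning rules described above. The speaker-label
# heuristic and the (speaker, text) representation are assumptions for illustration.
def clean_ect(utterances):
    """utterances: list of (speaker, text) tuples from a transcript."""
    if not utterances:
        return None
    # Exclude transcripts that jump straight into Questions and Answers.
    first_text = utterances[0][1].lower()
    if "question" in first_text and "answer" in first_text:
        return None
    # Remove the moderator/operator statements at the top of the call,
    # which introduce the speakers rather than the financial results.
    return [(spk, txt) for spk, txt in utterances
            if spk.lower() not in {"operator", "moderator"}]
```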
Cleaning the summaries: For the Reuters (summary) articles, we first performed simple pre-processing to split the text into sentences. In many articles, we observed sentences ending with the phrase REFINITIV IBES DATA. Such sentences report estimates made by Refinitiv11 analysts on the

6 https://www.fool.com/earnings-call-transcripts/
7 https://www.investopedia.com/terms/r/russell_3000.asp
8 https://crummy.com/software/BeautifulSoup/
9 https://www.reuters.com/business/
10 https://tinyurl.com/27ehcxzf
11 https://tinyurl.com/2p9e6kh2