ECTSum: A New Benchmark Dataset For Bullet Point Summarization of
Long Earnings Call Transcripts
Rajdeep Mukherjee1, Abhinav Bohra1, Akash Banerjee1, Soumya Sharma1,
Manjunath Hegde2, Afreen Shaikh2, Shivani Shrivastava2, Koustuv Dasgupta2,
Niloy Ganguly1,3, Saptarshi Ghosh1, Pawan Goyal1
1Department of Computer Science and Engineering, IIT Kharagpur, India
2Goldman Sachs Data Science and Machine Learning Group, India
3Leibniz University of Hannover, Germany

Corresponding author: rajdeep1989@iitkgp.ac.in. Mentors’ names are presented in alphabetical order.
Abstract
Despite tremendous progress in automatic summarization, state-of-the-art methods are predominantly trained to excel at summarizing short newswire articles, or documents with strong layout biases such as scientific articles or government reports. Efficient techniques to summarize financial documents, including facts and figures, have largely been unexplored, mainly due to the unavailability of suitable datasets. Here, we present ECTSum, a new dataset with transcripts of earnings calls (ECTs), hosted by public companies, as documents, and short expert-written telegram-style bullet-point summaries derived from corresponding Reuters articles. ECTs are long unstructured documents without any prescribed length limit or format. We benchmark ECTSum with state-of-the-art summarizers across various metrics evaluating the content quality and factual consistency of the generated summaries. Finally, we present a simple yet effective approach, ECT-BPS, to generate a set of bullet points that precisely capture the important facts discussed in the calls.
1 Introduction
Earnings Calls, typically a teleconference or a webcast, are hosted by publicly traded companies to discuss important aspects of their quarterly (10-Q) or annual (10-K) earnings reports, along with current trends and future goals that help financial analysts and investors to review their price targets and trade decisions (Givoly and Lakonishok, 1980; Richard Frankel and Skinner, 1999; Bowen et al., 2002; Keith and Stent, 2019). The corresponding call transcripts (called Earnings Call Transcripts, abbreviated as ECTs) are typically in the form of long unstructured documents consisting of thousands of words. Hence, it requires a great deal of time and effort, even on the part of trained analysts, to quickly summarize the key facts covered in these transcripts. Given the importance of these calls, they are often summarized by media houses such as Reuters and BusinessWire. The scale of such effort, however, calls for the development of efficient methods to automate this task, which in turn necessitates the creation of a benchmark dataset.

QUARTERLY EARNINGS PER SHARE $1.52.
QUARTERLY TOTAL NET SALES $97.28 BILLION VERSUS $89.58 BILLION REPORTED LAST YEAR.
BOARD OF DIRECTORS AUTHORIZED AN INCREASE OF $90 BILLION TO THE EXISTING SHARE REPURCHASE PROGRAM.
QUARTERLY IPHONE REVENUE $50.57 BILLION VERSUS $47.94 BILLION REPORTED LAST YEAR.

Table 1: ECTSum: Excerpt from the Reuters article1 corresponding to the ECT2 for Apple Q2 2022.
Towards this goal, we present ECTSum, a new benchmark dataset for bullet-point summarization of long ECTs. As discussed in Section 3.2, we first crawled around 7.4K ECTs from The Motley Fool3, posted between January 2019 and April 2022, corresponding to the Russell 3000 Index companies4. Reuters was chosen as the source of our target summaries, in consultation with domain experts, since the expert-written articles posted on Reuters effectively capture the key takeaways from earnings calls. However, pairing the collected ECTs with corresponding Reuters articles was non-trivial, since not all calls are tracked. After carefully cleaning the data and addressing pairing issues, we arrive at a total of 2,425 document-summary pairs in the dataset.
1 https://tinyurl.com/yc3z9sbj
2 https://tinyurl.com/uyby3vh4
3 https://www.fool.com/earnings-call-transcripts/
4 https://www.investopedia.com/terms/r/russell_3000.asp
What makes ECTSum truly different from others is the way the summaries are written. Instead of containing well-formed sentences, the articles contain telegram-style bullet points precisely capturing the important metrics discussed in the earnings calls. A sample reference summary from our dataset, corresponding to the second-quarter 2022 earnings call of Apple, is shown in Table 1. Several other factors make ECTSum a challenging dataset. First, its document-to-summary compression ratio of 103.67 is the highest among existing long document summarization datasets with comparable document lengths (Table 2). Hence, in order to do well, trained models need to be highly precise in capturing the most relevant facts discussed in the ECTs in as few words as possible.
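For concreteness, such a corpus-level compression ratio can be computed as sketched below; averaging per-pair word-count ratios is our assumption about the aggregation, and the reported value of 103.67 comes from the paper, not from this snippet:

```python
# Hedged sketch: corpus-level document-to-summary compression ratio.
# Averaging per-pair word-count ratios is an assumption; the exact aggregation
# behind the reported 103.67 may differ.
def compression_ratio(pairs):
    """pairs: iterable of (ect_text, summary_text) string pairs."""
    ratios = []
    for doc, summary in pairs:
        doc_words = len(doc.split())          # whitespace word count of the ECT
        summary_words = len(summary.split())  # word count of the bullet-point summary
        if summary_words > 0:
            ratios.append(doc_words / summary_words)
    return sum(ratios) / len(ratios)
```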
Second, existing long document summarization datasets such as Arxiv/PubMed (Cohan et al., 2018), BigPatent (Sharma et al., 2019), FNS (El-Haj et al., 2020), and GovReport (Huang et al., 2021) have fixed document layouts. ECTs, on the other hand, are free-form documents with salient information spread throughout the text (please refer to Section 3.3). Hence, models can no longer take advantage of learning any stylistic signals (Kryściński et al., 2021). Third, the average length of ECTs is around 2.9K words (before tokenization). On the other hand, neural models employing BERT (Devlin et al., 2019), T5 (Raffel et al., 2020), or BART (Lewis et al., 2020) as document encoders cannot process documents longer than 512/1024 tokens. Hence, despite achieving state-of-the-art performance on short-document summarization datasets such as CNN/DM (Nallapati et al., 2016), Newsroom (Grusky et al., 2018), and XSum (Narayan et al., 2018), such models cannot be readily applied to effectively summarize ECTs.
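To illustrate the length mismatch, here is a minimal sketch using the Hugging Face transformers library; the choice of the public facebook/bart-large checkpoint and the synthetic stand-in text are our assumptions, not part of the paper's setup:

```python
# Sketch: a ~2.9K-word ECT exceeds BART's 1024-token limit, so a vanilla encoder truncates it.
# Assumes the Hugging Face "transformers" library and the public facebook/bart-large checkpoint.
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")

ect_text = " ".join(["revenue"] * 2900)  # stand-in for a ~2,900-word Prepared Remarks section

full_ids = tokenizer(ect_text, truncation=False)["input_ids"]
clipped_ids = tokenizer(ect_text, truncation=True, max_length=1024)["input_ids"]

print(len(full_ids), len(clipped_ids))  # roughly 2,900+ tokens vs. 1024: most of the call is dropped
```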
We benchmark the performance of several representative supervised and unsupervised summarizers on our newly proposed dataset (Section 5.1). Among supervised methods, we select state-of-the-art extractive, abstractive, and long document summarization approaches. Given the pattern of source transcripts and target summaries, we then present ECT-BPS, a simple yet effective pipeline approach for the task of ECT summarization (Section 4). It consists of an extractive summarization module followed by a paraphrasing module. While the former is trained to identify salient sentences from the source ECT, the latter is trained to paraphrase ECT sentences into short abstractive telegram-style bullet points that precisely capture the numerical values and facts discussed in the calls.
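For intuition, the overall flow of such an extract-then-paraphrase pipeline can be sketched as below; the interfaces, helper names (extractor.score, paraphraser.rewrite), and the top-k selection rule are illustrative assumptions, not the authors' exact implementation, which is available in the released code:

```python
# Illustrative sketch of a two-stage extract-then-paraphrase pipeline in the spirit of ECT-BPS.
# Concrete models, thresholds, and helper names below are assumptions for exposition only.
def summarize_ect(ect_sentences, extractor, paraphraser, top_k=5):
    # Stage 1: the extractive module scores every ECT sentence for salience
    # and keeps the top-k sentences in their original document order.
    scores = [extractor.score(sent) for sent in ect_sentences]
    ranked = sorted(range(len(ect_sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    salient = [ect_sentences[i] for i in sorted(ranked)]

    # Stage 2: the paraphrasing module rewrites each salient sentence into a short,
    # telegram-style bullet point while preserving the numbers it mentions.
    return [paraphraser.rewrite(sent) for sent in salient]
```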
In order to demonstrate the challenges of the proposed ECTSum dataset, competing methods are evaluated on several metrics that assess the content quality and factual consistency of the model-generated summaries. These metrics are discussed in Section 5.2. We discuss the comparative results of all considered methods against automatic evaluation metrics in Section 5.4. Given the complex nuances of financial reporting, we further conduct a human evaluation experiment (survey results reported in Section 5.5) where we hire a team of financial experts to manually assess and compare the summaries generated by ECT-BPS and those of our strongest baseline. Overall, both automatic and manual evaluation results show ECT-BPS to outperform strong state-of-the-art baselines, which demonstrates the advantage of a simple approach.
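As a purely generic illustration of the content-quality side of such an evaluation, ROUGE scores can be computed with the open-source rouge-score package; the paper's actual metric suite and configuration are those described in Section 5.2, not this snippet:

```python
# Generic illustration of a content-quality metric (ROUGE); the hypothetical
# prediction string below is made up for demonstration purposes.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeLsum"], use_stemmer=True)
reference = "QUARTERLY EARNINGS PER SHARE $1.52."
prediction = "Quarterly EPS was $1.52."
print(scorer.score(reference, prediction))  # dict of precision/recall/F1 per ROUGE variant
```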
Our contributions can be summarized as follows:
• We present ECTSum, the first long document summarization dataset in the finance domain that requires models to process long unstructured earnings call transcripts and summarize them in a few words while capturing crucial metrics and maintaining factual consistency.
• We propose ECT-BPS, a simple approach to effectively summarize ECTs while ensuring factual correctness of the generated content. We establish its better efficacy against strong summarization baselines across all considered metrics evaluating the content quality and factual correctness of model-generated summaries.
Our dataset and codes are publicly available at https://github.com/rajdeep345/ECTSum.
2 Related Works
Automatic text summarization, extractive (Nallapati et al., 2017; Zhong et al., 2020), abstractive (Zhang et al., 2019; Lewis et al., 2020), as well as long document summarization (Zaheer et al., 2020; Beltagy et al., 2020), has seen tremendous progress over the years (Huang et al., 2020). Several works also exist on controllable summarization (Mukherjee et al., 2020; Amplayo et al., 2021) and in specific domains, such as disaster (Mukherjee et al., 2022) and legal (Shukla et al., 2022). However, the field of financial data summarization remains largely unexplored, primarily due to the unavailability of suitable datasets. Passali et al. (2021) have recently compiled a financial news summarization dataset consisting of around 2K Bloomberg articles with corresponding human-written summaries. However, similar to other popular newswire datasets such as CNN/DM (Nallapati et al., 2016), Newsroom (Grusky et al., 2018), and XSum (Narayan et al., 2018), the documents (news articles) themselves are only a few hundred words long, hence limiting the practical importance of model-generated summaries (Kryściński et al., 2021).
To the best of our knowledge, FNS (El-Haj et al., 2020) is the only available financial summarization dataset, released as part of the 2020 Financial Narrative Summarization Shared Task5. In FNS, annual reports of UK firms constitute the documents, and a subset of narrative sections from the reports are given verbatim as reference summaries. However, ECTSum differs from FNS on several accounts. First, our target summaries consist of a small set of telegram-style bullet points, whereas the ones in FNS are large extractive portions of the respective source documents. Second, ECTSum has a very high document-to-summary compression ratio (refer to Section 3.3), because of which models are expected to generate extremely concise summaries of around 50 words from lengthy unstructured ECTs around 2.9K words long. In contrast, the expected length of model-generated summaries on FNS is around 1,000 words. Finally, the models developed on FNS are specifically trained to identify and summarize the narrative sections, while completely ignoring other sections containing the facts and figures that reflect the firm's annual financial performance. Excluding these key performance indicators from summaries limits their practical utility to stakeholders. Models trained on ECTSum, on the other hand, are specifically expected to capture salient financial metrics such as sales, revenues, and current trends in as few words as possible.
Previously, Cardinaels et al. (2018) attempted to summarize earnings calls using standard unsupervised approaches. We are, however, the first to propose and exhaustively benchmark a large-scale financial long document summarization dataset involving earnings call transcripts.
3 Dataset
This section describes our dataset, ECTSum, including the data sources and the steps taken to sanitize the data in order to obtain the document-summary pairs. Finally, we conduct an in-depth analysis of the dataset and report its statistics.

5 http://wp.lancs.ac.uk/cfie/fns2020/
3.1 Data Collection
ECTs of listed companies are publicly hosted on The Motley Fool6. We crawled the web pages corresponding to all available ECTs for the Russell 3000 Index companies7 posted between January 2019 and April 2022. In the process, we obtained a total of 7,389 ECTs. The HTML web pages were parsed using the BeautifulSoup8 library. ECTs typically consist of two broad sections: Prepared Remarks, where the company's financial results for the given reporting period are presented; and Questions and Answers, where call participants ask questions regarding the presented results. We only consider the unstructured text corresponding to the Prepared Remarks section to form the source documents.
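A hedged sketch of this step is shown below; the assumed page structure (an h2 heading introducing the Questions and Answers section and p tags holding the remarks) is an illustrative assumption about Motley Fool pages rather than a documented layout:

```python
# Sketch: keep only the Prepared Remarks portion of a transcript page.
# The heading text and tag layout assumed here are illustrative; real pages may differ.
from bs4 import BeautifulSoup

def prepared_remarks(html):
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for elem in soup.find_all(["h2", "p"]):
        if elem.name == "h2" and "question" in elem.get_text(strip=True).lower():
            break  # stop once the Questions and Answers section begins
        if elem.name == "p":
            paragraphs.append(elem.get_text(" ", strip=True))
    return "\n".join(paragraphs)
```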
Collecting expert-written summaries corresponding to these ECTs was a far more challenging task. Reuters9 hosts a huge repository of financial news articles from around the world. Among these are articles, written by analysts, that summarize earnings call events in the form of a few bulleted points (see Table 1). After manually going through several such articles, and after consulting experts from Goldman Sachs, India, we understood that these articles precisely capture the key takeaways10 from earnings calls. Accordingly, using the company codes and dates of the earnings call events corresponding to the collected ECTs, we crawled Reuters web pages to search for relevant articles. We obtained 3,013 Reuters articles in the process.
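This search-and-match step can be sketched as follows; the field names and the exact-match rule on (company code, call date) are assumptions made for illustration and may differ from the actual crawler logic:

```python
# Illustrative pairing of crawled ECTs with Reuters articles by (company code, call date).
# Field names and the exact-match rule are assumptions made for this sketch.
def pair_ects_with_articles(ects, articles):
    index = {(a["ticker"], a["date"]): a for a in articles}
    pairs = []
    for ect in ects:
        article = index.get((ect["ticker"], ect["date"]))
        if article is not None:
            pairs.append((ect, article))
    return pairs
```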
3.2 Data Cleaning and Pairing
Cleaning the ECTs: Almost all earnings calls (and hence the corresponding transcripts) begin with an introduction by the call moderator/operator. We remove these statements since they do not relate to the financial results discussed thereafter. Some calls directly start with the Questions and Answers, in which case we exclude them from the collection.
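A minimal sketch of these two cleaning rules is given below; the speaker-label heuristic for spotting the operator's introduction and the tuple-based transcript representation are assumptions for illustration only:

```python
# Hedged sketch of the two ECT-cleaning rules described above. The speaker-label
# heuristic and the (speaker, text) representation are assumptions for illustration.
def clean_ect(utterances):
    """utterances: list of (speaker, text) tuples from a transcript."""
    if not utterances:
        return None
    # Exclude transcripts that jump straight into Questions and Answers.
    first_text = utterances[0][1].lower()
    if "question" in first_text and "answer" in first_text:
        return None
    # Remove the moderator/operator statements at the top of the call,
    # which introduce the speakers rather than the financial results.
    return [(spk, txt) for spk, txt in utterances
            if spk.lower() not in {"operator", "moderator"}]
```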
Cleaning the summaries: For the Reuters (summary) articles, we first performed simple pre-processing to split the text into sentences. In many articles, we observed sentences ending with the phrase REFINITIV IBES DATA. Such sentences report estimates made by Refinitiv11 analysts on the

6 https://www.fool.com/earnings-call-transcripts/
7 https://www.investopedia.com/terms/r/russell_3000.asp
8 https://crummy.com/software/BeautifulSoup/
9 https://www.reuters.com/business/
10 https://tinyurl.com/27ehcxzf
11 https://tinyurl.com/2p9e6kh2