
aggregating training samples from several smaller
datasets of multiple related tasks. MTL helps the
model learn shared representations between the pri-
mary task (summarization) and the auxiliary task
(rhetorical role identification) to generalize better.
Rhetorical role identification involves determining the function of different sentences in order to understand the underlying reasoning and argument patterns in legal decisions. Previous works have often used rhetorical role labeling as a precursor to extractive summarization to improve performance (Zhong et al., 2019; Bhattacharya et al., 2021b). In
this paper, we explore the idea of using rhetorical
role identification as an auxiliary task to augment
our annotated dataset and help generate better sum-
maries.
In brief, our contributions to the extractive summarization of legal documents are as follows:
• We generate informative summaries with maximum information and minimum redundancy in a low-resource setting. Our experiments demonstrate a general improvement in ROUGE scores for the proposed approaches.
• We further improve the summarizer in a multi-task setting by combining extractive summarization and rhetorical role labeling. The quantitative evaluation demonstrates that the multi-task models perform better than the single-task models.
• We evaluate the generated summaries qualitatively with the help of a legal expert. In contrast to the quantitative evaluation, the qualitative results show that our proposed approaches rank at least as well as human annotators.1

1Our code is available here.
2 Related Work
2.1 Extractive Summarization
Galgani et al. (2012) developed a rule-based ap-
proach to summarization that uses a knowledge
base, statistical information, and other handcrafted
features like POS tags, specific legal terms, and
citations. Kim et al. (2012) propose a graph-based summarization system that constructs a directed graph for each document, where nodes are assigned weights based on how likely the words in a given sentence are to appear in the conclusion of judgments. CaseSummarizer (Polsley et al., 2016), an
automated text summarization tool, uses word fre-
quency augmented with additional domain-specific
knowledge to score the sentences in the case docu-
ment. Liu and Chen (2019) propose a classification-
based approach that uses several handcrafted fea-
tures as input. However, such techniques require
knowledge engineering of different features and
do not tackle redundancy in legal decisions. Recently, various approaches have been proposed to address such redundancy for purposes of summarization. Zhong et al. (2019) propose the iterative selection of predictive sentences using a CNN-based train-attribute-mask pipeline, followed by a Random Forest classifier that distinguishes between sentences containing Reasoning/EvidentialSupport and other types; Maximal Marginal Relevance (MMR) then selects the final sentences for the summary.
Bhattacharya et al. (2021b) propose DELSumm, an unsupervised approach that generates extractive summaries by incorporating guidelines from legal experts into an optimization problem that maximizes informativeness and content-word coverage as well as conciseness. In this work,
we use an MMR-based variant that tackles redundancy explicitly and can be combined with a neural classifier to generate summaries, alleviating the need to engineer handcrafted features or specific expert guidelines to prevent redundancy.
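As a sketch of how MMR trades relevance against redundancy, the greedy selection below scores each candidate as λ·relevance minus (1−λ)·(maximum similarity to the sentences already selected). The bag-of-words cosine similarity and the toy relevance scores are illustrative simplifications; in our setting the relevance scores would come from the neural classifier.

```python
import math

def bow(sentence):
    # Term-frequency bag-of-words vector for one sentence.
    vec = {}
    for w in sentence.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(u, v):
    # Cosine similarity between two bag-of-words dicts.
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(sentences, relevance, k, lam=0.7):
    """Greedily pick k sentences, balancing relevance against redundancy
    with already-selected sentences (lam=1 ignores redundancy)."""
    vecs = [bow(s) for s in sentences]
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]

# The near-duplicate second sentence is passed over in favor of new content:
sents = ["the court held the contract void",
         "the court held the contract void and unenforceable",
         "damages were awarded to the plaintiff"]
summary = mmr_select(sents, relevance=[0.9, 0.85, 0.6], k=2, lam=0.5)
# summary == [sents[0], sents[2]]
```

Although the second sentence has the second-highest relevance, its high cosine similarity to the first pushes its MMR score below the third sentence, which is exactly the redundancy behavior the classifier-plus-MMR pipeline relies on.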
2.2 Rhetorical Role Labeling
Saravanan and Ravindran (2010) propose a rule-based system along with a Conditional Random Field (CRF) approach to identify the different rhetorical segments of a judgment. Nejadgholi et al. (2017) propose a semi-
supervised approach to searching legal facts in
immigration-specific case documents by using an
unsupervised word embedding model to aid the
training of a supervised fact-detecting classifier using a small set of annotated sentences. Walker et al. (2019) compare the performance of rule-based scripts and ML algorithms for classifying sentences that state findings of fact. Bhat-
tacharya et al. (2019) explore hierarchical BiLSTM models with an added attention layer and experiment with pre-trained word and sentence embeddings (Bhattacharya et al., 2021a). Savelka et al. (2021) annotated legal cases from seven countries in six languages using a structural type system
and found that Bi-GRU models could be general-
ized for data across different jurisdictions to some
degree. Despite copious work, there are very few
annotated rhetorical role datasets in the legal do-
main. In this work, we use rhetorical role label-