an RNN (Wang et al., 2020; Jing et al., 2021) or Transformer-based encoder (Cui et al., 2020; Kwon et al., 2021) to supplement graph-based features.
Some works integrated neural networks with reinforcement learning (Dong et al., 2018; Gu et al., 2022) or unsupervised learning frameworks (Liang et al., 2021). In general, taking a pre-trained Transformer-based language model as the starting point to encode the textual segments of a document is currently the state-of-the-art approach among neural extractive models. Therefore, Transformer-based models, namely RoBERTa (Liu et al., 2019) and BART (Lewis et al., 2020), are used as the basic building blocks in this paper.
2.2 Sub-sentential Extractive Summarization
Most previous work on the extractive task focused on generating sentence-level summaries, though some of it (Xiao et al., 2020; Cho et al., 2020; Ernst et al., 2022) utilized sub-sentential features. Early works by Marcu (1999); Alonso i Alemany and Fuentes Fort (2003); Yoshida et al. (2014); Li et al. (2016) explored extracting discourse-level textual segments as the summary, but those approaches were tested on small datasets. More recent works by Liu and Chen (2019); Xu et al. (2020); Huang and Kurohashi (2021) were evaluated on relatively larger datasets. However, none of those works justified whether discourse-level textual segments are a better extractive text unit than sentences. To fill this gap, we provide justification for this research question from both theoretical and experimental perspectives in this paper.
2.3 Flexible Extractive Summarization
The extractive summarization task is usually formulated as extracting the top-$k$ salient textual segments from a document. Using a fixed $k$ for all documents results in a lack of variety in the length of the generated summary. A few works (Jia et al., 2020; Zhong et al., 2020; Chen et al., 2021) managed to output summaries of varying lengths. However, these approaches either require extra hyper-parameter searching on a validation set to find a valid threshold, or formulate the problem as selecting a subset of the top-$k$ sentences, which restricts the summary length to a small range because the number of candidate subsets explodes combinatorially as $k$ grows. In this paper, we propose a model with varying $k$ values but without an explicit limitation on the length or the need for hyper-parameter searching.
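To make the contrast concrete, the following minimal sketch (with made-up salience scores and a hypothetical tuned threshold; it is illustrative only and not the model proposed here) shows fixed top-$k$ selection versus threshold-based selection:

```python
# Illustrative only: fixed top-k extraction vs. threshold-based selection,
# using made-up per-segment salience scores for one document.
scores = [0.91, 0.72, 0.55, 0.23, 0.11]   # hypothetical scores

# Fixed top-k: every document yields exactly k segments, regardless of content.
k = 3
top_k_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Threshold-based: the number of selected segments varies per document,
# but the threshold itself must be tuned on a validation set.
threshold = 0.5                            # hypothetical tuned value
thresh_idx = [i for i, s in enumerate(scores) if s >= threshold]

print(sorted(top_k_idx), thresh_idx)       # [0, 1, 2] [0, 1, 2]
```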
3 Oracle Analysis of EDUs and Sentences
Oracle analysis refers to the analysis of the oracle summary, whose definition is given in Section 3.1. We conducted oracle analysis from both theoretical and experimental perspectives to justify and quantify that a discourse-level summary achieves higher scores on automatic evaluation metrics than a sentence-level summary.
3.1 Theoretical Formulation
Elementary Discourse Unit (EDU), the discourse-level textual segment in this paper, refers to a terminal node in the Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) tree, which describes the discourse structure of a piece of text. EDUs are non-overlapping, adjacent text spans in the piece of text, and a single EDU is essentially a segment of a complete sentence, i.e., the sentence itself or a clause in the sentence (Zeldes et al., 2019). Namely, a sentence can always be expressed with multiple EDUs, i.e., for the $s$-th sentence in a document, we have $sent_s = [edu_{s1}, \ldots, edu_{sm}]$. Consequently, a one-way property from sentence to EDU regarding expressiveness is derived.
Expressiveness Property
For any given subset of sentences in a document, i.e., $[sent_i, \ldots, sent_j, \ldots, sent_k]$, there is always a subset of EDUs in the document, i.e., $[edu_{i1}, \ldots, edu_{im}, \ldots, edu_{j1}, \ldots, edu_{jm}, \ldots, edu_{k1}, \ldots, edu_{km}]$, having identical content.
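As a small illustration of this decomposition (the sentence below is hypothetical, not drawn from any dataset), a sentence can be recovered exactly by concatenating its adjacent, non-overlapping EDUs:

```python
# Hypothetical sentence sent_s and its EDU segmentation [edu_s1, edu_s2]:
# the main clause and the subordinate clause are adjacent, non-overlapping
# spans whose concatenation recovers the original sentence.
sent_s = "The committee rejected the proposal because it lacked funding."
edu_s1 = "The committee rejected the proposal "
edu_s2 = "because it lacked funding."
assert edu_s1 + edu_s2 == sent_s
```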
Oracle Summary
The set of salient textual segments that greedily yields the highest ROUGE score(s) with respect to the reference summary is the oracle summary of a document. It signifies the upper bound of performance that an extractive summarization model can achieve on the ROUGE metrics. Denote the sentence-level oracle summary as $OS_{sent}$ and the EDU-level oracle summary as $OS_{edu}$. Based on the aforementioned property and definition, Theorem 1 can be derived; its detailed proof is provided below.
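The greedy construction behind this definition can be sketched as follows. This is a minimal, self-contained illustration that uses a simple $n$-gram F1 overlap as a stand-in for an actual ROUGE implementation, so the function names and details are ours rather than from any cited toolkit:

```python
from collections import Counter

def ngram_f1(candidate, reference, n=1):
    """N-gram F1 overlap between candidate and reference: a simple
    stand-in for ROUGE-N F1, used only for illustration."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall) if overlap else 0.0

def greedy_oracle(segments, reference):
    """Greedily add the segment (sentence or EDU) that most improves the
    score, stopping when no remaining segment yields a further gain."""
    selected, best_score = [], 0.0
    remaining = list(segments)
    while remaining:
        scored = [(ngram_f1(" ".join(selected + [seg]), reference), i)
                  for i, seg in enumerate(remaining)]
        score, idx = max(scored)
        if score <= best_score:
            break
        selected.append(remaining.pop(idx))
        best_score = score
    return selected
```

Running the same routine over the sentences of a document yields $OS_{sent}$, and over its EDUs yields $OS_{edu}$.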
Theorem 1. Given a document $D$ and its reference summary $R$, for any derived $OS_{sent}$, there is always an $OS_{edu}$ such that $\mathrm{ROUGE}_{F1}(R, OS_{edu}) \geq \mathrm{ROUGE}_{F1}(R, OS_{sent})$.
Proof. For ROUGE-N, let $f_n$ be a function that generates the set of $n$-grams of a string $s$, and let $g$ be a function that calculates the number of overlapping elements between two sets $x$ and $y$,