EDU-level Extractive Summarization with Varying Summary Lengths

Yuping Wu, Ching-Hsun Tseng, Jiayu Shang, Shengzhong Mao,
Goran Nenadic, Xiao-Jun Zeng
Department of Computer Science, University of Manchester
{yuping.wu-2, ching-hsun.tseng, jiayu.shang, shengzhong.mao}@postgrad.manchester.ac.uk
{gnenadic, x.zeng}@manchester.ac.uk
Abstract
Extractive models usually formulate text summarization as extracting a fixed top-k set of salient sentences from the document as a summary. Few works have exploited extracting the finer-grained Elementary Discourse Unit (EDU), and those that do offer little analysis or justification for the choice of extractive unit. Further, the strategy of selecting a fixed top-k of salient sentences fits the summarization need poorly: the number of salient sentences varies across documents, so a common or best k does not exist in practice. To fill these gaps, this paper first conducts a comparison analysis of oracle summaries based on EDUs and sentences, which provides evidence from both theoretical and experimental perspectives to justify and quantify that EDUs yield summaries with higher automatic evaluation scores than sentences. Then, considering this merit of EDUs, this paper further proposes an EDU-level extractive model with Varying summary Lengths (EDU-VL) and develops the corresponding learning algorithm. EDU-VL learns, in an end-to-end training manner, to encode and predict probabilities of EDUs in the document, to generate multiple candidate summaries of varying lengths based on various k values, and to encode and score the candidate summaries. Finally, EDU-VL is evaluated on single- and multi-document benchmark datasets and shows improved ROUGE scores in comparison with state-of-the-art extractive models, and a further human evaluation suggests that EDU-constituent summaries maintain good grammaticality and readability.
1 Introduction
Automatic text summarization aims at aggregating the information in long document(s) into a shorter piece of text while keeping the important information. Extractive summarization and abstractive summarization are its two categories.

Corresponding author.
1 https://github.com/yuping-wu/EDU-VL
Document: (...) [The second audio,] [taken from dash cam video from inside a patrol car,] [captures a phone call between Slager and someone] [CNN believes] [is his wife.] (...)

Reference Summary: The second audio captures a phone call between Slager and someone CNN believes is his wife.

Table 1: Example demonstrating redundant information in a sentence. Content within [ ] indicates an EDU.
This paper focuses only on the extractive task, which formulates summarization as identifying salient textual segments in a document (Luhn, 1958). Under the supervised learning framework, this task is further formulated as a label classification task, i.e., encoding textual segments and predicting labels on the encoded vectors. Recent state-of-the-art models on this task (Liu and Lapata, 2019; Zhong et al., 2020; Liu et al., 2021; Ruan et al., 2022) tend to be Transformer-based, since BERT (Devlin et al., 2019) shows significantly better performance than RNNs on most natural language understanding tasks.
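To make this formulation concrete, the following is a minimal sketch (not the authors' implementation) of encoding segments with a pre-trained Transformer and predicting a saliency label per segment; the choice of roberta-base and the single linear classification layer are illustrative assumptions.

```python
# Sketch: encode textual segments, then predict saliency labels on the vectors.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")   # assumed backbone
encoder = AutoModel.from_pretrained("roberta-base")
classifier = nn.Linear(encoder.config.hidden_size, 1)       # one score per segment

segments = [
    "The second audio captures a phone call between Slager and someone.",
    "CNN believes it is his wife.",
]
batch = tokenizer(segments, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state[:, 0]       # <s>-position vector per segment
probs = torch.sigmoid(classifier(hidden)).squeeze(-1)       # saliency probabilities
```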
Most existing works extract sentences from the document, and some (Xu and Durrett, 2019) further propose post-processing steps to prune the generated summary. The only exceptions are a few works (Liu and Chen, 2019; Huang and Kurohashi, 2021) which extract finer-grained textual segments, i.e., discourse-level text or EDUs, with little justification. The intuition is that a sentence consisting of multiple clauses inevitably contains less important information. As demonstrated in Table 1, removing a redundant clause from the sentence is conducive to generating a better summary. However, such an intuitive explanation does not provide enough evidence to justify substituting sentences with finer-grained textual segments such as EDUs. Considering this gap in existing research, the first main motivation of this paper is to propose and conduct a comparison analysis between sentences and EDUs, to disclose and justify whether the EDU is a theoretically and practically advantageous extractive unit.
When selecting textual segments, the top-k strategy with k fixed for all documents is dominant in deciding the length of the generated summary. Some works (Zhong et al., 2020; Chen et al., 2021) manage to output summaries of different lengths, i.e., with various numbers of extracted segments, by formulating the problem as deriving a subset of sentences from combinations of the top-k sentences. Because the number of such sentence combinations explodes, these approaches are limited to generating summaries with relatively small values of k. To overcome this weakness, the second main motivation of this paper is to propose and develop an approach allowing varying lengths for extractive summarization without an explicit limitation on the maximum value of k, i.e., the maximum length.
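As a back-of-the-envelope illustration (the numbers are hypothetical, not from the paper), the gap between enumerating subsets and scoring one candidate per k value is stark:

```python
# Subset enumeration explodes combinatorially; one candidate per k is linear.
from math import comb

n = 30  # number of sentences in a hypothetical document
subset_candidates = sum(comb(n, k) for k in range(1, 6))  # all subsets with k <= 5
varying_k_candidates = len(range(1, 21))                  # one candidate per k, k = 1..20
print(subset_candidates, varying_k_candidates)            # 174436 vs. 20
```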
Following the above motivations, the comparison analysis between EDUs and sentences ascertains that the EDU is a better text unit for the extractive task, because EDU-level summaries achieve higher automatic evaluation scores than sentence-level summaries. This conclusion is justified from two perspectives. Theoretically, a formal theorem can be derived from the property that an EDU is essentially part of a sentence. Experimentally, a comprehensive analysis of the oracle summaries of five datasets further quantifies the conclusion, i.e., how much higher the ROUGE scores of EDU-level oracle summaries are than those of sentence-level oracle summaries.
Based on the aforementioned conclusion, this paper further proposes and develops an EDU-level extractive model and algorithm which generates summaries with varying lengths, i.e., EDU-VL. We extend a Transformer-based pre-trained language model with an extra classification layer to encode the EDUs in a document and predict the corresponding probabilities. Multiple k values are provided to the model to generate a set of candidate summaries for the document under the flexible top-k strategy. Multiple Transformer encoder layers then encode the full document and each candidate summary individually. Finally, a similarity score with the encoded document is calculated for each candidate summary, and the candidate with the highest score is the final output of EDU-VL.
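The following is a hedged sketch of this candidate generation and scoring stage; the encode helper, the cosine-similarity scoring, and the particular k values are assumptions for illustration rather than the paper's exact design.

```python
# Sketch: one candidate summary per k, scored against the encoded document.
import torch
import torch.nn.functional as F

def select_summary(edus, edu_probs, encode, k_values=(2, 3, 4, 5, 6)):
    """Form one candidate per k from the top-k most probable EDUs, then
    return the candidate whose encoding is most similar to the document's."""
    order = torch.argsort(edu_probs, descending=True)
    doc_vec = encode(" ".join(edus))                 # encode the full document
    best_candidate, best_score = None, float("-inf")
    for k in k_values:
        picked = sorted(order[:k].tolist())          # restore document order
        candidate = " ".join(edus[i] for i in picked)
        score = F.cosine_similarity(encode(candidate), doc_vec, dim=-1).item()
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate
```

Note that, unlike subset enumeration, the number of candidates here grows only with the number of k values tried, which is what removes the explicit limit on the maximum summary length.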
Experiments are conducted on five benchmark datasets from different domains and with various writing styles. The experimental results suggest that EDU-VL achieves better performance than all state-of-the-art extractive baselines on the single-document summarization datasets CNN/DailyMail, XSum, Reddit, and WikiHow, in terms of three ROUGE metrics. In direct comparison with a multi-document model, EDU-VL still achieves comparable performance on the multi-document summarization dataset Multi-News. A human evaluation is further carried out on the summaries generated by EDU-VL to assess the syntactic structure of EDU-constituent summaries. The results provide evidence for the good grammaticality and readability of EDU-constituent summaries and therefore justify their applicability.
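For reference, ROUGE metrics of the kind reported here can be computed with the widely used rouge-score package; the summary strings below are illustrative only, not drawn from the experiments.

```python
# Compute ROUGE-1/2/L F1 between a reference and a generated summary.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="The second audio captures a phone call between Slager and someone CNN believes is his wife.",
    prediction="The second audio captures a phone call between Slager and his wife.",
)
for metric, result in scores.items():
    print(metric, round(result.fmeasure, 4))
```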
The contributions of this paper are threefold:

1) We justify and quantify, from both theoretical and experimental perspectives, that EDU-level oracle summaries achieve higher automatic evaluation scores than sentence-level oracle summaries, indicating that adopting the EDU as the extractive text unit is exploitable and superior in applications.

2) We propose an extractive model with the EDU as its text unit that enables varying summary lengths. The model and its learning algorithm encode the EDUs in a document and output a summary of varying length by varying k in the top-k extraction strategy.

3) Our proposed model achieves superior performance on four single-document summarization datasets on three ROUGE metrics. Human evaluations show that the generated EDU-constituent summaries maintain good grammaticality and readability.
2 Related Work
2.1 Neural Extractive Summarization
The extractive text summarization task aims at extracting salient textual segments from the original document(s) as a summary. A tendency observed among extractive neural models is that the architecture has shifted from RNNs (Nallapati et al., 2017; Xu and Durrett, 2019) to Transformer-based models, e.g., BERT (Zhang et al., 2019; Liu and Lapata, 2019) and Longformer (Liu et al., 2021; Ruan et al., 2022). GNNs have also gained extensive attention in recent years and are usually stacked after an RNN (Wang et al., 2020; Jing et al., 2021) or a Transformer-based encoder (Cui et al., 2020; Kwon et al., 2021) to supplement graph-based features. Some works integrate neural networks with reinforcement learning (Dong et al., 2018; Gu et al., 2022) or unsupervised learning frameworks (Liang et al., 2021). In general, taking a pre-trained Transformer-based language model as the starting point to encode the textual segments in a document is currently the state-of-the-art approach among neural extractive models. Therefore, the Transformer-based models RoBERTa (Liu et al., 2019) and BART (Lewis et al., 2020) are used as the basic building blocks in this paper.
2.2 Sub-sentential Extractive Summarization
Most previous works on the extractive task focused on generating sentence-level summaries, though some of them (Xiao et al., 2020; Cho et al., 2020; Ernst et al., 2022) utilized sub-sentential features. Early works by Marcu (1999), Alonso i Alemany and Fuentes Fort (2003), Yoshida et al. (2014), and Li et al. (2016) exploited extracting discourse-level textual segments as the summary, but those approaches were tested on small datasets. More recent works by Liu and Chen (2019), Xu et al. (2020), and Huang and Kurohashi (2021) were evaluated on relatively larger datasets. However, whether discourse-level textual segments are a better alternative to sentences as the extractive text unit was not justified in those works. To fill this gap, we provide justification for this research question from both theoretical and experimental perspectives in this paper.
2.3 Flexible Extractive Summarization
The extractive summarization task is usually formulated as extracting the top-k salient textual segments from a document. A k value fixed for all documents results in a lack of variety in the length of the generated summary. A few works (Jia et al., 2020; Zhong et al., 2020; Chen et al., 2021) managed to output summaries with varying lengths. However, these approaches either require extra hyper-parameter searching on a validation set to find a valid threshold, or, by formulating the problem as selecting a subset of the top-k sentences, restrict the variety of lengths to small values due to the explosive number of combinations. In this paper, we propose a model with varying k values, but without an explicit limitation on the length or the need for hyper-parameter searching.
3 Oracle Analysis of EDUs and Sentences
Oracle analysis refers to the analysis of the oracle summary, whose definition is stated in Section 3.1. We conducted the oracle analysis from both theoretical and experimental perspectives to justify and quantify that discourse-level summaries achieve higher scores on automatic evaluation metrics than sentence-level summaries.
3.1 Theoretical Formulation
The Elementary Discourse Unit (EDU), the discourse-level textual segment in this paper, refers to a terminal node in the Rhetorical Structure Theory (RST) (Mann and Thompson, 1988) tree, which describes the discourse structure of a piece of text. EDUs are non-overlapping, adjacent text spans, and a single EDU is essentially a segment of a complete sentence, i.e., the sentence itself or a clause in the sentence (Zeldes et al., 2019). Namely, a sentence can always be expressed as a sequence of EDUs, i.e., for the $s$-th sentence in a document, $sent_s = [edu_{s,1}, \ldots, edu_{s,m_s}]$. Consequently, a one-way property from sentence to EDU regarding expressiveness is derived.
Expressiveness Property. For any given subset of sentences in a document, i.e., $[sent_i, \ldots, sent_j, \ldots, sent_k]$, there is always a subset of EDUs in the document, i.e., $[edu_{i,1}, \ldots, edu_{i,m_i}, \ldots, edu_{j,1}, \ldots, edu_{j,m_j}, \ldots, edu_{k,1}, \ldots, edu_{k,m_k}]$, having identical content.
Oracle Summary. The set of salient textual segments greedily selected to have the highest ROUGE score(s) with the reference summary is the oracle summary of a document. It signifies the upper bound of the performance that an extractive summarization model can achieve on the ROUGE metrics.
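A minimal sketch of this greedy construction, assuming a scoring helper rouge(candidate, reference) that returns an F1 value (the helper is hypothetical, not the paper's code):

```python
# Sketch: greedily build an oracle summary from candidate units.
def greedy_oracle(units, reference, rouge):
    """Greedily add the unit (sentence or EDU) that most improves the ROUGE
    score against the reference; stop when no remaining unit improves it."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for unit in (u for u in units if u not in selected):
            score = rouge(" ".join(selected + [unit]), reference)
            if score > best:                 # this unit improves the summary
                best_unit, best, improved = unit, score, True
        if improved:
            selected.append(best_unit)       # commit the best unit this round
    return selected, best
```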
Denote the sentence-level oracle summary as $OS_{sent}$ and the EDU-level oracle summary as $OS_{edu}$. Based on the aforementioned property and definition, Theorem 1 can be derived; its detailed proof is provided below.
Theorem 1. Given a document $D$ and its reference summary $R$, for any derived $OS_{sent}$, there is always an $OS_{edu}$ with $\mathrm{ROUGE}_{F1}(R, OS_{edu}) \geq \mathrm{ROUGE}_{F1}(R, OS_{sent})$.
Proof. For ROUGE-N, let $f_n$ be a function that generates the set of n-grams of a string $s$, and let $g$ be a function that calculates the number of overlapping elements between two sets $x$ and $y$. By the Expressiveness Property, for any $OS_{sent}$ there exists a subset of EDUs with content identical to that of $OS_{sent}$; its n-gram set is therefore identical to $f_n(OS_{sent})$, and so are the overlap count computed by $g$ and the resulting ROUGE-N F1 score. Taking $OS_{edu}$ to be this subset already gives $\mathrm{ROUGE}_{F1}(R, OS_{edu}) = \mathrm{ROUGE}_{F1}(R, OS_{sent})$, which satisfies the inequality; any EDU subset scoring higher only strengthens it.
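For completeness, with $f_n$ and $g$ as defined above, the ROUGE-N precision, recall, and F1 used in the proof take the standard form for a candidate summary $OS$:

$$P_n = \frac{g\big(f_n(R),\, f_n(OS)\big)}{\lvert f_n(OS) \rvert}, \qquad R_n = \frac{g\big(f_n(R),\, f_n(OS)\big)}{\lvert f_n(R) \rvert}, \qquad \mathrm{ROUGE}_{F1} = \frac{2\, P_n R_n}{P_n + R_n}.$$

Since all three quantities depend only on the n-gram sets, identical content implies identical scores, as used in the proof.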