
though it improves the overall performance of abstractive summarization in some cases (Dou et al., 2021): 1) Extractive summaries are not reliable guidance. When a document contains too many summary-worthy sentences, selecting only a subset of them is prone to information loss; when it contains too few or none, the selected extractive summaries can be noisy and confusing to the model. 2) Extractive summaries are not flexible enough to adapt to different cases. The number and allocation of salient content pieces can vary across documents. Rather than extracting a fixed number of sentences, flexible guidance should select salient content based on the properties of each document. An imperfect selection process may also introduce further model biases, such as positional or length biases (Zhong et al., 2019). As the summarization process can differ across documents (Grusky et al., 2018; Koupaee and Wang, 2018), reliable guidance should allow flexible content selection and be adaptive to documents with different levels of abstractiveness.
In this paper, we propose a novel summarization approach with flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
Salience is the degree to which a sentence contributes to the central idea of a document, and its allocation describes how salience is distributed among all sentences in the document. To estimate the salience allocation, a linear classifier is trained on top of the encoder. This estimation is incorporated into the decoder through Salience-Aware Cross-Attention (SACA), which provides the flexibility to decide how much signal to accept from the salience guidance during abstractive summarization. The ground-truth salience label is assigned to each sentence based on its similarity to the ground-truth summary, while the number of salience degrees and their cut-off thresholds are determined from the corpus to balance informativeness and prediction accuracy. To further improve the robustness of the summarization model, we apply label smoothing between adjacent salience degrees during training and use the expectation of salience as a more robust salience estimate.
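To make these components concrete, below is a minimal PyTorch sketch of the ideas just described; it is not the authors' implementation. The function and variable names, the placeholder thresholds, and the exact way the salience embedding enters the cross-attention (added to the keys here) are our assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

def assign_salience_labels(sim_scores, thresholds=(0.1, 0.3, 0.6)):
    """Bucket each sentence's similarity to the gold summary (e.g. a ROUGE
    score) into discrete salience degrees. The thresholds here are
    placeholders; the paper derives them from corpus statistics."""
    return torch.bucketize(sim_scores, torch.tensor(thresholds))

def smooth_adjacent_labels(labels, num_degrees, eps=0.1):
    """Label smoothing restricted to adjacent salience degrees: the gold
    degree keeps 1 - eps of the mass; its immediate neighbors share eps."""
    one_hot = F.one_hot(labels, num_degrees).float()
    left = F.pad(one_hot[..., 1:], (0, 1))    # mass for the lower neighbor
    right = F.pad(one_hot[..., :-1], (1, 0))  # mass for the higher neighbor
    neighbors = left + right
    neighbors = neighbors / neighbors.sum(-1, keepdim=True).clamp(min=1.0)
    return (1.0 - eps) * one_hot + eps * neighbors

class SalienceHead(nn.Module):
    """Linear classifier over sentence-level encoder states that predicts a
    distribution over num_degrees salience degrees and its expectation."""
    def __init__(self, hidden_size, num_degrees=4):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_degrees)
        self.register_buffer("degree_values",
                             torch.arange(num_degrees, dtype=torch.float))

    def forward(self, sent_states):
        # sent_states: (batch, num_sents, hidden)
        probs = F.softmax(self.classifier(sent_states), dim=-1)
        # Expected salience degree: a soft estimate that is more robust
        # than taking the argmax degree.
        expected = probs @ self.degree_values  # (batch, num_sents)
        return probs, expected

class SalienceAwareCrossAttention(nn.Module):
    """Single-head cross-attention in which a learned embedding of each
    source token's (soft) salience degree is added to the keys, so the
    decoder can modulate how much it follows the salience guidance."""
    def __init__(self, hidden_size, num_degrees=4):
        super().__init__()
        self.q = nn.Linear(hidden_size, hidden_size)
        self.k = nn.Linear(hidden_size, hidden_size)
        self.v = nn.Linear(hidden_size, hidden_size)
        self.salience_emb = nn.Embedding(num_degrees, hidden_size)

    def forward(self, dec_states, enc_states, salience_probs):
        # salience_probs: (batch, src_len, num_degrees), broadcast from
        # the sentence level to the tokens of each sentence beforehand.
        soft_emb = salience_probs @ self.salience_emb.weight  # (B, S, H)
        q = self.q(dec_states)                   # (B, T, H)
        k = self.k(enc_states + soft_emb)        # (B, S, H)
        v = self.v(enc_states)                   # (B, S, H)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v     # (B, T, H)

At test time, where gold labels are unavailable, the predicted distribution from SalienceHead would presumably take their place as the guidance fed to the decoder.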
The technical contributions of this work are three-fold. First, we develop a new method for abstractive summarization on a Transformer-based encoder-decoder architecture, using the allocation of salience expectation as flexible guidance (§3). Our method provides reliable guidance that adapts well to articles with different levels of abstractiveness (§5.1). Second, we show the effectiveness and reliability of our proposed method compared to existing methods in both automatic (§4.2) and human evaluation (§5.3). Third, empirical results on more than one million news articles reveal a natural fifteen-fifty salience split for news article sentences (§4.3), providing a useful insight for composing news articles.
2 Related Work
Joint extractive and abstractive summarization.
Extractive summarization and abstractive summarization are two general paradigms of text summarization (See et al., 2017; Grusky et al., 2018). Extractive summarization ensures the faithfulness of the generated summary but cannot properly summarize documents when rephrasing is needed (Liu and Liu, 2009). Abstractive summarization, by comparison, is more flexible but may suffer from hallucination (Maynez et al., 2020).
A series of studies attempt to benefit from the advantages of both paradigms by combining them. Hsu et al. (2018) encourage the word-level attention of an abstractive summarization model to be consistent with the relative sentence-level extraction probability from an extractive summarization model. More recent studies show that conducting abstractive summarization with extractive summaries as part of the input leads to better performance (Saito et al., 2020; Pilault et al., 2020; Dou et al., 2021). Extractive summarization can also serve as an effective content selector for abstractive summarization when summarizing long documents (Manakul and Gales, 2021). Some studies (Gehrmann et al., 2018; Li et al., 2020; Saito et al., 2020) instead extract keywords or phrases rather than summary-worthy sentences as guidance, but their performance is not as good as that of methods using sentences (Dou et al., 2021).
Our work extends strict extractive-summary guidance to a soft guidance based on salience allocation. The proposed guidance is more flexible, reliable, and adaptive, leading to better performance.
Selective attention.
Selective attention is a psychological concept referring to the differential processing of simultaneous sources of information (Johnston and Dark, 1986). Incorporating prior knowledge through selective attention is widely explored in natural language processing, especially in