Salience Allocation as Guidance for Abstractive Summarization
Fei Wang†∗, Kaiqiang Song‡∗, Hongming Zhang‡, Lifeng Jin‡, Sangwoo Cho‡,
Wenlin Yao‡, Xiaoyang Wang‡, Muhao Chen† and Dong Yu‡
†University of Southern California; ‡Tencent AI Lab, Seattle
{fwang598,muhaoche}@usc.edu
{riversong,hongmzhang,lifengjin,swcho,wenlinyao,shawnxywang,dyu}@global.tencent.com
Abstract

Abstractive summarization models typically learn to capture the salient information from scratch implicitly. Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content, and achieves better performance. However, extractive summaries as guidance can be overly strict, leading to information loss or noisy signals. Furthermore, they cannot easily adapt to documents with varying degrees of abstractiveness. As the number and allocation of salient content pieces vary, it is hard to find a fixed threshold for deciding which content should be included in the guidance. In this paper, we propose a novel summarization approach with a flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON). SEASON utilizes the allocation of salience expectation to guide abstractive summarization and adapts well to articles with different levels of abstractiveness. Automatic and human evaluations on two benchmark datasets show that the proposed method is effective and reliable. Empirical results on more than one million news articles demonstrate a natural fifteen-fifty salience split for news article sentences, providing a useful insight for composing news articles.1
1 Introduction
Abstractive summarization seeks to generate concise descriptions of the synoptic information in longer documents (Rush et al., 2015; Nallapati et al., 2016; See et al., 2017). Tackling this task can provide users with improved dissemination and acquisition of more readable content in long documents. More concretely, it allows for enhanced selection, compression, and retrieval of Web-scale textual information that benefits other NLP tasks such as machine reading comprehension (Inoue et al., 2021), mention linking (Cheng et al., 2015), claim verification (Yin et al., 2021), and information extraction (Lu et al., 2022).

Figure 1: Illustration of different guidance. An extractive summary is a strict guidance consisting of extracted sentences labeled with check marks. Salience allocation is a flexible guidance mapping sentences to different salience degrees, shown as a bar chart.

∗ Work done during Fei Wang's internship at Tencent AI Lab, Seattle. The first two authors contributed equally.
1 Code and model weights are available at https://github.com/tencent-ailab/season.
Abstractive summarization models are typically trained end-to-end on large collections of paired corpora of raw documents and human-written summaries to directly perform sequence-to-sequence generation. In terms of deciding what to include in the generated summaries, these models implicitly learn to capture the salient information from scratch. Accordingly, recent literature has attempted to add auxiliary extractive salience guidance to abstractive summarization models to give them a higher-level understanding of input documents; among these approaches, extractive summaries appear to provide the most effective guidance (Li et al., 2020; Jin et al., 2020; Dou et al., 2021). Methods following this strategy learn to first perform extractive summarization, then perform abstraction on top of the extractive summaries (Hsu et al., 2018; Pilault et al., 2020; Dou et al., 2021).
However, incorporating extractive summaries as
a form of guidance is evidently imperfect, even
though it improves the overall performance of abstractive summarization in some cases (Dou et al., 2021): 1) Extractive summaries are not reliable guidance. When there are too many summary-worthy sentences in the document, selecting only a subset of them is prone to information loss. When there are too few or no summary-worthy sentences, the selected extractive summaries can be noisy and confusing to the model. 2) Extractive summaries are not flexible enough to adapt to different cases. The number and allocation of salient content pieces can vary across documents. Rather than extracting a fixed number of sentences, a flexible guidance should select salient content based on document properties. An imperfect selection process may also introduce further model biases, such as positional or length biases (Zhong et al., 2019). As the summarization process can differ across documents (Grusky et al., 2018; Koupaee and Wang, 2018), a reliable guidance should allow flexible content selection and be adaptive to documents with different levels of abstractiveness.
In this paper, we propose a novel summarization approach with a flexible and reliable salience guidance, namely SEASON (SaliencE Allocation as Guidance for Abstractive SummarizatiON).
Salience is the degree to which a sentence contributes to the central idea of a document, and its allocation describes how salience is distributed among all sentences in a document. To estimate the salience allocation, a linear classifier is trained on top of the encoder. This estimation is incorporated into the decoder with Salience-Aware Cross-Attention (SACA), which provides the flexibility to decide how much signal to accept from the salience guidance when supervising the abstractive summarization. The ground-truth salience label is assigned to each sentence based on its similarity with the ground-truth summary. Meanwhile, the number of salience degrees and their cut-off thresholds are decided based on the corpus to balance informativeness and prediction accuracy. To further improve the robustness of the summarization model, we apply label smoothing between adjacent salience degrees during training, and use the expectation of salience as a more robust salience estimate.
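To make the labeling step concrete, below is a minimal sketch of how ROUGE-based ground-truth salience degrees could be assigned. The specific metric (ROUGE-1 recall via the rouge-score package), the number of degrees, and the threshold values are illustrative assumptions; as noted above, the actual degrees and cut-offs are tuned on the corpus.

```python
# Minimal sketch of ROUGE-based salience labeling. The metric choice
# (ROUGE-1 recall) and the thresholds below are illustrative assumptions;
# the paper tunes the number of degrees and cut-offs on the corpus.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

def salience_degrees(sentences, reference_summary, thresholds=(0.1, 0.3, 0.5)):
    """Map each sentence to a discrete degree in {0, ..., len(thresholds)}."""
    degrees = []
    for sent in sentences:
        score = scorer.score(reference_summary, sent)["rouge1"].recall
        degrees.append(sum(score >= t for t in thresholds))  # thresholds passed
    return degrees
```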
The technical contributions of this work are three-fold. First, we develop a new method for abstractive summarization on a Transformer-based encoder-decoder architecture with the allocation of salience expectation as flexible guidance (§3). Our method provides reliable guidance that adapts well to articles with different levels of abstractiveness (§5.1). Second, we show the effectiveness and reliability of our proposed method compared to existing methods in both automatic (§4.2) and human evaluation (§5.3). Third, empirical results on more than one million news articles show a natural fifteen-fifty salience split for news article sentences (§4.3), providing a useful insight for composing news articles.
2 Related Work
Joint extractive and abstractive summarization. Extractive summarization and abstractive summarization are the two general paradigms of text summarization (See et al., 2017; Grusky et al., 2018). Extractive summarization ensures the faithfulness of the generated summary but cannot properly summarize documents when rephrasing is needed (Liu and Liu, 2009). Abstractive summarization, comparatively, is more flexible but may suffer from hallucination (Maynez et al., 2020).

A series of studies attempt to benefit from the advantages of both paradigms by combining them. Hsu et al. (2018) encourage the word-level attention of an abstractive summarization model to be consistent with the relative sentence-level extraction probabilities from an extractive summarization model. More recent studies show that conducting abstractive summarization with extractive summaries as part of the input leads to better performance (Saito et al., 2020; Pilault et al., 2020; Dou et al., 2021). Extractive summarization can also work as an effective content selector for abstractive summarization when summarizing long documents (Manakul and Gales, 2021). Some studies (Gehrmann et al., 2018; Li et al., 2020; Saito et al., 2020) also consider extracting keywords or phrases instead of summary-worthy sentences as guidance, but their performance is not as good as that of methods using sentences (Dou et al., 2021).
Our work extends the strict extractive summary guidance to a soft guidance of salience allocation. The proposed guidance is more flexible, reliable, and adaptive, leading to better performance.
Selective attention. Selective attention is a psychological concept referring to the differential processing of simultaneous sources of information (Johnston and Dark, 1986). Incorporating prior knowledge through selective attention is widely explored in natural language processing, especially in recent NLP models with attention mechanisms (Lin et al., 2016; Sukhbaatar et al., 2019; Pruthi et al., 2020; Beltagy et al., 2020; Wang et al., 2022). To modify the summarization process with selective attention, previous studies either adjust the attention scores directly based on content selection probabilities (Hsu et al., 2018; Saito et al., 2020; Li et al., 2021) or append the selected content to the input (Saito et al., 2020; Dou et al., 2021). Recent studies show that the latter method with sentence-level content selection performs better (Dou et al., 2021).

Figure 2: Model architecture of SEASON. The proposed modules are highlighted with bold lines. SEASON adds a salience predictor on top of the encoder, maps (the expectation of) salience degrees to corresponding embeddings, and adds these salience embeddings to the key vectors of cross-attention.
Different from prior studies, SEASON maps salience degrees to distinct embeddings and adds them to the encoder outputs as the key vectors for cross-attention. This gives our model the flexibility to decide how much signal to accept from the salience guidance for supervising the abstractive summarization process. This strategy achieves better performance in comparison with previous salience-guided selective attention methods.
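As a concrete illustration of this mechanism, the PyTorch sketch below shows one way such salience-aware cross-attention could be realized: a learned embedding per salience degree is mixed according to the salience distribution and added to the encoder outputs before the key projection. The single-head, unbatched formulation and all names are simplifying assumptions of this sketch; consult the released code for the exact multi-head implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SalienceAwareCrossAttention(nn.Module):
    """Sketch of salience-aware cross-attention (single head, no masking).

    A learned embedding per salience degree is added to the encoder outputs
    before the key projection, so attention can be steered by salience while
    the value vectors remain unchanged. Multi-head attention, dropout, and
    batching are omitted for clarity.
    """

    def __init__(self, d_model: int, num_degrees: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.salience_emb = nn.Embedding(num_degrees, d_model)

    def forward(self, dec_states, enc_states, salience_probs):
        # salience_probs: (src_len, num_degrees); one-hot for ground-truth
        # degrees at training time, predicted distribution at inference.
        sal = salience_probs @ self.salience_emb.weight      # (src_len, d)
        q = self.q_proj(dec_states)                          # (tgt_len, d)
        k = self.k_proj(enc_states + sal)                    # keys carry salience
        v = self.v_proj(enc_states)                          # values unchanged
        scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
        return F.softmax(scores, dim=-1) @ v
```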
3 SEASON

In this work, we employ a Transformer-based encoder-decoder model for abstractive summarization. As shown in Fig. 2, our model SEASON encapsulates salience prediction and text summarization in a single network. We perform multi-task end-to-end training, and inference requires only one forward pass. During training, the model jointly learns to predict the degree of salience for each sentence and is guided by the ROUGE-based ground-truth salience allocation to generate the abstractive summary. During inference, SEASON first predicts the expected salience allocation from the encoder outputs, then uses this predicted information to guide the decoder in generating the summary.
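A rough sketch of this multi-task objective is below: the generation loss and the sentence-level salience classification loss are combined. The weighted-sum combination, the default weight, and the function names are assumptions of this sketch, not details stated in this section.

```python
import torch.nn.functional as F

def season_loss(gen_logits, target_ids, sal_logits, sal_labels, sal_weight=1.0):
    """Sketch of the multi-task objective: summary generation + salience
    classification. The weighted-sum combination and default weight are
    illustrative assumptions; padding/ignore-index handling is omitted."""
    # Token-level negative log-likelihood for the abstractive summary.
    gen_loss = F.cross_entropy(gen_logits.view(-1, gen_logits.size(-1)),
                               target_ids.view(-1))
    # Sentence-level classification against ROUGE-based salience labels
    # (label smoothing between adjacent degrees is omitted here).
    sal_loss = F.cross_entropy(sal_logits.view(-1, sal_logits.size(-1)),
                               sal_labels.view(-1))
    return gen_loss + sal_weight * sal_loss
```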
3.1 Problem Formulation
Our approach builds on the intuition that knowing the content salience allocation helps the model pay attention to important content and generate more informative summaries. Although content salience allocation is a built-in attribute of the source document, it is hard for the model to leverage this attribute without direct supervision (Li et al., 2020; Saito et al., 2020; Dou et al., 2021).
Let $\mathbf{x}$ be the sequence of input tokens in the source document, and $\mathbf{y}$ be the sequence of summary tokens, where every token $x_i$ or $y_i$ is in the vocabulary $V$. We use $z_j$, where $j \in \{1, \ldots, N\}$, to represent the salience degree of the $j$-th sentence in the input document. We define $o_i$ as the sentence index of the $i$-th token, where $o_i \in \{1, \ldots, N\}$. The salience allocation is defined as $\zeta(\mathbf{x}) = [f(z_{o_1}), \ldots, f(z_{o_{|\mathbf{x}|}})]$.2 The problem can be formulated as follows:

$$P(\mathbf{y} \mid \mathbf{x}) = \prod_{k=1}^{|\mathbf{y}|} p_\theta(y_k \mid \mathbf{y}_{<k}, \mathbf{x}, \zeta(\mathbf{x})). \tag{1}$$

In Eq. 1, each token prediction is conditioned on the previously decoded summary tokens, the input tokens of the source document, and the salience allocation of the source document.
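To make the notation concrete, the following minimal sketch materializes $\zeta(\mathbf{x})$ from per-sentence degrees and per-token sentence indices. The tensor values, the 0-based indexing, and the choice of an embedding table for $f(\cdot)$ are our illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical tensors for a 3-sentence, 9-token document (0-based indices).
num_degrees, d_model = 4, 768
f = nn.Embedding(num_degrees, d_model)         # f(.): salience degree -> vector

z = torch.tensor([2, 0, 3])                    # z_j: degree of the j-th sentence
o = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2])  # o_i: sentence index of token i

# zeta(x) = [f(z_{o_1}), ..., f(z_{o_|x|})]: broadcast each sentence's
# degree to its tokens, then embed. Shape: (|x|, d_model).
zeta = f(z[o])
```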
3.2 Salience Allocation Prediction
To predict the salience degrees of input sentences, we slightly modify the encoder input sequence by adding a special token at the beginning of each sentence, and obtain the special tokens' last-layer hidden states as sentence representations:

$$[\mathbf{h}^{\mathrm{sent}}_1, \ldots, \mathbf{h}^{\mathrm{sent}}_N] = \mathrm{Encoder}(\hat{\mathbf{x}}), \tag{2}$$
2 $f(\cdot)$ is a function that maps the sentence salience degree to an embedding vector. In our implementation, we use the ground-truth salience embedding for training, and the expected embedding over the inferred salience distribution for testing.
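For illustration, a minimal PyTorch sketch of this prediction step follows: the hidden states at the inserted sentence-start special tokens are classified into salience degrees, and the expected salience embedding described in footnote 2 is computed from the predicted distribution. The class and variable names are ours, and the unbatched shapes are a simplifying assumption.

```python
import torch
import torch.nn as nn

class SaliencePredictor(nn.Module):
    """Sketch of the salience prediction head (names are ours)."""

    def __init__(self, d_model: int, num_degrees: int):
        super().__init__()
        self.classifier = nn.Linear(d_model, num_degrees)    # linear classifier
        self.salience_emb = nn.Embedding(num_degrees, d_model)

    def forward(self, enc_hidden: torch.Tensor, sent_start_mask: torch.Tensor):
        # enc_hidden: (src_len, d_model) last-layer encoder states of x-hat;
        # sent_start_mask: bool (src_len,), True at inserted sentence-start
        # special tokens, whose states serve as h^sent_1..N (Eq. 2).
        h_sent = enc_hidden[sent_start_mask]                 # (N, d_model)
        probs = self.classifier(h_sent).softmax(dim=-1)      # salience probs
        # Expectation of salience embeddings over the predicted distribution,
        # used at inference in place of ground-truth degree embeddings (fn. 2).
        expected_emb = probs @ self.salience_emb.weight      # (N, d_model)
        return probs, expected_emb
```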