encoder. Several recent summarization models were proposed from this perspective. For example,
Wu et al. [9] proposed a unified semantic graph encoder to learn better semantic meanings and a graph-aware decoder to utilize the encoded information. Cao et al. [10] used contrastive learning
to help the model become aware of the factual information. The second type of error, extrinsic error, is often introduced by excessive reliance on the LM, which ensures fluency while neglecting to summarize the source document. For example, an LM is inclined to generate the commonly used phrase “score the winner” when the correct phrase is “score the second highest”, which is less frequently used. This type of error has been studied in the neural machine translation task [11], but has not been addressed in abstractive summarization.
To address these errors, we propose a novel Faithfulness Enhanced Summarization model (FES).
To prevent the intrinsic error problem, we design FES in a multi-task learning paradigm, i.e., the encoder-decoder is trained for summarization jointly with an auxiliary QA-based faithfulness evaluation task. The QA task imposes an additional reasoning requirement on the encoder, pushing it toward a more comprehensive understanding of the key semantic meanings of the input document and better representations than training for summarization alone would yield. The QA attention on the key entities of
the input can also be used to align the decoder state with the encoder outputs for generating a faithful
summary. To address the extrinsic error problem, we propose a max-margin loss to prevent the LM
from being overconfident. Concretely, we define an indicator of the degree of overconfidence of the LM. The risk of outputting extrinsic-error tokens with low prediction probabilities is mitigated by minimizing this overconfidence indicator.
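To make this training setup concrete, the following is a minimal sketch of how the multi-task objective and the max-margin overconfidence penalty could be combined. The specific overconfidence indicator used here (the probability gap between the bare LM and the full model on the reference token), the margin value, the loss weights, and all function names are illustrative assumptions rather than the exact formulation of FES.

```python
import torch
import torch.nn.functional as F


def overconfidence_margin_loss(model_logits, lm_logits, target_ids, margin=0.1):
    """Hinge-style (max-margin) penalty on LM overconfidence.

    Assumes the overconfidence indicator is the gap between the probability the
    bare LM assigns to the reference token and the probability the full
    summarization model assigns to it; this form is a sketch, not the paper's
    published definition.
    """
    # Probabilities of the reference tokens under each distribution:
    # logits have shape (batch, seq_len, vocab), target_ids (batch, seq_len).
    model_p = F.softmax(model_logits, dim=-1).gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    lm_p = F.softmax(lm_logits, dim=-1).gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)

    # Penalize positions where the LM is more confident than the grounded model
    # by more than the margin.
    return torch.clamp(lm_p - model_p + margin, min=0.0).mean()


def multitask_objective(sum_nll, qa_loss, margin_loss, lambda_qa=1.0, lambda_margin=1.0):
    """Combine summarization NLL with the auxiliary QA loss and the margin penalty.

    The weighting scheme is an assumption for illustration.
    """
    return sum_nll + lambda_qa * qa_loss + lambda_margin * margin_loss
```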
We validate the effectiveness of our FES model by conducting extensive experiments on the public benchmark CNN/DM [12] and XSum [13] datasets. Experimental results demonstrate that our faithfulness enhanced summarization model achieves superior ROUGE scores and improves the faithfulness of news summarization over several strong baselines.
Our main contributions can be summarized as follows. (1) We propose a faithfulness enhanced summarization model, which alleviates the unfaithfulness problem from both the encoder side and the decoder side. (2) Concretely, we propose a multi-task framework that enhances summarization performance via automatic QA tasks. We also propose a max-margin loss to control the overconfidence problem of the LM. (3) Experimental results demonstrate that our proposed approach brings substantial improvements over the most recent baselines on benchmark datasets, and can also improve the faithfulness of the generated summaries.
2 Related Work
Abstractive Summarization.
In recent years, research on text generation has made impressive progress [14, 15, 16], which has in turn advanced abstractive summarization. The abstractive summarization task generates novel words and phrases not featured in the source text to capture its salient ideas [17]. Most works apply an encoder-decoder architecture to implicitly learn the summarization procedure [18, 19]. More recently, applying pretrained language models as the encoder [4, 20] or pre-training the generation process on large-scale unlabeled corpora [21, 22] has brought significant improvements. Explicit structure modeling has also been shown to be effective in summarization tasks. For example, Jin et al. [23] incorporated semantic dependency graphs to help generate sentences with better semantic relevance, and Wu et al. [9] came up with a unified semantic graph to aggregate relevant disjoint context from the input.
Fact Consistency for Abstractive Summarization.
Producing a summary that is entailed by the information presented in the source document is a key challenge in the summarization task, and less progress has been made on it. Pioneering works [24, 25] incorporated fact descriptions or entailment knowledge to enhance faithfulness. More recently, Zhu et al. [26] modeled the facts in the source article with knowledge graphs based on a graph neural network. Cao et al. [10] proposed to leverage reference summaries as positive training data and erroneous summaries as negative data, to train summarization systems that are better at distinguishing between them. Aralikatte et al. [27] introduced a focus-attention mechanism to encourage decoders to proactively generate tokens that are similar or topical to the input document. In contrast, other works post-edit the generated summaries. Different from previous works, we enhance the semantic understanding of the document with faithfulness evaluation as a direct signal and prevent the overconfidence of the LM, which has not been addressed before.