
previous work has attempted to craft adversarial samples for Seq2Seq models on machine translation (Belinkov and Bisk, 2018; Cheng et al., 2020), such attacks have not been extensively studied across diverse generation tasks. Furthermore, they have not been examined on current pre-trained Seq2Seq models.
To address this problem, we provide the first quantitative analysis of the robustness of pre-trained Seq2Seq models. Specifically, we quantitatively analyze the robustness of BART across three generation tasks, i.e., text summarization, table-to-text, and dialogue generation. By slightly modifying the input content, we find that the corresponding output drops significantly in both informativeness and faithfulness, which demonstrates the close connection between the robustness of Seq2Seq models and their informativeness and faithfulness on generation tasks.
Based on the analysis above, we further propose a novel Adversarial augmentation framework for Sequence-to-sequence generation (AdvSeq) to enhance Seq2Seq robustness against perturbations and thus obtain an informative and faithful text generation model. AdvSeq constructs challenging yet factually consistent adversarial samples and learns to defend against their attacks. To increase the diversity of the adversarial samples, AdvSeq applies two types of perturbation strategies, implicit adversarial perturbation (AdvGrad) and explicit token swapping (AdvSwap), both of which efficiently reuse the gradients back-propagated during training. AdvGrad directly perturbs word representations with gradient vectors, while AdvSwap uses gradient directions to search for token replacements. To alleviate the vulnerability of the NLL objective, AdvSeq adopts a KL-divergence-based loss to train on these adversarial samples, which promotes higher invariance in the word representation space (Miyato et al., 2019).
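To make the two strategies concrete, below is a minimal PyTorch sketch of one AdvGrad-style training step with the KL consistency loss, together with a HotFlip-style first-order candidate search in the spirit of AdvSwap. The function names, the `epsilon` step size, and the HuggingFace-style interface (a model accepting `inputs_embeds` and returning `.loss` and `.logits`) are our assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def advgrad_kl_step(model, inputs_embeds, attention_mask, labels, epsilon=1e-2):
    """One AdvGrad-style step (sketch): perturb input word embeddings
    along the NLL gradient and add a KL consistency loss."""
    inputs_embeds = inputs_embeds.detach().requires_grad_(True)

    # Clean forward pass; retain the graph so the NLL term stays trainable.
    out = model(inputs_embeds=inputs_embeds,
                attention_mask=attention_mask, labels=labels)
    grad, = torch.autograd.grad(out.loss, inputs_embeds, retain_graph=True)

    # Implicit perturbation: a small step along the normalized gradient.
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    adv = model(inputs_embeds=inputs_embeds + delta,
                attention_mask=attention_mask, labels=labels)

    # KL between the adversarial and (detached) clean output distributions
    # encourages invariance under the perturbation (cf. Miyato et al., 2019).
    kl = F.kl_div(F.log_softmax(adv.logits, dim=-1),
                  F.softmax(out.logits.detach(), dim=-1),
                  reduction="batchmean")
    return out.loss + kl

def advswap_candidates(grad_i, embedding_matrix, k=5):
    """AdvSwap-style explicit search (HotFlip-like first-order estimate):
    rank vocabulary tokens t by grad_i . e_t, approximating how much
    swapping position i to t would increase the loss (the constant
    -grad_i . e_i term is dropped). AdvSeq additionally constrains
    candidates so the sample stays factually consistent."""
    scores = embedding_matrix @ grad_i   # (vocab_size,)
    return scores.topk(k).indices        # most loss-increasing replacements
```

Detaching the clean logits makes the KL term a one-way consistency target: the perturbed prediction is pulled toward the clean one rather than the reverse.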
We evaluate AdvSeq with extensive experiments on three generation tasks: text summarization, table-to-text, and dialogue generation. Our experiments demonstrate that AdvSeq effectively improves Seq2Seq robustness against adversarial samples, which results in better informativeness and faithfulness across various text generation tasks. Compared to existing adversarial training methods for language understanding and data augmentation methods for Seq2Seq models, AdvSeq more effectively improves both informativeness and faithfulness.
We summarize our contributions as follows:
• To the best of our knowledge, we are the first to conduct a quantitative analysis of the robustness of pre-trained Seq2Seq models, which reveals the close connection between robustness and their informativeness and faithfulness on generation tasks.
• We propose a novel adversarial augmentation framework for Seq2Seq models, namely AdvSeq, which effectively improves their informativeness and faithfulness on various generation tasks by enhancing their robustness.
• Automatic and human evaluations on three popular text generation tasks validate that AdvSeq significantly outperforms several strong baselines in both informativeness and faithfulness.
2 Seq2Seq Robustness Analysis
In this section, we analyze the robustness of Seq2Seq models by evaluating their performance on adversarial samples. In brief, after the input contexts are slightly modified, we check whether the model maintains its informativeness and faithfulness. A robust model should adaptively generate high-quality texts corresponding to the modified inputs.
Following the definition of adversarial examples for Seq2Seq models, adversarial examples should be meaning-preserving on the source side but meaning-destroying on the target side (Michel et al., 2019). Formally, given an input context $x$ and its reference text $y_{ref}$ from the test set of a task, and a Seq2Seq model $f_\theta$ trained on the training set, we first collect the original generated text $y = f_\theta(x)$. We measure its faithfulness and informativeness by $E_f(x, y)$ and $E_i(x, y, y_{ref})$, where $E_f$ and $E_i$ are the faithfulness and informativeness metrics, respectively. Then, we craft an adversarial sample $x'$ by slightly modifying $x$ while trying not to alter its original meaning, and generate $y'$ grounded on $x'$. Finally, we measure the target relative score decrease (Michel et al., 2019) of faithfulness after attacks by:
$$d = \frac{E_f(x, y) - E_f(x', y')}{E_f(x, y)} \quad (1)$$
We calculate the decrease of informativeness similarly. We also report the entailment score $S(x, x')$ of $x'$ towards $x$ to check whether the modification changes the meaning.
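As a toy illustration of Eq. (1) (the function name is ours, and scores are assumed to lie in $(0, 1]$):

```python
def relative_decrease(score_clean: float, score_adv: float) -> float:
    """Relative score decrease d from Eq. (1): the fraction of the
    clean score E_f(x, y) lost after the adversarial attack."""
    return (score_clean - score_adv) / score_clean

# e.g., a faithfulness score falling from 0.90 to 0.72 yields d = 0.2,
# i.e., a 20% relative decrease.
assert abs(relative_decrease(0.90, 0.72) - 0.2) < 1e-9
```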