Integrating Translation Memories into Non-Autoregressive Machine
Translation
Jitao Xu† Josep Crego‡ François Yvon†
†Université Paris-Saclay, CNRS, LISN, 91400, Orsay, France
‡SYSTRAN, 5 rue Feydeau, 75002, Paris, France
{jitao.xu,francois.yvon}@limsi.fr, josep.crego@systrangroup.com
Abstract
Non-autoregressive machine translation (NAT)
has recently made great progress. However,
most works to date have focused on standard
translation tasks, even though some edit-based
NAT models, such as the Levenshtein Trans-
former (LevT), seem well suited to translate
with a Translation Memory (TM). This is the
scenario considered here. We first analyze the
vanilla LevT model and explain why it does
not do well in this setting. We then propose a
new variant, TM-LevT, and show how to effec-
tively train this model. By modifying the data
presentation and introducing an extra deletion
operation, we obtain performance that is on par with an autoregressive approach, while reducing the decoding load. We also show that incorporating TMs during training dispenses with the need for knowledge distillation, a well-known trick used to mitigate the multimodality issue.
1 Introduction
Non-autoregressive neural machine translation
(NAT) has been greatly advanced in recent years
(Xiao et al., 2022). NAT takes advantage of parallel decoding to generate multiple tokens simultaneously and thereby speed up inference. This often comes at the cost of a loss in translation quality when compared to autoregressive (AR) models (Gu et al., 2018a).
This gap is slowly closing and methods based on
iterative refinement (Ghazvininejad et al., 2019; Gu et al., 2019; Saharia et al., 2020) and on connectionist temporal classification (Libovický and Helcl, 2018; Gu and Kong, 2021) are now reporting
BLEU scores similar to strong AR baselines.
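To make this contrast concrete, the following minimal sketch illustrates the two decoding regimes; it assumes a hypothetical model interface (decode_step for the AR decoder, decode_parallel for the NAT decoder) rather than any specific system discussed here.

import torch

def decode_autoregressive(model, src, bos_id, eos_id, max_len=64):
    # Generate one token per forward pass, each step conditioned on the prefix.
    ys = [bos_id]
    for _ in range(max_len):
        logits = model.decode_step(src, torch.tensor([ys]))  # assumed method
        next_tok = int(logits[0, -1].argmax())
        ys.append(next_tok)
        if next_tok == eos_id:
            break
    return ys

def decode_non_autoregressive(model, src, tgt_len):
    # Predict every target position in a single parallel forward pass.
    logits = model.decode_parallel(src, tgt_len)  # assumed shape: (1, tgt_len, vocab)
    return logits.argmax(dim=-1).squeeze(0).tolist()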
Most works on NAT focus on the standard ma-
chine translation (MT) task, where the decoder
starts from scratch, with the exception of Susanto et al. (2020) and Xu and Carpuat (2021), who use NAT to integrate lexical constraints in decoding.
However, edit-based NAT models, such as the Levenshtein Transformer (LevT) of Gu et al. (2019), seem to be natural candidates for performing MT with Translation Memories (TMs). LevT is able to iteratively edit an initial target sequence by performing insertion and deletion operations until convergence.
This design also matches the concept of using TMs
in MT, where given a source sentence, we aim to
edit a candidate translation retrieved from the TM.
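As an illustration of this edit-based view, the sketch below shows a generic refinement loop that starts from a TM candidate and alternates deletion, placeholder insertion, and token prediction; the model.* methods are hypothetical stand-ins for the classifiers of an edit-based decoder such as LevT, not its actual API.

PLH = "<plh>"  # placeholder token for positions still to be filled

def insert_placeholders(tokens, counts):
    # counts has len(tokens) + 1 entries: number of placeholders in each gap.
    out = []
    for i, n in enumerate(counts):
        out.extend([PLH] * n)
        if i < len(tokens):
            out.append(tokens[i])
    return out

def refine_with_edits(model, src, init_tgt, max_iter=10):
    # Iteratively edit an initial target (e.g., a TM fuzzy match) until it
    # stops changing or the iteration budget is exhausted.
    hyp = list(init_tgt)
    for _ in range(max_iter):
        prev = list(hyp)
        keep = model.predict_deletions(src, hyp)      # assumed: one bool per token
        hyp = [tok for tok, k in zip(hyp, keep) if k]
        counts = model.predict_insertions(src, hyp)   # assumed: len(hyp) + 1 ints
        hyp = insert_placeholders(hyp, counts)
        hyp = model.fill_placeholders(src, hyp)       # assumed: replaces PLH tokens
        if hyp == prev:                               # no effective edits: converged
            break
    return hyp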
This idea has been used for decades in the localization industry and implemented in basic Computer-Aided Translation tools. Translators
wishing to translate a sentence can benefit from
fuzzy matching techniques to retrieve similar seg-
ments from the TM. These segments can then be
revised, thereby improving productivity and consis-
tency of the translation process (Koehn and Senellart, 2010; Yamada, 2011). The retrieval of similar examples from a TM has also proved useful in conventional (AR) neural MT systems: retrieved segments can be injected into the encoder (Bulte and Tezcan, 2019; Xu et al., 2020) or used as priming signals in the decoder (Pham et al., 2020) to influence the translation process. These studies report significant gains in translation performance in technical domains, where the translation of terms and phraseology greatly benefits from examples found in a TM.
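For concreteness, a minimal sketch of edit-distance-based fuzzy matching over a toy TM is given below; the difflib similarity ratio and the 0.6 threshold are illustrative choices, not the retrieval setup used in this work.

from difflib import SequenceMatcher

def fuzzy_score(a, b):
    # Token-level similarity in [0, 1]; 1.0 means identical segments.
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def retrieve_from_tm(source, tm, threshold=0.6):
    # tm is a list of (source segment, target segment) pairs.
    best_src, best_tgt = max(tm, key=lambda pair: fuzzy_score(source, pair[0]))
    score = fuzzy_score(source, best_src)
    return (best_tgt, score) if score >= threshold else (None, score)

# Toy example: the closest TM entry is returned together with its match score.
tm = [("the printer is out of paper", "l'imprimante n'a plus de papier"),
      ("restart the printer", "redémarrez l'imprimante")]
print(retrieve_from_tm("the printer is out of ink", tm))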
Our main focus in this work is to develop an
improved version of LevT suited to the revision
part of TM use, where the translation retrieved from the TM is modified via edit operations in a non-
autoregressive way. We first show that the original
LevT cannot perform well on this task and explain
that this failure is a direct consequence of its train-
ing design. We propose to fix this issue with TM-
LevT, which includes an additional deletion step.
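The sketch below illustrates one way such an extra deletion pass could be slotted in front of the refinement loop sketched earlier, pruning the TM candidate before the usual insertion/deletion cycle; the method name predict_initial_deletions is an assumption for illustration, not the actual TM-LevT interface.

def translate_with_tm(model, src, tm_candidate, max_iter=10):
    # Extra deletion pass: drop candidate tokens judged irrelevant to `src`
    # before handing the pruned sequence to the refinement loop
    # (refine_with_edits from the earlier sketch).
    keep = model.predict_initial_deletions(src, tm_candidate)  # assumed method
    seed = [tok for tok, k in zip(tm_candidate, keep) if k]
    return refine_with_edits(model, src, seed, max_iter=max_iter)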
Next, we propose to further improve the training
procedure in two ways: (a) by also including the re-
trieved candidate translation on the source side, as
done in AR TM-based approaches (Bulte and Tezcan, 2019; Xu et al., 2020); (b) by simultaneously
training with empty and non-empty initial target
sentences. In our experiments, TM-LevT achieves
performance that is on par with a strong AR ap-