
tions and simply copy the tokens in them. By shuffling the retrieved results, we ensure that suggestions are less similar to the input and train the system to be more robust to less relevant suggestions at test time. Our experimental results show that the model trained with shuffled suggestions outperforms the standard RAT model by up to +5.8 BLEU on average when suggestions come from less relevant TMs, while dropping only 0.15 BLEU on average when suggestions come from relevant TMs.
To the best of our knowledge, this is the first
work to consider the robustness of RAT methods,
which we believe is critical for acceptance by hu-
man translators.
2 Related Work
RAT is a form of domain adaptation, which is often achieved via continued training in NMT (Freitag and Al-Onaizan, 2016; Luong and Manning, 2015). However, RAT differs from standard domain adaptation techniques like continued training in that it is online; that is, the model is not adapted during training and instead domain adaptation occurs at inference time. This makes RAT better suited for some real-world applications, e.g., a single server with a single model loaded in memory can serve hundreds or thousands of users with custom translations adapted to their unique TMs. Other works have considered online adaptation outside the context of RAT, including Vilar (2018), who proposes Learning Hidden Unit Contributions (Swietojanski et al., 2016) as a compact way to store many adaptations of the same general-domain model.
Previous works in retrieval augmented translation have mainly explored aspects of filtering fuzzy-matches by applying similarity thresholds (Xia et al., 2019; Xu et al., 2020), leveraging word alignment information (Zhang et al., 2018; Xu et al., 2020; He et al., 2021), or re-ranking with additional scores (e.g., word overlap) (Gu et al., 2018; Zhang et al., 2018). Our approach does not make use of any filtering and as such does not require any ad-hoc optimization. Our work is related to the use of k-nearest-neighbor retrieval for NMT (Khandelwal et al., 2021; Zheng et al., 2021), but it is less expensive and does not require storage and search over a large data store of context representations and corresponding target tokens (Meng et al., 2021).
Our work also relates to work in offline adaptation, which has addressed catastrophic forgetting of general-domain knowledge during domain adaptation (Thompson et al., 2019) via ensembling of in-domain and out-of-domain models (Freitag and Al-Onaizan, 2016), mixing of in-domain and out-of-domain data during adaptation (Chu et al., 2017), multi-objective and multi-output learning (Dakwale and Monz, 2017), elastic weight consolidation (Kirkpatrick et al., 2017; Thompson et al., 2019; Saunders et al., 2019), or combinations of these techniques (Hasler et al., 2021).
3 RAT with Shuffling
3.1 Retrieval Module
We use Okapi BM25 (Robertson and Zaragoza, 2009), a classical retrieval algorithm that performs search by computing lexical matches of the query against all sentences in the evidence, to obtain top-ranked sentences for each input.¹ Specifically, we build an index over the source sentences of a TM. For every input, we collect the top-k (k ∈ {1, 2, 3, 4, 5} in our experiments) most similar source-side sentences and then use their corresponding target-side sentences as fuzzy-matches.
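As a minimal sketch of this retrieval step, the snippet below uses the rank_bm25 package in place of the ElasticSearch index described in footnote 1; the function names, the (source, target) pair format of the TM, and the toy data are illustrative assumptions, not part of our actual setup.

```python
# Minimal sketch of BM25 fuzzy-match retrieval over a translation memory.
# Uses rank_bm25 instead of the ElasticSearch index from footnote 1;
# the TM format (list of (source, target) pairs) is an assumption.
from rank_bm25 import BM25Okapi

def build_tm_index(tm_pairs):
    """Index the source side of the TM with Okapi BM25."""
    tokenized_sources = [src.split() for src, _ in tm_pairs]
    return BM25Okapi(tokenized_sources)

def retrieve_fuzzy_matches(query, tm_pairs, bm25, k=5):
    """Return the target sides of the top-k source-side matches."""
    scores = bm25.get_scores(query.split())
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [tm_pairs[i][1] for i in top_ids]

# Toy usage example.
tm = [("the cat sat on the mat", "die Katze sass auf der Matte"),
      ("the dog barked loudly", "der Hund bellte laut")]
bm25 = build_tm_index(tm)
print(retrieve_fuzzy_matches("a cat on a mat", tm, bm25, k=1))
```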
3.2 Shuffling Suggestions
We propose to relax the use of the top-k relevant suggestions during training by training the RAT model with k fuzzy-matches randomly sampled from a larger list. In our experiments, we sample the k fuzzy-matches from the top-10 matches, where the cutoff of 10 is chosen based on our preliminary experiments.
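The following sketch illustrates this sampling step, building on the hypothetical retrieve_fuzzy_matches helper from the retrieval sketch above; the pool size of 10 follows the setting described in this section, while the function name and signature are assumptions. Shuffling is applied only when constructing training examples; at test time the model still receives the standard top-k matches.

```python
import random

# Minimal sketch, assuming `retrieve_fuzzy_matches` from the retrieval sketch
# above: instead of feeding the model the top-k matches directly, sample k
# matches from the top-10 so training-time suggestions are noisier.
def sample_shuffled_suggestions(query, tm_pairs, bm25, k=3, pool_size=10):
    pool = retrieve_fuzzy_matches(query, tm_pairs, bm25, k=pool_size)
    # Sample without replacement; fall back to the whole pool if it is small.
    return random.sample(pool, min(k, len(pool)))
```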
By shuffling the retrieved fuzzy-matches, we ensure that suggestions are less similar to the target reference. We therefore expect the model to learn to be more selective in using the suggestions for translation and thus to be more robust to less relevant suggestions at test time. Indeed, training models with noisy data has been shown to improve a model's robustness to irrelevant data (Belinkov and Bisk, 2018).
4 Data, Models & Experiments
4.1 Data
We conduct experiments in two language directions: En-De with five domain-specialized TMs² and En-Fr with seven domain-specialized TMs.³
¹ To enable fast retrieval, we leverage the implementation provided by the ElasticSearch library, available at https://github.com/elastic/elasticsearch-py.
² Medical, Law, IT, Religion, and Subtitles.
³ News, Medical, Bank, Law, IT, TED, and Religion.