Improving Robustness of Retrieval Augmented Translation via Shuffling of Suggestions
Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
AWS AI Labs
pramathu@amazon.com
Abstract
Several recent studies have reported dramatic performance improvements in neural machine translation (NMT) by augmenting translation at inference time with fuzzy-matches retrieved from a translation memory (TM). However, these studies all operate under the assumption that the TMs available at test time are highly relevant to the test set. We demonstrate that for existing retrieval augmented translation methods, using a TM with a domain mismatch to the test set can result in substantially worse performance compared to not using a TM at all. We propose a simple method that exposes fuzzy-match NMT systems to less relevant suggestions during training and show that it results in a system that is much more tolerant (regaining up to 5.8 BLEU) to inference with domain-mismatched TMs. Moreover, the model remains competitive with the baseline when fed suggestions from relevant TMs.
1 Introduction
Retrieval Augmented Translation (RAT) refers to a paradigm that combines a translation model (Vaswani et al., 2017) with an external retriever module (Li et al., 2022). The retrieval module (e.g., a BM25 ranker (Robertson and Zaragoza, 2009) or a neural retriever (Cai et al., 2021; Sachan et al., 2021)) takes each source sentence as input and retrieves the top-k most similar target translations from a Translation Memory (TM) (Farajian et al., 2017; Gu et al., 2017; Bulte and Tezcan, 2019). The translation module then encodes the input along with the top-k fuzzy-matches, either by appending the suggestions to the input (Bulte and Tezcan, 2019; Xu et al., 2020) or by using separate encoders for the input and the suggestions (He et al., 2021; Cai et al., 2021). The decoder then learns either to copy tokens from the suggestions or to carry over their style while generating the translation. In this work, we focus only on the translation module of this paradigm.
Work done while the authors were at AWS AI Labs.
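As a simple illustration of the "append suggestions to the input" variant, the sketch below shows one way an encoder input could be assembled from a source sentence and its retrieved fuzzy-matches; the separator token and function name are assumptions made for illustration, not the exact format used by the cited systems.

```python
# Minimal sketch (not the cited systems' exact format) of appending
# retrieved target-side fuzzy-matches to the source sentence so that a
# standard encoder-decoder NMT model sees both in one input sequence.
# The "<sep>" separator token is an illustrative assumption.
def build_rat_input(source: str, fuzzy_matches: list[str], sep: str = "<sep>") -> str:
    """Concatenate the source sentence with its retrieved fuzzy-matches."""
    return " ".join([source] + [f"{sep} {match}" for match in fuzzy_matches])

# Example: an En-De source sentence with two retrieved suggestions.
print(build_rat_input(
    "Open the configuration file.",
    ["Öffnen Sie die Konfigurationsdatei.",
     "Öffnen Sie die Datei erneut."],
))
```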
In the existing literature, inference with RAT models has typically assumed that TMs are domain-matched, i.e., the test set is from the same domain as the translation memory. Many works (e.g., Bulte and Tezcan (2019), Xu et al. (2020) and Cai et al. (2021)) have reported dramatic performance improvements under this setting. However, it is not clear how the models perform when there is a domain mismatch between the TM and the test set. In this work, we focus on the setting where the assumption of a TM being domain-matched with the test set does not hold. We explore the conditions where models are provided suggestions from a TM that is not from the same domain as the test set. We show that RAT models suffer a performance drop when fed suggestions coming from less relevant TMs.
This finding is especially important from a usability standpoint. A translator will often pick the best-fitting available TM for a translation job when an ideal TM does not exist or has not been created yet, e.g., an IT-domain TM is picked for a Patent translation job because the domains are close and no Patent TM is available. This can lead to issues such as ambiguous terminology or multiple meanings for the same context (Jalili Sabet et al., 2016; Barbu et al., 2016). A RAT model leveraging such a (mismatched) TM ends up producing worse-quality translations than a standard MT system. Therefore, it is desirable that RAT models not only improve translation given suggestions from relevant TMs, but also be more robust to suggestions from less relevant TMs.
To this end, we propose an enhancement to the training of RAT models with a simple shuffling method to mitigate this problem. Instead of always using the k most relevant fuzzy-matches in training, our method randomly samples k from a larger list (e.g., randomly sampling 3 sentences from the top-10 matches). Our hypothesis is that if we systematically provide only the most similar suggestions during training, the model will overly rely on the suggestions and simply copy the tokens in them. By shuffling the retrieved results, we ensure that suggestions are less similar to the input and train the system to be more robust to less relevant suggestions at test time. Our experimental results show that the model trained with shuffling of suggestions outperforms the standard RAT model by up to +5.8 BLEU on average when suggestions come from less relevant TMs, while dropping only 0.15 BLEU on average when suggestions come from relevant TMs.
To the best of our knowledge, this is the first work to consider the robustness of RAT methods, which we believe is critical for acceptance by human translators.
2 Related Work
RAT is a form of domain adaptation, which is often achieved via continued training in NMT (Freitag and Al-Onaizan, 2016; Luong and Manning, 2015). However, RAT differs from standard domain adaptation techniques like continued training in that it is online; that is, the model is not adapted during training and instead domain adaptation occurs at inference time. This makes RAT better suited for some real-world applications, e.g., a single server with a single model loaded in memory can serve hundreds or thousands of users with custom translations adapted to their unique TMs. Other works have considered online adaptation outside the context of RAT, including Vilar (2018), who proposes Learning Hidden Unit Contributions (Swietojanski et al., 2016) as a compact way to store many adaptations of the same general-domain model.
Previous works in retrieval augmented translation have mainly explored filtering fuzzy-matches by applying similarity thresholds (Xia et al., 2019; Xu et al., 2020), leveraging word alignment information (Zhang et al., 2018; Xu et al., 2020; He et al., 2021), or re-ranking with additional scores (e.g., word overlap) (Gu et al., 2018; Zhang et al., 2018). Our approach does not make use of any filtering and as such does not require any ad-hoc optimization. Our work is related to the use of k-nearest-neighbor retrieval for NMT (Khandelwal et al., 2021; Zheng et al., 2021), but it is less expensive and does not require storage of and search over a large datastore of context representations and corresponding target tokens (Meng et al., 2021).
Our work also relates to work in offline adaptation, which has addressed catastrophic forgetting of general-domain knowledge during domain adaptation (Thompson et al., 2019) via ensembling in-domain and out-of-domain models (Freitag and Al-Onaizan, 2016), mixing of in-domain and out-of-domain data during adaptation (Chu et al., 2017), multi-objective and multi-output learning (Dakwale and Monz, 2017), elastic weight consolidation (Kirkpatrick et al., 2017; Thompson et al., 2019; Saunders et al., 2019), or combinations of these techniques (Hasler et al., 2021).
3 RAT with Shuffling
3.1 Retrieval Module
We use Okapi BM25 (Robertson and Zaragoza, 2009), a classical retrieval algorithm that performs search by computing lexical matches of the query against all sentences in the evidence, to obtain top-ranked sentences for each input.[1] Specifically, we build an index over the source sentences of a TM. For every input, we collect the top-k (i.e., k ∈ {1, 2, 3, 4, 5} in our experiments) most similar source-side sentences and then use their corresponding target-side sentences as fuzzy-matches.
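For illustration, the following minimal sketch shows how this retrieval step could be implemented, using the rank_bm25 package as a stand-in for the Elasticsearch BM25 index used in our implementation; the toy TM, whitespace tokenization, and function name are simplifying assumptions.

```python
# Minimal sketch of the retrieval module: index the source side of a TM
# with BM25 and return the target sides of the best-matching entries.
# rank_bm25 is used here only as a stand-in for the Elasticsearch index.
from rank_bm25 import BM25Okapi

# A toy translation memory: parallel source/target sentences.
tm_source = ["the cat sat on the mat",
             "the patient received the treatment",
             "open the configuration file"]
tm_target = ["die Katze sass auf der Matte",
             "der Patient erhielt die Behandlung",
             "öffnen Sie die Konfigurationsdatei"]

# Build the BM25 index over whitespace-tokenized source sentences.
bm25 = BM25Okapi([s.split() for s in tm_source])

def retrieve_fuzzy_matches(query: str, k: int = 3) -> list[str]:
    """Return the target sides of the k TM entries whose source side
    scores highest against the query under BM25."""
    scores = bm25.get_scores(query.split())
    top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [tm_target[i] for i in top_idx]

print(retrieve_fuzzy_matches("the cat is on the mat", k=2))
```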
3.2 Shuffling suggestions
We propose to relax the use of the top-k relevant suggestions during training by training the RAT model with k fuzzy-matches randomly sampled from a larger list. In our experiments, we sample k from the top-10 matches, where top-10 is chosen based on our preliminary experiments.
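A minimal sketch of this sampling step is shown below; the function name and the choice of uniform sampling without replacement are assumptions made for the example, with the pool size of 10 matching the setting above.

```python
import random

def sample_training_suggestions(ranked_matches: list[str],
                                k: int = 3,
                                pool_size: int = 10) -> list[str]:
    """At training time, draw k fuzzy-matches at random from the top
    `pool_size` retrieved matches instead of always using the top k.
    (Inference would still use the plain top-k retrieved matches.)"""
    pool = ranked_matches[:pool_size]
    return random.sample(pool, min(k, len(pool)))

# Example: 3 suggestions sampled from the 10 highest-ranked matches
# of a hypothetical ranked retrieval list.
top_matches = [f"match_{i}" for i in range(1, 21)]
print(sample_training_suggestions(top_matches, k=3, pool_size=10))
```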
By shuffling the retrieved fuzzy-matches, we ensure that suggestions are less similar to the target reference. With that, we expect the model to learn to be more selective in using the suggestions for translation and thus to be more robust to less relevant suggestions at test time. In fact, training models with noisy data has been shown to improve a model's robustness to irrelevant data (Belinkov and Bisk, 2018).
4 Data, Models & Experiments
4.1 Data
We conduct experiments in two language directions: En-De with five domain-specialized TMs[2] and En-Fr with seven domain-specialized TMs.[3]
[1] To enable fast retrieval, we leverage the implementation provided by the ElasticSearch library, available at https://github.com/elastic/elasticsearch-py.
[2] Medical, Law, IT, Religion and Subtitles.
[3] News, Medical, Bank, Law, IT, TED and Religion.