Improving Robustness of Retrieval Augmented Translation via Shuffling of Suggestions
Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
AWS AI Labs
pramathu@amazon.com
Abstract
Several recent studies have reported dramatic performance improvements in neural machine translation (NMT) by augmenting translation at inference time with fuzzy-matches retrieved from a translation memory (TM). However, these studies all operate under the assumption that the TMs available at test time are highly relevant to the test set. We demonstrate that for existing retrieval augmented translation methods, using a TM with a domain mismatch to the test set can result in substantially worse performance compared to not using a TM at all. We propose a simple method that exposes fuzzy-match NMT systems to less relevant suggestions during training and show that it results in a system that is much more tolerant (regaining up to 5.8 BLEU) to inference with domain-mismatched TMs. Moreover, the model remains competitive with the baseline when fed suggestions from relevant TMs.
1 Introduction
Retrieval Augmented Translation (RAT) refers to a paradigm that combines a translation model (Vaswani et al., 2017) with an external retriever module (Li et al., 2022). The retrieval module (e.g., a BM25 ranker (Robertson and Zaragoza, 2009) or a neural retriever (Cai et al., 2021; Sachan et al., 2021)) takes each source sentence as input and retrieves the top-k most similar target translations from a Translation Memory (TM) (Farajian et al., 2017; Gu et al., 2017; Bulte and Tezcan, 2019). The translation module then encodes the input along with the top-k fuzzy-matches, either by appending the suggestions to the input (Bulte and Tezcan, 2019; Xu et al., 2020) or by using separate encoders for the input and the suggestions (He et al., 2021; Cai et al., 2021). The decoder then learns either to copy tokens from the suggestions or to carry over their style while generating the translation. In this work, we focus only on the translation module of this paradigm.
Work done while the authors were at AWS AI Labs.
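As a simple illustration of the "append suggestions to the input" variant, the sketch below shows one way an encoder input could be assembled from a source sentence and its retrieved fuzzy-matches; the separator token and function name are assumptions made for illustration, not the exact format used by the cited systems.

```python
# Minimal sketch (not the cited systems' exact format) of appending
# retrieved target-side fuzzy-matches to the source sentence so that a
# standard encoder-decoder NMT model sees both in one input sequence.
# The "<sep>" separator token is an illustrative assumption.
def build_rat_input(source: str, fuzzy_matches: list[str], sep: str = "<sep>") -> str:
    """Concatenate the source sentence with its retrieved fuzzy-matches."""
    return " ".join([source] + [f"{sep} {match}" for match in fuzzy_matches])

# Example: an En-De source sentence with two retrieved suggestions.
print(build_rat_input(
    "Open the configuration file.",
    ["Öffnen Sie die Konfigurationsdatei.",
     "Öffnen Sie die Datei erneut."],
))
```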
In the existing literature, inference with RAT models has typically assumed that TMs are domain-matched, i.e., the test set is from the same domain as the translation memory. Many works (e.g., Bulte and Tezcan (2019), Xu et al. (2020) and Cai et al. (2021)) have reported dramatic performance improvements under this setting. However, it is not clear how the models perform when there is a domain mismatch between the TM and the test set. In this work, we focus on the setting where the assumption of a TM being domain-matched with the test set does not hold. We explore the conditions where models are provided suggestions from a TM that is not from the same domain as the test set. We show that RAT models suffer a performance drop when fed suggestions coming from less relevant TMs.
This finding is especially important from a usability standpoint. A translator will often pick the best-fitting available TM for a translation job when an ideal TM does not exist or has not been created yet, e.g., an IT-domain TM is picked for a Patent translation job because the domains are close and no Patent TM is available. This can lead to issues such as ambiguous terminology or multiple meanings for the same context (Jalili Sabet et al., 2016; Barbu et al., 2016). A RAT model leveraging such a (mismatched) TM ends up producing worse-quality translations than a standard MT system. Therefore, it is desirable that RAT models not only improve translation given suggestions from relevant TMs, but also be more robust to suggestions from less relevant TMs.
To this end, we propose an enhancement to the training of RAT models with a simple shuffling method to mitigate this problem. Instead of always using the k most relevant fuzzy-matches in training, our method randomly samples k from a larger list (e.g., randomly sampling 3 sentences from the top-10 matches). Our hypothesis is that if we systematically provide only the most similar suggestions during training, the model will overly rely on the suggestions and simply copy the tokens in them. By shuffling the retrieved results, we ensure that suggestions are less similar to the input and train the system to be more robust to less relevant suggestions at test time. Our experimental results show that the model trained with shuffling of suggestions outperforms the standard RAT model by up to +5.8 BLEU on average when suggestions come from less relevant TMs, while dropping only 0.15 BLEU on average when suggestions come from relevant TMs.
To the best of our knowledge, this is the first work to consider the robustness of RAT methods, which we believe is critical for acceptance by human translators.
2 Related Work
RAT is a form of domain adaptation, which is often achieved via continued training in NMT (Freitag and Al-Onaizan, 2016; Luong and Manning, 2015). However, RAT differs from standard domain adaptation techniques like continued training in that it is online; that is, the model is not adapted during training and instead domain adaptation occurs at inference time. This makes RAT better suited for some real-world applications, e.g., a single server with a single model loaded in memory can serve hundreds or thousands of users with custom translations adapted to their unique TMs. Other works have considered online adaptation outside the context of RAT, including Vilar (2018), who proposes Learning Hidden Unit Contributions (Swietojanski et al., 2016) as a compact way to store many adaptations of the same general-domain model.
Previous works in retrieval augmented translation have mainly explored filtering fuzzy-matches by applying similarity thresholds (Xia et al., 2019; Xu et al., 2020), leveraging word alignment information (Zhang et al., 2018; Xu et al., 2020; He et al., 2021), or re-ranking with additional scores (e.g., word overlap) (Gu et al., 2018; Zhang et al., 2018). Our approach does not make use of any filtering and as such does not require any ad-hoc optimization. Our work is related to the use of k-nearest-neighbor retrieval for NMT (Khandelwal et al., 2021; Zheng et al., 2021), but it is less expensive and does not require storage of and search over a large datastore of context representations and corresponding target tokens (Meng et al., 2021).
Our work also relates to work in offline adaptation, which has addressed catastrophic forgetting of general-domain knowledge during domain adaptation (Thompson et al., 2019) via ensembling in-domain and out-of-domain models (Freitag and Al-Onaizan, 2016), mixing of in-domain and out-of-domain data during adaptation (Chu et al., 2017), multi-objective and multi-output learning (Dakwale and Monz, 2017), elastic weight consolidation (Kirkpatrick et al., 2017; Thompson et al., 2019; Saunders et al., 2019), or combinations of these techniques (Hasler et al., 2021).
3 RAT with Shuffling
3.1 Retrieval Module
We use Okapi BM25 (Robertson and Zaragoza, 2009), a classical retrieval algorithm that performs search by computing lexical matches of the query against all sentences in the evidence, to obtain top-ranked sentences for each input.[1] Specifically, we build an index over the source sentences of a TM. For every input, we collect the top-k (i.e., k ∈ {1, 2, 3, 4, 5} in our experiments) most similar source-side sentences and then use their corresponding target-side sentences as fuzzy-matches.
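For illustration, the following minimal sketch shows how this retrieval step could be implemented, using the rank_bm25 package as a stand-in for the Elasticsearch BM25 index used in our implementation; the toy TM, whitespace tokenization, and function name are simplifying assumptions.

```python
# Minimal sketch of the retrieval module: index the source side of a TM
# with BM25 and return the target sides of the best-matching entries.
# rank_bm25 is used here only as a stand-in for the Elasticsearch index.
from rank_bm25 import BM25Okapi

# A toy translation memory: parallel source/target sentences.
tm_source = ["the cat sat on the mat",
             "the patient received the treatment",
             "open the configuration file"]
tm_target = ["die Katze sass auf der Matte",
             "der Patient erhielt die Behandlung",
             "öffnen Sie die Konfigurationsdatei"]

# Build the BM25 index over whitespace-tokenized source sentences.
bm25 = BM25Okapi([s.split() for s in tm_source])

def retrieve_fuzzy_matches(query: str, k: int = 3) -> list[str]:
    """Return the target sides of the k TM entries whose source side
    scores highest against the query under BM25."""
    scores = bm25.get_scores(query.split())
    top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [tm_target[i] for i in top_idx]

print(retrieve_fuzzy_matches("the cat is on the mat", k=2))
```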
3.2 Shuffling suggestions
We propose to relax the use of the top-k relevant suggestions during training by training the RAT model with k fuzzy-matches randomly sampled from a larger list. In our experiments, we sample k from the top-10 matches, where top-10 is chosen based on our preliminary experiments.
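A minimal sketch of this sampling step is shown below; the function name and the choice of uniform sampling without replacement are assumptions made for the example, with the pool size of 10 matching the setting above.

```python
import random

def sample_training_suggestions(ranked_matches: list[str],
                                k: int = 3,
                                pool_size: int = 10) -> list[str]:
    """At training time, draw k fuzzy-matches at random from the top
    `pool_size` retrieved matches instead of always using the top k.
    (Inference would still use the plain top-k retrieved matches.)"""
    pool = ranked_matches[:pool_size]
    return random.sample(pool, min(k, len(pool)))

# Example: 3 suggestions sampled from the 10 highest-ranked matches
# of a hypothetical ranked retrieval list.
top_matches = [f"match_{i}" for i in range(1, 21)]
print(sample_training_suggestions(top_matches, k=3, pool_size=10))
```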
By shuffling the retrieved fuzzy-matches, we ensure that suggestions are less similar to the target reference. With that, we expect the model to learn to be more selective in using the suggestions for translation and thus to be more robust to less relevant suggestions at test time. In fact, training models with noisy data has been shown to improve a model's robustness to irrelevant data (Belinkov and Bisk, 2018).
4 Data, Models & Experiments
4.1 Data
We conduct experiments in two language directions: En-De with five domain-specialized TMs[2] and En-Fr with seven domain-specialized TMs.[3]
[1] To enable fast retrieval, we leverage the implementation provided by the ElasticSearch library, available at https://github.com/elastic/elasticsearch-py.
[2] Medical, Law, IT, Religion and Subtitles.
[3] News, Medical, Bank, Law, IT, TED and Religion.