Improving Retrieval Augmented Neural Machine Translation by
Controlling Source and Fuzzy-Match Interactions
Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
AWS AI Labs
pramathu@amazon.com
Work done while the authors were at AWS AI Labs.
Abstract
We explore zero-shot adaptation, where a general-domain model has access to customer- or domain-specific parallel data at inference time, but not during training. We build on the idea of Retrieval Augmented Translation (RAT), where the top-k in-domain fuzzy matches are found for the source sentence, and target-language translations of those fuzzy-matched sentences are provided to the translation model at inference time. We propose a novel architecture to control interactions between a source sentence and the top-k fuzzy target-language matches, and compare it to architectures from prior work. We conduct experiments in two language pairs (En-De and En-Fr) by training models on WMT data and testing them with five and seven multi-domain datasets, respectively. Our approach consistently outperforms the alternative architectures, improving BLEU across language pair, domain, and number k of fuzzy matches.
1 Introduction
Domain adaptation techniques such as fine-tuning (Freitag and Al-Onaizan, 2016; Luong and Manning, 2015) are highly effective at increasing in-domain performance of neural machine translation (NMT) systems, but are impractical in many realistic settings. For example, consider a single machine serving translations to thousands of customers, each with a private Translation Memory (TM). In this case, adapting, storing, and loading large adapted models for each customer is computationally infeasible. In this paper we thus consider zero-shot adaptation instead, with a single general-domain model trained from heterogeneous sources that has access to the customer- or domain-specific TM only at inference time.
Our work builds on Retrieval Augmented Translation (RAT) (Li et al., 2022; Bulte and Tezcan, 2019; Xu et al., 2020; He et al., 2021; Cai et al., 2021), a paradigm which combines a translation model (Vaswani et al., 2017) with an external retriever module that retrieves the top-k most similar source sentences from a TM (i.e., "fuzzy matches") (Farajian et al., 2017; Gu et al., 2017; Bulte and Tezcan, 2019). The encoder encodes the input sentence along with the translations of the top-k fuzzy matches and passes the resulting encodings to the decoder.
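For illustration, a minimal Python sketch of the retrieval step follows: score each TM entry's source side against the input sentence and return the target sides of the top-k matches. The character-level similarity from difflib and the toy TM are illustrative assumptions on our part, not the retriever used in the experiments.

```python
from difflib import SequenceMatcher

def retrieve_fuzzy_matches(source, tm, k=3):
    """Return the target sides of the k TM entries whose source side
    is most similar to the input sentence (a simple fuzzy-match score)."""
    scored = [
        (SequenceMatcher(None, source, tm_src).ratio(), tm_tgt)
        for tm_src, tm_tgt in tm
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tgt for _, tgt in scored[:k]]

# Toy translation memory of (source, target) pairs.
tm = [
    ("the cat sat on the mat", "die Katze sass auf der Matte"),
    ("the dog sat on the rug", "der Hund sass auf dem Teppich"),
    ("good morning", "guten Morgen"),
]
print(retrieve_fuzzy_matches("the cat sat on the rug", tm, k=2))
```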
Prior RAT methods for NMT have fallen into two camps. Early work (Bulte and Tezcan, 2019; Zhang et al., 2018) concatenated the source sentence and the top-k fuzzy matches before encoding, relying on the encoder's self-attention to compare the source sentence to each target sentence and determine which target phrases are relevant for the translation. More recent work (He et al., 2021; Cai et al., 2021) has opted to encode the source sentence and the top-k fuzzy matches independently, effectively shifting the entire burden of determining which target phrases are relevant to the decoder.

We hypothesize that neither approach is ideal: in the first, the encoder has access to the information that we expect to be important (namely, the source and the fuzzy matches), but the self-attention also has potentially confusing/spurious connections. In the second, the encoder lacks the self-attention connections between the source and the fuzzy matches.

To address these issues, we propose a novel architecture which has self-attention connections between the source sentence and each fuzzy match, but not between fuzzy matches. We denote this method RAT with Selective Interactions (RAT-SI). Our method is illustrated in Figure 1, along with the two previously discussed approaches.
Experiments on five English-German (En-De) domain-specific test sets (Aharoni and Goldberg, 2020) and seven English-French (En-Fr) domain-specific test sets (Pham et al., 2021), for k = {3, 4, 5}, demonstrate that our proposed method outperforms both prior approaches in 32 out of the 36 cases considered. The proposed method outperforms the closest competitor by +0.82 to +1.75 BLEU for En-De and by +1.57 to +1.93 BLEU for En-Fr.

Figure 1: Architectures for retrieval augmented NMT. Left: plain transformer ingesting the source and retrieved fuzzy matches concatenated with a separator symbol (Bulte and Tezcan, 2019), denoted herein as RAT-CAT. Center: transformer with a dual encoder, one for encoding the source and one for encoding each retrieved fuzzy match, inspired by He et al. (2021), denoted herein as RAT-SEP. Right: transformer separately encoding the source and each source + fuzzy-match pair (this work), denoted herein as RAT-SI.
2 Method
To isolate the effects of the underlying modeling strategy from the various tricks and implementation details employed in prior papers, we build baseline models which distill the two primary modeling strategies used in prior works.

The first concatenates a source sentence with target-language fuzzy matches and then encodes the entire sequence, as in Bulte and Tezcan (2019) and Xu et al. (2020). In this approach, the self-attention of the encoder must learn to find the relevant parts of the target-language fuzzy matches by comparing each fuzzy match to the source sentence, while ignoring potentially spurious fuzzy-match to fuzzy-match interactions (see the left diagram in Figure 1). We denote this method RAT-CAT.
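A minimal PyTorch sketch of the RAT-CAT input construction and encoder pass follows; the toy vocabulary, separator token id, and model sizes are our assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

PAD, SEP = 0, 1  # hypothetical padding and separator token ids

class RatCatEncoder(nn.Module):
    """RAT-CAT sketch: a single encoder self-attends over the concatenation
    [source, <sep>, fuzzy_1, <sep>, ..., fuzzy_k], so every segment can
    attend to every other segment, including fuzzy-match to fuzzy-match."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, source_ids, fuzzy_ids_list):
        batch = source_ids.size(0)
        parts = [source_ids]
        for fuzzy in fuzzy_ids_list:
            parts.append(torch.full((batch, 1), SEP, dtype=torch.long))
            parts.append(fuzzy)
        tokens = torch.cat(parts, dim=1)          # (batch, total_len)
        return self.encoder(self.embed(tokens))   # (batch, total_len, d_model)

src = torch.randint(2, 1000, (2, 7))                          # toy source batch
fuzzies = [torch.randint(2, 1000, (2, 6)) for _ in range(3)]  # k = 3 matches
print(RatCatEncoder()(src, fuzzies).shape)                    # torch.Size([2, 28, 64])
```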
The second encodes the source and each target-language fuzzy match separately (with two distinct encoders) and instead concatenates the encoded representations, inspired by He et al. (2021) and Cai et al. (2021). In this approach, the spurious connections between the target-language fuzzy matches are eliminated, but the connections between the source and each fuzzy match are also eliminated, forcing the attention in the decoder to find the portions of each fuzzy match that are relevant to the source (see the center diagram in Figure 1). We denote this method RAT-SEP.
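In the same toy setup, a RAT-SEP sketch might look as follows; the use of two independently parameterized encoder stacks is our reading of "two distinct encoders", and the dimensions are again illustrative.

```python
import torch
import torch.nn as nn

PAD = 0  # hypothetical padding token id

class RatSepEncoder(nn.Module):
    """RAT-SEP sketch: the source and each fuzzy match are encoded
    independently with two distinct encoders; only the encoded
    representations are concatenated, so no encoder-side attention ever
    connects the source to any fuzzy match."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)

        def make_layer():
            return nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

        self.src_encoder = nn.TransformerEncoder(make_layer(), num_layers=2)
        self.fuzzy_encoder = nn.TransformerEncoder(make_layer(), num_layers=2)

    def forward(self, source_ids, fuzzy_ids_list):
        segments = [self.src_encoder(self.embed(source_ids))]
        segments += [self.fuzzy_encoder(self.embed(f)) for f in fuzzy_ids_list]
        # The decoder cross-attends over this concatenation and must decide
        # on its own which fuzzy-match phrases are relevant to the source.
        return torch.cat(segments, dim=1)          # (batch, total_len, d_model)

src = torch.randint(1, 1000, (2, 7))
fuzzies = [torch.randint(1, 1000, (2, 6)) for _ in range(3)]
print(RatSepEncoder()(src, fuzzies).shape)         # torch.Size([2, 25, 64])
```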
Finally, we propose a third method which attempts to build on the strengths of each of the prior methods. As in RAT-SEP, our method separately encodes (with the same encoder) the source and each target-language fuzzy match; however, each fuzzy match is jointly encoded with a copy of the source, as in RAT-CAT, allowing the encoder to find portions of the fuzzy match which are relevant to the input. All the encoded inputs are then concatenated and exposed to the decoder; however, the encoding of the source is provided to the decoder only once, to avoid potentially spurious interactions between copies of the input (see the right diagram in Figure 1). We denote our proposed method RAT-SI.
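The sketch below is one plausible reading of RAT-SI in the same toy PyTorch setup. In particular, keeping the standalone source encoding and slicing the duplicated source positions out of each pair encoding is our assumption about how "the source is provided to the decoder only once"; the paper's exact construction may differ.

```python
import torch
import torch.nn as nn

PAD, SEP = 0, 1  # hypothetical padding and separator token ids

class RatSiEncoder(nn.Module):
    """RAT-SI sketch: one shared encoder encodes (a) the source alone and
    (b) each [source, <sep>, fuzzy_i] pair, so the source can attend to each
    fuzzy match but fuzzy matches never attend to one another. The decoder
    sees the source encoding once, plus only the <sep> + fuzzy positions of
    each pair encoding (the duplicated source positions are sliced off)."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def encode(self, token_ids):
        return self.encoder(self.embed(token_ids))

    def forward(self, source_ids, fuzzy_ids_list):
        src_len = source_ids.size(1)
        segments = [self.encode(source_ids)]       # source encoding, kept once
        sep = torch.full((source_ids.size(0), 1), SEP, dtype=torch.long)
        for fuzzy in fuzzy_ids_list:
            pair = torch.cat([source_ids, sep, fuzzy], dim=1)
            # Drop the source positions; keep only <sep> + fuzzy positions.
            segments.append(self.encode(pair)[:, src_len:, :])
        return torch.cat(segments, dim=1)          # (batch, total_len, d_model)

src = torch.randint(2, 1000, (2, 7))
fuzzies = [torch.randint(2, 1000, (2, 6)) for _ in range(3)]
print(RatSiEncoder()(src, fuzzies).shape)          # torch.Size([2, 28, 64])
```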
3 Experimental Setup
Our experiments are in two language directions: English-German (En-De) and English-French (En-Fr). We train models using the public WMT 2014 (Bojar et al., 2014) data set, with 4.5M En-De sentences and 36M En-Fr sentences.
During training, the model sees target-language fuzzy-match sentences from the same dataset it is being trained on (i.e., WMT14), but at inference, models must perform zero-shot adaptation to five En-De domain-specialized TMs (Medical, Law, IT, Religion, and Subtitles) and seven En-Fr domain-specialized TMs (News, Medical, Bank, Law, IT, TED, and Religion). The En-De data is taken from Aharoni and Goldberg (2020), which is a re-split version of the multi-domain data set from Koehn and Knowles (2017).