Improving Retrieval Augmented Neural Machine Translation by
Controlling Source and Fuzzy-Match Interactions
Cuong Hoang, Devendra Sachan, Prashant Mathur, Brian Thompson, Marcello Federico
AWS AI Labs
pramathu@amazon.com
Work done while the authors were at AWS AI Labs.
Abstract
We explore zero-shot adaptation, where a general-domain model has access to customer- or domain-specific parallel data at inference time, but not during training. We build on the idea of Retrieval Augmented Translation (RAT), where the top-k in-domain fuzzy matches are found for the source sentence, and target-language translations of those fuzzy-matched sentences are provided to the translation model at inference time. We propose a novel architecture to control interactions between a source sentence and the top-k fuzzy target-language matches, and compare it to architectures from prior work. We conduct experiments in two language pairs (En-De and En-Fr) by training models on WMT data and testing them with five and seven multi-domain datasets, respectively. Our approach consistently outperforms the alternative architectures, improving BLEU across language pair, domain, and number k of fuzzy matches.
1 Introduction
Domain adaptation techniques such as fine-tuning (Freitag and Al-Onaizan, 2016; Luong and Manning, 2015) are highly effective at increasing in-domain performance of neural machine translation (NMT) systems, but are impractical in many realistic settings. For example, consider a single machine serving translations to thousands of customers, each with a private Translation Memory (TM). In this case, adapting, storing, and loading large adapted models for each customer is computationally infeasible. In this paper we thus consider zero-shot adaptation instead, with a single general-domain model trained from heterogeneous sources that has access to the customer- or domain-specific TM only at inference time.
Our work builds on Retrieval Augmented Translation (RAT) (Li et al., 2022; Bulte and Tezcan, 2019; Xu et al., 2020; He et al., 2021; Cai et al., 2021), a paradigm which combines a translation model (Vaswani et al., 2017) with an external retriever module that retrieves the top-k most similar source sentences from a TM (i.e., "fuzzy matches") (Farajian et al., 2017; Gu et al., 2017; Bulte and Tezcan, 2019). The encoder encodes the input sentence along with the translations of the top-k fuzzy matches and passes the resulting encodings to the decoder.
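For illustration, a minimal Python sketch of the retrieval step follows: score each TM entry's source side against the input sentence and return the target sides of the top-k matches. The character-level similarity from difflib and the toy TM are illustrative assumptions on our part, not the retriever used in the experiments.

```python
from difflib import SequenceMatcher

def retrieve_fuzzy_matches(source, tm, k=3):
    """Return the target sides of the k TM entries whose source side
    is most similar to the input sentence (a simple fuzzy-match score)."""
    scored = [
        (SequenceMatcher(None, source, tm_src).ratio(), tm_tgt)
        for tm_src, tm_tgt in tm
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tgt for _, tgt in scored[:k]]

# Toy translation memory of (source, target) pairs.
tm = [
    ("the cat sat on the mat", "die Katze sass auf der Matte"),
    ("the dog sat on the rug", "der Hund sass auf dem Teppich"),
    ("good morning", "guten Morgen"),
]
print(retrieve_fuzzy_matches("the cat sat on the rug", tm, k=2))
```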
Prior RAT methods for NMT have fallen into two camps. Early work (Bulte and Tezcan, 2019; Zhang et al., 2018) concatenated the source sentence and the top-k fuzzy matches before encoding, relying on the encoder's self-attention to compare the source sentence to each target sentence and determine which target phrases are relevant for the translation. More recent work (He et al., 2021; Cai et al., 2021) has opted to encode the source sentence and the top-k fuzzy matches independently, effectively shifting the entire burden of determining which target phrases are relevant to the decoder.

We hypothesize that neither approach is ideal: in the first, the encoder has access to the information that we expect to be important (namely, the source and the fuzzy matches), but the self-attention also has potentially confusing/spurious connections. In the second, the encoder lacks the self-attention connections between the source and the fuzzy matches.

To address these issues, we propose a novel architecture which has self-attention connections between the source sentence and each fuzzy match, but not between fuzzy matches. We denote this method RAT with Selective Interactions (RAT-SI). Our method is illustrated in Figure 1, along with the two previously discussed approaches.
Experiments on five English-German (En-De) domain-specific test sets (Aharoni and Goldberg, 2020) and seven English-French (En-Fr) domain-specific test sets (Pham et al., 2021), for k = {3, 4, 5}, demonstrate that our proposed method outperforms both prior approaches in 32 out of the 36 cases considered. The proposed method outperforms the closest competitor by +0.82 to +1.75 BLEU for En-De and by +1.57 to +1.93 BLEU for En-Fr.

Figure 1: Architectures for retrieval augmented NMT. Left: plain transformer ingesting the source and retrieved fuzzy matches concatenated with a separator symbol (Bulte and Tezcan, 2019), denoted herein as RAT-CAT. Center: transformer with a dual encoder, one for encoding the source and one for encoding each retrieved fuzzy match, inspired by He et al. (2021), denoted herein as RAT-SEP. Right: transformer separately encoding the source and each source + fuzzy-match pair (this work), denoted herein as RAT-SI.
2 Method
To isolate the effects of the underlying modeling strategy from the various tricks and implementation details employed in prior papers, we build baseline models which distill the two primary modeling strategies used in prior works.

The first concatenates a source sentence with target-language fuzzy matches and then encodes the entire sequence, as in Bulte and Tezcan (2019) and Xu et al. (2020). In this approach, the self-attention of the encoder must learn to find the relevant parts of the target-language fuzzy matches by comparing each fuzzy match to the source sentence, while ignoring potentially spurious fuzzy-match to fuzzy-match interactions (see the left diagram in Figure 1). We denote this method RAT-CAT.
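A minimal PyTorch sketch of the RAT-CAT input construction and encoder pass follows; the toy vocabulary, separator token id, and model sizes are our assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

PAD, SEP = 0, 1  # hypothetical padding and separator token ids

class RatCatEncoder(nn.Module):
    """RAT-CAT sketch: a single encoder self-attends over the concatenation
    [source, <sep>, fuzzy_1, <sep>, ..., fuzzy_k], so every segment can
    attend to every other segment, including fuzzy-match to fuzzy-match."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, source_ids, fuzzy_ids_list):
        batch = source_ids.size(0)
        parts = [source_ids]
        for fuzzy in fuzzy_ids_list:
            parts.append(torch.full((batch, 1), SEP, dtype=torch.long))
            parts.append(fuzzy)
        tokens = torch.cat(parts, dim=1)          # (batch, total_len)
        return self.encoder(self.embed(tokens))   # (batch, total_len, d_model)

src = torch.randint(2, 1000, (2, 7))                          # toy source batch
fuzzies = [torch.randint(2, 1000, (2, 6)) for _ in range(3)]  # k = 3 matches
print(RatCatEncoder()(src, fuzzies).shape)                    # torch.Size([2, 28, 64])
```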
The second encodes the source and each target-language fuzzy match separately (with two distinct encoders) and instead concatenates the encoded representations, inspired by He et al. (2021) and Cai et al. (2021). In this approach, the spurious connections between the target-language fuzzy matches are eliminated, but the connections between the source and each fuzzy match are also eliminated, forcing the attention in the decoder to find the portions of each fuzzy match that are relevant to the source (see the center diagram in Figure 1). We denote this method RAT-SEP.
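In the same toy setup, a RAT-SEP sketch might look as follows; the use of two independently parameterized encoder stacks is our reading of "two distinct encoders", and the dimensions are again illustrative.

```python
import torch
import torch.nn as nn

PAD = 0  # hypothetical padding token id

class RatSepEncoder(nn.Module):
    """RAT-SEP sketch: the source and each fuzzy match are encoded
    independently with two distinct encoders; only the encoded
    representations are concatenated, so no encoder-side attention ever
    connects the source to any fuzzy match."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)

        def make_layer():
            return nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

        self.src_encoder = nn.TransformerEncoder(make_layer(), num_layers=2)
        self.fuzzy_encoder = nn.TransformerEncoder(make_layer(), num_layers=2)

    def forward(self, source_ids, fuzzy_ids_list):
        segments = [self.src_encoder(self.embed(source_ids))]
        segments += [self.fuzzy_encoder(self.embed(f)) for f in fuzzy_ids_list]
        # The decoder cross-attends over this concatenation and must decide
        # on its own which fuzzy-match phrases are relevant to the source.
        return torch.cat(segments, dim=1)          # (batch, total_len, d_model)

src = torch.randint(1, 1000, (2, 7))
fuzzies = [torch.randint(1, 1000, (2, 6)) for _ in range(3)]
print(RatSepEncoder()(src, fuzzies).shape)         # torch.Size([2, 25, 64])
```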
Finally, we propose a third method which attempts to build on the strengths of each of the prior methods. As in RAT-SEP, our method separately encodes (with the same encoder) the source and each target-language fuzzy match; however, each fuzzy match is jointly encoded with a copy of the source, as in RAT-CAT, allowing the encoder to find portions of the fuzzy match which are relevant to the input. All the encoded inputs are then concatenated and exposed to the decoder; however, the encoding of the source is provided to the decoder only once, to avoid potentially spurious interactions between copies of the input (see the right diagram in Figure 1). We denote our proposed method RAT-SI.
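The sketch below is one plausible reading of RAT-SI in the same toy PyTorch setup. In particular, keeping the standalone source encoding and slicing the duplicated source positions out of each pair encoding is our assumption about how "the source is provided to the decoder only once"; the paper's exact construction may differ.

```python
import torch
import torch.nn as nn

PAD, SEP = 0, 1  # hypothetical padding and separator token ids

class RatSiEncoder(nn.Module):
    """RAT-SI sketch: one shared encoder encodes (a) the source alone and
    (b) each [source, <sep>, fuzzy_i] pair, so the source can attend to each
    fuzzy match but fuzzy matches never attend to one another. The decoder
    sees the source encoding once, plus only the <sep> + fuzzy positions of
    each pair encoding (the duplicated source positions are sliced off)."""

    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def encode(self, token_ids):
        return self.encoder(self.embed(token_ids))

    def forward(self, source_ids, fuzzy_ids_list):
        src_len = source_ids.size(1)
        segments = [self.encode(source_ids)]       # source encoding, kept once
        sep = torch.full((source_ids.size(0), 1), SEP, dtype=torch.long)
        for fuzzy in fuzzy_ids_list:
            pair = torch.cat([source_ids, sep, fuzzy], dim=1)
            # Drop the source positions; keep only <sep> + fuzzy positions.
            segments.append(self.encode(pair)[:, src_len:, :])
        return torch.cat(segments, dim=1)          # (batch, total_len, d_model)

src = torch.randint(2, 1000, (2, 7))
fuzzies = [torch.randint(2, 1000, (2, 6)) for _ in range(3)]
print(RatSiEncoder()(src, fuzzies).shape)          # torch.Size([2, 28, 64])
```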
3 Experimental Setup
Our experiments are in two language directions: English-German (En-De) and English-French (En-Fr). We train models using the public WMT 2014 (Bojar et al., 2014) data set, with 4.5M En-De sentences and 36M En-Fr sentences.
During training, the model sees target-language fuzzy-match sentences from the same dataset it is being trained on (i.e., WMT14), but at inference, models must perform zero-shot adaptation to five En-De domain-specialized TMs (Medical, Law, IT, Religion, and Subtitles) and seven En-Fr domain-specialized TMs (News, Medical, Bank, Law, IT, TED, and Religion). The En-De data is taken from Aharoni and Goldberg (2020), which is a re-split version of the multi-domain data set from Koehn and Knowles (2017).