Can Domains Be Transferred Across Languages in Multi-Domain
Multilingual Neural Machine Translation?
Thuy-Trang Vu and Shahram Khadivi
Xuanli He and Dinh Phung and Gholamreza Haffari
Department of Data Science and AI, Monash University, Australia
eBay Inc.
{trang.vu1,xuanli.he1,first.last}@monash.edu
skhadivi@ebay.com
Abstract
Previous works mostly focus on either multi-
lingual or multi-domain aspects of neural ma-
chine translation (NMT). This paper investi-
gates whether the domain information can be
transferred across languages on the composi-
tion of multi-domain and multilingual NMT,
particularly for the incomplete data condition
where in-domain bitext is missing for some
language pairs. Our results in the curated leave-one-domain-out experiments show that multi-domain multilingual (MDML) NMT can boost zero-shot translation performance by up to +10 BLEU, as well as aid the generalisation of multi-domain NMT to the missing domain. We also explore strategies for effective integration of multilingual and multi-domain NMT, including language and domain tag combination and auxiliary task training. We find that learning domain-aware representations and adding target-language tags to the encoder leads to effective MDML-NMT.
1 Introduction
Multilingual NMT (MNMT), which enables a sin-
gle model to support translation across multiple
directions, has attracted a lot of interest both in
the research community and industry. The gap be-
tween MNMT and bilingual counterparts has been
reduced significantly, and even for some settings,
it has been shown to surpass bilingual NMT (Tran
et al.,2021). MNMT enables knowledge sharing
among languages, and reduces model training, de-
ployment, and maintenance costs. On the other
hand, multi-domain NMT aims to build robust
NMT models, providing high-quality translation
on diverse domains. While multilingual and multi-
domain NMT are highly appealing in practice, they
are often studied separately.
To accommodate the domain aspect, previous
MNMT works focus on learning a domain-specific
Work done while doing an internship at eBay Inc.
Figure 1: An example of the multi-domain multilingual
incomplete data condition (best seen in colours). (a)
The colour indicates the availability of bitext in the cor-
responding domain for each language. (b) Domain and
language-pair matrix for the data condition in (a).
MNMT by finetuning a general NMT model on the
domain of interest (Tran et al.,2021;Bérard et al.,
2020). Recently, Cooper Stickland et al. (2021) pro-
pose to unify multilingual and multi-domain NMT
into a holistic system by stacking language-specific
and domain-specific adapters with a two-phase
training process. Thanks to the plug-and-play abil-
ity of adapters, their system can handle translation
across multiple languages and support multiple do-
mains. However, as each domain adapter is learned
independently, their adapter-based model lacks the ability to share knowledge effectively among domains.
In this paper, we take a step further toward uni-
fying multilingual and multi-domain NMT into a
single setting and model, i.e., multi-domain multi-
lingual NMT (MDML-NMT), and enable effective
knowledge sharing across both domains and lan-
guages. Unlike the complete data assumption in the
multi-domain single language-pair setting where
training data is available in all domains, we assume
the existence of bitext in all domains for only a sub-
set of language-pairs, as illustrated in Figure 1(a).
In fact, it is highly improbable to obtain in-domain
bitext for all domains and all language pairs in
many real-life settings. Depending on the avail-
ability of parallel data, we categorise a translation task from a source to a target language into one of four groups along the following two dimensions:
- in-domain/out-of-domain, with respect to the domain of interest, and
- seen/unseen, with respect to the translation direction during training.
Consider the domain and language-pair matrix in Figure 1(b). In this figure, the parallel data available in the training set defines group A, the in-domain seen tasks. Given this training dataset, most MNMT research focuses on cross-lingual transfer to in-domain unseen translation tasks (A→C), while studies on multi-domain NMT and domain adaptation seek to generalise to out-of-domain seen translation tasks (A→B). Integrating the domain and language aspects under the incomplete data condition gives rise to an interesting and more challenging setting that transfers to out-of-domain unseen translation tasks (A→D). We hypothesise that the out-of-domain "seen and unseen" translation tasks (A→B+D) can benefit from the in-domain translation tasks if domain transfer across languages exists in MDML-NMT.
Specifically, we ask the following research questions: (1) Do out-of-domain translation tasks benefit from the out-of-domain and in-domain bitext in other seen translation pairs? and (2) What is an effective method to handle the composition of domains and languages? Furthermore, beyond cross-lingual transfer (A→C) and out-of-domain generalisation (A→B), we also consider the challenging setting where the translation direction of interest may not have any bitext in any domain, i.e. the zero-shot setting (A→D).
In general, we can vary the degree of domain
transfer based on the number of domains in which
parallel data for a translation task is available. Combined with the number of language pairs of interest, there is a large number of possible incomplete data conditions, even for the toy example in Figure 1.
In this study, we assume the highest degree of do-
main transfer and carefully design controlled ex-
periments where one domain is left out for some
language pairs (Table 1). We then examine the
potential of MDML-NMT on this incomplete data
condition. We also explore training strategies for
effective integration of multi-domain and multi-
lingual NMT, mainly on (i) how to combine the
         LAW   IT   KORAN   MED   SUB
En-Fr     ✓    ✓     ✓       ✓     ✓
En-De     ✓    ✓     ✓       ✓     ✓
De-Fr     ✓    ✓     ✓       ✓     ✓
En-Cs     ✗    ✓     ✓       ✓     ✓
En-Pl     ✗    ✓     ✓       ✓     ✓

Table 1: Illustration of the leave-one-domain-out LAW experiment setting. ✗/✓ indicates whether there is bitext in the corresponding domain for the given language pair.
language and domain tags, and (ii) using auxiliary
task training to learn effective representations. Our
contributions are as follows:
- We investigate effective strategies to jointly learn multi-domain and multilingual NMT models under the incomplete data condition.
- Our empirical results show that the MDML-NMT model can improve translation quality in the zero-shot directions by mitigating the off-target translation issue, where an MNMT model translates the input sentence into a wrong target language. Additionally, MDML-NMT exhibits domain transfer ability by achieving up to +4 BLEU improvement over multi-domain NMT on the translation direction where in-domain training data is absent. Thanks to effective cross-domain and cross-lingual knowledge sharing, MDML-NMT outperforms the adapter-based method (Cooper Stickland et al., 2021) by a large margin in the language-domain zero-shot setting.
- Our study sheds light on effective MDML-NMT training. Our experimental results reveal that: (i) for the domain, it is important to make the encoder domain-aware by either providing the domain tags or training with the auxiliary task; and (ii) for the language, the best practice is to prepend the target-language tag to the encoder.
2 Multi-domain Multilingual NMT
In this section, we first provide the necessary back-
ground on multilingual NMT (MNMT) and multi-
domain NMT individually. We then describe ef-
fective modelling approaches for the integration
of multi-domain and multilingual NMT (MDML-
NMT).
2.1 Multilingual NMT
Given a set of languages $\mathcal{L}$, the primary goal of MNMT is to learn a single NMT model that can handle all translation directions of interest in this set of languages (Dabre et al., 2020). According to the parameter-sharing strategy, MNMT can be categorised into: 1) partial parameter sharing (Dong et al., 2015; Firat et al., 2016; Zhang et al., 2021), and 2) full parameter sharing (Ha et al., 2016; Johnson et al., 2017). The latter has been widely adopted because of its simplicity, light weight, and zero-shot capability. Thus, we adopt the full parameter-sharing strategy in our work.
In the fully parameter-shared MNMT, all pa-
rameters of encoders, decoders and attentions are
shared across tasks. Special language tags are in-
troduced to indicate the target languages. One can
prepend the target language tags to either the source
or target sentences. The model is then trained
jointly to minimise the negative log-likelihood
across all training instances:
\mathcal{L}_{\text{ML}}(\boldsymbol{\theta}) := -\sum_{(s,t)\in\mathcal{T}} \; \sum_{(\boldsymbol{x},\boldsymbol{y})\in C_{s,t}} \log P(\boldsymbol{y} \mid \boldsymbol{x}; \boldsymbol{\theta}) \qquad (1)

where $\boldsymbol{\theta}$ denotes the model parameters, $C_{s,t}$ denotes a bilingual corpus for the source language $s$ and the target language $t$, $(\boldsymbol{x}, \boldsymbol{y})$ is a pair of parallel sentences in the source and target languages, and $\mathcal{T}$ denotes the translation tasks for which we have bitext available.

Among all possible language pairs $(s, t) \in \mathcal{L} \times \mathcal{L}$, we often only have access to bilingual data for a subset of them. We denote these pairs as seen (observed) translation tasks, and the rest as unseen tasks, corresponding to the zero-shot setting.
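As a concrete illustration of the tagging convention above, the minimal sketch below prepends a target-language tag to the source side of each training example; the tag format and helper names are our own assumptions rather than the paper's implementation.

```python
# Minimal sketch (illustrative tag format, not the authors' code): preparing
# training examples for a fully parameter-shared MNMT model by prepending a
# target-language tag to the source sentence.

def make_mnmt_example(src_sent, tgt_sent, tgt_lang):
    """Prepend a target-language tag such as '<2de>' to the source side."""
    return f"<2{tgt_lang}> {src_sent}", tgt_sent

# Bitext exists only for a subset of language pairs (the "seen" tasks);
# the remaining directions are the zero-shot ("unseen") tasks.
seen_tasks = {("en", "de"), ("en", "fr"), ("de", "fr")}

corpus = [("en", "de", "A law was passed.", "Ein Gesetz wurde verabschiedet.")]
examples = [make_mnmt_example(x, y, t)
            for (s, t, x, y) in corpus if (s, t) in seen_tasks]
print(examples[0])
# ('<2de> A law was passed.', 'Ein Gesetz wurde verabschiedet.')
```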
2.2 Multi-domain NMT
Multi-domain NMT aims to handle translation
tasks across multiple domains for a given language
pair. Similar to MNMT, tagging the training corpus
is the most popular approach, where a tag indicates
the domain of a sentence pair. We also minimise
the negative log-likelihood across all domains to
train the model:
\mathcal{L}_{\text{MD}}(\boldsymbol{\theta}) := -\sum_{d\in\mathcal{D}} \; \sum_{(\boldsymbol{x},\boldsymbol{y})\in C^{d}_{s,t}} \log P(\boldsymbol{y} \mid \boldsymbol{x}; \boldsymbol{\theta}) \qquad (2)

where $\mathcal{D}$ is the set of domains, and $C^{d}_{s,t}$ denotes the parallel bitext in the source language $s$, the target language $t$, and the domain $d$.
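The sketch below mirrors Eq. (2): the loss is accumulated over all domains and over the sentence pairs of each domain-specific corpus, with a domain tag prepended on the source side. The `nll` callable and the tag tokens are placeholders we introduce for illustration.

```python
# Minimal sketch of the multi-domain objective in Eq. (2). `nll` stands for
# any per-sentence negative log-likelihood given an NMT model; it is a
# placeholder, not an API from the paper.

def multi_domain_loss(model, corpora_by_domain, nll):
    total = 0.0
    for domain, pairs in corpora_by_domain.items():   # sum over d in D
        for src, tgt in pairs:                        # sum over (x, y) in C^d
            # One possible tagging choice: domain tag on the source side.
            total += nll(model, f"<{domain}> {src}", tgt)
    return total
```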
Apart from tagging, some auxiliary tasks have
also been incorporated into the training process. A
common practice is the use of domain discrimina-
tion, which aims to force the encoder to capture
domain-aware characteristics (Britz et al.,2017).
For this purpose, a domain discriminator is added
to the NMT model at training time. The input to the
discriminator is the encoder output, and its output
predicts the probability of the domain of the source
sentence. The discriminator is jointly trained with
the NMT model, and is discarded at inference time.
Let $\boldsymbol{h} = \mathrm{enc}(\boldsymbol{x})$ be the representation of the sentence $\boldsymbol{x}$, computed by mean-pooling over the hidden states of the top layer of the encoder. The training objective for the domain-aware encoder is as follows:
\mathcal{L}_{\text{disc}}(\boldsymbol{\theta}, \boldsymbol{\psi}) := -\sum_{d\in\mathcal{D}} \; \sum_{(\boldsymbol{x},\boldsymbol{y})\in C^{d}_{s,t}} \log \Pr(d \mid \boldsymbol{h}; \boldsymbol{\psi}) \qquad (3)

\mathcal{L}_{\text{MD-aware}}(\boldsymbol{\theta}, \boldsymbol{\psi}) := \mathcal{L}_{\text{MD}}(\boldsymbol{\theta}) + \lambda \, \mathcal{L}_{\text{disc}}(\boldsymbol{\theta}, \boldsymbol{\psi}) \qquad (4)

where $\boldsymbol{\psi}$ denotes the parameters of the domain discriminator classifier, and $\lambda$ controls the contribution of the domain discriminator to the training objective of the multi-domain NMT model.
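A PyTorch-style sketch of this domain-aware auxiliary component is given below: the sentence representation $\boldsymbol{h}$ is mean-pooled from the top-layer encoder states and passed to a small classifier over domains, whose cross-entropy loss is added to the NMT loss with weight $\lambda$ as in Eq. (4). Module names and dimensions are illustrative assumptions.

```python
import torch.nn as nn

# Minimal sketch (illustrative sizes/names): a domain discriminator over
# mean-pooled encoder states, trained jointly with the NMT model and
# discarded at inference time.

class DomainDiscriminator(nn.Module):
    def __init__(self, d_model=512, num_domains=5):
        super().__init__()
        self.classifier = nn.Linear(d_model, num_domains)

    def forward(self, enc_states, src_mask):
        # enc_states: (batch, src_len, d_model); src_mask: (batch, src_len),
        # with 1 for real tokens and 0 for padding.
        mask = src_mask.unsqueeze(-1).float()
        h = (enc_states * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling
        return self.classifier(h)  # domain logits

# Joint objective, following Eq. (4):
#   loss = nmt_loss + lam * nn.functional.cross_entropy(logits, domain_ids)
```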
Alternatively, one can design an adversarial train-
ing objective in order to learn domain-agnostic rep-
resentations by the encoder. This is achieved by
inserting a gradient reversal layer (Ganin and Lem-
pitsky,2015) between the encoder and the domain
discriminator. The gradient reversal layer behaves
as an identity layer in the forward pass but reverses
the gradient sign during back-propagation. This reversal has the opposite effect on the encoder compared to the domain-aware objective, forcing it to learn domain-agnostic representations and encouraging the domain-specific characteristics to be learned mainly by the decoder of the NMT model.
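For this domain-agnostic variant, a gradient reversal layer can be placed between the encoder output and the discriminator sketched above. The code below is a generic PyTorch implementation of the idea of Ganin and Lempitsky (2015), not the authors' code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The reversed gradient pushes the encoder towards domain-agnostic
        # representations while the discriminator still learns to predict domains.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage with the discriminator sketched above:
#   logits = discriminator(grad_reverse(enc_states), src_mask)
```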
2.3 Composition of Domains and Languages
In this paper, we explore strategies for composing
multi-domain and multilingual NMT. We consider
the incomplete multi-domain multilingual data con-
dition where in-domain data may be only available
in a subset of language pairs. For example, Table 1 shows one of the data conditions explored in our experiments in Section 3. Given the five language pairs and five domains, we assume that the domain data for some language pairs is missing.
Our goal is to investigate effective techniques to
train a high-quality MDML-NMT model covering
all combinations of domains and language pairs.
Given a specific domain, we define in-domain
languages as those having data available in the
domain as part of some bilingual corpora; the rest
Figure 2: Illustration of domain and language composition strategies: (a) prepending the domain (D) and target-language (T) tag to the encoder (ENC) or decoder (DEC); this example shows a T-ENC D-DEC model where the target-language tag and domain tag are added to the encoder and decoder, respectively; (b) combining the tagging method with the domain-aware auxiliary task (MDML + aware) to learn domain-aware representations; and (c) combining the tagging method with the domain-adversarial auxiliary task (MDML + adv) to learn domain-agnostic representations.
Trans. direction   Eval. domain   MDML task type
En→De              LAW            seen, in→in
En→Cs              LAW            seen, in→out
Pl→En              LAW            seen, out→in
De→Cs              LAW            unseen (zero-shot), in→out
Cs→De              LAW            unseen (zero-shot), out→in
Pl→Cs              LAW            unseen (zero-shot), out→out

Table 2: Examples of MDML task types in the leave-one-domain-out LAW training scenario of Table 1. Please refer to Table 1 for the in/out and seen/unseen settings.
of the languages are referred to as out-of-domain
languages. We consider all combinations of in-
domain/out-of-domain source/target languages for
both seen and unseen translation directions (see
examples in Table 2) in Section 3.
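To make this taxonomy concrete, the short sketch below classifies an evaluation triple (source, target, domain) into the task types of Table 2, given the availability matrix of Table 1; the data structure and helper names are our own.

```python
# Minimal sketch: deriving the MDML task type of Table 2 from the
# availability matrix of Table 1. Names and language codes are illustrative.

AVAILABLE = {  # (src, tgt) -> domains with bitext (Table 1)
    ("en", "fr"): {"law", "it", "koran", "med", "sub"},
    ("en", "de"): {"law", "it", "koran", "med", "sub"},
    ("de", "fr"): {"law", "it", "koran", "med", "sub"},
    ("en", "cs"): {"it", "koran", "med", "sub"},  # LAW left out
    ("en", "pl"): {"it", "koran", "med", "sub"},  # LAW left out
}

def in_domain_langs(domain):
    """Languages appearing in some bilingual corpus of this domain."""
    return {lang for (s, t), doms in AVAILABLE.items()
            if domain in doms for lang in (s, t)}

def task_type(src, tgt, domain):
    seen = (src, tgt) in AVAILABLE or (tgt, src) in AVAILABLE
    langs = in_domain_langs(domain)
    src_side = "in" if src in langs else "out"
    tgt_side = "in" if tgt in langs else "out"
    return ("seen" if seen else "unseen (zero-shot)", f"{src_side}->{tgt_side}")

print(task_type("de", "cs", "law"))  # ('unseen (zero-shot)', 'in->out')
```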
We investigate different combinations of the tag-
ging strategy and auxiliary task training to effec-
tively train MDML-NMT models, as shown in Fig-
ure 2.
Language and Domain Tags.
We explore differ-
ent ways of injecting the target language tags and
domain tags into the translation process. Following
the standard convention, we explore inserting the
target language tag at the beginning of either the
source sentence or the translation. Furthermore, the
domain tag can also be added to either the source
or the target side.
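As an example of one such combination, the sketch below builds a training pair in the T-ENC D-DEC configuration of Figure 2(a): the target-language tag is prepended to the encoder input and the domain tag to the decoder input. The tag tokens are illustrative assumptions, not the paper's exact vocabulary.

```python
# Minimal sketch (illustrative tag tokens): composing language and domain
# tags. Defaults correspond to the T-ENC D-DEC variant of Figure 2(a).

def make_mdml_example(src, tgt, tgt_lang, domain,
                      lang_side="enc", domain_side="dec"):
    lang_tag, dom_tag = f"<2{tgt_lang}>", f"<{domain}>"
    enc_in = f"{lang_tag} {src}" if lang_side == "enc" else src
    dec_in = tgt if lang_side == "enc" else f"{lang_tag} {tgt}"
    if domain_side == "enc":
        enc_in = f"{dom_tag} {enc_in}"
    else:
        dec_in = f"{dom_tag} {dec_in}"
    return enc_in, dec_in

print(make_mdml_example("A law was passed.",
                        "Ein Gesetz wurde verabschiedet.", "de", "law"))
# ('<2de> A law was passed.', '<law> Ein Gesetz wurde verabschiedet.')
```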
Auxiliary Task Training.
We investigate the ef-
fect of encoder-based auxiliary tasks on MDML-
NMT. As described in Section 2.2, we consider two types of auxiliary objectives that train the encoder to be either domain-aware or domain-agnostic. The former aims to amplify domain-related features, while the latter encourages domain-invariant representations in the encoder.
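Putting the pieces together, one way to write the overall MDML training objective under the incomplete data condition is sketched below. This formulation is our own notation rather than one stated in the paper: $\mathcal{D}_{s,t}$ denotes the domains for which bitext of the pair $(s,t)$ is available, tags are injected as described above, and the auxiliary term is added with weight $\lambda$ (or attached through gradient reversal in the domain-agnostic variant).

\mathcal{L}_{\text{MDML}}(\boldsymbol{\theta}, \boldsymbol{\psi}) := -\sum_{(s,t)\in\mathcal{T}} \; \sum_{d\in\mathcal{D}_{s,t}} \; \sum_{(\boldsymbol{x},\boldsymbol{y})\in C^{d}_{s,t}} \log P(\boldsymbol{y} \mid \boldsymbol{x}; \boldsymbol{\theta}) \;+\; \lambda \, \mathcal{L}_{\text{disc}}(\boldsymbol{\theta}, \boldsymbol{\psi})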
3 Experiments
In this section, we evaluate the MDML-NMT ap-
proaches and seek to answer the following research
questions (RQs):
RQ1: Do out-of-domain translation tasks benefit from the out-of-domain and in-domain bitext in other translation pairs?
We explore the benefits of having a single
MDML model trained on all available train-
ing data from multiple languages and domains
over the multi-domain bilingual (MDBL) and