2.1 Multilingual NMT
Given a set of languages $L$, the primary goal of MNMT is to learn a single NMT model that can handle all translation directions of interest within this set (Dabre et al., 2020). According to the parameter-sharing strategy, MNMT can be categorised into: 1) partial parameter sharing (Dong et al., 2015; Firat et al., 2016; Zhang et al., 2021), and 2) full parameter sharing (Ha et al., 2016; Johnson et al., 2017). The latter has been widely adopted because of its simplicity, light weight, and zero-shot capability; we therefore adopt the full parameter-sharing strategy in our work.
In fully parameter-shared MNMT, all parameters of the encoder, decoder, and attention modules are shared across tasks. Special language tags are introduced to indicate the target language; these tags can be prepended to either the source or the target sentences. The model is then trained jointly to minimise the negative log-likelihood across all training instances:
$$\mathcal{L}_{\text{ML}}(\boldsymbol{\theta}) := -\sum_{(s,t) \in T} \; \sum_{(\boldsymbol{x},\boldsymbol{y}) \in C_{s,t}} \log P(\boldsymbol{y} \mid \boldsymbol{x}; \boldsymbol{\theta}) \tag{1}$$
where $\boldsymbol{\theta}$ denotes the model parameters, $C_{s,t}$ denotes a bilingual corpus for the source language $s$ and the target language $t$, $(\boldsymbol{x}, \boldsymbol{y})$ is a pair of parallel sentences in the source and target languages, and $T$ denotes the translation tasks for which bitext is available. Among all possible language pairs $(s,t) \in L \times L$, we often only have access to bilingual data for a subset of them. We denote these pairs as seen (observed) translation tasks, and the rest as unseen tasks, corresponding to the zero-shot setting.
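To make the tagging scheme concrete, the following minimal sketch (our own illustration, not the paper's code; the tag format `<2xx>` and the toy `log_prob` scorer are assumptions) pools the seen translation tasks into a single training set by prepending a target-language tag to each source sentence, and computes the objective of Eq. (1) over the pooled instances.

```python
# Minimal sketch (not the authors' code) of target-language tagging for
# fully parameter-shared MNMT. The tag strings like "<2de>" and the toy
# log_prob() scorer are assumptions for illustration only.
from typing import Callable, Dict, List, Tuple

Corpus = List[Tuple[str, str]]  # (source sentence, target sentence)

def build_tagged_instances(corpora: Dict[Tuple[str, str], Corpus]) -> Corpus:
    """Pool all seen translation tasks (s, t) into one training set,
    prepending the target-language tag to every source sentence."""
    instances = []
    for (src_lang, tgt_lang), corpus in corpora.items():
        tag = f"<2{tgt_lang}>"
        for x, y in corpus:
            instances.append((f"{tag} {x}", y))
    return instances

def multilingual_nll(instances: Corpus,
                     log_prob: Callable[[str, str], float]) -> float:
    """Eq. (1): negative log-likelihood summed over all pooled instances."""
    return -sum(log_prob(y, x) for x, y in instances)

# Toy usage with a dummy scorer that assigns the same log-probability everywhere.
corpora = {
    ("en", "de"): [("hello", "hallo")],
    ("en", "fr"): [("hello", "bonjour")],
}
tagged = build_tagged_instances(corpora)   # [("<2de> hello", "hallo"), ...]
loss = multilingual_nll(tagged, lambda y, x: -1.0)
print(tagged, loss)
```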
2.2 Multi-domain NMT
Multi-domain NMT aims to handle translation
tasks across multiple domains for a given language
pair. As in MNMT, the most popular approach is to tag the training corpus, with a tag indicating the domain of each sentence pair. The model is again trained by minimising the negative log-likelihood across all domains:
$$\mathcal{L}_{\text{MD}}(\boldsymbol{\theta}) := -\sum_{d \in D} \; \sum_{(\boldsymbol{x},\boldsymbol{y}) \in C^{d}_{s,t}} \log P(\boldsymbol{y} \mid \boldsymbol{x}; \boldsymbol{\theta}) \tag{2}$$
where $D$ is the set of domains, and $C^{d}_{s,t}$ denotes the parallel bitext in the source language $s$, the target language $t$, and the domain $d$.
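For a fixed language pair, the analogous sketch below (again our own illustration; domain tag strings such as `<medical>` are assumptions) builds domain-tagged instances; Eq. (2) is then the same negative log-likelihood computed over the pooled, domain-tagged corpus.

```python
# Minimal sketch (ours, not the paper's) of domain tagging for a fixed
# language pair; the domain tag strings are assumptions.
from typing import Dict, List, Tuple

def build_domain_tagged_instances(
        domain_corpora: Dict[str, List[Tuple[str, str]]]) -> List[Tuple[str, str]]:
    """domain_corpora maps a domain name to its list of (source, target) pairs."""
    instances = []
    for domain, corpus in domain_corpora.items():
        for x, y in corpus:
            instances.append((f"<{domain}> {x}", y))
    return instances

# Eq. (2) is the same NLL as in the previous sketch, summed over these instances:
# loss = multilingual_nll(build_domain_tagged_instances(...), log_prob)
```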
Apart from tagging, some auxiliary tasks have
also been incorporated into the training process. A
common practice is the use of domain discrimina-
tion, which aims to force the encoder to capture
domain-aware characteristics (Britz et al., 2017). For this purpose, a domain discriminator is added to the NMT model at training time. The discriminator takes the encoder output as input and predicts the domain of the source sentence. It is trained jointly with the NMT model and discarded at inference time.
Let $h = \text{enc}(\boldsymbol{x})$ be the representation of sentence $\boldsymbol{x}$, computed by mean-pooling the hidden states of the top layer of the encoder. The training objective for the domain-aware encoder is as follows:
$$\mathcal{L}_{\text{disc}}(\boldsymbol{\theta}, \boldsymbol{\psi}) := -\sum_{d \in D} \; \sum_{(\boldsymbol{x},\boldsymbol{y}) \in C^{d}_{s,t}} \log \Pr(d \mid h; \boldsymbol{\psi}) \tag{3}$$

$$\mathcal{L}_{\text{MD-aware}}(\boldsymbol{\theta}, \boldsymbol{\psi}) := \mathcal{L}_{\text{MD}}(\boldsymbol{\theta}) + \lambda \, \mathcal{L}_{\text{disc}}(\boldsymbol{\theta}, \boldsymbol{\psi}) \tag{4}$$
where $\boldsymbol{\psi}$ denotes the parameters of the domain discriminator, and $\lambda$ controls the contribution of the discriminator to the training objective of the multi-domain NMT model.
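The sketch below (our own illustration in PyTorch, not the authors' implementation; module and argument names are hypothetical) shows how the mean-pooled top-layer encoder states feed a domain classifier whose cross-entropy loss is added to the translation loss with weight $\lambda$, as in Eqs. (3) and (4).

```python
# Minimal sketch (ours, not the paper's implementation) of the domain-aware
# objective in Eqs. (3)-(4), assuming PyTorch and a Transformer-style encoder
# whose top layer returns hidden states of shape (batch, seq_len, d_model).
import torch
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    def __init__(self, d_model: int, num_domains: int):
        super().__init__()
        self.classifier = nn.Linear(d_model, num_domains)

    def forward(self, enc_states: torch.Tensor, src_mask: torch.Tensor) -> torch.Tensor:
        # Mean-pool the top-layer encoder states over non-padding positions.
        mask = src_mask.unsqueeze(-1).float()          # (B, T, 1)
        h = (enc_states * mask).sum(1) / mask.sum(1)   # (B, d_model)
        return self.classifier(h)                      # domain logits

def domain_aware_loss(nmt_loss: torch.Tensor,
                      domain_logits: torch.Tensor,
                      domain_labels: torch.Tensor,
                      lam: float = 0.1) -> torch.Tensor:
    """L_MD-aware = L_MD + lambda * L_disc (Eq. 4); lam is an assumed value."""
    disc_loss = nn.functional.cross_entropy(domain_logits, domain_labels)
    return nmt_loss + lam * disc_loss
```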
Alternatively, one can design an adversarial training objective so that the encoder learns domain-agnostic representations. This is achieved by inserting a gradient reversal layer (Ganin and Lempitsky, 2015) between the encoder and the domain discriminator. The gradient reversal layer behaves as an identity function in the forward pass but reverses the sign of the gradient during back-propagation. As a result, the discriminator loss has the opposite effect on the encoder, pushing it towards domain-agnostic representations and encouraging domain-specific characteristics to be captured mainly by the decoder of the NMT model.
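A common way to realise this, sketched below under the assumption of a PyTorch implementation (not necessarily the authors'), is a custom autograd function that acts as the identity in the forward pass and multiplies the incoming gradient by a negative constant in the backward pass; it is inserted between the pooled encoder representation and the discriminator.

```python
# Minimal sketch of a gradient reversal layer (Ganin and Lempitsky, 2015)
# in PyTorch; the scaling constant `lambd` is an assumption.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, lambd: float) -> torch.Tensor:
        ctx.lambd = lambd
        return x.view_as(x)          # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Reverse (and optionally scale) the gradient flowing back to the encoder.
        return -ctx.lambd * grad_output, None

def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambd)

# Usage: domain_logits = discriminator(grad_reverse(h, lambd=1.0))
# The discriminator is still trained to predict the domain, while the
# reversed gradient pushes the encoder towards domain-agnostic features.
```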
2.3 Composition of Domains and Languages
In this paper, we explore strategies for composing
multi-domain and multilingual NMT. We consider
the incomplete multi-domain multilingual data condition, where in-domain data may only be available for a subset of language pairs. For example, Table 1 shows one of the data conditions explored in our experiments in Section 3: given the five language pairs and five domains, we assume that the in-domain data for some language pairs are missing.
Our goal is to investigate effective techniques to
train a high-quality MDML-NMT model covering
all combinations of domains and language pairs.
Given a specific domain, we define in-domain
languages as those having data available in the
domain as part of some bilingual corpora; the rest