m4Adapter: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter

Wen Lai1, Alexandra Chronopoulou1,2, Alexander Fraser1,2
1Center for Information and Language Processing, LMU Munich, Germany
2Munich Center for Machine Learning, Germany
{lavine, achron, fraser}@cis.lmu.de
Abstract

Multilingual neural machine translation models (MNMT) yield state-of-the-art performance when evaluated on data from a domain and language pair seen at training time. However, when a MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically. We consider a very challenging scenario: adapting the MNMT model both to a new domain and to a new language pair at the same time. In this paper, we propose m4Adapter (Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter), which combines domain and language knowledge using meta-learning with adapters. We present results showing that our approach is a parameter-efficient solution which effectively adapts a model to both a new language pair and a new domain, while outperforming other adapter methods. An ablation study also shows that our approach more effectively transfers domain knowledge across different languages and language information across different domains.1
1 Introduction
Multilingual neural machine translation (MNMT; Johnson et al., 2017; Aharoni et al., 2019; Fan et al., 2021) uses a single model to handle translation between multiple language pairs. There are two reasons why MNMT is appealing: first, it has proved effective at transferring knowledge from high-resource languages to low-resource languages, especially in zero-shot scenarios (Gu et al., 2019; Zhang et al., 2020); second, it significantly reduces training and inference cost, as it requires training only a single multilingual model instead of a separate model for each language pair.

1 Our source code is available at https://github.com/lavine-lmu/m4Adapter

Adapting MNMT models to multiple domains is still a challenging task, particularly when those domains are distant from the domain of the training corpus. One approach to address this is fine-tuning the model on out-of-domain data for NMT (Freitag and Al-Onaizan, 2016; Dakwale and Monz, 2017). Another approach is to insert lightweight, learnable units between transformer layers, called adapters (Bapna and Firat, 2019), for each new domain. Similarly, there is work on adapting MNMT models to a new language pair using fine-tuning (Neubig and Hu, 2018) and adapters (Bapna and Firat, 2019; Philip et al., 2020; Cooper Stickland et al., 2021b).
Although effective, the above approaches have some limitations: i) fine-tuning methods require updating the parameters of the whole model for each new domain, which is costly; ii) when fine-tuning on a new domain, catastrophic forgetting (McCloskey and Cohen, 1989) reduces the performance on all other domains, and proves to be a significant issue when data resources are limited; iii) adapter-based approaches require training domain adapters for each domain and language adapters for all languages, which also becomes parameter-inefficient when adapting to a new domain and a new language, because the parameters scale linearly with the number of domains and languages.

In recent work, Cooper Stickland et al. (2021a) compose language adapters and domain adapters in MNMT and explore to what extent domain knowledge can be transferred across languages. They find that it is hard to decouple language knowledge from domain knowledge and that adapters often cause the 'off-target' problem (i.e., translating into a wrong target language; Zhang et al., 2020) when new domains and new language pairs are combined together. They address this problem by using additional in-domain monolingual data to generate synthetic data (i.e., back-translation; Sennrich et al., 2016) and by randomly dropping some domain adapter layers (AdapterDrop; Rücklé et al., 2021).
Motivated by Cooper Stickland et al. (2021a), we consider a challenging scenario: adapting a MNMT model to multiple new domains and new language directions simultaneously in low-resource settings, without using extra monolingual data for back-translation. This scenario could arise when one tries to translate a domain-specific corpus with a commercial translation system. Using our approach, we adapt a model to a new domain and a new language pair using just 500 domain- and language-specific sentences.
To this end, we propose m4Adapter (Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter), which facilitates the transfer between different domains and languages using meta-learning (Finn et al., 2017) with adapters. Our hypothesis is that we can formulate the task, which is to adapt to new languages and domains, as a multi-task learning problem (and denote it as Di-L1-L2, which stands for translating from a language L1 to a language L2 in a specific domain Di). Our approach is two-step: initially, we perform meta-learning with adapters to efficiently learn parameters in a shared representation space across multiple tasks using a small amount of training data (5000 samples); we refer to this as the meta-training step. Then, we fine-tune the trained model to a new domain and language pair simultaneously using an even smaller dataset (500 samples); we refer to this as the meta-adaptation step.
In this work, we make the following contributions: i) We present m4Adapter, a meta-learning approach with adapters that can easily adapt to new domains and languages using a single MNMT model. Experimental results show that m4Adapter outperforms strong baselines. ii) Through an ablation study, we show that using m4Adapter, domain knowledge can be transferred across languages and language knowledge can also be transferred across domains without using target-language monolingual data for back-translation (unlike the work of Cooper Stickland et al., 2021a). iii) To the best of our knowledge, this paper is the first work to explore meta-learning for MNMT adaptation.
2 Related Work
Domain Adaptation in NMT. Existing work on domain adaptation for machine translation can be categorized into two types: data-centric and model-centric approaches (Chu and Wang, 2018). The former focus on maximizing the use of in-domain monolingual, synthetic, and parallel data (Domhan and Hieber, 2017; Park et al., 2017; van der Wees et al., 2017), while the latter design specific training objectives, model architectures or decoding algorithms for domain adaptation (Khayrallah et al., 2017; Gu et al., 2019; Park et al., 2022). In the case of MNMT, adapting to new domains is more challenging because it needs to take into account transfer between languages (Chu and Dabre, 2019; Cooper Stickland et al., 2021a).
Meta-Learning for NMT. Meta-learning (Finn et al., 2017), which aims to learn a generally useful model by training on a distribution of tasks, is highly effective for fast adaptation and has recently been shown to be beneficial for many NLP tasks (Lee et al., 2022). Gu et al. (2018) first introduce a model-agnostic meta-learning algorithm (MAML; Finn et al., 2017) for low-resource machine translation. Sharaf et al. (2020), Zhan et al. (2021) and Lai et al. (2022) formulate domain adaptation for NMT as a meta-learning task, and show effective performance on adapting to new domains. Our approach leverages meta-learning to adapt a MNMT model to a new domain and to a new language pair at the same time.
Adapters for NMT. Bapna and Firat (2019) train language-pair adapters on top of a pre-trained generic MNMT model, in order to recover lost performance on high-resource language pairs compared to bilingual NMT models. Philip et al. (2020) train adapters for each language and show that adding them to a trained model improves the performance of zero-shot translation. Chronopoulou et al. (2022) train adapters for each language family and show promising results on multilingual machine translation. Cooper Stickland et al. (2021b) train language-agnostic adapters to efficiently fine-tune a pre-trained model for many language pairs. More recently, Cooper Stickland et al. (2021a) stack language adapters and domain adapters on top of an MNMT model and conclude that it is not possible to transfer domain knowledge across languages, except by employing back-translation, which requires significant in-domain resources. In this work, we introduce adapters into the meta-learning algorithm and show that this approach permits transfer between domains and languages.

Our work is most closely related to that of Cooper Stickland et al. (2021a); however, we note several differences: i) we study a more realistic scenario: the corpus of each domain and language pair is low-resource (i.e., the meta-training corpus in each domain for each language pair is limited to 5000 sentences and the fine-tuning corpus to 500 sentences), which is easier to obtain; ii) our approach can simultaneously adapt to new domains and new language pairs without using back-translation; iii) we also show that m4Adapter can transfer domain information across different languages and language knowledge across different domains, through a detailed ablation analysis.
3 Method
Our goal is to efficiently adapt an MNMT model to new domains and languages. We propose a novel approach, m4Adapter, which formulates the multilingual multi-domain adaptation task as a multi-task learning problem. To address it, we propose a two-step approach, which combines meta-learning and meta-adaptation with adapters. Our approach permits sharing parameters across different tasks. The two steps are explained in Subsections 3.1 and 3.2.
3.1 Meta-Training
The goal of meta-learning is to obtain a model that can easily adapt to new tasks. To this end, we meta-train adapters in order to find a good initialization of our model's parameters using a small training dataset of source tasks $\{T_1, \dots, T_t\}$.

We first select $m$ tasks, as we describe in §3.1.1. Then, for each of the $m$ sampled tasks, we sample $n$ examples. We explain the task sampling strategy in §3.1.2. This way, we set up the m-way-n-shot task. After setting up the task, we use a meta-learning algorithm, which we describe in §3.1.3, to meta-learn the parameters of the adapter layers. The architecture of the adapters and their optimization objective are presented in §3.1.4. Algorithm 1 details the meta-training process of our approach.
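To make the structure of this step concrete, the following is a minimal sketch of one meta-training episode using a first-order MAML-style update over adapter parameters only. The toy bottleneck adapter, the stand-in loss, and the names (task_loss, inner_lr, meta_lr, inner_steps) are illustrative assumptions rather than our actual implementation; Algorithm 1 and §3.1.3-§3.1.4 describe the exact procedure.

```python
# Minimal first-order meta-training sketch over adapter parameters only.
# The bottleneck adapter, the stand-in reconstruction loss, and the
# hyper-parameter names (inner_lr, meta_lr, inner_steps) are illustrative
# placeholders, not the configuration used in the experiments.
import copy
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: down-project, non-linearity, up-project."""
    def __init__(self, dim=16, bottleneck=4):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

def task_loss(adapter, batch):
    """Stand-in for the translation loss of one DLP batch."""
    src, tgt = batch
    return ((adapter(src) - tgt) ** 2).mean()

def meta_train_step(adapter, episode, inner_lr=1e-2, meta_lr=1e-3, inner_steps=3):
    """One episode: adapt a copy of the adapter on each task's support set,
    then update the shared adapter with the query-set gradients taken at the
    adapted parameters (first-order MAML approximation)."""
    meta_grads = [torch.zeros_like(p) for p in adapter.parameters()]
    for support, query in episode:
        fast = copy.deepcopy(adapter)                  # task-specific copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                   # inner loop on the support set
            opt.zero_grad()
            task_loss(fast, support).backward()
            opt.step()
        query_loss = task_loss(fast, query)            # outer loss on the query set
        grads = torch.autograd.grad(query_loss, list(fast.parameters()))
        for acc, g in zip(meta_grads, grads):
            acc += g / len(episode)
    with torch.no_grad():                              # meta-update of the shared adapter
        for p, g in zip(adapter.parameters(), meta_grads):
            p -= meta_lr * g

# Toy episode: two DLPs, each with a support and a query batch of hidden states.
torch.manual_seed(0)
make_batch = lambda: (torch.randn(8, 16), torch.randn(8, 16))
episode = [(make_batch(), make_batch()) for _ in range(2)]
adapter = Adapter()
meta_train_step(adapter, episode)
```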
3.1.1 Task Definition
Motivated by the work of Tarunesh et al. (2021), where a multilingual multi-task NLP task is regarded as a Task-Language Pair (TLP), we address multilingual multi-domain translation as a multi-task learning problem. Specifically, a translation task in a specific textual domain corresponds to a Domain-Language-Pair (DLP). For example, an English-Serbian translation task in the 'Ubuntu' domain is denoted as the DLP 'Ubuntu-en-sr'. Given $d$ domains and $l$ languages, we have $d \cdot l \cdot (l-1)$ tasks of this form.2 We denote the proportion of the dataset size of the $i$-th DLP among all DLPs as $s_i = |D_{\text{train}}^{i}| / \sum_{a=1}^{n} |D_{\text{train}}^{a}|$, where $s_i$ will be used in temperature-based sampling (see more details in §3.1.2). The probability of sampling a batch from the $i$-th DLP during meta-training is denoted as $P_D(i)$. The distribution over all DLPs is a multinomial (which we denote as $\mathcal{M}$) over $P_D(i)$: $\mathcal{M} \sim P_D(i)$.
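As a small illustration of this definition, the snippet below enumerates the DLPs for a toy configuration and computes the proportions $s_i$ from hypothetical per-DLP corpus sizes; the domain names, language codes and sizes are placeholders and do not correspond to the data used in our experiments.

```python
from itertools import permutations

# Toy configuration: the domain names, language codes and corpus sizes below
# are placeholders, not the datasets used in the paper.
domains = ["ubuntu", "medical"]            # d = 2
languages = ["en", "de", "sr"]             # l = 3

# One DLP per (domain, ordered language pair): d * l * (l - 1) = 12 tasks.
dlps = [f"{dom}-{src}-{tgt}" for dom in domains
        for src, tgt in permutations(languages, 2)]
assert len(dlps) == len(domains) * len(languages) * (len(languages) - 1)

# Hypothetical training-set sizes |D_train^i| for each DLP: English-centric
# pairs are larger than the others in this made-up example.
sizes = {dlp: 5000 if "en" in dlp.split("-")[1:] else 1000 for dlp in dlps}
total = sum(sizes.values())
s = {dlp: size / total for dlp, size in sizes.items()}   # proportions s_i (§3.1.2)
```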
3.1.2 Task Sampling
Given $d$ domains and $l$ languages, we sample some DLPs per batch among all $d \cdot l \cdot (l-1)$ tasks. We consider a standard m-way-n-shot meta-learning scenario: assuming access to $d \cdot l \cdot (l-1)$ DLPs, an m-way-n-shot task is created by first sampling $m$ DLPs ($m \leq l \cdot (l-1)$); then, $(n+q)$ examples are selected for each of the $m$ sampled DLPs; the $n$ examples for each DLP serve as the support set to update the parameters of the pre-trained model, while the $q$ examples constitute the query set to evaluate the model.
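The sketch below illustrates this episode construction, assuming the sampling probabilities $P_D(i)$ of §3.1.2 are already available; the helper name build_episode and the corpus/probability dictionaries are illustrative placeholders, not part of our implementation.

```python
import random

def build_episode(corpus, probs, m=4, n=8, q=8, seed=0):
    """Sample one m-way-n-shot episode: draw m distinct DLPs from the
    multinomial over P_D(i), then (n + q) sentence pairs per DLP, split into
    a support set (n examples) and a query set (q examples).

    `corpus` maps a DLP name to its list of sentence pairs and `probs` maps a
    DLP name to its sampling probability; both are illustrative placeholders."""
    rng = random.Random(seed)
    names = list(probs)
    weights = [probs[name] for name in names]
    assert m <= len(names)
    chosen = []
    while len(chosen) < m:                 # resample away duplicate draws
        dlp = rng.choices(names, weights=weights, k=1)[0]
        if dlp not in chosen:
            chosen.append(dlp)
    episode = {}
    for dlp in chosen:
        examples = rng.sample(corpus[dlp], n + q)
        episode[dlp] = (examples[:n], examples[n:])   # (support set, query set)
    return episode

# Toy usage with dummy sentence pairs for two DLPs.
corpus = {"ubuntu-en-sr": [("src %d" % i, "tgt %d" % i) for i in range(100)],
          "medical-de-sr": [("src %d" % i, "tgt %d" % i) for i in range(100)]}
probs = {"ubuntu-en-sr": 0.7, "medical-de-sr": 0.3}
episode = build_episode(corpus, probs, m=2, n=8, q=8)
```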
Task sampling is an essential step for meta-learning. Traditional meta-learning methods sample the tasks uniformly (Sharaf et al., 2020), through an ordered curriculum (Zhan et al., 2021), or by dynamically adjusting the sampled dataset according to the model parameters (parameterized sampling strategy; Tarunesh et al., 2021). We do not employ these strategies for the following reasons: i) sampling uniformly is simple but does not consider the distribution of the unbalanced data; ii) although effective, curriculum-based and parameterized sampling consider features of all $d \cdot l \cdot (l-1)$ DLPs, and the number of DLPs grows quickly with the number of languages and domains. In contrast, we follow a temperature-based heuristic sampling strategy (Aharoni et al., 2019), which defines the probability of any dataset as a function of its size. Specifically, given $s_i$ as the percentage of the $i$-th DLP among all DLPs, we compute the following probability of the $i$-th DLP being sampled:
$$P_D(i) = s_i^{1/\tau} \Big/ \sum_{a=1}^{n} s_a^{1/\tau}$$

where $\tau$ is a temperature parameter. $\tau = 1$ means that each DLP is sampled in proportion to the size of its dataset.
2 Given $l$ languages, we focus on complete translation between $l \cdot (l-1)$ language directions.
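As a worked example of the formula above, the snippet below computes the temperature-scaled probabilities $P_D(i)$ from made-up proportions $s_i$; with $\tau > 1$ the distribution is flattened, so small DLPs are sampled more often than their raw share of the data would suggest.

```python
def sampling_probs(s, tau=5.0):
    """Temperature-based sampling: P_D(i) = s_i^(1/tau) / sum_a s_a^(1/tau).
    tau = 1 recovers proportional sampling; larger tau flattens the
    distribution towards uniform, up-sampling the smaller DLPs."""
    scaled = {dlp: share ** (1.0 / tau) for dlp, share in s.items()}
    norm = sum(scaled.values())
    return {dlp: value / norm for dlp, value in scaled.items()}

# Made-up proportions s_i for three DLPs (they sum to 1).
s = {"ubuntu-en-de": 0.6, "ubuntu-en-sr": 0.3, "medical-de-sr": 0.1}
print(sampling_probs(s, tau=1.0))  # identical to s (proportional sampling)
print(sampling_probs(s, tau=5.0))  # roughly 0.39 / 0.34 / 0.27: much flatter
```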