
Motivated by Cooper Stickland et al. (2021a),
we consider a challenging scenario: adapting an
MNMT model to multiple new domains and new
language directions simultaneously in low-resource
settings without using extra monolingual data for
back-translation. This scenario could arise when
one tries to translate a domain-specific corpus with
a commercial translation system. Using our ap-
proach, we adapt a model to a new domain and
a new language pair using just 500 domain- and
language-specific sentences.
To this end, we propose m4Adapter (Multilingual Multi-Domain Adaptation for Machine Translation with Meta-Adapter), which
facilitates the transfer between different domains
and languages using meta-learning (Finn et al.,
2017) with adapters. Our hypothesis is that we
can formulate the task of adapting to new languages and domains as a multi-task learning problem (denoted Di-L1-L2, which stands for translating from a language L1 to a language L2 in a specific domain Di). Our approach is
two-step: initially, we perform meta-learning with
adapters to efficiently learn parameters in a shared
representation space across multiple tasks using
a small amount of training data (5000 samples);
we refer to this as the meta-training step. Then,
we fine-tune the trained model to a new domain
and language pair simultaneously using an even
smaller dataset (500 samples); we refer to this as
the meta-adaptation step.
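As an illustration, this two-step procedure can be sketched on a toy regression problem (this is only a schematic, not the actual implementation): a frozen base layer stands in for the pre-trained MNMT model, a small trainable layer stands in for the adapters, each synthetic task plays the role of one Di-L1-L2 task, and a Reptile-style first-order update is used as a stand-in for the MAML-style meta-objective. The class names, task construction, and hyperparameters below are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

class AdapterModel(nn.Module):
    """Frozen 'base' (stand-in for the MNMT model) plus a trainable 'adapter'."""
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(1, 32)
        self.adapter = nn.Linear(32, 1)
        for p in self.base.parameters():      # only adapter parameters are updated
            p.requires_grad = False

    def forward(self, x):
        return self.adapter(torch.tanh(self.base(x)))

def make_task(amplitude, phase, n=32):
    """A toy task, analogous to one domain-language-pair task Di-L1-L2."""
    x = torch.rand(n, 1) * 6 - 3
    return x, amplitude * torch.sin(x + phase)

def inner_update(model, x, y, lr=1e-2, steps=5):
    """Task-specific fine-tuning of the adapter (inner loop)."""
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD([p for p in fast.parameters() if p.requires_grad], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(fast(x), y).backward()
        opt.step()
    return fast

model = AdapterModel()

# Step 1 (meta-training): sample tasks, adapt on each, and move the shared
# adapter parameters toward the task-adapted ones (first-order meta-update).
meta_lr = 0.1
for step in range(200):
    amp = torch.rand(1).item() * 4 + 1
    phase = torch.rand(1).item() * 3
    x, y = make_task(amp, phase)
    fast = inner_update(model, x, y)
    with torch.no_grad():
        for p, q in zip(model.adapter.parameters(), fast.adapter.parameters()):
            p += meta_lr * (q - p)

# Step 2 (meta-adaptation): fine-tune on a small support set from an unseen task.
x_new, y_new = make_task(amplitude=5.0, phase=1.5, n=16)
adapted = inner_update(model, x_new, y_new, steps=20)
print("support-set loss on the new task:",
      nn.functional.mse_loss(adapted(x_new), y_new).item())
```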
In this work, we make the following con-
tributions: i) We present m4Adapter, a meta-
learning approach with adapters that can easily
adapt to new domains and languages using a sin-
gle MNMT model. Experimental results show
that m4Adapter outperforms strong baselines. ii)
Through an ablation study, we show that with m4Adapter, domain knowledge can be transferred across languages and language knowledge can also be transferred across domains, without using target-language monolingual data for back-translation (unlike the work of Cooper Stickland et al., 2021a).
iii) To the best of our knowledge, this paper is
the first work to explore meta-learning for MNMT
adaptation.
2 Related Work
Domain Adaptation in NMT.
Existing work on
domain adaptation for machine translation can be
categorized into two types: data-centric and model-
centric approaches (Chu and Wang, 2018). The
former focus on maximizing the use of in-domain
monolingual, synthetic, and parallel data (Domhan
and Hieber, 2017; Park et al., 2017; van der Wees et al., 2017), while the latter design specific training
objectives, model architectures or decoding algo-
rithms for domain adaptation (Khayrallah et al.,
2017; Gu et al., 2019; Park et al., 2022). In the
case of MNMT, adapting to new domains is more
challenging because it needs to take into account
transfer between languages (Chu and Dabre, 2019; Cooper Stickland et al., 2021a).
Meta-Learning for NMT.
Meta-learning (Finn
et al., 2017), which aims to learn a generally use-
ful model by training on a distribution of tasks, is
highly effective for fast adaptation and has recently
been shown to be beneficial for many NLP tasks
(Lee et al., 2022). Gu et al. (2018) first introduce a
model-agnostic meta-learning algorithm (MAML;
Finn et al., 2017) for low-resource machine trans-
lation. Sharaf et al. (2020), Zhan et al. (2021) and
Lai et al. (2022) formulate domain adaptation for
NMT as a meta-learning task, and show effective
performance on adapting to new domains. Our ap-
proach leverages meta-learning to adapt an MNMT
model to a new domain and to a new language pair
at the same time.
Adapters for NMT.
Bapna and Firat (2019) train
language-pair adapters on top of a pre-trained
generic MNMT model, in order to recover lost
performance on high-resource language pairs com-
pared to bilingual NMT models. Philip et al. (2020)
train adapters for each language and show that
adding them to a trained model improves the per-
formance of zero-shot translation. Chronopoulou
et al. (2022) train adapters for each language fam-
ily and show promising results on multilingual ma-
chine translation. Cooper Stickland et al. (2021b)
train language-agnostic adapters to efficiently fine-
tune a pre-trained model for many language pairs.
More recently, Cooper Stickland et al. (2021a)
stack language adapters and domain adapters on
top of an MNMT model and they conclude that it is
not possible to transfer domain knowledge across
languages, except by employing back-translation, which requires significant in-domain resources. In
this work, we introduce adapters into the meta-
learning algorithm and show that this approach
permits transfer between domains and languages.
Our work is most closely related to Cooper Stickland et al. (2021a); however, we note several differences: