m4Adapter: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter

Wen Lai1, Alexandra Chronopoulou1,2, Alexander Fraser1,2
1Center for Information and Language Processing, LMU Munich, Germany
2Munich Center for Machine Learning, Germany
{lavine, achron, fraser}@cis.lmu.de
Abstract

Multilingual neural machine translation models (MNMT) yield state-of-the-art performance when evaluated on data from a domain and language pair seen at training time. However, when a MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically. We consider a very challenging scenario: adapting the MNMT model both to a new domain and to a new language pair at the same time. In this paper, we propose m4Adapter (Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter), which combines domain and language knowledge using meta-learning with adapters. We present results showing that our approach is a parameter-efficient solution which effectively adapts a model to both a new language pair and a new domain, while outperforming other adapter methods. An ablation study also shows that our approach more effectively transfers domain knowledge across different languages and language information across different domains.1
1 Introduction
Multilingual neural machine translation (MNMT; Johnson et al., 2017; Aharoni et al., 2019; Fan et al., 2021) uses a single model to handle translation between multiple language pairs. There are two reasons why MNMT is appealing: first, it has proved effective at transferring knowledge from high-resource languages to low-resource languages, especially in zero-shot scenarios (Gu et al., 2019; Zhang et al., 2020); second, it significantly reduces training and inference cost, as it requires training only a single multilingual model instead of a separate model for each language pair.

1 Our source code is available at https://github.com/lavine-lmu/m4Adapter

Adapting MNMT models to multiple domains is still a challenging task, particularly when those domains are distant from the domain of the training corpus. One approach to address this is fine-tuning the model on out-of-domain data for NMT (Freitag and Al-Onaizan, 2016; Dakwale and Monz, 2017). Another approach is to insert lightweight, learnable units between transformer layers, called adapters (Bapna and Firat, 2019), for each new domain. Similarly, there is work on adapting MNMT models to a new language pair using fine-tuning (Neubig and Hu, 2018) and adapters (Bapna and Firat, 2019; Philip et al., 2020; Cooper Stickland et al., 2021b).
Although effective, the above approaches have some limitations: i) fine-tuning methods require updating the parameters of the whole model for each new domain, which is costly; ii) when fine-tuning on a new domain, catastrophic forgetting (McCloskey and Cohen, 1989) reduces the performance on all other domains, and proves to be a significant issue when data resources are limited; iii) adapter-based approaches require training domain adapters for each domain and language adapters for all languages, which also becomes parameter-inefficient when adapting to a new domain and a new language, because the parameters scale linearly with the number of domains and languages.

In recent work, Cooper Stickland et al. (2021a) compose language adapters and domain adapters in MNMT and explore to what extent domain knowledge can be transferred across languages. They find that it is hard to decouple language knowledge from domain knowledge and that adapters often cause the 'off-target' problem (i.e., translating into a wrong target language; Zhang et al., 2020) when new domains and new language pairs are combined together. They address this problem by using additional in-domain monolingual data to generate synthetic data (i.e., back-translation; Sennrich et al., 2016) and by randomly dropping some domain adapter layers (AdapterDrop; Rücklé et al., 2021).
Motivated by Cooper Stickland et al. (2021a), we consider a challenging scenario: adapting a MNMT model to multiple new domains and new language directions simultaneously in low-resource settings, without using extra monolingual data for back-translation. This scenario could arise when one tries to translate a domain-specific corpus with a commercial translation system. Using our approach, we adapt a model to a new domain and a new language pair using just 500 domain- and language-specific sentences.
To this end, we propose m4Adapter (Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter), which facilitates the transfer between different domains and languages using meta-learning (Finn et al., 2017) with adapters. Our hypothesis is that we can formulate the task, which is to adapt to new languages and domains, as a multi-task learning problem (and denote it as Di-L1-L2, which stands for translating from a language L1 to a language L2 in a specific domain Di). Our approach is two-step: initially, we perform meta-learning with adapters to efficiently learn parameters in a shared representation space across multiple tasks using a small amount of training data (5000 samples); we refer to this as the meta-training step. Then, we fine-tune the trained model to a new domain and language pair simultaneously using an even smaller dataset (500 samples); we refer to this as the meta-adaptation step.
In this work, we make the following contributions: i) We present m4Adapter, a meta-learning approach with adapters that can easily adapt to new domains and languages using a single MNMT model. Experimental results show that m4Adapter outperforms strong baselines. ii) Through an ablation study, we show that using m4Adapter, domain knowledge can be transferred across languages and language knowledge can also be transferred across domains without using target-language monolingual data for back-translation (unlike the work of Cooper Stickland et al., 2021a). iii) To the best of our knowledge, this paper is the first work to explore meta-learning for MNMT adaptation.
2 Related Work
Domain Adaptation in NMT. Existing work on domain adaptation for machine translation can be categorized into two types: data-centric and model-centric approaches (Chu and Wang, 2018). The former focus on maximizing the use of in-domain monolingual, synthetic, and parallel data (Domhan and Hieber, 2017; Park et al., 2017; van der Wees et al., 2017), while the latter design specific training objectives, model architectures or decoding algorithms for domain adaptation (Khayrallah et al., 2017; Gu et al., 2019; Park et al., 2022). In the case of MNMT, adapting to new domains is more challenging because it needs to take into account transfer between languages (Chu and Dabre, 2019; Cooper Stickland et al., 2021a).
Meta-Learning for NMT. Meta-learning (Finn et al., 2017), which aims to learn a generally useful model by training on a distribution of tasks, is highly effective for fast adaptation and has recently been shown to be beneficial for many NLP tasks (Lee et al., 2022). Gu et al. (2018) first introduce a model-agnostic meta-learning algorithm (MAML; Finn et al., 2017) for low-resource machine translation. Sharaf et al. (2020), Zhan et al. (2021) and Lai et al. (2022) formulate domain adaptation for NMT as a meta-learning task, and show effective performance on adapting to new domains. Our approach leverages meta-learning to adapt a MNMT model to a new domain and to a new language pair at the same time.
Adapters for NMT. Bapna and Firat (2019) train language-pair adapters on top of a pre-trained generic MNMT model, in order to recover lost performance on high-resource language pairs compared to bilingual NMT models. Philip et al. (2020) train adapters for each language and show that adding them to a trained model improves the performance of zero-shot translation. Chronopoulou et al. (2022) train adapters for each language family and show promising results on multilingual machine translation. Cooper Stickland et al. (2021b) train language-agnostic adapters to efficiently fine-tune a pre-trained model for many language pairs. More recently, Cooper Stickland et al. (2021a) stack language adapters and domain adapters on top of an MNMT model and conclude that it is not possible to transfer domain knowledge across languages, except by employing back-translation, which requires significant in-domain resources. In this work, we introduce adapters into the meta-learning algorithm and show that this approach permits transfer between domains and languages.

Our work is most closely related to that of Cooper Stickland et al. (2021a); however, we note several differences: i) we study a more realistic scenario: the corpus of each domain and language pair is low-resource (i.e., the meta-training corpus in each domain for each language pair is limited to 5000 sentences and the fine-tuning corpus to 500 sentences), which is easier to obtain; ii) our approach can simultaneously adapt to new domains and new language pairs without using back-translation; iii) we also show that m4Adapter can transfer domain information across different languages and language knowledge across different domains, through a detailed ablation analysis.
3 Method
Our goal is to efficiently adapt an MNMT model to new domains and languages. We propose a novel approach, m4Adapter, which formulates the multilingual multi-domain adaptation task as a multi-task learning problem. To address it, we propose a two-step approach, which combines meta-learning and meta-adaptation with adapters. Our approach permits sharing parameters across different tasks. The two steps are explained in Subsections 3.1 and 3.2.
3.1 Meta-Training
The goal of meta-learning is to obtain a model that can easily adapt to new tasks. To this end, we meta-train adapters in order to find a good initialization of our model's parameters using a small training dataset of source tasks $\{T_1, \dots, T_t\}$.

We first select $m$ tasks, as we describe in §3.1.1. Then, for each of the $m$ sampled tasks, we sample $n$ examples. We explain the task sampling strategy in §3.1.2. This way, we set up the m-way-n-shot task. After setting up the task, we use a meta-learning algorithm, which we describe in §3.1.3, to meta-learn the parameters of the adapter layers. The architecture of the adapters and their optimization objective are presented in §3.1.4. Algorithm 1 details the meta-training process of our approach.
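To make the structure of this step concrete, the following is a minimal sketch of one meta-training episode using a first-order MAML-style update over adapter parameters only. The toy bottleneck adapter, the stand-in loss, and the names (task_loss, inner_lr, meta_lr, inner_steps) are illustrative assumptions rather than our actual implementation; Algorithm 1 and §3.1.3-§3.1.4 describe the exact procedure.

```python
# Minimal first-order meta-training sketch over adapter parameters only.
# The bottleneck adapter, the stand-in reconstruction loss, and the
# hyper-parameter names (inner_lr, meta_lr, inner_steps) are illustrative
# placeholders, not the configuration used in the experiments.
import copy
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: down-project, non-linearity, up-project."""
    def __init__(self, dim=16, bottleneck=4):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

def task_loss(adapter, batch):
    """Stand-in for the translation loss of one DLP batch."""
    src, tgt = batch
    return ((adapter(src) - tgt) ** 2).mean()

def meta_train_step(adapter, episode, inner_lr=1e-2, meta_lr=1e-3, inner_steps=3):
    """One episode: adapt a copy of the adapter on each task's support set,
    then update the shared adapter with the query-set gradients taken at the
    adapted parameters (first-order MAML approximation)."""
    meta_grads = [torch.zeros_like(p) for p in adapter.parameters()]
    for support, query in episode:
        fast = copy.deepcopy(adapter)                  # task-specific copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                   # inner loop on the support set
            opt.zero_grad()
            task_loss(fast, support).backward()
            opt.step()
        query_loss = task_loss(fast, query)            # outer loss on the query set
        grads = torch.autograd.grad(query_loss, list(fast.parameters()))
        for acc, g in zip(meta_grads, grads):
            acc += g / len(episode)
    with torch.no_grad():                              # meta-update of the shared adapter
        for p, g in zip(adapter.parameters(), meta_grads):
            p -= meta_lr * g

# Toy episode: two DLPs, each with a support and a query batch of hidden states.
torch.manual_seed(0)
make_batch = lambda: (torch.randn(8, 16), torch.randn(8, 16))
episode = [(make_batch(), make_batch()) for _ in range(2)]
adapter = Adapter()
meta_train_step(adapter, episode)
```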
3.1.1 Task Definition
Motivated by the work of Tarunesh et al. (2021), where a multilingual multi-task NLP task is regarded as a Task-Language Pair (TLP), we address multilingual multi-domain translation as a multi-task learning problem. Specifically, a translation task in a specific textual domain corresponds to a Domain-Language-Pair (DLP). For example, an English-Serbian translation task in the 'Ubuntu' domain is denoted as the DLP 'Ubuntu-en-sr'. Given $d$ domains and $l$ languages, we have $d \cdot l \cdot (l-1)$ tasks of this form.2 We denote the proportion of the dataset size of the $i$-th DLP among all DLPs as $s_i = |D_{\text{train}}^{i}| / \sum_{a=1}^{n} |D_{\text{train}}^{a}|$, where $s_i$ will be used in temperature-based sampling (see more details in §3.1.2). The probability of sampling a batch from the $i$-th DLP during meta-training is denoted as $P_D(i)$. The distribution over all DLPs is a multinomial (which we denote as $\mathcal{M}$) over $P_D(i)$: $\mathcal{M} \sim P_D(i)$.
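As a small illustration of this definition, the snippet below enumerates the DLPs for a toy configuration and computes the proportions $s_i$ from hypothetical per-DLP corpus sizes; the domain names, language codes and sizes are placeholders and do not correspond to the data used in our experiments.

```python
from itertools import permutations

# Toy configuration: the domain names, language codes and corpus sizes below
# are placeholders, not the datasets used in the paper.
domains = ["ubuntu", "medical"]            # d = 2
languages = ["en", "de", "sr"]             # l = 3

# One DLP per (domain, ordered language pair): d * l * (l - 1) = 12 tasks.
dlps = [f"{dom}-{src}-{tgt}" for dom in domains
        for src, tgt in permutations(languages, 2)]
assert len(dlps) == len(domains) * len(languages) * (len(languages) - 1)

# Hypothetical training-set sizes |D_train^i| for each DLP: English-centric
# pairs are larger than the others in this made-up example.
sizes = {dlp: 5000 if "en" in dlp.split("-")[1:] else 1000 for dlp in dlps}
total = sum(sizes.values())
s = {dlp: size / total for dlp, size in sizes.items()}   # proportions s_i (§3.1.2)
```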
3.1.2 Task Sampling
Given $d$ domains and $l$ languages, we sample some DLPs per batch among all $d \cdot l \cdot (l-1)$ tasks. We consider a standard m-way-n-shot meta-learning scenario: assuming access to $d \cdot l \cdot (l-1)$ DLPs, an m-way-n-shot task is created by first sampling $m$ DLPs ($m \leq l \cdot (l-1)$); then, $(n+q)$ examples are selected for each of the $m$ sampled DLPs; the $n$ examples for each DLP serve as the support set to update the parameters of the pre-trained model, while the $q$ examples constitute the query set to evaluate the model.
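The sketch below illustrates this episode construction, assuming the sampling probabilities $P_D(i)$ of §3.1.2 are already available; the helper name build_episode and the corpus/probability dictionaries are illustrative placeholders, not part of our implementation.

```python
import random

def build_episode(corpus, probs, m=4, n=8, q=8, seed=0):
    """Sample one m-way-n-shot episode: draw m distinct DLPs from the
    multinomial over P_D(i), then (n + q) sentence pairs per DLP, split into
    a support set (n examples) and a query set (q examples).

    `corpus` maps a DLP name to its list of sentence pairs and `probs` maps a
    DLP name to its sampling probability; both are illustrative placeholders."""
    rng = random.Random(seed)
    names = list(probs)
    weights = [probs[name] for name in names]
    assert m <= len(names)
    chosen = []
    while len(chosen) < m:                 # resample away duplicate draws
        dlp = rng.choices(names, weights=weights, k=1)[0]
        if dlp not in chosen:
            chosen.append(dlp)
    episode = {}
    for dlp in chosen:
        examples = rng.sample(corpus[dlp], n + q)
        episode[dlp] = (examples[:n], examples[n:])   # (support set, query set)
    return episode

# Toy usage with dummy sentence pairs for two DLPs.
corpus = {"ubuntu-en-sr": [("src %d" % i, "tgt %d" % i) for i in range(100)],
          "medical-de-sr": [("src %d" % i, "tgt %d" % i) for i in range(100)]}
probs = {"ubuntu-en-sr": 0.7, "medical-de-sr": 0.3}
episode = build_episode(corpus, probs, m=2, n=8, q=8)
```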
Task sampling is an essential step for meta-learning. Traditional meta-learning methods sample the tasks uniformly (Sharaf et al., 2020), through an ordered curriculum (Zhan et al., 2021), or by dynamically adjusting the sampled dataset according to the model parameters (parameterized sampling strategy; Tarunesh et al., 2021). We do not employ these strategies for the following reasons: i) sampling uniformly is simple but does not consider the distribution of the unbalanced data; ii) although effective, curriculum-based and parameterized sampling consider features of all $d \cdot l \cdot (l-1)$ DLPs, and the number of DLPs grows quickly with the number of languages and domains. In contrast, we follow a temperature-based heuristic sampling strategy (Aharoni et al., 2019), which defines the probability of any dataset as a function of its size. Specifically, given $s_i$ as the percentage of the $i$-th DLP among all DLPs, we compute the following probability of the $i$-th DLP being sampled:
$$P_D(i) = s_i^{1/\tau} \Big/ \sum_{a=1}^{n} s_a^{1/\tau}$$

where $\tau$ is a temperature parameter. $\tau = 1$ means that each DLP is sampled in proportion to the size of its dataset.
2 Given $l$ languages, we focus on complete translation between $l \cdot (l-1)$ language directions.
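As a worked example of the formula above, the snippet below computes the temperature-scaled probabilities $P_D(i)$ from made-up proportions $s_i$; with $\tau > 1$ the distribution is flattened, so small DLPs are sampled more often than their raw share of the data would suggest.

```python
def sampling_probs(s, tau=5.0):
    """Temperature-based sampling: P_D(i) = s_i^(1/tau) / sum_a s_a^(1/tau).
    tau = 1 recovers proportional sampling; larger tau flattens the
    distribution towards uniform, up-sampling the smaller DLPs."""
    scaled = {dlp: share ** (1.0 / tau) for dlp, share in s.items()}
    norm = sum(scaled.values())
    return {dlp: value / norm for dlp, value in scaled.items()}

# Made-up proportions s_i for three DLPs (they sum to 1).
s = {"ubuntu-en-de": 0.6, "ubuntu-en-sr": 0.3, "medical-de-sr": 0.1}
print(sampling_probs(s, tau=1.0))  # identical to s (proportional sampling)
print(sampling_probs(s, tau=5.0))  # roughly 0.39 / 0.34 / 0.27: much flatter
```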