Domain Specific Sub-network for Multi-Domain Neural Machine Translation
Amr Hendy, Mohamed Abdelghaffar, Mohamed Afify and Ahmed Y. Tawfik
Microsoft Egypt Development Center, Cairo, Egypt
{amrhendy,mohamed.abdelghaar,mafify,atawfik}@microsoft.com
Abstract
This paper presents the Domain-Specific Sub-network (DoSS). It uses a set of masks obtained through pruning to define a sub-network for each domain and finetunes the sub-network parameters on domain data. This performs very closely to finetuning the whole network on each domain while drastically reducing the number of parameters. We also propose a method to make the masks unique per domain and show that it greatly improves generalization to unseen domains. In our experiments on German-to-English machine translation, the proposed method outperforms the strong baseline of continued training on multi-domain (medical, tech and religion) data by 1.47 BLEU points. Continued training of DoSS on a new domain (legal) also outperforms the multi-domain (medical, tech, religion, legal) baseline by 1.52 BLEU points.
1 Introduction
Neural machine translation (NMT) has witnessed significant advances based on transformer models (Vaswani et al., 2017). These models are typically trained on large amounts of data from different sources, i.e. general data, from a single language pair or multiple languages (Aharoni et al., 2019). The fact that the models are trained on general data usually leads to poor, or less than average, performance on specific domains. This has many practical implications, since many users of machine translation are interested in the performance on some specific domain(s). Therefore, improving the performance of NMT on specific domains has become an active area of research. We refer the reader to (Chu and Wang, 2018) for a review. Broadly speaking, the proposed techniques can be divided into data-centric and model-centric approaches. The goal of the former is to acquire, often automatically, monolingual and bilingual data that is representative of the domain of interest. The latter techniques, on the other hand, focus on modifying the model to perform well on the domain of interest without sacrificing performance on general data.
Finetuning the model parameters on domain data is perhaps one of the earliest and most popular techniques for domain adaptation (Freitag and Al-Onaizan, 2016). Parallel domain data is usually limited, and to avoid overfitting, techniques such as model interpolation (Wortsman et al., 2021), regularization (Miceli Barone et al., 2017) and mixing domain and general data (Chu et al., 2017) are used. Other methods that add parameters in a controllable way, such as adapters (Bapna and Firat, 2019) and low-rank adaptation (LoRA) (Hu et al., 2021), have also been applied successfully.
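As an illustration of such parameter-efficient additions, the following is a minimal PyTorch sketch of a generic bottleneck adapter; the module name, dimensions, and zero-initialization are our own illustrative choices and are not taken from this paper or the cited adapter and LoRA work.

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small bottleneck module inserted after a transformer sub-layer;
    only its parameters are trained on domain data while the base
    model is kept frozen (illustrative sketch)."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        # Zero-init the up projection so the adapter starts as the identity.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen representation is only perturbed
        # by the small trained bottleneck.
        return hidden + self.up(torch.relu(self.down(self.norm(hidden))))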
In (Frankle and Carbin, 2018) it is shown that identifying sub-networks by pruning a large network, referred to as winning tickets, and retraining them achieves accuracy equal to that of the original network. This idea has been explored for multilingual neural machine translation (MNMT) using so-called language-specific sub-networks (LaSS) (Lin et al., 2021). Here we further explore the idea for domain finetuning and refer to it as Domain-Specific Sub-network (DoSS). The basic idea is to identify a sub-network per domain via pruning and masking. The sub-network has parameters shared with other domains as well as domain-specific parameters. It should be noted that the masks of multiple domains can overlap, which results in some parameters being shared by multiple domains. We also explore constrained masks, where we ensure that each mask represents only one domain. The latter is expected to work better for adding unseen domains.
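To make the pruning-and-masking idea concrete, the following PyTorch sketch shows one way a per-domain binary mask could be obtained by magnitude pruning and then used to restrict finetuning updates to the selected sub-network; the function names, the magnitude criterion, and the keep_ratio parameter are illustrative assumptions rather than the exact procedure used in this paper.

import torch
import torch.nn as nn

def magnitude_mask(model: nn.Module, keep_ratio: float = 0.7) -> dict:
    """Build a binary mask per weight tensor, keeping the largest-magnitude
    entries (simple unstructured magnitude pruning, used here as an example)."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and layer-norm scales
            continue
        flat = param.detach().abs().flatten()
        k = max(1, int(flat.numel() * keep_ratio))
        threshold = torch.topk(flat, k, largest=True).values.min()
        masks[name] = (param.detach().abs() >= threshold).float()
    return masks

def masked_finetune_step(model: nn.Module, masks: dict, loss: torch.Tensor,
                         optimizer: torch.optim.Optimizer) -> None:
    """One update step that only touches the parameters selected by the mask:
    gradients of pruned-away entries are zeroed before the optimizer step
    (assuming no weight decay moves the pruned entries)."""
    optimizer.zero_grad()
    loss.backward()
    for name, param in model.named_parameters():
        if name in masks and param.grad is not None:
            param.grad.mul_(masks[name])
    optimizer.step()

In this sketch, each domain would receive its own mask dictionary, so parameters selected by several domains' masks remain shared while the rest are domain-specific.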
In contrast to language, domain information may not necessarily be known at inference time. In this work, similar to common domain finetuning setups, we assume the domain information is known, but using a domain classifier at runtime should be straightforward. Given the domain information,