
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages

Alireza Mohammadshahi∗1,2,3   Vassilina Nikoulina1   Alexandre Berard1
Caroline Brun1   James Henderson2   Laurent Besacier1
1NAVER LABS Europe   2IDIAP Research Institute   3EPFL
{first.last}@naverlabs.com
{alireza.mohammadshahi,james.henderson}@idiap.ch
Abstract
In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100 (12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks (FLORES-101, Tatoeba, and TICO-19) and demonstrate that it outperforms previous massively multilingual models of comparable size (200-600M parameters) while improving inference latency and memory usage. Additionally, our model achieves comparable results to M2M-100 (1.2B), while being 3.6× smaller and 4.3× faster at inference.¹
1 Introduction
Neural Machine Translation (NMT) systems are usually trained on datasets consisting of millions of parallel sentences, and thus still perform poorly on low-resource languages, i.e., languages without a large amount of training data. Over the past few years, previous work has proposed several approaches to improve translation quality for low-resource languages, e.g., Multilingual Neural Machine Translation (MNMT) models (Johnson et al., 2017; Fan et al., 2020; Tang et al., 2021; Goyal et al., 2021), back-translation (Sennrich et al., 2016; Edunov et al., 2018), and unsupervised machine translation (Garcia et al., 2021; Ko et al., 2021). Massively MNMT models are particularly interesting for low-resource languages as they benefit the most from knowledge transfer from related languages (Arivazhagan et al., 2019). However, the curse of multilinguality hurts the performance of high-resource languages, so previous work has increased model size to maintain translation performance for both high- and low-resource languages. This makes the use of these massively MNMT models challenging in real-world resource-constrained environments.

∗ Work done during an internship at NAVER LABS Europe.
¹ The code and the pre-trained SMaLL-100 model are available at https://github.com/alirezamshi/small100.
To overcome this problem, we propose SMaLL-100, a Shallow Multilingual Machine Translation Model for Low-Resource Languages covering 100 languages, which is a distilled alternative to M2M-100 (12B) (Fan et al., 2020), the most recent and largest available multilingual NMT model. In this paper, we focus on very-low- and low-resource language pairs, as there is no reasonably sized universal model that achieves acceptable performance over a large number of low-resource languages. We do so by training SMaLL-100 on a perfectly balanced dataset.² While this leads to lower performance on high-resource languages, we claim that this loss is easily recoverable through further fine-tuning. We evaluate SMaLL-100 on several low-resource benchmarks: FLORES-101 (Goyal et al., 2021), Tatoeba (Tiedemann, 2020), and TICO-19 (Anastasopoulos et al., 2020).
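To make the sampling scheme concrete, the short sketch below contrasts uniform sampling over language pairs (the setting used here, see footnote 2) with the temperature-based, size-dependent sampling commonly used when training massively multilingual models. The corpus sizes and the temperature value are invented for illustration only and do not come from the paper.

# Illustrative sketch only: the corpus sizes below are made up.
sizes = {"en-fr": 40_000_000, "en-sw": 200_000, "ne-en": 50_000}

def temperature_sampling(sizes, T=5.0):
    """Size-dependent sampling: p(pair) proportional to size^(1/T).
    T=1 samples proportionally to data size; larger T flattens toward uniform."""
    weights = {pair: n ** (1.0 / T) for pair, n in sizes.items()}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

def uniform_sampling(sizes):
    """Uniform sampling as used for SMaLL-100: every language pair gets
    the same probability, regardless of its training-data size."""
    return {pair: 1.0 / len(sizes) for pair in sizes}

print(temperature_sampling(sizes))  # high-resource pairs are still sampled more often
print(uniform_sampling(sizes))      # each pair: 1/3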
To summarize, our contributions are as follows:
• We propose SMaLL-100, a shallow multilingual NMT model focusing on low-resource language pairs.
• We evaluate SMaLL-100 on several low-resource NMT benchmarks.
• We show that our model significantly outperforms previous multilingual models of comparable size while being faster at inference.
² All language pairs have the same sampling probability, regardless of their training data size.
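Since the code and the pre-trained SMaLL-100 checkpoint are publicly released (footnote 1), a minimal usage sketch is given below. It assumes the checkpoint can be loaded with the Hugging Face transformers M2M-100 model class together with the tokenizer shipped in the linked repository; the checkpoint identifier "alirezamsh/small100" and the SMALL100Tokenizer class are assumptions taken from that repository, which remains the authoritative reference.

# Minimal sketch; the checkpoint id and tokenizer class are assumptions based on
# https://github.com/alirezamshi/small100 (see its README for authoritative
# instructions). SMALL100Tokenizer is distributed there as tokenization_small100.py,
# not as part of the transformers library itself.
from transformers import M2M100ForConditionalGeneration
from tokenization_small100 import SMALL100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("alirezamsh/small100")
tokenizer = SMALL100Tokenizer.from_pretrained("alirezamsh/small100")

tokenizer.tgt_lang = "fr"  # SMaLL-100 encodes the target language on the source side
inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))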