
task, i.e. “wmt21-dense-24-wide-en-X” and “wmt21-dense-24-wide-X-en”, which were pre-trained on seven languages, Hausa (ha), Icelandic (is), Japanese (ja), Czech (cs), Russian (ru), Chinese (zh), and German (de), translated to and from English (en) (Tran et al., 2021). We use a carefully prepared clinical-domain corpus of 250k English-Spanish (en-es) segment pairs and demonstrate not only that successful transfer learning is possible for this explicitly new language pair, i.e. Spanish is entirely unseen among the languages of the MPLM, but also that domain knowledge transfers very successfully from the general and mixed domains to the clinical domain. In comparison to the massive MPLM (MMPLM) NLLB, which covers Spanish as a high-resource language at its pre-training stage, our transfer-learning model achieves very close evaluation scores on most sub-tasks (clinical-case and clinical-term translation) and even outperforms NLLB on the ontology-concept translation task under the COMET metric (Rei et al., 2020), using the ClinSpEn2022 test data from WMT22. This paper is a follow-up work reporting further findings based on our previous shared-task participation (Han et al., 2022).
2 Related Work
Regarding the early use of special tokens in NMT, Sennrich et al. (2016) designed the tokens T (from Latin tu) and V (from Latin vos) as familiar and polite indicators attached to the source sentences for English-to-German NMT. Yamagishi et al. (2016) designed the tokens <all-active>, <all-passive>, <reference> and <predict> to control the voice of Japanese-to-English NMT, i.e. whether the output is active, passive, reference-aware, or prediction-guided. Subsequently, Google’s MNMT system designed target-language indicator tokens, e.g. <2en> and <2jp>, to control translation into English and Japanese, respectively (Johnson et al., 2017). Google’s MNMT also designed mixed target-language translation control, e.g. (1 − α) <2ko> + α <2jp> specifies a weighted mixture of translation into Korean and Japanese.
We take one step further and use an existing language-controller token from an MPLM as a pseudo code to fine-tune translation into an external language that was entirely unseen during the pre-training stage.
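As a minimal, purely illustrative sketch of this pseudo-code idea (the specific token shown, <is>, is a hypothetical choice and not necessarily the one used in our experiments), an existing controller token can be prepended to every source segment of the new language pair before fine-tuning:

```python
# Sketch: reuse an existing target-language controller token as a pseudo code
# for an unseen target language, in the spirit of <2xx>-style tokens.
# Assumption (hypothetical): the Icelandic token "<is>" is reused for Spanish.

PSEUDO_TARGET_TOKEN = "<is>"  # existing controller token, reused for es

def add_pseudo_token(source_sentences):
    """Prefix each English source segment with the pseudo target token."""
    return [f"{PSEUDO_TARGET_TOKEN} {sent}" for sent in source_sentences]

english = ["The patient was discharged after three days."]
print(add_pseudo_token(english))
# ['<is> The patient was discharged after three days.']
```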
Regarding transfer-learning applications to downstream NLP tasks other than MT, Muller et al. (2021) applied transfer learning from MPLMs to unseen languages of different typologies for dependency parsing (DEP), named entity recognition (NER), and part-of-speech (POS) tagging. Ahuja et al. (2022) carried out zero-shot transfer learning for natural language inference (NLI) tasks such as question answering.
In this paper, we ask the following research question (RQ): Can Massive Multilingual Pre-Trained Language Models Create a Knowledge Space that Transfers to Entirely New Language (Pairs) and New (clinical) Domains for the Machine Translation Task via Fine-Tuning?
3 Model Settings
To investigate our RQ, we take Meta AI’s MNMT submission to the WMT21 shared task on news translation, i.e. the MMPLMs “wmt21-dense-24-wide-en-X” and “wmt21-dense-24-wide-X-en”, as our test base, and we name them the WMT21fb models (Tran et al., 2021)¹. They are conditional generation models with the same structure as the massive M2M-100 (Fan et al., 2021), with a total of 4.7 billion parameters, which makes fine-tuning computationally expensive. The WMT21fb models were trained on mixed-domain data using “all available resources” they had, for instance historical WMT challenges, large-scale data mining, and their in-domain back-translation. These models were then fine-tuned on the news domain for seven languages, Hausa, Icelandic, Japanese, Czech, Russian, Chinese, and German, from and to English.
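For illustration, the following sketch shows how such a checkpoint could be loaded and queried with the Hugging Face Transformers library, assuming the released weights are available on the model hub as facebook/wmt21-dense-24-wide-en-x and expose an M2M-100-style tokenizer; the forced_bos_token_id argument plays the role of the target-language controller token discussed in Section 2.

```python
# Sketch: load the WMT21fb en-X model and translate with an explicit
# target-language token (assumes the checkpoint is published on the
# Hugging Face hub as "facebook/wmt21-dense-24-wide-en-x").
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "facebook/wmt21-dense-24-wide-en-x"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # ~4.7B parameters

inputs = tokenizer("The patient was discharged after three days.",
                   return_tensors="pt")
# Force the decoder to start with the German language token; in our
# fine-tuning set-up an existing token like this can act as a pseudo code
# for the unseen Spanish target.
outputs = model.generate(**inputs,
                         forced_bos_token_id=tokenizer.get_lang_id("de"))
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```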
The challenging language we choose is Spanish, which did not appear in the training stages of the WMT21fb models. The fine-tuning corpus we use is extracted from the MeSpEn (Villegas et al., 2018) clinical-domain data, from which we obtained 250k English-Spanish segment pairs after data cleaning. They come from IBECS-descriptions, IBECS-titles, MedlinePlus-health_topics-titles, MedlinePlus-health_topics-descriptions,
¹https://github.com/facebookresearch/fairseq/tree/main/examples/wmt21