
Synergy with Translation Artifacts for
Training and Inference in Multilingual Tasks
Jaehoon Oh∗
Graduate School of DS, KAIST
jhoon.oh@kaist.ac.kr
Jongwoo Ko∗, Se-Young Yun
Graduate School of AI, KAIST
{jongwoo.ko, yunseyoung}@kaist.ac.kr
Abstract
Translation has played a crucial role in improving performance on multilingual tasks: (1) generating target language data from source language data for training and (2) generating source language data from target language data for inference. However, prior works have not considered using both translations simultaneously. This paper shows that combining them can synergize results on various multilingual sentence classification tasks. We empirically find that translation artifacts stylized by translators are the main factor behind the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, that account for translation artifacts. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and improves performance. Our code is available at https://github.com/jongwooko/MUSC.
1 Introduction
Large-scale pre-trained multilingual language models (Devlin et al., 2019; Conneau and Lample, 2019; Huang et al., 2019; Conneau et al., 2020; Luo et al., 2021) have shown promising transferability in zero-shot cross-lingual transfer (ZSXLT), where pre-trained language models (PLMs) are fine-tuned on a labeled task-specific dataset from a high-resource source language (e.g., English or Spanish) and then evaluated on zero-resource target languages. Multilingual PLMs yield a universal representation space across different languages, thereby improving multilingual task performance (Pires et al., 2019; Chen et al., 2019). Recent work has enhanced cross-lingual transferability by reducing the discrepancies between languages through translation-based approaches during fine-tuning (Fang et al., 2021; Zheng et al., 2021; Yang et al., 2022). Our paper focuses on the setting where translated datasets are available for cross-lingual transfer (XLT).
∗Equal contribution
Conneau et al. (2018) provided two translation-based XLT baselines: translate-train and translate-test. The former fine-tunes a multilingual PLM (e.g., multilingual BERT) on the original source language data together with machine translations of it into the target languages, and then evaluates on the target languages. The latter fine-tunes a source-language PLM (e.g., English BERT) on the original source language data only, and then evaluates on target-language test data machine-translated into the source language. Both baselines improve performance over ZSXLT; however, they are sensitive to the translator, including translation artifacts, which are characteristics stylized by the translator (Conneau et al., 2018; Artetxe et al., 2020).
Artetxe et al. (2020) showed that matching the types of text (i.e., original or translationese¹) between training and inference is essential under translate-test because of translation artifacts. Recently, Yu et al. (2022) proposed a training method that projects the original and translated texts into the same representation space under translate-train. However, prior works have not considered the two baselines simultaneously.
In this paper, we combine translate-train and translate-test using a pre-trained multilingual BERT to improve performance. Next, we identify that fine-tuning on the translated target dataset is required to improve performance on the translated source dataset because of translation artifacts, even if the languages for training and inference differ. Finally, to account for translation artifacts during fine-tuning, we adopt two training methods, supervised contrastive learning (SupCon; Khosla et al., 2020) and MixUp (Zhang et al., 2018), and propose MUSC, which combines them and improves performance on multilingual sentence classification tasks.
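To fix notation, below is a minimal PyTorch sketch of these two ingredients: the standard SupCon loss over L2-normalized sentence embeddings and vanilla MixUp over representations and one-hot labels. It is a generic illustration under our own variable names, not the exact MUSC implementation, which is specified later in the paper.

import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    # SupCon (Khosla et al., 2020): for each anchor, average the log-probability
    # of its same-label positives under a softmax over all other batch examples.
    z = F.normalize(z, dim=1)                        # (N, d) unit embeddings
    sim = z @ z.t() / tau                            # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_log_prob = torch.where(pos, log_prob, torch.zeros_like(log_prob))
    return -(pos_log_prob.sum(1) / pos.sum(1).clamp(min=1)).mean()

def mixup(h, y_onehot, alpha=0.2):
    # MixUp (Zhang et al., 2018): convex combinations of paired examples and
    # their labels, with the mixing ratio drawn from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(len(h), device=h.device)
    return lam * h + (1 - lam) * h[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]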
¹ Original text is directly written by humans. Translationese includes both human-translated and machine-translated texts.