Synergy with Translation Artifacts for
Training and Inference in Multilingual Tasks
Jaehoon Oh
Graduate School of DS, KAIST
jhoon.oh@kaist.ac.kr
Jongwoo Ko, Se-Young Yun
Graduate School of AI, KAIST
{jongwoo.ko, yunseyoung}@kaist.ac.kr
Abstract
Translation has played a crucial role in improving the performance on multilingual tasks: (1) to generate the target language data from the source language data for training and (2) to generate the source language data from the target language data for inference. However, prior works have not considered the use of both translations simultaneously. This paper shows that combining them can synergize the results on various multilingual sentence classification tasks. We empirically find that translation artifacts stylized by translators are the main factor of the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, considering translation artifacts. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and improves the performance. Our code is available at https://github.com/jongwooko/MUSC.
1 Introduction
Large-scale pre-trained multilingual language models (Devlin et al., 2019; Conneau and Lample, 2019; Huang et al., 2019; Conneau et al., 2020; Luo et al., 2021) have shown promising transferability in zero-shot cross-lingual transfer (ZSXLT), where pre-trained language models (PLMs) are fine-tuned using a labeled task-specific dataset from a rich-resource source language (e.g., English or Spanish) and then evaluated on zero-resource target languages. Multilingual PLMs yield a universal representation space across different languages, thereby improving multilingual task performance (Pires et al., 2019; Chen et al., 2019). Recent work has enhanced cross-lingual transferability by reducing the discrepancies between languages based on translation approaches during fine-tuning (Fang et al., 2021; Zheng et al., 2021; Yang et al., 2022). Our paper focuses on when translated datasets are available for cross-lingual transfer (XLT).
Equal contribution
Conneau et al. (2018) provided two translation-based XLT baselines: translate-train and translate-test. The former fine-tunes a multilingual PLM (e.g., multilingual BERT) using the original source language and machine-translated target languages simultaneously and then evaluates it on the target languages. Meanwhile, the latter fine-tunes a source language-based PLM (e.g., English BERT) using the original source language and then evaluates it on the machine-translated source language. Both baselines improve the performance compared to ZSXLT; however, they are sensitive to the translator, including translation artifacts, which are characteristics stylized by the translator (Conneau et al., 2018; Artetxe et al., 2020).

Artetxe et al. (2020) showed that matching the types of text (i.e., original or translationese¹) between training and inference is essential due to the presence of translation artifacts under translate-test. Recently, Yu et al. (2022) proposed a training method that projects the original and translated texts into the same representation space under translate-train. However, prior works have not considered the two baselines simultaneously.
In this paper, we combine translate-train and translate-test using a pre-trained multilingual BERT to improve the performance. Next, we identify that fine-tuning using the translated target dataset is required to improve the performance on the translated source dataset due to translation artifacts, even if the languages for training and inference are different. Finally, to consider translation artifacts during fine-tuning, we adopt two training methods, supervised contrastive learning (SupCon; Khosla et al., 2020) and MixUp (Zhang et al., 2018), and propose MUSC, which combines them and improves the performance for multilingual sentence classification tasks.
¹ Original text is directly written by humans. Translationese includes both human-translated and machine-translated texts.
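MUSC itself is specified later in the paper; as background for the two components it combines, the following is a minimal PyTorch sketch of a MixUp step and a single-view supervised contrastive (SupCon) loss. The function names, the Beta(alpha, alpha) mixing, and the idea of treating an original sentence and its translation as label-sharing positives are our illustrative assumptions, not the paper's exact MUSC recipe.

```python
import torch

def mixup(x, y, alpha=0.2):
    """MixUp (Zhang et al., 2018): train on convex combinations of examples.

    x: (B, D) sentence representations; y: (B,) integer labels.
    Returns mixed inputs, both label vectors, and the mixing weight, so the
    loss can be computed as lam * CE(out, y_a) + (1 - lam) * CE(out, y_b).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss (Khosla et al., 2020), single-view form.

    features: (B, D) L2-normalized representations; labels: (B,).
    Pairs that share a label are pulled together; assigning an original
    sentence and its translation the same label is how translation
    artifacts would enter the positive pairs.
    """
    sim = features @ features.T / temperature                 # (B, B) similarity logits
    self_mask = torch.eye(len(labels), dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask   # positives, excluding self-pairs
    exp_sim = torch.exp(sim).masked_fill(self_mask, 0.0)
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))
    mean_log_prob_pos = (pos.float() * log_prob).sum(1) / pos.float().sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```

How MUSC weights and schedules the two objectives is left to the paper's method section and repository; the sketch only fixes the shape of each ingredient.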
Table 1: Notations of datasets.

Notation    Description
S_trn       given source dataset for training
T_trn       given target dataset for training
T_trn^MT    machine-translated target dataset from S_trn for training
T_trn^BT    back-translated target dataset from T_trn for training
T_tst       given target dataset for inference
S_tst^MT    machine-translated source dataset from T_tst for inference
Table 2: Algorithm comparison.

Algorithm        PLM           Training           Inference
ZSXLT            Multilingual  S_trn              T_tst
translate-train  Multilingual  S_trn & T_trn^MT   T_tst
translate-test   English       S_trn              S_tst^MT
translate-all    Multilingual  S_trn & T_trn^MT   T_tst & S_tst^MT
2 Scope of the Study
In this study, four datasets are used: MARC and MLDoc for single sentence classification, and PAWSX and XNLI from XTREME (Hu et al., 2020) for sentence pair classification. The details of the datasets are provided in Appendix A. Each dataset consists of the source dataset for training S_trn and the target dataset for inference T_tst, where S_trn is original and T_tst is original (for MARC and MLDoc) or human-translated (for PAWSX and XNLI). For MARC and MLDoc, the original target dataset for training T_trn is additionally given.
We use the given translated datasets T_trn^MT for PAWSX and XNLI. However, for MARC and MLDoc, the translated datasets are not given. Therefore, we use an m2m_100_418M translator (Fan et al., 2021) from the open-source library EasyNMT² to create the translated datasets. T_trn^MT is translated from S_trn (i.e., S_trn → T_trn^MT), and T_trn^BT is back-translated from T_trn (i.e., T_trn → S_trn^MT → T_trn^BT; Sennrich et al., 2016). Similarly, for inference, S_tst^MT is translated from T_tst. The notations used in this paper are listed in Table 1.
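A minimal sketch of how these datasets could be produced with EasyNMT; the toy sentences and the English-to-German direction are our assumptions, not the paper's exact preprocessing pipeline.

```python
from easynmt import EasyNMT

# m2m_100_418M translator (Fan et al., 2021) loaded through EasyNMT.
model = EasyNMT('m2m_100_418M')

s_trn = ["the movie was surprisingly good."]  # toy stand-in for S_trn (English)
t_trn = ["der Film war überraschend gut."]    # toy stand-in for T_trn (German)

# S_trn -> T_trn^MT: machine-translate the source training data into the target language.
t_mt_trn = model.translate(s_trn, source_lang='en', target_lang='de')

# T_trn -> S_trn^MT -> T_trn^BT: back-translation (Sennrich et al., 2016).
s_mt_trn = model.translate(t_trn, source_lang='de', target_lang='en')
t_bt_trn = model.translate(s_mt_trn, source_lang='en', target_lang='de')
```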
We use the pre-trained cased multilingual BERT (Devlin et al., 2019) from HuggingFace Transformers (Wolf et al., 2020) and use accuracy as a metric. Detailed information for fine-tuning is provided in Appendix B.
² https://github.com/UKPLab/EasyNMT
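For reference, a minimal sketch of this model setup; the checkpoint name follows the HuggingFace hub convention, and num_labels=5 (MARC's 1-5 star ratings) is our assumption for illustration and changes per task.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Cased multilingual BERT (Devlin et al., 2019) from HuggingFace Transformers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=5
)
```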
Table 3: Results according to the inference datasets (Acc. in %). S_trn and T_trn^MT are used for training. The number in parentheses for MLDoc is the number of training samples. 'Ens.' indicates the ensemble of the results on the two different test datasets at inference. XNLI results are reported in Appendix C.

Dataset        Inference  EN    ZH    FR    DE    RU    ES    IT    KO    JA    Avg.
MARC           T_tst      65.2  47.8  55.4  59.1  -     55.8  -     -     47.8  55.1
               S_tst^MT   65.2  44.9  54.4  59.8  -     55.4  -     -     44.9  54.5
               Ens.       65.2  49.3  56.1  61.2  -     56.2  -     -     48.8  56.1
MLDoc (1000)   T_tst      91.1  77.4  74.5  84.0  67.9  74.4  65.0  -     74.4  76.1
               S_tst^MT   91.1  77.6  79.0  88.1  61.3  76.4  72.3  -     67.3  76.6
               Ens.       91.1  78.9  78.3  87.9  66.1  76.2  71.2  -     74.9  78.1
MLDoc (10000)  T_tst      97.4  82.6  91.1  91.0  72.2  85.9  78.0  -     72.6  83.8
               S_tst^MT   97.4  86.4  92.0  92.6  72.4  88.2  79.0  -     71.0  84.9
               Ens.       97.4  87.7  92.2  92.6  72.1  88.0  80.6  -     75.9  85.8
PAWSX          T_tst      94.5  85.0  91.2  89.0  -     90.5  -     83.1  83.3  88.1
               S_tst^MT   94.5  84.5  91.7  90.6  -     91.3  -     83.1  80.9  88.1
               Ens.       94.5  86.1  92.0  91.2  -     91.6  -     85.3  82.8  89.1
3 Original and Translationese Ensemble
In this section, we demonstrate that the two baselines, translate-train and translate-test, are easily combined to improve performance, which we call translate-all. Table 2 describes the differences between the algorithms.
Table 3 presents the results according to the inference dataset when the models are fine-tuned using S_trn and T_trn^MT. Inference on T_tst is the standard way to evaluate the models, i.e., translate-train. In addition, we evaluate the models on S_tst^MT like translate-test. Furthermore, we ensemble the two results from the different test datasets by averaging the predicted probabilities, i.e., translate-all, because averaging the predictions over models or data points is widely used to improve predictive performance and uncertainty estimation of models (Gontijo-Lopes et al., 2022; Kim et al., 2020a).
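A minimal sketch of this ensemble at inference time, assuming a HuggingFace-style sequence classification model; the function name and batching are ours.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def translate_all_predict(model, batch_tgt, batch_src_mt):
    """translate-all inference: average class probabilities over T_tst and
    S_tst^MT (the same examples, machine-translated into the source language).

    batch_tgt / batch_src_mt: tokenized encodings of aligned test examples,
    e.g., dicts produced by a HuggingFace tokenizer.
    """
    p_tgt = F.softmax(model(**batch_tgt).logits, dim=-1)     # probabilities on T_tst
    p_src = F.softmax(model(**batch_src_mt).logits, dim=-1)  # probabilities on S_tst^MT
    return ((p_tgt + p_src) / 2).argmax(dim=-1)              # ensembled prediction
```

Averaging probabilities rather than logits matches the description above; either variant is common in test-time augmentation.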
Table 3 shows that even if the multilingual PLMs are fine-tuned with S_trn and T_trn^MT, the performance on the translated source data S_tst^MT is competitive with that on the target data T_tst. Furthermore, ensemble inference increases the performance on all datasets. This can be interpreted as the effectiveness of test-time augmentation (Kim et al., 2020a; Ashukha et al., 2021), because the results on the two test datasets, T_tst and S_tst^MT (augmented from T_tst), are combined.
To explain the changes in inference via test-time augmentation, we describe the predicted probability values on the correct label when the models are evaluated on T_tst and S_tst^MT, as depicted in Figure 1. The green and orange dots represent the benefits