Synergy with Translation Artifacts for
Training and Inference in Multilingual Tasks
Jaehoon Oh
Graduate School of DS, KAIST
jhoon.oh@kaist.ac.kr
Jongwoo Ko, Se-Young Yun
Graduate School of AI, KAIST
{jongwoo.ko, yunseyoung}@kaist.ac.kr
Abstract
Translation has played a crucial role in improving the performance on multilingual tasks: (1) to generate the target language data from the source language data for training and (2) to generate the source language data from the target language data for inference. However, prior works have not considered the use of both translations simultaneously. This paper shows that combining them can synergize the results on various multilingual sentence classification tasks. We empirically find that translation artifacts stylized by translators are the main factor of the performance gain. Based on this analysis, we adopt two training methods, SupCon and MixUp, considering translation artifacts. Furthermore, we propose a cross-lingual fine-tuning algorithm called MUSC, which uses SupCon and MixUp jointly and improves the performance. Our code is available at https://github.com/jongwooko/MUSC.
1 Introduction
Large-scale pre-trained multilingual language models (Devlin et al., 2019; Conneau and Lample, 2019; Huang et al., 2019; Conneau et al., 2020; Luo et al., 2021) have shown promising transferability in zero-shot cross-lingual transfer (ZSXLT), where pre-trained language models (PLMs) are fine-tuned using a labeled task-specific dataset from a rich-resource source language (e.g., English or Spanish) and then evaluated on zero-resource target languages. Multilingual PLMs yield a universal representation space across different languages, thereby improving multilingual task performance (Pires et al., 2019; Chen et al., 2019). Recent work has enhanced cross-lingual transferability by reducing the discrepancies between languages based on translation approaches during fine-tuning (Fang et al., 2021; Zheng et al., 2021; Yang et al., 2022). Our paper focuses on when translated datasets are available for cross-lingual transfer (XLT).
Equal contribution
Conneau et al. (2018) provided two translation-based XLT baselines: translate-train and translate-test. The former fine-tunes a multilingual PLM (e.g., multilingual BERT) using the original source language and machine-translated target languages simultaneously and then evaluates it on the target languages. Meanwhile, the latter fine-tunes a source language-based PLM (e.g., English BERT) using the original source language and then evaluates it on the machine-translated source language. Both baselines improve the performance compared to ZSXLT; however, they are sensitive to the translator, including translation artifacts, which are characteristics stylized by the translator (Conneau et al., 2018; Artetxe et al., 2020).

Artetxe et al. (2020) showed that matching the types of text (i.e., original or translationese¹) between training and inference is essential due to the presence of translation artifacts under translate-test. Recently, Yu et al. (2022) proposed a training method that projects the original and translated texts into the same representation space under translate-train. However, prior works have not considered the two baselines simultaneously.
In this paper, we combine translate-train and translate-test using a pre-trained multilingual BERT to improve the performance. Next, we identify that fine-tuning using the translated target dataset is required to improve the performance on the translated source dataset due to translation artifacts, even if the languages for training and inference are different. Finally, to consider translation artifacts during fine-tuning, we adopt two training methods, supervised contrastive learning (SupCon; Khosla et al., 2020) and MixUp (Zhang et al., 2018), and propose MUSC, which combines them and improves the performance for multilingual sentence classification tasks.
¹ Original text is directly written by humans. Translationese includes both human-translated and machine-translated texts.
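MUSC itself is specified later in the paper; as background for the two components it combines, the following is a minimal PyTorch sketch of a MixUp step and a single-view supervised contrastive (SupCon) loss. The function names, the Beta(alpha, alpha) mixing, and the idea of treating an original sentence and its translation as label-sharing positives are our illustrative assumptions, not the paper's exact MUSC recipe.

```python
import torch

def mixup(x, y, alpha=0.2):
    """MixUp (Zhang et al., 2018): train on convex combinations of examples.

    x: (B, D) sentence representations; y: (B,) integer labels.
    Returns mixed inputs, both label vectors, and the mixing weight, so the
    loss can be computed as lam * CE(out, y_a) + (1 - lam) * CE(out, y_b).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss (Khosla et al., 2020), single-view form.

    features: (B, D) L2-normalized representations; labels: (B,).
    Pairs that share a label are pulled together; assigning an original
    sentence and its translation the same label is how translation
    artifacts would enter the positive pairs.
    """
    sim = features @ features.T / temperature                 # (B, B) similarity logits
    self_mask = torch.eye(len(labels), dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask   # positives, excluding self-pairs
    exp_sim = torch.exp(sim).masked_fill(self_mask, 0.0)
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))
    mean_log_prob_pos = (pos.float() * log_prob).sum(1) / pos.float().sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```

How MUSC weights and schedules the two objectives is left to the paper's method section and repository; the sketch only fixes the shape of each ingredient.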
Table 1: Notations of datasets.

Notation    Description
S_trn       given source dataset for training
T_trn       given target dataset for training
T_trn^MT    machine-translated target dataset from S_trn for training
T_trn^BT    back-translated target dataset from T_trn for training
T_tst       given target dataset for inference
S_tst^MT    machine-translated source dataset from T_tst for inference
Table 2: Algorithm comparison.

Algorithm        PLM           Training           Inference
ZSXLT            Multilingual  S_trn              T_tst
translate-train  Multilingual  S_trn & T_trn^MT   T_tst
translate-test   English       S_trn              S_tst^MT
translate-all    Multilingual  S_trn & T_trn^MT   T_tst & S_tst^MT
2 Scope of the Study
In this study, four datasets are used: MARC and MLDoc for single sentence classification, and PAWSX and XNLI from XTREME (Hu et al., 2020) for sentence pair classification. The details of the datasets are provided in Appendix A. Each dataset consists of the source dataset for training S_trn and the target dataset for inference T_tst, where S_trn is original and T_tst is original (for MARC and MLDoc) or human-translated (for PAWSX and XNLI). For MARC and MLDoc, the original target dataset for training T_trn is additionally given.
We use the given translated datasets T_trn^MT for PAWSX and XNLI. However, for MARC and MLDoc, the translated datasets are not given. Therefore, we use an m2m_100_418M translator (Fan et al., 2021) from the open-source library EasyNMT² to create the translated datasets. T_trn^MT is translated from S_trn (i.e., S_trn → T_trn^MT), and T_trn^BT is back-translated from T_trn (i.e., T_trn → S_trn^MT → T_trn^BT; Sennrich et al., 2016). Similarly, for inference, S_tst^MT is translated from T_tst. The notations used in this paper are listed in Table 1.
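A minimal sketch of how these datasets could be produced with EasyNMT; the toy sentences and the English-to-German direction are our assumptions, not the paper's exact preprocessing pipeline.

```python
from easynmt import EasyNMT

# m2m_100_418M translator (Fan et al., 2021) loaded through EasyNMT.
model = EasyNMT('m2m_100_418M')

s_trn = ["the movie was surprisingly good."]  # toy stand-in for S_trn (English)
t_trn = ["der Film war überraschend gut."]    # toy stand-in for T_trn (German)

# S_trn -> T_trn^MT: machine-translate the source training data into the target language.
t_mt_trn = model.translate(s_trn, source_lang='en', target_lang='de')

# T_trn -> S_trn^MT -> T_trn^BT: back-translation (Sennrich et al., 2016).
s_mt_trn = model.translate(t_trn, source_lang='de', target_lang='en')
t_bt_trn = model.translate(s_mt_trn, source_lang='en', target_lang='de')
```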
We use the pre-trained cased multilingual BERT (Devlin et al., 2019) from HuggingFace Transformers (Wolf et al., 2020) and use accuracy as a metric. Detailed information for fine-tuning is provided in Appendix B.
² https://github.com/UKPLab/EasyNMT
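For reference, a minimal sketch of this model setup; the checkpoint name follows the HuggingFace hub convention, and num_labels=5 (MARC's 1-5 star ratings) is our assumption for illustration and changes per task.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Cased multilingual BERT (Devlin et al., 2019) from HuggingFace Transformers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=5
)
```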
Table 3: Results according to the inference datasets (Acc. in %). S_trn and T_trn^MT are used for training. The number in parentheses for MLDoc is the number of training samples. 'Ens.' indicates the ensemble of the results on the two different test datasets at inference. XNLI results are reported in Appendix C.

Dataset        Inference  EN    ZH    FR    DE    RU    ES    IT    KO    JA    Avg.
MARC           T_tst      65.2  47.8  55.4  59.1  -     55.8  -     -     47.8  55.1
               S_tst^MT   65.2  44.9  54.4  59.8  -     55.4  -     -     44.9  54.5
               Ens.       65.2  49.3  56.1  61.2  -     56.2  -     -     48.8  56.1
MLDoc (1000)   T_tst      91.1  77.4  74.5  84.0  67.9  74.4  65.0  -     74.4  76.1
               S_tst^MT   91.1  77.6  79.0  88.1  61.3  76.4  72.3  -     67.3  76.6
               Ens.       91.1  78.9  78.3  87.9  66.1  76.2  71.2  -     74.9  78.1
MLDoc (10000)  T_tst      97.4  82.6  91.1  91.0  72.2  85.9  78.0  -     72.6  83.8
               S_tst^MT   97.4  86.4  92.0  92.6  72.4  88.2  79.0  -     71.0  84.9
               Ens.       97.4  87.7  92.2  92.6  72.1  88.0  80.6  -     75.9  85.8
PAWSX          T_tst      94.5  85.0  91.2  89.0  -     90.5  -     83.1  83.3  88.1
               S_tst^MT   94.5  84.5  91.7  90.6  -     91.3  -     83.1  80.9  88.1
               Ens.       94.5  86.1  92.0  91.2  -     91.6  -     85.3  82.8  89.1
3 Original and Translationese Ensemble
In this section, we demonstrate that the two baselines, translate-train and translate-test, are easily combined to improve performance, which we call translate-all. Table 2 describes the differences between the algorithms.
Table 3 presents the results according to the inference dataset when the models are fine-tuned using S_trn and T_trn^MT. Inference on T_tst is the standard way to evaluate the models, i.e., translate-train. In addition, we evaluate the models on S_tst^MT like translate-test. Furthermore, we ensemble the two results from the different test datasets by averaging the predicted probabilities, i.e., translate-all, because averaging the predictions over models or data points is widely used to improve predictive performance and uncertainty estimation of models (Gontijo-Lopes et al., 2022; Kim et al., 2020a).
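A minimal sketch of this ensemble at inference time, assuming a HuggingFace-style sequence classification model; the function name and batching are ours.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def translate_all_predict(model, batch_tgt, batch_src_mt):
    """translate-all inference: average class probabilities over T_tst and
    S_tst^MT (the same examples, machine-translated into the source language).

    batch_tgt / batch_src_mt: tokenized encodings of aligned test examples,
    e.g., dicts produced by a HuggingFace tokenizer.
    """
    p_tgt = F.softmax(model(**batch_tgt).logits, dim=-1)     # probabilities on T_tst
    p_src = F.softmax(model(**batch_src_mt).logits, dim=-1)  # probabilities on S_tst^MT
    return ((p_tgt + p_src) / 2).argmax(dim=-1)              # ensembled prediction
```

Averaging probabilities rather than logits matches the description above; either variant is common in test-time augmentation.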
Table 3 shows that even if the multilingual PLMs are fine-tuned with S_trn and T_trn^MT, the performance on the translated source data S_tst^MT is competitive with that on the target data T_tst. Furthermore, ensemble inference increases the performance on all datasets. This can be interpreted as the effectiveness of test-time augmentation (Kim et al., 2020a; Ashukha et al., 2021), because the results on the two test datasets, T_tst and S_tst^MT (augmented from T_tst), are combined.
To explain the changes in inference via test-time augmentation, we describe the predicted probability values on the correct label when the models are evaluated on T_tst and S_tst^MT, as depicted in Figure 1. The green and orange dots represent the benefits