
prompt tuning is better than fine-tuning on cross-lingual transfer. Our contributions are summarized as follows: we show that prompt tuning can perform much better than fine-tuning for cross-lingual transfer; we also show that prompt tuning works better for cross-lingual transfer because it makes relatively small but robust changes to the originally learned representations.
2 Prompt-Tuning for Cross-Lingual Tasks
Multilingual Language Models.
In the past few years, many pre-trained multilingual language models have been released: mBERT, XLM (Conneau and Lample, 2019), XLM-R (Conneau et al., 2020), etc. XLM-R (Conneau et al., 2020) significantly outperforms multilingual BERT (mBERT; Devlin et al., 2019) on a variety of cross-lingual benchmarks in XTREME (Hu et al., 2020). In some previous work (Luo et al., 2021; Zhang et al., 2019), XLM-R is also used as the initialization for another round of pretraining on parallel data to obtain stronger cross-lingual ability. Previously, in cross-lingual evaluation, models were fine-tuned on the English training data but evaluated on all target languages. To the best of our knowledge, we are the first to explore prompt tuning on several hard multilingual NLP tasks, including structure prediction and question answering.
Figure 1: Two different approaches for cross-lingual evaluation with a large multilingual language model. Left: In fine-tuning, all model parameters are tuned on English task data; this is the standard setting in previous cross-lingual evaluation. Right: In prompt tuning, only a small fraction of the parameters is tuned. We use prefix prompts with separate prompts for each layer in our experiments.
Prompt Tuning.
Fine-tuning large pre-trained language models leads to strong performance on downstream tasks; however, it is memory-consuming, and a full copy of the parameters has to be saved for each task. In prompt tuning, only a small portion of the parameters (e.g., prompts or a task classifier) is tuned during learning. However, it usually does not perform as well as fine-tuning. Recently, Lester et al. (2021) find that prompt tuning can match fine-tuning when the model size is extremely large (around 10 billion parameters). Prefix-tuning (Li and Liang, 2021) obtains comparable performance on natural language generation tasks. Liu et al. (2022) show that prompt tuning can match fine-tuning on language understanding tasks, even on hard sequence tagging tasks.
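As a rough, back-of-the-envelope illustration of this storage argument (the dimensions and prompt length below are assumed example values, not figures from our experiments), compare the per-task parameter counts of full fine-tuning and of layer-wise prefix prompts for an XLM-R-large-sized model:

```python
# Illustrative per-task parameter counts (assumed values).
full_finetune = 560_000_000                  # every backbone parameter is task-specific

n_layers, hidden, prompt_len = 24, 1024, 16  # XLM-R large-like dimensions
# One key-prompt and one value-prompt vector per position, per layer.
prompt_tuning = n_layers * 2 * prompt_len * hidden

print(f"prompt parameters: {prompt_tuning:,}")                             # 786,432
print(f"fraction of the full model: {prompt_tuning / full_finetune:.2%}")  # ~0.14%
```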
We investigate prompt tuning for cross-lingual understanding with a pre-trained multilingual language model. The framework is shown in Figure 1. Our setting is similar to Li and Liang (2021) and Liu et al. (2022). The continuous prompts are added as prefix tokens and tuned during learning. In the implementation, the prompts are fed to the model as past keys and values in each transformer layer, and each transformer layer has its own separate prompts. These continuous prompts are optimized, while the multilingual language model parameters are kept frozen.
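The sketch below is a minimal PyTorch illustration of this design rather than our exact implementation: each layer gets its own learnable (key, value) prompt, which is expanded per batch and handed to the backbone as Hugging Face-style past_key_values. The module name PrefixPrompts and all dimensions (e.g., prompt_len=16, XLM-R-large-like sizes) are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PrefixPrompts(nn.Module):
    """Layer-wise prefix prompts: every transformer layer receives its own
    learnable key/value prompt vectors, while the backbone stays frozen."""

    def __init__(self, n_layers=24, n_heads=16, head_dim=64, prompt_len=16):
        super().__init__()
        # One key-prompt and one value-prompt matrix per layer.
        self.prompts = nn.Parameter(
            0.02 * torch.randn(n_layers, 2, prompt_len, n_heads, head_dim)
        )

    def forward(self, batch_size):
        # Reorder to (n_layers, 2, batch, n_heads, prompt_len, head_dim),
        # the layout expected for Hugging Face-style past_key_values.
        p = self.prompts.permute(0, 1, 3, 2, 4)
        p = p.unsqueeze(2).expand(-1, -1, batch_size, -1, -1, -1)
        # One (key, value) pair per transformer layer.
        return tuple((layer[0], layer[1]) for layer in p)

# Usage sketch, assuming `backbone` is a frozen multilingual encoder whose
# forward pass accepts past_key_values (variable names are hypothetical):
#   prefix = PrefixPrompts()
#   past = prefix(input_ids.size(0))
#   prompt_mask = torch.ones(input_ids.size(0), 16)
#   out = backbone(input_ids,
#                  attention_mask=torch.cat([prompt_mask, attention_mask], 1),
#                  past_key_values=past)
# Only prefix.prompts (plus a small task head) receive gradient updates.
```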
3 Experiments Setup
3.1 Datasets.
We perform experiments on the following datasets included in XTREME: cross-lingual natural language inference (XNLI; Conneau et al., 2018), the cross-lingual adversarial dataset for paraphrase identification (PAWS-X; Yang et al., 2019), part-of-speech tagging on Universal Dependencies (UD-POS; Nivre et al., 2018), and cross-lingual question answering on XQuAD (Artetxe et al., 2020) and TyDiQA-GoldP (Clark et al., 2020). Three categories of downstream tasks are covered: (1) sentence classification; (2) structure prediction; (3) question answering.
3.2 Training Details.
Our frozen models are built on top of the pre-trained XLM-R checkpoint of LARGE size, with about 560M parameters. Previous work (Hu et al., 2020) shows that it achieves stronger performance than mBERT.² All our experiments were run with Huggingface (Wolf et al., 2020). More details are in the appendix.

²Some preliminary results are obtained with mBERT.
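For reference, a minimal sketch of how the frozen backbone can be set up with Huggingface Transformers (the checkpoint name is the standard public one; the rest is an assumed setup rather than our exact training script):

```python
from transformers import AutoModel

# Load the pre-trained XLM-R checkpoint of LARGE size (~560M parameters).
backbone = AutoModel.from_pretrained("xlm-roberta-large")
print(f"{sum(p.numel() for p in backbone.parameters()):,} parameters")

# Freeze the backbone: only the prompts (and a small task head) are tuned.
for param in backbone.parameters():
    param.requires_grad = False
```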
Prompt Length.
Prompt length usually plays an important role in prompt tuning. In our experiments, we treat it as a hyper-parameter. Longer