Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual
Understanding With Multilingual Language Models
Lifu Tu and Caiming Xiong and Yingbo Zhou
Salesforce AI Research
{ltu,cxiong,yingbo.zhou}@salesforce.com
Abstract
Pre-trained multilingual language models show significant performance gains for zero-shot cross-lingual model transfer on a wide range of natural language understanding (NLU) tasks. Previously, for zero-shot cross-lingual evaluation, pre-trained models are only fine-tuned on English data and tested on a variety of target languages. In this paper, we perform cross-lingual evaluation on various NLU tasks (sentence classification, sequence labeling, question answering) using prompt tuning and compare it with fine-tuning. The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets, with only 0.1% to 0.3% of the parameters tuned. Additionally, our analysis demonstrates that prompt tuning yields representations with better cross-lingual transferability on downstream tasks and better-aligned decision boundaries. Our code is available at https://github.com/salesforce/MPT.
1 Introduction
Large multilingual language models (Pires et al., 2019; Wu and Dredze, 2019; Conneau et al., 2020) show surprisingly impressive zero-shot cross-lingual transfer on NLP tasks, even though they are trained only on monolingual corpora. Recently, large-scale benchmarks such as XTREME (Hu et al., 2020) and XGLUE (Liang et al., 2020) have been introduced for cross-lingual evaluation.
In the cross-lingual transfer setting, models are fine-tuned on task-specific annotations in one language only and evaluated in other languages. During fine-tuning, pre-trained language models are used for initialization and all model parameters are tuned on downstream tasks. While fine-tuning obtains strong performance, it is inefficient. Also, as shown in Hu et al. (2020), the cross-lingual transfer gap between performance on the English test set and on all other languages is large, even with the best baseline, XLM-R (Conneau et al., 2020).
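To make the notion of the transfer gap concrete, it is simply the difference between English test performance and the average performance over the other evaluation languages, following the XTREME benchmark. The tiny sketch below uses made-up accuracy numbers purely for illustration.

    # Hypothetical per-language accuracies for one task (illustrative numbers only).
    scores = {"en": 88.7, "de": 82.5, "zh": 78.3, "sw": 69.1}

    english = scores["en"]
    others = [v for lang, v in scores.items() if lang != "en"]
    transfer_gap = english - sum(others) / len(others)
    print(f"cross-lingual transfer gap: {transfer_gap:.1f}")  # -> 12.1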
Recently, prompt tuning has emerged, in which only a small number of additional parameters (i.e., prompts) are added and tuned while the original model is kept frozen. Far fewer parameters (or none) are tuned, and thus training is much more efficient. Prompt tuning still performs worse than fine-tuning on many NLP tasks (Brown et al., 2020; Shin et al., 2020; Zhong et al., 2021). More recently, Li and Liang (2021), Lester et al. (2021), and Hambardzumyan et al. (2021) indicate that prompt tuning is competitive with fine-tuning on some NLU tasks. Language model capacity (e.g., 10 billion parameters) is a key ingredient for these approaches to succeed. Liu et al. (2022) show that prompt tuning can also be comparable on several hard monolingual sequence labeling tasks such as extractive question answering.
In this paper, we investigate the effect of prompt tuning on cross-lingual tasks. We freeze the entire multilingual language model and tune task prompts on the English training set for downstream tasks (sentence classification, structured prediction, question answering). Even with a medium-sized multilingual language model (fewer than 1 billion parameters), prompt tuning achieves much higher performance than fine-tuning on various NLU tasks.
Our analysis shows that prompt tuning makes fewer changes to sentence representations than fine-tuning and preserves good cross-lingual sentence representations. We also find that, after prompt tuning on English data, the decision boundaries for sentence representations in different languages are well aligned, whereas after fine-tuning they differ substantially across languages. These aligned decision boundaries can lead to stronger cross-lingual transfer.
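As a rough illustration of how such a representation analysis could be carried out (this is an illustrative sketch, not the exact analysis procedure), one can compare sentence embeddings from the frozen pre-trained encoder with those from a tuned model, e.g., via mean pooling and cosine similarity. The tuned-checkpoint path below is hypothetical.

    import torch
    from transformers import AutoModel, AutoTokenizer  # assumes HuggingFace transformers

    def sentence_reps(model, tokenizer, sentences):
        # Mean-pooled last hidden states as sentence representations (an illustrative choice).
        batch = tokenizer(sentences, padding=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    tok = AutoTokenizer.from_pretrained("xlm-roberta-large")
    pretrained = AutoModel.from_pretrained("xlm-roberta-large")
    tuned = AutoModel.from_pretrained("path/to/tuned-checkpoint")  # hypothetical checkpoint

    sents = ["A man is playing a guitar.", "Un homme joue de la guitare."]
    before = sentence_reps(pretrained, tok, sents)
    after = sentence_reps(tuned, tok, sents)
    drift = torch.nn.functional.cosine_similarity(before, after).mean()
    print(f"average cosine similarity before vs. after tuning: {drift:.3f}")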
This work sheds light on the strong cross-lingual ability of prompt tuning. Our results suggest that prompt tuning is better than fine-tuning for cross-lingual transfer. Our contributions are summarized as follows: we show that prompt tuning can perform much better than fine-tuning for cross-lingual transfer; we also show that prompt tuning works better for cross-lingual transfer because of the relatively small, robust changes it makes to the originally learned representations.
2 Prompt-Tuning for Cross-Lingual Tasks
Multilingual Language Models. In the past years, many pre-trained multilingual language models have been released: mBERT, XLM (Conneau and Lample, 2019), XLM-R (Conneau et al., 2020), etc. XLM-R (Conneau et al., 2020) significantly outperforms multilingual BERT (mBERT; Devlin et al., 2019) on a variety of cross-lingual benchmarks such as XTREME (Hu et al., 2020). In some previous work (Luo et al., 2021; Zhang et al., 2019), XLM-R is also used as the initialization for another round of pretraining with parallel data to obtain stronger cross-lingual ability. Previously, in cross-lingual evaluation, models are fine-tuned on English training data and evaluated on all target languages. As far as we know, we are the first to explore prompt tuning on several hard multilingual NLP tasks, including structured prediction and question answering.
Figure 1: Two approaches for cross-lingual evaluation with a large multilingual language model. Left: in fine-tuning, all model parameters are tuned on English task data; this is the setting previously used for cross-lingual evaluation. Right: in prompt tuning, only a small fraction of the parameters is tuned. We use prefix prompts with per-layer prompts in our experiments.
Prompt Tuning. Fine-tuning large pre-trained language models leads to strong performance on downstream tasks; however, it is memory-consuming, and a large number of parameters must be saved for each task. In prompt tuning, only a small subset of parameters (e.g., prompts or a task classifier) is tuned during learning. However, it usually does not perform as well as fine-tuning. Recently, Lester et al. (2021) found that prompt tuning can be competitive with fine-tuning when the model size is very large (around 10 billion parameters). Prefix-tuning (Li and Liang, 2021) obtains comparable performance on natural language generation tasks. Liu et al. (2022) show that prompt tuning can match fine-tuning on language understanding tasks, even on hard sequence tagging tasks.
We investigate prompt tuning for cross-lingual understanding with a pre-trained multilingual language model. The framework is shown in Figure 1. Our setting is similar to Li and Liang (2021) and Liu et al. (2022). The continuous prompts are added as prefix tokens and tuned during learning. In the implementation, the prompts act as past keys and values in each transformer layer, and each transformer layer has its own separate prompts. These continuous prompts are optimized, while the multilingual language model parameters are kept frozen.
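As a concrete illustration, the following is a minimal PyTorch sketch of per-layer prefix prompts in this style. The class and argument names are illustrative, and feeding the prompts to the frozen encoder through a past-key-values mechanism (with the attention mask extended by the prompt length) is an implementation assumption here, not necessarily the exact released implementation.

    import torch
    import torch.nn as nn

    class PrefixEncoder(nn.Module):
        """Illustrative per-layer prefix prompts (names and shapes are assumptions).

        Produces `prompt_length` virtual tokens per transformer layer, shaped as
        past keys and values so a frozen encoder can attend to them as a prefix.
        """

        def __init__(self, prompt_length, num_layers, num_heads, head_dim):
            super().__init__()
            self.shape = (num_layers, 2, prompt_length, num_heads, head_dim)
            # The only trainable parameters: one key prompt and one value prompt per layer.
            self.prefix = nn.Parameter(torch.randn(*self.shape) * 0.02)

        def forward(self, batch_size):
            # Expand across the batch: [layers, 2, batch, heads, prompt_len, head_dim].
            kv = self.prefix.unsqueeze(2).expand(-1, -1, batch_size, -1, -1, -1)
            kv = kv.permute(0, 1, 2, 4, 3, 5)
            # One (key, value) pair per transformer layer.
            return tuple((layer[0], layer[1]) for layer in kv)

In HuggingFace-style encoders, such (key, value) pairs can be supplied through the past-key-values interface so that each layer attends over prompt_length extra positions while none of the encoder's own weights are updated.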
3 Experimental Setup
3.1 Datasets.
We perform experiments on the following datasets included in XTREME: cross-lingual natural language inference (XNLI; Conneau et al., 2018), the cross-lingual adversarial dataset for paraphrase identification (PAWS-X; Yang et al., 2019), part-of-speech tagging on the Universal Dependencies treebanks (UD-POS; Nivre et al., 2018), and cross-lingual question answering on XQuAD (Artetxe et al., 2020) and TyDiQA-GoldP (Clark et al., 2020). Three categories of downstream tasks are covered: (1) sentence classification; (2) structured prediction; (3) question answering.
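For reference, these benchmarks are commonly available through the HuggingFace datasets library; the identifiers and configuration names below reflect how the corpora are typically exposed on the Hub and are assumptions rather than the exact data pipeline used here (the UD-POS data is omitted from this sketch).

    from datasets import load_dataset  # assumes the HuggingFace `datasets` package

    # Sentence classification: train on English, evaluate zero-shot on target languages.
    xnli_en_train = load_dataset("xnli", "en", split="train")
    xnli_fr_test = load_dataset("xnli", "fr", split="test")
    pawsx_de_test = load_dataset("paws-x", "de", split="test")

    # Cross-lingual question answering (evaluation splits for zero-shot transfer).
    xquad_ar = load_dataset("xquad", "xquad.ar", split="validation")
    tydiqa_goldp = load_dataset("tydiqa", "secondary_task", split="validation")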
3.2 Training Details.
Our frozen models are built on top of the pre-trained XLM-R checkpoint of LARGE size, with about 560M parameters. Previous work (Hu et al., 2020) shows that it achieves stronger performance than mBERT (some of our preliminary results were also obtained with mBERT). All our experiments were run with HuggingFace Transformers (Wolf et al., 2020). More details are in the appendix.
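As a hedged sketch of this setup (the hyperparameter values and the tiny task head are placeholders, not the reported configuration), one can load the XLM-R LARGE checkpoint from HuggingFace, freeze all of its roughly 560M parameters, and confirm that only the prompt parameters and a small head remain trainable, on the order of the 0.1% to 0.3% quoted above.

    import torch
    import torch.nn as nn
    from transformers import XLMRobertaModel  # assumes HuggingFace transformers

    model = XLMRobertaModel.from_pretrained("xlm-roberta-large")  # ~560M parameters

    # Freeze the entire multilingual encoder; only prompts (and a task head) train.
    for p in model.parameters():
        p.requires_grad = False

    cfg = model.config
    prompt_length = 16  # placeholder; prompt length is treated as a hyper-parameter
    prompts = nn.Parameter(
        torch.randn(cfg.num_hidden_layers * 2, prompt_length, cfg.hidden_size) * 0.02
    )
    classifier = nn.Linear(cfg.hidden_size, 3)  # e.g., a 3-way XNLI head (placeholder)

    frozen = sum(p.numel() for p in model.parameters())
    trainable = prompts.numel() + sum(p.numel() for p in classifier.parameters())
    print(f"trainable ratio: {trainable / (frozen + trainable):.4%}")  # roughly 0.1-0.2%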
Prompt Length. Prompt length usually plays an important role in prompt tuning. In our experiments, we treat it as a hyper-parameter. Longer