Clip-Tuning: Towards Derivative-free Prompt Learning
with a Mixture of Rewards
Yekun Chai Shuohuan Wang
Yu Sun Hao Tian Hua Wu Haifeng Wang
Baidu
{chaiyekun,wangshuohuan}@baidu.com
{sunyu02,tianhao,wu_hua,wanghaifeng}@baidu.com
Abstract
Derivative-free prompt learning has emerged as a lightweight alternative to prompt tuning, which only requires model inference to optimize the prompts. However, existing work has not taken full advantage of the over-parameterized characteristics of large pre-trained language models (PLMs). In this paper, we propose Clip-Tuning, a simple yet effective method that adopts diverse frozen "thinned" networks of PLMs to obtain a mixture of rewards and thus advance derivative-free prompt learning. The thinned networks consist of all the hidden units that survive a stationary dropout strategy, whose inference predictions reflect an ensemble of partial views over the prompted training samples. Our method outperforms previous gradient-free prompt learning methods and achieves parity with gradient-based counterparts on seven language understanding benchmarks under few-shot settings.
1 Introduction
Extensive research has shown that prompt tuning achieves parity with model tuning (a.k.a. full fine-tuning) in few-shot scenarios (Li and Liang, 2021; Lester et al., 2021). However, prompt tuning relies on backpropagation through very deep pre-trained transformers and thus incurs prohibitive computation and time costs, especially for models with billions of parameters (Brown et al., 2020; Rae et al., 2021; Wang et al., 2021; Chowdhery et al., 2022). Meanwhile, for many inference-API-based PLMs, researchers either do not have full access to the model weights due to commercial restrictions or cannot afford to train the enormous number of parameters, which significantly limits the applicability of derivative-based tuning. Therefore, derivative-free prompt learning has become a promising alternative (Sun et al., 2022; Diao et al., 2022).
By treating PLMs as black boxes, derivative-free approaches offer a feasible way to harness large PLMs. Sun et al. (2022) leveraged evolutionary algorithms to optimize continuous prompts by repeatedly calling the inference APIs of PLMs, adopting the model performance over a handful of samples as the optimization feedback. Nevertheless, few-shot demonstrations can only yield sparse rewards, preventing the prompts from receiving sufficiently informative signals. Hence, our goal is to acquire a mixture of diversified rewards for prompt optimization, using colossal PLMs with millions or billions of parameters in few-shot settings.
Recent work on the lottery ticket hypothesis (Frankle and Carbin, 2019; Chen et al., 2020) states that over-parameterized PLMs contain matching subnetworks capable of reaching test performance comparable to the original model. Kobayashi et al. (2022) and Havasi et al. (2021) further find that ensembling a mixture of subnetworks improves the diversity of model predictions and yields performance gains. As such, ensembling subnetwork predictions is a particularly interesting setting for increasing the diversity of learning signals in derivative-free prompt optimization.
Since derivative-free prompt learning only conducts the model forward pass without backpropagation, clipping a large proportion of the model weights would heavily hurt the overall performance. Srivastava et al. (2014) state that applying dropout to a network amounts to "sampling" a thinned network from it. Moreover, previous work (Gao et al., 2021b; Liang et al., 2021) finds that standard dropout can act as "minimal data augmentation" to construct different sample representations. Therefore, we employ dropout during model inference to "sample" different "thinned" subnetworks and diversify the data representations. Note that our subnetworks are deterministic, whereas the original dropout is random each time. For each training example, diverse subnetworks produce a variety of hidden representations and fitness rewards, which diversifies the learning feedback for gradient-free prompt learning.
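A minimal sketch of this idea follows (a PyTorch-style illustration of our own; `model`, `prompted_batch`, the negative-loss fitness, and the averaging of rewards are simplifying assumptions rather than the exact formulation used later in the paper). Fixing a distinct random seed per subnetwork keeps each dropout mask stationary across prompt candidates, so every "thinned" network scores the same prompted samples from its own partial view.

```python
import torch
import torch.nn.functional as F

def mixture_of_rewards(model, prompted_batch, labels, num_subnets=4, base_seed=42):
    """Score one prompt candidate with K deterministic "thinned" subnetworks."""
    model.train()                        # keep dropout active; weights remain frozen
    rewards = []
    with torch.no_grad():                # inference only, no backpropagation
        for k in range(num_subnets):
            torch.manual_seed(base_seed + k)      # fixed seed => stationary mask k
            logits = model(prompted_batch)        # partial view over prompted samples
            rewards.append(-F.cross_entropy(logits, labels).item())
    return sum(rewards) / len(rewards)   # aggregate the mixture of rewards (mean is assumed)
```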
Contributions
(1) We propose a simple yet effective method, Clip-Tuning, in which multiple frozen subnetworks act as multi-view critics and provide a mixture of informative rewards for gradient-free prompt optimization (§4.1). The importance and originality of this study lie in exploring a new direction: the exploitation of reward diversity in gradient-free prompt optimization. (2) Empirical results show that our method surpasses previous gradient-free prompt learning approaches on seven natural language understanding (NLU) benchmarks in few-shot settings (§5). Surprisingly, the random search method can serve as an excellent few-shot baseline to prime large PLMs. (3) Our method sheds light on inference-only PLMs and can be a good fit for commercial PLM providers building API-based features. Note that our method requires API providers to support the dropout operation, whereas API users do not need to make any change on top of standard derivative-free prompt learning.
2 Related work
2.1 Prompt-based learning
Holding the promise of exploiting the few-shot learning capability of large pre-trained models, prompt-based learning has attracted extensive attention in recent years (Brown et al., 2020; Schick and Schütze, 2021a; Li and Liang, 2021; Lester et al., 2021; Sun et al., 2022). It primes frozen PLMs with a series of discrete natural language tokens or continuous "soft prompts" to conduct various downstream tasks. Early work employed exemplar language templates to condition the PLMs for task-specific prediction (Schick and Schütze, 2021b; Scao and Rush, 2021). Such methods require manual human involvement in the design of prompt templates, which makes continuous prompts a promising direction.
Prompt tuning
Prompt tuning approaches (Li and Liang, 2021; Lester et al., 2021; Liu et al., 2021) prepend a string of continuous word embeddings as "virtual tokens" to prime the pre-trained models, optimizing the continuous prompts with backpropagation while freezing the model weights of the PLMs. These methods achieve parity with full model tuning and even surpass fine-tuning in few-shot settings.
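For concreteness, the following PyTorch-style sketch (our own illustration; the class, dimensions, and the `inputs_embeds` argument common to transformer implementations are assumptions, not a specific library's prescribed interface) shows the basic mechanism: a trainable prompt matrix is prepended to the token embeddings, and only the prompt receives gradients while the PLM stays frozen.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Minimal prompt-tuning sketch: only `self.prompt` receives gradients."""

    def __init__(self, plm, prompt_length=20, hidden_dim=768):
        super().__init__()
        self.plm = plm
        for param in self.plm.parameters():       # freeze all PLM weights
            param.requires_grad = False
        # continuous "virtual tokens" prepended to every input
        self.prompt = nn.Parameter(torch.randn(prompt_length, hidden_dim) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, hidden_dim) token embeddings of the input X
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return self.plm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```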
Prompt search
There has been a surge of interest in automatic prompt learning, which treats the prompt as a parameter space to be optimized over. One line of prompt search methods focuses on a discrete search space, i.e., natural language tokens. Shin et al. (2020) employ a gradient-based method to find the optimal trigger words with which to construct the prompt. Prasad et al. (2022) use a gradient-free, edit-based search method to refine instructional language prompts, producing optimal edited prompts from manually designed ones. Another line in this direction is continuous prompt search, where the prompt is optimized as "virtual tokens" in a continuous parameter space. Sun et al. (2022) adopt the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen et al., 2003) to search over the intrinsic dimension of prompts (Aghajanyan et al., 2021) with access only to the inference API of PLMs. This approach requires only the forward pass of PLMs, without gradient backpropagation. This work builds upon this line of research, aiming to better exploit the over-parameterization of PLMs to collect fine-grained rewards for search algorithms.
2.2 Derivative-free optimization
Derivative-free optimization targets settings in which the derivative of the objective is unavailable or unreliable. It iteratively optimizes the parameter candidates by local hill-climbing in the objective landscape. Given an objective function $f: \mathcal{A} \rightarrow \mathbb{R}$ over some set $\mathcal{A}$, derivative-free optimization uses only the input $x$ and its fitness $f(x)$ after evaluation for iterative optimization. Examples include evolution strategies (Hansen et al., 2003), Bayesian optimization (Frazier, 2018), random search (Zabinsky et al., 2009), and so forth. In this work, we experiment with the CMA-ES (Hansen et al., 2003) and pure random search (Zabinsky et al., 2009) algorithms.
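As a minimal sketch of this setting (a generic local random-search loop of our own, not the exact algorithm of Zabinsky et al. (2009)), the example below optimizes a black-box objective using only evaluated pairs $(x, f(x))$ and never a gradient.

```python
import numpy as np

def random_search(f, dim, iterations=200, step=0.1, seed=0):
    """Maximize a black-box fitness f using only evaluations f(x), no gradients."""
    rng = np.random.default_rng(seed)
    best_x = rng.normal(size=dim)                          # random starting point
    best_fit = f(best_x)
    for _ in range(iterations):
        candidate = best_x + step * rng.normal(size=dim)   # perturb the incumbent
        fit = f(candidate)
        if fit > best_fit:                                 # keep the candidate only if it improves
            best_x, best_fit = candidate, fit
    return best_x, best_fit

# Usage: maximize a toy quadratic fitness whose optimum is at the origin.
best_x, best_fit = random_search(lambda x: -float(np.sum(x ** 2)), dim=5)
```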
3 Derivative-free prompt learning
Vanilla derivative-free prompt learning (Sun et al., 2022) uses model inference to evaluate the fitness of candidate prompts, which are then learned iteratively with evolutionary algorithms. First, it prepends a series of soft prompt embeddings $P$ to the input tokens $X$ and feeds the prompted input $[P; X]$ into the frozen pre-trained transformer $f$ parameterized by $\theta$. The prompt $P = P_0 + \Delta P$ is the summation of a randomly initialized or pre-trained prompt $P_0 \in \mathbb{R}^D$ and a prompt change $\Delta P \in \mathbb{R}^D$ that is iteratively optimized by the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
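To ground the notation, the following sketch is a simplified illustration under our own assumptions: the intrinsic-dimension projection of Sun et al. (2022) is omitted so that CMA-ES searches $\Delta P$ directly, the fitness is the cross-entropy over the few-shot set, and the `cma` package supplies the evolution strategy. The helper names (`plm`, `embed`, `X_ids`) are placeholders.

```python
import torch
import cma  # pycma; assumed installed. CMA-ES minimizes the returned objective.

def prompt_fitness(delta_p, plm, embed, P0, X_ids, labels):
    """Score one candidate prompt change ΔP with a single forward pass (no gradients)."""
    with torch.no_grad():
        delta = torch.tensor(delta_p, dtype=torch.float32).view_as(P0)
        P = P0 + delta                                    # P = P0 + ΔP
        x_embeds = embed(X_ids)                           # (batch, seq_len, hidden)
        prompted = torch.cat(                             # prompted input [P; X]
            [P.unsqueeze(0).expand(x_embeds.size(0), -1, -1), x_embeds], dim=1)
        logits = plm(inputs_embeds=prompted)              # frozen transformer f parameterized by θ
        return torch.nn.functional.cross_entropy(logits, labels).item()

def search_prompt(plm, embed, P0, X_ids, labels, sigma0=1.0, steps=50):
    """Iteratively optimize ΔP with CMA-ES, using fitness evaluations only."""
    es = cma.CMAEvolutionStrategy([0.0] * P0.numel(), sigma0)
    for _ in range(steps):
        candidates = es.ask()                             # sample a population of ΔP
        losses = [prompt_fitness(c, plm, embed, P0, X_ids, labels) for c in candidates]
        es.tell(candidates, losses)                       # update the search distribution
    return P0 + torch.tensor(es.result.xbest, dtype=torch.float32).view_as(P0)
```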