Clip-Tuning: Towards Derivative-free Prompt Learning
with a Mixture of Rewards
Yekun Chai Shuohuan Wang
Yu Sun Hao Tian Hua Wu Haifeng Wang
Baidu
{chaiyekun,wangshuohuan}@baidu.com
{sunyu02,tianhao,wu_hua,wanghaifeng}@baidu.com
Abstract
Derivative-free prompt learning has emerged as a lightweight alternative to prompt tuning, which only requires model inference to optimize the prompts. However, existing work has not taken full advantage of the over-parameterized characteristics of large pre-trained language models (PLMs). In this paper, we propose Clip-Tuning, a simple yet effective method that adopts diverse frozen "thinned" networks of PLMs to obtain a mixture of rewards and thus advance derivative-free prompt learning. The thinned networks consist of all the hidden units that survive a stationary dropout strategy, whose inference predictions reflect an ensemble of partial views over the prompted training samples. Our method outperforms previous gradient-free prompt learning methods and achieves parity with gradient-based counterparts on seven language understanding benchmarks under few-shot settings.
1 Introduction
Extensive research has shown that prompt tuning achieves parity with model tuning (a.k.a. full fine-tuning) in few-shot scenarios (Li and Liang, 2021; Lester et al., 2021). However, prompt tuning relies on backpropagation through very deep pre-trained transformers and thus incurs prohibitive computation and time costs, especially for models with billions of parameters (Brown et al., 2020; Rae et al., 2021; Wang et al., 2021; Chowdhery et al., 2022). Meanwhile, for many inference-API-based PLMs, researchers either do not have full access to the model weights due to commercial restrictions or cannot afford to train the enormous number of parameters, which significantly limits the applicability of derivative-based tuning. Therefore, derivative-free prompt learning has become a promising alternative (Sun et al., 2022; Diao et al., 2022).
By treating PLMs as black boxes, derivative-free approaches offer a feasible way to harness large PLMs. Sun et al. (2022) leveraged evolutionary algorithms to optimize continuous prompts by repeatedly calling the inference APIs of PLMs, adopting the model performance over a handful of samples as the optimization feedback. Nevertheless, few-shot demonstrations can only yield sparse rewards, preventing the prompts from receiving sufficiently informative signals. Hence, our goal is to acquire a mixture of diversified rewards for prompt optimization, using colossal PLMs with millions or billions of parameters in few-shot settings.
Recent work on the lottery ticket hypothesis (Frankle and Carbin, 2019; Chen et al., 2020) states that over-parameterized PLMs contain matching subnetworks capable of reaching test performance comparable to the original model. Kobayashi et al. (2022) and Havasi et al. (2021) further find that ensembling a mixture of subnetworks improves the diversity of model predictions and yields performance gains. As such, ensembling subnetwork predictions is a particularly interesting setting for increasing the diversity of learning signals in derivative-free prompt optimization.
Since derivative-free prompt learning only conducts the model forward pass without backpropagation, clipping a large proportion of the model weights would heavily hurt the overall performance. Srivastava et al. (2014) state that applying dropout to a network amounts to "sampling" a thinned network from it. Moreover, previous work (Gao et al., 2021b; Liang et al., 2021) finds that standard dropout can act as "minimal data augmentation" to construct different sample representations. Therefore, we employ dropout during model inference to "sample" different "thinned" subnetworks and diversify the data representations. Note that our subnetworks are deterministic, whereas the original dropout is random each time. For each training example, diverse subnetworks produce a variety of hidden representations and fitness rewards, which diversifies the learning feedback for gradient-free prompt learning.
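A minimal sketch of this idea follows (a PyTorch-style illustration of our own; `model`, `prompted_batch`, the negative-loss fitness, and the averaging of rewards are simplifying assumptions rather than the exact formulation used later in the paper). Fixing a distinct random seed per subnetwork keeps each dropout mask stationary across prompt candidates, so every "thinned" network scores the same prompted samples from its own partial view.

```python
import torch
import torch.nn.functional as F

def mixture_of_rewards(model, prompted_batch, labels, num_subnets=4, base_seed=42):
    """Score one prompt candidate with K deterministic "thinned" subnetworks."""
    model.train()                        # keep dropout active; weights remain frozen
    rewards = []
    with torch.no_grad():                # inference only, no backpropagation
        for k in range(num_subnets):
            torch.manual_seed(base_seed + k)      # fixed seed => stationary mask k
            logits = model(prompted_batch)        # partial view over prompted samples
            rewards.append(-F.cross_entropy(logits, labels).item())
    return sum(rewards) / len(rewards)   # aggregate the mixture of rewards (mean is assumed)
```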
Contributions
(1) We propose a simple yet effective method, Clip-Tuning, in which multiple frozen subnetworks act as multi-view critics and provide a mixture of informative rewards for gradient-free prompt optimization (§4.1). The importance and originality of this study lie in exploring a new direction: the exploitation of reward diversity in gradient-free prompt optimization. (2) Empirical results show that our method surpasses previous gradient-free prompt learning approaches on seven natural language understanding (NLU) benchmarks in few-shot settings (§5). Surprisingly, the random search method can serve as an excellent few-shot baseline to prime large PLMs. (3) Our method sheds light on inference-only PLMs and can be a good fit for commercial PLM providers building API-based features. Note that our method requires API providers to support the dropout operation, whereas API users do not need to make any change on top of standard derivative-free prompt learning.
2 Related work
2.1 Prompt-based learning
Holding the promise of exploiting the few-shot learning capability of large pre-trained models, prompt-based learning has attracted extensive attention in recent years (Brown et al., 2020; Schick and Schütze, 2021a; Li and Liang, 2021; Lester et al., 2021; Sun et al., 2022). It primes frozen PLMs with a series of discrete natural language tokens or continuous "soft prompts" to conduct various downstream tasks. Early work employed exemplar language templates to condition the PLMs for task-specific prediction (Schick and Schütze, 2021b; Scao and Rush, 2021). Such methods require manual human involvement in the design of prompt templates, which makes continuous prompts a promising direction.
Prompt tuning
Prompt tuning approaches (Li and Liang, 2021; Lester et al., 2021; Liu et al., 2021) prepend a string of continuous word embeddings as "virtual tokens" to prime the pre-trained models, optimizing the continuous prompts with backpropagation while freezing the model weights of the PLMs. These methods achieve parity with full model tuning and even surpass fine-tuning in few-shot settings.
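For concreteness, the following PyTorch-style sketch (our own illustration; the class, dimensions, and the `inputs_embeds` argument common to transformer implementations are assumptions, not a specific library's prescribed interface) shows the basic mechanism: a trainable prompt matrix is prepended to the token embeddings, and only the prompt receives gradients while the PLM stays frozen.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Minimal prompt-tuning sketch: only `self.prompt` receives gradients."""

    def __init__(self, plm, prompt_length=20, hidden_dim=768):
        super().__init__()
        self.plm = plm
        for param in self.plm.parameters():       # freeze all PLM weights
            param.requires_grad = False
        # continuous "virtual tokens" prepended to every input
        self.prompt = nn.Parameter(torch.randn(prompt_length, hidden_dim) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, hidden_dim) token embeddings of the input X
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return self.plm(inputs_embeds=torch.cat([prompt, input_embeds], dim=1))
```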
Prompt search
There has been a surge of interest in automatic prompt learning, which treats the prompt as a parameter space to be optimized over. One line of prompt search methods focuses on a discrete search space, i.e., natural language tokens. Shin et al. (2020) employ a gradient-based method to find the optimal trigger words with which to construct the prompt. Prasad et al. (2022) use a gradient-free, edit-based search method to refine instructional language prompts, producing optimal edited prompts from manually designed ones. Another line in this direction is continuous prompt search, where the prompt is optimized as "virtual tokens" in a continuous parameter space. Sun et al. (2022) adopt the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) (Hansen et al., 2003) to search over the intrinsic dimension of prompts (Aghajanyan et al., 2021) with access only to the inference API of PLMs. This approach requires only the forward pass of PLMs, without gradient backpropagation. This work builds upon this line of research, aiming to better exploit the over-parameterization of PLMs to collect fine-grained rewards for search algorithms.
2.2 Derivative-free optimization
Derivative-free optimization targets settings in which the derivative of the objective is unavailable or unreliable. It iteratively optimizes the parameter candidates by local hill-climbing in the objective landscape. Given an objective function $f: \mathcal{A} \rightarrow \mathbb{R}$ over some set $\mathcal{A}$, derivative-free optimization uses only the input $x$ and its fitness $f(x)$ after evaluation for iterative optimization. Examples include evolution strategies (Hansen et al., 2003), Bayesian optimization (Frazier, 2018), random search (Zabinsky et al., 2009), and so forth. In this work, we experiment with the CMA-ES (Hansen et al., 2003) and pure random search (Zabinsky et al., 2009) algorithms.
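As a minimal sketch of this setting (a generic local random-search loop of our own, not the exact algorithm of Zabinsky et al. (2009)), the example below optimizes a black-box objective using only evaluated pairs $(x, f(x))$ and never a gradient.

```python
import numpy as np

def random_search(f, dim, iterations=200, step=0.1, seed=0):
    """Maximize a black-box fitness f using only evaluations f(x), no gradients."""
    rng = np.random.default_rng(seed)
    best_x = rng.normal(size=dim)                          # random starting point
    best_fit = f(best_x)
    for _ in range(iterations):
        candidate = best_x + step * rng.normal(size=dim)   # perturb the incumbent
        fit = f(candidate)
        if fit > best_fit:                                 # keep the candidate only if it improves
            best_x, best_fit = candidate, fit
    return best_x, best_fit

# Usage: maximize a toy quadratic fitness whose optimum is at the origin.
best_x, best_fit = random_search(lambda x: -float(np.sum(x ** 2)), dim=5)
```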
3 Derivative-free prompt learning
Vanilla derivative-free prompt learning (Sun et al., 2022) uses model inference to evaluate the fitness of candidate prompts, which are then learned iteratively with evolutionary algorithms. First, it prepends a series of soft prompt embeddings $P$ to the input tokens $X$ and feeds the prompted input $[P; X]$ into the frozen pre-trained transformer $f$ parameterized by $\theta$. The prompt $P = P_0 + \Delta P$ is the summation of a randomly initialized or pre-trained prompt $P_0 \in \mathbb{R}^D$ and a prompt change $\Delta P \in \mathbb{R}^D$ that is iteratively optimized by the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
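To ground the notation, the following sketch is a simplified illustration under our own assumptions: the intrinsic-dimension projection of Sun et al. (2022) is omitted so that CMA-ES searches $\Delta P$ directly, the fitness is the cross-entropy over the few-shot set, and the `cma` package supplies the evolution strategy. The helper names (`plm`, `embed`, `X_ids`) are placeholders.

```python
import torch
import cma  # pycma; assumed installed. CMA-ES minimizes the returned objective.

def prompt_fitness(delta_p, plm, embed, P0, X_ids, labels):
    """Score one candidate prompt change ΔP with a single forward pass (no gradients)."""
    with torch.no_grad():
        delta = torch.tensor(delta_p, dtype=torch.float32).view_as(P0)
        P = P0 + delta                                    # P = P0 + ΔP
        x_embeds = embed(X_ids)                           # (batch, seq_len, hidden)
        prompted = torch.cat(                             # prompted input [P; X]
            [P.unsqueeze(0).expand(x_embeds.size(0), -1, -1), x_embeds], dim=1)
        logits = plm(inputs_embeds=prompted)              # frozen transformer f parameterized by θ
        return torch.nn.functional.cross_entropy(logits, labels).item()

def search_prompt(plm, embed, P0, X_ids, labels, sigma0=1.0, steps=50):
    """Iteratively optimize ΔP with CMA-ES, using fitness evaluations only."""
    es = cma.CMAEvolutionStrategy([0.0] * P0.numel(), sigma0)
    for _ in range(steps):
        candidates = es.ask()                             # sample a population of ΔP
        losses = [prompt_fitness(c, plm, embed, P0, X_ids, labels) for c in candidates]
        es.tell(candidates, losses)                       # update the search distribution
    return P0 + torch.tensor(es.result.xbest, dtype=torch.float32).view_as(P0)
```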