Contributions
(1) We propose a simple yet effective method, Clip-Tuning, in which multiple frozen subnetworks act as multi-view critics and provide a mixture of informative rewards for gradient-free prompt optimization (§4.1). The importance and originality of this study lie in exploring a new direction: exploiting reward diversity in gradient-free prompt optimization. (2) Empirical results show that our method surpasses previous gradient-free prompt learning approaches on seven natural language understanding (NLU) benchmarks in few-shot settings (§5). Surprisingly, random search can serve as an excellent few-shot baseline for priming large PLMs. (3) Our method sheds light on inference-only PLMs and is a good fit for commercial PLM providers building API-based features. Note that our method requires API providers to support the dropout operation, whereas API users of derivative-free prompt learning do not need to make any changes.
2 Related work
2.1 Prompt-based learning
Holding the promise of exploiting the few-shot learning capability of large pre-trained models, prompt-based learning has attracted extensive attention in recent years (Brown et al., 2020; Schick and Schütze, 2021a; Li and Liang, 2021; Lester et al., 2021; Sun et al., 2022). It primes frozen PLMs with a series of discrete natural language tokens or continuous “soft prompts” to conduct various downstream tasks. Early work employed exemplar language templates to condition the PLMs for task-specific prediction (Schick and Schütze, 2021b; Scao and Rush, 2021). Such methods require manual human involvement in designing prompt templates, which makes continuous prompts a promising direction.
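To make the template-based setup concrete, the following is a minimal sketch of cloze-style prompting with a masked language model, assuming the HuggingFace transformers library; the model name, template, and verbalizer are illustrative choices, not drawn from the papers cited above.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Cloze-style prompting: a template turns classification into mask filling,
# and a verbalizer maps label words back to task labels.
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large").eval()

text = "A gripping and beautifully acted film. It was <mask>."
verbalizer = {"positive": " great", "negative": " terrible"}

inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos]

scores = {label: logits[tokenizer.encode(word, add_special_tokens=False)[0]].item()
          for label, word in verbalizer.items()}
print(max(scores, key=scores.get))  # label predicted by the frozen PLM
```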
Prompt tuning
Prompt tuning approaches (Li and Liang, 2021; Lester et al., 2021; Liu et al., 2021) prepend a string of continuous word embeddings as “virtual tokens” to prime the pre-trained models, optimizing the continuous prompts with backpropagation while freezing the model weights of the PLMs. These methods achieve parity with full model tuning and even surpass fine-tuning in few-shot settings.
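As an illustration of this setup, here is a minimal sketch of prompt tuning with a frozen PLM, assuming PyTorch and the HuggingFace transformers library; the prompt length, model name, and learning rate are arbitrary example choices.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Prompt tuning sketch: only the prepended "virtual token" embeddings receive
# gradients; all PLM weights stay frozen.
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)
for p in model.parameters():
    p.requires_grad = False

n_prompt, dim = 20, model.config.hidden_size
prompt = torch.nn.Parameter(0.02 * torch.randn(n_prompt, dim))  # continuous prompt
optimizer = torch.optim.Adam([prompt], lr=1e-3)

def step(input_ids, attention_mask, labels):
    batch = input_ids.size(0)
    word_emb = model.get_input_embeddings()(input_ids)           # (B, L, D)
    emb = torch.cat([prompt.unsqueeze(0).expand(batch, -1, -1), word_emb], dim=1)
    mask = torch.cat([torch.ones(batch, n_prompt, dtype=attention_mask.dtype),
                      attention_mask], dim=1)
    loss = model(inputs_embeds=emb, attention_mask=mask, labels=labels).loss
    loss.backward()                                              # gradients flow only to the prompt
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```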
Prompt search
There has been a surge of interest in automatic prompt learning, which treats the prompt as a parameter space to be optimized over. One line of prompt search methods focuses on the discrete search space, i.e., natural language tokens. Shin et al. (2020) employ a gradient-based method to find optimal trigger words for constructing the prompt. Prasad et al. (2022) use a gradient-free, edit-based search method to refine instructional language prompts, producing optimally edited prompts from manually designed ones. Another line in this direction is continuous prompt search, where the prompt is optimized as “virtual tokens” in a continuous parameter space. Sun et al. (2022) adopt the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) (Hansen et al., 2003) to search over the intrinsic dimension of prompts (Aghajanyan et al., 2021) with access only to the inference API of PLMs. This approach requires only the forward pass of PLMs, without gradient backpropagation. This work builds upon this line of research, aiming to better exploit the over-parameterization of PLMs to collect fine-grained rewards for search algorithms.
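The following is a rough sketch of this API-only search setup, assuming the pycma package for CMA-ES; the dimensions are placeholders, and api_eval stands in for a hypothetical inference API that returns a loss for a given soft prompt.

```python
import numpy as np
import cma  # pycma, assumed available

# Black-box continuous prompt search: optimize a low-dimensional vector z and
# project it to the full (flattened) prompt space with a fixed random matrix A.
d, D = 500, 20 * 1024                     # intrinsic dim and prompt dim (placeholders)
A = np.random.randn(D, d) / np.sqrt(d)    # fixed random projection

def fitness(z):
    prompt = A @ z                        # candidate soft prompt, shape (D,)
    return api_eval(prompt)               # hypothetical API call: forward pass only, returns dev loss

es = cma.CMAEvolutionStrategy(np.zeros(d), 1.0)
while not es.stop():
    candidates = es.ask()                                 # sample candidate z vectors
    es.tell(candidates, [fitness(z) for z in candidates]) # update the search distribution
best_prompt = A @ es.result.xbest                         # best prompt found so far
```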
2.2 Derivative-free optimization
Derivative-free optimization targets settings in which the derivative of the objective is unavailable or unreliable. It iteratively optimizes the parameter candidate by local hill-climbing in the objective landscape. Given an objective function $f: \mathcal{A} \rightarrow \mathbb{R}$ for some set $\mathcal{A}$, derivative-free optimization uses only the input $x$ and its fitness $f(x)$ after evaluation for iterative optimization. Examples include evolutionary strategies (Hansen et al., 2003), Bayesian optimization (Frazier, 2018), random search (Zabinsky et al., 2009), and so forth. In this work, we experimented with the CMA-ES (Hansen et al., 2003) and pure random search (Zabinsky et al., 2009) algorithms.
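As a concrete example of such an optimizer, the sketch below implements pure random search, which queries only the fitness $f(x)$ and keeps the best candidate seen so far; the toy quadratic objective is purely illustrative.

```python
import numpy as np

def random_search(f, dim, iters=1000, scale=1.0, seed=0):
    """Pure random search: uses only f(x), never the gradient of f."""
    rng = np.random.default_rng(seed)
    best_x = rng.normal(scale=scale, size=dim)
    best_fit = f(best_x)
    for _ in range(iters):
        x = rng.normal(scale=scale, size=dim)   # draw a fresh candidate
        fit = f(x)                              # one fitness evaluation
        if fit < best_fit:                      # keep the best candidate so far
            best_x, best_fit = x, fit
    return best_x, best_fit

# Toy usage: minimize a quadratic objective without any derivatives.
x_best, f_best = random_search(lambda x: float(np.sum(x ** 2)), dim=10)
```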
3 Derivative-free prompt learning
Vanilla derivative-free prompt learning (Sun et al., 2022) employs model inference to evaluate the fitness of candidate prompts for iterative prompt learning with evolutionary algorithms. First, it prepends a series of soft prompt embeddings $P$ to the input tokens $X$ and feeds the prompted input $[P; X]$ into the frozen pre-trained transformer $f$ parameterized by $\theta$. The prompt $P = P_0 + P_{\Delta}$ is the sum of a randomly initialized or pre-trained prompt $P_0 \in \mathbb{R}^{D}$ and a prompt change $P_{\Delta} \in \mathbb{R}^{D}$ that is iteratively optimized by the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES).
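A minimal sketch of this fitness evaluation is given below, assuming PyTorch and a generic frozen_plm callable that takes prompt embeddings concatenated with input embeddings and returns per-example losses; the names and shapes are illustrative assumptions, not the paper's implementation.

```python
import torch

def prompt_fitness(p0, p_delta, input_embs, labels, frozen_plm, n_prompt, dim):
    """Score one candidate prompt change P_delta with a forward pass only."""
    prompt = (p0 + p_delta).view(n_prompt, dim)             # P = P0 + P_delta
    batch = input_embs.size(0)
    prompted = torch.cat([prompt.unsqueeze(0).expand(batch, -1, -1),
                          input_embs], dim=1)               # prompted input [P; X]
    with torch.no_grad():                                    # no backpropagation through the PLM
        losses = frozen_plm(prompted, labels)                # hypothetical frozen model call
    return losses.mean().item()                              # fitness consumed by CMA-ES / random search
```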