
2. RELATED WORK
Visual prompting.
Originating from the idea of in-context learning or prompting in natural language processing (NLP) [27–30], VP was first proposed by Bahng et al. [21] for vision models. Before its formalization in [21], the underlying prompting technique had also been devised in computer vision (CV) under different names. For example, VP is closely related to adversarial reprogramming or model reprogramming [22–24, 31–33], which alters the functionality of a fixed, pre-trained model across domains by augmenting test-time examples with an additional (universal) input perturbation pattern. Unadversarial learning follows a similar idea: in [26], unadversarial examples that perturb original inputs using 'prompting' templates were introduced to improve out-of-distribution generalization. Yet, the problem of VP for adversarial defense remains under-explored.
Adversarial defense.
The lack of adversarial robustness is a known weakness of ML models. Adversarial defense, such as adversarial detection [19, 34–38] and robust training [2, 6, 9, 10, 18, 39], is a current research focus. In particular, adversarial training (AT) [1] is the most widely used defense strategy and has inspired many recent advances in adversarial defense [12, 13, 20, 40–42]. However, these AT-type defenses (which aim at robustness-enhanced model training) are computationally intensive due to min-max optimization over model parameters. To reduce the computational overhead of robust training, the problem of test-time defense arises [14], which aims to robustify a given model via lightweight unadversarial input perturbations (a.k.a. input purification) [15, 43] or minor modifications to the fixed model [44, 45]. Among the different kinds of test-time defenses, the most relevant work to ours is anti-adversarial perturbation [17].
3. PROBLEM STATEMENT
Visual prompting.
We describe the VP problem setup following [21, 23–25]. Specifically, let $\mathcal{D}_{\mathrm{tr}}$ denote a training set for supervised learning, where $(x, y) \in \mathcal{D}_{\mathrm{tr}}$ signifies a training sample with feature $x$ and label $y$, and let $\delta$ be the visual prompt to be designed. The prompted input with respect to (w.r.t.) $x$ is then given by $x + \delta$. Different from adversarial attack generation, which optimizes $\delta$ for erroneous prediction, VP drives $\delta$ to minimize the performance loss $\ell$ of a pre-trained model $\theta$. This leads to
$$
\underset{\delta}{\text{minimize}} \;\; \mathbb{E}_{(x, y) \in \mathcal{D}_{\mathrm{tr}}}\!\left[\ell(x + \delta;\, y, \theta)\right] \quad \text{subject to} \;\; \delta \in \mathcal{C}, \qquad (1)
$$
where $\ell$ denotes the prediction error given the training data $(x, y)$ and base model $\theta$, and $\mathcal{C}$ is a perturbation constraint. Following [21, 23, 24], $\mathcal{C}$ restricts $\delta$ so that $x + \delta \in [0, 1]$ for any $x$. Projected gradient descent (PGD) [1, 26] can then be applied to solve problem (1). At evaluation time, $\delta$ is integrated into the test data to improve the prediction ability of $\theta$.
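To make the workflow of (1) concrete, the following PyTorch-style sketch learns a universal prompt $\delta$ for a frozen model by gradient descent, enforcing the constraint $\mathcal{C}$ by clamping the prompted input to $[0, 1]$. The function name, data loader, and hyperparameters (`lr`, `epochs`, `image_shape`) are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of vanilla visual prompting, problem (1): learn a universal
# additive prompt delta that minimizes the standard loss of a frozen,
# pre-trained model theta. Names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F


def train_visual_prompt(model, train_loader, image_shape=(3, 32, 32),
                        lr=0.1, epochs=10, device="cuda"):
    model.eval().to(device)  # base model theta is kept fixed
    delta = torch.zeros(1, *image_shape, device=device, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            # Clamping the prompted input to [0, 1] is a simple way to
            # respect the constraint C in (1).
            loss = F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return delta.detach()
```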
Adversarial visual prompting.
Inspired by the usefulness of VP for improving model generalization [21, 24], we ask:

(AVP problem) Can VP (1) be extended to robustify $\theta$ against adversarial attacks?

At first glance, the AVP problem seems trivial if we specify the performance loss $\ell$ as the adversarial training loss [1, 2]:
$$
\ell_{\mathrm{adv}}(x + \delta;\, y, \theta) = \underset{x' :\, \|x' - x\|_\infty \le \epsilon}{\text{maximize}} \;\, \ell(x' + \delta;\, y, \theta), \qquad (2)
$$
where $x'$ denotes the adversarial input that lies in the $\ell_\infty$-norm ball centered at $x$ with radius $\epsilon > 0$.
Recall from (1) that conventional VP requires $\delta$ to be universal across the training data. Integrating (1) with (2), we thus term the following problem universal AVP (U-AVP):
$$
\underset{\delta :\, \delta \in \mathcal{C}}{\text{minimize}} \;\; \lambda\, \mathbb{E}_{(x, y) \in \mathcal{D}_{\mathrm{tr}}}\!\left[\ell(x + \delta;\, y, \theta)\right] + \mathbb{E}_{(x, y) \in \mathcal{D}_{\mathrm{tr}}}\!\left[\ell_{\mathrm{adv}}(x + \delta;\, y, \theta)\right], \qquad \text{(U-AVP)}
$$
where $\lambda > 0$ is a regularization parameter that strikes a balance between generalization and adversarial robustness [2].
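The sketch below illustrates how (U-AVP) could be solved in practice: the inner maximization (2) is approximated by a PGD attack on the prompted input, and the outer minimization updates $\delta$ on the weighted sum of standard and adversarial losses. The attack settings (`eps`, `alpha`, `steps`) and the weight `lam` are illustrative assumptions.

```python
# Hedged sketch of solving (U-AVP) by alternating (i) the inner maximization of
# Eq. (2), approximated with a PGD attack on the prompted input, and (ii) an
# outer gradient step on the universal prompt delta. Hyperparameters
# (eps, alpha, steps, lam) are illustrative assumptions.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, delta, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner maximization of Eq. (2): search for x' in the l_inf ball around x
    that maximizes the loss of the prompted input x' + delta."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(torch.clamp(x_adv + delta, 0.0, 1.0)), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0.0, 1.0)
    return x_adv.detach()


def train_uavp_prompt(model, train_loader, image_shape=(3, 32, 32),
                      lam=1.0, lr=0.1, epochs=10, device="cuda"):
    model.eval().to(device)
    delta = torch.zeros(1, *image_shape, device=device, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y, delta.detach())  # inner maximization
            std_loss = F.cross_entropy(model(torch.clamp(x + delta, 0.0, 1.0)), y)
            adv_loss = F.cross_entropy(model(torch.clamp(x_adv + delta, 0.0, 1.0)), y)
            loss = lam * std_loss + adv_loss  # the (U-AVP) objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return delta.detach()
```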
Fig. 1: Example of designing U-AVP for adversarial defense on (CIFAR-10, ResNet18), measured by robust accuracy against PGD attacks [1] of different step counts. The robust accuracy at 0 steps is the standard accuracy.
The problem (U-AVP) can be effectively solved using a standard min-max optimization method, which involves two alternating optimization routines: inner maximization and outer minimization. The former generates adversarial examples as in AT, and the latter produces the visual prompt $\delta$ as in (1). At test time, the effectiveness of $\delta$ is measured from two aspects: (1) standard accuracy, i.e., the accuracy on $\delta$-integrated benign examples, and (2) robust accuracy, i.e., the accuracy on $\delta$-integrated adversarial examples (generated against the victim model $\theta$). Despite the succinctness of (U-AVP), Fig. 1 shows its ineffectiveness in defending against adversarial attacks. Compared to the vanilla VP (1), it suffers a significant standard accuracy drop (over 50% in Fig. 1, corresponding to 0 PGD attack steps), while robust accuracy is enhanced only by a small margin (around 18% against PGD attacks). The negative results in Fig. 1 are not surprising, since a data-agnostic input prompt $\delta$ has limited learning capacity to enable adversarial defense. Thus, it is non-trivial to tackle the AVP problem.
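For concreteness, the sketch below computes the two test-time metrics for a learned prompt: standard accuracy on benign test data and robust accuracy on adversarially perturbed test data, both with $\delta$ integrated. It reuses the illustrative `pgd_attack` routine from the earlier sketch; attacking the prompted model is one possible threat-model choice, not necessarily the paper's exact protocol.

```python
# Hedged sketch of the two test-time metrics: standard accuracy on
# delta-integrated benign data and robust accuracy on delta-integrated
# adversarial data. `pgd_attack` is the illustrative routine from the sketch
# above; attacking the prompted model is one possible threat-model choice.
import torch


@torch.no_grad()
def _prompted_correct(model, x, y, delta):
    pred = model(torch.clamp(x + delta, 0.0, 1.0)).argmax(dim=1)
    return (pred == y).sum().item()


def evaluate_prompt(model, test_loader, delta, device="cuda"):
    model.eval().to(device)
    delta = delta.to(device)
    n = std_correct = rob_correct = 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        std_correct += _prompted_correct(model, x, y, delta)      # standard accuracy
        x_adv = pgd_attack(model, x, y, delta)                    # adversarial examples
        rob_correct += _prompted_correct(model, x_adv, y, delta)  # robust accuracy
        n += y.size(0)
    return std_correct / n, rob_correct / n
```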
4. CLASS-WISE ADVERSARIAL VISUAL PROMPT
No free lunch for class-wise visual prompts.
A direct extension of (U-AVP) is to introduce multiple adversarial visual prompts, each of which corresponds to one class in the training set $\mathcal{D}_{\mathrm{tr}}$. If we split $\mathcal{D}_{\mathrm{tr}}$ into class-wise training sets $\{\mathcal{D}_{\mathrm{tr}}^{(i)}\}_{i=1}^{N}$