
sors. This cost can also be amortized across many generated
adversarial examples. We find that even the least efficient
preprocessor-aware attack outperforms all unaware attacks.
Learning the system’s preprocessing pipeline is thus more
important than devising an efficient standalone attack.
2. Background and Related Work
Adversarial Examples. Adversarial examples are inputs
designed to fool a machine learning classifier (Biggio et al.,
2013; Szegedy et al., 2014; Goodfellow et al., 2015). For
some classifier $f$, an example $x$ has an adversarial example $x' = x + \delta$ if $f(x) \neq f(x')$, where $\delta$ is a small perturbation under some $\ell_p$-norm, i.e., $\|\delta\|_p \leq \epsilon$. Adversarial examples can be constructed either in the white-box setting (where the adversary uses gradient descent to produce the perturbation $\delta$) (Carlini & Wagner, 2017; Madry et al., 2018), or more realistically, in the black-box setting (where the adversary uses just query access to the system) (Papernot et al., 2017; Chen et al., 2017; Brendel et al., 2018a). Our paper focuses on this black-box setting with $\ell_2$-norm perturbations.
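As a minimal sketch of this definition, assuming images are NumPy arrays and `f` is an illustrative hard-label classifier (all names here are ours, not the paper's):

```python
import numpy as np

def is_adversarial(f, x, x_adv, eps):
    """Check whether x_adv = x + delta is an adversarial example for the
    hard-label classifier f under an L2 constraint: the predicted label
    must change and the perturbation norm must stay within eps."""
    delta = x_adv - x
    return f(x_adv) != f(x) and np.linalg.norm(delta) <= eps
```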
Decision-based attacks can generate adversarial examples with only query access to the remote model's decisions (i.e., the output class $y \leftarrow f(x)$). These attacks typically work by finding
the decision boundary between the original image and a
target label of interest and then walking along the decision
boundary to reduce the total distortion (Brendel et al., 2018a;
Cheng et al., 2020a; Chen et al., 2020; Li et al., 2020).
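To illustrate the boundary-finding step these attacks share, the sketch below performs a binary search between a correctly classified image and an already-misclassified one using only hard-label queries; `query` is a hypothetical stand-in for the remote system, and the code is not taken from any of the cited attacks:

```python
def boundary_binary_search(query, x_orig, x_adv, tol=1e-3):
    """Binary search along the line between a correctly classified image
    x_orig and an already-misclassified image x_adv to find a point close
    to the decision boundary, using only hard-label queries."""
    y_orig = query(x_orig)
    lo, hi = 0.0, 1.0  # interpolation weight toward x_adv
    while hi - lo > tol:
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_orig + mid * x_adv
        if query(x_mid) == y_orig:
            lo = mid  # still the original label: move toward x_adv
        else:
            hi = mid  # label changed: move back toward x_orig
    return (1 - hi) * x_orig + hi * x_adv  # point just on the adversarial side
```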
It has been shown that decision-based attacks should operate in the lowest-dimensional input space possible. For example, QEBA (Li et al., 2020) improves upon HSJA (Chen
et al., 2020) by constructing adversarial examples in a lower-
dimensional embedding space. This phenomenon will help
explain some of the results we observe, where we find that
high-dimensional images require more queries to attack.
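The sketch below illustrates the general idea (not QEBA's exact construction): sample perturbation directions in a lower-dimensional space and upsample them to the model's input resolution, so that fewer queries are needed per useful direction. The resolutions and the choice of bilinear resizer are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import zoom  # bilinear upsampling

def low_dimensional_directions(n, shape_lo=(28, 28), shape_hi=(224, 224)):
    """Sample n random perturbation directions in a low-dimensional space
    and upsample each one to the full input resolution."""
    factors = (shape_hi[0] / shape_lo[0], shape_hi[1] / shape_lo[1])
    noise = np.random.randn(n, *shape_lo)
    return np.stack([zoom(v, factors, order=1) for v in noise])
```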
Adversarial examples need not exploit the classifier itself.
Image scaling attacks (Quiring et al., 2020) construct a high-resolution image $x$ so that after resizing to a smaller $\hat{x}$, the low-resolution image is visually dissimilar to $x$. As a result, any accurate classifier will (correctly) classify the
high-resolution image and the low-resolution image differ-
ently. Gao et al. (2022) consider the image-scaling attack in conjunction with a classifier, a setting similar to ours. However,
our work applies to arbitrary preprocessors, not limited to
resizing, and we also propose an extraction attack to unveil
the deployed preprocessor in the first place.
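A rough sketch of the image-scaling idea is given below, under the simplifying assumption of a block-average downsampler (real attacks target library resizers such as bilinear interpolation, and this code is illustrative rather than the attack of Quiring et al.):

```python
import numpy as np

def scaling_attack(x_src, x_tgt, block=8, steps=200, lr=0.5, lam=0.01):
    """Find an image x close to the high-resolution x_src whose block-averaged
    downscaling matches the visually different low-resolution x_tgt.
    Assumes x_src.shape is divisible by `block` and x_tgt has the downscaled shape."""
    def down(x):  # block-average downsampling
        h, w = x.shape
        return x.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

    x = x_src.copy()
    for _ in range(steps):
        resid = down(x) - x_tgt  # mismatch in the low-resolution space
        # gradient of 0.5 * ||down(x) - x_tgt||^2 with respect to x
        grad = np.kron(resid, np.ones((block, block))) / block**2
        grad += lam * (x - x_src)  # penalty for drifting away from the source image
        x = np.clip(x - lr * grad, 0.0, 1.0)
    return x
```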
Preprocessing defenses. A number of proposed defenses
against adversarial examples preprocess inputs before classi-
fication (Guo et al., 2018; Song et al., 2018). Unfortunately,
these defenses are largely ineffective in a white-box set-
ting (Athalye et al., 2018; Tramer et al., 2020; Sitawarin
et al., 2022). Surprisingly, recent work has shown that
defending against existing decision-based attacks with pre-
processors is quite simple. Aithal & Li (2022) and Qin et al. (2021) show that adding small amounts of random noise
to inputs impedes all current attacks. This suggests that
there may be a significant gap between the capabilities of
white-box and black-box attacks when preprocessors are
present.
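For reference, a minimal sketch of such a randomized pipeline is shown below; the Gaussian noise model, the noise scale, and the wrapper name are illustrative assumptions rather than the cited defenses' exact constructions.

```python
import numpy as np

def noisy_pipeline(f, x, sigma=0.02):
    """Add a small amount of random noise to the input before classification,
    which destabilizes the boundary estimates that decision-based attacks rely on.
    `f` is an illustrative hard-label classifier."""
    x_noisy = np.clip(x + sigma * np.random.randn(*x.shape), 0.0, 1.0)
    return f(x_noisy)
```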
Model Stealing Attacks. To improve the efficacy of black-
box attacks, we make use of techniques from model stealing
attacks (Tramèr et al., 2016). These attacks aim to create an ML model that closely mimics the behavior of a remote
model (Jagielski et al., 2020). Our goal is slightly different
as we only aim to “steal” the system’s preprocessor and
use this knowledge to mount stronger evasion attacks. For
this, we leverage techniques that have been used to extract
functionally equivalent models, which exactly match the
behavior of the remote model on all inputs (Milli et al.,
2019; Rolnick & Kording, 2020; Carlini et al., 2020).
3. Setup and Threat Model
3.1. Notation
We denote an unperturbed input image in the original space as $x_o \in \mathcal{X}_o := [0,1]^{s_o \times s_o}$ and a processed image in the model space as $x_m \in \mathcal{X}_m \subseteq [0,1]^{s_m \times s_m}$. The original size $s_o$ can be the same as or different from the target size $s_m$. A preprocessor $t : \mathcal{X}_o \to \mathcal{X}_m$ maps $x_o$ to $x_m := t(x_o)$. For instance, a resizing preprocessor that maps an image of size $256 \times 256$ pixels to $224 \times 224$ pixels means that $s_o = 256$, $s_m = 224$, and $\mathcal{X}_m = [0,1]^{224 \times 224}$. As another example, an 8-bit quantization restricts $\mathcal{X}_m$ to the discrete space $\{0, 1/255, 2/255, \ldots, 1\}^{s_m \times s_m}$ and $s_o = s_m$.
The classifier, excluding the preprocessor, is represented by $f : \mathcal{X}_m \to \mathcal{Y}$, where $\mathcal{Y}$ is the hard label space. Finally, the entire classification pipeline is denoted by $f \circ t : \mathcal{X}_o \to \mathcal{Y}$.
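To make the notation concrete, the sketch below implements the two example preprocessors $t$ and the composed pipeline $f \circ t$; the bilinear resizer and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from PIL import Image

def resize_preprocessor(x_o, s_m=224):
    """t: X_o -> X_m as image resizing, e.g. 256x256 -> 224x224 (grayscale for simplicity)."""
    img = Image.fromarray((x_o * 255).astype(np.uint8))
    return np.asarray(img.resize((s_m, s_m), Image.BILINEAR), dtype=np.float64) / 255.0

def quantize_preprocessor(x_o, bits=8):
    """t: X_o -> X_m as quantization to {0, 1/255, ..., 1}; here s_o = s_m."""
    levels = 2**bits - 1
    return np.round(x_o * levels) / levels

def pipeline(f, t, x_o):
    """The full system f o t: X_o -> Y that the adversary interacts with."""
    return f(t(x_o))
```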
3.2. Threat Model
The key distinguishing factor between previous works and
ours is that we consider a preprocessing pipeline as part
of the victim system. In other words, the adversary cannot
simply run an attack algorithm on the model input space.
We thus follow in the direction of Pierazzi et al. (2020) and
Gao et al. (2022) who develop attacks that work end-to-end,
as opposed to just attacking a standalone model. To do
this, we develop strategies to “bypass” the preprocessors
(Section 4) and to reverse-engineer which preprocessors are
being used (Section 6). Our threat model is:
• The adversary has black-box, query-based access to the victim model and can query the model on any input and observe the output label $y \in \mathcal{Y}$. The adversary has a limited query budget per input. The adversary knows nothing else about the system.