Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
Chawin Sitawarin¹  Florian Tramèr²  Nicholas Carlini³

¹Department of Computer Science, University of California, Berkeley, USA. Work partially done while the author was at Google. ²ETH Zürich, Zürich, Switzerland. ³Google DeepMind, Mountain View, USA. Correspondence to: Chawin Sitawarin <chawins@berkeley.edu>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
Abstract
Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries. These attacks have mainly been applied directly to standalone neural networks. However, in practice, ML models are just one component of a larger learning system. We find that by adding a single preprocessor in front of a classifier, state-of-the-art query-based attacks are up to seven times less effective at attacking a prediction pipeline than at attacking the model alone. We explain this discrepancy by the fact that most preprocessors introduce some notion of invariance to the input space. Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We therefore develop techniques to (i) reverse-engineer the preprocessor and then (ii) use this extracted information to attack the end-to-end system. Our preprocessor extraction method requires only a few hundred queries, and our preprocessor-aware attacks recover the same efficacy as when attacking the model alone. The code can be found at https://github.com/google-research/preprocessor-aware-black-box-attack.
1. Introduction
Machine learning is widely used in security-critical systems,
for example for detecting abusive, harmful or otherwise
unsafe online content (Waseem et al., 2017; Clarifai; Jha &
Mamidi, 2017). It is critical that such systems are robust
against adversaries who seek to evade them.
Yet, an extensive body of work has shown that an adversary can fool machine learning models with adversarial
examples (Biggio et al., 2013; Szegedy et al., 2014). Most
prior work focuses on white-box attacks, where an adversary
has perfect knowledge of the entire machine learning sys-
tem (Carlini & Wagner, 2017). Yet, real adversaries rarely
have this level of access (Tramèr et al., 2019), and must
thus instead resort to black-box attacks (Chen et al., 2017).
Decision-based attacks (Brendel et al., 2018a) are a partic-
ularly practical attack vector, as these attacks only require
the ability to query a target model and observe its decisions.
However, existing decision-based attacks (Brendel et al.,
2018b; Cheng et al., 2020a; Chen et al., 2020; Li et al.,
2020) have primarily been evaluated against standalone ML
models “in the lab”, thereby ignoring the components of
broader learning systems that are used in practice. While
some decision-based attacks have been demonstrated on
production systems as a proof-of-concept (e.g., Ilyas et al.
(2018); Brendel et al. (2018a); Li et al. (2020)), it is not
well understood how these attacks perform on end-to-end
learning systems compared to standalone models.
We show that existing decision-based attacks are signif-
icantly less effective against end-to-end systems com-
pared to standalone machine learning models. For exam-
ple, a standard decision-based attack can evade a ResNet
image classifier on ImageNet with an average ℓ2-distortion of 3.7
(defined formally later). Yet, if we instead attack
an end-to-end learning system that simply preprocesses the
classifier’s input before classifying it—e.g., by resizing or
compressing the image—the attack achieves an average
ℓ2-distortion of 28.5, a 7× increase! We further find that
extensive hyperparameter tuning and running the attacks for
more iterations fail to resolve this issue. We thus argue that
existing decision-based attacks have fundamental limitations
that make them sub-optimal in practice.
To remedy this, we develop improved attacks that achieve
the same success rate when attacking systems with un-
known preprocessors, as when attacking standalone
models. Our attacks combine decision-based attacks with
techniques developed for model extraction (Tramèr et al.,
2016). Our attacks first query the system to reverse-engineer
the preprocessor(s) used in the input pipeline, and then
mount a modified preprocessor-aware decision-based attack.
Our extraction procedure is efficient and often requires only
a few hundred queries to identify commonly used preproces-
sors. This cost can also be amortized across many generated
adversarial examples. We find that even the least efficient
preprocessor-aware attack outperforms all unaware attacks.
Learning the system’s preprocessing pipeline is thus more
important than devising an efficient standalone attack.
2. Background and Related Work
Adversarial Examples. Adversarial examples are inputs
designed to fool a machine learning classifier (Biggio et al.,
2013; Szegedy et al., 2014; Goodfellow et al., 2015). For some classifier f, an example x has an adversarial example x′ = x + δ if f(x′) ≠ f(x), where δ is a small perturbation under some ℓp-norm, i.e., ∥δ∥_p ≤ ϵ. Adversarial examples
can be constructed either in the white-box setting (where the
adversary uses gradient descent to produce the perturbation
δ
) (Carlini & Wagner, 2017; Madry et al., 2018), or more
realistically, in the black-box setting (where the adversary
uses just query access to the system) (Papernot et al., 2017;
Chen et al., 2017; Brendel et al., 2018a). Our paper focuses
on this black-box setting with ℓ2-norm perturbations.
Decision-based attacks can generate adversarial examples with only query access to the remote model's decisions (i.e., the output class y = f(x)). These attacks typically work by finding
the decision boundary between the original image and a
target label of interest and then walking along the decision
boundary to reduce the total distortion (Brendel et al., 2018a;
Cheng et al., 2020a; Chen et al., 2020; Li et al., 2020).
It has been shown that decision-based attacks should operate
at the lowest-dimensional input space possible. For exam-
ple, QEBA (Li et al., 2020) improves upon HSJA (Chen
et al., 2020) by constructing adversarial examples in a lower-
dimensional embedding space. This phenomenon will help
explain some of the results we observe, where we find that
high-dimensional images require more queries to attack.
Adversarial examples need not exploit the classifier itself.
Image scaling attacks (Quiring et al., 2020) construct a
high-resolution image x so that after resizing to a smaller image x̂, the low-resolution image is visually dissimilar to x. As
a result, any accurate classifier will (correctly) classify the
high-resolution image and the low-resolution image differ-
ently. Gao et al. (2022) consider the image-scaling attack in
conjunction with a classifier similar to our setting. However,
our work applies to arbitrary preprocessors, not limited to
resizing, and we also propose an extraction attack to unveil
the deployed preprocessor in the first place.
Preprocessing defenses. A number of proposed defenses
against adversarial examples preprocess inputs before classi-
fication (Guo et al., 2018; Song et al., 2018). Unfortunately,
these defenses are largely ineffective in a white-box set-
ting (Athalye et al., 2018; Tramer et al., 2020; Sitawarin
et al., 2022). Surprisingly, recent work has shown that
defending against existing decision-based attacks with pre-
processors is quite simple. Aithal & Li (2022); Qin et al.
(2021) show that adding small amounts of random noise
to inputs impedes all current attacks. This suggests that
there may be a significant gap between the capabilities of
white-box and black-box attacks when preprocessors are
present.
Model Stealing Attacks. To improve the efficacy of black-
box attacks, we make use of techniques from model stealing
attacks (Tramèr et al., 2016). These attacks aim to create an ML model that closely mimics the behavior of a remote
model (Jagielski et al., 2020). Our goal is slightly different
as we only aim to “steal” the system’s preprocessor and
use this knowledge to mount stronger evasion attacks. For
this, we leverage techniques that have been used to extract
functionally equivalent models, which exactly match the
behavior of the remote model on all inputs (Milli et al.,
2019; Rolnick & Kording, 2020; Carlini et al., 2020).
3. Setup and Threat Model
3.1. Notation
We denote an unperturbed input image in the original space as x_o ∈ X_o := [0,1]^{s_o × s_o} and a processed image in the model space as x_m ∈ X_m ⊆ [0,1]^{s_m × s_m}. The original size s_o can be the same as or different from the target size s_m. A preprocessor t : X_o → X_m maps x_o to x_m := t(x_o). For instance, a resizing preprocessor that maps an image of size 256 × 256 pixels to 224 × 224 pixels means that s_o = 256, s_m = 224, and X_m = [0,1]^{224 × 224}. As another example, an 8-bit quantization restricts X_m to a discrete space of {0, 1/255, 2/255, ..., 1}^{s_m × s_m} and s_o = s_m.

The classifier, excluding the preprocessor, is represented by f : X_m → Y, where Y is the hard-label space. Finally, the entire classification pipeline is denoted by f ∘ t : X_o → Y.
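As an illustrative sketch (not the paper's released implementation), the following NumPy functions implement toy versions of three common preprocessors t; the sizes and the nearest-neighbor indexing convention here are assumptions for the example.

```python
import numpy as np

def center_crop(x_o, s_m=224):
    """Center-crop an (s_o, s_o, 3) image in [0, 1] to (s_m, s_m, 3)."""
    s_o = x_o.shape[0]
    off = (s_o - s_m) // 2
    return x_o[off:off + s_m, off:off + s_m]

def resize_nearest(x_o, s_m=224):
    """Nearest-neighbor resize: keep one source pixel per output pixel."""
    s_o = x_o.shape[0]
    idx = np.arange(s_m) * s_o // s_m          # source row/column for each output pixel
    return x_o[np.ix_(idx, idx)]

def quantize(x_o, bits=8):
    """8-bit quantization: X_m = {0, 1/255, ..., 1}^(s x s), with s_o = s_m."""
    levels = 2 ** bits - 1
    return np.round(x_o * levels) / levels

x_o = np.random.rand(256, 256, 3)              # x_o in X_o = [0, 1]^(256 x 256)
x_m = center_crop(x_o)                         # x_m = t(x_o) in X_m = [0, 1]^(224 x 224)
```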
3.2. Threat Model
The key distinguishing factor between previous works and
ours is that we consider a preprocessing pipeline as part
of the victim system. In other words, the adversary cannot
simply run an attack algorithm on the model input space.
We thus follow in the direction of Pierazzi et al. (2020) and
Gao et al. (2022) who develop attacks that work end-to-end,
as opposed to just attacking a standalone model. To do
this, we develop strategies to “bypass” the preprocessors
(Section 4) and to reverse-engineer which preprocessors are
being used (Section 6). Our threat model is:
• The adversary has black-box, query-based access to the victim model and can query the model on any input and observe the output label y ∈ Y. The adversary has a limited query budget per input. The adversary knows nothing else about the system.

• The adversary wants to misclassify as many perturbed inputs as possible (either targeted or untargeted), while minimizing the perturbation size, measured by Euclidean distance in the original input space X_o.

• The victim system accepts inputs of any dimension, and the desired model input size is obtained by cropping and/or resizing as part of an image preprocessing pipeline.

Figure 1: Illustration of our Bypassing Attack with resizing as the preprocessor, compared to the unaware (preprocessor-oblivious) attack. The red and green arrows denote the queries submitted by the attack and the outputs returned by the MLaaS pipeline, respectively. The attack phase of our Bypassing Attack first resizes the input image to the size used by the target pipeline, which allows any attack algorithm to operate directly on the model input space. The recovery phase then finds the adversarial example in the original space that maps to the one found during the attack phase.
4. Preprocessor-Aware Attacks
Decision-based attacks often query a model on many nearby
points, e.g., to approximate the local geometry of the bound-
ary. Since most preprocessors are not injective functions,
nearby points in the original input space might map onto
the same processed image. Preprocessing thus makes the
model’s output invariant to some input changes. This can
cause the attack to waste queries and prevent it from learning
information about the target model.
4.1. Bypassing Attack
Our Bypassing Attack in Algorithm 1 avoids these invari-
ances by circumventing the preprocessor entirely. Figure 1
illustrates our attack with a resizing preprocessor (e.g., 1024 → 224). To allow the Bypassing Attack to query the model directly, we first map the input image (x_o ∈ X_o) to the preprocessed space (t(x_o) ∈ X_m). Then, in the Attack Phase, we execute an off-the-shelf decision-based attack directly on this preprocessed image (x_m^adv ∈ X_m).
Finally, after completing the attack, we recover the adver-
sarial image in the original space (x_o^adv ∈ X_o) from x_m^adv. We call this step the Recovery Phase. It finds an adversarial example with minimal perturbation in the original space by solving the following optimization problem:

    argmin_{z_o ∈ X_o} ∥z_o − x_o∥_2^2    s.t.    t(z_o) = x_m^adv.    (1)
Algorithm 1 Outline of Bypassing Attack. This example is built on top of a gradient-approximation-based attack algorithm (e.g., HSJA, QEBA), but it is compatible with any black-box attack. ApproxGrad() and AttackUpdate() are unmodified gradient-approximation and perturbation-update functions from the base attack. U is a distribution of vectors on the uniform unit sphere.

Input: image x, label y, classifier f, preprocessor t
Output: adversarial example x_adv
  x ← t(x)                                 # Initialization
  # Attack Phase: run an attack algorithm of choice
  for i = 1 to num_steps do
      X̃ ← {x + α u_b}_{b=1}^B where u_b ∼ U
      ∇_x S ← ApproxGrad(f ∘ t, X̃, y)
      x ← AttackUpdate(x, ∇_x S)
  end for
  # Recovery Phase: exactly recover x_adv in the original input space
  x_adv ← ExactRecovery(t, x)
  return x_adv
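A minimal Python sketch of this outline, with run_attack and exact_recovery as hypothetical stand-ins for an off-the-shelf decision-based attack and the recovery routine of Section 4.1:

```python
def bypassing_attack(x_o, y, f, t, run_attack, exact_recovery):
    """Sketch of Algorithm 1.

    f: hard-label classifier on X_m, t: preprocessor X_o -> X_m.
    run_attack(query_fn, x_init, y) and exact_recovery(t, x_o, x_m_adv) are
    hypothetical stand-ins for the base attack and the recovery routine.
    """
    x_m = t(x_o)                               # initialization: move to model space
    query_fn = lambda z_m: f(t(z_m))           # t idempotent => f sees z_m unchanged
    x_m_adv = run_attack(query_fn, x_m, y)     # attack phase, run entirely in X_m
    return exact_recovery(t, x_o, x_m_adv)     # recovery phase back to X_o
```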
4.1.1. CROPPING
Because almost all image classifiers operate on square im-
ages (Wightman, 2019), one of the most common prepro-
cessing operations is to first crop the image to a square. In
practice, this means that any pixels on the edge of the image
are completely ignored by the classifier. Our Bypassing
Attack exploits this fact by simply removing these cropped pixels and running an off-the-shelf attack in the cropped space. For a more formal statement, see Appendix B.1.
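For center cropping, the recovery step reduces to pasting the adversarial crop back into the original image and leaving the discarded border pixels untouched. A short NumPy sketch under that assumption:

```python
import numpy as np

def recover_from_crop(x_o, x_m_adv):
    """Embed an adversarial crop back into the original image.

    Border pixels are ignored by the classifier, so leaving them equal to x_o
    adds no unnecessary perturbation (this is the exact recovery for cropping).
    """
    s_o, s_m = x_o.shape[0], x_m_adv.shape[0]
    off = (s_o - s_m) // 2
    x_o_adv = x_o.copy()
    x_o_adv[off:off + s_m, off:off + s_m] = x_m_adv
    return x_o_adv
```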
4.1.2. RESIZING
Image resizing is a ubiquitous preprocessing step in any
vision system, as most classifiers are trained only on images
of a specific size. We begin by considering the special case
of resizing with "nearest-neighbor interpolation", which downsizes images by a factor k simply by selecting only 1 out of every block of k pixels. This resize operation is conceptually similar to cropping, and thus the intuition behind our attack is the same: if we know which pixels are retained by the preprocessor, we can avoid wasting perturbation budget and queries on pixels that are discarded. Other interpolation methods for resizing, e.g., bilinear or bicubic, work in a similar way, and can all be expressed as a linear transform, i.e., x_m = t_res(x_o) = M_res x_o for s_o > s_m.

Figure 2: Illustration of the Biased-Gradient Attack with quantization as the preprocessor. The Biased-Gradient Attack cannot directly operate on the model input space like the Bypassing Attack. Rather, it takes advantage of knowledge of the preprocessor by modifying a specific attack while still operating in the original space.
The attack phase for resizing is exactly the same as that of
cropping. The adversary simply runs an attack algorithm of
their choice on the model space X_m. The main difference comes in the recovery phase, which amounts to solving the following optimization problem:

    x_o^adv = argmin_{z_o ∈ R^{s_o × s_o}} ∥z_o − x_o∥_2    s.t.    M_res z_o = x_m^adv.    (2)
Quiring et al. (2020); Gao et al. (2022) solve a similar ver-
sion of this problem via a gradient-based algorithm. How-
ever, we show that there exists a closed-form solution for
the global optimum. Since the constraint in Eqn. (2) is an
underdetermined linear system, this problem is analogous
to finding a minimum-norm solution, given by:
    x_o^adv = x_o + δ_o^* = x_o + (M_res)^+ (x_m^adv − x_m).    (3)

Here, (·)^+ represents the Moore-Penrose pseudo-inverse. We defer the formal derivation to Appendix B.2.
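Eq. (3) can be verified numerically on a toy example. The sketch below is illustrative only; the 8 → 4 nearest-neighbor resize and the construction of M_res by probing basis vectors are assumptions made for the example, with the minimum-norm solution recovered via NumPy's pseudo-inverse.

```python
import numpy as np

s_o, s_m = 8, 4                                    # toy sizes; real pipelines use e.g. 1024 -> 224

def resize_nn(x_flat):
    """Nearest-neighbor downsize acting on flattened grayscale images."""
    img = x_flat.reshape(s_o, s_o)
    idx = np.arange(s_m) * s_o // s_m
    return img[np.ix_(idx, idx)].reshape(-1)

# The resize is linear, so probing it with basis vectors recovers M_res.
eye = np.eye(s_o * s_o)
M_res = np.stack([resize_nn(eye[:, j]) for j in range(s_o * s_o)], axis=1)   # (s_m^2, s_o^2)

x_o = np.random.rand(s_o * s_o)                    # original image (flattened)
x_m = M_res @ x_o                                  # processed image
x_m_adv = x_m + 0.1 * np.random.randn(s_m * s_m)   # adversarial point found in model space

# Eq. (3): minimum-norm perturbation in the original space via the pseudo-inverse
x_o_adv = x_o + np.linalg.pinv(M_res) @ (x_m_adv - x_m)

assert np.allclose(M_res @ x_o_adv, x_m_adv)       # constraint t(z_o) = x_m^adv is satisfied
```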
Limitation. We have demonstrated how to bypass two
very common preprocessors—cropping and resizing—but
not all can be bypassed in this way. Our Bypassing At-
tack assumes (A1) the preprocessors are idempotent, i.e.,
t(t(x)) = t(x)
, and (A2) the preprocessor’s output space
is continuous. Most common preprocessing functions are
idempotent: e.g., quantizing an already quantized image
makes no difference. For preprocessors that do not satisfy
(A2), e.g., quantization whose output space is discrete, we
propose an alternative attack in the next section.
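Assumption (A1) is easy to check empirically for a candidate preprocessor. For instance, 8-bit quantization is idempotent but violates (A2) since its output space is discrete; a quick NumPy sanity check:

```python
import numpy as np

def quantize(x, levels=255):
    return np.round(x * levels) / levels

x = np.random.rand(224, 224, 3)
assert np.allclose(quantize(quantize(x)), quantize(x))     # (A1) holds: quantization is idempotent
print(np.unique(quantize(x)).size <= 256)                  # (A2) fails: at most 256 distinct values
```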
4.2. Biased-Gradient Attacks
We now turn our attention to more general preprocessors that
cannot be bypassed without modifying the search space—
for example quantization, which discretizes a continuous space.
Quantization is one of the most common preprocessors an
adversary has to overcome since all common image for-
mats (e.g., PNG or JPEG) discretize the pixel values to 8
bits. However, prior black-box attacks ignore this fact and
operate in the continuous domain.
We thus propose the Biased-Gradient Attack in Algorithm 2.
Unlike the Bypassing Attack, this attack operates in the
original space. Instead of applying a black-box attack as
is, the Biased-Gradient Attack modifies the base attack in
order to bias queries toward directions that the preprocessor
is more sensitive to. The intuition is that while it is hard to
completely avoid the invariance of the preprocessor, we can
encourage the attack to explore directions that result in large
changes in the output space of the preprocessing function.
Our Biased-Gradient Attack also consists of an attack and
recovery phase. The attack phase makes two modifications
to an underlying gradient approximation attack (e.g., HSJA,
QEBA) which we explain below. The recovery phase sim-
ply solves Equation (1) with a gradient-based method, by
relaxing the constraint using a Lagrange multiplier (since
closed-form solutions do not exist in general). For this,
we defer the details to Appendix C. Figure 2 illustrates the
Biased-Gradient Attack for a quantization preprocessor.
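A minimal PyTorch sketch of such a relaxed recovery (illustrative only; the penalty weight lam, the step count, and the use of Adam are assumptions rather than the exact procedure of Appendix C):

```python
import torch

def opt_recovery(t, x_o, x_m_adv, lam=1e3, steps=500, lr=1e-2):
    """Approximately solve  min_z ||z - x_o||^2 + lam * ||t(z) - x_m_adv||^2.

    t must be differentiable (e.g., the differentiable quantization / JPEG of
    Shin & Song (2017)); lam, steps, and lr are illustrative values.
    """
    z = x_o.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((z - x_o) ** 2).sum() + lam * ((t(z) - x_m_adv) ** 2).sum()
        loss.backward()
        opt.step()
        with torch.no_grad():
            z.clamp_(0.0, 1.0)                  # keep z inside X_o = [0, 1]^(s_o x s_o)
    return z.detach()
```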
(i) Biased Gradient Approximation: We modify the gradient
approximation step to account for the preprocessor. First,
consider the adversary’s loss function defined as
    S(x) := max_{c ∈ Y \ {y}} f_c(x) − f_y(x)            (untargeted)
    S(x) := f_{y′}(x) − max_{c ∈ Y \ {y′}} f_c(x)          (targeted)        (4)

where (x, y) is the input, and y′ ≠ y is the target label.
Algorithm 2 Outline of Biased-Gradient Attack built on top of a gradient-approximation-based attack algorithm.

Input: image x, label y, classifier f, preprocessor t
Output: adversarial example x_adv
  x ← x                                    # No special initialization
  # Attack Phase: run modified attack
  for i = 1 to num_steps do
      # Biased gradient approximation
      X̃ ← {t(x + α u_b)}_{b=1}^B where u_b ∼ U
      ∇̄_{t(x)} S ← ApproxGrad(f ∘ t, X̃, y)
      ∇̄_x S ← ∂_x t(x) · ∇̄_{t(x)} S       # Backprop through t
      x ← AttackUpdate(x, ∇̄_x S)
  end for
  # Recovery Phase: optimization-based recovery of x_adv in the original space (works for any differentiable t)
  x_adv ← OptRecovery(t, x)
  return x_adv

Attacks such as HSJA and QEBA estimate the gradient of
S(x) by applying finite differences to the quantity ϕ(x) := sign(S(x)), which can be measured by querying the model's label. The attack samples uniformly random unit vectors {u_b}_{b=1}^B, scales them by a hyperparameter α, and computes

    ∇_x S(x, α) ≈ (1/B) Σ_{b=1}^B ϕ(t(x + α u_b)) u_b.    (5)
We then perform a change-of-variables to obtain a gradient
estimate with respect to t(x) instead of x:

    (1/B) Σ_{b=1}^B ϕ(t(x + α u_b)) u_b = (1/B) Σ_{b=1}^B ϕ(t(x) + α′_b u′_b) u_b    (6)

where α′_b = ∥t(x + α u_b) − t(x)∥_2 and u′_b = (t(x + α u_b) − t(x)) / α′_b. Notice that α′_b u′_b corresponds to a random perturbation in the model space. Thus, we can "bypass" the preprocessor and approximate gradients in the model space instead by substituting u_b with u′_b in Equation (6):

    ∇̄_{t(x)} S(x, α) := (1/B) Σ_{b=1}^B ϕ(t(x) + α′_b u′_b) u′_b ≈ ∇_{t(x)} S(x, α).    (7)
So instead of querying the ML system with inputs x + α u_b, we use t(x + α u_b) = t(x) + α′_b u′_b, which is equivalent to pre-applying the preprocessor to the queries. If the preprocessor is idempotent, the model f sees the same processed input in both cases. This gradient estimator is biased because u′_b depends on t. Concretely, the distribution of u′_b is concentrated around directions that "survive" the preprocessor.
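A NumPy sketch of the biased estimator in Eq. (7), assuming a hard-label oracle phi that returns sign(S(·)) ∈ {−1, +1} by querying the pipeline; phi and t are placeholders supplied by the caller:

```python
import numpy as np

def biased_grad_estimate(phi, t, x, alpha=0.1, B=100):
    """Monte-Carlo estimate of grad_{t(x)} S (Eq. (7)).

    phi(x_m) queries the pipeline and returns sign(S(.)) in {-1, +1};
    t is the (known or extracted) preprocessor.
    """
    t_x = t(x)
    grad = np.zeros_like(t_x)
    for _ in range(B):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)                  # u_b ~ uniform on the unit sphere
        delta = t(x + alpha * u) - t_x          # = alpha'_b * u'_b, lives in the model space
        norm = np.linalg.norm(delta)
        if norm == 0:                           # direction fully absorbed by the preprocessor
            continue
        u_prime = delta / norm
        grad += phi(t_x + norm * u_prime) * u_prime   # query at t(x + alpha * u_b)
    return grad / B
```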
(ii) Backpropagate Gradients through the Preprocessor: The gradient estimate ∇̄_{t(x)} S in Eqn. (7) is w.r.t. the model space, instead of the original input space where the attack operates. Hence, we backpropagate ∇̄_{t(x)} S through t according to the chain rule, ∇̄_x S = ∂_x t(x) · ∇̄_{t(x)} S, where ∂_x t(x) is the Jacobian matrix of the preprocessor t w.r.t. the original space. In our experiments, we use a differentiable version of quantization and JPEG compression by Shin & Song (2017) so that the Jacobian matrix exists.
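This chain-rule step is a single vector-Jacobian product under automatic differentiation. The PyTorch sketch below uses a differentiable bilinear resize as a stand-in preprocessor (a differentiable JPEG is not part of standard PyTorch, so that choice is an assumption for illustration):

```python
import torch
import torch.nn.functional as F

def t_resize(x):
    """Differentiable stand-in preprocessor: (1, 3, 256, 256) -> (1, 3, 224, 224)."""
    return F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)

def backprop_through_t(x, grad_model_space):
    """Pull a model-space gradient back to the original space (chain rule)."""
    x = x.clone().requires_grad_(True)
    x_m = t_resize(x)
    (grad_x,) = torch.autograd.grad(x_m, x, grad_outputs=grad_model_space)
    return grad_x

x = torch.rand(1, 3, 256, 256)
grad_tx = torch.randn(1, 3, 224, 224)           # e.g., the biased estimate from Eq. (7)
grad_x = backprop_through_t(x, grad_tx)         # gradient in the original input space
```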
5. Attack Experiments
5.1. Setup
Model. Similarly to previous works (Brendel et al., 2018a),
we evaluate our attacks on a ResNet-18 (He et al., 2016)
trained on the ImageNet dataset (Deng et al., 2009). The
model is publicly available in the popular
timm package (Wightman, 2019).
Off-the-shelf attacks. We consider four different attacks,
Boundary Attack (Brendel et al., 2018a), Sign-OPT (Cheng
et al., 2020a), HopSkipJump Attack (HJSA) (Chen et al.,
2020), and QEBA (Li et al., 2020). The first three attacks
have both targeted and untargeted versions while QEBA is
only used as a targeted attack. We also compare our attacks
to the baseline preprocessor-aware attack, SNS (Gao et al.,
2022). As this attack only considers resizing, we adapt it to
the other preprocessors we consider.
Attack hyperparameters. As we discuss in Section 7.2 and
Appendix E.2, a change in preprocessor has a large impact
on the optimal choice of hyperparameters for each attack.
We thus sweep hyperparameters for all attacks and report
results for the best choice.
Metrics. We report the average perturbation size (ℓ2-norm)
of adversarial examples found by each attack—referred to
as the “adversarial distance” in short. Smaller adversarial
distance means a stronger attack.
Appendix A contains full detail of all our experiments.
5.2. Bypassing Attack Results
Cropping. We consider a common operation that center-crops an image of size 256 × 256 pixels down to 224 × 224 pixels, i.e., s_o = 256, s_m = 224. In Table 1, our Bypassing
approach improves all of the baseline preprocessor-unaware
attacks. The adversarial distance found by the baseline
is about 8–16% higher than that of the Bypassing Attack
counterpart across all settings. This difference is very close
to the portion of the border pixels that are cropped out
(√(256²/224²) − 1 ≈ 0.14), suggesting that the cropping-
unaware attacks do waste perturbation on these invariant
pixels. Our Bypassing Attack also recovers about the same
mean adversarial distance as the case where there is no
preprocessor (first row of Table 1).
Resizing. We study the three most common interpolation
or resampling techniques, i.e., nearest, bilinear, and bicubic.
For an input size of
1024 ×1024
(see Table 1), a reason-