
sors. This cost can also be amortized across many generated
adversarial examples. We find that even the least efficient
preprocessor-aware attack outperforms all unaware attacks.
Learning the system’s preprocessing pipeline is thus more
important than devising an efficient standalone attack.
2. Background and Related Work
Adversarial Examples. Adversarial examples are inputs
designed to fool a machine learning classifier (Biggio et al.,
2013; Szegedy et al., 2014; Goodfellow et al., 2015). For
some classifier $f$, an example $x$ has an adversarial example $x' = x + \delta$ if $f(x) \neq f(x')$, where $\delta$ is a small perturbation under some $\ell_p$-norm, i.e., $\|\delta\|_p \leq \epsilon$. Adversarial examples can be constructed either in the white-box setting (where the adversary uses gradient descent to produce the perturbation $\delta$) (Carlini & Wagner, 2017; Madry et al., 2018), or more realistically, in the black-box setting (where the adversary uses just query access to the system) (Papernot et al., 2017; Chen et al., 2017; Brendel et al., 2018a). Our paper focuses on this black-box setting with $\ell_2$-norm perturbations.
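As a minimal sketch of this definition, assuming images are NumPy arrays and `f` is an illustrative hard-label classifier (all names here are ours, not the paper's):

```python
import numpy as np

def is_adversarial(f, x, x_adv, eps):
    """Check whether x_adv = x + delta is an adversarial example for the
    hard-label classifier f under an L2 constraint: the predicted label
    must change and the perturbation norm must stay within eps."""
    delta = x_adv - x
    return f(x_adv) != f(x) and np.linalg.norm(delta) <= eps
```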
Decision-based attacks can generate adversarial examples with only query access to the remote model's decisions (i.e., the output class $y \leftarrow f(x)$). These attacks typically work by finding
the decision boundary between the original image and a
target label of interest and then walking along the decision
boundary to reduce the total distortion (Brendel et al., 2018a;
Cheng et al., 2020a; Chen et al., 2020; Li et al., 2020).
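To illustrate the boundary-finding step these attacks share, the sketch below performs a binary search between a correctly classified image and an already-misclassified one using only hard-label queries; `query` is a hypothetical stand-in for the remote system, and the code is not taken from any of the cited attacks:

```python
def boundary_binary_search(query, x_orig, x_adv, tol=1e-3):
    """Binary search along the line between a correctly classified image
    x_orig and an already-misclassified image x_adv to find a point close
    to the decision boundary, using only hard-label queries."""
    y_orig = query(x_orig)
    lo, hi = 0.0, 1.0  # interpolation weight toward x_adv
    while hi - lo > tol:
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x_orig + mid * x_adv
        if query(x_mid) == y_orig:
            lo = mid  # still the original label: move toward x_adv
        else:
            hi = mid  # label changed: move back toward x_orig
    return (1 - hi) * x_orig + hi * x_adv  # point just on the adversarial side
```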
It has been shown that decision-based attacks should operate in the lowest-dimensional input space possible. For example, QEBA (Li et al., 2020) improves upon HSJA (Chen
et al., 2020) by constructing adversarial examples in a lower-
dimensional embedding space. This phenomenon will help
explain some of the results we observe, where we find that
high-dimensional images require more queries to attack.
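The sketch below illustrates the general idea (not QEBA's exact construction): sample perturbation directions in a lower-dimensional space and upsample them to the model's input resolution, so that fewer queries are needed per useful direction. The resolutions and the choice of bilinear resizer are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import zoom  # bilinear upsampling

def low_dimensional_directions(n, shape_lo=(28, 28), shape_hi=(224, 224)):
    """Sample n random perturbation directions in a low-dimensional space
    and upsample each one to the full input resolution."""
    factors = (shape_hi[0] / shape_lo[0], shape_hi[1] / shape_lo[1])
    noise = np.random.randn(n, *shape_lo)
    return np.stack([zoom(v, factors, order=1) for v in noise])
```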
Adversarial examples need not exploit the classifier itself.
Image scaling attacks (Quiring et al., 2020) construct a high-resolution image $x$ so that after resizing to a smaller $\hat{x}$, the low-resolution image is visually dissimilar to $x$. As a result, any accurate classifier will (correctly) classify the
high-resolution image and the low-resolution image differ-
ently. Gao et al. (2022) consider the image-scaling attack in conjunction with a classifier, a setting similar to ours. However,
our work applies to arbitrary preprocessors, not limited to
resizing, and we also propose an extraction attack to unveil
the deployed preprocessor in the first place.
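A rough sketch of the image-scaling idea is given below, under the simplifying assumption of a block-average downsampler (real attacks target library resizers such as bilinear interpolation, and this code is illustrative rather than the attack of Quiring et al.):

```python
import numpy as np

def scaling_attack(x_src, x_tgt, block=8, steps=200, lr=0.5, lam=0.01):
    """Find an image x close to the high-resolution x_src whose block-averaged
    downscaling matches the visually different low-resolution x_tgt.
    Assumes x_src.shape is divisible by `block` and x_tgt has the downscaled shape."""
    def down(x):  # block-average downsampling
        h, w = x.shape
        return x.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

    x = x_src.copy()
    for _ in range(steps):
        resid = down(x) - x_tgt  # mismatch in the low-resolution space
        # gradient of 0.5 * ||down(x) - x_tgt||^2 with respect to x
        grad = np.kron(resid, np.ones((block, block))) / block**2
        grad += lam * (x - x_src)  # penalty for drifting away from the source image
        x = np.clip(x - lr * grad, 0.0, 1.0)
    return x
```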
Preprocessing defenses. A number of proposed defenses
against adversarial examples preprocess inputs before classi-
fication (Guo et al., 2018; Song et al., 2018). Unfortunately,
these defenses are largely ineffective in a white-box set-
ting (Athalye et al., 2018; Tramer et al., 2020; Sitawarin
et al., 2022). Surprisingly, recent work has shown that
defending against existing decision-based attacks with pre-
processors is quite simple. Aithal & Li (2022) and Qin et al. (2021) show that adding small amounts of random noise
to inputs impedes all current attacks. This suggests that
there may be a significant gap between the capabilities of
white-box and black-box attacks when preprocessors are
present.
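For reference, a minimal sketch of such a randomized pipeline is shown below; the Gaussian noise model, the noise scale, and the wrapper name are illustrative assumptions rather than the cited defenses' exact constructions.

```python
import numpy as np

def noisy_pipeline(f, x, sigma=0.02):
    """Add a small amount of random noise to the input before classification,
    which destabilizes the boundary estimates that decision-based attacks rely on.
    `f` is an illustrative hard-label classifier."""
    x_noisy = np.clip(x + sigma * np.random.randn(*x.shape), 0.0, 1.0)
    return f(x_noisy)
```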
Model Stealing Attacks. To improve the efficacy of black-
box attacks, we make use of techniques from model stealing
attacks (Tramèr et al., 2016). These attacks aim to create an ML model that closely mimics the behavior of a remote
model (Jagielski et al., 2020). Our goal is slightly different
as we only aim to “steal” the system’s preprocessor and
use this knowledge to mount stronger evasion attacks. For
this, we leverage techniques that have been used to extract
functionally equivalent models, which exactly match the
behavior of the remote model on all inputs (Milli et al.,
2019; Rolnick & Kording, 2020; Carlini et al., 2020).
3. Setup and Threat Model
3.1. Notation
We denote an unperturbed input image in the original space as $x_o \in \mathcal{X}_o := [0,1]^{s_o \times s_o}$ and a processed image in the model space as $x_m \in \mathcal{X}_m \subseteq [0,1]^{s_m \times s_m}$. The original size $s_o$ can be the same as or different from the target size $s_m$. A preprocessor $t : \mathcal{X}_o \to \mathcal{X}_m$ maps $x_o$ to $x_m := t(x_o)$. For instance, a resizing preprocessor that maps an image of size $256 \times 256$ pixels to $224 \times 224$ pixels means that $s_o = 256$, $s_m = 224$, and $\mathcal{X}_m = [0,1]^{224 \times 224}$. As another example, an 8-bit quantization restricts $\mathcal{X}_m$ to the discrete space $\{0, 1/255, 2/255, \ldots, 1\}^{s_m \times s_m}$ and $s_o = s_m$.
The classifier, excluding the preprocessor, is represented by $f : \mathcal{X}_m \to \mathcal{Y}$, where $\mathcal{Y}$ is the hard label space. Finally, the entire classification pipeline is denoted by $f \circ t : \mathcal{X}_o \to \mathcal{Y}$.
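To make the notation concrete, the sketch below implements the two example preprocessors $t$ and the composed pipeline $f \circ t$; the bilinear resizer and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from PIL import Image

def resize_preprocessor(x_o, s_m=224):
    """t: X_o -> X_m as image resizing, e.g. 256x256 -> 224x224 (grayscale for simplicity)."""
    img = Image.fromarray((x_o * 255).astype(np.uint8))
    return np.asarray(img.resize((s_m, s_m), Image.BILINEAR), dtype=np.float64) / 255.0

def quantize_preprocessor(x_o, bits=8):
    """t: X_o -> X_m as quantization to {0, 1/255, ..., 1}; here s_o = s_m."""
    levels = 2**bits - 1
    return np.round(x_o * levels) / levels

def pipeline(f, t, x_o):
    """The full system f o t: X_o -> Y that the adversary interacts with."""
    return f(t(x_o))
```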
3.2. Threat Model
The key distinguishing factor between previous works and
ours is that we consider a preprocessing pipeline as part
of the victim system. In other words, the adversary cannot
simply run an attack algorithm on the model input space.
We thus follow in the direction of Pierazzi et al. (2020) and
Gao et al. (2022) who develop attacks that work end-to-end,
as opposed to just attacking a standalone model. To do
this, we develop strategies to “bypass” the preprocessors
(Section 4) and to reverse-engineer which preprocessors are
being used (Section 6). Our threat model is:
• The adversary has black-box, query-based access to the victim model and can query the model on any input and observe the output label $y \in \mathcal{Y}$. The adversary has a limited query budget per input. The adversary knows nothing else about the system.