A Closer Look at Robustness to L-infinity and Spatial
Perturbations and their Composition
Luke Rowe, Benjamin Thérien, Krzysztof Czarnecki, Hongyang Zhang
School of Computer Science
University of Waterloo
{l6rowe,btherien,k2czarne,hongyang.zhang}@uwaterloo.ca
Abstract
In adversarial machine learning, the popular ℓ∞ threat model has been the focus of much previous work. While this mathematical definition of imperceptibility successfully captures an infinite set of additive image transformations that a model should be robust to, this is only a subset of all transformations which leave the semantic label of an image unchanged. Indeed, previous work also considered robustness to spatial attacks as well as other semantic transformations; however, designing defense methods against the composition of spatial and ℓ∞ perturbations remains relatively underexplored. In the following, we improve the understanding of this seldom investigated compositional setting. We prove theoretically that no linear classifier can achieve more than trivial accuracy against a composite adversary in a simple statistical setting, illustrating its difficulty. We then investigate how state-of-the-art ℓ∞ defenses can be adapted to this novel threat model and study their performance against compositional attacks. We find that our newly proposed TRADES_All strategy performs the strongest of all. Analyzing its logits' Lipschitz constant for RT transformations of different sizes, we find that TRADES_All remains stable over a wide range of RT transformations with and without ℓ∞ perturbations.
1 Introduction
Despite the outstanding performance of deep neural networks [2, 19, 21] on a variety of computer vision tasks, deep neural networks have been shown to be vulnerable to human-imperceptible adversarial perturbations [5, 14]. Designing algorithms that are robust to small human-imperceptible ℓ∞-bounded alterations of the input has been an extensive focus of previous work [10, 20]. While it is certainly unreasonable for a classifier to change its decision based on the addition of imperceptible ℓ∞-bounded noise, this is not the only input transformation we wish to be robust to. Many spatial transformations, such as bounded rotations/translations (RTs), leave an image's label unchanged but are ill-defined by an ℓ∞ threat model (see Figure 1). Yet, any classifier deemed robust should not be any more vulnerable to ℓ∞ perturbations applied to RT-transformed images, as seen in Figure 1 (row 3), than to natural images (row 1). However, current defenses designed for ℓ∞ robustness fail under this compositional setting (see Table 2), suggesting that our models, at least for image classification, are less robust than we thought. To build truly robust models, we must design training protocols that account for such situations.
While many prior works have considered robustness under adversarial settings that differ from the standard ℓ∞ setting, most either consider robustness under a single perturbation type [1, 3, 4, 6, 8, 18] or under a perturbation selected from a fixed set (i.e., the union) of perturbation types [7, 8, 11, 12, 15]. However, relatively few consider robustness to the composition of multiple perturbation types [8, 15, 16].
Equal contribution. First-authorship determined by a coin flip.
Preprint. Under review.
arXiv:2210.02577v1 [cs.LG] 5 Oct 2022
Figure 1: Adversarial images obtained by the AAA∘RT attack. The first row shows clean images. The second row shows the same images with adversarial RT transformations applied, and the third row shows the same RT images as above, but with imperceptible ℓ∞ perturbations applied via AAA.
Realistically, an adversary is not restricted to selecting a perturbation from one threat model but may choose to compose perturbations from multiple threat models (see Figure 1). Moreover, our theoretical analysis shows that defending against an adversary who can compose ε-bounded ℓ∞ perturbations and RT transformations is challenging even in a simple statistical setting. This theoretical result highlights the need to explore how we can build truly robust models in this well-motivated compositional setting. The main contributions of this work are three-fold:
• We show theoretically that no linear classifier can attain non-trivial compositional robustness in a simple, yet realistic, statistical setting.
• We train a family of empirical defenses constructed from TRADES [20] and analyze their performance under a compositional adversary.
• We propose TRADES_All, a new training protocol for defending against ℓ∞∘RT adversaries, show that it attains the best performance of all the defenses trained, and discover that its logits are more stable than those of our other robust models, shedding light on its strong performance.
2 Related work
Many existing works consider adversarial robustness to single perturbation types, including robustness to ℓp perturbations [5, 10, 14, 20, 22] as well as robustness to spatial transformations of the input [1, 3, 4, 6, 8, 18]. Compared with single-perturbation-type robustness, relatively few works consider the problem of attaining robustness to the composition of multiple perturbation types [8, 13, 15, 16]. Tramèr and Boneh [15] first identified the compositional setting and studied the composition of multiple ℓp perturbations as well as ℓ∞ and RT perturbations; however, they consider affine combinations of multiple perturbations, which unreasonably constrains the power of the compositional adversary. Li et al. [8] designed methods to attain certified robustness to the composition of various semantic transformations of the input, and Tsai et al. [16] designed a generalized form of adversarial training for compositional semantic perturbations. Mao et al. [13] designed a composite adversarial attack that composes the search spaces of multiple base attackers. Our work is most closely related to [15]; however, we differ from [15] by considering the addition of ℓ∞ perturbations on top of RT transformations, rather than an affine combination of such perturbations, in our analysis, so as to not unreasonably limit the strength of the compositional adversary. Figure 1 motivates this treatment, as the second and third rows of images are indistinguishable to humans.
3 Preliminaries
In this work, we consider a compositional threat model consisting of the composition of ε-bounded ℓ∞ perturbations and bounded RT transformations. For the ℓ∞ threat model, we consider an adversary who can perturb an image x with ε-bounded ℓ∞ noise. That is, the adversarial reachable region A_{ℓ∞}(x) under the ℓ∞ threat model is defined by:

\mathcal{A}_{\ell_\infty}(x) = \mathcal{B}_\infty(x, \epsilon) := \{\, x + \delta \;:\; \|\delta\|_\infty \le \epsilon \,\}. \quad (1)
For the RT threat model, we consider an adversary who can apply a bounded rotation θ followed by bounded horizontal and vertical translations δ_x, δ_y to x. Concretely, the adversarial reachable region under the RT threat model is defined by:

\mathcal{A}_{\mathrm{RT}}(x) = \{\, \mathcal{T}(x; \theta, \delta_x, \delta_y) \;:\; |\theta| \le \theta^{\max},\; |\delta_x| \le \delta_x^{\max},\; |\delta_y| \le \delta_y^{\max} \,\}, \quad (2)

where T(·; θ, δ_x, δ_y) is the affine transformation function with rotation θ and horizontal/vertical translations δ_x, δ_y, which implicitly warps the image via an interpolation algorithm (our experiments utilize bilinear interpolation). For the compositional threat model, the adversarial reachable region is naturally defined by:

\mathcal{A}_{\ell_\infty \circ \mathrm{RT}}(x) = \{\, \mathcal{B}_\infty(\mathcal{T}(x; \theta, \delta_x, \delta_y), \epsilon) \;:\; |\theta| \le \theta^{\max},\; |\delta_x| \le \delta_x^{\max},\; |\delta_y| \le \delta_y^{\max} \,\}. \quad (3)
That is, A_{ℓ∞∘RT}(x) is defined as the set of ε-bounded ℓ∞ balls around all valid affine transformations of the image x. To compare with our compositional setting, we also consider the union threat model consisting of the union of ε-bounded ℓ∞ perturbations and bounded RT transformations [15]. In this case, the adversarial reachable region under the ℓ∞∪RT threat model is defined as A_{ℓ∞∪RT}(x) = A_{ℓ∞}(x) ∪ A_{RT}(x).
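To make these reachable regions concrete, the minimal PyTorch sketch below (illustrative, not the paper's code) builds one element of A_{ℓ∞∘RT}(x): a rotation/translation applied with bilinear interpolation via an affine grid, followed by ε-bounded ℓ∞ noise. The helper names and the uniform sampling of (θ, δ_x, δ_y) and δ are assumptions for illustration; the attacks in Section 5 search for worst-case parameters rather than sampling them.

```python
import math
import torch
import torch.nn.functional as F

def rt_transform(x, angle_deg, dx, dy):
    """Apply a rotation (degrees) and a pixel translation (dx, dy) to a batch
    x of shape (N, C, H, W) using bilinear interpolation, i.e. one T(.; theta, dx, dy)."""
    n, _, h, w = x.shape
    a = math.radians(angle_deg)
    # Affine matrix in the normalized [-1, 1] coordinates used by affine_grid
    # (it maps output sampling locations to input locations).
    theta = torch.tensor([[math.cos(a), -math.sin(a), 2.0 * dx / w],
                          [math.sin(a),  math.cos(a), 2.0 * dy / h]],
                         dtype=x.dtype, device=x.device).repeat(n, 1, 1)
    grid = F.affine_grid(theta, x.shape, align_corners=False)
    return F.grid_sample(x, grid, mode="bilinear", align_corners=False)

def sample_composite(x, eps, theta_max, dx_max, dy_max):
    """Draw one element of A_{l_inf o RT}(x): a random valid RT transform,
    then uniform l_inf noise of radius eps, clipped to the valid pixel range."""
    angle = (2 * torch.rand(1).item() - 1) * theta_max
    dx = (2 * torch.rand(1).item() - 1) * dx_max
    dy = (2 * torch.rand(1).item() - 1) * dy_max
    x_rt = rt_transform(x, angle, dx, dy)
    delta = torch.empty_like(x_rt).uniform_(-eps, eps)
    return (x_rt + delta).clamp(0.0, 1.0)
```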
4 On the Difficulty of Attaining Compositional Robustness with Linear Classifiers
In this section, we theoretically demonstrate the difficulty of defending against an ℓ∞∘RT compositional adversary with a linear classifier in a simple statistical setting.
4.1 Statistical Setting
To theoretically analyze the compositional adversarial setting, we use the statistical distribution proposed in [17]. Namely, we study a binary classification problem with d-dimensional input features, in which the first feature X_0 is strongly correlated with the output label y with probability p, and the remaining features are weakly correlated with y. The distribution can be written as follows:

Y \overset{\mathrm{u.a.r.}}{\sim} \{-1, +1\}, \qquad X_0 \mid Y = y := \begin{cases} +y, & \text{w.p. } p \\ -y, & \text{w.p. } 1 - p \end{cases}, \qquad X_t \mid Y = y \sim \mathcal{N}(y\eta, 1), \; 1 \le t \le d - 1, \quad (4)

where η = Θ(1/√d) and p ≥ 0.5. We assume that an ℓ∞ adversary has budget ε = 2η, similar to [17].
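For intuition, the following is a minimal sketch of sampling from this distribution (the value p = 0.95 and the choice η = 1/√d are illustrative assumptions consistent with p ≥ 0.5 and η = Θ(1/√d)):

```python
import numpy as np

def sample_distribution(n, d, p=0.95):
    """Sample n points (x, y) from the distribution in Eq. (4)."""
    eta = 1.0 / np.sqrt(d)
    y = np.random.choice([-1, 1], size=n)                    # Y uniform over {-1, +1}
    # Weakly correlated features: X_t ~ N(y * eta, 1) for every coordinate.
    x = np.random.normal(loc=np.outer(y, eta * np.ones(d)), scale=1.0)
    # Strongly correlated feature: X_0 = y w.p. p, and -y otherwise.
    agree = np.random.rand(n) < p
    x[:, 0] = np.where(agree, y, -y)
    return x, y
```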
Moreover, we define an RT transformation as it is defined in [15]. Concretely, an RT transformation is defined as a swap between the strongly correlated feature X_0 and a weakly correlated feature X_t, 1 ≤ t ≤ d − 1. To constrain the RT transformation, we assume that an RT adversary can swap X_0 with at most N positions on the input signal. If we assume the input features X_0, …, X_{d−1} lie on a 2-dimensional grid, then this definition of an RT transformation serves as a realistic abstraction of applying an RT transformation to an image using nearest-neighbor interpolation and rotating about the image's center. Namely, since the distribution over the last d − 1 features is permutation invariant, the only power of an RT transformation is to move the strongly correlated feature, where N defines the number of reachable pixels that the strongly correlated feature can be mapped to via an RT transformation. For example, when considering only translations, we have N = (2δ_x^max + 1)(2δ_y^max + 1). We now state a theorem that establishes the difficulty of defending against an ℓ∞∘RT compositional adversary with a linear classifier.
Theorem 4.1 (A linear classifier cannot attain nontrivial ℓ∞∘RT robustness). Given the data distribution D with p ≥ 1/2, η ≤ 1/√d, and d ≥ 24, no linear classifier f : R^d → {−1, 1} with f(x) = sign(w^T x) can obtain robust accuracy > 0.5 under the ℓ∞∘RT threat model with ℓ∞ budget ε = 2η and RT budget N = d/8.
This theorem shows that, under reasonable constraints on the compositional adversary, a linear classifier can perform no better than random guessing, even in the infinite-data limit. We note, by contrast, that a linear classifier can attain > 0.99 natural accuracy in this statistical setting; e.g., see [17]. This result distinguishes itself from Theorem 4 in [15]: [15] show that an adversary that composes ℓ∞ and RT perturbations yields a stronger attack than a union adversary, whereas we show that a linear classifier cannot attain nontrivial robustness against a compositional adversary under this statistical setting.
Table 1: MNIST results for different defense methods. Columns correspond to robust accuracy under different perturbation types, whereas rows correspond to different defense models. All RT attacks utilize our grid-search strategy. PGD attacks use 40 iterations on MNIST. The best-performing entry under a given attack is bolded, while the second best is underlined.

Defense \ Attack     β     AAA∘RT  PGD∘RT  AAA∪RT  PGD∪RT  AAA    PGD    RT     Natural
TRADES_All           1.0   49.47   68.60   90.28   92.49   92.54  95.60  93.93  99.34
TRADES_{ℓ∞∘RT}       1.0   55.61   71.37   89.28   91.51   91.66  95.10  93.06  99.03
TRADES_{ℓ∞∪RT}       1.0   21.97   49.16   89.67   92.31   91.32  95.20  94.08  99.47
TRADES_{ℓ∞}          1.0    0.02    0.11    0.43    0.43   92.99  95.88   0.57  99.52
TRADES_RT            1.0    0.00    0.07    0.00    0.18    0.00   0.18  96.63  99.64
TRADES_All           3.0   58.68   72.70   90.59   92.35   92.33  95.00  94.05  98.94
TRADES_{ℓ∞∘RT}       3.0   59.28   73.74   88.93   91.07   91.03  94.01  93.09  98.65
TRADES_{ℓ∞∪RT}       3.0   39.72   66.49   91.00   93.43   92.21  95.62  95.00  99.28
TRADES_{ℓ∞}          3.0    0.04    0.11    0.48    0.48   93.83  96.47   0.64  99.35
TRADES_RT            3.0    0.00    0.01    0.00    0.01    0.00   0.17  97.54  99.49
TRADES_All           6.0   61.74   75.38   90.33   92.29   92.19  95.25  93.60  98.73
TRADES_{ℓ∞∘RT}       6.0   58.21   72.54   88.25   90.27   90.62  93.75  92.21  98.23
TRADES_{ℓ∞∪RT}       6.0   44.52   69.97   91.25   93.43   92.49  95.46  94.99  99.22
TRADES_{ℓ∞}          6.0    0.01    0.07    0.48    0.48   92.73  96.07   0.58  99.48
TRADES_RT            6.0    0.00    0.00    0.00    0.07    0.00   0.07  97.48  99.42
Natural              -      0.00    0.00    0.00    2.13    0.00   2.18   0.19  99.18

Model checkpoint from [20].
We emphasize that although no linear classifier can attain nontrivial robustness in this statistical setting, networks with sufficient depth and capacity may be able to attain nontrivial robustness (see TRADES_All in Tables 1 and 2). Nevertheless, this result highlights the difficulty of attaining compositional robustness in this setting. We next explore how we can design robust models in this well-motivated compositional setting.
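As an informal sanity check on Theorem 4.1 (not part of its proof), the sketch below estimates the ℓ∞∘RT robust accuracy of a fixed linear classifier under the abstraction above: the adversary tries every reachable swap position for X_0 and then applies the closed-form worst-case ℓ∞ perturbation for a linear model, which reduces the margin by ε‖w‖₁. It assumes, for simplicity, that the N reachable positions are the first N weak features.

```python
import numpy as np

def composite_robust_accuracy(w, x, y, eps, n_swap):
    """Robust accuracy of f(x) = sign(w^T x) against the abstract l_inf o RT adversary:
    swap X_0 with one of the first n_swap weak features, then apply the worst-case
    eps-bounded l_inf perturbation (margin y * w^T x drops by eps * ||w||_1)."""
    n, d = x.shape
    correct = np.ones(n, dtype=bool)
    for t in range(0, n_swap + 1):        # t = 0 means "no swap"
        x_rt = x.copy()
        if t > 0:
            x_rt[:, [0, t]] = x_rt[:, [t, 0]]
        margin = y * (x_rt @ w) - eps * np.abs(w).sum()
        correct &= margin > 0             # robustly correct only if every case survives
    return correct.mean()
```

Combined with the sampler above, this makes the tension behind the theorem easy to see: a weight vector that relies on X_0 is defeated once the swap relocates that feature, while a weight vector spread over the weak features is defeated by the ε = 2η budget, which flips the effective sign of their means.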
5 Experiments
5.1 Proposed defense methods
To explore the space of compositional adversarial examples and the compositional threat model, we train a family of empirical defenses constructed from TRADES [20] and evaluate these defenses in a white-box setting. We choose a white-box setting to assess the full adversarial strength of these compositional adversarial examples. Below, we give the general form of the TRADES objective:

\min_f \; \mathbb{E}\Big[\, \underbrace{\mathcal{L}(f(x), Y)}_{\text{Natural Accuracy}} \;+\; \beta \, \underbrace{\max_{x' \in \mathcal{A}(x)} \mathcal{L}\big(f(x), f(x')\big)}_{\text{Robustness under } \mathcal{A}(x)} \,\Big]. \quad (5)
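For concreteness, here is a one-batch PyTorch sketch of objective (5), following the standard TRADES surrogate losses (cross-entropy for the natural term, KL divergence between natural and perturbed predictions for the robustness term); `x_adv` is assumed to be produced by one of the inner-maximization routines described below.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, x_adv, beta):
    """Sketch of objective (5) for one batch: natural cross-entropy plus
    beta times KL(f(x) || f(x_adv)); x_adv comes from the inner maximization over A(x)."""
    logits_nat = model(x)
    logits_adv = model(x_adv)
    natural_loss = F.cross_entropy(logits_nat, y)
    robust_loss = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_nat, dim=1),
                           reduction="batchmean")
    return natural_loss + beta * robust_loss
```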
We train a family of TRADES models under the various threat models discussed in Section 3. Concretely, we train the following family of TRADES defense methods: TRADES_{ℓ∞}, TRADES_RT, TRADES_{ℓ∞∪RT}, and TRADES_{ℓ∞∘RT}, where the subscript indicates the threat model being considered during training. To train TRADES models under these new threat models, we require a way to efficiently solve the inner optimization problem in the TRADES objective for the new corresponding definitions of A(x). In the ℓ∞ case, we perform Projected Gradient Descent (PGD) for a small number of steps, as is typically done [10]. For the RT threat model, we perform a Worst-of-10 search: we sample 10 random valid affine transformations and select the affine-transformed image that attains the highest loss, as is done in [4]. For the robustness loss function, we use the KL divergence between the logits of the natural image and the logits of the transformed image. For the union setting, we use an existing approach called the Max Strategy [15], in which we compute an ℓ∞ perturbation using PGD and an RT perturbation using Worst-of-10, and select the perturbation that attains the maximum KL-divergence loss. For the compositional setting, we propose the Worst-on-Worst strategy, whereby we first compute an RT adversarial example using Worst-of-10, and then perform PGD on the worst RT-perturbed image. Worst-on-Worst implicitly assumes that the “worst” adversarial image from Worst-of-10 will produce the “worst” compositional adversarial example.
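The sketch below illustrates the Worst-of-10 search and the proposed Worst-on-Worst strategy (a minimal illustration, not the authors' implementation; it assumes the `rt_transform` helper from the Section 3 sketch and uses the KL robustness loss described above).

```python
import torch
import torch.nn.functional as F

def kl_to_natural(model, x_nat, x_pert):
    """Robustness loss: KL between the model's predictions on the natural image
    and on the perturbed image (the inner objective used by both strategies here)."""
    return F.kl_div(F.log_softmax(model(x_pert), dim=1),
                    F.softmax(model(x_nat), dim=1),
                    reduction="batchmean")

def worst_of_10(model, x, rt_transform, theta_max, dx_max, dy_max, n_trials=10):
    """Worst-of-10: sample random valid (theta, dx, dy) and keep the RT image
    attaining the highest KL loss."""
    best_x, best_loss = x, -float("inf")
    for _ in range(n_trials):
        angle = (2 * torch.rand(1).item() - 1) * theta_max
        dx = (2 * torch.rand(1).item() - 1) * dx_max
        dy = (2 * torch.rand(1).item() - 1) * dy_max
        with torch.no_grad():
            x_rt = rt_transform(x, angle, dx, dy)
            loss = kl_to_natural(model, x, x_rt).item()
        if loss > best_loss:
            best_x, best_loss = x_rt, loss
    return best_x.detach()

def worst_on_worst(model, x, rt_transform, eps, step_size, n_steps, **rt_budget):
    """Worst-on-Worst: run PGD inside the eps l_inf ball around the worst
    Worst-of-10 RT image."""
    x_rt = worst_of_10(model, x, rt_transform, **rt_budget)
    x_adv = x_rt.clone()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(kl_to_natural(model, x, x_adv), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()
            x_adv = x_rt + (x_adv - x_rt).clamp(-eps, eps)   # project back into the ball
            x_adv = x_adv.clamp(0.0, 1.0)                    # keep a valid pixel range
    return x_adv.detach()
```

The Max Strategy for the union setting can be sketched analogously: run PGD around the natural image and Worst-of-10 independently, then keep whichever candidate attains the larger KL loss.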