A Closer Look at Robustness to L-infinity and Spatial
Perturbations and their Composition
Luke Rowe, Benjamin Thérien, Krzysztof Czarnecki, Hongyang Zhang
School of Computer Science
University of Waterloo
{l6rowe,btherien,k2czarne,hongyang.zhang}@uwaterloo.ca
Abstract
In adversarial machine learning, the popular ℓ∞ threat model has been the focus of much previous work. While this mathematical definition of imperceptibility successfully captures an infinite set of additive image transformations that a model should be robust to, this is only a subset of all transformations which leave the semantic label of an image unchanged. Indeed, previous work also considered robustness to spatial attacks as well as other semantic transformations; however, designing defense methods against the composition of spatial and ℓ∞ perturbations remains relatively underexplored. In the following, we improve the understanding of this seldom investigated compositional setting. We prove theoretically that no linear classifier can achieve more than trivial accuracy against a composite adversary in a simple statistical setting, illustrating its difficulty. We then investigate how state-of-the-art ℓ∞ defenses can be adapted to this novel threat model and study their performance against compositional attacks. We find that our newly proposed TRADES_All strategy performs the strongest of all. Analyzing its logits' Lipschitz constant for RT transformations of different sizes, we find that TRADES_All remains stable over a wide range of RT transformations with and without ℓ∞ perturbations.
1 Introduction
Despite the outstanding performance of deep neural networks [2, 19, 21] on a variety of computer vision tasks, deep neural networks have been shown to be vulnerable to human-imperceptible adversarial perturbations [5, 14]. Designing algorithms that are robust to small human-imperceptible ℓ∞-bounded alterations of the input has been an extensive focus of previous work [10, 20]. While it is certainly unreasonable for a classifier to change its decision based on the addition of imperceptible ℓ∞-bounded noise, this is not the only input transformation we wish to be robust to. Many spatial transformations, such as bounded rotations/translations (RTs), leave an image's label unchanged but are ill-defined by an ℓ∞ threat model (see Figure 1). Yet, any classifier deemed robust should not be any more vulnerable to ℓ∞ perturbations applied to RT-transformed images, as seen in Figure 1 (row 3), than to natural images (row 1). However, current defenses designed for ℓ∞ robustness fail under this compositional setting (see Table 2), suggesting that our models, at least for image classification, are less robust than we thought. To build truly robust models, we must design training protocols that account for such situations.
While many prior works have considered robustness under adversarial settings that differ from the standard ℓ∞ setting, most either consider robustness under a single perturbation type [1, 3, 4, 6, 8, 18] or under a perturbation selected from a fixed set (i.e., the union) of perturbation types [7, 8, 11, 12, 15]. However, relatively few consider robustness to the composition of multiple perturbation types [8, 15, 16].
Equal contribution. First-authorship determined by a coin flip.
Preprint. Under review.
arXiv:2210.02577v1 [cs.LG] 5 Oct 2022
Figure 1: Adversarial images obtained by the AAA∘RT attack. The first row shows clean images. The second row shows the same images with adversarial RT transformations applied, and the third row shows the same RT images as above, but with imperceptible ℓ∞ perturbations applied via AAA.
Realistically, an adversary is not restricted to selecting a perturbation from one threat model but may choose to compose perturbations from multiple threat models (see Figure 1). Moreover, our theoretical analysis shows that defending against an adversary who can compose ε-bounded ℓ∞ perturbations and RT transformations is challenging even in a simple statistical setting. This theoretical result highlights the need to explore how we can build truly robust models in this well-motivated compositional setting. The main contributions of this work are three-fold:
• We show theoretically that no linear classifier can attain non-trivial compositional robustness in a simple, yet realistic, statistical setting.
• We train a family of empirical defenses constructed from TRADES [20] and analyze their performance under a compositional adversary.
• We propose TRADES_All, a new training protocol for defending against ℓ∞∘RT adversaries, show that it attains the best performance of all the defenses trained, and discover that its logits are more stable than those of our other robust models, shedding light on its strong performance.
2 Related work
Many existing works consider adversarial robustness to single perturbation types, including robustness to ℓp perturbations [5, 10, 14, 20, 22] as well as robustness to spatial transformations of the input [1, 3, 4, 6, 8, 18]. Compared with single-perturbation-type robustness, relatively few works consider the problem of attaining robustness to the composition of multiple perturbation types [8, 13, 15, 16]. Tramèr and Boneh [15] first identified the compositional setting and studied the composition of multiple ℓp perturbations as well as ℓ∞ and RT perturbations; however, they consider affine combinations of multiple perturbations, which unreasonably constrains the power of the compositional adversary. Li et al. [8] designed methods to attain certified robustness to the composition of various semantic transformations of the input, and Tsai et al. [16] designed a generalized form of adversarial training for compositional semantic perturbations. Mao et al. [13] designed a composite adversarial attack that composes the search spaces of multiple base attackers. Our work is most closely related to [15]; however, we differ from [15] by considering the addition of ℓ∞ perturbations on top of RT transformations, rather than an affine combination of such perturbations, in our analysis, so as to not unreasonably limit the strength of the compositional adversary. Figure 1 motivates this treatment, as the second and third rows of images are indistinguishable to humans.
3 Preliminaries
In this work, we consider a compositional threat model consisting of the composition of ε-bounded ℓ∞ perturbations and bounded RT transformations. For the ℓ∞ threat model, we consider an adversary who can perturb an image x with ε-bounded ℓ∞ noise. That is, the adversarial reachable region A_{ℓ∞}(x) under the ℓ∞ threat model is defined by:

\mathcal{A}_{\ell_\infty}(x) = \mathcal{B}_\infty(x, \epsilon) := \{\, x + \delta \;:\; \|\delta\|_\infty \le \epsilon \,\}. \quad (1)
For the RT threat model, we consider an adversary who can apply a bounded rotation θ followed by bounded horizontal and vertical translations δ_x, δ_y to x. Concretely, the adversarial reachable region under the RT threat model is defined by:

\mathcal{A}_{\mathrm{RT}}(x) = \{\, \mathcal{T}(x; \theta, \delta_x, \delta_y) \;:\; |\theta| \le \theta^{\max},\; |\delta_x| \le \delta_x^{\max},\; |\delta_y| \le \delta_y^{\max} \,\}, \quad (2)

where T(·; θ, δ_x, δ_y) is the affine transformation function with rotation θ and horizontal/vertical translations δ_x, δ_y, which implicitly warps the image via an interpolation algorithm (our experiments utilize bilinear interpolation). For the compositional threat model, the adversarial reachable region is naturally defined by:

\mathcal{A}_{\ell_\infty \circ \mathrm{RT}}(x) = \{\, \mathcal{B}_\infty(\mathcal{T}(x; \theta, \delta_x, \delta_y), \epsilon) \;:\; |\theta| \le \theta^{\max},\; |\delta_x| \le \delta_x^{\max},\; |\delta_y| \le \delta_y^{\max} \,\}. \quad (3)
That is, A_{ℓ∞∘RT}(x) is defined as the set of ε-bounded ℓ∞ balls around all valid affine transformations of the image x. To compare with our compositional setting, we also consider the union threat model consisting of the union of ε-bounded ℓ∞ perturbations and bounded RT transformations [15]. In this case, the adversarial reachable region under the ℓ∞∪RT threat model is defined as A_{ℓ∞∪RT}(x) = A_{ℓ∞}(x) ∪ A_{RT}(x).
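To make these reachable regions concrete, the minimal PyTorch sketch below (illustrative, not the paper's code) builds one element of A_{ℓ∞∘RT}(x): a rotation/translation applied with bilinear interpolation via an affine grid, followed by ε-bounded ℓ∞ noise. The helper names and the uniform sampling of (θ, δ_x, δ_y) and δ are assumptions for illustration; the attacks in Section 5 search for worst-case parameters rather than sampling them.

```python
import math
import torch
import torch.nn.functional as F

def rt_transform(x, angle_deg, dx, dy):
    """Apply a rotation (degrees) and a pixel translation (dx, dy) to a batch
    x of shape (N, C, H, W) using bilinear interpolation, i.e. one T(.; theta, dx, dy)."""
    n, _, h, w = x.shape
    a = math.radians(angle_deg)
    # Affine matrix in the normalized [-1, 1] coordinates used by affine_grid
    # (it maps output sampling locations to input locations).
    theta = torch.tensor([[math.cos(a), -math.sin(a), 2.0 * dx / w],
                          [math.sin(a),  math.cos(a), 2.0 * dy / h]],
                         dtype=x.dtype, device=x.device).repeat(n, 1, 1)
    grid = F.affine_grid(theta, x.shape, align_corners=False)
    return F.grid_sample(x, grid, mode="bilinear", align_corners=False)

def sample_composite(x, eps, theta_max, dx_max, dy_max):
    """Draw one element of A_{l_inf o RT}(x): a random valid RT transform,
    then uniform l_inf noise of radius eps, clipped to the valid pixel range."""
    angle = (2 * torch.rand(1).item() - 1) * theta_max
    dx = (2 * torch.rand(1).item() - 1) * dx_max
    dy = (2 * torch.rand(1).item() - 1) * dy_max
    x_rt = rt_transform(x, angle, dx, dy)
    delta = torch.empty_like(x_rt).uniform_(-eps, eps)
    return (x_rt + delta).clamp(0.0, 1.0)
```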
4 On the Difficulty of Attaining Compositional Robustness with Linear Classifiers
In this section, we theoretically demonstrate the difficulty of defending against an ℓ∞∘RT compositional adversary with a linear classifier in a simple statistical setting.
4.1 Statistical Setting
To theoretically analyze the compositional adversarial setting, we use the statistical distribution proposed in [17]. Namely, we study a binary classification problem with d-dimensional input features, in which the first feature X_0 is strongly correlated with the output label y with probability p, and the remaining features are weakly correlated with y. The distribution can be written as follows:

Y \overset{\mathrm{u.a.r.}}{\sim} \{-1, +1\}, \qquad X_0 \mid Y = y := \begin{cases} +y, & \text{w.p. } p \\ -y, & \text{w.p. } 1 - p \end{cases}, \qquad X_t \mid Y = y \sim \mathcal{N}(y\eta, 1), \; 1 \le t \le d - 1, \quad (4)

where η = Θ(1/√d) and p ≥ 0.5. We assume that an ℓ∞ adversary has budget ε = 2η, similar to [17].
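For intuition, the following is a minimal sketch of sampling from this distribution (the value p = 0.95 and the choice η = 1/√d are illustrative assumptions consistent with p ≥ 0.5 and η = Θ(1/√d)):

```python
import numpy as np

def sample_distribution(n, d, p=0.95):
    """Sample n points (x, y) from the distribution in Eq. (4)."""
    eta = 1.0 / np.sqrt(d)
    y = np.random.choice([-1, 1], size=n)                    # Y uniform over {-1, +1}
    # Weakly correlated features: X_t ~ N(y * eta, 1) for every coordinate.
    x = np.random.normal(loc=np.outer(y, eta * np.ones(d)), scale=1.0)
    # Strongly correlated feature: X_0 = y w.p. p, and -y otherwise.
    agree = np.random.rand(n) < p
    x[:, 0] = np.where(agree, y, -y)
    return x, y
```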
Moreover, we define an RT transformation as it is defined in [15]. Concretely, an RT transformation is defined as a swap between the strongly correlated feature X_0 and a weakly correlated feature X_t, 1 ≤ t ≤ d − 1. To constrain the RT transformation, we assume that an RT adversary can swap X_0 with at most N positions on the input signal. If we assume the input features X_0, …, X_{d−1} lie on a 2-dimensional grid, then this definition of an RT transformation serves as a realistic abstraction of applying an RT transformation to an image using nearest-neighbor interpolation and rotating about the image's center. Namely, since the distribution over the last d − 1 features is permutation invariant, the only power of an RT transformation is to move the strongly correlated feature, where N defines the number of reachable pixels that the strongly correlated feature can be mapped to via an RT transformation. For example, when considering only translations, we have N = (2δ_x^max + 1)(2δ_y^max + 1). We now state a theorem that establishes the difficulty of defending against an ℓ∞∘RT compositional adversary with a linear classifier.
Theorem 4.1 (A linear classifier cannot attain nontrivial ℓ∞∘RT robustness). Given the data distribution D with p ≥ 1/2, η ≤ 1/√d, and d ≥ 24, no linear classifier f : R^d → {−1, 1} with f(x) = sign(w^T x) can obtain robust accuracy > 0.5 under the ℓ∞∘RT threat model with ℓ∞ budget ε = 2η and RT budget N = d/8.
This theorem shows that, under reasonable constraints on the compositional adversary, a linear classifier can perform no better than random guessing, even in the infinite-data limit. We note, by contrast, that a linear classifier can attain > 0.99 natural accuracy in this statistical setting; e.g., see [17]. This result distinguishes itself from Theorem 4 in [15]: [15] show that an adversary that composes ℓ∞ and RT perturbations yields a stronger attack than a union adversary, whereas we show that a linear classifier cannot attain nontrivial robustness against a compositional adversary under this statistical setting.
Table 1: MNIST results for different defense methods. Columns correspond to robust accuracy under different perturbation types, whereas rows correspond to different defense models. All RT attacks utilize our grid-search strategy. PGD attacks use 40 iterations on MNIST. The best-performing entry under a given attack is bolded, while the second best is underlined.

Defense \ Attack     β     AAA∘RT  PGD∘RT  AAA∪RT  PGD∪RT  AAA    PGD    RT     Natural
TRADES_All           1.0   49.47   68.60   90.28   92.49   92.54  95.60  93.93  99.34
TRADES_{ℓ∞∘RT}       1.0   55.61   71.37   89.28   91.51   91.66  95.10  93.06  99.03
TRADES_{ℓ∞∪RT}       1.0   21.97   49.16   89.67   92.31   91.32  95.20  94.08  99.47
TRADES_{ℓ∞}          1.0    0.02    0.11    0.43    0.43   92.99  95.88   0.57  99.52
TRADES_RT            1.0    0.00    0.07    0.00    0.18    0.00   0.18  96.63  99.64
TRADES_All           3.0   58.68   72.70   90.59   92.35   92.33  95.00  94.05  98.94
TRADES_{ℓ∞∘RT}       3.0   59.28   73.74   88.93   91.07   91.03  94.01  93.09  98.65
TRADES_{ℓ∞∪RT}       3.0   39.72   66.49   91.00   93.43   92.21  95.62  95.00  99.28
TRADES_{ℓ∞}          3.0    0.04    0.11    0.48    0.48   93.83  96.47   0.64  99.35
TRADES_RT            3.0    0.00    0.01    0.00    0.01    0.00   0.17  97.54  99.49
TRADES_All           6.0   61.74   75.38   90.33   92.29   92.19  95.25  93.60  98.73
TRADES_{ℓ∞∘RT}       6.0   58.21   72.54   88.25   90.27   90.62  93.75  92.21  98.23
TRADES_{ℓ∞∪RT}       6.0   44.52   69.97   91.25   93.43   92.49  95.46  94.99  99.22
TRADES_{ℓ∞}          6.0    0.01    0.07    0.48    0.48   92.73  96.07   0.58  99.48
TRADES_RT            6.0    0.00    0.00    0.00    0.07    0.00   0.07  97.48  99.42
Natural              -      0.00    0.00    0.00    2.13    0.00   2.18   0.19  99.18

Model checkpoint from [20].
We emphasize that although no linear classifier can attain nontrivial robustness in this statistical setting, networks with sufficient depth and capacity may be able to attain nontrivial robustness (see TRADES_All in Tables 1 and 2). Nevertheless, this result highlights the difficulty of attaining compositional robustness in this setting. We next explore how we can design robust models in this well-motivated compositional setting.
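As an informal sanity check on Theorem 4.1 (not part of its proof), the sketch below estimates the ℓ∞∘RT robust accuracy of a fixed linear classifier under the abstraction above: the adversary tries every reachable swap position for X_0 and then applies the closed-form worst-case ℓ∞ perturbation for a linear model, which reduces the margin by ε‖w‖₁. It assumes, for simplicity, that the N reachable positions are the first N weak features.

```python
import numpy as np

def composite_robust_accuracy(w, x, y, eps, n_swap):
    """Robust accuracy of f(x) = sign(w^T x) against the abstract l_inf o RT adversary:
    swap X_0 with one of the first n_swap weak features, then apply the worst-case
    eps-bounded l_inf perturbation (margin y * w^T x drops by eps * ||w||_1)."""
    n, d = x.shape
    correct = np.ones(n, dtype=bool)
    for t in range(0, n_swap + 1):        # t = 0 means "no swap"
        x_rt = x.copy()
        if t > 0:
            x_rt[:, [0, t]] = x_rt[:, [t, 0]]
        margin = y * (x_rt @ w) - eps * np.abs(w).sum()
        correct &= margin > 0             # robustly correct only if every case survives
    return correct.mean()
```

Combined with the sampler above, this makes the tension behind the theorem easy to see: a weight vector that relies on X_0 is defeated once the swap relocates that feature, while a weight vector spread over the weak features is defeated by the ε = 2η budget, which flips the effective sign of their means.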
5 Experiments
5.1 Proposed defense methods
To explore the space of compositional adversarial examples and the compositional threat model, we train a family of empirical defenses constructed from TRADES [20] and evaluate these defenses in a white-box setting. We choose a white-box setting to assess the full adversarial strength of these compositional adversarial examples. Below, we give the general form of the TRADES objective:

\min_f \; \mathbb{E}\Big[\, \underbrace{\mathcal{L}(f(x), Y)}_{\text{Natural Accuracy}} \;+\; \beta \, \underbrace{\max_{x' \in \mathcal{A}(x)} \mathcal{L}\big(f(x), f(x')\big)}_{\text{Robustness under } \mathcal{A}(x)} \,\Big]. \quad (5)
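For concreteness, here is a one-batch PyTorch sketch of objective (5), following the standard TRADES surrogate losses (cross-entropy for the natural term, KL divergence between natural and perturbed predictions for the robustness term); `x_adv` is assumed to be produced by one of the inner-maximization routines described below.

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, y, x_adv, beta):
    """Sketch of objective (5) for one batch: natural cross-entropy plus
    beta times KL(f(x) || f(x_adv)); x_adv comes from the inner maximization over A(x)."""
    logits_nat = model(x)
    logits_adv = model(x_adv)
    natural_loss = F.cross_entropy(logits_nat, y)
    robust_loss = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_nat, dim=1),
                           reduction="batchmean")
    return natural_loss + beta * robust_loss
```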
We train a family of TRADES models under the various threat models discussed in Section 3. Concretely, we train the following family of TRADES defense methods: TRADES_{ℓ∞}, TRADES_RT, TRADES_{ℓ∞∪RT}, and TRADES_{ℓ∞∘RT}, where the subscript indicates the threat model being considered during training. To train TRADES models under these new threat models, we require a way to efficiently solve the inner optimization problem in the TRADES objective for the new corresponding definitions of A(x). In the ℓ∞ case, we perform Projected Gradient Descent (PGD) for a small number of steps, as is typically done [10]. For the RT threat model, we perform a Worst-of-10 search: we sample 10 random valid affine transformations and select the affine-transformed image that attains the highest loss, as is done in [4]. For the robustness loss function, we use the KL divergence between the logits of the natural image and the logits of the transformed image. For the union setting, we use an existing approach called the Max Strategy [15], in which we compute an ℓ∞ perturbation using PGD and an RT perturbation using Worst-of-10, and select the perturbation that attains the maximum KL-divergence loss. For the compositional setting, we propose the Worst-on-Worst strategy, whereby we first compute an RT adversarial example using Worst-of-10, and then perform PGD on the worst RT-perturbed image. Worst-on-Worst implicitly assumes that the “worst” adversarial image from Worst-of-10 will produce the “worst” compositional adversarial example.
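The sketch below illustrates the Worst-of-10 search and the proposed Worst-on-Worst strategy (a minimal illustration, not the authors' implementation; it assumes the `rt_transform` helper from the Section 3 sketch and uses the KL robustness loss described above).

```python
import torch
import torch.nn.functional as F

def kl_to_natural(model, x_nat, x_pert):
    """Robustness loss: KL between the model's predictions on the natural image
    and on the perturbed image (the inner objective used by both strategies here)."""
    return F.kl_div(F.log_softmax(model(x_pert), dim=1),
                    F.softmax(model(x_nat), dim=1),
                    reduction="batchmean")

def worst_of_10(model, x, rt_transform, theta_max, dx_max, dy_max, n_trials=10):
    """Worst-of-10: sample random valid (theta, dx, dy) and keep the RT image
    attaining the highest KL loss."""
    best_x, best_loss = x, -float("inf")
    for _ in range(n_trials):
        angle = (2 * torch.rand(1).item() - 1) * theta_max
        dx = (2 * torch.rand(1).item() - 1) * dx_max
        dy = (2 * torch.rand(1).item() - 1) * dy_max
        with torch.no_grad():
            x_rt = rt_transform(x, angle, dx, dy)
            loss = kl_to_natural(model, x, x_rt).item()
        if loss > best_loss:
            best_x, best_loss = x_rt, loss
    return best_x.detach()

def worst_on_worst(model, x, rt_transform, eps, step_size, n_steps, **rt_budget):
    """Worst-on-Worst: run PGD inside the eps l_inf ball around the worst
    Worst-of-10 RT image."""
    x_rt = worst_of_10(model, x, rt_transform, **rt_budget)
    x_adv = x_rt.clone()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(kl_to_natural(model, x, x_adv), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()
            x_adv = x_rt + (x_adv - x_rt).clamp(-eps, eps)   # project back into the ball
            x_adv = x_adv.clamp(0.0, 1.0)                    # keep a valid pixel range
    return x_adv.detach()
```

The Max Strategy for the union setting can be sketched analogously: run PGD around the natural image and Worst-of-10 independently, then keep whichever candidate attains the larger KL loss.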