
For the RT threat model, we consider an adversary who can apply a bounded rotation $\theta$ followed by bounded horizontal and vertical translations $\delta_x, \delta_y$ to $x$. Concretely, the adversarial reachable region under the RT threat model is defined by:
\[
\mathcal{A}_{\mathrm{RT}}(x) = \{\, T(x; \theta, \delta_x, \delta_y) : |\theta| \le \theta_{\max},\ |\delta_x| \le \delta_x^{\max},\ |\delta_y| \le \delta_y^{\max} \,\}, \tag{2}
\]
where $T(\cdot\,; \theta, \delta_x, \delta_y)$ is the affine transformation function with rotation $\theta$ and horizontal/vertical translations $\delta_x, \delta_y$, which implicitly warps the image via an interpolation algorithm (our experiments utilize bilinear interpolation). For the compositional threat model, the adversarial reachable region is naturally defined by:
\[
\mathcal{A}_{\ell_\infty \circ \mathrm{RT}}(x) = \{\, B_\infty(T(x; \theta, \delta_x, \delta_y), \epsilon) : |\theta| \le \theta_{\max},\ |\delta_x| \le \delta_x^{\max},\ |\delta_y| \le \delta_y^{\max} \,\}. \tag{3}
\]
That is, $\mathcal{A}_{\ell_\infty \circ \mathrm{RT}}(x)$ is defined as the set of $\epsilon$-bounded $\ell_\infty$ balls around all valid affine transformations of the image $x$. To compare with our compositional setting, we also consider the union threat model consisting of the union of $\epsilon$-bounded $\ell_\infty$ perturbations and bounded RT transformations [15]. In this case, the adversarial reachable region under the $\ell_\infty \cup \mathrm{RT}$ threat model is defined as $\mathcal{A}_{\ell_\infty \cup \mathrm{RT}}(x) = \mathcal{A}_{\ell_\infty}(x) \cup \mathcal{A}_{\mathrm{RT}}(x)$.
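To make the compositional reachable set concrete, the following is a minimal sketch (not the authors' implementation) of drawing one random element of $\mathcal{A}_{\ell_\infty \circ \mathrm{RT}}(x)$ in PyTorch: a rotation and translation inside the RT budget is applied with bilinear interpolation, and an $\ell_\infty$-bounded perturbation is then added. The budget values `theta_max_deg`, `dx_max`, `dy_max`, and `eps` are illustrative assumptions, and an actual attack would optimize over these quantities rather than sample them.

```python
import math
import torch
import torch.nn.functional as F

def sample_linf_rt(x, theta_max_deg=30.0, dx_max=3, dy_max=3, eps=8 / 255):
    """Draw one random element of A_{l_inf o RT}(x); x has shape (1, C, H, W), values in [0, 1]."""
    _, _, H, W = x.shape
    # Sample a rotation (radians) and pixel translations uniformly within the RT budget.
    theta = math.radians((2 * torch.rand(1).item() - 1) * theta_max_deg)
    dx = (2 * torch.rand(1).item() - 1) * dx_max
    dy = (2 * torch.rand(1).item() - 1) * dy_max
    # Affine matrix in the normalized coordinates expected by affine_grid
    # (rotation about the image center; pixel translations rescaled to [-1, 1]).
    A = torch.tensor([[math.cos(theta), -math.sin(theta), 2 * dx / W],
                      [math.sin(theta),  math.cos(theta), 2 * dy / H]],
                     dtype=x.dtype, device=x.device).unsqueeze(0)
    grid = F.affine_grid(A, list(x.shape), align_corners=False)
    x_rt = F.grid_sample(x, grid, mode="bilinear", align_corners=False)
    # Random l_inf perturbation around the transformed image.
    delta = (2 * torch.rand_like(x_rt) - 1) * eps
    return (x_rt + delta).clamp(0.0, 1.0)
```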
4 On the Difficulty of Attaining Compositional Robustness with Linear Classifiers
In this section, we theoretically demonstrate the difficulty of defending against an $\ell_\infty \circ \mathrm{RT}$ compositional adversary with a linear classifier in a simple statistical setting.
4.1 Statistical Setting
To theoretically analyze the compositional adversarial setting, we use the statistical distribution proposed in [17]. Namely, we study a binary classification problem with $d$-dimensional input features, in which the first feature $X_0$ is strongly correlated with the output label $y$ with probability $p$, and the remaining features are weakly correlated with $y$. The distribution can be written as follows:
\[
Y \overset{\text{u.a.r.}}{\sim} \{-1, 1\}, \qquad
X_0 \mid Y = y :=
\begin{cases}
y, & \text{w.p. } p, \\
-y, & \text{w.p. } 1 - p,
\end{cases}
\qquad
X_t \mid Y = y \sim \mathcal{N}(y\eta, 1), \quad 1 \le t \le d - 1, \tag{4}
\]
where $\eta = \Theta(\tfrac{1}{\sqrt{d}})$ and $p \ge 0.5$. We assume that an $\ell_\infty$ adversary has budget $\epsilon = 2\eta$, similar to [17].
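As a concrete illustration, the following is a minimal sketch of sampling from the distribution in Eq. (4); the particular choices $d = 100$ and $p = 0.95$ are assumptions made for the example.

```python
import numpy as np

def sample_D(n, d=100, p=0.95):
    """Draw n samples (x, y) from the distribution in Eq. (4)."""
    eta = 1.0 / np.sqrt(d)                 # eta = Theta(1 / sqrt(d))
    y = np.random.choice([-1, 1], size=n)  # Y uniform over {-1, +1}
    x = np.empty((n, d))
    agree = np.random.rand(n) < p          # X_0 = y w.p. p, and -y otherwise
    x[:, 0] = np.where(agree, y, -y)
    # Remaining d-1 features are weakly correlated with y: N(y * eta, 1).
    x[:, 1:] = np.random.normal(loc=(y * eta)[:, None], scale=1.0, size=(n, d - 1))
    return x, y
```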
Moreover, we define an RT transformation as it is defined in [15]. Concretely, an RT transformation is defined as a swap between the strongly correlated feature $X_0$ and a weakly correlated feature $X_t$, $1 \le t \le d - 1$. To constrain the RT transformation, we assume that an RT adversary can swap $X_0$ with at most $N$ positions on the input signal. If we assume the input features $X_0, \ldots, X_{d-1}$ lie on a 2-dimensional grid, then this definition of an RT transformation serves as a realistic abstraction of applying an RT transformation to an image using nearest interpolation and rotating about the image's center. Namely, since the distribution over the last $d - 1$ features is permutation invariant, the only power of an RT transformation is to move the strongly correlated feature, where $N$ defines the number of reachable pixels that the strongly correlated feature can be mapped to via an RT transformation. For example, when considering only translations, we have $N = (2\delta_x^{\max} + 1)(2\delta_y^{\max} + 1)$.
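For illustration, here is a minimal sketch of this abstract RT adversary acting against a linear classifier $\mathrm{sign}(w^\top x)$: it tries each reachable swap position and keeps the swap that most reduces the signed margin. Taking the reachable set to be positions $1, \ldots, N$ is an assumption made purely for the example (by permutation invariance, which positions are reachable does not matter).

```python
import numpy as np

def rt_swap_attack(x, y, w, N):
    """Worst-case swap of x[0] with one of positions 1..N against f(x) = sign(w @ x)."""
    best_x, best_margin = x, y * (w @ x)
    for t in range(1, N + 1):
        x_swap = x.copy()
        x_swap[0], x_swap[t] = x_swap[t], x_swap[0]
        margin = y * (w @ x_swap)   # smaller margin means closer to misclassification
        if margin < best_margin:
            best_x, best_margin = x_swap, margin
    return best_x

# Translation-only RT budget on a 2-D grid: number of reachable pixels.
dx_max, dy_max = 3, 3
N = (2 * dx_max + 1) * (2 * dy_max + 1)   # = 49
```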
We now state a theorem that establishes the difficulty of defending against an $\ell_\infty \circ \mathrm{RT}$ compositional adversary with a linear classifier.
Theorem 4.1 (A linear classifier cannot attain nontrivial $\ell_\infty \circ \mathrm{RT}$ robustness). Given data distribution $\mathcal{D}$ where $p \ge \tfrac{1}{2}$, $\eta \ge \tfrac{1}{\sqrt{d}}$, and $d \ge 24$, no linear classifier $f : \mathbb{R}^d \to \{-1, 1\}$, where $f(x) = \mathrm{sign}(w^\top x)$, can obtain robust accuracy $> 0.5$ under the $\ell_\infty \circ \mathrm{RT}$ threat model with $\ell_\infty$ budget $\epsilon = 2\eta$ and RT budget $N = \tfrac{d}{8}$.
This theorem shows that under reasonable constraints on the compositional adversary, a linear classifier can perform no better than random, even in the infinite data limit. We note by contrast that a linear classifier can attain $> 0.99$ natural accuracy in this statistical setting; e.g., see [17]. This result distinguishes itself from Theorem 4 in [15], in that [15] show that an adversary that composes $\ell_\infty$ and RT perturbations yields a stronger attack than a union adversary, whereas we show that a linear classifier cannot have nontrivial robustness against a compositional adversary under this