
non-robust features in the data that standard models learn to rely upon (Tsipras et al., 2019, Ilyas et al., 2019). In particular, it is conjectured that reliance on useful but non-robust features during training is responsible for the brittleness of neural nets. Here, we slightly adapt the feature definitions of Ilyas et al. (2019)¹ and extend them to multi-class problems (see Appendix A).
Let $\mathcal{D}$ be the data generating distribution with $x \in \mathcal{X}$ and $y \in \{\pm 1\}$. We define a feature as a function $\varphi : \mathcal{X} \to \mathbb{R}$ and distinguish how they perform as classifiers. Fix $\rho, \gamma \ge 0$:
1. $\rho$-Useful feature: A feature $\varphi$ is called $\rho$-useful if
$$\mathbb{E}_{x,y \sim \mathcal{D}}\, \mathbb{1}\{\operatorname{sign}[\varphi(x)] = y\} = \rho \tag{3}$$
2. $\gamma$-Robust feature: A feature $\varphi$ is called $\gamma$-robust if it remains useful under any perturbation inside a bounded “ball” $B$, that is if
$$\mathbb{E}_{x,y \sim \mathcal{D}} \inf_{\delta \in B} \mathbb{1}\{\operatorname{sign}[\varphi(x+\delta)] = y\} = \gamma \tag{4}$$
In general, a feature adds predictive value if it gives an advantage over guessing the most likely label, i.e. $\rho > \max_{y_0 \in \{\pm 1\}} \mathbb{E}_{x,y \sim \mathcal{D}}[\mathbb{1}\{y_0 = y\}]$, and we will speak of “useful” features in this case, omitting the $\rho$. We will call such a feature useful, non-robust if it is useful, but $\gamma$-robust only for $\gamma = 0$ or $\gamma$ very close to $0$, depending on context.
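To make these definitions concrete, the sketch below estimates both quantities empirically for a given feature function. Everything in it is an illustrative assumption: the feature `phi`, the synthetic data, the choice of an $\ell_\infty$ ball of radius `eps`, and the random-search approximation of the infimum in Eq. (4), which only upper-bounds $\gamma$ (a stronger adversary, e.g. PGD on `phi`, would generally give a lower estimate).

```python
import numpy as np

def rho_usefulness(phi, X, y):
    """Empirical estimate of rho in Eq. (3): accuracy of sign[phi(x)] against labels y in {-1, +1}."""
    return np.mean(np.sign(phi(X)) == y)

def gamma_robustness(phi, X, y, eps, n_trials=100, seed=0):
    """Crude empirical estimate of gamma in Eq. (4) for an l_inf ball of radius eps.
    The inner infimum is approximated by random perturbations, so this only upper-bounds gamma."""
    rng = np.random.default_rng(seed)
    survives = np.ones(len(X), dtype=bool)            # correct under every perturbation tried so far
    for _ in range(n_trials):
        delta = rng.uniform(-eps, eps, size=X.shape)  # a random point in the l_inf ball
        survives &= (np.sign(phi(X + delta)) == y)
    return survives.mean()

# Toy usage on synthetic data: the feature is the first input coordinate (purely illustrative).
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=500)
X = rng.normal(size=(500, 10))
X[:, 0] += 0.5 * y                                    # weakly correlate coordinate 0 with the label
phi = lambda Z: Z[:, 0]
print(rho_usefulness(phi, X, y), gamma_robustness(phi, X, y, eps=0.3))
```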
The vast majority of works imagine features as being induced by the activations of neurons in the net, most commonly those of the penultimate layer (representation-layer features). The formal definitions above, however, are in no way restricted to activations, and we will show how to exploit them using the eigenspectrum of the NTK. In Sec. 4 we demonstrate that the above framework agrees perfectly with features induced by the eigenspectrum of the NTK of a network, providing a natural way to decompose the predictions of the NTK into such feature functions; in particular, we can identify robust, useful, and, indeed, useful non-robust features.
2.3 Neural Tangent Kernel
Let $f : \mathbb{R}^d \to \mathbb{R}$ be a (scalar) neural network with a linear final layer, parameterized by a set of weights $w \in \mathbb{R}^p$, and let $\{X, Y\}$ be a dataset of size $n$, with $X \in \mathbb{R}^{n \times d}$ and $Y \in \{\pm 1\}^n$. Linearized training methods study the first-order approximation
$$f(x; w_{t+1}) = f(x; w_t) + \nabla_w f(x; w_t)^\top (w_{t+1} - w_t). \tag{5}$$
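As a minimal illustration of Eq. (5), the PyTorch sketch below evaluates the linearized prediction for a given parameter update; `model` and `delta_w` are placeholders, and nothing here assumes the network actually stays in this linear regime.

```python
import torch

def linearized_prediction(model, x, delta_w):
    """First-order approximation of Eq. (5): f(x; w_t) + grad_w f(x; w_t)^T (w_{t+1} - w_t).
    delta_w is a list of tensors with the same shapes as model.parameters()."""
    params = [p for p in model.parameters() if p.requires_grad]
    f_wt = model(x.unsqueeze(0)).squeeze()               # f(x; w_t), a scalar
    grads = torch.autograd.grad(f_wt, params)            # grad_w f(x; w_t)
    correction = sum((g * d).sum() for g, d in zip(grads, delta_w))
    return f_wt.detach() + correction
```

For a gradient-descent step on a training loss, `delta_w` would be minus the learning rate times the per-parameter loss gradients.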
The network gradient $\nabla_w f(x; w_t)$ induces a kernel function $\Theta_t : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, usually referred to as the Neural Tangent Kernel (NTK) of the model,
$$\Theta_t(x, x') = \nabla_w f(x; w_t)^\top \nabla_w f(x'; w_t). \tag{6}$$
This kernel describes the training dynamics under an infinitesimal learning rate (gradient flow). In general, the tangent space spanned by the $\nabla_w f(x; w_t)$ twists substantially during training, and learning with the Gram matrix of Eq. (6) (the empirical NTK) corresponds to training along an intermediate tangent plane.
Remarkably, however, in the infinite-width limit with appropriate initialization and a low learning rate, it has been shown that $f$ becomes a linear function of the parameters (Jacot et al., 2018, Liu et al., 2020) and the NTK remains constant ($\Theta_t = \Theta_0 =: \Theta$). Then, for learning with the $\ell_2$ loss, the training dynamics of infinitely wide networks admits a closed-form solution corresponding to kernel regression (Jacot et al., 2018, Lee et al., 2019, Arora et al., 2019b):
$$f_t(x) = \Theta(x, X)^\top \Theta^{-1}(X, X)\bigl(I - e^{-\lambda \Theta(X, X) t}\bigr) Y, \tag{7}$$
where $x \in \mathbb{R}^d$ is any input (training or testing), $t$ denotes the time evolution of gradient descent, $\lambda$ is the (small) learning rate and, slightly abusing notation, $\Theta(X, X) \in \mathbb{R}^{n \times n}$ denotes the matrix containing the pairwise training values of the NTK, $\Theta(X, X)_{ij} = \Theta(x_i, x_j)$, and similarly for $\Theta(x, X) \in \mathbb{R}^n$. To be precise, Eq. (7) gives the mean output of the network using a weight-independent kernel, with variance depending on the initialization².
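Once the kernel matrices are in hand (e.g. from the `empirical_ntk` sketch above), Eq. (7) can be evaluated directly; the sketch below does so through the eigendecomposition of $\Theta(X, X)$, assuming it is positive definite, which also exposes the eigenspectrum used in Sec. 4. All names and shapes are illustrative.

```python
import torch

def ntk_regression(theta_train, theta_test_train, Y, lr, t):
    """Mean prediction f_t(x) of Eq. (7) via the eigendecomposition of Theta(X, X).
    theta_train: (n, n) kernel on the training set; theta_test_train: (m, n) kernel between
    test and training points; Y: (n,) labels in {-1, +1}; lr: learning rate lambda; t: time."""
    evals, evecs = torch.linalg.eigh(theta_train)            # Theta(X, X) = U diag(evals) U^T
    # Theta^{-1}(I - e^{-lr*Theta*t}) acts as (1 - e^{-lr*evals*t}) / evals in the eigenbasis
    coef = (1.0 - torch.exp(-lr * evals * t)) / evals
    alpha = evecs @ (coef * (evecs.T @ Y.float()))           # shape (n,)
    return theta_test_train @ alpha                          # shape (m,)
```

As $t \to \infty$ the per-eigenvalue coefficient tends to $1/\lambda_i$, recovering standard kernel regression $\Theta(x, X)^\top \Theta^{-1}(X, X) Y$.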
¹We distinguish useful and robust features based on their accuracy as classifiers, not in terms of correlation with the labels as in Ilyas et al. (2019), which allows a natural extension to the multi-class setting. For robustness, we consider any accuracy bounded away from zero as robust, capturing the idea that an adversary cannot drive the accuracy to zero entirely.
²For that reason, in the experiments, we often compare this with the centered prediction of the actual neural network, $f - f_0$, as is commonly done in similar studies (Chizat et al., 2019).