in a small neighborhood around training examples, where the inner maximization is computed during training using iterated PGD [25, 51].
From a theoretical standpoint, much work has been devoted to explaining the causes of adversarial vulnerability and the limitations of adversarial robustness. Popular theories include the existence of “robust and non-robust features” in data [37, 76], adversarial examples as an outcome of high-dimensional geometry [24, 39, 69] or of stochastic gradient descent (SGD) [70, 3], and many more [14]. Based on NTK theory, there have been works studying the presence of adversarial examples under the NTK and NNGP kernels [22, 9, 15, 60, 10]. Recently, NTK theory has been used to generate attack methods for deep networks [75, 79]. Adversarial examples have also been shown to arise in simplified linear classification settings [68, 37], which readily transfer to the NTK setting, as the NTK classifier is linear in its underlying feature representation. However, these reports on the adversarial vulnerability of NTK and NNGP kernels have focused solely on the infinite-width limit of these kernels, with no literature on the adversarial robustness of the learned, data-dependent NTK present in practical networks.
2 Experimental setup
Definitions and problem setup. We first define the necessary objects and terminology for our experimental setup. Our scheme follows closely that of [21]; however, we consider the additional dimension of adversarial robustness on top of benign accuracy. First, we define the parameter-dependent empirical NTK: $k_{\mathrm{eNTK},\theta}(x, x') = \nabla_\theta f_\theta(x)^\top \nabla_\theta f_\theta(x')$. In the case where $\theta = \theta(t)$, where $\theta(t)$ refers to the parameters of the network after training for $t$ epochs, we use the shorthand $k_t = k_{\mathrm{eNTK},\theta(t)}$.
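To make the definition concrete, here is a minimal sketch of computing a single eNTK entry in JAX; the toy network `f`, its parameters, and the helper `entk` are hypothetical stand-ins (our experiments use ResNet-18s), and we assume a scalar-output network for simplicity.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in network; any fn(params, x) -> scalar output works here.
def f(params, x):
    return jnp.tanh(x @ params["w"] + params["b"])

def entk(params, x1, x2):
    # k_eNTK,theta(x, x') = grad_theta f(x)^T grad_theta f(x'):
    # contract the two parameter Jacobians leaf by leaf.
    j1 = jax.jacobian(f)(params, x1)  # pytree of df(x1)/dtheta
    j2 = jax.jacobian(f)(params, x2)
    return sum(jnp.vdot(a, b)
               for a, b in zip(jax.tree_util.tree_leaves(j1),
                               jax.tree_util.tree_leaves(j2)))

params = {"w": jax.random.normal(jax.random.PRNGKey(0), (8,)), "b": jnp.zeros(())}
print(entk(params, jnp.ones(8), jnp.arange(8.0)))  # one kernel entry
```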
Next, we define three training dynamics.
1. Standard dynamics: $f_{\mathrm{standard},\theta} = f_{\theta(t)}$, that is, network behavior with no modifications.
2. Linearized (also referred to as lazy) dynamics around epoch $t$: $f_{\mathrm{lin},\theta,t}(x) = f_{\theta_t}(x) + (\theta - \theta_t)^\top \nabla_{\theta_0} f_{\theta_0}(x)\big|_{\theta_0=\theta_t}$. That is, linearized training dynamics correspond to a first-order Taylor expansion of the network output in its parameters around the point $\theta_t$. We refer to $f_{\theta_t}$ as the parent network and $t$ as the spawn epoch. Note that in linearized dynamics, the underlying feature map $\phi(x) = \nabla_{\theta_0} f_{\theta_0}(x)\big|_{\theta_0=\theta_t}$, and hence the corresponding kernel, is fixed. Fort et al. [21] studied this regime and showed that, in practice, deep networks trained with SGD undergo two stages of training: a first, chaotic phase with a rapidly changing kernel up to some relatively early epoch $t$, followed by a second stage that behaves like linear training about $\theta_t$.
3. Centered linear training (or centered for short), where we subtract the parent network's output [34, 47, 49]: $f_{\mathrm{centered},\theta,t}(x) = (\theta - \theta_t)^\top \nabla_{\theta_0} f_{\theta_0}(x)\big|_{\theta_0=\theta_t}$. This corresponds to linearized training with the zeroth-order term removed. The output is now strictly dependent on the difference between $\theta$ and $\theta_t$, and the first $t$ epochs contribute only through how they modify the feature map. Studying this setting lets us isolate the properties of the learned kernel $k_t$ without the confounding contribution of the parent network's output. Both linearized variants are implemented efficiently using forward-mode differentiation in the JAX library [11]; a minimal sketch follows this list.
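As referenced above, here is a minimal sketch of both linearized variants via forward-mode differentiation (`jax.jvp`); the toy network `f` is a hypothetical stand-in for the parent network, with scalar output for clarity.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for the parent network f_theta.
def f(params, x):
    return jnp.tanh(x @ params["w"] + params["b"])

def linearized(params, parent_params, x):
    # f_lin(x) = f_{theta_t}(x) + (theta - theta_t)^T grad f|_{theta_t},
    # computed in a single forward-mode pass (no materialized gradients).
    delta = jax.tree_util.tree_map(lambda p, q: p - q, params, parent_params)
    f_parent, tangent = jax.jvp(lambda p: f(p, x), (parent_params,), (delta,))
    return f_parent + tangent

def centered(params, parent_params, x):
    # Same first-order term, with the zeroth-order (parent) output removed.
    delta = jax.tree_util.tree_map(lambda p, q: p - q, params, parent_params)
    _, tangent = jax.jvp(lambda p: f(p, x), (parent_params,), (delta,))
    return tangent

theta_t = {"w": jax.random.normal(jax.random.PRNGKey(0), (4,)), "b": jnp.zeros(())}
theta = jax.tree_util.tree_map(lambda p: p + 0.1, theta_t)
print(linearized(theta, theta_t, jnp.ones(4)), centered(theta, theta_t, jnp.ones(4)))
```

Note that at $\theta = \theta_t$, `centered` returns exactly zero, which is the “no warm start” behavior discussed in the experimental design below.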
Experimental design. We follow closely the design of [21], where we train networks in two distinct stages, with the added dimension of adversarial training. Additional details are available in Appendix B.

Stage 1: Standard dynamics. We train ResNet-18s on CIFAR-10 or CIFAR-100 [41] for $t$ epochs, $0 \le t \le 100$, using either benign data (i.e., no data modification) or adversarial training.

Stage 2: Linearized dynamics. Following stage 1, we take the parent network from stage 1 and train for an additional 100 epochs with either linearized or centered linearized training dynamics. Note that for centered training, the network outputs all zeros immediately after stage 1, since the zeroth-order term is removed and $\theta = \theta_t$, so the classifier does not have a “warm start.” In this stage, we train using benign data, with no adversarial training. Additionally, we freeze the batchnorm running mean and standard deviation parameters after stage 1, as allowing these to change would implicitly change the feature map $\phi(x) = \nabla_{\theta_0} f_{\theta_0}(x)\big|_{\theta_0=\theta_t}$. For all experiments in the main text, we use the standard adversarial radius $\varepsilon = 4/255$ under the $\ell_\infty$ norm, but we verify in Section 9 that the results hold for $\varepsilon = 8/255$, with additional results for $\varepsilon = 8/255$ in the appendix.
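For concreteness, a minimal sketch of the iterated $\ell_\infty$ PGD inner maximization used during adversarial training might look as follows; the loss function here is a hypothetical stand-in, and the step size and iteration count are illustrative assumptions rather than our exact settings.

```python
import jax
import jax.numpy as jnp

def pgd_linf(loss_fn, x, y, eps=4/255, step=1/255, n_steps=10):
    # Iterated PGD: ascend the loss by gradient sign, then project back
    # onto the L_inf ball of radius eps around the clean input x.
    grad_x = jax.grad(loss_fn, argnums=0)
    x_adv = x
    for _ in range(n_steps):
        x_adv = x_adv + step * jnp.sign(grad_x(x_adv, y))
        x_adv = jnp.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = jnp.clip(x_adv, 0.0, 1.0)          # keep a valid image range
    return x_adv

# Toy usage with a stand-in squared-error loss on a fixed linear map.
w = jnp.linspace(-1.0, 1.0, 8)
loss = lambda x, y: (x @ w - y) ** 2
x0 = jnp.full(8, 0.5)
print(jnp.max(jnp.abs(pgd_linf(loss, x0, y=0.0) - x0)))  # <= 4/255
```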