Evolution of Neural Tangent Kernels under Benign
and Adversarial Training
Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus
Computer Science and Artificial Intelligence Lab (CSAIL)
Massachusetts Institute of Technology (MIT)
{loo, rhasani, amini, rus}@mit.edu

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.12030v1 [cs.LG] 21 Oct 2022
Abstract
Two key challenges facing modern deep learning are mitigating deep networks’
vulnerability to adversarial attacks and understanding deep learning’s generalization
capabilities. Towards the first issue, many defense strategies have been developed,
with the most common being Adversarial Training (AT). Towards the second
challenge, one of the dominant theories that has emerged is the Neural Tangent
Kernel (NTK) – a characterization of neural network behavior in the infinite-width
limit. In this limit, the kernel is frozen, and the underlying feature map is fixed. In
finite widths, however, there is evidence that feature learning happens at the earlier
stages of the training (kernel learning) before a second phase where the kernel
remains fixed (lazy training). While prior work has aimed at studying adversarial
vulnerability through the lens of the frozen infinite-width NTK, there is no work
that studies the adversarial robustness of the empirical/finite NTK during training.
In this work, we perform an empirical study of the evolution of the empirical
NTK under standard and adversarial training, aiming to disambiguate the effect of
adversarial training on kernel learning and lazy training. We find that, under adversarial
training, the empirical NTK rapidly converges to a different kernel (and feature
map) than standard training. This new kernel provides adversarial robustness, even
when non-robust training is performed on top of it. Furthermore, we find that
adversarial training on top of a fixed kernel can yield a classifier with 76.1% robust
accuracy under PGD attacks with $\epsilon = 4/255$ on CIFAR-10.¹

¹Code is available at https://github.com/yolky/adversarial_ntk_evolution
1 Introduction
Modern deep learning, while effective in tackling clean and curated datasets, is often very brittle
to domain and distribution shifts [76]. Perhaps the most notorious failure mode of deep learning
under domain shift is adversarial examples [73]: images with small, bounded perturbations which
consistently fool state-of-the-art classifiers. While work has been dedicated to mitigating [25, 18, 51],
explaining [24, 39, 37], and harnessing [66, 54, 67] this peculiar behavior, the problem remains
largely open, with state-of-the-art robust classifiers still falling far behind standard non-robust
networks in benign accuracy [18]. Indeed, tackling this idiosyncrasy of deep learning seems almost
impossible given that the behavior of deep models under benign data and training is still largely
unexplained [13, 62, 63], let alone under adversarial training.
The community has developed numerous theories for better interpreting [77, 27, 43] and
understanding deep learning [74, 5, 38, 64, 28, 14, 31, 30]. One emerging theory is the notion of
deep learning models as kernel learners [48, 21, 1]. That is, deep networks trained with gradient
descent learn a kernel whose feature map is embedded in the tangent space of the network outputs
with respect to its parameters. After a brief phase of kernel learning, neural networks behave similarly to
lazy learners, i.e., linear and slow in this neural tangent feature map. This two-stage theory of deep
learning has been observed in vision networks such as residual networks [32] used in practice and
has been the subject of many recent empirical [71, 48, 21, 7] and theoretical works [29, 35, 2]. While
these reports have verified this theory for benign training, no work has looked at this property at the
intersection of adversarial training. To this end, we perform the first empirical study of how the
empirical/finite neural tangent kernel evolves over the course of adversarial training.² In section 2,
we present our experimental setup, designed to isolate the effect of adversarial training on the two
stages of deep learning under the kernel-learner theory: 1) kernel learning and 2) linear fitting. Using
this framework, we study and show the following:
1. Similar to standard training, adversarial training results in a distinct kernel which quickly
   converges in the first few epochs of training (section 3).

2. Adversarial robustness can be inherited from the kernel produced by adversarial training, even
   when the second-stage classifier has no access to adversarial examples during training (section 4).

3. Adversarial training is effective on top of the learned kernels given by standard training (in the
   frozen-feature regime), providing a testbed where many of the fixed-feature assumptions present
   in theoretical works on adversarial training are met, while still providing high practical
   performance (section 7).

4. Eigenvectors of the learned NTK of adversarially trained networks contain visually interpretable
   features, while the initial NTK and the benign-training NTK do not (section 8).

²Throughout this study, we refer to the finite-width NTK as the NTK unless otherwise specified.
1.1 Background and Related Works
Neural Tangent Kernel.
Explaining the empirical success of deep learning models from a theoretical
standpoint is an exciting and active line of research that is still in its infancy. One of the most
popular tools for understanding neural network behavior is the neural tangent kernel (NTK) [38, 6, 4].
Under this theory, training deep networks with gradient descent in the infinite-width limit corresponds
to training a kernel-based classifier, with the corresponding kernel being the NTK, defined as
$k_{NTK}(x, x') = \mathbb{E}_{\theta \sim p(\theta)}[\nabla_\theta f_\theta(x)^T \nabla_\theta f_\theta(x')]$, for parameter distribution $p(\theta)$ and network
architecture $f_\theta$. Analogously, training an infinite-width Bayesian neural network corresponds to
Gaussian process regression using the NNGP kernel [55, 23, 45, 52, 56, 50]. In both of these
infinite-width limits, the underlying feature space is fixed, determined entirely by the architecture and
parameter distribution.
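To make the definition above concrete, the following is a minimal sketch (our own illustration, not the authors' released code) that estimates a single NTK entry for a tiny multilayer perceptron by averaging the gradient inner product over random parameter draws, approximating the expectation over $p(\theta)$; the architecture and sizes are placeholders.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, d_in=32, d_hidden=64):
    # A random initialization, playing the role of a draw from p(theta).
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
        "w2": jax.random.normal(k2, (d_hidden, 1)) / jnp.sqrt(d_hidden),
    }

def f(params, x):
    # Tiny scalar-output network used purely for illustration.
    return (jnp.tanh(x @ params["w1"]) @ params["w2"]).squeeze()

def tangent_inner_product(params, x, x_prime):
    # grad_theta f(x)^T grad_theta f(x'), with all parameter gradients flattened.
    gx, _ = ravel_pytree(jax.grad(f)(params, x))
    gxp, _ = ravel_pytree(jax.grad(f)(params, x_prime))
    return gx @ gxp

key = jax.random.PRNGKey(0)
x, x_prime = jax.random.normal(key, (2, 32))
# A Monte Carlo average over parameter draws approximates E_{theta ~ p(theta)}[...].
samples = [tangent_inner_product(init_params(jax.random.PRNGKey(i)), x, x_prime)
           for i in range(16)]
print(jnp.mean(jnp.stack(samples)))
```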
The frozen nature of the kernel stands in contradiction to the ability of deep models to learn useful
features and representations. Indeed, it has been shown theoretically, under different limiting
assumptions, or with finite-widths, that feature learning does occur, with a time-evolving and data-
dependent kernel, $k_t(x, x') = \nabla_{\theta_t} f_{\theta_t}(x)^T \nabla_{\theta_t} f_{\theta_t}(x')$, with time-dependent network parameters $\theta_t$
[78, 29, 35, 2, 58, 59]. This learned NTK aligns itself with dataset labels to accelerate training and
improve generalization. Empirical evidence supports this, with works finding that this meta-kernel
evolves quickly within the early stages of network training before stabilizing [48, 21, 7, 71, 40, 8]. In
this setting, there exist two stages in deep network training: 1) kernel learning and 2) linear fitting. In
the first stage, the kernel rapidly evolves to align with the dataset's features and labels, while in the
second phase, the kernel changes only minimally and the network behaves linearly in the NTK feature
map $\phi(x) = \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$, a regime sometimes referred to as lazy training [17, 46].
Adversarial Examples and Adversarial Training. Adversarial examples present one of the most
infamous failure modes of deep learning [73]. These are images within bounded perturbations of
naturally occurring images that consistently fool deep networks over a wide array of architectures [53].
There is much literature dedicated to designing techniques for securing networks against these
attacks [25, 51, 18, 26, 44, 80, 61, 16, 42]. In this work, we focus on adversarial training with iterated
projected gradient descent (PGD), which seeks to minimize
$\mathcal{L}_{rob} = \mathbb{E}_{x,y \sim p(x,y)}\left[\max_{x' \in B_\epsilon(x)} \mathcal{L}_{standard}(x', y)\right]$.
That is, it minimizes the worst-case loss over samples in a small neighborhood around the training
examples, where the inner maximization is computed during training using iterated PGD [25, 51].
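As an illustration of this inner maximization, the following is a minimal sketch of an $L_\infty$-bounded iterated PGD attack (our own simplification, not the paper's training code); `model_fn`, `params`, and the use of optax's cross-entropy loss are assumptions for the example.

```python
import jax
import jax.numpy as jnp
import optax  # used here only for the cross-entropy loss

def pgd_attack(model_fn, params, x, y, eps=4/255, step_size=1/255, n_steps=10, key=None):
    """Approximate argmax_{x' in B_eps(x)} L_standard(x', y) with projected gradient ascent."""
    x_adv = x
    if key is not None:  # optional random start inside the eps-ball
        x_adv = x + jax.random.uniform(key, x.shape, minval=-eps, maxval=eps)

    def loss_fn(x_in):
        logits = model_fn(params, x_in)
        return optax.softmax_cross_entropy_with_integer_labels(logits, y).mean()

    for _ in range(n_steps):
        grad = jax.grad(loss_fn)(x_adv)
        x_adv = x_adv + step_size * jnp.sign(grad)   # ascent step in L_inf geometry
        x_adv = jnp.clip(x_adv, x - eps, x + eps)    # project back into the eps-ball
        x_adv = jnp.clip(x_adv, 0.0, 1.0)            # keep pixels in a valid range
    return x_adv

# Adversarial training then minimizes the standard loss evaluated at pgd_attack(...)
# instead of at the clean inputs x.
```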
From a theoretical standpoint, much work has been devoted to explaining the causes of adversarial
vulnerability and the limitations of adversarial robustness. Popular theories include the notion of
there existing "robust and non-robust features" present in data [37, 76], adversarial examples being an
outcome of high-dimensional geometry [24, 39, 69] or of stochastic gradient descent (SGD) [70, 3],
and many more [14]. Based on NTK theory, there have been works studying the presence of
adversarial examples under the NTK and NNGP kernels [22, 9, 15, 60, 10]. Recently, NTK theory
has been used to generate attack methods for deep networks [75, 79]. Adversarial examples have
been shown to arise in simplified linear classification settings [68, 37], which readily transfer to the
NTK setting, as the NTK classifier is linear in its underlying feature representation. However, these
reports on the adversarial vulnerability of NTK and NNGP kernels have focused solely on the
infinite-width limit of these kernels, with no literature on the adversarial robustness of the learned
and data-dependent NTK that is present in practical networks.
2 Experimental setup
Definitions and problem setup.
We first define the necessary objects and terminology for our
experimental setup. Our scheme follows closely that of [21]; however, we consider the additional
dimension of adversarial robustness in addition to benign accuracy. First, we define the parameter-
dependent empirical NTK: $k_{ENTK,\theta}(x, x') = \nabla_\theta f_\theta(x)^T \nabla_\theta f_\theta(x')$. In the case where $\theta = \theta(t)$,
where $\theta(t)$ refers to the parameters of the network after training for $t$ epochs, we use the shorthand
$k_t = k_{ENTK,\theta(t)}$. Next, we define three training dynamics:
1. Standard dynamics: $f_{standard} = f_{\theta(t)}$, that is, network behavior with no modifications.

2. Linearized (also referred to as lazy) dynamics around epoch $t$:
   $f_{lin,\theta,t}(x) = f_{\theta_t}(x) + (\theta - \theta_t)^T \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$. I.e., linearized training dynamics
   correspond to performing a first-order Taylor expansion of the network outputs around the point
   $\theta_t$ in parameter space. We refer to $f_{\theta_t}$ as the parent network and $t$ as the spawn epoch. Note
   that in linearized dynamics, the underlying feature map $\phi(x) = \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$, and hence
   the corresponding kernel, is fixed. Fort et al. [21] studied this regime and showed that, in practice,
   deep networks trained with SGD undergo two stages of training, in which the first phase is chaotic
   with a rapidly changing kernel until some relatively early epoch $t$, after which the second stage
   behaves like linear training about $\theta_t$.

3. Centered linear training (or centered for short): where we subtract the parent network's
   output [34, 47, 49]: $f_{centered,\theta,t}(x) = (\theta - \theta_t)^T \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$. This corresponds to
   linearized training with the zeroth-order term removed. Now the output depends strictly on the
   difference between $\theta$ and $\theta_t$, and the first $t$ epochs contribute only through how they modify
   the feature map. Studying this setting lets us isolate the properties of the learned kernel $k_t$,
   without worrying about the parent network's direct contribution. Efficient implementation of
   both of these linearized variants uses forward-mode differentiation in the JAX library [11]; a
   minimal sketch of all three dynamics follows this list.
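Below is a minimal sketch of the three dynamics using forward-mode differentiation, in the spirit of the JAX implementation mentioned above; `f` is any pure function `f(params, x)`, and the helper names are our own, not the released code's.

```python
import jax

def standard_forward(f, params, x):
    # 1. Standard dynamics: the unmodified network.
    return f(params, x)

def linearized_forward(f, params, anchor_params, x):
    # 2. Linearized (lazy) dynamics around the parent parameters theta_t:
    #    f_lin(x) = f_{theta_t}(x) + (theta - theta_t)^T grad f_{theta'}(x)|_{theta'=theta_t}
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, anchor_params)
    f0, df = jax.jvp(lambda p: f(p, x), (anchor_params,), (delta,))
    return f0 + df

def centered_forward(f, params, anchor_params, x):
    # 3. Centered dynamics: the zeroth-order (parent) term is dropped, so the output
    #    depends on theta - theta_t only through the frozen feature map.
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, anchor_params)
    _, df = jax.jvp(lambda p: f(p, x), (anchor_params,), (delta,))
    return df
```

In stage 2, only `params` is updated; `anchor_params` (and, in our experiments, the batch-norm running statistics) stay frozen so that the feature map and the corresponding kernel remain fixed.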
Experimental design. We closely follow [21], training networks in two distinct stages with an added
dimension of adversarial training. Additional details are available in appendix B.
Stage 1: Standard dynamics. We train ResNet-18s on CIFAR-10 or CIFAR-100 [41] for $t$ epochs,
either using benign data (i.e., no data modification) or adversarial training, with $0 \le t \le 100$.
Stage 2: Linearized dynamics. Following stage 1, we take the parent network from stage 1 and train
for an additional 100 epochs with either linearized or centered linearized training dynamics. Note that
for centered training, the networks output all zeros immediately after stage 1, as the zeroth-order term
is removed, so the classifier does not have a "warm start." In this stage, we train using benign data,
with no adversarial training. Additionally, we freeze the batch-norm running mean and standard
deviation parameters after stage 1, as allowing these to change would implicitly change the feature
map $\phi(x) = \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$. For all experiments in the main text, we use the standard
$\epsilon = 4/255$ adversarial radius under the $L_\infty$ norm, but we verify that the results hold for
$\epsilon = 8/255$ in section 9, with additional results for $\epsilon = 8/255$ in the appendix.
Figure 1: Evolution of the Neural Tangent Kernel under benign and adversarial training on Resnet-18s
on CIFAR-100 (Top row) and CIFAR-10 (Bottom Row). Networks are either trained for 100 epochs
with benign or adversarial training, or 50 epochs of benign training followed by 50 epochs of adversarial training
(50/50). From left to right, we plot the kernel velocity, kernel distance to the final kernel, effective
rank, and mean kernel specialization of the resulting kernels. (n=3)
3 Evolution of the NTK under standard and adversarial training
First, we look at the evolution of the kernel under different training conditions. To do this, we
calculate the empirical NTK on a random subset of 500 class-balanced training points from CIFAR-10
and CIFAR-100 for ResNets trained for 100 epochs with SGD under standard dynamics, using either
benign training or adversarial training. We also consider a third scenario where we perform benign
training for 50 epochs and then switch to adversarial training for the remaining 50 epochs, giving an
additional control for the effect of adversarial training. For networks with multiple outputs, the kernel
is a rank-four tensor in $\mathbb{R}^{C \times C \times N \times N}$, with $C$ being the class count and $N$ the dataset size. Its
entries are given by $k_{c,c'}(x, x') = \nabla_\theta f^c_\theta(x)^T \nabla_\theta f^{c'}_\theta(x')$, yielding the subclass kernel matrices
$K_{c,c'} \in \mathbb{R}^{N \times N}$. Unless otherwise stated, for the statistics we present, we calculate the trace kernel,
the average of the class-diagonal kernel matrices: $\bar{K} = \frac{1}{C}\sum_{c=1}^{C} K_{c,c}$.
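For concreteness, the following is a minimal sketch of the class-wise empirical NTK and its trace kernel (our own illustration, not the released code; this naive version materializes the full per-example gradient matrix and is only feasible for very small networks and batches):

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def class_gradients(f, params, x, num_classes):
    # Gradient of each logit f^c(x) w.r.t. all parameters, flattened: shape (C, P).
    grads = [ravel_pytree(jax.grad(lambda p: f(p, x)[c])(params))[0]
             for c in range(num_classes)]
    return jnp.stack(grads)

def classwise_entk(f, params, X, num_classes):
    # K[c, c', i, j] = grad f^c(x_i)^T grad f^{c'}(x_j), shape (C, C, N, N).
    G = jnp.stack([class_gradients(f, params, x, num_classes) for x in X])  # (N, C, P)
    return jnp.einsum("icp,jdp->cdij", G, G)

def trace_kernel(K_full):
    # Average of the class-diagonal blocks K_{c,c}: an (N, N) matrix.
    C = K_full.shape[0]
    return sum(K_full[c, c] for c in range(C)) / C
```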
The first metric we examine is the kernel distance between neighboring epochs, given by
$S(K_1, K_2) = 1 - \frac{\mathrm{Tr}(K_1^T K_2)}{\|K_1\|_F \|K_2\|_F}$, which equals 0 if and only if $K_1$ and $K_2$ are perfectly aligned. The kernel
velocity is given by $\frac{dS}{dt}$, which we approximate as a finite difference between neighboring epochs (see fig. 1a). We also
consider the kernel distance from the current epoch to the kernel at the end of training (fig. 1b). From
these metrics we see that both for benign and adversarial training, the kernel converges within 30
epochs to close to the final kernel, in accordance with the results of Fort et al. [21]. This suggests that
after these few epochs, the underlying feature set is fixed, and the remainder of the training is spent
performing linear classification on this feature set. Surprisingly, adversarial training also quickly
converges, albeit to a different kernel whose underlying feature set is more robust (as we will see in
later sections). In the third setting, we observe a small spike in kernel velocity at epoch 50 when we
swap from standard to adversarial training. The change is small compared to the initial rapid kernel
evolution, suggesting that the standard training kernel is more similar to the adversarial kernel than
it is to the initial NTK. Likewise, when we plot the kernel distance to the final kernel, both models
nearly converge after epoch 40, while in the third setting the model stabilizes at the standard training
kernel before changing to the adversarial kernel after 50 epochs.
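The following is a minimal sketch of the kernel distance and kernel velocity computations above (our own code; `kernels_per_epoch` is an assumed list of trace-kernel matrices, one per epoch):

```python
import jax.numpy as jnp

def kernel_distance(K1, K2):
    # S(K1, K2) = 1 - Tr(K1^T K2) / (||K1||_F ||K2||_F); 0 when the kernels are perfectly aligned.
    inner = jnp.trace(K1.T @ K2)
    return 1.0 - inner / (jnp.linalg.norm(K1) * jnp.linalg.norm(K2))

def kernel_velocity(kernels_per_epoch):
    # Finite-difference approximation of dS/dt between neighboring epochs (fig. 1a).
    return jnp.array([kernel_distance(kernels_per_epoch[t], kernels_per_epoch[t + 1])
                      for t in range(len(kernels_per_epoch) - 1)])

def distance_to_final(kernels_per_epoch):
    # Kernel distance from each epoch's kernel to the kernel at the end of training (fig. 1b).
    K_final = kernels_per_epoch[-1]
    return jnp.array([kernel_distance(K, K_final) for K in kernels_per_epoch])
```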
The third metric we compute is the effective rank of the kernel matrix [65]. This measures the
dispersion of the matrix over its eigenvectors and is given by $\mathrm{erank}(K) = \exp\left(-\sum_{i=1}^{N} p_i \log p_i\right)$,
with $p_i = \lambda_i / \sum_{j=1}^{N} \lambda_j$, where $\lambda_i$ are the eigenvalues of the kernel matrix $K$. This value is bounded
between 1, when the matrix has one dominant direction, and $N$, when all eigenvalues are equal.
Previous work has shown that deeper networks are biased towards low effective rank (of the conjugate
kernel) at initialization [36]. Alternatively, the effective rank can be interpreted as the complexity
of the dataset under the kernel, and existing work has shown that robust classifiers require more
model capacity [14, 68], although the precise relationship between the effective-rank notion of
complexity and the notions of complexity used in prior work on adversarial robustness is unclear.
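The following is a minimal sketch of the effective rank computation defined above (illustrative only; it assumes a symmetric positive semi-definite kernel matrix):

```python
import jax.numpy as jnp

def effective_rank(K, eps=1e-12):
    # Eigenvalue distribution of the kernel matrix, with a guard against tiny negative values.
    eigvals = jnp.clip(jnp.linalg.eigvalsh(K), 0.0, None)
    p = eigvals / (eigvals.sum() + eps)
    # Shannon entropy of the eigenvalue distribution, exponentiated.
    entropy = -jnp.sum(jnp.where(p > 0, p * jnp.log(jnp.where(p > 0, p, 1.0)), 0.0))
    return jnp.exp(entropy)  # ranges from 1 (one dominant direction) to N (flat spectrum)
```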
As a fourth metric, we consider the mean kernel specialization. As discussed earlier, for multi-class
outputs, the full NTK is a 4D tensor in $\mathbb{R}^{C \times C \times N \times N}$, with $C$ class-specific kernels corresponding
to the diagonal entries of the first two dimensions. Over training, it has been observed that these
individual class-specific kernels specialize by aligning themselves with their classes [71]. The kernel
specialization is defined as $\mathrm{KSM}(c, c') = \frac{A(K_{c,c},\, y_{c'} y_{c'}^T)}{C^{-1} \sum_{d=1}^{C} A(K_{d,d},\, y_{c'} y_{c'}^T)}$, where
$A(K_{c,c}, y_{c'} y_{c'}^T) = 1 - S(K_{c,c}, y_{c'} y_{c'}^T)$, i.e., the cosine similarity of the kernel matrix and the one-hot
class labels. Intuitively, this compares how aligned the class-$c$ kernel is with the labels of class $c'$,
relative to the other class kernels. This quantity is bounded between 0 and $C$, and higher diagonal
entries (where $c = c'$) indicate higher specialization. We define the mean kernel specialization as
$C^{-1} \sum_{c=1}^{C} \mathrm{KSM}(c, c)$, i.e., the average of the diagonal entries of the KSM matrix. We calculate this
only for CIFAR-10, as computing all 100 class-specific kernels for CIFAR-100 is too costly. From
fig. 1d, where we see that adversarial training is associated with a lower mean kernel specialization,
we take away that adversarial training promotes features that are more broadly shared between classes,
although this needs further investigation, which will be the focus of our continued effort.
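The following is a minimal sketch of the kernel specialization matrix and its mean (our own code; `class_kernels` is an assumed array of shape (C, N, N) holding the class-specific kernels $K_{c,c}$, and `labels` holds the integer labels of the N points):

```python
import jax
import jax.numpy as jnp

def alignment(K, M):
    # A(K, M) = 1 - S(K, M): cosine similarity between two matrices.
    return jnp.trace(K.T @ M) / (jnp.linalg.norm(K) * jnp.linalg.norm(M))

def kernel_specialization_matrix(class_kernels, labels):
    C = class_kernels.shape[0]
    Y = jax.nn.one_hot(labels, C)  # (N, C) one-hot labels
    ksm = jnp.zeros((C, C))
    for c_prime in range(C):
        target = jnp.outer(Y[:, c_prime], Y[:, c_prime])  # y_{c'} y_{c'}^T
        aligns = jnp.array([alignment(class_kernels[c], target) for c in range(C)])
        ksm = ksm.at[:, c_prime].set(aligns / aligns.mean())  # KSM(c, c') for all c
    return ksm  # higher diagonal entries indicate more specialized class kernels

def mean_kernel_specialization(class_kernels, labels):
    return jnp.mean(jnp.diag(kernel_specialization_matrix(class_kernels, labels)))
```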
4 Performance of Linearized and Centered Training
Figure 2: Performance of standard, linearized, and centered training based on kernels made from
benign or adversarial training on CIFAR-100 (top row) and CIFAR-10 (bottom row). Solid lines
indicate benign accuracy, while dashed lines indicate adversarial accuracy. Under standard or
linearized dynamics with benign training (left), networks have little to no robust accuracy, but the
networks learn kernels with robust features over time, as centered training gains robustness as the
kernel evolves. Centered training also sees a robustness gain over adversarial training (center) at the
cost of some benign accuracy. For linearized dynamics, the relative magnitude of the first-order
component sharply peaks in early epochs before decaying to 0 as the spawn network fully trains (right).

Next, we look at the performance of linearized and centered training, where we vary the spawn epoch
at which we begin stage 2 training, i.e., the epoch $t$ described in section 2. We then plot the benign
and adversarial performance of the classifier after stage 2 training, in comparison with the performance
at the end of stage 1 training, in fig. 2 for CIFAR-10 and CIFAR-100. We show the results when stage 1
is performed with either benign training or adversarial training. Additionally, for the case of linearized
training, we plot the relative magnitude of the zeroth-order and first-order terms in the linearized
dynamics equation. Specifically, let $f_0 = f_{\theta_t}(x)$ and $\Delta f = (\theta - \theta_t)^T \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$.
We plot the relative magnitude of the two components, given by $\frac{\Delta f^T \Delta f}{f_0^T f_0}$, averaged over the test set.
The most surprising observation is that centered training gains significant robustness over standard
and linearized training dynamics, both in adversarial and benign training. Because centered training
does not include the zeroth-order term in its predictions, all the robustness given by centered training is
inherited entirely through the learned NTK, and not through adversarial training (as we perform
benign training in stage 2). This lends credence to the "robust feature" hypothesis given in Ilyas
et al. [37]; however, now what matters is not necessarily what features are present in the data, but