Evolution of Neural Tangent Kernels under Benign
and Adversarial Training
Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus
Computer Science and Artificial Intelligence Lab (CSAIL)
Massachusetts Institute of Technology (MIT)
{loo, rhasani, amini, rus}@mit.edu

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.12030v1 [cs.LG] 21 Oct 2022
Abstract
Two key challenges facing modern deep learning are mitigating deep networks’
vulnerability to adversarial attacks and understanding deep learning’s generalization
capabilities. Towards the first issue, many defense strategies have been developed,
with the most common being Adversarial Training (AT). Towards the second
challenge, one of the dominant theories that has emerged is the Neural Tangent
Kernel (NTK) – a characterization of neural network behavior in the infinite-width
limit. In this limit, the kernel is frozen, and the underlying feature map is fixed. In
finite widths, however, there is evidence that feature learning happens at the earlier
stages of the training (kernel learning) before a second phase where the kernel
remains fixed (lazy training). While prior work has aimed at studying adversarial
vulnerability through the lens of the frozen infinite-width NTK, there is no work
that studies the adversarial robustness of the empirical/finite NTK during training.
In this work, we perform an empirical study of the evolution of the empirical
NTK under standard and adversarial training, aiming to disambiguate the effect of
adversarial training on kernel learning and lazy training. We find that, under adversarial
training, the empirical NTK rapidly converges to a different kernel (and feature
map) than standard training. This new kernel provides adversarial robustness, even
when non-robust training is performed on top of it. Furthermore, we find that
adversarial training on top of a fixed kernel can yield a classifier with 76.1% robust
accuracy under PGD attacks with $\epsilon = 4/255$ on CIFAR-10.¹

¹Code is available at https://github.com/yolky/adversarial_ntk_evolution
1 Introduction
Modern deep learning, while effective in tackling clean and curated datasets, is often very brittle
to domain and distribution shifts [76]. Perhaps the most notorious failure mode of deep learning
under domain shift is adversarial examples [73]: images with small, bounded perturbations which
consistently fool state-of-the-art classifiers. While work has been dedicated to mitigating [25, 18, 51],
explaining [24, 39, 37], and harnessing [66, 54, 67] this peculiar behavior, the problem remains
largely open, with state-of-the-art robust classifiers still falling far behind standard non-robust
networks in benign accuracy [18]. Indeed, tackling this idiosyncrasy of deep learning seems almost
impossible given that the behavior of deep models under benign data and training is still largely
unexplained [13, 62, 63], let alone under adversarial training.
The community has developed numerous theories for better interpreting [77, 27, 43] and
understanding deep learning [74, 5, 38, 64, 28, 14, 31, 30]. One emerging theory is the notion of
deep learning models as kernel learners [48, 21, 1]. That is, deep networks trained with gradient
descent learn a kernel whose feature map is embedded in the tangent space of the network outputs
with respect to its parameters. After a brief phase of kernel learning, neural networks behave similarly to
lazy learners, i.e., linear and slow in this neural tangent feature map. This two-stage theory of deep
learning has been observed in vision networks such as residual networks [32] used in practice and
has been the subject of many recent empirical [71, 48, 21, 7] and theoretical works [29, 35, 2]. While
these reports have verified this theory for benign training, no work has looked at this property at the
intersection of adversarial training. To this end, we perform the first empirical study of how the
empirical/finite neural tangent kernel evolves over the course of adversarial training.² In section 2,
we present our experimental setup, designed to isolate the effect of adversarial training on the two
stages of deep learning under the kernel-learner theory: 1) kernel learning and 2) linear fitting. Using
this framework, we study and show the following:
1. Similar to standard training, adversarial training results in a distinct kernel which quickly
   converges in the first few epochs of training (section 3).

2. Adversarial robustness can be inherited from the kernel produced by adversarial training, even
   when the second-stage classifier has no access to adversarial examples during training (section 4).

3. Adversarial training is effective on top of the learned kernels given by standard training (in the
   frozen-feature regime), providing a testbed where many of the fixed-feature assumptions present
   in theoretical works on adversarial training are met, while still providing high practical
   performance (section 7).

4. Eigenvectors of the learned NTK of adversarially trained networks contain visually interpretable
   features, while the initial NTK and the benign-training NTK do not (section 8).

²Throughout this study, we refer to the finite-width NTK as the NTK unless otherwise specified.
1.1 Background and Related Works
Neural Tangent Kernel.
Explaining the empirical success of deep learning models from a theoretical
standpoint is an exciting and active line of research that is still in its infancy. One of the most
popular tools for understanding neural network behavior is the neural tangent kernel (NTK) [38, 6, 4].
Under this theory, training deep networks with gradient descent in the infinite-width limit corresponds
to training a kernel-based classifier, with the corresponding kernel being the NTK, defined as
$k_{NTK}(x, x') = \mathbb{E}_{\theta \sim p(\theta)}[\nabla_\theta f_\theta(x)^T \nabla_\theta f_\theta(x')]$, for parameter distribution $p(\theta)$ and network
architecture $f_\theta$. Analogously, training an infinite-width Bayesian neural network corresponds to
Gaussian process regression using the NNGP kernel [55, 23, 45, 52, 56, 50]. In both of these
infinite-width limits, the underlying feature space is fixed, determined entirely by the architecture and
parameter distribution.
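To make the definition above concrete, the following is a minimal sketch (our own illustration, not the authors' released code) that estimates a single NTK entry for a tiny multilayer perceptron by averaging the gradient inner product over random parameter draws, approximating the expectation over $p(\theta)$; the architecture and sizes are placeholders.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, d_in=32, d_hidden=64):
    # A random initialization, playing the role of a draw from p(theta).
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
        "w2": jax.random.normal(k2, (d_hidden, 1)) / jnp.sqrt(d_hidden),
    }

def f(params, x):
    # Tiny scalar-output network used purely for illustration.
    return (jnp.tanh(x @ params["w1"]) @ params["w2"]).squeeze()

def tangent_inner_product(params, x, x_prime):
    # grad_theta f(x)^T grad_theta f(x'), with all parameter gradients flattened.
    gx, _ = ravel_pytree(jax.grad(f)(params, x))
    gxp, _ = ravel_pytree(jax.grad(f)(params, x_prime))
    return gx @ gxp

key = jax.random.PRNGKey(0)
x, x_prime = jax.random.normal(key, (2, 32))
# A Monte Carlo average over parameter draws approximates E_{theta ~ p(theta)}[...].
samples = [tangent_inner_product(init_params(jax.random.PRNGKey(i)), x, x_prime)
           for i in range(16)]
print(jnp.mean(jnp.stack(samples)))
```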
The frozen nature of the kernel stands in contradiction to the ability of deep models to learn useful
features and representations. Indeed, it has been shown theoretically, under different limiting
assumptions, or with finite-widths, that feature learning does occur, with a time-evolving and data-
dependent kernel, $k_t(x, x') = \nabla_{\theta_t} f_{\theta_t}(x)^T \nabla_{\theta_t} f_{\theta_t}(x')$, with time-dependent network parameters $\theta_t$
[78, 29, 35, 2, 58, 59]. This learned NTK aligns itself with dataset labels to accelerate training and
improve generalization. Empirical evidence supports this, with works finding that this meta-kernel
evolves quickly within the early stages of network training before stabilizing [48, 21, 7, 71, 40, 8]. In
this setting, there exist two stages in deep network training: 1) kernel learning and 2) linear fitting. In
the first stage, the kernel rapidly evolves to align with the dataset's features and labels, while in the
second phase, the kernel changes only minimally and the network behaves linearly in the NTK feature
map $\phi(x) = \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$, a regime sometimes referred to as lazy training [17, 46].
Adversarial Examples and Adversarial Training. Adversarial examples present one of the most
infamous failure modes of deep learning [73]. These are images within bounded perturbations of
naturally occurring images that consistently fool deep networks over a wide array of architectures [53].
There is much literature dedicated to designing techniques for securing networks against these
attacks [25, 51, 18, 26, 44, 80, 61, 16, 42]. In this work, we focus on adversarial training with iterated
projected gradient descent (PGD), which seeks to minimize
$\mathcal{L}_{rob} = \mathbb{E}_{x,y \sim p(x,y)}\left[\max_{x' \in B_\epsilon(x)} \mathcal{L}_{standard}(x', y)\right]$.
That is, it minimizes the worst-case loss over samples in a small neighborhood around the training
examples, where the inner maximization is computed during training using iterated PGD [25, 51].
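As an illustration of this inner maximization, the following is a minimal sketch of an $L_\infty$-bounded iterated PGD attack (our own simplification, not the paper's training code); `model_fn`, `params`, and the use of optax's cross-entropy loss are assumptions for the example.

```python
import jax
import jax.numpy as jnp
import optax  # used here only for the cross-entropy loss

def pgd_attack(model_fn, params, x, y, eps=4/255, step_size=1/255, n_steps=10, key=None):
    """Approximate argmax_{x' in B_eps(x)} L_standard(x', y) with projected gradient ascent."""
    x_adv = x
    if key is not None:  # optional random start inside the eps-ball
        x_adv = x + jax.random.uniform(key, x.shape, minval=-eps, maxval=eps)

    def loss_fn(x_in):
        logits = model_fn(params, x_in)
        return optax.softmax_cross_entropy_with_integer_labels(logits, y).mean()

    for _ in range(n_steps):
        grad = jax.grad(loss_fn)(x_adv)
        x_adv = x_adv + step_size * jnp.sign(grad)   # ascent step in L_inf geometry
        x_adv = jnp.clip(x_adv, x - eps, x + eps)    # project back into the eps-ball
        x_adv = jnp.clip(x_adv, 0.0, 1.0)            # keep pixels in a valid range
    return x_adv

# Adversarial training then minimizes the standard loss evaluated at pgd_attack(...)
# instead of at the clean inputs x.
```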
From a theoretical standpoint, much work has been devoted to explaining the causes of adversarial
vulnerability and the limitations of adversarial robustness. Popular theories include the notion of
there existing "robust and non-robust features" present in data [37, 76], adversarial examples being an
outcome of high-dimensional geometry [24, 39, 69] or of stochastic gradient descent (SGD) [70, 3],
and many more [14]. Based on NTK theory, there have been works studying the presence of
adversarial examples under the NTK and NNGP kernels [22, 9, 15, 60, 10]. Recently, NTK theory
has been used to generate attack methods for deep networks [75, 79]. Adversarial examples have
been shown to arise in simplified linear classification settings [68, 37], which readily transfer to the
NTK setting, as the NTK classifier is linear in its underlying feature representation. However, these
reports on the adversarial vulnerability of NTK and NNGP kernels have focused solely on the
infinite-width limit of these kernels, with no literature on the adversarial robustness of the learned
and data-dependent NTK that is present in practical networks.
2 Experimental setup
Definitions and problem setup.
We first define the necessary objects and terminology for our
experimental setup. Our scheme follows closely that of [21]; however, we consider the additional
dimension of adversarial robustness in addition to benign accuracy. First, we define the parameter-
dependent empirical NTK: $k_{ENTK,\theta}(x, x') = \nabla_\theta f_\theta(x)^T \nabla_\theta f_\theta(x')$. In the case where $\theta = \theta(t)$,
where $\theta(t)$ refers to the parameters of the network after training for $t$ epochs, we use the shorthand
$k_t = k_{ENTK,\theta(t)}$. Next, we define three training dynamics:
1. Standard dynamics: $f_{standard} = f_{\theta(t)}$, that is, network behavior with no modifications.

2. Linearized (also referred to as lazy) dynamics around epoch $t$:
   $f_{lin,\theta,t}(x) = f_{\theta_t}(x) + (\theta - \theta_t)^T \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$. I.e., linearized training dynamics
   correspond to performing a first-order Taylor expansion of the network outputs around the point
   $\theta_t$ in parameter space. We refer to $f_{\theta_t}$ as the parent network and $t$ as the spawn epoch. Note
   that in linearized dynamics, the underlying feature map $\phi(x) = \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$, and hence
   the corresponding kernel, is fixed. Fort et al. [21] studied this regime and showed that, in practice,
   deep networks trained with SGD undergo two stages of training, in which the first phase is chaotic
   with a rapidly changing kernel until some relatively early epoch $t$, after which the second stage
   behaves like linear training about $\theta_t$.

3. Centered linear training (or centered for short): where we subtract the parent network's
   output [34, 47, 49]: $f_{centered,\theta,t}(x) = (\theta - \theta_t)^T \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$. This corresponds to
   linearized training with the zeroth-order term removed. Now the output depends strictly on the
   difference between $\theta$ and $\theta_t$, and the first $t$ epochs contribute only through how they modify
   the feature map. Studying this setting lets us isolate the properties of the learned kernel $k_t$,
   without worrying about the parent network's direct contribution. Efficient implementation of
   both of these linearized variants uses forward-mode differentiation in the JAX library [11]; a
   minimal sketch of all three dynamics follows this list.
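Below is a minimal sketch of the three dynamics using forward-mode differentiation, in the spirit of the JAX implementation mentioned above; `f` is any pure function `f(params, x)`, and the helper names are our own, not the released code's.

```python
import jax

def standard_forward(f, params, x):
    # 1. Standard dynamics: the unmodified network.
    return f(params, x)

def linearized_forward(f, params, anchor_params, x):
    # 2. Linearized (lazy) dynamics around the parent parameters theta_t:
    #    f_lin(x) = f_{theta_t}(x) + (theta - theta_t)^T grad f_{theta'}(x)|_{theta'=theta_t}
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, anchor_params)
    f0, df = jax.jvp(lambda p: f(p, x), (anchor_params,), (delta,))
    return f0 + df

def centered_forward(f, params, anchor_params, x):
    # 3. Centered dynamics: the zeroth-order (parent) term is dropped, so the output
    #    depends on theta - theta_t only through the frozen feature map.
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, anchor_params)
    _, df = jax.jvp(lambda p: f(p, x), (anchor_params,), (delta,))
    return df
```

In stage 2, only `params` is updated; `anchor_params` (and, in our experiments, the batch-norm running statistics) stay frozen so that the feature map and the corresponding kernel remain fixed.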
Experimental design. We closely follow [21], training networks in two distinct stages with an added
dimension of adversarial training. Additional details are available in appendix B.
Stage 1: Standard dynamics. We train ResNet-18s on CIFAR-10 or CIFAR-100 [41] for $t$ epochs,
either using benign data (i.e., no data modification) or adversarial training, with $0 \le t \le 100$.
Stage 2: Linearized dynamics. Following stage 1, we take the parent network from stage 1 and train
for an additional 100 epochs with either linearized or centered linearized training dynamics. Note that
for centered training, the networks output all zeros immediately after stage 1, as the zeroth-order term
is removed, so the classifier does not have a "warm start." In this stage, we train using benign data,
with no adversarial training. Additionally, we freeze the batch-norm running mean and standard
deviation parameters after stage 1, as allowing these to change would implicitly change the feature
map $\phi(x) = \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$. For all experiments in the main text, we use the standard
$\epsilon = 4/255$ adversarial radius under the $L_\infty$ norm, but we verify that the results hold for
$\epsilon = 8/255$ in section 9, with additional results for $\epsilon = 8/255$ in the appendix.
Figure 1: Evolution of the Neural Tangent Kernel under benign and adversarial training on Resnet-18s
on CIFAR-100 (Top row) and CIFAR-10 (Bottom Row). Networks are either trained for 100 epochs
with benign or adversarial training, or 50 epochs of benign training followed by 50 epochs of adversarial training
(50/50). From left to right, we plot the kernel velocity, kernel distance to the final kernel, effective
rank, and mean kernel specialization of the resulting kernels. (n=3)
3 Evolution of the NTK under standard and adversarial training
First, we look at the evolution of the kernel under different training conditions. To do this, we
calculate the empirical NTK on a random subset of 500 class-balanced training points from CIFAR-10
and CIFAR-100 for ResNets trained for 100 epochs with SGD under standard dynamics, using either
benign training or adversarial training. We also consider a third scenario where we perform benign
training for 50 epochs and then switch to adversarial training for the remaining 50 epochs, giving an
additional control for the effect of adversarial training. For networks with multiple outputs, the kernel
is a rank-four tensor in $\mathbb{R}^{C \times C \times N \times N}$, with $C$ being the class count and $N$ the dataset size. Its
entries are given by $k_{c,c'}(x, x') = \nabla_\theta f^c_\theta(x)^T \nabla_\theta f^{c'}_\theta(x')$, yielding the subclass kernel matrices
$K_{c,c'} \in \mathbb{R}^{N \times N}$. Unless otherwise stated, for the statistics we present, we calculate the trace kernel,
the average of the class-diagonal kernel matrices: $\bar{K} = \frac{1}{C}\sum_{c=1}^{C} K_{c,c}$.
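For concreteness, the following is a minimal sketch of the class-wise empirical NTK and its trace kernel (our own illustration, not the released code; this naive version materializes the full per-example gradient matrix and is only feasible for very small networks and batches):

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def class_gradients(f, params, x, num_classes):
    # Gradient of each logit f^c(x) w.r.t. all parameters, flattened: shape (C, P).
    grads = [ravel_pytree(jax.grad(lambda p: f(p, x)[c])(params))[0]
             for c in range(num_classes)]
    return jnp.stack(grads)

def classwise_entk(f, params, X, num_classes):
    # K[c, c', i, j] = grad f^c(x_i)^T grad f^{c'}(x_j), shape (C, C, N, N).
    G = jnp.stack([class_gradients(f, params, x, num_classes) for x in X])  # (N, C, P)
    return jnp.einsum("icp,jdp->cdij", G, G)

def trace_kernel(K_full):
    # Average of the class-diagonal blocks K_{c,c}: an (N, N) matrix.
    C = K_full.shape[0]
    return sum(K_full[c, c] for c in range(C)) / C
```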
The first metric we examine is the kernel distance between neighboring epochs, given by
$S(K_1, K_2) = 1 - \frac{\mathrm{Tr}(K_1^T K_2)}{\|K_1\|_F \|K_2\|_F}$, which equals 0 if and only if $K_1$ and $K_2$ are perfectly aligned. The kernel
velocity is given by $\frac{dS}{dt}$, which we approximate as a finite difference between neighboring epochs (see fig. 1a). We also
consider the kernel distance from the current epoch to the kernel at the end of training (fig. 1b). From
these metrics we see that both for benign and adversarial training, the kernel converges within 30
epochs to close to the final kernel, in accordance with the results of Fort et al. [21]. This suggests that
after these few epochs, the underlying feature set is fixed, and the remainder of the training is spent
performing linear classification on this feature set. Surprisingly, adversarial training also quickly
converges, albeit to a different kernel whose underlying feature set is more robust (as we will see in
later sections). In the third setting, we observe a small spike in kernel velocity at epoch 50 when we
swap from standard to adversarial training. The change is small compared to the initial rapid kernel
evolution, suggesting that the standard training kernel is more similar to the adversarial kernel than
it is to the initial NTK. Likewise, when we plot the kernel distance to the final kernel, both models
nearly converge after epoch 40, while in the third setting the model stabilizes at the standard training
kernel before changing to the adversarial kernel after 50 epochs.
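The following is a minimal sketch of the kernel distance and kernel velocity computations above (our own code; `kernels_per_epoch` is an assumed list of trace-kernel matrices, one per epoch):

```python
import jax.numpy as jnp

def kernel_distance(K1, K2):
    # S(K1, K2) = 1 - Tr(K1^T K2) / (||K1||_F ||K2||_F); 0 when the kernels are perfectly aligned.
    inner = jnp.trace(K1.T @ K2)
    return 1.0 - inner / (jnp.linalg.norm(K1) * jnp.linalg.norm(K2))

def kernel_velocity(kernels_per_epoch):
    # Finite-difference approximation of dS/dt between neighboring epochs (fig. 1a).
    return jnp.array([kernel_distance(kernels_per_epoch[t], kernels_per_epoch[t + 1])
                      for t in range(len(kernels_per_epoch) - 1)])

def distance_to_final(kernels_per_epoch):
    # Kernel distance from each epoch's kernel to the kernel at the end of training (fig. 1b).
    K_final = kernels_per_epoch[-1]
    return jnp.array([kernel_distance(K, K_final) for K in kernels_per_epoch])
```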
The third metric we compute is the effective rank of the kernel matrix [65]. This measures the
dispersion of the matrix over its eigenvectors and is given by $\mathrm{erank}(K) = \exp\left(-\sum_{i=1}^{N} p_i \log p_i\right)$,
with $p_i = \lambda_i / \sum_{j=1}^{N} \lambda_j$, where $\lambda_i$ are the eigenvalues of the kernel matrix $K$. This value is bounded
between 1, when the matrix has one dominant direction, and $N$, when all eigenvalues are equal.
Previous work has shown that deeper networks are biased towards low effective rank (of the conjugate
kernel) at initialization [36]. Alternatively, the effective rank can be interpreted as the complexity
of the dataset under the kernel, and existing work has shown that robust classifiers require more
model capacity [14, 68], although the precise relationship between the effective-rank notion of
complexity and the notions of complexity used in prior work on adversarial robustness is unclear.
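The following is a minimal sketch of the effective rank computation defined above (illustrative only; it assumes a symmetric positive semi-definite kernel matrix):

```python
import jax.numpy as jnp

def effective_rank(K, eps=1e-12):
    # Eigenvalue distribution of the kernel matrix, with a guard against tiny negative values.
    eigvals = jnp.clip(jnp.linalg.eigvalsh(K), 0.0, None)
    p = eigvals / (eigvals.sum() + eps)
    # Shannon entropy of the eigenvalue distribution, exponentiated.
    entropy = -jnp.sum(jnp.where(p > 0, p * jnp.log(jnp.where(p > 0, p, 1.0)), 0.0))
    return jnp.exp(entropy)  # ranges from 1 (one dominant direction) to N (flat spectrum)
```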
As a fourth metric, we consider the mean kernel specialization. As discussed earlier, for multi-class
outputs, the full NTK is a 4D tensor in $\mathbb{R}^{C \times C \times N \times N}$, with $C$ class-specific kernels corresponding
to the diagonal entries of the first two dimensions. Over training, it has been observed that these
individual class-specific kernels specialize by aligning themselves with their classes [71]. The kernel
specialization is defined as $\mathrm{KSM}(c, c') = \frac{A(K_{c,c},\, y_{c'} y_{c'}^T)}{C^{-1} \sum_{d=1}^{C} A(K_{d,d},\, y_{c'} y_{c'}^T)}$, where
$A(K_{c,c}, y_{c'} y_{c'}^T) = 1 - S(K_{c,c}, y_{c'} y_{c'}^T)$, i.e., the cosine similarity of the kernel matrix and the one-hot
class labels. Intuitively, this compares how aligned the class-$c$ kernel is with the labels of class $c'$,
relative to the other class kernels. This quantity is bounded between 0 and $C$, and higher diagonal
entries (where $c = c'$) indicate higher specialization. We define the mean kernel specialization as
$C^{-1} \sum_{c=1}^{C} \mathrm{KSM}(c, c)$, i.e., the average of the diagonal entries of the KSM matrix. We calculate this
only for CIFAR-10, as computing all 100 class-specific kernels for CIFAR-100 is too costly. From
fig. 1d, where we see that adversarial training is associated with a lower mean kernel specialization,
we take away that adversarial training promotes features that are more broadly shared between classes,
although this needs further investigation, which will be the focus of our continued effort.
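The following is a minimal sketch of the kernel specialization matrix and its mean (our own code; `class_kernels` is an assumed array of shape (C, N, N) holding the class-specific kernels $K_{c,c}$, and `labels` holds the integer labels of the N points):

```python
import jax
import jax.numpy as jnp

def alignment(K, M):
    # A(K, M) = 1 - S(K, M): cosine similarity between two matrices.
    return jnp.trace(K.T @ M) / (jnp.linalg.norm(K) * jnp.linalg.norm(M))

def kernel_specialization_matrix(class_kernels, labels):
    C = class_kernels.shape[0]
    Y = jax.nn.one_hot(labels, C)  # (N, C) one-hot labels
    ksm = jnp.zeros((C, C))
    for c_prime in range(C):
        target = jnp.outer(Y[:, c_prime], Y[:, c_prime])  # y_{c'} y_{c'}^T
        aligns = jnp.array([alignment(class_kernels[c], target) for c in range(C)])
        ksm = ksm.at[:, c_prime].set(aligns / aligns.mean())  # KSM(c, c') for all c
    return ksm  # higher diagonal entries indicate more specialized class kernels

def mean_kernel_specialization(class_kernels, labels):
    return jnp.mean(jnp.diag(kernel_specialization_matrix(class_kernels, labels)))
```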
4 Performance of Linearized and Centered Training
Figure 2: Performance of standard, linearized, and centered training based on kernels made from
benign or adversarial training on CIFAR-100 (top row) and CIFAR-10 (bottom row). Solid lines
indicate benign accuracy, while dashed lines indicate adversarial accuracy. Under standard or
linearized dynamics with benign training (left), networks have little to no robust accuracy, but the
networks learn kernels with robust features over time, as centered training gains robustness as the
kernel evolves. Centered training also sees a robustness gain over adversarial training (center) at the
cost of some benign accuracy. For linearized dynamics, the relative magnitude of the first-order
component sharply peaks in early epochs before decaying to 0 as the spawn network fully trains (right).

Next, we look at the performance of linearized and centered training, where we vary the spawn epoch
at which we begin stage 2 training, i.e., the epoch $t$ described in section 2. We then plot the benign
and adversarial performance of the classifier after stage 2 training, in comparison with the performance
at the end of stage 1 training, in fig. 2 for CIFAR-10 and CIFAR-100. We show the results when stage 1
is performed with either benign training or adversarial training. Additionally, for the case of linearized
training, we plot the relative magnitude of the zeroth-order and first-order terms in the linearized
dynamics equation. Specifically, let $f_0 = f_{\theta_t}(x)$ and $\Delta f = (\theta - \theta_t)^T \nabla_{\theta'} f_{\theta'}(x)\big|_{\theta'=\theta_t}$.
We plot the relative magnitude of the two components, given by $\frac{\Delta f^T \Delta f}{f_0^T f_0}$, averaged over the test set.
The most surprising observation is that centered training gains significant robustness over standard
and linearized training dynamics, both in adversarial and benign training. Because centered training
does not include the zeroth-order term in its predictions, all the robustness given by centered training is
inherited entirely through the learned NTK, and not through adversarial training (as we perform
benign training in stage 2). This lends credence to the "robust feature" hypothesis given in Ilyas
et al. [37]; however, now what matters is not necessarily what features are present in the data, but