Stable and Efficient Adversarial Training through Local Linearization
Zhuorong Li, Dawei Yu
Zhejiang University City College
Hangzhou, China
lizr@zucc.edu.cn, ydw.ccm@gmail.com
Abstract
There has been a recent surge in single-step adversarial training as it shows both robustness and efficiency. However, a phenomenon referred to as "catastrophic overfitting" has been observed, which is prevalent in single-step defenses and may frustrate attempts to use FGSM adversarial training. To address this issue, we propose a novel method, Stable and Efficient Adversarial Training (SEAT), which mitigates catastrophic overfitting by harnessing local properties that distinguish a robust model from a catastrophically overfitted one. The proposed SEAT has strong theoretical justifications, in that minimizing the SEAT loss can be shown to favour a smooth empirical risk, thereby leading to robustness. Experimental results demonstrate that the proposed method successfully mitigates catastrophic overfitting, yielding superior performance amongst efficient defenses. Our single-step method reaches 51% robust accuracy on CIFAR-10 with ℓ∞ perturbations of radius 8/255 under a strong PGD-50 attack, matching the performance of 10-step iterative adversarial training at merely 3% of the computational cost.
1. Introduction
While systems based on Deep Neural Networks (DNNs) permeate almost every corner of our daily lives, they are not intrinsically robust. In particular, by imposing high-fidelity, imperceptible distortions on the original inputs, also known as adversarial attacks, the decisions of DNNs can be completely altered [14,32]. It is thus imperative to develop mitigation strategies, especially in the high-stakes applications where DNNs are deployed, e.g., autonomous driving and surveillance systems [5,28].
There has been a great deal of work on devising sophisticated adversarial attacks [3,16], which has in turn spurred immense interest in building defenses against such attacks [8,38,45]. Among them, adversarial training (AT)
is one of the most promising methods to achieve empirical
robustness against adversarial attacks. Its training regime attempts to directly augment the training set with adversarial samples that are generated on-the-fly [15,27].

Figure 1. Analysis of catastrophic overfitting. Solid lines indicate the robust accuracy against PGD-20 on the validation set during training, and dashed lines denote the mean linearity approximation error over the entire training set.

Specifically, when the adversarial samples are produced by multiple gradient propagations, the corresponding adversarial training is called multi-step AT; otherwise, it is single-step AT. Unfortunately, the cost of AT becomes prohibitively
high with growing model capacity and dataset scale. This
is primarily due to the intensive computation of adversarial
perturbations, as each step of adversarial training requires
multiple forward propagations to find the perturbations.
One approach to alleviating such computational cost is to train with a single-step adversary, such as the Fast Gradient Sign Method (FGSM) [15], which takes only one gradient step to compute and is thus much cheaper. However, this method relies on the local linearity assumption of the loss surface, which is often compromised as FGSM training progresses. As first discovered by [43], single-step AT is prone to an intriguing phenomenon referred to as catastrophic overfitting,
i.e., the validation robustness under multi-step attacks, e.g., projected gradient descent (PGD) [27], suddenly drops to zero whereas the training robustness against the FGSM attack keeps increasing. Later, it was found that catastrophic overfitting is not limited to FGSM-based AT but also occurs in diverse single-step AT methods [2]. A few attempts [2,24,34] have been made to identify the underlying reason for catastrophic overfitting and thus to develop strategies to prevent this failure. However, these works did not provide a fundamental explanation for the problem, and the proposed methods are computationally inefficient [26].
In this work, we first unfold the connection between catastrophic overfitting and local linearity. Recall that a single-step adversary such as FGSM produces perturbations based on a linear approximation of the loss function. However, gradient masking causes the linearity assumption to become unreliable, which results in the exclusion of strong attacks that maximize the loss during training. As catastrophic overfitting typically happens in non-iterative AT, we perform a comparison between single-step AT and multi-step AT for our empirical study. Fig. 1 shows that catastrophic overfitting coincides with a drastic change in the local linearity of the loss surface. To be more specific, the linearity approximation error of FGSM-AT abruptly increases at the moment catastrophic overfitting occurs, with the robust accuracy deteriorating suddenly within a single epoch. In contrast, TRADES, which generates adversarial examples with multiple iterations, maintains a negligible local linearity error throughout training.
Based on this observation, we make a rigorous and comprehensive study of addressing catastrophic overfitting by retaining the local linearity of the loss function. The proposed Stable and Efficient Adversarial Training harnesses the local linearity of models trained using multi-step methods and incorporates this salient property into models trained using single-step AT. Fig. C.2 in the Appendix shows that the model resulting from the proposed SEAT behaves strikingly similarly to that of TRADES, indicating robustness competitive with multi-step AT.
Our main contributions are summarized as follows:
• We first empirically identify a clear correlation between catastrophic overfitting and local linearity in DNNs, which sparks our theoretical analysis of the failure mode behind FGSM-AT and motivates us to overcome this weakness.
• We propose a novel regularization, Stable and Efficient Adversarial Training (SEAT), which prevents catastrophic overfitting by explicitly penalizing violations of the linearity assumption to ensure the validity of the FGSM solution.
• We conduct a thorough experimental study and show that the proposed SEAT consistently outperforms existing single-step AT methods in stability and adversarial robustness, and is even comparable to most multi-step AT methods at a much lower cost. We also verify the effectiveness of SEAT under different attack setups, as well as through loss surface smoothness and decision boundary distortions.
2. Related Work
2.1. Adversarial Robustness and Attack Strength
Adversarial training is widely regarded as among the most effective defenses. According to the number of gradient propagations involved in attack generation, methods can be broadly grouped into multi-step AT [7,27] and single-step AT [15,22]. Multi-step AT, such as PGD-AT [27], generally achieves robustness by training on strong perturbations generated by iterative optimization. In more recent work, TRADES [47] and AWP [44] yield enhanced robustness via regularization, and [31] further improves performance through a judicious choice of hyperparameters. Albeit empirically the best-performing approach to training robust models, multi-step AT is time-consuming.
The high cost of multi-step AT has motivated an alternative, i.e., single-step AT, which proves to be efficient. It trains with shared gradient computations [35]; or by using cheaper adversaries, such as FGSM [15,36]; or by first using FGSM and later switching to PGD [40]. While these single-step AT methods point in a promising direction, their robust performance is not on par with multi-step AT. Worse still, they are prone to the serious problem of catastrophic overfitting, i.e., after a few epochs of adversarial training, the robust accuracy of the model against PGD sharply decreases to 0%.
2.2. Adversarial Generalization and Flat Minima
There exists a large body of work investigating the correlation between the flatness of local minima and the generalization performance of DNNs on natural samples [21,25]. It has been empirically verified and commonly accepted that a flatter loss surface tends to yield better generalization, and this understanding has further been utilized to design regularization (e.g., [13,20,42]).
An analogous connection has also been identified in the adversarial training scenario, where the flatness of the loss surface helps to improve robust generalization on adversarial samples [30]. Several well-recognized improvements of AT, i.e., TRADES [47], MART [41], and RST [7], all implicitly flatten the loss surface to improve robust generalization. Moreover, a line of works has proposed explicit regularization to directly encourage the flatness of local minima [33,44].
3. Methodology
Our aim is to develop a technique that resolves catastrophic overfitting so as to stabilize single-step adversarial training. In this section, we first theoretically analyze the pitfalls of present single-step AT methods. Then, we provide theoretical justifications for our regularization, demonstrating its potential to avoid catastrophic overfitting. Finally, we expound on our proposal, which is termed "Stable and Efficient Adversarial Training" or SEAT.
3.1. Revisiting Single-step Adversarial Training
Recall adversarial training, where the adversarial perturbation δ can be generated by:

\delta \leftarrow \mathrm{Proj}\big(\delta + \alpha \cdot \nabla_\delta \ell(x+\delta)\big),    (1)

where Proj(x) = arg min_{ξ∈B(r)} ‖x − ξ‖_p, r is a small radius, x ∈ D is a sampled data point with ground-truth label y, ℓ(·) is the loss function, α is a step size, the initial δ is an arbitrary perturbation of sufficiently small size, and ∇_δ ℓ(·) denotes the gradient of the loss w.r.t. the perturbation δ. While the loss function is highly non-linear, we can approximate it by making the reasonable assumption that the loss is once-differentiable. For the adversarial loss ℓ(x+δ), the first-order Taylor expansion then gives:

\ell(x+\delta) = \ell(x) + \langle \delta, \nabla_x \ell(x) \rangle + \omega(\delta),    (2)

where B_r(x) denotes a small neighborhood of x and x + δ ∈ B_r(x). Suppose that, given a metric space M = (X, d), there exists a radius r > 0 satisfying B_r(x) = {x + δ ∈ X : d(x, x + δ) < r}. Besides, ω(δ) denotes the higher-order terms in δ, which tend to zero in the limit of a small perturbation.
As we focus on single-step attacks in this work, we choose δ = ε·sgn(∇_x ℓ(x)), where ε is a small value and sgn(∇_x ℓ(x)) is a sign vector. Eq. (2) then becomes

\ell(x+\delta) = \ell(x) + \epsilon\,(\nabla_x \ell(x))^{T} \cdot \mathrm{sgn}(\nabla_x \ell(x)) + \omega(\delta) = \ell(x) + \epsilon\,\|\nabla_x \ell(x)\|_1 + \omega(\delta).

As the adversarial perturbation δ is carefully crafted to be imperceptible, ω(δ) is naturally negligible. Thus, we have:

\ell(x+\delta) - \ell(x) = \epsilon\,\|\nabla_x \ell(x)\|_1.    (3)

As described in the previous section, catastrophic overfitting has been observed during the training of models using single-step defenses. This indicates that the maximum magnitude ε is no longer the strongest step size in the direction of the perturbation δ. Consequently, the maximization cannot be satisfied when catastrophic overfitting occurs, as the loss surface is highly curved. Eq. (3) reveals that a fixed step of perturbation, i.e., ε·‖∇_x ℓ(x)‖_1, is probably the principal reason for catastrophic overfitting.
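The fixed-magnitude update that Eq. (3) exposes can be seen directly in a standard FGSM implementation. Below is a minimal PyTorch sketch, not the paper's code; the helper name fgsm_perturb, the cross-entropy loss, and the [0,1] input range are our own illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps):
    """Single-step FGSM perturbation, delta = eps * sgn(grad_x l(x)), as analyzed above."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # The per-coordinate step length is always eps, no matter how curved the loss
    # surface is locally; under the linear approximation this fixes the loss
    # increase at eps * ||grad_x l(x)||_1, the quantity Eq. (3) points to as a
    # likely cause of catastrophic overfitting.
    delta = eps * grad.sign()
    return (x + delta).clamp(0.0, 1.0).detach()
```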
3.2. Justifications for Local Linearization
To resolve the issue mentioned above, we propose a regularizer that prevents catastrophic overfitting by encouraging local linearity, coupled with a scaled step size.
As described before, suppose that we are given a loss function ℓ(·) that is once-differentiable. The loss at a point x + δ ∈ B_r(x), i.e., the adversarial loss ℓ(x+δ), can be well approximated by its first-order Taylor expansion at the point x as ℓ(x) + ⟨δ, ∇_x ℓ(x)⟩. In other words, we can measure how linear the surface is within a neighborhood by computing the absolute difference between these two values:

\xi(\theta, x) = \big|\,\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle\,\big|,    (4)

where ξ(θ, x) defines the linearity approximation error, and the perturbation we focus on is generated by the non-iterative adversary as δ = ε·sgn(∇_x ℓ(x)).
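The error in Eq. (4) is straightforward to monitor during training. The sketch below is a hypothetical PyTorch rendering under our own assumptions (the helper name linearity_error, the cross-entropy loss, and the per-sample reduction are illustrative, not the authors' code).

```python
import torch
import torch.nn.functional as F

def linearity_error(model, x, y, eps):
    """Per-sample linearity approximation error xi(theta, x) of Eq. (4), with delta = eps * sgn(grad)."""
    x = x.clone().detach().requires_grad_(True)
    loss_x = F.cross_entropy(model(x), y, reduction='none')            # l(x)
    grad_x = torch.autograd.grad(loss_x.sum(), x)[0]                   # grad_x l(x)

    delta = eps * grad_x.sign()                                        # FGSM-style perturbation
    loss_adv = F.cross_entropy(model(x + delta), y, reduction='none')  # l(x + delta)
    linear = (delta * grad_x).flatten(1).sum(dim=1)                    # <delta, grad_x l(x)>

    # Detached here for monitoring; pass create_graph=True above if the error is
    # to be minimized as a differentiable regularizer.
    return (loss_adv - loss_x - linear).abs().detach()
```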
Next, we provide a theoretical analysis of how the linearity approximation error ξ(θ, x) correlates with catastrophic overfitting, thereby deriving a regularizer for stabilizing the training phase.
Property 1. Consider a loss function ℓ(·) that is once-differentiable, and a local neighbourhood B_ε(x) of radius ε centered at x. For any x + δ ∈ B_ε(x), we have

|\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle| \le \Big(\epsilon + \tfrac{1}{\sqrt{n}}\|\delta\|_2\Big)\,\|\nabla_x \ell(x)\|_1.    (5)
Proof. We start from the local Taylor expansion in Eq. (2). It is clear that as x + δ → x, the higher-order term ω(δ) becomes negligible. We therefore use its equivalent, i.e., |ℓ(x+δ) − ℓ(x) − ⟨δ, ∇_x ℓ(x)⟩|, as a measure of how linear the surface is within a neighbourhood; this is precisely the quantity defined in Eq. (4).

|\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle|
  \le |\ell(x+\delta) - \ell(x)| + |\langle \delta, \nabla_x \ell(x) \rangle|
  = \epsilon\,\|\nabla_x \ell(x)\|_1 + |\langle \delta, \nabla_x \ell(x) \rangle|
  \le \epsilon\,\|\nabla_x \ell(x)\|_1 + \|\delta\|_2\,\|\nabla_x \ell(x)\|_2
  \le \Big(\epsilon + \tfrac{1}{\sqrt{n}}\|\delta\|_2\Big)\,\|\nabla_x \ell(x)\|_1.    (6)
Remark 1. From Eq. (5) it is clear that by bounding the linearity approximation error, one implicitly introduces a scaling parameter ‖δ‖_2/√n for perturbation generation instead of the fixed magnitude ε·‖∇_x ℓ(x)‖_1 of the FGSM attack, which is the main cause of catastrophic overfitting.
Property 2. Consider a loss function ℓ(·) that is locally Lipschitz continuous. For any point in B_ε(x), the linearity approximation error ξ(θ, x) gives the following inequality:

K < \sup_{x+\delta \in B_\epsilon(x)} \frac{1}{\|\delta\|_2}\Big( |\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle| + |\langle \delta, \nabla_x \ell(x) \rangle| \Big).
Details of the proof are provided in Appendix B.
Remark 2. The linearity approximation error ξ(θ, x) enters the expression on the RHS. Hence, minimizing ξ(θ, x) is expected to induce a smaller local Lipschitz constant K, thereby encouraging the optimization procedure to find a model that is locally Lipschitz continuous.
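To build intuition for the local Lipschitz constant that Property 2 bounds, one can crudely estimate the local loss-change quotient by random sampling in B_ε(x). The sketch below is our own illustrative construction, not part of SEAT; random sampling only gives a lower bound on the supremum.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def empirical_local_lipschitz(model, x, y, eps, num_samples=32):
    """Estimate sup_{x+delta in B_eps(x)} |l(x+delta) - l(x)| / ||delta||_2 by random sampling."""
    loss_x = F.cross_entropy(model(x), y, reduction='none')
    best = torch.zeros_like(loss_x)
    for _ in range(num_samples):
        delta = torch.empty_like(x).uniform_(-eps, eps)
        loss_d = F.cross_entropy(model(x + delta), y, reduction='none')
        quot = (loss_d - loss_x).abs() / delta.flatten(1).norm(dim=1).clamp_min(1e-12)
        best = torch.maximum(best, quot)
    return best  # crude lower bound on the local Lipschitz constant around each sample
```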
Based on the favorable bound (Property 1) and the local Lipschitz continuity (Property 2) above, we derive our proposed regularizer from the linearity approximation error:

J(\theta) = \lambda \cdot \Big|\, \max_{\delta \in B_\epsilon(x)} \ell(x+\delta) - \ell(x) - \delta^{T}\nabla_x \ell(x) \,\Big|,    (7)

where λ ∈ ℝ is a hyperparameter specifying the strength of the local linearization. We set λ = 0.5 based on the experiment presented in Tab. 3, which also strikes a good balance.
3.3. Training Objective
The proposed method incorporates the linearization regularizer to enhance optimization in both attack generation and defense training. In particular, we modify the training scheme of the vanilla multi-step defense and propose the training objective of SEAT as follows:

\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \tilde{L}_{adv}\big(f_\theta(x+\delta), y\big) + J(\theta) \Big],    (8)

\delta = \arg\max_{x+\delta \in B_\epsilon(x)} L_{adv}\big(f_\theta(x+\delta), y\big) + J(\theta),    (9)

where L_adv is the standard loss, e.g., the cross-entropy loss or the maximum margin loss, J(θ) is the proposed linearization regularizer, and f_θ(·) represents the neural network with parameters θ. The definition of the flooded loss L̃_adv is deferred to Sec. 3.3.1. Pseudo-code for our algorithm is given in Algorithm 1.
Algorithm 1 Stable and Efficient Adversarial Training
Require: total epochs N, neural network f_θ with parameters θ, training set D = {(x_j, y_j)}, adversarial perturbation radius ε, step size α, flooding level b.
for epoch = 1, ..., N do
    for all (x_j, y_j) ∈ D do
        * Inner maximization to update δ
        δ = Uniform(−ε, ε)
        L = L_adv(x_j + δ, y_j) + J(θ)
          = −f_θ^{y_j}(x_j + δ) + max_{j'≠y_j} f_θ^{j'}(x_j + δ) + J(θ)
        δ ← δ + α · sgn(∇_δ L)
        * Outer minimization to update θ
        L̃ = L̃_adv(x_j + δ, y_j) + J(θ)
          = |L_ce(x_j + δ, y_j) − b| + b + J(θ)
        θ ← θ − η · ∇_θ L̃
    end for
end for
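For concreteness, the following PyTorch sketch renders one SEAT update in the spirit of Algorithm 1. It is a sketch under our own assumptions: the names seat_step and margin_loss are hypothetical, lam plays the role of λ in Eq. (7), the batch-mean form of J(θ) is a simplification, and clipping inputs to a valid pixel range is omitted.

```python
import torch
import torch.nn.functional as F

def margin_loss(logits, y):
    """Maximum margin loss: -f_y(x~) + max_{j != y} f_j(x~), averaged over the batch."""
    true_score = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    other_best = logits.scatter(1, y.unsqueeze(1), float('-inf')).amax(dim=1)
    return (other_best - true_score).mean()

def seat_step(model, optimizer, x, y, eps, alpha, lam, b):
    """One SEAT update (cf. Algorithm 1): single-step inner max on delta, outer min on theta."""
    # Clean loss and input gradient, kept differentiable so J(theta) can be trained through.
    x_req = x.clone().detach().requires_grad_(True)
    loss_clean = F.cross_entropy(model(x_req), y)
    grad_x = torch.autograd.grad(loss_clean, x_req, create_graph=True)[0]

    def lin_err(delta, adv_loss):
        # Batch-mean form of |l(x+delta) - l(x) - <delta, grad_x l(x)>|.
        return (adv_loss - loss_clean - (delta * grad_x).flatten(1).sum(1).mean()).abs()

    # Inner maximization: one ascent step on delta (Eq. (9)).
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    logits_adv = model(x + delta)
    inner = margin_loss(logits_adv, y) + lam * lin_err(delta, F.cross_entropy(logits_adv, y))
    grad_d = torch.autograd.grad(inner, delta, retain_graph=True)[0]
    delta = (delta + alpha * grad_d.sign()).clamp(-eps, eps).detach()  # project back to the eps-ball

    # Outer minimization: flooded adversarial loss plus J(theta) (Eq. (8)).
    loss_adv = F.cross_entropy(model(x + delta), y)
    loss = (loss_adv - b).abs() + b + lam * lin_err(delta, loss_adv)   # flooding with level b
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```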
3.3.1 Outer Minimization
Upon the previous empirical observation and theoretical analysis, we assume that catastrophic overfitting is probably correlated with the deterioration of local linearity. To resolve this problem, we motivate SEAT through local linearization. The proposed training scheme caters to the dual objective of minimizing the classification loss on adversarial examples while also explicitly minimizing violations of the linearity assumption to ensure the validity of the FGSM solution. Our starting point is to use the cross-entropy loss and further introduce the proposed regularizer derived from the linearity approximation error.
As it has been observed that training models excessively towards adversarial robustness may hurt generalization [39], this work takes one more step towards mitigating overfitting. Based on the understanding that flat minima tend to yield better generalization, we leverage the recently proposed regularization Flooding [20], which forces the training loss to stay above a reasonably small value rather than approach zero, so as to avoid overfitting. To the best of our knowledge, we are the first to introduce Flooding into single-step adversarial training, with the motivation of producing models that generalize better to multi-step optimized attacks, thereby mitigating catastrophic overfitting.
The implementation of Flooding is surprisingly simple: R̃(θ) = |R(θ) − b| + b, where R(θ) and R̃(θ) respectively denote the original training objective and the flooded training objective, and b > 0 is the flood level [20]. We are thus inspired to use the flooded version instead of the basic adversarial training loss.
Building on this, we replace the adversarial loss L_adv with L̃_adv. We thus arrive at our implementation of the outer minimization for our single-step defense SEAT, as in Eq. (8).
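As a small illustration, the flooded objective is a one-line wrapper around any base loss; the sketch below uses our own naming and shows it for cross-entropy.

```python
import torch.nn.functional as F

def flooded_cross_entropy(logits, y, b):
    """Flooding [20]: keep the training loss above the flood level b instead of driving it to zero."""
    loss = F.cross_entropy(logits, y)   # original objective R(theta)
    return (loss - b).abs() + b         # flooded objective |R(theta) - b| + b
```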
3.3.2 Inner Maximization
On the attacker's side, the goal is to find a perturbation for which not only the loss on adversarial samples is maximized, but also the linearity assumption in the vicinity of each data point is maximally violated. Further, we use the maximum margin loss for the implementation of L_adv, as it is known to be beneficial especially for single-step AT, which heavily relies on the initial gradient direction [17,36]. The maximum margin loss is given by −f_θ^y(x̃) + max_{j≠y} f_θ^j(x̃), where x̃ denotes the adversarial image, f_θ^y(x̃) is the score on x̃ for the ground-truth class y, and j ≠ y. As the ablation study in Appendix F shows, the maximum margin loss, coupled with our proposed linearization regularizer, improves the attack efficacy, thereby yielding models that are significantly more robust.
4. Experiments and Analysis
We conduct comprehensive evaluations to verify the effectiveness of our proposed SEAT. We first present the benchmarking robustness in white-box and black-box settings, followed by extensive quantitative and qualitative