Stable and Efficient Adversarial Training through Local Linearization
Zhuorong Li, Dawei Yu
Zhejiang University City College
Hangzhou, China
lizr@zucc.edu.cn, ydw.ccm@gmail.com
Abstract
There has been a recent surge in single-step adversarial training as it shows both robustness and efficiency. However, a phenomenon referred to as "catastrophic overfitting" has been observed, which is prevalent in single-step defenses and may frustrate attempts to use FGSM adversarial training. To address this issue, we propose a novel method, Stable and Efficient Adversarial Training (SEAT), which mitigates catastrophic overfitting by harnessing local properties that distinguish a robust model from a catastrophically overfitted one. The proposed SEAT has strong theoretical justifications, in that minimizing the SEAT loss can be shown to favour a smooth empirical risk, thereby leading to robustness. Experimental results demonstrate that the proposed method successfully mitigates catastrophic overfitting, yielding superior performance amongst efficient defenses. Our single-step method reaches 51% robust accuracy on CIFAR-10 with ℓ∞ perturbations of radius 8/255 under a strong PGD-50 attack, matching the performance of 10-step iterative adversarial training at merely 3% of the computational cost.
1. Introduction
While systems based on Deep Neural Networks (DNNs) permeate almost every corner of our daily lives, they are not intrinsically robust. In particular, by imposing high-fidelity, imperceptible distortions on the original inputs, also known as adversarial attacks, the decisions of DNNs can be completely altered [14,32]. It is thus imperative to develop mitigation strategies, especially in the high-stakes applications where DNNs are deployed, e.g., autonomous driving and surveillance systems [5,28].
There has been a great deal of work on devising sophisticated adversarial attacks [3,16], which has in turn spurred immense interest in building defenses against such attacks [8,38,45]. Among them, adversarial training (AT)
is one of the most promising methods to achieve empirical
robustness against adversarial attacks. Its training regime attempts to directly augment the training set with adversarial samples that are generated on-the-fly [15,27].

Figure 1. Analysis of catastrophic overfitting. Solid lines indicate the robust accuracy against PGD-20 on the validation set during training, and dashed lines denote the mean linearity approximation error over the entire training set.

Specifically, when the adversarial samples are produced by multiple gradient propagations, the corresponding adversarial training is called multi-step AT; otherwise, it is single-step AT. Unfortunately, the cost of AT becomes prohibitively
high with growing model capacity and dataset scale. This
is primarily due to the intensive computation of adversarial
perturbations, as each step of adversarial training requires
multiple forward propagations to find the perturbations.
One approach to alleviating such computational cost is to train with a single-step adversary, such as the Fast Gradient Sign Method (FGSM) [15], which takes only one gradient step to compute and is thus much cheaper. However, this method relies on the local linearity assumption of the loss surface, which is often compromised as FGSM training progresses. As first discovered by [43], single-step AT is prone to an intriguing phenomenon referred to as catastrophic overfitting,
i.e., the validation robustness under multi-step attacks, e.g., projected gradient descent (PGD) [27], suddenly drops to zero whereas the training robustness against the FGSM attack keeps increasing. Later, it was found that catastrophic overfitting is not limited to FGSM-based AT but also occurs in diverse single-step AT methods [2]. A few attempts [2,24,34] have been made to identify the underlying reason for catastrophic overfitting and thus to develop strategies to prevent this failure. However, these works did not provide a fundamental explanation for the problem, and the proposed methods are computationally inefficient [26].
In this work, we first unfold the connection between catastrophic overfitting and local linearity. Recall that a single-step adversary such as FGSM produces perturbations based on a linear approximation of the loss function. However, gradient masking causes the linearity assumption to become unreliable, which results in the exclusion of strong attacks that maximize the loss during training. As catastrophic overfitting typically happens in non-iterative AT, we perform a comparison between single-step AT and multi-step AT for our empirical study. Fig. 1 shows that catastrophic overfitting coincides with a drastic change in the local linearity of the loss surface. To be more specific, the linearity approximation error of FGSM-AT abruptly increases at the moment catastrophic overfitting occurs, with the robust accuracy deteriorating suddenly within a single epoch. In contrast, TRADES, which generates adversarial examples with multiple iterations, maintains a negligible local linearity error throughout training.
Based on this observation, we make a rigorous and comprehensive study of addressing catastrophic overfitting by retaining the local linearity of the loss function. The proposed Stable and Efficient Adversarial Training harnesses the local linearity of models trained using multi-step methods and incorporates this salient property into models trained using single-step AT. Fig. C.2 in the Appendix shows that the model resulting from the proposed SEAT behaves strikingly similarly to that of TRADES, indicating robustness competitive with multi-step AT.
Our main contributions are summarized as follows:
• We first empirically identify a clear correlation between catastrophic overfitting and local linearity in DNNs, which sparks our theoretical analysis of the failure mode behind FGSM-AT and motivates us to overcome this weakness.
• We propose a novel regularization, Stable and Efficient Adversarial Training (SEAT), which prevents catastrophic overfitting by explicitly penalizing violations of the linearity assumption to ensure the validity of the FGSM solution.
• We conduct a thorough experimental study and show that the proposed SEAT consistently outperforms existing single-step AT methods in stability and adversarial robustness, and is even comparable to most multi-step AT methods at a much lower cost. We also verify the effectiveness of SEAT under different attack setups, as well as through loss surface smoothness and decision boundary distortions.
2. Related Work
2.1. Adversarial Robustness and Attack Strength
Adversarial training is widely regarded as among the most effective defenses. According to the number of gradient propagations involved in attack generation, methods can be broadly grouped into multi-step AT [7,27] and single-step AT [15,22]. Multi-step AT, such as PGD-AT [27], generally achieves robustness by training on strong perturbations generated by iterative optimization. In more recent work, TRADES [47] and AWP [44] yield enhanced robustness via regularization, and [31] further improves performance through a judicious choice of hyperparameters. Albeit empirically the best-performing approach to training robust models, multi-step AT is time-consuming.
The high cost of multi-step AT has motivated an alternative, i.e., single-step AT, which proves to be efficient. It trains with shared gradient computations [35]; or by using cheaper adversaries, such as FGSM [15,36]; or by first using FGSM and later switching to PGD [40]. While these single-step AT methods point in a promising direction, their robust performance is not on par with multi-step AT. Worse still, they are prone to the serious problem of catastrophic overfitting, i.e., after a few epochs of adversarial training, the robust accuracy of the model against PGD sharply decreases to 0%.
2.2. Adversarial Generalization and Flat Minima
There exists a large body of work investigating the correlation between the flatness of local minima and the generalization performance of DNNs on natural samples [21,25]. It has been empirically verified and commonly accepted that a flatter loss surface tends to yield better generalization, and this understanding has further been utilized to design regularization (e.g., [13,20,42]).
An analogous connection has also been identified in the adversarial training scenario, where the flatness of the loss surface helps to improve robust generalization on adversarial samples [30]. Several well-recognized improvements of AT, i.e., TRADES [47], MART [41], and RST [7], all implicitly flatten the loss surface to improve robust generalization. Moreover, a line of works has proposed explicit regularization to directly encourage the flatness of local minima [33,44].
3. Methodology
Our aim is to develop a technique that resolves catastrophic overfitting so as to stabilize single-step adversarial training. In this section, we first theoretically analyze the pitfalls of present single-step AT methods. Then, we provide theoretical justifications for our regularization, demonstrating its potential to avoid catastrophic overfitting. Finally, we expound on our proposal, which is termed "Stable and Efficient Adversarial Training" or SEAT.
3.1. Revisiting Single-step Adversarial Training
Recall adversarial training, where the adversarial perturbation δ can be generated by:

\delta \leftarrow \mathrm{Proj}\big(\delta + \alpha \cdot \nabla_\delta \ell(x+\delta)\big),    (1)

where Proj(x) = arg min_{ξ∈B(r)} ‖x − ξ‖_p, r is a small radius, x ∈ D is a sampled data point with ground-truth label y, ℓ(·) is the loss function, α is a step size, the initial δ is an arbitrary perturbation of sufficiently small size, and ∇_δ ℓ(·) denotes the gradient of the loss w.r.t. the perturbation δ. While the loss function is highly non-linear, we can approximate it by making the reasonable assumption that the loss is once-differentiable. For the adversarial loss ℓ(x+δ), the first-order Taylor expansion then gives:

\ell(x+\delta) = \ell(x) + \langle \delta, \nabla_x \ell(x) \rangle + \omega(\delta),    (2)

where B_r(x) denotes a small neighborhood of x and x + δ ∈ B_r(x). Suppose that, given a metric space M = (X, d), there exists a radius r > 0 satisfying B_r(x) = {x + δ ∈ X : d(x, x + δ) < r}. Besides, ω(δ) denotes the higher-order terms in δ, which tend to zero in the limit of a small perturbation.
As we focus on single-step attacks in this work, we choose δ = ε·sgn(∇_x ℓ(x)), where ε is a small value and sgn(∇_x ℓ(x)) is a sign vector. Eq. (2) then becomes

\ell(x+\delta) = \ell(x) + \epsilon\,(\nabla_x \ell(x))^{T} \cdot \mathrm{sgn}(\nabla_x \ell(x)) + \omega(\delta) = \ell(x) + \epsilon\,\|\nabla_x \ell(x)\|_1 + \omega(\delta).

As the adversarial perturbation δ is carefully crafted to be imperceptible, ω(δ) is naturally negligible. Thus, we have:

\ell(x+\delta) - \ell(x) = \epsilon\,\|\nabla_x \ell(x)\|_1.    (3)

As described in the previous section, catastrophic overfitting has been observed during the training of models using single-step defenses. This indicates that the maximum magnitude ε is no longer the strongest step size in the direction of the perturbation δ. Consequently, the maximization cannot be satisfied when catastrophic overfitting occurs, as the loss surface is highly curved. Eq. (3) reveals that a fixed step of perturbation, i.e., ε·‖∇_x ℓ(x)‖_1, is probably the principal reason for catastrophic overfitting.
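The fixed-magnitude update that Eq. (3) exposes can be seen directly in a standard FGSM implementation. Below is a minimal PyTorch sketch, not the paper's code; the helper name fgsm_perturb, the cross-entropy loss, and the [0,1] input range are our own illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps):
    """Single-step FGSM perturbation, delta = eps * sgn(grad_x l(x)), as analyzed above."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # The per-coordinate step length is always eps, no matter how curved the loss
    # surface is locally; under the linear approximation this fixes the loss
    # increase at eps * ||grad_x l(x)||_1, the quantity Eq. (3) points to as a
    # likely cause of catastrophic overfitting.
    delta = eps * grad.sign()
    return (x + delta).clamp(0.0, 1.0).detach()
```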
3.2. Justifications for Local Linearization
To resolve the issue mentioned above, we propose a regularizer that prevents catastrophic overfitting by encouraging local linearity, coupled with a scaled step size.
As described before, suppose that we are given a loss function ℓ(·) that is once-differentiable. The loss at a point x + δ ∈ B_r(x), i.e., the adversarial loss ℓ(x+δ), can be well approximated by its first-order Taylor expansion at the point x as ℓ(x) + ⟨δ, ∇_x ℓ(x)⟩. In other words, we can measure how linear the surface is within a neighborhood by computing the absolute difference between these two values:

\xi(\theta, x) = \big|\,\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle\,\big|,    (4)

where ξ(θ, x) defines the linearity approximation error, and the perturbation we focus on is generated by the non-iterative adversary as δ = ε·sgn(∇_x ℓ(x)).
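The error in Eq. (4) is straightforward to monitor during training. The sketch below is a hypothetical PyTorch rendering under our own assumptions (the helper name linearity_error, the cross-entropy loss, and the per-sample reduction are illustrative, not the authors' code).

```python
import torch
import torch.nn.functional as F

def linearity_error(model, x, y, eps):
    """Per-sample linearity approximation error xi(theta, x) of Eq. (4), with delta = eps * sgn(grad)."""
    x = x.clone().detach().requires_grad_(True)
    loss_x = F.cross_entropy(model(x), y, reduction='none')            # l(x)
    grad_x = torch.autograd.grad(loss_x.sum(), x)[0]                   # grad_x l(x)

    delta = eps * grad_x.sign()                                        # FGSM-style perturbation
    loss_adv = F.cross_entropy(model(x + delta), y, reduction='none')  # l(x + delta)
    linear = (delta * grad_x).flatten(1).sum(dim=1)                    # <delta, grad_x l(x)>

    # Detached here for monitoring; pass create_graph=True above if the error is
    # to be minimized as a differentiable regularizer.
    return (loss_adv - loss_x - linear).abs().detach()
```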
Next, we provide a theoretical analysis of how the linearity approximation error ξ(θ, x) correlates with catastrophic overfitting, thereby deriving a regularizer for stabilizing the training phase.
Property 1. Consider a loss function ℓ(·) that is once-differentiable, and a local neighbourhood B_ε(x) of radius ε centered at x. For any x + δ ∈ B_ε(x), we have

|\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle| \le \Big(\epsilon + \tfrac{1}{\sqrt{n}}\|\delta\|_2\Big)\,\|\nabla_x \ell(x)\|_1.    (5)
Proof. We start from the local Taylor expansion in Eq. (2). It is clear that as x + δ → x, the higher-order term ω(δ) becomes negligible. We therefore use its equivalent, i.e., |ℓ(x+δ) − ℓ(x) − ⟨δ, ∇_x ℓ(x)⟩|, as a measure of how linear the surface is within a neighbourhood; this is precisely the quantity defined in Eq. (4).

|\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle|
  \le |\ell(x+\delta) - \ell(x)| + |\langle \delta, \nabla_x \ell(x) \rangle|
  = \epsilon\,\|\nabla_x \ell(x)\|_1 + |\langle \delta, \nabla_x \ell(x) \rangle|
  \le \epsilon\,\|\nabla_x \ell(x)\|_1 + \|\delta\|_2\,\|\nabla_x \ell(x)\|_2
  \le \Big(\epsilon + \tfrac{1}{\sqrt{n}}\|\delta\|_2\Big)\,\|\nabla_x \ell(x)\|_1.    (6)
Remark 1. From Eq. (5) it is clear that by bounding the linearity approximation error, one implicitly introduces a scaling parameter ‖δ‖_2/√n for perturbation generation instead of the fixed magnitude ε·‖∇_x ℓ(x)‖_1 of the FGSM attack, which is the main cause of catastrophic overfitting.
Property 2. Consider a loss function ℓ(·) that is locally Lipschitz continuous. For any point in B_ε(x), the linearity approximation error ξ(θ, x) gives the following inequality:

K < \sup_{x+\delta \in B_\epsilon(x)} \frac{1}{\|\delta\|_2}\Big( |\ell(x+\delta) - \ell(x) - \langle \delta, \nabla_x \ell(x) \rangle| + |\langle \delta, \nabla_x \ell(x) \rangle| \Big).
Details of the proof are provided in Appendix B.
Remark 2. The linearity approximation error ξ(θ, x) enters the expression on the RHS. Hence, minimizing ξ(θ, x) is expected to induce a smaller local Lipschitz constant K, thereby encouraging the optimization procedure to find a model that is locally Lipschitz continuous.
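To build intuition for the local Lipschitz constant that Property 2 bounds, one can crudely estimate the local loss-change quotient by random sampling in B_ε(x). The sketch below is our own illustrative construction, not part of SEAT; random sampling only gives a lower bound on the supremum.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def empirical_local_lipschitz(model, x, y, eps, num_samples=32):
    """Estimate sup_{x+delta in B_eps(x)} |l(x+delta) - l(x)| / ||delta||_2 by random sampling."""
    loss_x = F.cross_entropy(model(x), y, reduction='none')
    best = torch.zeros_like(loss_x)
    for _ in range(num_samples):
        delta = torch.empty_like(x).uniform_(-eps, eps)
        loss_d = F.cross_entropy(model(x + delta), y, reduction='none')
        quot = (loss_d - loss_x).abs() / delta.flatten(1).norm(dim=1).clamp_min(1e-12)
        best = torch.maximum(best, quot)
    return best  # crude lower bound on the local Lipschitz constant around each sample
```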
Based on the favorable bound (Property 1) and the local Lipschitz continuity (Property 2) above, we derive our proposed regularizer from the linearity approximation error:

J(\theta) = \lambda \cdot \Big|\, \max_{\delta \in B_\epsilon(x)} \ell(x+\delta) - \ell(x) - \delta^{T}\nabla_x \ell(x) \,\Big|,    (7)

where λ ∈ ℝ is a hyperparameter specifying the strength of the local linearization. We set λ = 0.5 based on the experiment presented in Tab. 3, which also strikes a good balance.
3.3. Training Objective
The proposed method incorporates the linearization regularizer to enhance optimization in both attack generation and defense training. In particular, we modify the training scheme of the vanilla multi-step defense and propose the training objective of SEAT as follows:

\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \tilde{L}_{adv}\big(f_\theta(x+\delta), y\big) + J(\theta) \Big],    (8)

\delta = \arg\max_{x+\delta \in B_\epsilon(x)} L_{adv}\big(f_\theta(x+\delta), y\big) + J(\theta),    (9)

where L_adv is the standard loss, e.g., the cross-entropy loss or the maximum margin loss, J(θ) is the proposed linearization regularizer, and f_θ(·) represents the neural network with parameters θ. The definition of the flooded loss L̃_adv is deferred to Sec. 3.3.1. Pseudo-code for our algorithm is given in Algorithm 1.
Algorithm 1 Stable and Efficient Adversarial Training
Require: total epochs N, neural network f_θ with parameters θ, training set D = {(x_j, y_j)}, adversarial perturbation radius ε, step size α, flooding level b.
for epoch = 1, ..., N do
    for all (x_j, y_j) ∈ D do
        * Inner maximization to update δ
        δ = Uniform(−ε, ε)
        L = L_adv(x_j + δ, y_j) + J(θ)
          = −f_θ^{y_j}(x_j + δ) + max_{j'≠y_j} f_θ^{j'}(x_j + δ) + J(θ)
        δ ← δ + α · sgn(∇_δ L)
        * Outer minimization to update θ
        L̃ = L̃_adv(x_j + δ, y_j) + J(θ)
          = |L_ce(x_j + δ, y_j) − b| + b + J(θ)
        θ ← θ − η · ∇_θ L̃
    end for
end for
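For concreteness, the following PyTorch sketch renders one SEAT update in the spirit of Algorithm 1. It is a sketch under our own assumptions: the names seat_step and margin_loss are hypothetical, lam plays the role of λ in Eq. (7), the batch-mean form of J(θ) is a simplification, and clipping inputs to a valid pixel range is omitted.

```python
import torch
import torch.nn.functional as F

def margin_loss(logits, y):
    """Maximum margin loss: -f_y(x~) + max_{j != y} f_j(x~), averaged over the batch."""
    true_score = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    other_best = logits.scatter(1, y.unsqueeze(1), float('-inf')).amax(dim=1)
    return (other_best - true_score).mean()

def seat_step(model, optimizer, x, y, eps, alpha, lam, b):
    """One SEAT update (cf. Algorithm 1): single-step inner max on delta, outer min on theta."""
    # Clean loss and input gradient, kept differentiable so J(theta) can be trained through.
    x_req = x.clone().detach().requires_grad_(True)
    loss_clean = F.cross_entropy(model(x_req), y)
    grad_x = torch.autograd.grad(loss_clean, x_req, create_graph=True)[0]

    def lin_err(delta, adv_loss):
        # Batch-mean form of |l(x+delta) - l(x) - <delta, grad_x l(x)>|.
        return (adv_loss - loss_clean - (delta * grad_x).flatten(1).sum(1).mean()).abs()

    # Inner maximization: one ascent step on delta (Eq. (9)).
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    logits_adv = model(x + delta)
    inner = margin_loss(logits_adv, y) + lam * lin_err(delta, F.cross_entropy(logits_adv, y))
    grad_d = torch.autograd.grad(inner, delta, retain_graph=True)[0]
    delta = (delta + alpha * grad_d.sign()).clamp(-eps, eps).detach()  # project back to the eps-ball

    # Outer minimization: flooded adversarial loss plus J(theta) (Eq. (8)).
    loss_adv = F.cross_entropy(model(x + delta), y)
    loss = (loss_adv - b).abs() + b + lam * lin_err(delta, loss_adv)   # flooding with level b
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```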
3.3.1 Outer Minimization
Upon the previous empirical observation and theoretical analysis, we assume that catastrophic overfitting is probably correlated with the deterioration of local linearity. To resolve this problem, we motivate SEAT through local linearization. The proposed training scheme caters to the dual objective of minimizing the classification loss on adversarial examples while also explicitly minimizing violations of the linearity assumption to ensure the validity of the FGSM solution. Our starting point is to use the cross-entropy loss and further introduce the proposed regularizer derived from the linearity approximation error.
As it has been observed that training models excessively towards adversarial robustness may hurt generalization [39], this work takes one more step towards mitigating overfitting. Based on the understanding that flat minima tend to yield better generalization, we leverage the recently proposed regularization Flooding [20], which forces the training loss to stay above a reasonably small value rather than approach zero, so as to avoid overfitting. To the best of our knowledge, we are the first to introduce Flooding into single-step adversarial training, with the motivation of producing models that generalize better to multi-step optimized attacks, thereby mitigating catastrophic overfitting.
The implementation of Flooding is surprisingly simple: R̃(θ) = |R(θ) − b| + b, where R(θ) and R̃(θ) respectively denote the original training objective and the flooded training objective, and b > 0 is the flood level [20]. We are thus inspired to use the flooded version instead of the basic adversarial training loss.
Building on this, we replace the adversarial loss L_adv with L̃_adv. We thus arrive at our implementation of the outer minimization for our single-step defense SEAT, as in Eq. (8).
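As a small illustration, the flooded objective is a one-line wrapper around any base loss; the sketch below uses our own naming and shows it for cross-entropy.

```python
import torch.nn.functional as F

def flooded_cross_entropy(logits, y, b):
    """Flooding [20]: keep the training loss above the flood level b instead of driving it to zero."""
    loss = F.cross_entropy(logits, y)   # original objective R(theta)
    return (loss - b).abs() + b         # flooded objective |R(theta) - b| + b
```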
3.3.2 Inner Maximization
On the attacker's side, the goal is to find a perturbation for which not only the loss on adversarial samples is maximized, but also the linearity assumption in the vicinity of each data point is maximally violated. Further, we use the maximum margin loss for the implementation of L_adv, as it is known to be beneficial especially for single-step AT, which heavily relies on the initial gradient direction [17,36]. The maximum margin loss is given by −f_θ^y(x̃) + max_{j≠y} f_θ^j(x̃), where x̃ denotes the adversarial image, f_θ^y(x̃) is the score on x̃ for the ground-truth class y, and j ≠ y. As the ablation study in Appendix F shows, the maximum margin loss, coupled with our proposed linearization regularizer, improves the attack efficacy, thereby yielding models that are significantly more robust.
4. Experiments and Analysis
We conduct comprehensive evaluations to verify the effectiveness of our proposed SEAT. We first present the benchmarking robustness in white-box and black-box settings, followed by extensive quantitative and qualitative