Pruning Adversarially Robust Neural Networks
without Adversarial Examples
Tong Jian1,†, Zifeng Wang1,†, Yanzhi Wang2, Jennifer Dy1, Stratis Ioannidis1
Department of Electrical and Computer Engineering
Northeastern University
1{jian, zifengwang, jdy, ioannidis}@ece.neu.edu
2yanz.wang@northeastern.edu
Abstract—Adversarial pruning compresses models while pre-
serving robustness. Current methods require access to adversarial
examples during pruning. This significantly hampers training
efficiency. Moreover, as new adversarial attacks and training
methods develop at a rapid rate, adversarial pruning methods
need to be modified accordingly to keep up. In this work, we
propose a novel framework to prune a previously trained robust
neural network while maintaining adversarial robustness, without
further generating adversarial examples. We leverage concurrent
self-distillation and pruning to preserve knowledge in the original
model as well as regularizing the pruned model via the Hilbert-
Schmidt Information Bottleneck. We comprehensively evaluate
our proposed framework and show its superior performance in
terms of both adversarial robustness and efficiency when pruning
architectures trained on the MNIST, CIFAR-10, and CIFAR-100
datasets against five state-of-the-art attacks. Code is available at
https://github.com/neu-spiral/PwoA/.
Index Terms—Adversarial Robustness, Adversarial Pruning,
Self-distillation, HSIC Bottleneck
I. INTRODUCTION
The vulnerability of deep neural networks (DNNs) to ad-
versarial attacks has been the subject of extensive research
recently [1]–[5]. Such attacks are intentionally crafted to
mislead DNNs towards incorrect predictions, e.g., by adding
delicately crafted, visually imperceptible perturbations to original,
natural examples [6]. Adversarial robustness, i.e., the ability
of a trained model to maintain its predictive power despite
such attacks, is an important property for many safety-critical
applications [7]–[9]. The most common and effective way
to attain adversarial robustness is via adversarial training
[10]–[12], i.e., training a model over adversarially generated
examples. Adversarial training has shown reliable robustness
performance against improved attack techniques such as pro-
jected gradient descent (PGD) [3], the Carlini & Wagner attack
(CW) [4] and AutoAttack (AA) [5]. Nevertheless, adversarial
training is computationally expensive [3], [13], usually 3×–30× [14] longer than natural training, precisely due to the
additional cost of generating adversarial examples.
As noted by Madry et al. [3], achieving adversarial robust-
ness requires a significantly wider and larger architecture than
that for natural accuracy. The large network capacity required
by adversarial training may limit its deployment on resource-
constrained hardware or real-time applications.
†Both authors contributed equally to this work.
[Figure 1: two panels, (a) "Motivation of our PwoA framework" and (b) "Naïve Prune vs. PwoA"; caption follows.]
Fig. 1: (a) A DNN publicly released by Researcher A, trained adversarially at a large computational expense, is pruned by Researcher B and made executable on a resource-constrained device. Using PwoA, pruning by B is efficient, requiring only access to natural examples. (b) Taking a pre-trained WRN34-10 pruned on CIFAR-100 as an example, pruning an adversarially robust model in a naïve fashion, without generating any adversarial examples, completely obliterates robustness against AutoAttack [5] even under a 2× pruning ratio. In contrast, our proposed PwoA framework efficiently preserves robustness for a broad range of pruning ratios, without any access to adversarially generated examples. To achieve similar robustness, SOTA adversarial pruning methods require 4×–7× more training time (see Figure 3 in Section VI-C).
Weight pruning is a prominent compression technique to reduce model
size without notable accuracy degradation [15]–[21]. While
researchers have extensively explored weight pruning, only
a few recent works have studied it jointly with adversarial
robustness. Ye et al. [22], Gui et al. [23], and Sehwag et
al. [24] combine active defense techniques with pruning. However, these works require access to adversarial
examples during pruning. Pruning is itself a laborious process,
as effective pruning techniques simultaneously finetune an
existing, pre-trained network; incorporating adversarial exam-
ples into this process significantly hampers training efficiency.
Moreover, adversarial pruning techniques tailored to specific
adversarial training methods need to be continually revised as
new methods develop apace.
In this paper, we study how to take a dense, adversarially
robust DNN that has already been trained over adversarial
examples, and prune it without any additional adversarial
training. As a motivating example illustrated in Figure 1(a), a
DNN publicly released by researchers or a company, trained
adversarially at a large computational expense, could be sub-
sequently pruned by other researchers to be made executable
on a resource-constrained device, like an FPGA. Using our
method, the latter could be done efficiently, without access to
the computational resources required for adversarial pruning.
Restricting pruning to access only natural examples poses a
significant challenge. As shown in Figure 1(b), naïvely pruning
a model without adversarial examples can be catastrophic,
obliterating all robustness against AutoAttack. In contrast, our
PwoA is notably robust under a broad range of pruning rates.
Overall, we make the following contributions:
1) We propose PwoA, an end-to-end framework for pruning
a pre-trained adversarially robust model without gener-
ating adversarial examples, by (a) preserving robustness
from the original model via self-distillation [25]–[27]
and (b) enhancing robustness from natural examples
via Hilbert-Schmidt independence criterion (HSIC) as
a regularizer [28], [29].
2) Our work is the first to study how an adversarially pre-
trained model can be efficiently pruned without access
to adversarial examples. This is an important, novel
challenge: prior to our study, it was unclear whether this
was even possible. Our approach is generic, and is nei-
ther tailored nor restricted to specific pre-trained robust
models, architectures, or adversarial training methods.
3) We comprehensively evaluate PwoA on pre-trained ad-
versarially robust models publicly released by other
researchers. In particular, we prune five publicly avail-
able models that were pre-trained with state-of-the-art
(SOTA) adversarial methods on the MNIST, CIFAR-
10, and CIFAR-100 datasets. Compared to SOTA adver-
sarial pruning methods, PwoA can prune a large frac-
tion of weights while attaining comparable (or better)
adversarial robustness, at a 4×–7× training speedup.
The remainder of this paper is structured as follows. We
review related work in Section II. In Section III, we discuss
standard adversarial robustness, knowledge distillation, and
HSIC. Section IV formulates the pruning problem, and Section V presents our method. Section VI
includes our experiments; we conclude in Section VII.
II. RELATED WORK
Adversarial Robustness. Popular adversarial attack methods
include projected gradient descent (PGD) [3], fast gradient
sign method (FGSM) [2], CW attack [4], and AutoAttack
(AA) [5]; see also [30] for a comprehensive review. Adver-
sarially robust models are typically obtained via adversarial
training [31], by augmenting the training set with adversarial
examples, generated by the aforementioned adversarial attacks.
Madry et al. [3] generate adversarial examples via PGD.
TRADES [11] and MART [12] extend adversarial training
by incorporating additional penalty terms. LBGAT [32] guides
adversarial training with a natural classifier boundary to improve
robustness. However, generating adversarial examples is com-
putationally expensive and time consuming.
Several recent works observe that information-bottleneck
penalties enhance robustness. Fischer [33] considers a con-
ditional entropy bottleneck (CEB), while Alemi et al. [34]
suggest a variational information bottleneck (VIB); both lead
to improved robustness properties. Ma et al. [28] and Wang
et al. [29] use a penalty based on the Hilbert-Schmidt In-
dependence Criterion (HSIC), termed HSIC bottleneck as a
regularizer (HBaR). Wang et al. show that HBaR enhances
adversarial robustness even without generating adversarial
examples [29]. For this reason, we incorporate HBaR into our
unified robust pruning framework as a means of exploiting
adversarial robustness merely from natural examples during
the pruning process, without further adversarial training. We
are the first to study HBaR under a pruning context; our abla-
tion study (Section VI-B) indicates HBaR indeed contributes
to enhancing robustness in our setting.
Adversarial Pruning. Weight pruning is one of the prominent
compression techniques to reduce model size with acceptable
accuracy degradation. While extensively explored for effi-
ciency and compression purposes [15]–[20], only a few recent
works study pruning in the context of adversarial robustness.
Several works [35], [36] theoretically discuss the relationship
between adversarial robustness and pruning, but do not provide
any active defense techniques. Ye et al. [22] and Gui et al. [23]
propose AdvPrune to combine the alternating direction method
of multipliers (ADMM) pruning framework with adversarial
training. Lee et al. [37] propose APD to use knowledge distil-
lation for adversarial pruning optimized by a proximal gradient
method. Sehwag et al. [24] propose HYDRA, which uses a
robust training objective to learn a sparsity mask. However,
all these methods rely on adversarial training. HYDRA further
requires training additional sparsity masks, which hampers
training efficiency. In contrast, we distill from a pre-trained
adversarially robust model while pruning without generating
adversarial examples. Our compressed model can preserve
high adversarial robustness with considerable training speedup
compared to these methods, as we report in Section VI-C.
III. BACKGROUND
We use the following standard notation throughout the
paper. In the standard k-ary classification setting, we are given
a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{d_X}$, $y_i \in \{0, 1\}^k$
are i.i.d. samples drawn from the joint distribution $P_{XY}$. Given
an $L$-layer neural network $h_\theta : \mathbb{R}^{d_X} \to \mathbb{R}^k$ parameterized
by weights $\theta := \{\theta_l\}_{l=1}^{L}$, $\theta_l \in \mathbb{R}^{d_{\theta_l}}$, where $\theta_l$ is the weight
corresponding to the $l$-th layer, for $l = 1, \dots, L$, we define
the standard learning objective as follows:
$$\mathcal{L}(\theta) = \mathbb{E}_{XY}[\ell(h_\theta(X), Y)] \approx \frac{1}{n}\sum_{i=1}^{n} \ell(h_\theta(x_i), y_i), \quad (1)$$
where $\ell : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$ is a loss function, e.g., cross-entropy.
A. Adversarial Robustness
We call a network adversarially robust if it maintains
high prediction accuracy against a constrained adversary that
perturbs input samples. Formally, prior to submitting an input
sample $x \in \mathbb{R}^{d_X}$, an adversary may perturb $x$ by an arbitrary
$\delta \in \mathcal{B}_r$, where $\mathcal{B}_r \subseteq \mathbb{R}^{d_X}$ is the $\ell_\infty$-ball of radius $r$, i.e.,
$$\mathcal{B}_r = \mathcal{B}_\infty(0, r) = \{\delta \in \mathbb{R}^{d_X} : \|\delta\|_\infty \leq r\}. \quad (2)$$
The adversarial robustness [3] of a model $h_\theta$ is measured by
the expected loss attained by such adversarial examples, i.e.,
$$\tilde{\mathcal{L}}(\theta) = \mathbb{E}_{XY}\Big[\max_{\delta \in \mathcal{B}_r} \ell(h_\theta(X + \delta), Y)\Big] \approx \frac{1}{n}\sum_{i=1}^{n} \max_{\delta \in \mathcal{B}_r} \ell(h_\theta(x_i + \delta), y_i). \quad (3)$$
An adversarially robust neural network $h_\theta$ can be obtained
via adversarial training, i.e., by minimizing the adversarial
robustness loss in (3) empirically over the training set $\mathcal{D}$. In
practice, this amounts to stochastic gradient descent (SGD)
over adversarial examples $x_i + \delta$ (see, e.g., [3]). In each epoch,
$\delta$ is generated on a per-sample basis via an inner optimization
over $\mathcal{B}_r$, e.g., via projected gradient descent (PGD).
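To make the inner maximization in (3) concrete, below is a minimal PGD sketch in PyTorch; the radius, step size, and number of steps are illustrative placeholders rather than settings prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, r=8/255, alpha=2/255, num_steps=10):
    """Approximate the inner maximization of Eq. (3) with projected
    gradient descent over the l_inf ball B_r (illustrative sketch)."""
    delta = torch.empty_like(x).uniform_(-r, r).requires_grad_(True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Ascent step in the sign of the gradient, then project back onto B_r.
        delta = (delta + alpha * grad.sign()).clamp(-r, r).detach().requires_grad_(True)
    # Clamp to a valid pixel range, assuming inputs are normalized to [0, 1].
    return (x + delta).clamp(0, 1).detach()
```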
Adversarial pruning preserves robustness while pruning.
Current approaches combine adversarial training into their
pruning objective. In particular, AdvPrune [22] directly minimizes
the adversarial loss $\tilde{\mathcal{L}}(\theta)$ subject to sparsity constraints,
while HYDRA [24] also uses $\tilde{\mathcal{L}}(\theta)$ to jointly learn a sparsity
mask along with $\theta_l$. Both are combined with and tailored to
specific adversarial training methods, and require considerable
training time. This motivates us to propose our PwoA frame-
work, described in Section V.
B. Knowledge Distillation
In knowledge distillation [25], [38], a student model learns
to mimic the output of a teacher. Consider a well-trained
teacher model $T$, and a student model $h_\theta$ that we wish to
train to match the teacher's output. Let $\sigma : \mathbb{R}^k \to [0, 1]^k$ be
the softmax function, i.e., $\sigma(z)_j = \frac{e^{z_j}}{\sum_{j'} e^{z_{j'}}}$, $j = 1, \dots, k$. Let
$$T^\tau(x) = \sigma\!\left(\frac{T(x)}{\tau}\right) \quad \text{and} \quad h_\theta^\tau(x) = \sigma\!\left(\frac{h_\theta(x)}{\tau}\right) \quad (4)$$
be the softmax outputs of the two models weighed by temperature
parameter $\tau > 0$ [25]. Then, the knowledge distillation
penalty used to train $\theta$ is:
$$\mathcal{L}_{\mathrm{KD}}(\theta) = (1 - \lambda)\,\mathcal{L}(\theta) + \lambda \tau^2\, \mathbb{E}_X\big[\mathrm{KL}\big(h_\theta^\tau(X), T^\tau(X)\big)\big], \quad (5)$$
where $\mathcal{L}$ is the classification loss of the tempered student
network $h_\theta^\tau$ and KL is the Kullback–Leibler (KL) divergence.
Intuitively, the knowledge distillation loss $\mathcal{L}_{\mathrm{KD}}$ treats the output
of the teacher as soft labels to train the student, so that the
student exhibits some inherent properties of the teacher, such
as adversarial robustness.
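As a concrete illustration of Eq. (5), here is a minimal PyTorch sketch of the distillation penalty; the temperature `tau` and mixing weight `lam` are placeholder hyperparameters, and `student_logits`/`teacher_logits` stand in for the outputs of $h_\theta$ and $T$.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, tau=4.0, lam=0.9):
    """Knowledge distillation penalty of Eq. (5) (illustrative sketch)."""
    # Classification term; Eq. (5) states it for the tempered student, but
    # computing it at temperature 1 is a common simplification.
    ce = F.cross_entropy(student_logits, labels)
    # Tempered softmax outputs (Eq. (4)). F.kl_div(log_q, p) computes KL(p || q),
    # with p the tempered teacher and q the tempered student distribution.
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    )
    return (1.0 - lam) * ce + lam * tau ** 2 * kl
```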
C. Hilbert-Schmidt Independence Criterion
The Hilbert-Schmidt Independence Criterion (HSIC) is
a statistical dependency measure introduced by Gretton et
al. [39]. HSIC is the Hilbert-Schmidt norm of the cross-
covariance operator between the distributions in Reproducing
Kernel Hilbert Space (RKHS). Similar to Mutual Information
(MI), HSIC captures non-linear dependencies between random
variables. HSIC is defined as:
$$\begin{aligned}
\mathrm{HSIC}(X, Y) = {}& \mathbb{E}_{XYX'Y'}[k_X(X, X')\,k_Y(Y, Y')] \\
& + \mathbb{E}_{XX'}[k_X(X, X')]\,\mathbb{E}_{YY'}[k_Y(Y, Y')] \\
& - 2\,\mathbb{E}_{XY}\big[\mathbb{E}_{X'}[k_X(X, X')]\,\mathbb{E}_{Y'}[k_Y(Y, Y')]\big], \quad (6)
\end{aligned}$$
where $X'$ and $Y'$ are independent copies of $X$ and $Y$,
respectively, and $k_X$ and $k_Y$ are kernel functions. In practice,
we often approximate HSIC empirically. Given $n$ i.i.d. samples
$\{(x_i, y_i)\}_{i=1}^{n}$ drawn from $P_{XY}$, we estimate HSIC via:
$$\widehat{\mathrm{HSIC}}(X, Y) = (n - 1)^{-2}\,\mathrm{tr}(K_X H K_Y H), \quad (7)$$
where $K_X$ and $K_Y$ are kernel matrices with entries $K_{X_{ij}} = k_X(x_i, x_j)$
and $K_{Y_{ij}} = k_Y(y_i, y_j)$, respectively, and $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ is a centering matrix.
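The estimator in Eq. (7) is straightforward to compute; below is a minimal PyTorch sketch using Gaussian kernels, where the bandwidth `sigma` is an illustrative placeholder rather than a value taken from the paper.

```python
import torch

def gaussian_kernel(x, sigma=5.0):
    """Pairwise Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d = torch.cdist(x, x) ** 2
    return torch.exp(-d / (2 * sigma ** 2))

def hsic(x, y, sigma=5.0):
    """Empirical HSIC of Eq. (7): (n - 1)^{-2} tr(K_X H K_Y H)."""
    n = x.shape[0]
    kx, ky = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    # Centering matrix H = I - (1/n) 1 1^T.
    h = torch.eye(n, device=x.device) - torch.ones(n, n, device=x.device) / n
    return torch.trace(kx @ h @ ky @ h) / (n - 1) ** 2
```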
IV. PROBLEM FORMULATION
Given an adversarially robust model hθ, we wish to ef-
ficiently prune non-important weights from this pre-trained
model while preserving adversarial robustness of the final
pruned model. We minimize the loss function subject to
constraints specifying sparsity requirements. More specifically,
the weight pruning problem can be formulated as:
$$\operatorname*{Minimize:}_{\theta} \quad \mathcal{L}(\theta), \qquad \text{subject to} \quad \theta_l \in S_l, \;\; l = 1, \dots, L, \quad (8)$$
where $\mathcal{L}(\theta)$ is the loss function optimizing both the accuracy
and the robustness, and $S_l \subseteq \mathbb{R}^{d_{\theta_l}}$ is a weight sparsity
constraint set applied to layer $l$, defined as
$$S_l = \{\theta_l \,|\, \|\theta_l\|_0 \leq \alpha_l\}, \quad (9)$$
where $\|\cdot\|_0$ is the size of $\theta_l$'s support (i.e., the number of non-zero
elements), and $\alpha_l \in \mathbb{N}$ is a constant specifying the desired sparsity level.
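For intuition, the Euclidean projection onto a set of the form (9) simply keeps the $\alpha_l$ largest-magnitude entries of $\theta_l$ and zeros out the rest; the sketch below illustrates this projection step, which is not necessarily the optimization scheme PwoA itself uses.

```python
import torch

def project_onto_sparsity_set(theta_l, alpha_l):
    """Euclidean projection of a layer's weights onto
    S_l = {theta_l : ||theta_l||_0 <= alpha_l} (Eq. (9)):
    keep the alpha_l largest-magnitude entries, zero out the rest."""
    flat = theta_l.flatten()
    if alpha_l >= flat.numel():
        return theta_l.clone()
    # Indices of the alpha_l largest-magnitude weights.
    _, keep = torch.topk(flat.abs(), alpha_l)
    mask = torch.zeros_like(flat)
    mask[keep] = 1.0
    return (flat * mask).reshape(theta_l.shape)
```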
V. METHODOLOGY
We now describe PwoA, our unified framework for pruning
a robust network without additional adversarial training.
A. Robustness-Preserving Pruning
Given an adversarially pre-trained robust model, we aim to
preserve its robustness while sparsifying it via weight pruning.
In particular, we leverage soft labels generated by the robust
model and directly incorporate them into our pruning objective
with only access to natural examples. Formally, we denote the
pre-trained model by $T$ and its sparse counterpart by $h_\theta$.
The optimization objective is defined as follows:
$$\operatorname*{Min.:}_{\theta} \quad \mathcal{L}_D(\theta) = \tau^2\, \mathbb{E}_X\big[\mathrm{KL}\big(h_\theta^\tau(X), T^\tau(X)\big)\big], \qquad \text{subj. to} \quad \theta_l \in S_l, \;\; l = 1, \dots, L, \quad (10)$$
where $\tau$ is the temperature hyperparameter. Intuitively, our
distillation-based objective forces the sparse model $h_\theta$ to
mimic the soft labels produced by the original pre-trained
model $T$, while the constraint enforces that the learnt weights
satisfy the desired sparsity. This way, we preserve the robustness
captured by the pre-trained model while attaining the desired sparsity.
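To make Eq. (10) concrete, here is a hypothetical sketch of one PwoA-style training step: distillation from the frozen robust teacher on natural examples only, followed by a magnitude-based re-projection onto the sparsity sets. The function name, the `keep_frac` parameter, and the projection-after-every-step schedule are illustrative assumptions, not the paper's exact optimization procedure.

```python
import torch
import torch.nn.functional as F

def pwoa_style_step(student, teacher, x, optimizer, keep_frac=0.1, tau=4.0):
    """One illustrative update for Eq. (10): distill the frozen robust
    teacher's soft labels into the sparse student using natural examples
    only, then re-project each weight tensor onto its sparsity set."""
    with torch.no_grad():
        t_logits = teacher(x)                      # frozen robust teacher T
    s_logits = student(x)
    # Distillation objective of Eq. (10): tau^2 * KL(student^tau, teacher^tau).
    loss = tau ** 2 * F.kl_div(
        F.log_softmax(s_logits / tau, dim=1),
        F.softmax(t_logits / tau, dim=1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Enforce theta_l in S_l (Eq. (9)) by keeping the largest-magnitude weights.
    with torch.no_grad():
        for p in student.parameters():
            if p.dim() > 1:                        # prune weight tensors only
                k = max(1, int(keep_frac * p.numel()))
                thresh = p.abs().flatten().topk(k).values.min()
                p.mul_((p.abs() >= thresh).float())
    return loss.item()
```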