
training. As a motivating example illustrated in Figure 1(a), a
DNN publicly released by researchers or a company, trained
adversarially at a large computational expense, could be sub-
sequently pruned by other researchers to be made executable
on a resource-constrained device, like an FPGA. Using our
method, the latter could be done efficiently, without access to
the computational resources required for adversarial pruning.
Restricting pruning to access only natural examples poses a
significant challenge. As shown in Figure 1(b), naïvely pruning
a model without adversarial examples can be catastrophic,
obliterating all robustness against AutoAttack. In contrast, our
PwoA is notably robust under a broad range of pruning rates.
Overall, we make the following contributions:
1) We propose PwoA, an end-to-end framework for pruning a pre-trained adversarially robust model without generating adversarial examples, by (a) preserving robustness from the original model via self-distillation [25]–[27] and (b) enhancing robustness from natural examples via the Hilbert-Schmidt independence criterion (HSIC) as a regularizer [28], [29]; an illustrative sketch of this combined objective follows the list below.
2) Our work is the first to study how an adversarially pre-
trained model can be efficiently pruned without access
to adversarial examples. This is an important, novel
challenge: prior to our study, it was unclear whether this
was even possible. Our approach is generic, and is nei-
ther tailored nor restricted to specific pre-trained robust
models, architectures, or adversarial training methods.
3) We comprehensively evaluate PwoA on pre-trained ad-
versarially robust models publicly released by other
researchers. In particular, we prune five publicly avail-
able models that were pre-trained with state-of-the-art
(SOTA) adversarial methods on the MNIST, CIFAR-
10, and CIFAR-100 datasets. Compared to SOTA adver-
sarial pruning methods, PwoA can prune a large fraction of weights while attaining comparable or better adversarial robustness, at a 4×–7× training speedup.
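As a high-level illustration of contribution 1, the following is a minimal PyTorch-style sketch of how a per-batch objective could combine (a) distillation from a frozen, adversarially pre-trained teacher with (b) an HSIC-based penalty computed from natural examples only. The function name, the coefficients lam_kd and lam_hsic, the temperature T, and the externally supplied hsic_term are illustrative assumptions, not our exact formulation.

```python
import torch
import torch.nn.functional as F

def pwoa_style_loss(student, teacher, x, y, hsic_term=0.0,
                    lam_kd=1.0, lam_hsic=0.1, T=4.0):
    """Illustrative per-batch objective: cross-entropy on natural examples,
    plus distillation from a frozen robust teacher, plus an (externally
    computed) HSIC-based regularizer. All coefficients are assumptions."""
    s_logits = student(x)
    with torch.no_grad():
        t_logits = teacher(x)                          # frozen robust teacher
    ce = F.cross_entropy(s_logits, y)                  # loss on natural examples
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),  # soft-label distillation
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return ce + lam_kd * kd + lam_hsic * hsic_term
```

Sparsity would be imposed on the student's weights by whatever pruning mechanism is in use; the point of the sketch is only that no adversarial example is generated anywhere in the objective.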
The remainder of this paper is structured as follows. We
review related work in Section II. In Section III, we discuss
standard adversarial robustness, knowledge distillation, and
HSIC. In Section V, we present our method. Section VI
includes our experiments; we conclude in Section VII.
II. RELATED WORK
Adversarial Robustness. Popular adversarial attack methods
include projected gradient descent (PGD) [3], fast gradient
sign method (FGSM) [2], CW attack [4], and AutoAttack
(AA) [5]; see also [30] for a comprehensive review. Adver-
sarially robust models are typically obtained via adversarial
training [31], by augmenting the training set with adversarial
examples, generated by the aforementioned adversarial attacks.
Madry et al. [3] generate adversarial examples via PGD.
TRADES [11] and MART [12] extend adversarial training
by incorporating additional penalty terms. LBGAT [32] guides adversarial training with a natural classifier boundary to improve robustness. However, generating adversarial examples is computationally expensive and time-consuming.
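To make this cost concrete, the sketch below shows a standard $\ell_\infty$ PGD attack: each adversarial batch requires `steps` additional forward/backward passes through the model, which is the main source of adversarial training's overhead. The hyperparameter values are typical CIFAR-10 settings, not values taken from the cited works.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf PGD sketch: `steps` extra forward/backward passes per batch.
    Hyperparameters are typical CIFAR-10 values (an assumption)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)        # maximize classification loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # signed gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project
    return x_adv.detach()
```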
Several recent works observe that information-bottleneck
penalties enhance robustness. Fischer [33] considers a con-
ditional entropy bottleneck (CEB), while Alemi et al. [34]
suggest a variational information bottleneck (VIB); both lead
to improved robustness properties. Ma et al. [28] and Wang
et al. [29] use a penalty based on the Hilbert-Schmidt Independence Criterion (HSIC), termed HSIC bottleneck as a
regularizer (HBaR). Wang et al. show that HBaR enhances
adversarial robustness even without generating adversarial
examples [29]. For this reason, we incorporate HBaR into our
unified robust pruning framework as a means of enhancing adversarial robustness using only natural examples during pruning, without further adversarial training. We are the first to study HBaR in a pruning context; our ablation study (Section VI-B) indicates that HBaR indeed contributes to enhancing robustness in our setting.
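For intuition on how such a penalty can be computed from natural examples alone, below is a minimal sketch of a biased empirical HSIC estimator with Gaussian kernels and an HBaR-style combination over hidden-layer representations. The kernel bandwidth, the weights lam_x and lam_y, and the hbar_penalty helper are illustrative assumptions and do not reproduce the exact estimator of [28], [29].

```python
import torch

def gaussian_kernel(a, sigma=5.0):
    """Pairwise Gaussian kernel: K_ij = exp(-||a_i - a_j||^2 / (2 sigma^2))."""
    sq_dists = torch.cdist(a, a) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def hsic(a, b, sigma=5.0):
    """Biased empirical HSIC estimate: tr(K_a H K_b H) / (m - 1)^2."""
    m = a.shape[0]
    K, L = gaussian_kernel(a, sigma), gaussian_kernel(b, sigma)
    H = torch.eye(m, device=a.device) - torch.full((m, m), 1.0 / m, device=a.device)
    return torch.trace(K @ H @ L @ H) / (m - 1) ** 2

def hbar_penalty(hidden_feats, x, y_onehot, lam_x=1.0, lam_y=1.0):
    """HBaR-style term over hidden representations Z_l: penalize HSIC(X, Z_l),
    reward HSIC(Y, Z_l). The weights lam_x, lam_y are illustrative."""
    x_flat = x.flatten(1)
    return sum(lam_x * hsic(z.flatten(1), x_flat)
               - lam_y * hsic(z.flatten(1), y_onehot.float())
               for z in hidden_feats)
```

Intuitively, reducing HSIC between inputs and hidden representations while retaining HSIC between labels and hidden representations discards label-irrelevant input information, which is the mechanism by which HBaR is argued to improve robustness [29].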
Adversarial Pruning. Weight pruning is one of the prominent
compression techniques to reduce model size with acceptable
accuracy degradation. While extensively explored for effi-
ciency and compression purposes [15]–[20], only a few recent
works study pruning in the context of adversarial robustness.
Several works [35], [36] theoretically discuss the relationship
between adversarial robustness and pruning, but do not provide
any active defense techniques. Ye et al. [22] and Gui et al. [23] propose AdvPrune, which combines the alternating direction method of multipliers (ADMM) pruning framework with adversarial training. Lee et al. [37] propose APD, which applies knowledge distillation to adversarial pruning, optimized via a proximal gradient
method. Sehwag et al. [24] propose HYDRA, which uses a
robust training objective to learn a sparsity mask. However,
all these methods rely on adversarial training. HYDRA further
requires training additional sparsity masks, which hampers
training efficiency. In contrast, we distill from a pre-trained
adversarially robust model while pruning without generating
adversarial examples. Our compressed model can preserve
high adversarial robustness with considerable training speedup
compared to these methods, as we report in Section VI-C.
III. BACKGROUND
We use the following standard notation throughout the
paper. In the standard $k$-ary classification setting, we are given a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{d_X}$ and $y_i \in \{0,1\}^k$ are i.i.d. samples drawn from the joint distribution $P_{XY}$. Given an $L$-layer neural network $h_\theta : \mathbb{R}^{d_X} \to \mathbb{R}^k$ parameterized by weights $\theta := \{\theta_l\}_{l=1}^{L}$, where $\theta_l \in \mathbb{R}^{d_{\theta_l}}$ is the weight corresponding to the $l$-th layer, for $l = 1, \ldots, L$, we define the standard learning objective as follows:
$$\mathcal{L}(\theta) = \mathbb{E}_{XY}\big[\ell(h_\theta(X), Y)\big] \approx \frac{1}{n} \sum_{i=1}^{n} \ell(h_\theta(x_i), y_i), \qquad (1)$$
where $\ell : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$ is a loss function, e.g., cross-entropy.
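As a concrete instance of Eq. (1) with $\ell$ taken to be cross-entropy, a minimal batched evaluation might look as follows; the function and variable names are illustrative.

```python
import torch.nn.functional as F

def empirical_loss(h_theta, xs, ys):
    """Empirical estimate of Eq. (1): the average of l(h_theta(x_i), y_i) over
    the batch, with l chosen as cross-entropy. `ys` may hold class indices or,
    in PyTorch >= 1.10, one-hot probability vectors matching y_i in {0,1}^k."""
    logits = h_theta(xs)                # batched evaluation of h_theta
    return F.cross_entropy(logits, ys)  # mean over the n samples
```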
A. Adversarial Robustness
We call a network adversarially robust if it maintains
high prediction accuracy against a constrained adversary that
perturbs input samples. Formally, prior to submitting an input