
training. As a motivating example illustrated in Figure 1(a), a
DNN publicly released by researchers or a company, trained
adversarially at a large computational expense, could be sub-
sequently pruned by other researchers to be made executable
on a resource-constrained device, like an FPGA. Using our
method, the latter could be done efficiently, without access to
the computational resources required for adversarial pruning.
Restricting pruning to access only natural examples poses a
significant challenge. As shown in Figure 1(b), naïvely pruning
a model without adversarial examples can be catastrophic,
obliterating all robustness against AutoAttack. In contrast, our
PwoA is notably robust under a broad range of pruning rates.
Overall, we make the following contributions:
1) We propose PwoA, an end-to-end framework for pruning a pre-trained adversarially robust model without generating adversarial examples, by (a) preserving robustness from the original model via self-distillation [25]–[27] and (b) enhancing robustness from natural examples via the Hilbert-Schmidt independence criterion (HSIC) as a regularizer [28], [29]; an illustrative sketch of this combined objective follows the list below.
2) Our work is the first to study how an adversarially pre-
trained model can be efficiently pruned without access
to adversarial examples. This is an important, novel
challenge: prior to our study, it was unclear whether this
was even possible. Our approach is generic, and is nei-
ther tailored nor restricted to specific pre-trained robust
models, architectures, or adversarial training methods.
3) We comprehensively evaluate PwoA on pre-trained ad-
versarially robust models publicly released by other
researchers. In particular, we prune five publicly avail-
able models that were pre-trained with state-of-the-art
(SOTA) adversarial methods on the MNIST, CIFAR-
10, and CIFAR-100 datasets. Compared to SOTA adver-
sarial pruning methods, PwoA can prune a large fraction of weights while attaining comparable or better adversarial robustness, at a 4×–7× training speedup.
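As a high-level illustration of contribution 1, the following is a minimal PyTorch-style sketch of how a per-batch objective could combine (a) distillation from a frozen, adversarially pre-trained teacher with (b) an HSIC-based penalty computed from natural examples only. The function name, the coefficients lam_kd and lam_hsic, the temperature T, and the externally supplied hsic_term are illustrative assumptions, not our exact formulation.

```python
import torch
import torch.nn.functional as F

def pwoa_style_loss(student, teacher, x, y, hsic_term=0.0,
                    lam_kd=1.0, lam_hsic=0.1, T=4.0):
    """Illustrative per-batch objective: cross-entropy on natural examples,
    plus distillation from a frozen robust teacher, plus an (externally
    computed) HSIC-based regularizer. All coefficients are assumptions."""
    s_logits = student(x)
    with torch.no_grad():
        t_logits = teacher(x)                          # frozen robust teacher
    ce = F.cross_entropy(s_logits, y)                  # loss on natural examples
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),  # soft-label distillation
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return ce + lam_kd * kd + lam_hsic * hsic_term
```

Sparsity would be imposed on the student's weights by whatever pruning mechanism is in use; the point of the sketch is only that no adversarial example is generated anywhere in the objective.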
The remainder of this paper is structured as follows. We
review related work in Section II. In Section III, we discuss
standard adversarial robustness, knowledge distillation, and
HSIC. In Section V, we present our method. Section VI
includes our experiments; we conclude in Section VII.
II. RELATED WORK
Adversarial Robustness. Popular adversarial attack methods
include projected gradient descent (PGD) [3], fast gradient
sign method (FGSM) [2], CW attack [4], and AutoAttack
(AA) [5]; see also [30] for a comprehensive review. Adver-
sarially robust models are typically obtained via adversarial
training [31], by augmenting the training set with adversarial
examples, generated by the aforementioned adversarial attacks.
Madry et al. [3] generate adversarial examples via PGD.
TRADES [11] and MART [12] extend adversarial training
by incorporating additional penalty terms. LBGAT [32] guides adversarial training with a natural classifier boundary to improve robustness. However, generating adversarial examples is computationally expensive and time-consuming.
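To make this cost concrete, the sketch below shows a standard $\ell_\infty$ PGD attack: each adversarial batch requires `steps` additional forward/backward passes through the model, which is the main source of adversarial training's overhead. The hyperparameter values are typical CIFAR-10 settings, not values taken from the cited works.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf PGD sketch: `steps` extra forward/backward passes per batch.
    Hyperparameters are typical CIFAR-10 values (an assumption)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)        # maximize classification loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # signed gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project
    return x_adv.detach()
```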
Several recent works observe that information-bottleneck
penalties enhance robustness. Fischer [33] considers a con-
ditional entropy bottleneck (CEB), while Alemi et al. [34]
suggest a variational information bottleneck (VIB); both lead
to improved robustness properties. Ma et al. [28] and Wang
et al. [29] use a penalty based on the Hilbert-Schmidt Independence Criterion (HSIC), termed HSIC bottleneck as a
regularizer (HBaR). Wang et al. show that HBaR enhances
adversarial robustness even without generating adversarial
examples [29]. For this reason, we incorporate HBaR into our
unified robust pruning framework as a means of enhancing adversarial robustness using only natural examples during pruning, without further adversarial training. We are the first to study HBaR in a pruning context; our ablation study (Section VI-B) indicates that HBaR indeed contributes to enhancing robustness in our setting.
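For intuition on how such a penalty can be computed from natural examples alone, below is a minimal sketch of a biased empirical HSIC estimator with Gaussian kernels and an HBaR-style combination over hidden-layer representations. The kernel bandwidth, the weights lam_x and lam_y, and the hbar_penalty helper are illustrative assumptions and do not reproduce the exact estimator of [28], [29].

```python
import torch

def gaussian_kernel(a, sigma=5.0):
    """Pairwise Gaussian kernel: K_ij = exp(-||a_i - a_j||^2 / (2 sigma^2))."""
    sq_dists = torch.cdist(a, a) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def hsic(a, b, sigma=5.0):
    """Biased empirical HSIC estimate: tr(K_a H K_b H) / (m - 1)^2."""
    m = a.shape[0]
    K, L = gaussian_kernel(a, sigma), gaussian_kernel(b, sigma)
    H = torch.eye(m, device=a.device) - torch.full((m, m), 1.0 / m, device=a.device)
    return torch.trace(K @ H @ L @ H) / (m - 1) ** 2

def hbar_penalty(hidden_feats, x, y_onehot, lam_x=1.0, lam_y=1.0):
    """HBaR-style term over hidden representations Z_l: penalize HSIC(X, Z_l),
    reward HSIC(Y, Z_l). The weights lam_x, lam_y are illustrative."""
    x_flat = x.flatten(1)
    return sum(lam_x * hsic(z.flatten(1), x_flat)
               - lam_y * hsic(z.flatten(1), y_onehot.float())
               for z in hidden_feats)
```

Intuitively, reducing HSIC between inputs and hidden representations while retaining HSIC between labels and hidden representations discards label-irrelevant input information, which is the mechanism by which HBaR is argued to improve robustness [29].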
Adversarial Pruning. Weight pruning is one of the prominent
compression techniques to reduce model size with acceptable
accuracy degradation. While extensively explored for effi-
ciency and compression purposes [15]–[20], only a few recent
works study pruning in the context of adversarial robustness.
Several works [35], [36] theoretically discuss the relationship
between adversarial robustness and pruning, but do not provide
any active defense techniques. Ye et al. [22] and Gui et al. [23] propose AdvPrune, which combines the alternating direction method of multipliers (ADMM) pruning framework with adversarial training. Lee et al. [37] propose APD, which applies knowledge distillation to adversarial pruning, optimized via a proximal gradient
method. Sehwag et al. [24] propose HYDRA, which uses a
robust training objective to learn a sparsity mask. However,
all these methods rely on adversarial training. HYDRA further
requires training additional sparsity masks, which hampers
training efficiency. In contrast, we distill from a pre-trained
adversarially robust model while pruning without generating
adversarial examples. Our compressed model can preserve
high adversarial robustness with considerable training speedup
compared to these methods, as we report in Section VI-C.
III. BACKGROUND
We use the following standard notation throughout the
paper. In the standard $k$-ary classification setting, we are given a dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{d_X}$ and $y_i \in \{0,1\}^k$ are i.i.d. samples drawn from the joint distribution $P_{XY}$. Given an $L$-layer neural network $h_\theta : \mathbb{R}^{d_X} \to \mathbb{R}^k$ parameterized by weights $\theta := \{\theta_l\}_{l=1}^{L}$, where $\theta_l \in \mathbb{R}^{d_{\theta_l}}$ is the weight corresponding to the $l$-th layer, for $l = 1, \ldots, L$, we define the standard learning objective as follows:
$$\mathcal{L}(\theta) = \mathbb{E}_{XY}\big[\ell(h_\theta(X), Y)\big] \approx \frac{1}{n} \sum_{i=1}^{n} \ell(h_\theta(x_i), y_i), \qquad (1)$$
where $\ell : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$ is a loss function, e.g., cross-entropy.
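As a concrete instance of Eq. (1) with $\ell$ taken to be cross-entropy, a minimal batched evaluation might look as follows; the function and variable names are illustrative.

```python
import torch.nn.functional as F

def empirical_loss(h_theta, xs, ys):
    """Empirical estimate of Eq. (1): the average of l(h_theta(x_i), y_i) over
    the batch, with l chosen as cross-entropy. `ys` may hold class indices or,
    in PyTorch >= 1.10, one-hot probability vectors matching y_i in {0,1}^k."""
    logits = h_theta(xs)                # batched evaluation of h_theta
    return F.cross_entropy(logits, ys)  # mean over the n samples
```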
A. Adversarial Robustness
We call a network adversarially robust if it maintains
high prediction accuracy against a constrained adversary that
perturbs input samples. Formally, prior to submitting an input