Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes

Sina Baharlouei (USC, baharlou@usc.edu)
Fatemeh Sheikholeslami¹ (Amazon Alexa AI, shfateme@amazon.com)
Meisam Razaviyayn (USC, razaviya@usc.edu)
Zico Kolter (Bosch Center for AI, CMU, zkolter@cs.cmu.edu)
Abstract
This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism, where adversarial examples are either correctly classified or assigned to the "abstain" class. In this work, we show that such a provable framework can benefit from extension to networks with multiple explicit abstain classes, where the adversarial examples are adaptively assigned to those. We show that naïvely adding multiple abstain classes can lead to "model degeneracy"; we then propose a regularization approach and a training method to counter this degeneracy by promoting full use of the multiple abstain classes. Our experiments demonstrate that the proposed approach consistently achieves favorable standard vs. robust verified accuracy tradeoffs, outperforming state-of-the-art algorithms for various choices of the number of abstain classes. Our code is available at https://github.com/sinaBaharlouei/MultipleAbstainDetection.
1 Introduction
Deep Neural Networks (DNNs) have revolutionized many machine learning tasks such as image processing (Krizhevsky et al., 2012; Zhu et al., 2021) and speech recognition (Graves et al., 2013; Nassif et al., 2019). However, despite their superior performance, DNNs are highly vulnerable to adversarial attacks and perform poorly on out-of-distribution samples (Goodfellow et al., 2014; Liang et al., 2017; Yuan et al., 2019). To address the vulnerability of DNNs to adversarial attacks, the community has designed various defense mechanisms against such attacks (Papernot et al., 2016; Jang et al., 2019; Goldblum et al., 2020; Madry et al., 2017; Huang et al., 2021). These mechanisms provide robustness against certain types of attacks, such as the Fast Gradient Sign Method (FGSM) (Szegedy et al., 2013; Goodfellow et al., 2014). However, the overwhelming majority of these defense mechanisms are highly ineffective against more complex attacks such as adaptive and brute-force methods (Tramer et al., 2020; Carlini and Wagner, 2017). This ineffectiveness necessitates: 1) the design of rigorous verification approaches that can measure the robustness of a given network; 2) the development of defense mechanisms that are verifiably robust against any attack strategy within the class of permissible attack strategies.

¹ The work of FS was done when FS was with the Bosch Center for AI.

Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume 206. Copyright 2023 by the author(s).
To verify the robustness of a given network against any attack in a reasonable set of permissible attacks (e.g., an $\ell_p$-norm ball around the given input data), one needs to solve a hard non-convex optimization problem (see, e.g., Problem (1) in this paper). Consequently, exact verifiers, such as the ones developed in (Tjeng et al., 2017; Xiao et al., 2018), are not scalable to large networks. To develop scalable verifiers, the community has turned to "inexact" verifiers, which can only verify a subset of perturbations to the input data that the network can defend against successfully. These verifiers typically rely on tractable lower bounds for the verification optimization problem. Gowal et al. (2018) find such a lower bound by interval bound propagation (IBP), which is essentially an efficient convex relaxation of the constraint sets in the verification problem. Despite its simplicity, this approach demonstrates relatively superior performance compared to prior works.
IBP-CROWN (Zhang et al., 2019) combines IBP with novel linear relaxations to have a tighter approximation than standalone IBP. β-Crown (Wang et al., 2021) utilizes a branch-and-bound technique combined with the linear bounds in IBP-CROWN to tighten the relaxation gap further. While β-Crown demonstrates a tremendous performance gain over other verifiers such as Zhang et al. (2019); Fazlyab et al. (2019); Lu and Kumar (2019), it cannot be used as a tool in large-scale training procedures due to its computationally expensive branch-and-bound search. One can adopt a composition of certified architectures to enhance the performance of the obtained model on both natural and adversarial accuracy (Müller et al., 2021; Horváth et al., 2022).
Another line of work for enhancing the performance of certifiably robust neural networks relies on the idea of learning a detector alongside the classifier to capture adversarial samples. Instead of trying to classify adversarial images correctly, these works design a detector to determine whether a given sample is natural/in-distribution or a crafted attack/out-of-distribution. Chen et al. (2020) train the detector on both in-distribution and out-of-distribution samples to learn a detector distinguishing these samples. Hendrycks and Gimpel (2016) develop a method based on a simple observation that, for real samples, the output of the softmax layer is closer to 0 or 1 compared to out-of-distribution and adversarial examples, where the softmax output entries are distributed more uniformly. DeVries and Taylor (2018); Sheikholeslami et al. (2020); Stutz et al. (2020) learn uncertainty regions around actual samples where the network prediction remains the same. Interestingly, this approach does not require out-of-distribution samples during training. Other approaches such as deep generative models (Ren et al., 2019) and self-supervised and ensemble methods (Vyas et al., 2018; Chen et al., 2021b) are also used to learn out-of-distribution samples. However, these methods are typically vulnerable to adversarial attacks and can be easily fooled by carefully designed out-of-distribution images (Fort, 2022), as discussed in Tramer (2022). A more resilient approach is to jointly learn the detector and the classifier (Laidlaw and Feizi, 2019; Sheikholeslami et al., 2021; Chen et al., 2021a) by adding an auxiliary abstain output class capturing adversarial samples.
Building on these prior works, this paper develops a framework for detecting adversarial examples using multiple abstain classes. We observe that naïvely adding multiple abstain classes (in the existing framework of Sheikholeslami et al. (2021)) results in a model degeneracy phenomenon where all adversarial examples are assigned to a small fraction of abstain classes (while other abstain classes are not utilized). To resolve this issue, we propose a novel regularizer and a training procedure to balance the assignment of adversarial examples to abstain classes. Our experiments demonstrate that utilizing multiple abstain classes in conjunction with the proper regularization enhances the robust verified accuracy on adversarial examples while maintaining the standard accuracy of the classifier.
Challenges and Contribution. We propose a framework for training and verifying robust neural nets with multiple detection classes. The resulting optimization problems for training and verifying such networks are constrained min-max problems over a probability simplex, which are more challenging from an optimization perspective than the problems associated with networks with no or a single detection class. We devise an efficient algorithm for this problem. Furthermore, having multiple detectors leads to the "model degeneracy" phenomenon, where not all detection classes are utilized. To prevent model degeneracy and to avoid tuning the number of network detectors, we introduce a regularization mechanism guaranteeing that all detectors contribute to detecting adversarial examples to the extent possible. We propose convergent algorithms for the verification (and training) problems using proximal gradient descent with Bregman divergence. Compared to networks with a single detection class, our experiments show that we enhance the robust verified accuracy by more than 5% and 2% on the CIFAR-10 and MNIST datasets, respectively, for various perturbation sizes.
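As a point of reference for the optimization machinery mentioned above, the following is a minimal sketch of one proximal (mirror) ascent step with the entropic Bregman divergence over a probability simplex. The linear objective `c` and the simplex variable are placeholders for illustration only; they are not the paper's exact verification objective or implementation.

```python
import numpy as np

def entropic_mirror_step(lam, grad, step_size=0.1):
    """One proximal step with the entropic (KL) Bregman divergence.

    Solves  max_{lam' in simplex} <grad, lam'> - (1/step_size) * KL(lam' || lam),
    whose closed-form solution is the exponentiated-gradient update below.
    """
    w = lam * np.exp(step_size * grad)
    return w / w.sum()

# Placeholder linear objective <c, lam> over a 4-dimensional simplex.
c = np.array([0.3, -0.1, 0.8, 0.2])
lam = np.full(len(c), 1.0 / len(c))      # start at the uniform distribution
for _ in range(200):
    lam = entropic_mirror_step(lam, c)   # gradient of <c, lam> is c
print(lam, lam.sum())                    # concentrates on argmax(c); stays on the simplex
```

The update stays feasible by construction, which is why a Bregman proximal step is a natural fit for simplex-constrained min-max problems of this kind.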
Roadmap. In Section 2, we review interval bound propagation (IBP) and β-Crown as two existing efficient methods for verifying the performance of multi-layer neural networks against adversarial attacks. We discuss how to train and verify joint classifier and detector networks (with a single abstain class) based on these two approaches. Section 3 is dedicated to the motivation and procedure of joint verification and classification of neural networks with multiple abstain classes. In particular, we extend the IBP and β-Crown verification procedures to networks with multiple detection classes. In Section 4, we show how to train neural networks with multiple detection classes via the IBP procedure. However, we show that the performance of the trained network cannot be improved by only increasing the number of detection classes, due to "model degeneracy" (a phenomenon that happens when multiple detectors behave very similarly and identify the same adversarial examples). To avoid model degeneracy and to automatically/implicitly tune the number of detection classes, we introduce a regularization mechanism such that all detection classes are used in balance.
2 Background
2.1 Verification of feedforward neural networks
Consider an $L$-layer feedforward neural network with $\{W_i, b_i\}$ denoting the weight and bias parameters associated with layer $i$, and let $\sigma_i(\cdot)$ denote the activation function applied at layer $i$. Throughout the paper, we assume the activation function is the same for all hidden layers, i.e., $\sigma_i(\cdot) = \sigma(\cdot) = \mathrm{ReLU}(\cdot)$ for $i = 1, \dots, L-1$. Thus, our neural network can be described as
$$z_i = \sigma(W_i z_{i-1} + b_i), \quad i \in [L-1], \qquad z_L = W_L z_{L-1} + b_L,$$
where $z_0 = x$ is the input to the neural network, $z_i$ is the output of layer $i$, and $[N]$ denotes the set $\{1, \dots, N\}$. Note that the activation function is not applied at the last layer. Further, we use $[z]_i$ to denote the $i$-th element of the vector $z$. We consider a supervised classification task where $z_L$ represents the logits. To explicitly show the dependence of $z_L$ on the input data, we use the notation $z_L(x)$ to denote the logit values when $x$ is used as the input data point.
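For concreteness, a minimal PyTorch sketch of this architecture; the layer sizes and random weights are illustrative placeholders, not the networks used in the paper's experiments.

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """L-layer ReLU network; the activation is not applied at the last layer."""
    def __init__(self, sizes):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)
        )

    def forward(self, x):
        z = x
        for layer in self.layers[:-1]:
            z = torch.relu(layer(z))   # z_i = sigma(W_i z_{i-1} + b_i)
        return self.layers[-1](z)      # z_L = W_L z_{L-1} + b_L (logits)

# Example: a 3-layer network mapping 784-dimensional inputs to K = 10 logits.
net = FeedForward([784, 128, 64, 10])
logits = net(torch.randn(1, 784))      # z_L(x)
```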
Given an input $x_0$ with the ground-truth label $y$, and a perturbation set $\mathcal{C}(x_0, \epsilon)$ (e.g., $\mathcal{C}(x_0, \epsilon) = \{x \mid \|x - x_0\|_\infty \le \epsilon\}$), the network is provably robust against adversarial attacks on $x_0$ if
$$0 \le \min_{x \in \mathcal{C}(x_0, \epsilon)} c_{yk}^T z_L(x), \quad \forall k \ne y, \qquad (1)$$
where $c_{yk} = e_y - e_k$ with $e_k$ (resp. $e_y$) denoting the standard unit vector whose $k$-th row (resp. $y$-th row) is 1 and the other entries are zero. Condition (1) implies that the logit score of the network for the true label $y$ is always greater than that of any other label $k$ for all $x \in \mathcal{C}(x_0, \epsilon)$. Thus, the network will correctly classify all the points inside $\mathcal{C}(x_0, \epsilon)$. The objective function in Eq. (1) is non-convex when $L \ge 2$. It is customary in many works to move the non-convexity of the problem to the constraint set and reformulate Eq. (1) as
$$0 \le \min_{z \in \mathcal{Z}(x_0, \epsilon)} c_{yk}^T z, \quad \forall k \ne y, \qquad (2)$$
where $\mathcal{Z}(x_0, \epsilon) = \{z \mid z = z_L(x) \text{ for some } x \in \mathcal{C}(x_0, \epsilon)\}$. This verification problem has a linear objective function and a non-convex constraint set. Since both problems (1) and (2) are non-convex, existing works have proposed efficiently computable lower bounds for their optimal objective values. For example, Gowal et al. (2018); Wong and Kolter (2018) utilize convex relaxation, while Tjeng et al. (2017); Wang et al. (2021) rely on mixed integer programming and branch-and-bound to find lower bounds for the optimal objective value of (2). In what follows, we explain two popular and relatively successful approaches for solving the verification problem (1) (or equivalently (2)) in detail.
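As a small sanity check of condition (1) at a single point, the sketch below builds the vectors $c_{yk}$ and evaluates the margins $c_{yk}^T z_L(x)$ at one input. The logits are a placeholder; certifying robustness additionally requires lower-bounding these margins over the entire set $\mathcal{C}(x_0, \epsilon)$, which is exactly what the verifiers in the next subsections do.

```python
import torch

def margins(logits, y):
    """Return c_{yk}^T z_L = logits[y] - logits[k] for every k != y."""
    K = logits.shape[-1]
    c = torch.eye(K)[y] - torch.eye(K)          # row k of c holds e_y - e_k
    m = c @ logits                              # m[k] = logits[y] - logits[k]
    return torch.cat([m[:y], m[y + 1:]])        # drop the trivial k = y entry

logits = torch.randn(10)                        # placeholder logits z_L(x) for K = 10 classes
print(margins(logits, y=3) >= 0)                # all True would mean x is classified as label 3
```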
2.2 Verification of neural networks via IBP
Interval Bound Propagation (IBP) of Gowal et al. (2018) tackles problem (2) by convexification of the constraint set $\mathcal{Z}(x_0, \epsilon)$ to its convex hypercube super-set $[\underline{z}(x_0), \bar{z}(x_0)]$, i.e., $\mathcal{Z}(x_0, \epsilon) \subseteq [\underline{z}(x_0), \bar{z}(x_0)]$. After this relaxation, problem (2) can be lower-bounded by the convex problem:
$$\min_{\underline{z}(x_0) \le z \le \bar{z}(x_0)} c_{yk}^T z. \qquad (3)$$
The upper and lower bounds $\bar{z}(x_0)$ and $\underline{z}(x_0)$ are obtained by recursively finding the convex relaxation of the image of the set $\mathcal{C}(x_0, \epsilon)$ at each layer of the network. In particular, for the adversarial set $\mathcal{C}(x_0, \epsilon) = \{x \mid \|x - x_0\|_\infty \le \epsilon\}$, we start from $\underline{z}_0(x_0) = x_0 - \epsilon \mathbf{1}$ and $\bar{z}_0(x_0) = x_0 + \epsilon \mathbf{1}$. Then, the lower bound $\underline{z}_L(x_0)$ and upper bound $\bar{z}_L(x_0)$ are computed by the recursions, for all $i \in [L]$:
$$\bar{z}_i(x_0) = \sigma\!\left(W_i^T \frac{\bar{z}_{i-1} + \underline{z}_{i-1}}{2} + |W_i^T| \frac{\bar{z}_{i-1} - \underline{z}_{i-1}}{2}\right), \qquad \underline{z}_i(x_0) = \sigma\!\left(W_i^T \frac{\bar{z}_{i-1} + \underline{z}_{i-1}}{2} - |W_i^T| \frac{\bar{z}_{i-1} - \underline{z}_{i-1}}{2}\right). \qquad (4)$$
Note that $|W|$ denotes the element-wise absolute value of the matrix $W$. One of the main advantages of IBP is its efficient computation: verification of a given input only requires two forward passes for finding the lower and upper bounds, followed by solving a simple linear program.
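A minimal, self-contained sketch of the IBP recursion (4) and the resulting check of condition (3) for a fully connected ReLU network. The weights and sizes are illustrative placeholders; biases are included in the propagation (they shift both bounds equally) even though they are omitted from (4) above, and the final margin check solves the box-constrained problem (3) coordinate-wise.

```python
import torch

def ibp_bounds(weights, biases, x0, eps):
    """Propagate interval bounds [lb, ub] through linear+ReLU layers (no ReLU at the last layer)."""
    lb, ub = x0 - eps, x0 + eps
    for i, (W, b) in enumerate(zip(weights, biases)):
        mid, rad = (ub + lb) / 2, (ub - lb) / 2
        new_mid = W @ mid + b
        new_rad = W.abs() @ rad              # |W| spreads the interval radius
        lb, ub = new_mid - new_rad, new_mid + new_rad
        if i < len(weights) - 1:             # ReLU on hidden layers only
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
    return lb, ub

def ibp_verified(weights, biases, x0, eps, y):
    """Certify 0 <= min_z c_{yk}^T z for all k != y over the hypercube [lb, ub]."""
    lb, ub = ibp_bounds(weights, biases, x0, eps)
    worst_margin = lb[y] - ub                # min over the box of logits[y] - logits[k]
    worst_margin[y] = float("inf")           # ignore the trivial k = y comparison
    return bool((worst_margin >= 0).all())

# Illustrative 2-layer network with 4 inputs and 3 classes.
torch.manual_seed(0)
weights = [torch.randn(8, 4), torch.randn(3, 8)]
biases = [torch.zeros(8), torch.zeros(3)]
x0 = torch.randn(4)
clean_label = int(torch.argmax(weights[1] @ torch.relu(weights[0] @ x0 + biases[0]) + biases[1]))
print(ibp_verified(weights, biases, x0, eps=0.01, y=clean_label))
```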
2.3 Verification of neural networks via β-Crown
Despite its simplicity, IBP-based verification comes with a certain limitation, namely the looseness of its layer-by-layer bounds. To overcome this limitation, tighter verification methods have been proposed in the literature (Singh et al., 2018; Zhang et al., 2019; Dathathri et al., 2020; Wang et al., 2021). Among these, β-Crown (Wang et al., 2021) utilizes the branch-and-bound technique to generalize and improve the IBP-CROWN proposed in Zhang et al. (2019). Let $\underline{z}_i$ and $\bar{z}_i$ be the estimated element-wise lower and upper bounds for the pre-activation value of $z_i$, i.e., $\underline{z}_i \le z_i \le \bar{z}_i$, where these lower and upper bounds are obtained by the method in Zhang et al. (2019). Let $\hat{z}_i$ be the value we obtain by applying the ReLU function to $z_i$. A neuron is called unstable if its sign after applying the ReLU activation cannot be determined based only on knowing the corresponding lower and upper bounds; that is, a neuron is unstable if $\underline{z}_i < 0 < \bar{z}_i$. For stable neurons, no relaxation is needed to enforce convexity of $\sigma(z)$ (since the neuron operates in a linear regime). On the other hand, given an unstable neuron, they use a branch-and-bound (BAB) approach to split the input range of the neuron into two sub-domains $\mathcal{C}_{il} = \{x \in \mathcal{C}(x_0, \epsilon) \mid \hat{z}_i \le 0\}$ and $\mathcal{C}_{iu} = \{x \in \mathcal{C}(x_0, \epsilon) \mid \hat{z}_i > 0\}$. The neuron operates linearly within each sub-domain, so we can verify each sub-domain separately. If we have $N$ unstable nodes, the BAB algorithm requires the investigation of $2^N$ sub-domains in the worst case. β-Crown proposes a heuristic for traversing all these sub-domains: the higher the absolute value of the corresponding lower bound of a node, the sooner the verifier visits it. For verifying each sub-problem, Wang et al. (2021) proposed a lower bound which requires solving a maximization problem over two parameters $\alpha$ and $\beta$:
$$\min_{z \in \mathcal{Z}(x_0, \epsilon)} c_{yk}^T z \;\ge\; \max_{\alpha, \beta} g(x, \alpha, \beta), \quad \text{where } g(x, \alpha, \beta) = (a + P_\alpha \beta)^T x + q_\alpha^T \beta + d_\alpha. \qquad (5)$$
Here, the matrix $P$ and the vectors $q$, $a$, and $d$ are functions of the $W_i$, $b_i$, $\underline{z}_i$, $\bar{z}_i$, $\alpha$, and $\beta$ parameters. See Appendix D for the precise definition of $g$. Notice that any choice of $(\alpha, \beta)$ provides a valid lower bound for verification. However, optimizing $\alpha$ and $\beta$ in (5) leads to a tighter bound.
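The sketch below illustrates only the branching bookkeeping described above: identifying unstable neurons from their pre-activation bounds and ordering them by the magnitude-of-lower-bound heuristic. The bounds are made up for illustration, and the sketch does not implement the β-Crown bound $g$ itself (see Appendix D of the paper for that).

```python
import torch

def unstable_neurons(lb, ub):
    """A neuron is unstable if its pre-activation sign is undetermined: lb < 0 < ub."""
    return torch.nonzero((lb < 0) & (ub > 0)).flatten()

def branching_order(lb, ub):
    """Order unstable neurons by |lower bound|, largest first, following the traversal heuristic."""
    idx = unstable_neurons(lb, ub)
    return idx[torch.argsort(lb[idx].abs(), descending=True)]

# Made-up pre-activation bounds for one layer with 6 neurons.
lb = torch.tensor([-0.5,  0.1, -2.0, -0.3,  0.4, -1.1])
ub = torch.tensor([ 0.2,  0.9,  0.1, -0.1,  1.5,  0.6])
print(branching_order(lb, ub))   # tensor([2, 5, 0]): split neuron 2 first (lb = -2.0)
# Each split fixes one neuron's sign, producing two sub-domains that are verified separately;
# with N unstable neurons this is at most 2**N sub-domains in the worst case.
```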
2.4 Training a joint robust classifier and detector
Sheikholeslami et al. (2021) improve the performance tradeoff on natural and adversarial examples by introducing an auxiliary class for detecting adversarial examples. If this auxiliary class is selected as the output, the network "abstains" from declaring any of the original $K$ classes for the given input. Let $a$ be the abstain class. The classification network performs correctly on an adversarial image if it is classified correctly as the class of the original (unperturbed) image (similar to robust networks without detectors) or it is classified as the abstain class (detected as an adversarial example). Hence, for an input image $(x_0, y)$ the network is verified against a certain class $k \ne y$ if
$$0 \le \min_{z \in \mathcal{Z}(x_0, \epsilon)} \max\left(c_{yk}^T z,\ c_{ak}^T z\right), \qquad (6)$$
i.e., if the score of the true label $y$ or the score of the abstain class $a$ is larger than the score of class $k$.
To train a neural network that can jointly detect and classify a dataset of images, Sheikholeslami et al. (2021) rely on a loss function of the form
$$\mathcal{L}_{\text{Total}} = \mathcal{L}_{\text{Robust}} + \lambda_1 \mathcal{L}_{\text{Robust}}^{\text{Abstain}} + \lambda_2 \mathcal{L}_{\text{Natural}}, \qquad (7)$$
where the term $\mathcal{L}_{\text{Natural}}$ denotes the natural loss when no adversarial examples are considered. More precisely, $\mathcal{L}_{\text{Natural}} = \frac{1}{n} \sum_{i=1}^{n} \ell_{\text{xent}}\big(z_L(x_i), y_i\big)$, where $\ell_{\text{xent}}$ is the standard cross-entropy loss. The term $\mathcal{L}_{\text{Robust}}$ in (7) represents the worst-case adversarial loss used in (Madry et al., 2017), without considering the abstain class. Precisely,
$$\mathcal{L}_{\text{Robust}} = \max_{\delta_1, \dots, \delta_n} \frac{1}{n} \sum_{i=1}^{n} \ell_{\text{xent}}\big(z_L(x_i + \delta_i), y_i\big) \quad \text{s.t. } \|\delta_i\|_\infty \le \epsilon,\ i = 1, \dots, n.$$
Finally, the robust-abstain loss $\mathcal{L}_{\text{Robust}}^{\text{Abstain}}$ is the minimum of the detector and the classifier losses:
$$\mathcal{L}_{\text{Robust}}^{\text{Abstain}} = \max_{\delta_1, \dots, \delta_n} \frac{1}{n} \sum_{i=1}^{n} \min\Big(\ell_{\text{xent}}\big(z_L(x_i + \delta_i), y_i\big),\ \ell_{\text{xent}}\big(z_L(x_i + \delta_i), a\big)\Big) \quad \text{s.t. } \|\delta_i\|_\infty \le \epsilon,\ i = 1, \dots, n. \qquad (8)$$
In (7), tuning $\lambda_1$ and $\lambda_2$ controls the trade-off between standard and robust accuracy. Furthermore, to obtain non-trivial results, the IBP relaxation should be incorporated during training for the minimization sub-problems in $\mathcal{L}_{\text{Robust}}$ and $\mathcal{L}_{\text{Robust}}^{\text{Abstain}}$ (Sheikholeslami et al., 2021; Gowal et al., 2018).
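A hedged sketch of how the total loss (7) might be assembled in PyTorch. The inner maximization over $\delta_i$ is approximated here with a single FGSM step purely for illustration; the paper instead incorporates an IBP relaxation of the inner problem. The model is assumed to output $K+1$ logits with the abstain class at a placeholder index, and the loss weights are illustrative.

```python
import torch
import torch.nn.functional as F

def total_loss(model, x, y, eps, abstain_idx, lam1=1.0, lam2=1.0):
    """L_Total = L_Robust + lam1 * L_Robust^Abstain + lam2 * L_Natural (illustrative version)."""
    # Natural loss on clean inputs.
    l_nat = F.cross_entropy(model(x), y)

    # Crude surrogate for the worst-case perturbation (one FGSM step); the paper
    # bounds the inner problem with an IBP relaxation rather than attacking it.
    x_req = x.clone().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)[0]
    x_adv = (x + eps * grad.sign()).detach()

    logits_adv = model(x_adv)
    l_rob = F.cross_entropy(logits_adv, y)            # robust loss, no abstain class
    abstain = torch.full_like(y, abstain_idx)
    per_sample = torch.minimum(                       # min of classifier and detector losses
        F.cross_entropy(logits_adv, y, reduction="none"),
        F.cross_entropy(logits_adv, abstain, reduction="none"),
    )
    l_abstain = per_sample.mean()

    return l_rob + lam1 * l_abstain + lam2 * l_nat
```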
3 Verification of Neural Networks with Multiple Detection Classes
Figure 1: IBP verification of 400 input data points for 2-layer and 3-layer neural networks. Part (a) shows the four labels assigned to the data points. Part (b) demonstrates that IBP can verify 14 points using one of the two abstain classes (black triangles), while it cannot verify 13 data points (red ×). Part (c) shows that when IBP is applied to a network with one more layer and one detection class, 8 points are verified by the detection class, while it fails to verify 21 points. The description of both networks can be found in Appendix G.

Motivation: The set of all adversarial images that can be generated within the $\epsilon$-neighborhood of clean images might not be detectable by a single detection class. Hence, the robust verified accuracy of the joint classifier and detector can be enhanced by introducing multiple abstain classes instead of a single abstain class to detect adversarial examples. This observation is illustrated in a simple example in Appendix F, where we theoretically show that 2 detection classes can drastically increase the performance of the detector compared to 1 detection class. Note that a network with multiple detection classes can be equivalently modeled by another network with one more layer and a single abstain class. This added layer, which can be a fully connected layer with a max activation function, can merge all abstain classes and collapse them into a single class. Thus, any $L$-layer neural network with multiple abstain classes can be equivalently modeled by an $(L+1)$-layer neural network with a single abstain class. However, the performance of verifiers such as IBP degrades as we increase the number of layers, since increasing the number of layers leads to looser bounds in (4) for the last layer. To illustrate this fact, Figure 1 shows that the number of points verified for a 2-layer neural network is higher than the number of points verified by an equivalent network with 3 layers.
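To make the reduction above concrete, here is a minimal sketch (with placeholder tensor shapes) of a merging head that collapses $M$ abstain logits into a single abstain logit via a max, turning an $L$-layer multi-abstain network into an equivalent $(L+1)$-layer single-abstain network; the cost is that a verifier must then propagate bounds through this extra layer as well.

```python
import torch

def merge_abstain_logits(z_L, num_classes):
    """Map K + M logits to K + 1 logits by taking the max over the M abstain logits."""
    class_logits = z_L[..., :num_classes]
    abstain_logit = z_L[..., num_classes:].max(dim=-1, keepdim=True).values
    return torch.cat([class_logits, abstain_logit], dim=-1)

# K = 10 regular classes, M = 3 abstain classes.
z = torch.randn(5, 13)                     # batch of 5 outputs from the L-layer network
print(merge_abstain_logits(z, 10).shape)   # torch.Size([5, 11])
```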
Thus, it is beneficial to train/verify the original $L$-layer neural network with multiple abstain classes instead of an $(L+1)$-layer network with a single abstain class. This fact will be illustrated further in the experiments on the MNIST and CIFAR-10 datasets depicted in Figure 2. Next, we present how one can verify a network with multiple abstain classes.
Let $a_1, a_2, \dots, a_M$ be $M$ abstain classes detecting adversarial samples. A sample is considered adversarial if the network's output is any of the $M$ abstain classes. A neural network with $K$ regular classes and $M$ abstain classes outputs the label of a given sample as $\hat{y}(x) = \mathrm{argmax}_{i \in \{1, \dots, K, a_1, \dots, a_M\}} [z_L(x)]_i$. An input $(x, y)$ is verified if the network either correctly classifies it as class $y$ or assigns it to any of the explicit $M$ abstain classes. More formally, and following equation (6), the neural network is verified for input $x_0$ against a target class $k$ if
$$0 \le \min_{z_L \in \mathcal{Z}(x_0, \epsilon)} \max\left(c_{yk}^T z_L,\ c_{a_1 k}^T z_L,\ \dots,\ c_{a_M k}^T z_L\right). \qquad (9)$$
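A sketch of checking condition (9) under the IBP hypercube relaxation of Section 2.2. Here `lb` and `ub` are assumed to be box bounds on the $K + M$ logits, and the check is a conservative illustration of (9) only (it certifies each margin separately, which lower-bounds the min of the max), not the paper's exact verification algorithm.

```python
import torch

def verified_against_k(lb, ub, y, k, abstain_idx):
    """Sound check of condition (9) under box bounds lb <= z_L <= ub.

    Since max_j min_z c_j^T z <= min_z max_j c_j^T z, if any single margin is
    certifiably nonnegative, the input is verified against target class k.
    """
    candidates = [y] + list(abstain_idx)
    worst_margins = torch.tensor([lb[j] - ub[k] for j in candidates])
    return bool((worst_margins >= 0).any())

def verified(lb, ub, y, num_classes, abstain_idx):
    """Verified overall if (9) holds for every target class k != y."""
    return all(
        verified_against_k(lb, ub, y, k, abstain_idx)
        for k in range(num_classes) if k != y
    )

# Placeholder bounds for K = 4 regular classes and M = 2 abstain classes (indices 4 and 5).
lb = torch.tensor([1.0, -0.5, -1.0, -0.2, 0.3, -0.8])
ub = torch.tensor([2.0,  0.4,  0.1,  0.6, 1.2,  0.0])
print(verified(lb, ub, y=0, num_classes=4, abstain_idx=[4, 5]))   # True: lb[0] beats every ub[k]
```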