
computationally expensive branch-and-bound search. One can also adopt a composition of certified architectures to improve both the natural and the adversarial accuracy of the resulting model (Müller et al., 2021; Horváth et al., 2022).
Another line of work for enhancing the performance of
certifiably robust neural networks relies on the idea of learn-
ing a detector alongside the classifier to capture adversar-
ial samples. Instead of trying to classify adversarial im-
ages correctly, these works design a detector to determine
whether a given sample is natural/in-distribution or a crafted
attack/out-of-distribution. Chen et al. (2020) train a detector on both in-distribution and out-of-distribution samples to distinguish between the two. Hendrycks and Gimpel (2016) develop a method based on the simple observation that, for natural samples, the softmax output entries are closer to $0$ or $1$, whereas for out-of-distribution and adversarial examples they are distributed more uniformly. DeVries and Taylor (2018); Sheikholeslami et al. (2020); Stutz et al. (2020) learn uncertainty regions around natural samples within which the network prediction remains unchanged; interestingly, this approach does not require out-of-distribution samples during training. Other approaches, such as deep generative models (Ren et al., 2019) and self-supervised and ensemble methods (Vyas et al., 2018; Chen et al., 2021b), have also been used to detect out-of-distribution samples. However, these methods are typically vulnerable to adversarial attacks and can be easily fooled by carefully designed out-of-distribution images (Fort, 2022), as discussed in Tramer (2022). A more resilient approach
is to jointly learn the detector and the classifier (Laidlaw
and Feizi, 2019; Sheikholeslami et al., 2021; Chen et al.,
2021a) by adding an auxiliary abstain output class capturing
adversarial samples.
Building on these prior works, this paper develops a frame-
work for detecting adversarial examples using multiple ab-
stain classes. We observe that naïvely adding multiple ab-
stain classes (in the existing framework of Sheikholeslami
et al. (2021)) results in a model degeneracy phenomenon
where all adversarial examples are assigned to a small frac-
tion of abstain classes (while other abstain classes are not
utilized). To resolve this issue, we propose a novel regular-
izer and a training procedure to balance the assignment of
adversarial examples to abstain classes. Our experiments
demonstrate that utilizing multiple abstain classes in con-
junction with the proper regularization enhances the robust
verified accuracy on adversarial examples while maintaining
the standard accuracy of the classifier.
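To illustrate the kind of balancing penalty one could use, the sketch below maximizes the entropy of the batch-averaged abstain-class distribution on adversarial inputs. This is a generic illustration only, not necessarily the regularizer proposed in this paper; the function name, the PyTorch setting, and the convention that abstain classes occupy the last output slots are all assumptions.

```python
import torch
import torch.nn.functional as F

def abstain_balance_penalty(logits: torch.Tensor, num_abstain: int) -> torch.Tensor:
    """Illustrative balancing penalty (not necessarily the paper's regularizer).

    Encourages adversarial examples to spread over all abstain classes by
    maximizing the entropy of the batch-averaged abstain-class distribution.
    `logits` has shape (batch, num_real_classes + num_abstain); abstain
    classes are assumed to occupy the last `num_abstain` output slots.
    """
    probs = F.softmax(logits, dim=-1)
    abstain = probs[:, -num_abstain:]                        # (batch, num_abstain)
    abstain = abstain / abstain.sum(dim=-1, keepdim=True).clamp_min(1e-12)
    avg = abstain.mean(dim=0)                                # (num_abstain,)
    entropy = -(avg * avg.clamp_min(1e-12).log()).sum()
    return -entropy   # adding this to the loss pushes abstain usage toward balance
```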
Challenges and Contribution.
We propose a framework for training and verifying robust neural networks with multiple detection classes. The resulting optimization problems for training and verifying such networks are constrained min-max problems over a probability simplex, which are more challenging from an optimization perspective than those associated with networks having no or a single detection class. We devise an efficient algorithm for this problem. Furthermore, having multiple detectors leads to the “model degeneracy” phenomenon, where not all detection classes are utilized. To prevent model degeneracy and to avoid tuning the number of network detectors, we introduce a regularization mechanism guaranteeing that all detectors contribute to detecting adversarial examples to the extent possible. We propose convergent algorithms for the verification (and training) problems using proximal gradient descent with Bregman divergence. Compared to networks with a single detection class, our experiments show that we enhance the robust verified accuracy by more than $5\%$ and $2\%$ on the CIFAR-10 and MNIST datasets, respectively, for various perturbation sizes.
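To give intuition for the proximal/Bregman approach on the simplex: when the Bregman divergence is chosen as the KL divergence, each proximal step has a closed-form multiplicative (exponentiated-gradient) update followed by renormalization. The sketch below shows that generic update, not our exact verification algorithm; `grad_fn`, the fixed step size, and the toy objective are illustrative assumptions.

```python
import numpy as np

def entropic_mirror_ascent(grad_fn, dim, step_size=0.1, iters=200):
    """Mirror ascent over the probability simplex with the KL (entropy)
    Bregman divergence: each proximal step reduces to a closed-form
    multiplicative update followed by renormalization."""
    lam = np.full(dim, 1.0 / dim)              # start from the uniform point
    for _ in range(iters):
        g = grad_fn(lam)                       # gradient of the inner objective
        lam = lam * np.exp(step_size * g)      # multiplicative (mirror) step
        lam /= lam.sum()                       # Bregman projection = renormalization
    return lam

# Toy usage: maximize <c, lam> - ||lam||^2 over the 4-dimensional simplex.
c = np.array([1.0, 2.0, 0.5, 1.5])
lam_star = entropic_mirror_ascent(lambda lam: c - 2.0 * lam, dim=4)
```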
Roadmap.
In Section 2, we review interval bound propagation (IBP) and $\beta$-crown as two existing efficient methods for verifying the performance of multi-layer neural networks against adversarial attacks. We discuss how to train and verify joint classifier and detector networks (with a single abstain class) based on these two approaches. Section 3 is dedicated to the motivation and procedure of joint verification and classification of neural networks with multiple abstain classes. In particular, we extend the IBP and $\beta$-crown verification procedures to networks with multiple detection classes. In Section 4, we show how to train neural networks with multiple detection classes via the IBP procedure. However, we show that the performance of the trained network cannot be improved by merely increasing the number of detection classes, due to “model degeneracy” (a phenomenon that occurs when multiple detectors behave very similarly and identify the same adversarial examples). To avoid model degeneracy and to automatically/implicitly tune the number of detection classes, we introduce a regularization mechanism that ensures all detection classes are used in a balanced manner.
2 Background
2.1 Verification of feedforward neural networks
Consider an $L$-layer feedforward neural network with $\{W_i, b_i\}$ denoting the weight and bias parameters associated with layer $i$, and let $\sigma_i(\cdot)$ denote the activation function applied at layer $i$. Throughout the paper, we assume the activation function is the same for all hidden layers, i.e., $\sigma_i(\cdot) = \sigma(\cdot) = \mathrm{ReLU}(\cdot)$ for all $i = 1, \dots, L-1$. Thus, our neural network can be described as
$$z_i = \sigma(W_i z_{i-1} + b_i) \quad \forall i \in [L-1], \qquad z_L = W_L z_{L-1} + b_L,$$
where $z_0 = x$ is the input to the neural network, $z_i$ is the output of layer $i$, and $[N]$ denotes the set $\{1, \dots, N\}$. Note that the activation function is not applied at the last layer. Further, we use $[z]_i$ to denote the $i$-th element of the vector $z$. We consider a supervised classification task where $z_L$ represents the logits. To explicitly show the dependence of $z_L$ on the input data, we use the notation $z_L(x)$ to denote the logit values when $x$ is used as the input data point.
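For concreteness, the forward computation of $z_L(x)$ defined above can be sketched as follows; this is a minimal NumPy illustration, and the function name, parameter layout, and toy shapes are ours rather than part of the formal development.

```python
import numpy as np

def forward_logits(x, weights, biases):
    """Compute z_L(x): z_i = ReLU(W_i z_{i-1} + b_i) for i in [L-1],
    and z_L = W_L z_{L-1} + b_L (no activation at the output layer)."""
    z = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = np.maximum(W @ z + b, 0.0)         # hidden layers: affine + ReLU
    return weights[-1] @ z + biases[-1]        # output layer: raw logits

# Toy usage: input dimension 4, one hidden layer of width 8, 3 logits.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((8, 4)), rng.standard_normal((3, 8))]
bs = [rng.standard_normal(8), rng.standard_normal(3)]
z_L_of_x = forward_logits(rng.standard_normal(4), Ws, bs)
```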