MultiGuard: Provably Robust Multi-label
Classification against Adversarial Examples
Jinyuan Jia
University of Illinois Urbana-Champaign
jinyuan@illinois.edu
Wenjie Qu
Huazhong University of Science and Technology
wen_jie_qu@outlook.com
Neil Zhenqiang Gong
Duke University
neil.gong@duke.edu
Abstract
Multi-label classification, which predicts a set of labels for an input, has many applications. However, multiple recent studies showed that multi-label classification is vulnerable to adversarial examples. In particular, an attacker can manipulate the labels predicted by a multi-label classifier for an input via adding carefully crafted, human-imperceptible perturbation to it. Existing provable defenses for multi-class classification achieve sub-optimal provable robustness guarantees when generalized to multi-label classification. In this work, we propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification. Our MultiGuard leverages randomized smoothing, which is the state-of-the-art technique to build provably robust classifiers. Specifically, given an arbitrary multi-label classifier, our MultiGuard builds a smoothed multi-label classifier via adding random noise to the input. We consider isotropic Gaussian noise in this work. Our major theoretical contribution is that we show a certain number of ground truth labels of an input are provably in the set of labels predicted by our MultiGuard when the $\ell_2$-norm of the adversarial perturbation added to the input is bounded. Moreover, we design an algorithm to compute our provable robustness guarantees. Empirically, we evaluate our MultiGuard on VOC 2007, MS-COCO, and NUS-WIDE benchmark datasets. Our code is available at: https://github.com/quwenjie/MultiGuard
1 Introduction
Multi-class classification assumes each input only has one ground truth label and thus often predicts a single label for an input. In contrast, in multi-label classification [42, 41, 35, 43], each input has multiple ground truth labels and thus a multi-label classifier predicts a set of labels for an input. For instance, an image could have multiple objects, attributes, or scenes. Multi-label classification has many applications such as disease detection [16], object recognition [43], retail checkout recognition [18], document classification [34], etc.
However, similar to multi-class classification, multiple recent studies [56, 53, 30] showed that multi-label classification is also vulnerable to adversarial examples. In particular, an attacker can manipulate the set of labels predicted by a multi-label classifier for an input via adding carefully crafted perturbation to it. Adversarial examples pose severe security threats to the applications of multi-label classification in security-critical domains. To mitigate adversarial examples to multi-label classification, several empirical defenses [49, 1, 30] have been proposed. For instance, Melacci et al. [30] proposed to use domain knowledge on the relationships among the classes to improve the robustness of multi-label classification. However, these defenses have no provable robustness guarantees, and thus they are often broken by more advanced attacks. For instance, Melacci et al. [30] showed that their proposed defense can be broken by an adaptive attack that exploits the domain knowledge used in the defense. Moreover, existing provably robust defenses [10, 8, 17, 46, 12, 22] are all for multi-class classification and achieve sub-optimal provable robustness guarantees when extended to multi-label classification, as shown by our experimental results.

Equal contribution. Wenjie Qu performed this research when he was a remote intern in Gong's group.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01111v1 [cs.CR] 3 Oct 2022
Our work: We propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification. MultiGuard leverages randomized smoothing [5, 29, 24, 26, 12], which is the state-of-the-art technique to build provably robust classifiers. In particular, compared to other provably robust techniques, randomized smoothing has two advantages: 1) it is scalable to large-scale neural networks, and 2) it is applicable to any classifier. Suppose we have an arbitrary multi-label classifier (we call it the base multi-label classifier), which predicts $k'$ labels for an input. We build a smoothed multi-label classifier via randomizing an input. Specifically, given an input, we first create a randomized input via adding random noise to it. We consider the random noise to be isotropic Gaussian in this work. Then, we use the base multi-label classifier to predict labels for the randomized input. Due to the randomness in the randomized input, the $k'$ labels predicted by the base multi-label classifier are also random. We use $p_i$ to denote the probability that the label $i$ is among the set of $k'$ labels predicted by the base multi-label classifier for the randomized input, where $i \in \{1, 2, \cdots, c\}$. We call $p_i$ the label probability. Our smoothed multi-label classifier predicts the $k$ labels with the largest label probabilities for the input. We note that $k'$ and $k$ are two different parameters.
Our main theoretical contribution is to show that, given a set of labels (e.g., the ground truth labels) for an input, at least $e$ of them are provably in the set of $k$ labels predicted by MultiGuard for the input, when the $\ell_2$-norm of the adversarial perturbation added to the input is no larger than a threshold. We call $e$ the certified intersection size. We aim to derive the certified intersection size for MultiGuard. However, existing randomized smoothing studies [24, 12, 22] achieve sub-optimal provable robustness guarantees when generalized to derive our certified intersection size. The key reason is that they were designed for multi-class classification instead of multi-label classification. Specifically, they can guarantee that a smoothed multi-class classifier provably predicts the same single label for an input [24, 12] or that a certain label is provably among the top-$k$ labels predicted by the smoothed multi-class classifier [22]. In contrast, our certified intersection size characterizes the intersection between the set of ground truth labels of an input and the set of labels predicted by a smoothed multi-label classifier. In fact, previous provable robustness results [12, 22] are special cases of ours, e.g., our results reduce to Cohen et al. [12] when $k' = k = 1$ and to Jia et al. [22] when $k' = 1$.
In particular, there are two challenges in deriving the certified intersection size. The first challenge
is that the base multi-label classifier predicts multiple labels for an input. The second challenge is
that an input has multiple ground truth labels. To solve the first challenge, we propose a variant of
Neyman-Pearson Lemma [33] that is applicable to multiple functions, which correspond to the multiple labels predicted by the base multi-label classifier. In contrast, existing randomized smoothing studies [24, 26, 12, 22] for multi-class classification use the standard Neyman-Pearson Lemma [33], which is only applicable to a single function, since their base multi-class classifier predicts a single
label for an input. To address the second challenge, we propose to use the law of contraposition
to simultaneously consider multiple ground truth labels of an input when deriving the certified
intersection size.
Our derived certified intersection size is the optimal solution to an optimization problem, which involves the label probabilities. However, it is very challenging to compute the exact label probabilities due to the continuity of the isotropic Gaussian noise and the complexity of the base multi-label classifier (e.g., a complex deep neural network). In response, we design a Monte Carlo algorithm to estimate lower or upper bounds of the label probabilities with probabilistic guarantees. More specifically, we can view the estimation of lower or upper bounds of label probabilities as a binomial proportion confidence interval estimation problem in statistics. Therefore, we use the Clopper-Pearson [11] method from the statistics community to obtain the label probability bounds. Given the estimated lower or upper bounds of label probabilities, we design an efficient algorithm to solve the optimization problem to obtain the certified intersection size.
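To make the binomial-proportion view concrete, here is a minimal sketch (our illustration, not the authors' implementation) of one-sided Clopper-Pearson bounds computed with SciPy's Beta quantile function; the function name and interface are hypothetical. `count` would be the number of noisy samples for which label $i$ appeared in the top-$k'$ set and `n` the total number of samples.

```python
from scipy.stats import beta

def clopper_pearson_bounds(count, n, alpha=0.001):
    """One-sided Clopper-Pearson bounds on a binomial proportion, each holding
    with confidence 1 - alpha. Used here to bound a label probability p_i from
    `count` successes out of `n` noisy samples (illustrative sketch only)."""
    lower = beta.ppf(alpha, count, n - count + 1) if count > 0 else 0.0
    upper = beta.ppf(1 - alpha, count + 1, n - count) if count < n else 1.0
    return lower, upper

# Example: label i appeared in the top-k' set for 930 of 1000 noisy samples.
p_low, p_up = clopper_pearson_bounds(930, 1000)
```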
Empirically, we evaluate our MultiGuard on VOC 2007, MS-COCO, and NUS-WIDE benchmark datasets. We use the certified top-$k$ precision@$R$, certified top-$k$ recall@$R$, and certified top-$k$ f1-score@$R$ to evaluate our MultiGuard. Roughly speaking, certified top-$k$ precision@$R$ is the least fraction of the $k$ predicted labels that are ground truth labels of an input when the $\ell_2$-norm of the adversarial perturbation is at most $R$; certified top-$k$ recall@$R$ is the least fraction of ground truth labels of an input that are in the set of $k$ labels predicted by our MultiGuard; and certified top-$k$ f1-score@$R$ is the harmonic mean of certified top-$k$ precision@$R$ and certified top-$k$ recall@$R$. Our experimental results show that our MultiGuard outperforms the state-of-the-art certified defense [22] when extending it to multi-label classification. For instance, on the VOC 2007 dataset, Jia et al. [22] and our MultiGuard respectively achieve 24.3% and 31.3% certified top-$k$ precision@$R$, 51.6% and 66.4% certified top-$k$ recall@$R$, as well as 33.0% and 42.6% certified top-$k$ f1-score@$R$ when $k' = 1$, $k = 3$, and $R = 0.5$.
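For illustration, the following sketch (our simplification; the paper's exact dataset-level aggregation may differ) computes per-input certified precision, recall, and f1 from a certified intersection size $e$ and averages them over a test set:

```python
def certified_metrics(certified_sizes, gt_label_counts, k):
    """certified_sizes[i]: certified intersection size e for the i-th test input at radius R;
    gt_label_counts[i]: number of ground truth labels |L(x_i)|; k: number of predicted labels."""
    precisions, recalls, f1s = [], [], []
    for e, n_gt in zip(certified_sizes, gt_label_counts):
        p = e / k        # at least e of the k predicted labels are ground truth labels
        r = e / n_gt     # at least e of the n_gt ground truth labels are predicted
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    n = len(certified_sizes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```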
Our major contributions can be summarized as follows:
• We propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification.
• We design a Monte Carlo algorithm to compute the certified intersection size.
• We evaluate our MultiGuard on VOC 2007, MS-COCO, and NUS-WIDE benchmark datasets.
2 Background and Related Work
Multi-label classification: In multi-label classification, a multi-label classifier predicts multiple labels for an input. Many deep learning classifiers [27, 52, 43, 45, 57, 32, 21, 7, 54, 48, 2, 13] have been proposed for multi-label classification. For instance, a naive method for multi-label classification is to train an independent binary classifier for each label and use ranking or thresholding to derive the final predicted labels. This method, however, ignores the topology structure among labels and thus cannot capture label co-occurrence dependencies (e.g., mouse and keyboard usually appear together). In response, several methods [43, 7] have been proposed to improve the performance of multi-label classification via exploiting the label dependencies in an input. Despite their effectiveness, these methods rely on complicated architecture modifications. To mitigate this issue, some recent studies [48, 2] proposed to design new loss functions. For instance, Baruch et al. [2] introduced an asymmetric loss (ASL). Roughly speaking, their method is based on the observation that, in multi-label classification, most inputs contain only a small fraction of the possible candidate labels, which leads to under-emphasizing gradients from positive labels during training. Their experimental results indicate that their method achieves state-of-the-art performance on multiple benchmark datasets.
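As a rough illustration of the asymmetric-loss idea (a simplified PyTorch sketch, not the published ASL implementation; hyperparameter names and defaults are our own assumptions), positives and negatives receive different focusing exponents, and easy negatives are further down-weighted by probability shifting:

```python
import torch

def asymmetric_loss_sketch(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
    """logits, targets: tensors of shape (batch, num_labels); targets in {0, 1}.
    Simplified sketch of an ASL-style loss for multi-label classification."""
    p = torch.sigmoid(logits)
    p_neg = (p - clip).clamp(min=0)  # probability shifting down-weights easy negatives
    loss_pos = targets * ((1 - p) ** gamma_pos) * torch.log(p.clamp(min=eps))
    loss_neg = (1 - targets) * (p_neg ** gamma_neg) * torch.log((1 - p_neg).clamp(min=eps))
    return -(loss_pos + loss_neg).sum(dim=-1).mean()
```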
Adversarial examples to multi-label classification: Several recent studies [40, 56, 53, 30, 20] showed that multi-label classification is vulnerable to adversarial examples. An attacker can manipulate the set of labels predicted by a multi-label classifier for an input via adding carefully crafted perturbation to it. For instance, Song et al. [40] proposed white-box, targeted attacks to multi-label classification. In particular, they first formulate their attacks as optimization problems and then use gradient descent to solve them. Their experimental results indicate that they can make a multi-label classifier produce an arbitrary set of labels for an input via adding adversarial perturbation to it. Yang et al. [53] explored the worst-case misclassification risk of a multi-label classifier. In particular, they formulate the problem as a bi-level set function optimization problem and leverage random greedy search to find an approximate solution. Zhou et al. [56] proposed to generate norm-bounded adversarial perturbations to fool a multi-label classifier. In particular, they transform the optimization problem of finding adversarial perturbations into a linear programming problem which can be solved efficiently.
Existing empirically robust defenses: Some studies [49, 1, 30] developed empirical defenses to mitigate adversarial examples in multi-label classification. For instance, Wu et al. [49] applied adversarial training, a method developed to train robust multi-class classifiers, to improve the robustness of multi-label classifiers. Melacci et al. [30] showed that domain knowledge, which measures the relationships among classes, can be used to detect adversarial examples and improve the robustness of multi-label classifiers. However, all these defenses lack provable robustness guarantees and thus they are often broken by advanced adaptive attacks. For instance, Melacci et al. [30] showed that their defenses can be broken by adaptive attacks that also consider the domain knowledge.
Existing provably robust defenses: All existing provably robust defenses [37, 10, 6, 19, 8, 17, 46, 4, 24, 12, 26, 36, 23, 47, 39, 31, 44, 38, 55, 50] were designed for multi-class classification instead of multi-label classification. In particular, they can guarantee that a robust multi-class classifier predicts the same single label for an input or that a label (e.g., the single ground truth label of the input) is among the top-$k$ labels predicted by a robust multi-class classifier. These defenses are sub-optimal for multi-label classification. Specifically, in multi-label classification, we aim to guarantee that at least some ground truth labels of an input are in the set of labels predicted by a robust multi-label classifier.
MultiGuard leverages randomized smoothing [24, 26, 12, 22, 51]. Existing randomized smoothing studies (e.g., Jia et al. [22]) achieve sub-optimal provable robustness guarantees (i.e., certified intersection size) for multi-label classification, because they are designed for multi-class classification. For example, as our empirical evaluation results will show, MultiGuard significantly outperforms Jia et al. [22] when extending it to multi-label classification. Technically speaking, our work has two key differences from Jia et al. First, the base multi-class classifier in Jia et al. only predicts a single label for an input, while our base multi-label classifier predicts multiple labels for an input. Second, Jia et al. can only guarantee that a single label is provably among the $k$ labels predicted by a smoothed multi-class classifier, while we aim to show that multiple labels (e.g., ground truth labels of an input) are provably among the $k$ labels predicted by a smoothed multi-label classifier. Due to such key differences, we require new techniques to derive the certified intersection size of MultiGuard. For instance, we develop a variant of the Neyman-Pearson Lemma [33] which is applicable to multiple functions, while Jia et al. use the standard Neyman-Pearson Lemma [33], which is only applicable to a single function. Moreover, we use the law of contraposition to derive our certified intersection size, which is not required by Jia et al.
3 Our MultiGuard
3.1 Building our MultiGuard
Label probability: Suppose we have a multi-label classifier $f$, which we call the base multi-label classifier. Given an input $x$, the base multi-label classifier $f$ predicts $k'$ labels for it. For simplicity, we use $f_{k'}(x)$ to denote the set of $k'$ labels predicted by $f$ for $x$. We use $\epsilon$ to denote isotropic Gaussian noise, i.e., $\epsilon \sim \mathcal{N}(0, \sigma^2 \cdot I)$, where $\sigma$ is the standard deviation and $I$ is an identity matrix. Given $x + \epsilon$ as input, the output of $f$ would be random due to the randomness of $\epsilon$, i.e., $f_{k'}(x + \epsilon)$ is a random set of $k'$ labels. We define the label probability $p_i$ as the probability that the label $i$ is among the set of top-$k'$ labels predicted by $f$ when adding isotropic Gaussian noise to an input $x$, where $i \in \{1, 2, \cdots, c\}$. Formally, we have $p_i = \Pr(i \in f_{k'}(x + \epsilon))$.
Our smoothed multi-label classifier: Given the label probabilities $p_i$'s for an input $x$, our smoothed multi-label classifier $g$ predicts the $k$ labels with the largest label probabilities for $x$. For simplicity, we use $g_k(x)$ to denote the set of $k$ labels predicted by our smoothed multi-label classifier for an input $x$.
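Since the label probabilities cannot be computed exactly, the smoothed prediction is approximated by sampling in practice. A minimal Monte Carlo sketch (our illustration; MultiGuard's actual algorithm in Section 3.3 additionally derives probabilistic bounds rather than point estimates) might look as follows, assuming `base_model` maps a single input to a vector of per-label scores:

```python
import torch

def smoothed_topk_predict(base_model, x, k_prime, k, sigma, n_samples=1000):
    """Estimate label probabilities p_i by sampling isotropic Gaussian noise,
    then return the k labels with the largest estimated probabilities."""
    counts = None
    for _ in range(n_samples):
        noisy = x + torch.randn_like(x) * sigma      # x + epsilon, epsilon ~ N(0, sigma^2 I)
        scores = base_model(noisy)                   # per-label scores, shape (num_labels,)
        topk_prime = scores.topk(k_prime).indices    # the set f_{k'}(x + epsilon)
        if counts is None:
            counts = torch.zeros(scores.shape[-1])
        counts[topk_prime] += 1
    p_hat = counts / n_samples                       # Monte Carlo estimates of the p_i's
    return p_hat.topk(k).indices                     # g_k(x): top-k labels by label probability
```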
Certified intersection size: An attacker adds a perturbation $\delta$ to an input $x$. $g_k(x + \delta)$ is the set of $k$ labels predicted by our smoothed multi-label classifier for the perturbed input $x + \delta$. Given a set of labels $L(x)$ (e.g., the ground truth labels of $x$), our goal is to show that at least $e$ of them are in the set of $k$ labels predicted by our smoothed multi-label classifier for the perturbed input, when the $\ell_2$-norm of the adversarial perturbation is at most $R$. Formally, we aim to show the following:

$$\min_{\delta, \|\delta\|_2 \le R} |L(x) \cap g_k(x + \delta)| \ge e, \qquad (1)$$

where we call $e$ the certified intersection size. Note that different inputs may have different certified intersection sizes.
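As a toy illustration of the quantity bounded in Equation (1) (all labels and numbers hypothetical):

```python
# L(x): labels of interest (e.g., ground truth labels); predicted: g_k(x + delta) with k = 3.
L_x = {1, 5, 7}
predicted = {5, 7, 12}
print(len(L_x & predicted))  # intersection size = 2
# A certified intersection size e = 2 at radius R guarantees this value is at least 2
# for every perturbation delta with ||delta||_2 <= R.
```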
3.2 Deriving the Certified Intersection Size
Defining two random variables: Given an input $x$, we define two random variables $X = x + \epsilon$ and $Y = x + \delta + \epsilon$, where $\delta$ is an adversarial perturbation and $\epsilon$ is isotropic Gaussian noise. Roughly speaking, the random variables $X$ and $Y$ respectively denote the inputs derived by adding isotropic Gaussian noise to the input $x$ and to its adversarially perturbed version $x + \delta$. Based on the definition of the label probability, we have $p_i = \Pr(i \in f_{k'}(X))$. We define the adversarial label probability $p_i^*$ as $p_i^* = \Pr(i \in f_{k'}(Y)),\ i \in \{1, 2, \cdots, c\}$. Intuitively, the adversarial label probability $p_i^*$ is the probability that the label $i$ is in the set of $k'$ labels predicted by the base multi-label classifier $f$ for $Y$. Given an adversarially perturbed input $x + \delta$, our smoothed multi-label classifier predicts the $k$ labels with the largest adversarial label probabilities $p_i^*$'s for it.
Derivation sketch: We leverage the law of contraposition in our derivation. Roughly speaking, if we have a statement $P \rightarrow Q$, then its contrapositive is $\neg Q \rightarrow \neg P$, where $\neg$ is the logical negation symbol. The law of contraposition claims that a statement is true if and only if its contrapositive is true. In particular, we define the following predicate:

$$Q: \min_{\delta, \|\delta\|_2 \le R} |L(x) \cap g_k(x + \delta)| \ge e. \qquad (2)$$

Intuitively, $Q$ is true if at least $e$ labels in $L(x)$ can be found in $g_k(x + \delta)$ for an arbitrary adversarial perturbation $\delta$ whose $\ell_2$-norm is no larger than $R$. Then, we have $\neg Q: \min_{\delta, \|\delta\|_2 \le R} |L(x) \cap g_k(x + \delta)| < e$. Moreover, we derive a necessary condition (denoted as $\neg P$) for $\neg Q$ to be true, i.e., $\neg Q \rightarrow \neg P$. Roughly speaking, $\neg P$ compares upper bounds of the adversarial label probabilities of the labels in $\{1, 2, \cdots, c\} \setminus L(x)$ with lower bounds of those in $L(x)$. More specifically, $\neg P$ represents that the lower bound of the $e$th largest adversarial label probability of the labels in $L(x)$ is no larger than the upper bound of the $(k - e + 1)$th largest adversarial label probability of the labels in $\{1, 2, \cdots, c\} \setminus L(x)$. Finally, based on the law of contraposition, we have $P \rightarrow Q$, i.e., $Q$ is true if $P$ is true (i.e., if $\neg P$ is false).
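To make the role of $\neg P$ concrete, the sketch below (our simplification; it uses only the per-label comparison described above and ignores the joint refinement in Equation (6) and the exact statement of Theorem 1) searches for the largest $e$ for which the condition $P$ holds, given hypothetical lower bounds on the adversarial label probabilities of labels in $L(x)$ and upper bounds for the remaining labels:

```python
def largest_certified_e(lower_adv, upper_adv, k):
    """lower_adv: dict mapping each label in L(x) to a lower bound on its adversarial
    label probability; upper_adv: dict mapping each remaining label to an upper bound.
    Returns the largest e such that the e-th largest lower bound exceeds the
    (k - e + 1)-th largest upper bound (simplified sketch only)."""
    lows = sorted(lower_adv.values(), reverse=True)
    ups = sorted(upper_adv.values(), reverse=True)
    best_e = 0
    for e in range(1, min(k, len(lows)) + 1):
        idx = k - e  # 0-based index of the (k - e + 1)-th largest upper bound
        competitor = ups[idx] if idx < len(ups) else 0.0
        if lows[e - 1] > competitor:
            best_e = e
    return best_e
```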
The major challenges we face when deriving the necessary condition $\neg P$ are as follows: (1) the adversarial perturbation $\delta$ can be arbitrary as long as its $\ell_2$-norm is no larger than $R$, which leaves infinitely many possible values, and (2) the complexity of the classifier (e.g., a complex deep neural network) and the continuity of the random variable $Y$ make it hard to compute the adversarial label probabilities. We propose an innovative method to solve these challenges based on two key observations: (1) the random variable $Y$ reduces to $X$ under no attacks (i.e., $\delta = 0$), and (2) the adversarial perturbation $\delta$ is bounded, i.e., $\|\delta\|_2 \le R$. Our core idea is to bound the adversarial label probabilities using the label probabilities. Suppose we have the following bounds for the label probabilities (we propose an algorithm to estimate such bounds in Section 3.3):

$$p_i \ge \underline{p_i}, \quad \forall i \in L(x), \qquad (3)$$
$$p_j \le \overline{p_j}, \quad \forall j \in \{1, 2, \cdots, c\} \setminus L(x). \qquad (4)$$

Given the bounds for the label probabilities, we derive a lower bound of the adversarial label probability for each label $i \in L(x)$ and an upper bound of the adversarial label probability for each label $j \in \{1, 2, \cdots, c\} \setminus L(x)$. To derive these bounds, we propose a variant of the Neyman-Pearson Lemma [33] which enables us to consider multiple functions. In contrast, the standard Neyman-Pearson Lemma [33] is insufficient, as it is only applicable to a single function while the base multi-label classifier outputs multiple labels.
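For intuition about what such bounds look like, recall from Section 1 that our results reduce to Cohen et al. [12] when $k' = k = 1$. In that special case, the single-function Neyman-Pearson construction over Gaussians yields the well-known closed-form bound sketched below (the special case only; the multi-label bounds of Theorem 1 are more involved):

```python
from scipy.stats import norm

def adversarial_prob_lower_bound_single_label(p_lower, R, sigma):
    """Cohen-et-al.-style bound for k' = k = 1: if Pr(i in f_1(X)) >= p_lower under
    N(0, sigma^2 I) noise, then for any ||delta||_2 <= R the adversarial label
    probability Pr(i in f_1(Y)) is at least Phi(Phi^{-1}(p_lower) - R / sigma)."""
    return norm.cdf(norm.ppf(p_lower) - R / sigma)

# Example: p_lower = 0.9, sigma = 0.5, R = 0.5 gives a bound of roughly 0.61.
```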
We give an overview of our derivation of the bounds of the adversarial label probabilities and show the details in the proof of Theorem 1 in the supplementary material. Our idea is to construct some regions in the domain space of $X$ and $Y$ via our variant of the Neyman-Pearson Lemma. Specifically, given the constructed regions, we can obtain the lower/upper bounds of the adversarial label probabilities using the probabilities that the random variable $Y$ is in these regions. Note that the probabilities that the random variables $X$ and $Y$ are in these regions can be easily computed as we know their probability density functions.

Next, we derive a lower bound of the adversarial label probability $p_i^*$ ($i \in L(x)$) as an example to illustrate our main idea. Our derivation of the upper bound of the adversarial label probability for a label in $\{1, 2, \cdots, c\} \setminus L(x)$ follows a similar procedure. Given a label $i \in L(x)$, we can find a region $\mathcal{A}_i$ via our variant of the Neyman-Pearson Lemma [33] such that $\Pr(X \in \mathcal{A}_i) = \underline{p_i}$. Then, we can derive a lower bound of $p_i^*$ via computing the probability that the random variable $Y$ is in the region $\mathcal{A}_i$, i.e., we have:

$$p_i^* \ge \Pr(Y \in \mathcal{A}_i). \qquad (5)$$

The above lower bound can be further improved via jointly considering multiple labels in $L(x)$. Suppose we use $\Gamma_u \subseteq L(x)$ to denote an arbitrary set of $u$ labels. We can craft a region $\mathcal{A}_{\Gamma_u}$ via our variant of the Neyman-Pearson Lemma such that we have $\Pr(X \in \mathcal{A}_{\Gamma_u}) = \frac{\sum_{i \in \Gamma_u} \underline{p_i}}{k'}$. Then, we can derive the following lower bound:

$$\max_{i \in \Gamma_u} p_i^* \ge \frac{k'}{u} \cdot \Pr(Y \in \mathcal{A}_{\Gamma_u}). \qquad (6)$$

The $e$th largest lower bound of the adversarial label probabilities of the labels in $L(x)$ can be derived by combining the lower bounds in Equations 5 and 6. Formally, we have the following theorem: