MultiGuard: Provably Robust Multi-label
Classification against Adversarial Examples
Jinyuan Jia
University of Illinois Urbana-Champaign
jinyuan@illinois.edu
Wenjie Qu
Huazhong University of Science and Technology
wen_jie_qu@outlook.com
Neil Zhenqiang Gong
Duke University
neil.gong@duke.edu
Abstract
Multi-label classification, which predicts a set of labels for an input, has many applications. However, multiple recent studies showed that multi-label classification is vulnerable to adversarial examples. In particular, an attacker can manipulate the labels predicted by a multi-label classifier for an input via adding carefully crafted, human-imperceptible perturbation to it. Existing provable defenses for multi-class classification achieve sub-optimal provable robustness guarantees when generalized to multi-label classification. In this work, we propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification. Our MultiGuard leverages randomized smoothing, which is the state-of-the-art technique to build provably robust classifiers. Specifically, given an arbitrary multi-label classifier, our MultiGuard builds a smoothed multi-label classifier via adding random noise to the input. We consider isotropic Gaussian noise in this work. Our major theoretical contribution is that we show a certain number of ground truth labels of an input are provably in the set of labels predicted by our MultiGuard when the $\ell_2$-norm of the adversarial perturbation added to the input is bounded. Moreover, we design an algorithm to compute our provable robustness guarantees. Empirically, we evaluate our MultiGuard on VOC 2007, MS-COCO, and NUS-WIDE benchmark datasets. Our code is available at: https://github.com/quwenjie/MultiGuard
1 Introduction
Multi-class classification assumes each input only has one ground truth label and thus often predicts a single label for an input. In contrast, in multi-label classification [42, 41, 35, 43], each input has multiple ground truth labels and thus a multi-label classifier predicts a set of labels for an input. For instance, an image could have multiple objects, attributes, or scenes. Multi-label classification has many applications such as disease detection [16], object recognition [43], retail checkout recognition [18], document classification [34], etc.
However, similar to multi-class classification, multiple recent studies [56, 53, 30] showed that multi-label classification is also vulnerable to adversarial examples. In particular, an attacker can manipulate the set of labels predicted by a multi-label classifier for an input via adding carefully crafted perturbation to it. Adversarial examples pose severe security threats to the applications of multi-label classification in security-critical domains. To mitigate adversarial examples to multi-label classification, several empirical defenses [49, 1, 30] have been proposed. For instance, Melacci et al. [30] proposed to use domain knowledge on the relationships among the classes to improve the robustness of multi-label classification. However, these defenses have no provable robustness guarantees, and thus they are often broken by more advanced attacks. For instance, Melacci et al. [30] showed that their proposed defense can be broken by an adaptive attack that exploits the domain knowledge used in the defense. Moreover, existing provably robust defenses [10, 8, 17, 46, 12, 22] are all for multi-class classification and achieve sub-optimal provable robustness guarantees when extended to multi-label classification, as shown by our experimental results.

Equal contribution. Wenjie Qu performed this research when he was a remote intern in Gong's group.
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.01111v1 [cs.CR] 3 Oct 2022
Our work: We propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification. MultiGuard leverages randomized smoothing [5, 29, 24, 26, 12], which is the state-of-the-art technique to build provably robust classifiers. In particular, compared to other provably robust techniques, randomized smoothing has two advantages: 1) it is scalable to large-scale neural networks, and 2) it is applicable to any classifier. Suppose we have an arbitrary multi-label classifier (we call it the base multi-label classifier), which predicts $k'$ labels for an input. We build a smoothed multi-label classifier via randomizing an input. Specifically, given an input, we first create a randomized input via adding random noise to it. We consider the random noise to be isotropic Gaussian in this work. Then, we use the base multi-label classifier to predict labels for the randomized input. Due to the randomness in the randomized input, the $k'$ labels predicted by the base multi-label classifier are also random. We use $p_i$ to denote the probability that the label $i$ is among the set of $k'$ labels predicted by the base multi-label classifier for the randomized input, where $i \in \{1, 2, \cdots, c\}$. We call $p_i$ the label probability. Our smoothed multi-label classifier predicts the $k$ labels with the largest label probabilities for the input. We note that $k'$ and $k$ are two different parameters.
Our main theoretical contribution is to show that, given a set of labels (e.g., the ground truth labels) for an input, at least $e$ of them are provably in the set of $k$ labels predicted by MultiGuard for the input, when the $\ell_2$-norm of the adversarial perturbation added to the input is no larger than a threshold. We call $e$ the certified intersection size. We aim to derive the certified intersection size for MultiGuard. However, existing randomized smoothing studies [24, 12, 22] achieve sub-optimal provable robustness guarantees when generalized to derive our certified intersection size. The key reason is that they were designed for multi-class classification instead of multi-label classification. Specifically, they can guarantee that a smoothed multi-class classifier provably predicts the same single label for an input [24, 12] or that a certain label is provably among the top-$k$ labels predicted by the smoothed multi-class classifier [22]. In contrast, our certified intersection size characterizes the intersection between the set of ground truth labels of an input and the set of labels predicted by a smoothed multi-label classifier. In fact, previous provable robustness results [12, 22] are special cases of ours, e.g., our results reduce to Cohen et al. [12] when $k' = k = 1$ and to Jia et al. [22] when $k' = 1$.
In particular, there are two challenges in deriving the certified intersection size. The first challenge
is that the base multi-label classifier predicts multiple labels for an input. The second challenge is
that an input has multiple ground truth labels. To solve the first challenge, we propose a variant of
Neyman-Pearson Lemma [33] that is applicable to multiple functions, which correspond to the multiple labels predicted by the base multi-label classifier. In contrast, existing randomized smoothing studies [24, 26, 12, 22] for multi-class classification use the standard Neyman-Pearson Lemma [33], which is only applicable to a single function, since their base multi-class classifier predicts a single
label for an input. To address the second challenge, we propose to use the law of contraposition
to simultaneously consider multiple ground truth labels of an input when deriving the certified
intersection size.
Our derived certified intersection size is the optimal solution to an optimization problem, which involves the label probabilities. However, it is very challenging to compute the exact label probabilities due to the continuity of the isotropic Gaussian noise and the complexity of the base multi-label classifier (e.g., a complex deep neural network). In response, we design a Monte Carlo algorithm to estimate lower or upper bounds of the label probabilities with probabilistic guarantees. More specifically, we can view the estimation of lower or upper bounds of label probabilities as a binomial proportion confidence interval estimation problem in statistics. Therefore, we use the Clopper-Pearson [11] method from the statistics community to obtain the label probability bounds. Given the estimated lower or upper bounds of label probabilities, we design an efficient algorithm to solve the optimization problem to obtain the certified intersection size.
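To make the binomial-proportion view concrete, here is a minimal sketch (our illustration, not the authors' implementation) of one-sided Clopper-Pearson bounds computed with SciPy's Beta quantile function; the function name and interface are hypothetical. `count` would be the number of noisy samples for which label $i$ appeared in the top-$k'$ set and `n` the total number of samples.

```python
from scipy.stats import beta

def clopper_pearson_bounds(count, n, alpha=0.001):
    """One-sided Clopper-Pearson bounds on a binomial proportion, each holding
    with confidence 1 - alpha. Used here to bound a label probability p_i from
    `count` successes out of `n` noisy samples (illustrative sketch only)."""
    lower = beta.ppf(alpha, count, n - count + 1) if count > 0 else 0.0
    upper = beta.ppf(1 - alpha, count + 1, n - count) if count < n else 1.0
    return lower, upper

# Example: label i appeared in the top-k' set for 930 of 1000 noisy samples.
p_low, p_up = clopper_pearson_bounds(930, 1000)
```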
Empirically, we evaluate our MultiGuard on VOC 2007, MS-COCO, and NUS-WIDE benchmark datasets. We use the certified top-$k$ precision@$R$, certified top-$k$ recall@$R$, and certified top-$k$ f1-score@$R$ to evaluate our MultiGuard. Roughly speaking, certified top-$k$ precision@$R$ is the least fraction of the $k$ predicted labels that are ground truth labels of an input when the $\ell_2$-norm of the adversarial perturbation is at most $R$; certified top-$k$ recall@$R$ is the least fraction of ground truth labels of an input that are in the set of $k$ labels predicted by our MultiGuard; and certified top-$k$ f1-score@$R$ is the harmonic mean of certified top-$k$ precision@$R$ and certified top-$k$ recall@$R$. Our experimental results show that our MultiGuard outperforms the state-of-the-art certified defense [22] when extending it to multi-label classification. For instance, on the VOC 2007 dataset, Jia et al. [22] and our MultiGuard respectively achieve 24.3% and 31.3% certified top-$k$ precision@$R$, 51.6% and 66.4% certified top-$k$ recall@$R$, as well as 33.0% and 42.6% certified top-$k$ f1-score@$R$ when $k' = 1$, $k = 3$, and $R = 0.5$.
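For illustration, the following sketch (our simplification; the paper's exact dataset-level aggregation may differ) computes per-input certified precision, recall, and f1 from a certified intersection size $e$ and averages them over a test set:

```python
def certified_metrics(certified_sizes, gt_label_counts, k):
    """certified_sizes[i]: certified intersection size e for the i-th test input at radius R;
    gt_label_counts[i]: number of ground truth labels |L(x_i)|; k: number of predicted labels."""
    precisions, recalls, f1s = [], [], []
    for e, n_gt in zip(certified_sizes, gt_label_counts):
        p = e / k        # at least e of the k predicted labels are ground truth labels
        r = e / n_gt     # at least e of the n_gt ground truth labels are predicted
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    n = len(certified_sizes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```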
Our major contributions can be summarized as follows:
• We propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification.
• We design a Monte Carlo algorithm to compute the certified intersection size.
• We evaluate our MultiGuard on VOC 2007, MS-COCO, and NUS-WIDE benchmark datasets.
2 Background and Related Work
Multi-label classification: In multi-label classification, a multi-label classifier predicts multiple labels for an input. Many deep learning classifiers [27, 52, 43, 45, 57, 32, 21, 7, 54, 48, 2, 13] have been proposed for multi-label classification. For instance, a naive method for multi-label classification is to train an independent binary classifier for each label and use ranking or thresholding to derive the final predicted labels. This method, however, ignores the topology structure among labels and thus cannot capture label co-occurrence dependencies (e.g., mouse and keyboard usually appear together). In response, several methods [43, 7] have been proposed to improve the performance of multi-label classification via exploiting the label dependencies in an input. Despite their effectiveness, these methods rely on complicated architecture modifications. To mitigate this issue, some recent studies [48, 2] proposed to design new loss functions. For instance, Baruch et al. [2] introduced an asymmetric loss (ASL). Roughly speaking, their method is based on the observation that, in multi-label classification, most inputs contain only a small fraction of the possible candidate labels, which leads to under-emphasizing gradients from positive labels during training. Their experimental results indicate that their method achieves state-of-the-art performance on multiple benchmark datasets.
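As a rough illustration of the asymmetric-loss idea (a simplified PyTorch sketch, not the published ASL implementation; hyperparameter names and defaults are our own assumptions), positives and negatives receive different focusing exponents, and easy negatives are further down-weighted by probability shifting:

```python
import torch

def asymmetric_loss_sketch(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
    """logits, targets: tensors of shape (batch, num_labels); targets in {0, 1}.
    Simplified sketch of an ASL-style loss for multi-label classification."""
    p = torch.sigmoid(logits)
    p_neg = (p - clip).clamp(min=0)  # probability shifting down-weights easy negatives
    loss_pos = targets * ((1 - p) ** gamma_pos) * torch.log(p.clamp(min=eps))
    loss_neg = (1 - targets) * (p_neg ** gamma_neg) * torch.log((1 - p_neg).clamp(min=eps))
    return -(loss_pos + loss_neg).sum(dim=-1).mean()
```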
Adversarial examples to multi-label classification: Several recent studies [40, 56, 53, 30, 20] showed that multi-label classification is vulnerable to adversarial examples. An attacker can manipulate the set of labels predicted by a multi-label classifier for an input via adding carefully crafted perturbation to it. For instance, Song et al. [40] proposed white-box, targeted attacks to multi-label classification. In particular, they first formulate their attacks as optimization problems and then use gradient descent to solve them. Their experimental results indicate that they can make a multi-label classifier produce an arbitrary set of labels for an input via adding adversarial perturbation to it. Yang et al. [53] explored the worst-case misclassification risk of a multi-label classifier. In particular, they formulate the problem as a bi-level set function optimization problem and leverage random greedy search to find an approximate solution. Zhou et al. [56] proposed to generate norm-bounded adversarial perturbations to fool a multi-label classifier. In particular, they transform the optimization problem of finding adversarial perturbations into a linear programming problem which can be solved efficiently.
Existing empirically robust defenses: Some studies [49, 1, 30] developed empirical defenses to mitigate adversarial examples in multi-label classification. For instance, Wu et al. [49] applied adversarial training, a method developed to train robust multi-class classifiers, to improve the robustness of multi-label classifiers. Melacci et al. [30] showed that domain knowledge, which measures the relationships among classes, can be used to detect adversarial examples and improve the robustness of multi-label classifiers. However, all these defenses lack provable robustness guarantees and thus they are often broken by advanced adaptive attacks. For instance, Melacci et al. [30] showed that their defenses can be broken by adaptive attacks that also consider the domain knowledge.
Existing provably robust defenses: All existing provably robust defenses [37, 10, 6, 19, 8, 17, 46, 4, 24, 12, 26, 36, 23, 47, 39, 31, 44, 38, 55, 50] were designed for multi-class classification instead of multi-label classification. In particular, they can guarantee that a robust multi-class classifier predicts the same single label for an input or that a label (e.g., the single ground truth label of the input) is among the top-$k$ labels predicted by a robust multi-class classifier. These defenses are sub-optimal for multi-label classification. Specifically, in multi-label classification, we aim to guarantee that at least some ground truth labels of an input are in the set of labels predicted by a robust multi-label classifier.
MultiGuard leverages randomized smoothing [24, 26, 12, 22, 51]. Existing randomized smoothing studies (e.g., Jia et al. [22]) achieve sub-optimal provable robustness guarantees (i.e., certified intersection size) for multi-label classification, because they are designed for multi-class classification. For example, as our empirical evaluation results will show, MultiGuard significantly outperforms Jia et al. [22] when extending it to multi-label classification. Technically speaking, our work has two key differences from Jia et al. First, the base multi-class classifier in Jia et al. only predicts a single label for an input, while our base multi-label classifier predicts multiple labels for an input. Second, Jia et al. can only guarantee that a single label is provably among the $k$ labels predicted by a smoothed multi-class classifier, while we aim to show that multiple labels (e.g., ground truth labels of an input) are provably among the $k$ labels predicted by a smoothed multi-label classifier. Due to such key differences, we require new techniques to derive the certified intersection size of MultiGuard. For instance, we develop a variant of the Neyman-Pearson Lemma [33] which is applicable to multiple functions, while Jia et al. use the standard Neyman-Pearson Lemma [33], which is only applicable to a single function. Moreover, we use the law of contraposition to derive our certified intersection size, which is not required by Jia et al.
3 Our MultiGuard
3.1 Building our MultiGuard
Label probability: Suppose we have a multi-label classifier $f$, which we call the base multi-label classifier. Given an input $x$, the base multi-label classifier $f$ predicts $k'$ labels for it. For simplicity, we use $f_{k'}(x)$ to denote the set of $k'$ labels predicted by $f$ for $x$. We use $\epsilon$ to denote isotropic Gaussian noise, i.e., $\epsilon \sim \mathcal{N}(0, \sigma^2 \cdot I)$, where $\sigma$ is the standard deviation and $I$ is an identity matrix. Given $x + \epsilon$ as input, the output of $f$ would be random due to the randomness of $\epsilon$, i.e., $f_{k'}(x + \epsilon)$ is a random set of $k'$ labels. We define the label probability $p_i$ as the probability that the label $i$ is among the set of top-$k'$ labels predicted by $f$ when adding isotropic Gaussian noise to an input $x$, where $i \in \{1, 2, \cdots, c\}$. Formally, we have $p_i = \Pr(i \in f_{k'}(x + \epsilon))$.
Our smoothed multi-label classifier: Given the label probabilities $p_i$'s for an input $x$, our smoothed multi-label classifier $g$ predicts the $k$ labels with the largest label probabilities for $x$. For simplicity, we use $g_k(x)$ to denote the set of $k$ labels predicted by our smoothed multi-label classifier for an input $x$.
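Since the label probabilities cannot be computed exactly, the smoothed prediction is approximated by sampling in practice. A minimal Monte Carlo sketch (our illustration; MultiGuard's actual algorithm in Section 3.3 additionally derives probabilistic bounds rather than point estimates) might look as follows, assuming `base_model` maps a single input to a vector of per-label scores:

```python
import torch

def smoothed_topk_predict(base_model, x, k_prime, k, sigma, n_samples=1000):
    """Estimate label probabilities p_i by sampling isotropic Gaussian noise,
    then return the k labels with the largest estimated probabilities."""
    counts = None
    for _ in range(n_samples):
        noisy = x + torch.randn_like(x) * sigma      # x + epsilon, epsilon ~ N(0, sigma^2 I)
        scores = base_model(noisy)                   # per-label scores, shape (num_labels,)
        topk_prime = scores.topk(k_prime).indices    # the set f_{k'}(x + epsilon)
        if counts is None:
            counts = torch.zeros(scores.shape[-1])
        counts[topk_prime] += 1
    p_hat = counts / n_samples                       # Monte Carlo estimates of the p_i's
    return p_hat.topk(k).indices                     # g_k(x): top-k labels by label probability
```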
Certified intersection size: An attacker adds a perturbation $\delta$ to an input $x$. $g_k(x + \delta)$ is the set of $k$ labels predicted by our smoothed multi-label classifier for the perturbed input $x + \delta$. Given a set of labels $L(x)$ (e.g., the ground truth labels of $x$), our goal is to show that at least $e$ of them are in the set of $k$ labels predicted by our smoothed multi-label classifier for the perturbed input, when the $\ell_2$-norm of the adversarial perturbation is at most $R$. Formally, we aim to show the following:

$$\min_{\delta, \|\delta\|_2 \le R} |L(x) \cap g_k(x + \delta)| \ge e, \qquad (1)$$

where we call $e$ the certified intersection size. Note that different inputs may have different certified intersection sizes.
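As a toy illustration of the quantity bounded in Equation (1) (all labels and numbers hypothetical):

```python
# L(x): labels of interest (e.g., ground truth labels); predicted: g_k(x + delta) with k = 3.
L_x = {1, 5, 7}
predicted = {5, 7, 12}
print(len(L_x & predicted))  # intersection size = 2
# A certified intersection size e = 2 at radius R guarantees this value is at least 2
# for every perturbation delta with ||delta||_2 <= R.
```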
3.2 Deriving the Certified Intersection Size
Defining two random variables: Given an input $x$, we define two random variables $X = x + \epsilon$ and $Y = x + \delta + \epsilon$, where $\delta$ is an adversarial perturbation and $\epsilon$ is isotropic Gaussian noise. Roughly speaking, the random variables $X$ and $Y$ respectively denote the inputs derived by adding isotropic Gaussian noise to the input $x$ and to its adversarially perturbed version $x + \delta$. Based on the definition of the label probability, we have $p_i = \Pr(i \in f_{k'}(X))$. We define the adversarial label probability $p_i^*$ as $p_i^* = \Pr(i \in f_{k'}(Y)),\ i \in \{1, 2, \cdots, c\}$. Intuitively, the adversarial label probability $p_i^*$ is the probability that the label $i$ is in the set of $k'$ labels predicted by the base multi-label classifier $f$ for $Y$. Given an adversarially perturbed input $x + \delta$, our smoothed multi-label classifier predicts the $k$ labels with the largest adversarial label probabilities $p_i^*$'s for it.
Derivation sketch: We leverage the law of contraposition in our derivation. Roughly speaking, if we have a statement $P \rightarrow Q$, then its contrapositive is $\neg Q \rightarrow \neg P$, where $\neg$ is the logical negation symbol. The law of contraposition claims that a statement is true if and only if its contrapositive is true. In particular, we define the following predicate:

$$Q: \min_{\delta, \|\delta\|_2 \le R} |L(x) \cap g_k(x + \delta)| \ge e. \qquad (2)$$

Intuitively, $Q$ is true if at least $e$ labels in $L(x)$ can be found in $g_k(x + \delta)$ for an arbitrary adversarial perturbation $\delta$ whose $\ell_2$-norm is no larger than $R$. Then, we have $\neg Q: \min_{\delta, \|\delta\|_2 \le R} |L(x) \cap g_k(x + \delta)| < e$. Moreover, we derive a necessary condition (denoted as $\neg P$) for $\neg Q$ to be true, i.e., $\neg Q \rightarrow \neg P$. Roughly speaking, $\neg P$ compares upper bounds of the adversarial label probabilities of the labels in $\{1, 2, \cdots, c\} \setminus L(x)$ with lower bounds of those in $L(x)$. More specifically, $\neg P$ represents that the lower bound of the $e$th largest adversarial label probability of the labels in $L(x)$ is no larger than the upper bound of the $(k - e + 1)$th largest adversarial label probability of the labels in $\{1, 2, \cdots, c\} \setminus L(x)$. Finally, based on the law of contraposition, we have $P \rightarrow Q$, i.e., $Q$ is true if $P$ is true (i.e., if $\neg P$ is false).
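To make the role of $\neg P$ concrete, the sketch below (our simplification; it uses only the per-label comparison described above and ignores the joint refinement in Equation (6) and the exact statement of Theorem 1) searches for the largest $e$ for which the condition $P$ holds, given hypothetical lower bounds on the adversarial label probabilities of labels in $L(x)$ and upper bounds for the remaining labels:

```python
def largest_certified_e(lower_adv, upper_adv, k):
    """lower_adv: dict mapping each label in L(x) to a lower bound on its adversarial
    label probability; upper_adv: dict mapping each remaining label to an upper bound.
    Returns the largest e such that the e-th largest lower bound exceeds the
    (k - e + 1)-th largest upper bound (simplified sketch only)."""
    lows = sorted(lower_adv.values(), reverse=True)
    ups = sorted(upper_adv.values(), reverse=True)
    best_e = 0
    for e in range(1, min(k, len(lows)) + 1):
        idx = k - e  # 0-based index of the (k - e + 1)-th largest upper bound
        competitor = ups[idx] if idx < len(ups) else 0.0
        if lows[e - 1] > competitor:
            best_e = e
    return best_e
```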
The major challenges we face when deriving the necessary condition $\neg P$ are as follows: (1) the adversarial perturbation $\delta$ can be arbitrary as long as its $\ell_2$-norm is no larger than $R$, which leaves infinitely many possible values, and (2) the complexity of the classifier (e.g., a complex deep neural network) and the continuity of the random variable $Y$ make it hard to compute the adversarial label probabilities. We propose an innovative method to solve these challenges based on two key observations: (1) the random variable $Y$ reduces to $X$ under no attacks (i.e., $\delta = 0$), and (2) the adversarial perturbation $\delta$ is bounded, i.e., $\|\delta\|_2 \le R$. Our core idea is to bound the adversarial label probabilities using the label probabilities. Suppose we have the following bounds for the label probabilities (we propose an algorithm to estimate such bounds in Section 3.3):

$$p_i \ge \underline{p_i}, \quad \forall i \in L(x), \qquad (3)$$
$$p_j \le \overline{p_j}, \quad \forall j \in \{1, 2, \cdots, c\} \setminus L(x). \qquad (4)$$

Given the bounds for the label probabilities, we derive a lower bound of the adversarial label probability for each label $i \in L(x)$ and an upper bound of the adversarial label probability for each label $j \in \{1, 2, \cdots, c\} \setminus L(x)$. To derive these bounds, we propose a variant of the Neyman-Pearson Lemma [33] which enables us to consider multiple functions. In contrast, the standard Neyman-Pearson Lemma [33] is insufficient, as it is only applicable to a single function while the base multi-label classifier outputs multiple labels.
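For intuition about what such bounds look like, recall from Section 1 that our results reduce to Cohen et al. [12] when $k' = k = 1$. In that special case, the single-function Neyman-Pearson construction over Gaussians yields the well-known closed-form bound sketched below (the special case only; the multi-label bounds of Theorem 1 are more involved):

```python
from scipy.stats import norm

def adversarial_prob_lower_bound_single_label(p_lower, R, sigma):
    """Cohen-et-al.-style bound for k' = k = 1: if Pr(i in f_1(X)) >= p_lower under
    N(0, sigma^2 I) noise, then for any ||delta||_2 <= R the adversarial label
    probability Pr(i in f_1(Y)) is at least Phi(Phi^{-1}(p_lower) - R / sigma)."""
    return norm.cdf(norm.ppf(p_lower) - R / sigma)

# Example: p_lower = 0.9, sigma = 0.5, R = 0.5 gives a bound of roughly 0.61.
```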
We give an overview of our derivation of the bounds of the adversarial label probabilities and show the details in the proof of Theorem 1 in the supplementary material. Our idea is to construct some regions in the domain space of $X$ and $Y$ via our variant of the Neyman-Pearson Lemma. Specifically, given the constructed regions, we can obtain the lower/upper bounds of the adversarial label probabilities using the probabilities that the random variable $Y$ is in these regions. Note that the probabilities that the random variables $X$ and $Y$ are in these regions can be easily computed as we know their probability density functions.

Next, we derive a lower bound of the adversarial label probability $p_i^*$ ($i \in L(x)$) as an example to illustrate our main idea. Our derivation of the upper bound of the adversarial label probability for a label in $\{1, 2, \cdots, c\} \setminus L(x)$ follows a similar procedure. Given a label $i \in L(x)$, we can find a region $\mathcal{A}_i$ via our variant of the Neyman-Pearson Lemma [33] such that $\Pr(X \in \mathcal{A}_i) = \underline{p_i}$. Then, we can derive a lower bound of $p_i^*$ via computing the probability that the random variable $Y$ is in the region $\mathcal{A}_i$, i.e., we have:

$$p_i^* \ge \Pr(Y \in \mathcal{A}_i). \qquad (5)$$

The above lower bound can be further improved via jointly considering multiple labels in $L(x)$. Suppose we use $\Gamma_u \subseteq L(x)$ to denote an arbitrary set of $u$ labels. We can craft a region $\mathcal{A}_{\Gamma_u}$ via our variant of the Neyman-Pearson Lemma such that we have $\Pr(X \in \mathcal{A}_{\Gamma_u}) = \frac{\sum_{i \in \Gamma_u} \underline{p_i}}{k'}$. Then, we can derive the following lower bound:

$$\max_{i \in \Gamma_u} p_i^* \ge \frac{k'}{u} \cdot \Pr(Y \in \mathcal{A}_{\Gamma_u}). \qquad (6)$$

The $e$th largest lower bound of the adversarial label probabilities of the labels in $L(x)$ can be derived by combining the lower bounds in Equations 5 and 6. Formally, we have the following theorem: