classification, several empirical defenses [49, 1, 30] have been proposed. For instance, Melacci et al. [30] proposed to use domain knowledge on the relationships among the classes to improve the robustness of multi-label classification. However, these defenses have no provable robustness guarantees, and thus they are often broken by more advanced attacks. For instance, Melacci et al. [30] showed that their proposed defense can be broken by an adaptive attack that exploits the domain knowledge used in the defense. Moreover, existing provably robust defenses [10, 8, 17, 46, 12, 22] are all designed for multi-class classification, and they achieve sub-optimal provable robustness guarantees when extended to multi-label classification, as shown by our experimental results.
Our work: We propose MultiGuard, the first provably robust defense against adversarial examples for multi-label classification. MultiGuard leverages randomized smoothing [5, 29, 24, 26, 12], which is the state-of-the-art technique to build provably robust classifiers. In particular, compared to other provably robust techniques, randomized smoothing has two advantages: 1) it is scalable to large-scale neural networks, and 2) it is applicable to any classifier. Suppose we have an arbitrary multi-label classifier (we call it the base multi-label classifier), which predicts k′ labels for an input. We build a smoothed multi-label classifier via randomizing an input. Specifically, given an input, we first create a randomized input via adding random noise to it. We consider the random noise to be isotropic Gaussian in this work. Then, we use the base multi-label classifier to predict labels for the randomized input. Due to the randomness in the randomized input, the k′ labels predicted by the base multi-label classifier are also random. We use p_i to denote the probability that the label i is among the set of k′ labels predicted by the base multi-label classifier for the randomized input, where i ∈ {1, 2, · · · , c}. We call p_i the label probability. Our smoothed multi-label classifier predicts the k labels with the largest label probabilities for the input. We note that k′ and k are two different parameters.
Our main theoretical contribution is to show that, given a set of labels (e.g., the ground truth labels) for an input, at least e of them are provably in the set of k labels predicted by MultiGuard for the input, when the ℓ2-norm of the adversarial perturbation added to the input is no larger than a threshold. We call e the certified intersection size. We aim to derive the certified intersection size for MultiGuard. However, existing randomized smoothing studies [24, 12, 22] achieve sub-optimal provable robustness guarantees when generalized to derive our certified intersection size. The key reason is that they were designed for multi-class classification instead of multi-label classification. Specifically, they can guarantee that a smoothed multi-class classifier provably predicts the same single label for an input [24, 12] or that a certain label is provably among the top-k labels predicted by the smoothed multi-class classifier [22]. In fact, previous provable robustness results [12, 22] are special cases of ours: our results reduce to Cohen et al. [12] when k′ = k = 1 and to Jia et al. [22] when k′ = 1. In contrast, our certified intersection size characterizes the intersection between the set of ground truth labels of an input and the set of labels predicted by a smoothed multi-label classifier.
In particular, there are two challenges in deriving the certified intersection size. The first challenge is that the base multi-label classifier predicts multiple labels for an input. The second challenge is that an input has multiple ground truth labels. To address the first challenge, we propose a variant of the Neyman-Pearson Lemma [33] that is applicable to multiple functions, which correspond to the multiple labels predicted by the base multi-label classifier. In contrast, existing randomized smoothing studies [24, 26, 12, 22] for multi-class classification use the standard Neyman-Pearson Lemma [33], which is only applicable to a single function, since their base multi-class classifier predicts a single label for an input. To address the second challenge, we propose to use the law of contraposition to simultaneously consider multiple ground truth labels of an input when deriving the certified intersection size.
Our derived certified intersection size is the optimal solution to an optimization problem that involves the label probabilities. However, it is very challenging to compute the exact label probabilities due to the continuity of the isotropic Gaussian noise and the complexity of the base multi-label classifier (e.g., a complex deep neural network). In response, we design a Monte Carlo algorithm to estimate lower or upper bounds of the label probabilities with probabilistic guarantees. More specifically, we can view the estimation of lower or upper bounds of the label probabilities as a binomial proportion confidence interval estimation problem in statistics. Therefore, we use the Clopper-Pearson method [11] from the statistics community to obtain the label probability bounds. Given the estimated lower or upper bounds of the label probabilities, we design an efficient algorithm to solve the optimization problem and obtain the certified intersection size.
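As a concrete illustration of the binomial-proportion view: if label i appeared in `count` out of `n` noisy predictions, the Clopper-Pearson interval can be recovered by inverting the exact binomial tail. The sketch below is a stdlib-only illustration of that standard method (production code would typically use a statistics library), not the paper's implementation.

```python
import math

def binom_sf(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(c, n + 1))

def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(0, c + 1))

def clopper_pearson(count, n, alpha=0.05):
    """Two-sided (1 - alpha) Clopper-Pearson interval for a binomial
    proportion, e.g., a label probability p_i estimated from `count`
    appearances of label i over n noise samples.  The interval endpoints
    are found by bisection on the exact binomial tails."""
    def bisect(f, lo=0.0, hi=1.0, tol=1e-10):
        # f must be increasing in p; returns the root of f(p) = 0.
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    # Lower endpoint: P(X >= count | p) = alpha/2 (0 when count = 0).
    lower = 0.0 if count == 0 else bisect(lambda p: binom_sf(count, n, p) - alpha / 2)
    # Upper endpoint: P(X <= count | p) = alpha/2 (1 when count = n).
    upper = 1.0 if count == n else bisect(lambda p: alpha / 2 - binom_cdf(count, n, p))
    return lower, upper
```

For example, observing a label 0 times out of 10 samples gives the familiar closed-form upper bound 1 − (α/2)^(1/n). When bounding many label probabilities simultaneously, the paper's setting would additionally require a multiple-testing correction (e.g., dividing α across labels) to obtain a simultaneous guarantee.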