
Previous work in margin-aware adversarial training [41, 42, 39, 2, 9] typically substitutes the robust loss $\hat{L}_{tr}$ for $L_{tr}$ and largely focuses on designing heuristic functions of various notions of margin to use for the sample weight $w_i$.
For example, in GAIRAT [42, 41, 9], the margin is defined as the least number of PGD steps, denoted $\kappa$, that leads the classifier to make an incorrect prediction. The sample's weight is computed as
$$\omega_{\text{GAIRAT}}(x_i) = \frac{1}{2}\left(1 + \tanh\left(\lambda + 5\left(1 - 2\kappa/K\right)\right)\right)$$
with hyperparameters $K$ and $\lambda$. A small $\kappa$ indicates that the sample lies close to the decision boundary. Larger $\kappa$ values imply that the associated samples lie far from the decision boundary and are therefore more robust, requiring smaller weights. However, due to the non-linearity of the loss surface in practice, PGD-based attacks with finite iterations may suffer from the same issues that plague standard iterative first-order methods in non-convex settings. In other words, $\kappa$ is heavily dependent on the optimization path taken by PGD. This is demonstrated by GAIRAT's vulnerability to sophisticated attacks, e.g., AutoAttack [8].
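As a concrete illustration, the GAIRAT weighting function above can be sketched as follows; the function name and default $\lambda = 0$ are our own choices for exposition, not part of the original method's implementation:

```python
import numpy as np

def gairat_weight(kappa, K, lam=0.0):
    """GAIRAT sample weight from the PGD-step margin kappa.

    kappa: least number of PGD steps (out of K total) needed to flip
           the prediction; small kappa => sample near the boundary.
    K, lam: the hyperparameters K and lambda from the weighting rule.
    """
    return 0.5 * (1.0 + np.tanh(lam + 5.0 * (1.0 - 2.0 * kappa / K)))
```

Note that $\kappa = K/2$ with $\lambda = 0$ gives a weight of exactly $1/2$, and weights decrease monotonically as $\kappa$ grows, matching the intuition that far-from-boundary samples receive smaller weights.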
Zhang et al. [41] define the margin as the difference between the loss a network suffers at a clean sample and at its adversarial variant. Zeng et al. [39], Wang et al. [30], and Balaji et al. [2] propose a definition of margin corresponding to taking differences between logits, as follows.
Definition 2.2 (Zeng et al. [39], Wang et al. [30]). The margin of a classifier $f$ on sample $(x, y)$ is the difference between the confidence of $f$ in the true label $y$ and the maximal probability of an incorrect label $t$: $\mathrm{margin}(x, y; \theta) = p(f(x; \theta) = y) - \max_{t \neq y} p(f(x; \theta) = t)$.
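The margin of Def. 2.2 is straightforward to compute from a vector of class probabilities; a minimal sketch (function name ours):

```python
import numpy as np

def margin(probs, y):
    """Margin of Def. 2.2: probability of the true class minus the
    largest probability assigned to any incorrect class.

    probs: 1-D array of class probabilities (e.g. softmax outputs).
    y:     index of the true label.
    """
    others = np.delete(probs, y)  # probabilities of incorrect classes
    return probs[y] - others.max()
```

The margin is positive when the classifier is correct and negative when some incorrect class receives higher probability than the true class.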
Given this definition, Zeng et al. [39] and Wang et al. [30] propose to use exponential (WMMR) and sigmoidal (MAIL) weighting functions, respectively: $\omega_{\text{WMMR}}(x_i) = \exp(-\alpha m)$ with parameter $\alpha$, and $\omega_{\text{MAIL}}(x_i) = \mathrm{sigmoid}(-\gamma(m - \beta))$ with parameters $\gamma$ and $\beta$. WMMR and MAIL rely on the local linearity of ReLU networks and on the assumption that, for samples near the margin, the relative scale of predicted class likelihoods directly corresponds to the distance to the decision boundary. However, similarly to GAIRAT's $\kappa$, even for samples very close to the decision boundary, simple functions of the difference between class likelihoods may not necessarily correspond to the true distance to the decision boundary.
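For reference, the two weighting functions can be sketched directly from their formulas; the default parameter values below are illustrative assumptions, not the papers' recommended settings:

```python
import numpy as np

def wmmr_weight(m, alpha=1.0):
    # Exponential weighting: small or negative margins m receive
    # exponentially larger weights.
    return np.exp(-alpha * m)

def mail_weight(m, gamma=10.0, beta=0.0):
    # Sigmoidal weighting: sigmoid(-gamma * (m - beta)), i.e. weight
    # 1/2 at m = beta, approaching 1 for small m and 0 for large m.
    return 1.0 / (1.0 + np.exp(gamma * (m - beta)))
```

Both functions are monotonically decreasing in the margin $m$, so samples closer to (or on the wrong side of) the decision boundary are upweighted.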
In contrast, we propose a more fine-grained notion of margin, the multi-class margin, and a method
to learn a mapping between the margin at a sample and its associated weight, rather than use a
predefined heuristic function.
Previous work has explored theoretical notions of a multi-class margin. For example, Zou [43] defined the margin vector in the context of boosting as a proxy for a vector of conditional class probabilities. However, this notion of margin is unaware of the true class of a sample. In contrast, the multi-class margins proposed by Saberian and Vasconcelos [24] and Cortes et al. [6] are both closely related to those of Wang et al. [30] and Zeng et al. [39], i.e., defined as the minimal distance between an arbitrary predicted logit and the logit of the true class.
In Fig. 1 we explore the relationship between the logits of a network evaluated at a clean sample and the predicted class of its adversarially perturbed variant. Methods which rely on the canonical notions of margin reasonably assume that samples at which a classifier is vulnerable have small margin according to Def. 2.2, i.e., the magnitude of the smallest difference between the logit of any class and the logit corresponding to the true class is small. However, we demonstrate in Fig. 1(b) that a significant number of predictions made by vulnerable classifiers on perturbed samples do not correspond to the classes with minimal margin. In other words, the class for which the margin is smallest does not always correspond to the adversarial class. Furthermore, this issue is exacerbated for robust networks, as shown by the difference in count distribution between networks whose relative robustness varies.
2.3 Bi-level Optimization and Meta-learning
Bilevel optimization, first introduced by Bracken and McGill [4], is an optimization framework involving nested optimization problems. A typical bilevel optimization problem takes the form:
$$\min_{x \in \mathbb{R}^p} \Phi(x) := f(x, y^*(x)) \quad \text{s.t.} \quad y^* \in \arg\min_{y \in \mathbb{R}^p} g(x, y), \qquad (3)$$
where $f$ and $g$ are respectively denoted the upper-level and lower-level objectives. The goal of the framework is to minimize the primary objective $\Phi(x)$ with respect to $x$, where $y^*(x)$ is obtained by solving the lower-level minimization problem. The framework of bilevel optimization has seen adoption by the machine learning community, in particular in the context of hyperparameter tuning
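To make the structure of Eq. (3) concrete, here is a toy bilevel problem (our own illustrative example, not the paper's method) where the lower-level solution has a closed form, so the upper-level objective can be minimized by gradient descent using the implicit gradient:

```python
# Toy bilevel problem in the form of Eq. (3):
#   upper level:  min_x  Phi(x) = f(x, y*(x)) = (y*(x) - 1)**2
#   lower level:  y*(x) = argmin_y  g(x, y) = 0.5 * x * y**2 - y
# For x > 0, setting dg/dy = x*y - 1 = 0 gives y*(x) = 1/x, so the
# upper-level optimum is x = 1 (where y* = 1 and Phi = 0).

def lower_solution(x):
    # Closed-form argmin of the lower-level objective g(x, y) over y.
    return 1.0 / x

def upper_objective(x):
    y_star = lower_solution(x)
    return (y_star - 1.0) ** 2

# Gradient descent on Phi(x) via the implicit gradient:
# dPhi/dx = 2*(y* - 1) * dy*/dx, with dy*/dx = -1/x**2.
x = 2.0
for _ in range(200):
    y_star = lower_solution(x)
    grad = 2.0 * (y_star - 1.0) * (-1.0 / x ** 2)
    x -= 0.5 * grad
```

In practice the lower-level problem rarely has a closed form, and the inner argmin is itself approximated by gradient steps, which is what makes bilevel formulations computationally challenging.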