
2 Adversarial Robustness
DNNs are susceptible to adversarial perturbations (Szegedy et al., 2013; Biggio et al., 2013). In the adversarial setting, the adversary adds a small, imperceptible perturbation to the image that fools the network into making an incorrect prediction. To ensure that the adversarial noise is imperceptible to the human eye, perturbations with bounded ℓp-norms are typically studied (Sharif et al., 2018). In such settings, the objective of the adversary is to maximize the following loss:
\max_{\|\delta\|_p \leq \epsilon} \text{Xent}(f(x+\delta, w), y), \qquad (1)
where Xent is the cross-entropy loss, δ is the adversarial perturbation, x is the clean example, y is the ground-truth label, ϵ is the adversarial perturbation budget, and w denotes the DNN parameters.
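To make the attack objective concrete, the following minimal PyTorch sketch (not from the original paper; the names linf_attack, step_size, and steps are illustrative) performs projected gradient ascent on the cross-entropy loss under an ℓ∞ constraint; projection onto the valid pixel range is omitted for brevity.

import torch
import torch.nn.functional as F

def linf_attack(model, x, y, eps, step_size, steps):
    # Maximize Xent(f(x + delta, w), y) over perturbations with ||delta||_inf <= eps.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()  # ascend the loss
            delta.clamp_(-eps, eps)                 # project back into the eps-ball
        delta.grad.zero_()
    return (x + delta).detach()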
Many works focus on the attack side and propose methods for generating robust adversarial examples, for instance by modifying the loss function or by introducing optimization techniques that solve the adversarial generation objective more effectively (Goodfellow et al., 2014; Madry et al., 2017; Carlini & Wagner, 2017; Izmailov et al., 2018; Croce & Hein, 2020a; Andriushchenko et al., 2020). A complementary line of research focuses on mitigating the impact of potent adversarial examples. While some studies on adversarial defense prioritize approaches with theoretical guarantees (Wong & Kolter, 2018; Cohen et al., 2019), in practice, variants of adversarial training have emerged as the prevailing defense against adversarial attacks (Madry et al., 2017; Shafahi et al., 2019; Wong et al., 2020; Rebuffi et al., 2021; Gowal et al., 2020). Adversarial training generates adversarial examples on the fly during training and then trains the model on these examples. The adversarial training loss can be formulated as a min-max optimization problem:
\min_{w} \max_{\|\delta\|_p \leq \epsilon} \text{Xent}(f(x+\delta, w), y). \qquad (2)
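Assuming the illustrative linf_attack sketch above, one step of this min-max procedure could be written as follows; this is a schematic sketch of generic PGD-style adversarial training, not the exact training code used in the experiments.

import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps, step_size, steps):
    # Inner maximization: approximately solve max_delta Xent(f(x + delta, w), y).
    x_adv = linf_attack(model, x, y, eps, step_size, steps)
    # Outer minimization: gradient step on the adversarial examples.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()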
2.1 Robust overfitting and relationship to weight decay
Adversarial training is a strong baseline for defending against adversarial attacks; however, it often suffers from a phenomenon referred to as robust overfitting (Rice et al., 2020). Weight decay regularization, discussed in Section 2.1.1, is a common technique for preventing overfitting.
2.1.1 Weight Decay
Weight decay encourages the weights of a network to have smaller magnitudes (Zhang et al., 2018) and is widely used to improve generalization. Weight decay regularization can take many forms (Loshchilov & Hutter, 2017); we focus on the popular ℓ2-norm variant. More precisely, we focus on classification problems with cross-entropy as the main loss – such as adversarial training – and weight decay as the regularizer, a combination popularized by Krizhevsky et al. (2017):
\text{Loss}_w(x, y) = \text{Xent}(f(x, w), y) + \frac{\lambda_{wd}}{2}\|w\|_2^2, \qquad (3)
where w denotes the network parameters, (x, y) is the training data, and λwd is the weight-decay hyper-parameter. λwd is crucial, as it determines the strength of the weight penalty relative to the main loss (e.g., cross-entropy). A small λwd may cause overfitting, while a large value can yield a low-weight-norm solution that fits the training data poorly. Thus, selecting an appropriate λwd value is essential for achieving an optimal balance.
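As a minimal sketch (again illustrative rather than the paper's implementation), the ℓ2 penalty of Eq. (3) can be added explicitly to the cross-entropy loss:

import torch.nn.functional as F

def loss_with_weight_decay(model, x, y, lambda_wd):
    # Main loss: Xent(f(x, w), y).
    xent = F.cross_entropy(model(x), y)
    # Explicit penalty: (lambda_wd / 2) * ||w||_2^2 summed over all parameters.
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return xent + 0.5 * lambda_wd * l2

For plain SGD, passing weight_decay=lambda_wd to the optimizer has the same effect as this explicit penalty; for adaptive optimizers the two can differ, which is precisely the distinction studied by Loshchilov & Hutter (2017).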
2.1.2 Robust overfitting phenomenon revisited
To study robust overfitting, we focus on evaluating ℓ∞ adversarial robustness on the CIFAR-10 dataset while limiting the adversarial budget of the attacker to ϵ = 8 – a common setting for evaluating robustness. For these experiments, we use a WideResNet 28-10 architecture (Zagoruyko & Komodakis, 2016) and the widely adopted PGD adversarial training (Madry et al., 2017) to solve the adversarial training loss with weight decay regularization:
\min_{w} \max_{\|\delta\|_\infty \leq 8} \text{Xent}(f(x+\delta, w), y) + \frac{\lambda_{wd}}{2}\|w\|_2^2. \qquad (4)
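Putting the pieces together, a hedged sketch of one training step for Eq. (4), reusing the illustrative linf_attack and loss_with_weight_decay helpers above, might look like:

def robust_training_step(model, optimizer, x, y, eps, step_size, steps, lambda_wd):
    # Inner maximization: PGD-style l_inf attack on the current model.
    x_adv = linf_attack(model, x, y, eps, step_size, steps)
    # Outer minimization: cross-entropy on adversarial examples plus the l2 penalty.
    optimizer.zero_grad()
    loss = loss_with_weight_decay(model, x_adv, y, lambda_wd)
    loss.backward()
    optimizer.step()
    return loss.item()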