
2 Related Work
In this section, we first review standard adversarial training with a single type of perturbation, together with its theoretical analysis. We then review adversarial training against multiple perturbations.
Adversarial training Adversarial training (AT) has been demonstrated to be one of the most effective ways to increase adversarial robustness (Szegedy et al., 2013). The key idea of AT is to augment the training set with adversarial examples during training (a minimal sketch of a single training step is given after this paragraph). Currently, most AT-based methods are trained with a single type of adversarial example, and the $\ell_p$ norm ($p = 1, 2$, or $\infty$) is commonly used to constrain the adversarial examples generated during training (Madry et al., 2017). It has been shown that AT overfits the adversarial examples in the training set and generalizes poorly to the test set. Many approaches have been proposed to improve adversarial generalization (Raghunathan et al., 2019; Schmidt et al., 2018). Meanwhile, there have been several attempts at a theoretical understanding of adversarial training, focusing mainly on convergence properties and generalization bounds. For example, Gao et al. (2019) study the convergence of adversarial training in the neural tangent kernel (NTK) regime, and Liu et al. (2020b) study the smoothness of the adversarial training loss. In terms of generalization bounds, Yin et al. (2019) and Awasthi et al. (2020) derive bounds based on Rademacher complexity, Gao et al. (2019) consider a VC-dimension bound for adversarial training, and Xing et al. (2021) study the generalization of adversarial linear regression.
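To make this pipeline concrete, the following is a minimal sketch of one $\ell_\infty$ PGD adversarial-training step. It is only an illustration, not the implementation studied in this paper; the names (model, optimizer, x, y) and the hyperparameters eps, alpha, and num_steps are assumed placeholders, and PyTorch is used only for concreteness.
\begin{verbatim}
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, num_steps=10):
    """Approximately solve the inner maximization over the l_inf ball."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step on the gradient sign, then project back to the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: update the model on adversarial examples."""
    model.eval()                 # freeze batch-norm statistics during the attack
    x_adv = pgd_linf(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
\end{verbatim}
The inner loop approximates the inner maximization over the norm ball, and the outer step performs the minimization over the model parameters, matching the min-max structure formalized in (3.1) below.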
Adversarial robustness against multiple perturbation models Recently, several works have demonstrated that adversarial training with a single type of perturbation cannot provide a good defense against other types of adversarial attacks (Tramèr and Boneh, 2019), and several ATMP algorithms have been proposed accordingly (Maini et al., 2020; Madaan et al., 2020; Zhang et al., 2021; Stutz et al., 2020). Tramèr and Boneh (2019) proposed augmenting adversarial training with different types of adversarial examples and developed two aggregation strategies, i.e., MAX and AVG. MAX trains on the worst-case adversarial example among the different attacks, while AVG trains on all types of adversarial examples; both objectives are written out at the end of this section. Following this pipeline, later works developed different aggregation strategies (e.g., MSD (Maini et al., 2020) and SAT (Madaan et al., 2020)) for better robustness or training efficiency. While these works boost adversarial robustness against multiple perturbations to some extent, the training process of ATMP is highly unstable, and there is no theoretical analysis of this instability. A theoretical understanding of the training difficulty of ATMP is important for further progress on adversarial robustness against multiple perturbations. Besides, there have also been other approaches to adversarial robustness against multiple perturbations, such as ensemble models (Maini et al., 2021; Cheng et al., 2021), preprocessing (Nandy et al., 2020), and neural architecture search (NAS) (Liu et al., 2020a). The weakness of ensemble and preprocessing methods is that their performance depends heavily on how well the different types of adversarial examples are classified or detected. These methods either achieve lower performance or consider tasks different from ours. Therefore, we mainly compare against MAX, AVG, MSD, and SAT in this work.
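For concreteness, the two strategies of Tramèr and Boneh (2019) can be written out using the notation introduced in Section 3; the symbol $\mathcal{P} \subseteq \{1, 2, \infty\}$ for the set of perturbation types is our illustrative shorthand rather than notation from the original works:
$$\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} \max_{p \in \mathcal{P}} \; \max_{\|z_i - z_i'\|_p \le \epsilon_p} g(\theta, z_i') \qquad \text{(MAX)},$$
$$\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} \frac{1}{|\mathcal{P}|}\sum_{p \in \mathcal{P}} \; \max_{\|z_i - z_i'\|_p \le \epsilon_p} g(\theta, z_i') \qquad \text{(AVG)}.$$
In words, MAX optimizes against the single strongest attack for each example, while AVG averages the per-attack worst-case losses.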
3 Preliminaries of Adversarial Training for Multiple Perturbations
Adversarial training is an approach to train a classifier that minimizes the worst-case loss within a norm-bounded constraint. Let $g(\theta, z)$ be the loss function of the standard (non-adversarial) counterpart. Given a training dataset $S = \{z_i\}_{i=1,\dots,n}$, the optimization problem of adversarial training is
$$\min_{\theta} \; \frac{1}{n}\sum_{i=1}^{n} \; \max_{\|z_i - z_i'\|_p \le \epsilon_p} g(\theta, z_i'), \qquad (3.1)$$
where $\epsilon_p$ is the perturbation threshold and $p = 1, 2$, or $\infty$ for different types of attacks. Usually, $g$ can also be written in the form of $\ell(f_\theta(x), y)$, where $f_\theta$ is the neural network to be trained and $(x, y)$ is the input-label