
2.2 Robustness as a domain generalisation problem
Domain generalisation, also known as out-of-distribution (OoD) generalisation, is an approach to dealing with
(typically non-adversarial) distributional shifts. In the domain generalisation setting, the training
data is assumed to come from several different domains, each with a different data distribution. The
goal is to use the variability across training (or seen) domains to learn a model that can generalise to
unseen domains while performing well on the seen domains. In other words, the goal is for the model
to have consistent performance by learning to be invariant under distributional shifts. Typically, we
also assume access to domain labels, i.e. we know which domain each data point belongs to. Many
methods for domain generalisation have been proposed; see Wang et al. (2021) for a survey.
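As a brief formalisation (notation introduced here purely for illustration, not taken from the cited works), write $R_k(\theta) = \mathbb{E}_{(x,y) \sim \mathcal{D}_k}[\ell(f_\theta(x), y)]$ for the risk of a model $f_\theta$ on seen domain $\mathcal{D}_k$, $k = 1, \dots, K$. An idealised statement of the domain generalisation goal is then
\[
\min_{\theta} \; \max_{\mathcal{D} \in \mathcal{F}} \; \mathbb{E}_{(x,y) \sim \mathcal{D}} \big[ \ell(f_{\theta}(x), y) \big],
\]
where $\mathcal{F}$ is a family of plausible unseen domains related to $\mathcal{D}_1, \dots, \mathcal{D}_K$, even though only samples from the seen domains are available at training time.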
Our work views adversarial robustness as a domain generalisation problem, where the domains stem
from different adversarial attacks. Because different attacks use different methods of searching for
adversarial examples, and sometimes different search spaces, they may produce different distributions
of adversarial examples.² One might draw an analogy to the work of Hendrycks & Dietterich (2019) on
natural perturbations, where both the type and the strength of the perturbations play a role similar to that
of varying the attacks or their tuning, respectively. There are several reasons why the domains we
consider may be distributionally shifted with respect to one another (although the distributions may have some
overlap). To name a few non-exhaustively: first, we already evoked how different $\ell_p$-norms affect the
distributions of adversarial examples yielded by PGD (Khoury & Hadfield-Menell, 2018; Tramèr &
Boneh, 2019). Second, different attacks may optimise different losses – for example when comparing
$P_2$ and $L_2$ CW – which may yield different solutions. Third, the same attack tuned differently (e.g.
with a different $\epsilon$ or iteration budget) may yield different distributions of adversarial examples, since they do
not have the same support. Therefore, robustness to attacks unseen during training means robustness
against the corresponding distributional shifts at test time. It is natural to frame adversarial robustness
as a domain generalisation problem, as we seek a model that is robust to any method of generating
adversarial distributional shifts within a threat model, including novel attacks.
In spite of this intuition, it is not obvious that such methods would work in the case of adversarial
machine learning. First, recent work demonstrates that domain generalisation methods often fail
to improve upon standard empirical risk minimisation (ERM), i.e. minimising the loss on the
combined training domains without making use of domain labels (Gulrajani & Lopez-Paz, 2020). On
the other hand, success may depend on choosing a method appropriate for the type of shifts at play.
Second, a key difference from most work in domain generalisation is that, under adversarial training,
the training distribution shifts every epoch, as the attacks are computed from the continuously
updated values of the weights. In contrast, in domain generalisation the training domains are usually
fixed. Non-stationarity is known to cause generalisation failures in many areas of machine learning,
notably reinforcement learning (Igl et al., 2020), and may therefore affect the success of domain
generalisation methods in adversarial machine learning. Third, MSD does not generate multiple
domains, which domain generalisation approaches would typically require.
Interestingly, the Avg approach of Tramèr & Boneh (2019) can be interpreted as performing
domain generalisation with ERM over the 3 PGD adversaries as training domains. Similarly, the
max approach consists of applying the Robust Optimisation approach to the same set of domains.
Furthermore, Song et al. (2018) and Bashivan et al. (2021) propose to treat the clean and PGD-
perturbed data as training and testing domains from which some samples are accessible during
training, and adopt domain adaptation approaches. Therefore, it is difficult to predict in advance how
much a domain generalisation approach can successfully improve adversarial defences.
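To make the correspondence explicit, write $A_1, A_2, A_3$ for the three PGD adversaries and $\ell$ for the training loss (a sketch, with notation introduced here for illustration). The Avg and max objectives can then be written as
\[
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \tfrac{1}{3} \sum_{k=1}^{3} \ell\big(f_{\theta}(A_k(x)), y\big) \Big]
\quad \text{(Avg, i.e. ERM over the attack domains)},
\]
\[
\min_{\theta} \; \mathbb{E}_{(x,y)} \Big[ \max_{k \in \{1,2,3\}} \ell\big(f_{\theta}(A_k(x)), y\big) \Big]
\quad \text{(max, i.e. robust optimisation over the attack domains)},
\]
where $A_k(x)$ denotes an adversarial example crafted by attack $k$ against the current model.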
In this work, we apply the method of variance-based risk extrapolation (REx) (Krueger et al.,
2021), which simply adds the variance of the per-domain ERM losses to the training loss as a penalty.
This encourages worst-case robustness over more extreme versions of the shifts observed between the
training domains (here, the shifts between different attacks). This can be motivated in the
setting of adversarial robustness by the observation that adversaries might shift their distribution
of attacks to better exploit vulnerabilities in a model. In that light, REx is particularly appropriate
given our objective of mitigating trade-offs in performance between different attacks to achieve
a more consistent degree of robustness. We note that our implementation of REx has the same
computational complexity per epoch as the MSD, Avg and max approaches, requiring the computation
of 3 adversarial perturbations per sample.
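As an illustration only, the sketch below shows how such a REx objective over attack domains could be computed in PyTorch; the attack callables, their signature, and the penalty weight beta are placeholders rather than the exact implementation used in our experiments.

```python
import torch
import torch.nn.functional as F

def rex_adversarial_loss(model, x, y, attacks, beta=10.0):
    """Variance-based risk extrapolation (REx) over adversarial 'domains'.

    attacks: list of callables, each returning an adversarial version of x
             for the current model (e.g. PGD under different threat models).
    beta:    weight of the variance penalty (illustrative value only).
    """
    # One risk term per attack, i.e. per training domain.
    domain_losses = torch.stack(
        [F.cross_entropy(model(attack(model, x, y)), y) for attack in attacks]
    )
    erm_term = domain_losses.mean()    # average risk across the attack domains
    rex_penalty = domain_losses.var()  # variance of the per-domain risks
    return erm_term + beta * rex_penalty
```

With beta = 0 this reduces to the Avg objective above, while larger values of beta push the per-attack losses towards equality, matching our goal of a more consistent degree of robustness across attacks.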
² Another way to think about this is that if different attacks or tunings yielded identical distributions, then
standard results from statistical learning theory would imply similar performance on the various attacks.