
2 RELATED WORK
2.1 Adversarial Robustness
Variants of adversarial training methods have been proposed to improve the adversarial robustness of neural networks (Zhang et al., 2019; Shafahi et al., 2019; Rice et al., 2020; Wong et al., 2019; Xie et al., 2019; Qin et al., 2019). Recent works utilize extra unlabeled data (Carmon et al., 2019; Zhai et al., 2019; Deng et al., 2021; Rebuffi et al., 2021) or synthetic data from a generative model (Gowal et al., 2021; Sehwag et al., 2021) to improve robust accuracy. Another line of work considers ensembling techniques (Tramèr et al., 2018; Sen et al., 2019; Pang et al., 2019; Zhang et al., 2022). A line of theoretical works analyzed adversarial robustness for linear models, from the trade-off between robustness and accuracy (Tsipras et al., 2018; Javanmard et al., 2020; Raghunathan et al., 2020) to generalization properties (Schmidt et al., 2018). Recent works further analyze more complex model classes such as 2-layer neural networks (Allen-Zhu and Li, 2022; Bubeck et al., 2021; Bartlett et al., 2021; Bubeck and Sellke, 2021).
Most relevant to us, prior works have considered adversarial robustness as a 2-player zero-sum game (Pal and Vidal, 2020; Meunier et al., 2021; Bose et al., 2020; Bulò et al., 2016; Perdomo and Singer, 2019; Pinot et al., 2020). For instance, Pal and Vidal (2020) proved that randomized smoothing (Cohen et al., 2019) and the FGSM attack (Goodfellow et al., 2015) form Nash equilibria. Bose et al. (2020) introduced a framework for finding adversarial examples that transfer to an unseen model in the same hypothesis class. Pinot et al. (2020) showed the non-existence of a Nash equilibrium in the adversarial robustness game when both the classifier and the adversary are deterministic. However, our setting differs from these prior works. The key assumption in the previous work (Pinot et al., 2020) is that the adversary is regularized and does not attack when the adversarial example would not change the model prediction. We instead consider the case where the adversary attacks even if the adversarial example does not change the model prediction (as in standard adversarial training).
While most works focused on the existence of a Nash equilibrium and proposed algorithms that converge to it, to the best of our knowledge, no prior work has shown that a Nash equilibrium is robust.
2.2 Dynamics in Games
The dynamics of 2-player zero-sum games have been well studied, especially when each player follows an alternating best-response strategy over a finite action space. A classical question is whether the players' actions converge to an equilibrium as the two players alternately play the game (Nash Jr, 1950; Nash, 1951). It is known that the alternating best-response strategy converges to a Nash equilibrium for many types of games, such as potential games (Monderer and Shapley, 1996b), weakly acyclic games (Fabrikant et al., 2010), aggregative games (Dindoš and Mezzetti, 2006), supermodular games (Milgrom and Roberts, 1990), and random games (Heinrich et al., 2021; Amiet et al., 2021). However, this general phenomenon may not carry over to adversarial robustness games, since these results rely on specific properties of those games and this natural learning algorithm may fail to converge even in simple games (Balcan, 2012). In addition, there are works on different strategies such as fictitious play (Brown, 1951; Robinson, 1951; Monderer and Shapley, 1996a; Benaïm and Hirsch, 1999) and its extensions to infinite action spaces (Oechssler and Riedel, 2001; Perkins and Leslie, 2014) or continuous time (Hopkins, 1999; Hofbauer and Sorin, 2006). Furthermore, there is a connection between 2-player zero-sum games and online learning: one can show that the average payoff of a player running a sub-linear-regret algorithm (such as follow-the-regularized-leader or follow-the-perturbed-leader) converges to its Nash equilibrium value (Cesa-Bianchi and Lugosi, 2006; Syrgkanis et al., 2015; Suggala and Netrapalli, 2020). However, Mertikopoulos et al. (2018) studied the dynamics of such no-regret algorithms and showed that when both players run the follow-the-regularized-leader algorithm, the actions of each player do not converge to a Nash equilibrium but instead cycle.
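To make this cycling behavior concrete, the following is a minimal sketch of our own (not code from any of the cited works): both players run follow-the-regularized-leader with an entropic regularizer (i.e., multiplicative weights) on Matching Pennies. The payoff matrix, step size, starting point, and horizon are illustrative assumptions. The running average of each player's strategy stays near the unique Nash equilibrium (1/2, 1/2), while the last iterates keep oscillating around it rather than converging.

```python
# Hedged sketch: FTRL with an entropic regularizer (multiplicative weights)
# on Matching Pennies. All constants below are illustrative assumptions.
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])      # player 1's payoff matrix; player 2 receives -x^T A y
eta = 0.1                        # FTRL step size (inverse regularization weight)
cum1 = np.array([1.0, -1.0])     # cumulative action payoffs, started off-equilibrium
cum2 = np.zeros(2)
x, y = softmax(eta * cum1), softmax(eta * cum2)
avg_x = np.zeros(2)

for t in range(1, 5001):
    cum1 += A @ y                # payoffs of player 1's two actions against current y
    cum2 += -A.T @ x             # payoffs of player 2's two actions against current x
    x, y = softmax(eta * cum1), softmax(eta * cum2)   # FTRL / exponential weights update
    avg_x += (x - avg_x) / t     # running average of player 1's strategy
    if t % 1000 == 0:
        # The running average stays near the Nash equilibrium (0.5, 0.5),
        # while the last iterate keeps oscillating instead of converging.
        print(t, "last:", x.round(3), "avg:", avg_x.round(3))
```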
3 SETUP
We consider a binary classification problem where we want to learn a linear function $f: \mathbb{R}^d \to \mathbb{R}$ such that our prediction is given by $\operatorname{sign}(f(x))$. Let $\mathcal{D}$ be the underlying distribution of $(x, y)$ and let $x = [x_1, \ldots, x_d]$. We assume that the distribution of each feature $x_i$ has a symmetrical mean and is independent of the other features given the label.
Assumption 1 (Symmetrical mean). The mean of each feature $x_i$ is symmetrical over the classes $y = -1, 1$. That is,
$$\mathbb{E}[x_i \mid y] = y\mu_i,$$
where $\mu_i$ is a constant.
Assumption 2 (Independent features given the label). The features $x_1, \ldots, x_d$ are mutually independent given the label.
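As a concrete illustration (our own choice, not a distribution specified in this section), class-conditional Gaussian features $x_i \mid y \sim \mathcal{N}(y\mu_i, \sigma_i^2)$ drawn independently per coordinate satisfy both assumptions. The sketch below samples such data and checks Assumption 1 empirically; the values of $d$, $n$, $\mu$, and $\sigma$ are illustrative assumptions.

```python
# Hedged sketch: one concrete distribution satisfying Assumptions 1 and 2.
# Any per-feature law with E[x_i | y] = y * mu_i and conditional independence
# across features would work; the Gaussian choice below is an assumption.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000
mu = rng.uniform(0.5, 1.5, size=d)        # per-feature means mu_i
sigma = rng.uniform(0.5, 1.0, size=d)     # per-feature standard deviations

y = rng.choice([-1, 1], size=n)           # labels in {-1, +1}
noise = rng.standard_normal((n, d))
x = y[:, None] * mu[None, :] + sigma[None, :] * noise   # x_i | y ~ N(y*mu_i, sigma_i^2)

# Empirical check of Assumption 1: E[x_i | y] should be close to y * mu_i.
print(np.abs(x[y == 1].mean(axis=0) - mu).max())
print(np.abs(x[y == -1].mean(axis=0) + mu).max())
```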
We study this problem with the soft-SVM objective
$$\min_{w} L(w), \qquad (1)$$