
• Using the above algorithm as a subroutine, we develop exponential weights based algorithms that can provably find $\Delta$-rationalizable $\epsilon$-CCE using $\widetilde{O}\left(\frac{LNA}{\Delta^2}+\frac{NA}{\epsilon^2}\right)$ samples, and $\Delta$-rationalizable $\epsilon$-CE using $\widetilde{O}\left(\frac{LNA}{\Delta^2}+\frac{NA^2}{\min\{\epsilon^2,\Delta^2\}}\right)$ samples. To the best of our knowledge, these are the first guarantees for learning rationalizable approximate CCE and CE.
• We also provide reduction schemes that find $\Delta$-rationalizable $\epsilon$-CCE/CE using black-box algorithms for finding $\epsilon$-CCE/CE. Despite having slightly worse rates, these algorithms can directly leverage progress in equilibrium finding, which may be of independent interest.
1.1 Related work
Rationalizability and iterative dominance elimination. Rationalizability (Bernheim, 1984; Pearce, 1984) is a notion that captures rational reasoning in games and relaxes Nash Equilibrium. Rationalizability is closely related to the iterative elimination of dominated actions, which has been a focus of game theory research since the 1950s (Luce & Raiffa, 1957). It can be shown that an action is rationalizable if and only if it survives iterative elimination of strictly dominated actions² (Pearce, 1984). There is also experimental evidence supporting iterative elimination of dominated strategies as a model of human reasoning (Camerer, 2011).
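
As a concrete illustration (our addition, not part of the original text), the short Python sketch below performs iterated elimination of strictly dominated actions in a two-player normal-form game. For simplicity it only checks dominance by pure strategies; as footnote 2 notes, the equivalence with rationalizability requires allowing dominance by mixed strategies, so this is a simplified sketch rather than the paper's procedure.

```python
import numpy as np

def iterated_elimination(U1, U2):
    """Iteratively remove actions strictly dominated by another PURE action.
    (The paper's notion also allows dominance by mixed strategies; handling
    that requires a linear program and is omitted here for brevity.)

    U1[i, j], U2[i, j]: payoffs of players 1 and 2 when player 1 plays
    row i and player 2 plays column j. Returns surviving row/column indices.
    """
    rows, cols = list(range(U1.shape[0])), list(range(U1.shape[1]))
    changed = True
    while changed:
        changed = False
        # Row i is strictly dominated by row k if k does strictly better
        # against every surviving column.
        for i in list(rows):
            if any(all(U1[k, j] > U1[i, j] for j in cols)
                   for k in rows if k != i):
                rows.remove(i)
                changed = True
        # Symmetric check for player 2's columns.
        for j in list(cols):
            if any(all(U2[i, k] > U2[i, j] for i in rows)
                   for k in cols if k != j):
                cols.remove(j)
                changed = True
    return rows, cols

# Prisoner's dilemma: "defect" (index 1) strictly dominates "cooperate" (0).
U1 = np.array([[3, 0], [4, 1]])
print(iterated_elimination(U1, U1.T))  # ([1], [1]): only (defect, defect) survives
```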
Equilibria learning in games. There is a rich literature on applying online learning algorithms to learning equilibria in games. It is well known that if all agents have no regret, the resulting empirical average is an $\epsilon$-CCE (Young, 2004), while if all agents have no swap regret, the resulting empirical average is an $\epsilon$-CE (Hart & Mas-Colell, 2000; Cesa-Bianchi & Lugosi, 2006). Later work in this line includes faster convergence rates (Syrgkanis et al., 2015; Chen & Peng, 2020; Daskalakis et al., 2021), last-iterate convergence guarantees (Daskalakis & Panageas, 2018; Wei et al., 2020), and extensions to extensive-form games (Celli et al., 2020; Bai et al., 2022b,a; Song et al., 2022) and Markov games (Song et al., 2021; Jin et al., 2021).
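
To make this classical connection concrete, the following self-contained Python sketch (our illustration; the step size `eta` and horizon `T` are arbitrary choices, not values from the paper) runs Hedge in self-play with full-information feedback in a two-player matrix game, then measures the CCE gap of the empirical joint distribution, i.e., the largest gain any player could obtain by deviating to a fixed action.

```python
import numpy as np

def hedge_selfplay(U1, U2, T=5000, eta=0.05, seed=0):
    """Both players run Hedge (exponential weights); the empirical joint
    distribution of play forms an approximate CCE as T grows."""
    rng = np.random.default_rng(seed)
    n1, n2 = U1.shape
    s1, s2 = np.zeros(n1), np.zeros(n2)   # cumulative payoff of each action
    joint = np.zeros((n1, n2))            # empirical joint distribution
    for _ in range(T):
        p1 = np.exp(eta * (s1 - s1.max())); p1 /= p1.sum()
        p2 = np.exp(eta * (s2 - s2.max())); p2 /= p2.sum()
        a1, a2 = rng.choice(n1, p=p1), rng.choice(n2, p=p2)
        joint[a1, a2] += 1.0 / T
        # Full-information feedback: each player sees the payoff of every
        # one of its actions against the opponent's realized action.
        s1 += U1[:, a2]
        s2 += U2[a1, :]
    return joint

def cce_gap(joint, U1, U2):
    """Largest gain any player gets by deviating to a fixed action."""
    dev1 = (U1 @ joint.sum(axis=0)).max() - (joint * U1).sum()
    dev2 = (joint.sum(axis=1) @ U2).max() - (joint * U2).sum()
    return max(dev1, dev2)

# Matching pennies: a zero-sum game with payoffs in [-1, 1].
U1 = np.array([[1., -1.], [-1., 1.]])
joint = hedge_selfplay(U1, -U1)
print(cce_gap(joint, U1, -U1))  # small, and shrinking as T grows
```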
Computational and learning aspects of rationalizability. Despite their conceptual importance, rationalizability and iterative dominance elimination are not well studied from a computational or learning perspective. For iterative strict dominance elimination in two-player games, Knuth et al. (1988) provided a cubic-time algorithm and proved that the problem is P-complete. The weak dominance version of the problem was proven to be NP-complete by Conitzer & Sandholm (2005).
Hofbauer & Weibull (1996) showed that in a class of learning dynamics that includes replicator dynamics (the continuous-time variant of FTRL), all iteratively strictly dominated actions vanish over time, while Mertikopoulos & Moustakas (2010) proved similar results for stochastic replicator dynamics; however, neither work provides finite-time guarantees. Cohen et al. (2017) proved that Hedge eliminates dominated actions in finite time, but did not extend their results to the more challenging case of iteratively dominated actions.
The most closely related work in the literature is that of Wu et al. (2021) on learning rationalizable actions, who proposed the Exp3-DH algorithm to find a strategy mostly supported on rationalizable actions at a polynomial rate. Our Algorithm 2 accomplishes the same task with a faster rate, while our Algorithms 3 & 4 address the more challenging problems of finding $\epsilon$-CE/CCE that are also rationalizable. Although
² For this equivalence to hold, we need to allow dominance by mixed strategies, and correlated beliefs when there are more than two players. These conditions are met in the setting of this work.