take a prohibitive amount of time to do so, and (iii) may fail unpredictably on ill-conditioned problems.
Furthermore, classical methods [16, 36, 43] (i) do not scale, and (ii) are non-differentiable. This
limits the applicability of equilibrium solution concepts in multiagent machine learning algorithms.
Therefore, there exists an important niche for approximately solving equilibria in medium-sized
normal-form games, quickly, in batches, reliably, and in a deterministic amount of time. With
appropriate care, this goal can be accomplished with a Neural Network which amortizes up-front
training cost to map normal-form payoffs to equilibrium solution concepts quickly at test time. We
propose the Neural Equilibrium Solver (NES). This network is trained to optimize a composite
objective function that weights accuracy of the returned equilibrium against auxiliary objectives
that a user may desire such as maximum entropy, maximum welfare, or minimum distance to some
target distribution. We introduce several innovations into the design and training of NES so that
it is efficient and accurate. Unlike most supervised deep learning models, NES avoids the need to
explicitly construct a labeled dataset of (game, equilibrium) pairs. Instead we derive a loss function
that can be minimized in an unsupervised fashion from only game inputs. We also exploit the
duality of the equilibrium problem. Instead of solving for equilibria in the primal space, we train
NES to solve for them in the dual space, which has a much smaller representation. We utilize a
training distribution that efficiently represents the space of all normal-form games of a desired shape
and use an invariant preprocessing step to map games at test time to this space. In terms of the
network architecture, we design a series of layers that are equivariant to symmetries in games such
as permutations of players and strategies, which reduces the number of training steps and improves
generalization performance. The network architecture is independent of the number of strategies in
the game and we show interesting zero-shot generalization to larger games. This network can either
be pretrained before being deployed, trained online alongside another machine learning algorithm, or
a mixture of both.
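As a rough illustration of what such a composite objective might look like, the sketch below combines an equilibrium-accuracy term with the auxiliary preferences mentioned above (maximum entropy, maximum welfare, distance to a target joint). The function name, weights, and the caller-supplied `equilibrium_violation` measure are assumptions for exposition, not the paper's actual loss.

```python
import numpy as np

def composite_loss(G, sigma, equilibrium_violation,
                   w_entropy=0.1, w_welfare=0.1, w_target=0.1, target=None):
    """Hypothetical composite objective weighting equilibrium accuracy against
    auxiliary objectives a user may desire. All terms and weights here are
    illustrative assumptions, not the objective used by NES.

    G:     payoff tensor, G[p] holds G_p(a) for each joint action a.
    sigma: candidate joint distribution over the joint strategy space.
    equilibrium_violation: caller-supplied measure of how far sigma is from
        an equilibrium of G (e.g. the largest CE deviation gain, Section 2).
    """
    accuracy = max(0.0, equilibrium_violation(G, sigma))
    entropy = -np.sum(sigma * np.log(sigma + 1e-12))                     # maximum entropy
    welfare = sum(np.sum(sigma * G[p]) for p in range(G.shape[0]))       # maximum welfare
    distance = 0.0 if target is None else np.sum((sigma - target) ** 2)  # target joint
    return accuracy - w_entropy * entropy - w_welfare * welfare + w_target * distance
```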
2 Preliminaries
Game Theory
Game theory is the study of the interactive behaviour of rational, payoff-maximizing
agents in the presence of other agents. The environment that the agents operate in is called a game.
We focus on a particular type of single-shot, simultaneous move game called a normal-form game.
A normal-form game consists of $N$ players, a set of strategies available to each player, $a_p \in \mathcal{A}_p$, and a payoff for each player under a particular joint action, $G_p(a)$, where $a = (a_1, \dots, a_N) = (a_p, a_{-p}) \in \mathcal{A} = \otimes_p \mathcal{A}_p$. The subscript notation $-p$ is used to mean "all players apart from player $p$".
Games are sometimes referred to by their shape, for example: $|\mathcal{A}_1| \times \dots \times |\mathcal{A}_N|$. The distribution of play is described by a joint $\sigma(a)$. The goal of each player is to maximize their expected payoff, $\sum_{a \in \mathcal{A}} \sigma(a) G_p(a)$. Players could play independently by selecting a strategy according to their marginal $\sigma(a_p)$ over joint strategies, such that $\sigma(a) = \otimes_p \sigma(a_p)$. However, this is limiting because it does not allow players to coordinate. A mediator called a correlation device could be employed to allow players to execute arbitrary joint strategies $\sigma(a)$ that do not necessarily factorize into their marginals. Such a mediator would sample from a publicly known joint $\sigma(a)$ and secretly communicate to each player their recommended strategy. Game theory is most developed in a subset of games: those with two players and a restriction on the payoffs, $G_1(a_1, a_2) = -G_2(a_1, a_2)$, known as zero-sum. Particularly in $N$-player, general-sum games, it is difficult to define a single criterion for finding solutions to games. One approach is instead to consider joints that are in equilibrium: distributions such that no player has incentive to unilaterally deviate from a recommendation.
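To make the notation concrete, the following sketch builds a small hypothetical two-player coordination game (not taken from the paper), and contrasts expected payoffs under an independent (factorized) joint with those under a correlated joint supplied by a mediator.

```python
import numpy as np

# Hypothetical 2x2 coordination game: each player receives 1 if both pick the
# same strategy and 0 otherwise. G[p][a_1, a_2] stores G_p(a_1, a_2).
G = np.stack([np.eye(2), np.eye(2)])

def expected_payoffs(G, sigma):
    """Expected payoff of each player p: the sum over joints a of sigma(a) * G_p(a)."""
    return np.array([np.sum(sigma * G[p]) for p in range(G.shape[0])])

# Independent play: the joint factorizes into marginals, sigma(a) = prod_p sigma(a_p).
marginal = np.array([0.5, 0.5])
sigma_independent = np.outer(marginal, marginal)
print(expected_payoffs(G, sigma_independent))  # [0.5 0.5]

# Correlated play: a mediator samples from a publicly known joint that does not
# factorize ("both play 0" or "both play 1", each with probability 0.5) and
# secretly tells each player only their own recommended strategy.
sigma_correlated = np.array([[0.5, 0.0], [0.0, 0.5]])
print(expected_payoffs(G, sigma_correlated))   # [1. 1.]
```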
Equilibrium Solution Concepts
Correlated Equilibria (CEs) [3] can be defined in terms of linear inequality constraints. The deviation gain of a player is the change in payoff the player achieves when deviating to action $a'_p$ from a recommended action $a''_p$, when the other players play $a_{-p}$.
$$A^{CE}_p(a'_p, a''_p, a) = A^{CE}_p(a'_p, a''_p, a_p, a_{-p}) = \begin{cases} G_p(a'_p, a_{-p}) - G_p(a''_p, a_{-p}) & a_p = a''_p \\ 0 & \text{otherwise} \end{cases} \quad (1)$$
A distribution, $\sigma(a)$, is in $\epsilon$-CE if the deviation gain is no more than some constant $\epsilon_p \leq \epsilon$ for every pair of recommendation, $a''_p$, and deviation, $a'_p$, strategies, for every player, $p$. These linear constraints geometrically form a convex polytope of feasible solutions.
$$\epsilon\text{-CE:} \quad \sum_{a \in \mathcal{A}} \sigma(a) A^{CE}_p(a'_p, a''_p, a) \leq \epsilon_p \quad \forall p \in [1, N],\ a''_p \neq a'_p \in \mathcal{A}_p \quad (2)$$
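As a concrete check of Equations (1) and (2), the sketch below (reusing the hypothetical coordination game from the earlier example; the helper names are ours, not the paper's) computes the expected CE deviation gains and reports the largest one, which must be at most $\epsilon$ for $\sigma$ to be an $\epsilon$-CE.

```python
import numpy as np

def ce_deviation_gain(G_p, p_axis, a_dev, a_rec):
    """A^CE_p(a', a'', a) from Equation (1): the payoff change for player p when
    deviating from recommendation a_rec to a_dev, nonzero only where a_p == a_rec."""
    A = np.zeros_like(G_p)
    rec = [slice(None)] * G_p.ndim
    rec[p_axis] = a_rec            # joints where player p is recommended a_rec
    dev = list(rec)
    dev[p_axis] = a_dev            # the same joints with p's action swapped to a_dev
    A[tuple(rec)] = G_p[tuple(dev)] - G_p[tuple(rec)]
    return A

def max_ce_violation(G, sigma):
    """Largest expected deviation gain over all players and recommendation/deviation
    pairs; sigma satisfies the eps-CE constraints of Equation (2) iff this is <= eps."""
    worst = -np.inf
    for p in range(G.shape[0]):
        n = G.shape[1 + p]
        for a_rec in range(n):
            for a_dev in range(n):
                if a_dev != a_rec:
                    gain = np.sum(sigma * ce_deviation_gain(G[p], p, a_dev, a_rec))
                    worst = max(worst, gain)
    return worst

# The correlated joint of the coordination game gives no player any incentive to
# deviate from their recommendation, so it is an exact (0-)CE.
G = np.stack([np.eye(2), np.eye(2)])
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])
print(max_ce_violation(G, sigma))  # -0.5, i.e. all constraints hold with eps = 0
```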