
to other areas of machine learning and the relevance of the
work to machine learning.
2. Preliminaries
First we cover background on game theory (Section 2.1),
rating and ranking (Section 2.2), equilibria (Section 2.3),
and game theoretic rating (Section 2.4).
2.1. Empirical Game Theory
A normal-form game is a single-step game where all players take an action simultaneously, and each receives a payoff. With $n$ players, we denote a joint strategy as the tuple $a = (a_1, \ldots, a_n) \in \mathcal{A}$, where $a_p \in \mathcal{A}_p$ and $\mathcal{A}_p$ is the strategy space available to player $p$. The payoff given to player $p$ from a joint strategy $a$ is $G_p(a)$. A player's objective is to maximize their payoff in the presence of other players, who are maximizing their own payoffs.
It is possible to construct normal-form game representations from observations of much larger systems. This process is known as empirical game theoretic analysis (EGTA) (Wellman, 2006; Walsh et al., 2002). The most common example in artificial intelligence is when studying a set of policies¹ in large extensive-form games (e.g., Chess or Go). Often the set of policies is too large to enumerate entirely, so we retain a subset of them and track their performance against one another, thereby constructing a normal-form game from the policies' expected returns within the environment (Lanctot et al., 2017). For example, a given season of the Premier League can be modeled as a normal-form game involving a set of 20 team policies, out of many possible football teams². EGTA has recently proved essential to multiagent reinforcement learning (MARL), notably in scaling to human-level StarCraft (Vinyals et al., 2019).
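To make the empirical construction concrete, the sketch below (an illustrative sketch, not the cited authors' code) averages sampled returns over repeated matches between a retained set of policies; `simulate_match` is a hypothetical function that rolls out the environment once and returns one sampled return per player:

```python
import itertools
import numpy as np

def empirical_game(num_policies, num_players, simulate_match, episodes=100):
    """Estimate payoff tensors G_p by averaging returns over rollouts.

    simulate_match(joint) is assumed to roll out the environment once for
    a joint assignment of policy indices and return one value per player.
    """
    shape = (num_policies,) * num_players
    G = [np.zeros(shape) for _ in range(num_players)]
    for joint in itertools.product(range(num_policies), repeat=num_players):
        # Expected return of each player under this joint policy choice.
        returns = np.mean([simulate_match(joint) for _ in range(episodes)],
                          axis=0)
        for p in range(num_players):
            G[p][joint] = returns[p]
    return G  # the empirical normal-form game over the retained policies
```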
2.2. Rating and Ranking
Ranking is the problem of assigning a partial ordering to a set. Rating is the more general problem of assigning a scalar value to each element of a set, which can then be used to derive a ranking. The simplest rating procedure is to take the mean performance of a strategy against all other strategies. Viewed through a game theoretic lens, this is equivalent to assuming each opponent strategy is equally likely to be encountered: the opponent player is playing a uniform distribution. The key drawback of this approach is that it is heavily influenced by which strategies happen to be available, whereas an opponent player, in practice, will favour their strongest strategies. This argument is made thoroughly in Balduzzi et al. (2018).
¹A mapping from information states to actions.
²The full permutation of all players and coaches in the world.
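For instance, with a payoff matrix $G_1$ for player 1, the uniform (mean-performance) rating is just a row mean. A minimal sketch with invented payoffs:

```python
import numpy as np

# Hypothetical payoff matrix: entry [i, j] is strategy i's payoff against
# opponent strategy j (values invented for illustration).
G1 = np.array([[0.5, 0.9, 0.1],
               [0.1, 0.5, 0.8],
               [0.9, 0.2, 0.5]])

# Mean performance against a uniform opponent distribution.
uniform_rating = G1.mean(axis=1)
ranking = np.argsort(-uniform_rating)  # a ranking derived from the rating
```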
Perhaps the best known rating algorithm is Elo (Elo, 1978).
It is used to rate players in many two-player, zero-sum
games, most famously in Chess. It is also commonly used
for evaluating artificial agents in two-player settings (Silver
et al., 2016; Schrittwieser et al., 2019). The skill (Elo) of each competitor, $a_1$, is described using a single variable $r(a_1)$ which maps to a win probability against other competitors, $a_2$: $G_1(a_1, a_2) = \left(1 + 10^{-(r(a_1) - r(a_2))/400}\right)^{-1}$. This therefore defines a two-player, symmetric, constant-sum game where competitors are strategies, with payoff defined as $G_2 = 1 - G_1 = G_1^T$. It is only suitable for describing
highly transitive games. Multi-dimensional Elo (Balduzzi et al., 2018) is a method that attempts to overcome the limitations of Elo by introducing additional variables for each competitor which describe cyclic structure in the game. This gives a more accurate approximation of the payoff; however, it does not provide a way of rating strategies on its own³. Decomposed Elo (Jaderberg et al., 2019) is a method that works for $m$ vs $m$ constant-sum team games. It is capable of assigning a rating to each competitor in the team as well as a rating for the team. However, it is also only suitable for transitive games, where both team compositions and between-team interactions are transitive.
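As an illustration of the game Elo defines, the sketch below (ratings invented) computes the win probabilities implied by a set of ratings and checks the constant-sum structure:

```python
import numpy as np

def elo_win_prob(r1, r2):
    """Elo win probability: G_1(a_1, a_2) = (1 + 10^(-(r1 - r2)/400))^(-1)."""
    return 1.0 / (1.0 + 10.0 ** (-(r1 - r2) / 400.0))

ratings = np.array([1400.0, 1600.0, 1800.0])  # invented competitor ratings
G1 = elo_win_prob(ratings[:, None], ratings[None, :])  # pairwise payoffs
G2 = 1.0 - G1
assert np.allclose(G2, G1.T)  # the game is symmetric and constant-sum
```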
2.3. Equilibria
The joint strategy distribution over the set of all joint strategies is denoted $\sigma(a) = \sigma(a_1, \ldots, a_n)$, where $\sum_{a \in \mathcal{A}} \sigma(a) = 1$ and $\sigma(a) \geq 0 \; \forall a \in \mathcal{A}$. Furthermore we define the marginal strategy distribution as $\sigma(a_p) = \sum_{q \in -p} \sum_{a_q \in \mathcal{A}_q} \sigma(a_1, \ldots, a_n)$, where $-p = \{1, \ldots, p-1, p+1, \ldots, n\}$. Sometimes it is possible to factorize the joint strategy distribution into its marginals: $\sigma(a) = \otimes_p \sigma(a_p)$. Finally, the conditional distribution $\sigma(a_{-p} | a_p) = \frac{\sigma(a)}{\sigma(a_p)}$ is defined if $\sigma(a_p) > 0$. Sometimes we may denote the joint in terms of one player's strategies versus all other players': $\sigma(a) \equiv \sigma(a_1, \ldots, a_n) \equiv \sigma(a_p, a_{-p})$.
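In tensor form these operations are single reductions. A minimal two-player sketch (the joint distribution values are invented):

```python
import numpy as np

# Hypothetical joint strategy distribution sigma(a_1, a_2).
sigma = np.array([[0.2, 0.1, 0.1],
                  [0.3, 0.2, 0.1]])
assert np.isclose(sigma.sum(), 1.0) and (sigma >= 0).all()

# Marginals: sum out the other player's strategies.
sigma_1 = sigma.sum(axis=1)
sigma_2 = sigma.sum(axis=0)

# Conditional sigma(a_{-p} | a_p), defined where sigma(a_p) > 0.
conditional = sigma / sigma_1[:, None]

# The joint factorizes into its marginals only in special cases.
factorizable = np.allclose(sigma, np.outer(sigma_1, sigma_2))
```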
A popular class of solution concepts are equilibrium based: joint distributions, $\sigma(a)$, under which, by certain definitions, no player has an incentive to deviate. The most well known is the Nash equilibrium (Nash, 1951) (NE), which is tractable, interchangeable and unexploitable in two-player, zero-sum games (Shoham & Leyton-Brown, 2009). NEs are always factorizable joint distributions. A related solution concept is correlated equilibrium (Aumann, 1974) (CE), which is more suitable for N-player, general-sum settings where players are allowed to coordinate strategies with each other if it is mutually beneficial. Furthermore, CEs are more compatible with the Bayesian perspective, and arise as the outcome of simple learning rules (Foster & Vohra, 1997; Cesa-Bianchi & Lugosi, 2006). The mechanism of implementing a CE is via a correlation device.
³It assigns vectors, not scalars, to strategies.
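To make the "no incentive to deviate" condition concrete, the sketch below (an illustrative check, not from the paper) verifies the standard two-player CE conditions: conditioned on any recommendation issued with positive probability, no unilateral deviation increases a player's expected payoff. The game and distribution are the classic chicken example often used to illustrate CEs:

```python
import numpy as np

def is_correlated_equilibrium(G, sigma, tol=1e-9):
    """Check the CE conditions for a two-player game.

    G = [G_1, G_2] are payoff matrices indexed [a_1, a_2]; sigma is a
    joint distribution over joint strategies. For each player p and each
    recommended strategy with sigma(a_p) > 0, no deviation may improve
    the expected payoff against sigma(a_{-p} | a_p).
    """
    for p in range(2):
        # Orient so rows index player p's strategies, columns the opponent's.
        Gp = G[0] if p == 0 else G[1].T
        s = sigma if p == 0 else sigma.T
        for rec in range(s.shape[0]):
            if s[rec].sum() <= 0:
                continue  # recommendation never issued
            # Unnormalized conditionals suffice: normalization cancels.
            followed = Gp[rec] @ s[rec]
            for dev in range(Gp.shape[0]):
                if Gp[dev] @ s[rec] > followed + tol:
                    return False
    return True

# Chicken-style example: symmetric payoffs, equal weight on three cells.
G1 = np.array([[6.0, 2.0],
               [7.0, 0.0]])
sigma = np.array([[1/3, 1/3],
                  [1/3, 0.0]])
print(is_correlated_equilibrium([G1, G1.T], sigma))  # True
```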