Exploitability Minimization in Games and Beyond Denizalp Goktas Department of Computer Science

2025-05-06 0 0 959.96KB 28 页 10玖币
侵权投诉
Exploitability Minimization in Games and Beyond
Denizalp Goktas
Department of Computer Science
Brown University
Providence, RI 02906, USA
denizalp_goktas@brown.edu
Amy Greenwald
Brown University
Providence, RI 02906, USA
amy_greenwald@brown.edu
Abstract
Pseudo-games are a natural and well-known generalization of normal-form games,
in which the actions taken by each player aect not only the other players’ payos,
as in games, but also the other players’ strategy sets. The solution concept par
excellence for pseudo-games is the generalized Nash equilibrium (GNE), i.e., a
strategy prole at which each player’s strategy is feasible and no player can
improve their payos by unilaterally deviating to another strategy in the strategy
set determined by the other players’ strategies. The computation of GNE in
pseudo-games has long been a problem of interest, due to applications in a wide
variety of elds, from environmental protection to logistics to telecommunications.
Although computing GNE is PPAD-hard in general, it is still of interest to try to
compute them in restricted classes of pseudo-games. One approach is to search for
a strategy prole that minimizes exploitability, i.e., the sum of the regrets across
all players. As exploitability is nondierentiable in general, developing ecient
rst-order methods that minimize it might not seem possible at rst glance. We
observe, however, that the exploitability-minimization problem can be recast
as a min-max optimization problem, and thereby obtain polynomial-time rst-
order methods to compute a renement of GNE, namely the variational equilibria
(VE), in convex-concave cumulative regret pseudo-games with jointly convex
constraints. More generally, we also show that our methods nd the stationary
points of the exploitability in polynomial time in Lipschitz-smooth pseudo-games
with jointly convex constraints. Finally, we demonstrate in experiments that our
methods not only outperform known algorithms, but that even in pseudo-games
where they are not guaranteed to converge to a GNE, they may do so nonetheless,
with proper initialization.
1 Introduction
Nash equilibrium [1] is the canonical solution concept used to predict the outcome of models of
rational agents known as games. A
game
comprises a set of players, each of whom simultaneously
chooses a strategy (or action) from a strategy set, and then receives a payo based on all the players’
choices. A
Nash equilibrium (NE)
is a strategy prole, i.e., a collection of strategies, one per
player, from which no player can improve their payo via a unilateral deviation.
Recently, the literature on NE has expanded beyond classic economic applications to the question of
its computation, a problem known as the
Nash equilibrium problem (NEP)
[2, 3]. Unfortunately,
it is unlikely that there exists an ecient algorithm to compute NE, as NEP is PPAD-complete, i.e.,
NEP is equivalent to the problem of computing solutions to a class of xed-point problems known
as PPAD [4], which are believed to be computationally intractable [5, 6, 7]. Nonetheless, there has
been a great deal of recent progress developing ecient algorithms to compute NE in special classes
of games, such as two-player zero-sum [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.10207v1 [cs.GT] 18 Oct 2022
26, 27, 28, 29, 30, 31], potential [32, 33, 34], and monotone (or, more generally, variationally stable)
games [35, 36, 37, 38, 39, 40, 41]. Interest in two-player zero-sum games, for example, stems in part
from emerging applications in machine learning, e.g., the training of GANs [42, 43].
A natural generalization of games are
pseudo-games
or
abstract economies
, rst introduced by
Arrow and Debreu [44] in the context of markets, in which the actions taken by each player aect not
only the other players’ payos, as in games, but also the other players’ strategy sets. In other words,
players (simultaneously) take actions that can restrict the other players’ strategy sets. Such a model
is not technically a game, giving rise to the pseudo-game terminology. Pseudo-games generalize
games, and hence are even more widely applicable. Some recently studied applications include
adversarial classication [45, 46], energy resource allocation [47, 48], environmental protection [49,
50], cloud computing [51, 52], adversarial classication, ride sharing services [53], transportation
[54], wireless and network communication [55, 56].
The solution concept par excellence for pseudo-games is the
generalized Nash equilibrium
(GNE)
[44, 57, 58], i.e., a strategy prole at which each player’s strategy is feasible and no player can
improve their payos by unilaterally deviating to another strategy in the strategy set determined by
the other players’ strategies. The aforementioned applications speak to the importance of developing
algorithms that can approximate GNEs eciently in the largest class of pseudo-games possible.
Work in this direction, on the
generalized Nash equilibrium problem (GNEP)
, is progressing;
see, for example, [59, 57, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77]. Nonetheless,
there are still few, if any [78], GNE-nding algorithms with computational guarantees, even for
restricted classes of pseudo-games.
Exploitability
, or the Nikoida-Isoda potential [79], is dened as the sum of the players’ payo-
maximizing unilateral deviations w.r.t. a given strategy prole. Minimizing exploitability is a logical
approach to computing GNE, especially in cases where exploitability is convex, such as in the
pseudo-games that result from replacing each player’s payo function in a monotone pseudo-game
by a second-order Taylor approximation [80]. When the players’ payos are also Lipschitz-smooth
and strongly concave (in their own action), this approximation can be quite good, because the error
in the approximation will be bounded by the Lipschitz-smoothness and strong concavity constant.
Minimizing exploitability can thus be an eective way of nding GNEs in pseudo-games; however, to
our knowledge, no convergence rates for this approach are known, and few methods have been proposed
for pseudo-games whose exploitability is non-convex.
In this paper, we develop ecient exploitability-minimization algorithms that converge to variational
equilibrium (VE), a renement of GNE, exactly in a large class of pseudo-games with jointly convex
constraints (which includes, but is not limited to, two-player zero-sum,
n
-player pairwise zero-sum,
and a large class of monotone and bilinear pseudo-games, as well as Cournot oligopoly games); and
approximately, in Lipschitz-smooth pseudo-games with jointly convex constraints, where exact
computation is intractable. Our algorithms apply not only to games with discrete action spaces
but also to games with continuous action spaces; previous exploitability-minimization algorithms
concern only nite games [81, 82].
The
cumulative regret
of two strategy proles is dened as the sum, across all players, of the
change in utility due to unilateral deviations from one strategy prole to the other. Our algorithms
and their analysis rely on the following key insights: assuming jointly convex constraints (i.e.,
symmetric and convex-valued joint constraint correspondences) the exploitability-minimization
problem is equivalent to solving a cumulative regret min-max optimization problem, which although
non-convex-concave in general can nonetheless be solved by
gradient descent ascent (GDA)
methods
1
[25, 85] using a parametric variant of Moreau’s envelope theorem that we introduce
(Theorem 4.1). We apply this logic to develop two algorithms for Lipschitz-smooth pseudo-games
with jointly convex constraints. To the best of our knowledge, this makes ours the rst exploitability-
minimization methods with convergence rate guarantees in this class of pseudo-games.
The rst (EDA; Algorithm 1) is an extragradient method that converges in average iterates to an
ε
-VE, and hence an
ε
-GNE, in
O(1
/ε)
iterations in pseudo-games with convex-concave cumulative
regret, where
ε
is the exploitability of the game (Theorem 3.4). This class of pseudo-games includes
zero-sum, potential, Cournot, a large class of monotone pseudo-games, and a class of bilinear
1
When the objective function is convex-concave, a saddle point is guaranteed to exist, in which case this
optimization problem can be solved using simultaneous-GDA-style algorithms (e.g., [21, 83, 84]).
2
pseudo-games, namely those with convex second-order approximations. Our second algorithm
(ADA; Algorithm 2) is a nested GDA algorithm, whose best iterate converges more generally to a
stationary point of (regularized) exploitability in all Lipschitz-smooth pseudo-games with jointly
convex constraints at the slightly slower rate of
O(log(1
/ε)
/ε)
(Theorem 4.2). When the (regularized)
exploitability is additionally invex, our method also converges to a GNE in best iterates at the
same rate. This convergence can be strengthened to convergence in last iterates in
O(log(1
/ε)2)
iterations in strongly-convex exploitability pseudo-games and in a certain class of pseudo-games
with non-convex exploitability (Theorem 4.3).
Pseudo-games with jointly convex constraints are of interest in decentralized resource allocation
problems with feasibility (e.g., supply) constraints. Feasibility constraints are almost always joint,
and often also jointly convex, since we generally require the sum of the allocations to be less
than or equal to the supply. One of the many examples of such pseudo-games is wireless and
network communication, where agents communicate packets across a shared network, with limited
capacities [55]. The experiments we run in this paper suggest that our algorithms have the potential
to yield improved solutions to such pseudo-games as compared to other known approaches, and
thus open up avenues for practical applications of our work in the future.
Related Work
Following Arrow and Debreu’s introduction of GNE, Rosen [86] initiated the
study of the mathematical and computational properties of GNE in pseudo-games with jointly
convex constraints, proposing a projected gradient method to compute GNE. Thirty years later,
Uryas’ev and Rubinstein [87] developed the rst relaxation methods for nding GNEs, which were
improved upon in subsequent works [88, 89]. Two other types of algorithms were also introduced
to the literature: Newton-style methods [59, 64, 68, 69, 70, 90] and interior-point potential methods
[90]. Many of these approaches are based on minimizing the exploitability of the pseudo-game, but
others use variational inequality[91, 92] and Lemke methods [93].
More recently, novel methods that transform GNEP to NEP were analyzed. These models take the
form of either exact penalization methods, which lift the constraints into the objective function via
a penalty term [72, 73, 76, 77, 94], or augmented Lagrangian methods [71, 74, 76, 95], which do the
same, augmented by dual Lagrangian variables. Using these methods, Jordan, Lin, and Zampetakis
[78] provide the rst convergence rates to a
ε
-GNE in monotone (resp. strongly monotone) pseudo-
games with jointly ane constraints in
˜
O(1
/ε)
(
˜
O(1
/ε)
) iterations. These algorithms, despite being
highly ecient in theory, are numerically unstable in practice [78]. Nearly all of the aforementioned
approaches concerned pseudo-games with jointly convex constraints.
Exploitability minimization has also been a valuable tool in multi-agent reinforcement learning;
algorithms in this literature that aim to minimize exploitability are known as exploitability descent
algorithms [81]. Lockhart et al. [81] analyzed exploitability descent in two-player, zero-sum,
extensive-form games with nite action spaces. Variants of exploitability-descent have also been
combined with entropic regularization and homotopy methods to solve for NE in large games [82].
2 Preliminaries
Notation
We use caligraphic uppercase letters to denote sets (e.g.,
X
); bold lowercase letters
to denote vectors (e.g.,
p,π
); bold uppercase letters to denote matrices (e.g.,
X
) and lowercase
letters to denote scalar quantities (e.g.,
x, δ
). We denote the
i
th row vector of a matrix (e.g.,
X
)
by the corresponding bold lowercase letter with subscript
i
(e.g.,
xi)
. Similarly, we denote the
j
th
entry of a vector (e.g.,
p
or
xi
) by the corresponding Roman lowercase letter with subscript
j
(e.g.,
pj
or
xij
). We denote functions by a letter determined by the value of the function, e.g.,
f
if the
mapping is scalar-valued,
f
if the mapping is vector-valued, and
F
if the mapping is set-valued.
We denote the set of integers
{1, . . . , n}
by
[n]
, the set of natural numbers by
N
, the set of real
numbers by
R
, and the positive and strictly positive elements of a set by a
+
and
++
subscript,
respectively, e.g.,
R+
and
R++
. We dene the gradient
x:C1(X)C0(X)
as the operator
which takes as input a functional
f:X R
, and outputs a vector-valued function consisting of
the partial derivatives of
f
w.r.t.
x
. We denote the orthogonal projection onto a set
C
by
ΠC
, i.e.,
ΠC(x) = arg minyCkxyk2
2
. Finally, we write
X(a)
to denote the indicator function of the
set X, i.e., X(x)=0if x∈ X and X(x) = if x/∈ X.
3
A
pseudo-game
[44]
G.
= (n, A,X,h,u)
comprises
nN+
players, each
i[n]
of whom
chooses an
ai
from an action space
Ai
. We denote the players’ joint action space by
A=×i[n]Ai
.
Each player
i
aims to maximize their continuous utility
ui:A → R
, which is concave in
ai
,
by choosing a feasible action from a set of actions
Xi(ai)⊆ Ai
determined by the actions
ai∈ AiRm
of the other players, where
Xi:AiAi
is a non-empty, continuous, compact-
and convex-valued action correspondence, which for convenience we represent as
Xi(ai) =
{ai∈ Ai|hik(ai,ai)0,for all k[d]}
, where for all
i[n]
, and
k[d]
,
hik
is a
continuous and concave function in
ai
, which denes the constraints. We denote the product
of the feasible action correspondences of all players by
X(a) =×i[n]Xi(ai)
, which we note
is guaranteed to be non-empty, continuous, and compact-valued, but not necessarily convex-
valued. Additionally, overloading notation, we dene the set of jointly feasible strategy proles
X={a∈ A | hik(a)0,for all i[n], k [d]}.
A two player pseudo-game is called
zero-sum
if for all
a∈ A
,
u1(a) = u2(a)
. A pseudo-game
is called
monotone
if for all
a,b A Pi[n]aiui(a)− ∇aiui(b)T(ab)0
. A pseudo-
game is said to have
joint constraints
if there exists a function
g:A → Rd
s.t.
g=h1=. . . =hd
.
If additionally, for all
k[d]
,
gk(a)
is concave in
a
, then we say that the pseudo-game has
jointly
convex constraints
, as this assumption implies that
X
is a convex set. A
game
[1] is a pseudo-
game where, for all players
i[n]
,
Xi
is a constant correspondence, i.e., for all players
i[n],
Xi(ai) = Xi(bi)
, for all
a,b∈ A
. Moreover, while
X(a)
is convex, for all action proles
a∈ A
,
Xis not guaranteed to be convex unless one assumes joint convexity.
Given a pseudo-game
G
, an
ε
-
generalized Nash equilibrium (GNE)
is strategy prole
a
X(a)
s.t. for all
i[n]
and
ai∈ Xi(a
i)
,
ui(a)ui(ai,a
i)ε
. An
ε
-
variational equilib-
rium (VE)
(or
normalized GNE
) of a pseudo-game is a strategy prole
a∈ X(a)
s.t. for all
i[n]
and
a∈ X
,
ui(a)ui(ai,a
i)ε
. We note that in the above denitions, one could just
as well write a∈ X(a)as a∈ X, as any xed point of the joint action correspondence is also
a jointly feasible action prole, and vice versa. A GNE (VE) is an
ε
-GNE (VE) with
ε= 0
. Under
our assumptions, while GNE are guaranteed to exist in all pseudo-games by Arrow and Debreu’s
lemma on abstract economies [44], VE are only guaranteed to exist in pseudo-games with jointly
convex constraints [65]. Note that the set of
ε
-VE of a pseudo-game is a subset of the set of the
ε
-GNE, as
X(a)⊆ X
, for all
a
which are GNE of
G
. The converse, however, is not true, unless
A⊆X. Further, when Gis a game, GNE and VE coincide; we refer to this set simply as NE.
Mathematical Preliminaries
Finally, we dene several mathematical concepts that are relevant
to our convergence proofs. Given
ARn
, the function
f:A → R
is said to be
`f
-
Lipschitz-
continuous
i
x1,x2∈ X,kf(x1)f(x2)k ≤ `fkx1x2k
. If the gradient of
f
is
`f
-
Lipschitz-continuous, we then refer to
f
as
`f
-
Lipschitz-smooth
. A function
f:X R
is said
to be
invex
w.r.t. a function
h:X × X X
if
f(x)f(y)h(x,y)· ∇f(y)
, for all
x,y∈ X
.
A function
f:X R
is
convex
if it is invex w.r.t.
h(x,y) = xy
and concave if
f
is convex.
A function
f:X R
is
µ
-
strongly convex (SC)
if
f(x1)f(x2) + h∇xf(x2),x1x2i+
µ
/2kx1x1k2, and µ-strongly concave if fis µ-strongly convex.
3 From Exploitability Minimization To Min-Max Optimization
In this section, we reformulate the exploitability minimization problem as a min-max optimization
problem. This reformulation is a consequence of the following observation, which is key to our
analysis: In pseudo-games for which a VE exists, e.g., pseudo-games with jointly convex constraints,
the quasi-optimization problem of minimizing exploitability, whose solutions characterize the set of
GNEs, reduces to the reduces to a standard min-max optimization problem (with independent constraint
sets), which characterizes the set of VEs, a subset of the GNEs. This new perspective allows us
to eciently solve GNEP by computing VEs in a large class of pseudo-games as an immediate
consequence of known results on the computation of saddle points in convex-concave min-max
optimization problems.
Given a pseudo-game
G
, we dene the
regret
felt by any player
i[n]
for an action
ai
as compared to another action
bi
, given the action prole
ai
of other players, as follows:
Regreti(ai,bi;ai) = ui(bi,ai)ui(ai,ai)
. Additionally, the
cumulative regret
, or the
Nikaida-Isoda function
,
ψ:A × A R
between two action proles
a∈ X
and
b∈ X
4
across all players in a pseudo-game is given by
ψ(a,b) = Pi[n]Regreti(ai,bi;ai)
. Further,
the
exploitability
, or
Nikaido-Isoda potential function
,
ϕ:A → R
of an action prole
a
is
dened as
ϕ(a) = Pi[n]maxbi∈Xi(ai)Regreti(ai,bi;ai)
[57]. Notice that the max is taken
over Xi(ai), since a player can only deviate within the set of available strategies.
A well-known [57]2result is that any unexploitable strategy prole in a pseudo-game is a GNE.
Lemma 3.1.
Given a pseudo-game
G
, for all
a∈ A
,
ϕ(a)0
. Additionally, a strategy prole
a∈ X(a)is a GNE i it achieves the lowerbound, i.e., ϕ(a) = 0.
This lemma tells us that we can reformulate GNEP as the quasi-optimization problem of minimizing
exploitability, i.e.,
mina∈X(a)ϕ(a)
. Here, “quasi” refers to the fact that a solution to this problem
is both a minimizer of
ϕ
and a xed point
a
s.t.
a∈ X(a)
.
3
Despite this reformulation of
GNEP in terms of exploitability, no exploitability-minimization algorithms with convergence rate
guarantees are known. The unexploitability (!) of exploitability may be due to the fact that it is not
Lipschitz-smooth. The key insight that allows us to obtain convergence guarantees is that we treat
the GNE problem not as a quasi-minimization problem, but rather as a quasi-min-max optimization
problem, namely mina∈X(a)ϕ(a) = mina∈X(a)maxb∈X(a)ψ(a,b).
Observation 3.2. Given a pseudo-game G, for all a∈ A,ϕ(a) = maxb∈X(a)ψ(a,b).
Proof.
The per-player maximum operators can be pulled out of the sum, as the
i
th player’s best
action in hindsight is independent of the other players’ best actions in hindsight, since the action
prole ais xed:
ϕ(a) = X
i[n]
max
bi∈Xi(ai)Regreti(ai,bi;ai) = max
b∈X(a)X
i[n]
Regreti(ai,bi;ai) = max
b∈X(a)ψ(a,b)
That is, the optimization problem in the
i
th summand is independent of the optimization
problem in the other summands, e.g., for any
y[0,1
/2]2
,
maxx1[0,1]:x1y10{x1y1}+
maxx2[0,1]:x2y20{x2y2}= maxx[0,1]2:xy0x1y1+x2y2.
If we restrict our attention to VEs, a subset of GNEs, VE exploitability—hereafter exploitability,
for short—is conveniently expressed as
ϕ(a) = maxb∈X ψ(a,b)
. When a VE exists, e.g., in
pseudo-games with jointly convex constraints, this exploitability is guaranteed to achieve the
lower bound of 0 for some
a∈ X
. In such cases, formulation of exploitability minimization as a
quasi-min-max optimization problem reduces to a standard min-max optimization problem, namely
mina∈X maxb∈X ψ(a,b), which characterizes VEs.
This problem is well understood when
ψ
is a convex-concave objective function [21, 83, 84, 96].
Furthermore, the cumulative regret
ψ
is indeed convex-concave, i.e., convex in
a
and concave in
b
,
in many pseudo-games of interest: e.g., two-player zero-sum,
n
-player pairwise zero-sum, and a
large class of monotone and bilinear pseudo-games, as well as Cournot oligopoly games.
Going forward, we restrict our attention to pseudo-games satisfying the following assumptions.
Lipschitz-smoothness is a standard assumption in the convex optimization literature [97], while
joint convexity is a standard assumption in the GNE literature [57]. Furthermore, these are weaker
assumptions than ones made to obtain the few known results on convergence rates to GNEs [78].
Assumption 3.3.
The pseudo-game
G
has joint constraints
g:A → Rd
, and additionally, 1. (Lipschitz
smoothness and concavity) for any player
i[n]
, their utility function
ui
is
`u
-Lipchitz smooth;
2. (Joint Convexity) gis component-wise concave.
Using the simple observation that every VE of a pseudo-game is the solution to a min-max opti-
mization problem we introduce our rst algorithm (EDA; Algorithm 1), an extragradient method
[83]. The algorithm works by interleaving extragradient ascent and descent steps: at iteration
t
,
given
a(t)
, it ascends on
ψ(a(t),·)
, thereby generating a better response
b(t+1)
, and then descends
on
ψ(·,b(t+1))
, thereby decreasing exploitability. We combine several known results about the
2yet hard to exploit(!)
3
This problem could also be formulated as minimizing over the set of jointly feasible actions, i.e.,
mina∈X ϕ(a); however, without assuming joint convexity, Xis not necessarily convex.
5
摘要:

ExploitabilityMinimizationinGamesandBeyondDenizalpGoktasDepartmentofComputerScienceBrownUniversityProvidence,RI02906,USAdenizalp_goktas@brown.eduAmyGreenwaldBrownUniversityProvidence,RI02906,USAamy_greenwald@brown.eduAbstractPseudo-gamesareanaturalandwell-knowngeneralizationofnormal-formgames,inwhic...

展开>> 收起<<
Exploitability Minimization in Games and Beyond Denizalp Goktas Department of Computer Science.pdf

共28页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!

相关推荐

分类:图书资源 价格:10玖币 属性:28 页 大小:959.96KB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 28
客服
关注