Exploitability Minimization in Games and Beyond Denizalp Goktas Department of Computer Science

2025-05-06 2 0 959.96KB 28 页 10玖币

侵权投诉

Exploitability Minimization in Games and Beyond

Denizalp Goktas

Department of Computer Science

Brown University

Providence, RI 02906, USA

denizalp_goktas@brown.edu

Amy Greenwald

Brown University

Providence, RI 02906, USA

amy_greenwald@brown.edu

Abstract

Pseudo-games are a natural and well-known generalization of normal-form games,

in which the actions taken by each player aect not only the other players’ payos,

as in games, but also the other players’ strategy sets. The solution concept par

excellence for pseudo-games is the generalized Nash equilibrium (GNE), i.e., a

strategy prole at which each player’s strategy is feasible and no player can

improve their payos by unilaterally deviating to another strategy in the strategy

set determined by the other players’ strategies. The computation of GNE in

pseudo-games has long been a problem of interest, due to applications in a wide

variety of elds, from environmental protection to logistics to telecommunications.

Although computing GNE is PPAD-hard in general, it is still of interest to try to

compute them in restricted classes of pseudo-games. One approach is to search for

a strategy prole that minimizes exploitability, i.e., the sum of the regrets across

all players. As exploitability is nondierentiable in general, developing ecient

rst-order methods that minimize it might not seem possible at rst glance. We

observe, however, that the exploitability-minimization problem can be recast

as a min-max optimization problem, and thereby obtain polynomial-time rst-

order methods to compute a renement of GNE, namely the variational equilibria

(VE), in convex-concave cumulative regret pseudo-games with jointly convex

constraints. More generally, we also show that our methods nd the stationary

points of the exploitability in polynomial time in Lipschitz-smooth pseudo-games

with jointly convex constraints. Finally, we demonstrate in experiments that our

methods not only outperform known algorithms, but that even in pseudo-games

where they are not guaranteed to converge to a GNE, they may do so nonetheless,

with proper initialization.

1 Introduction

Nash equilibrium [1] is the canonical solution concept used to predict the outcome of models of

rational agents known as games. A

game

comprises a set of players, each of whom simultaneously

chooses a strategy (or action) from a strategy set, and then receives a payo based on all the players’

choices. A

Nash equilibrium (NE)

is a strategy prole, i.e., a collection of strategies, one per

player, from which no player can improve their payo via a unilateral deviation.

Recently, the literature on NE has expanded beyond classic economic applications to the question of

its computation, a problem known as the

Nash equilibrium problem (NEP)

[2, 3]. Unfortunately,

it is unlikely that there exists an ecient algorithm to compute NE, as NEP is PPAD-complete, i.e.,

NEP is equivalent to the problem of computing solutions to a class of xed-point problems known

as PPAD [4], which are believed to be computationally intractable [5, 6, 7]. Nonetheless, there has

been a great deal of recent progress developing ecient algorithms to compute NE in special classes

of games, such as two-player zero-sum [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

arXiv:2210.10207v1 [cs.GT] 18 Oct 2022

26, 27, 28, 29, 30, 31], potential [32, 33, 34], and monotone (or, more generally, variationally stable)

games [35, 36, 37, 38, 39, 40, 41]. Interest in two-player zero-sum games, for example, stems in part

from emerging applications in machine learning, e.g., the training of GANs [42, 43].

A natural generalization of games are

pseudo-games

abstract economies

, rst introduced by

Arrow and Debreu [44] in the context of markets, in which the actions taken by each player aect not

only the other players’ payos, as in games, but also the other players’ strategy sets. In other words,

players (simultaneously) take actions that can restrict the other players’ strategy sets. Such a model

is not technically a game, giving rise to the pseudo-game terminology. Pseudo-games generalize

games, and hence are even more widely applicable. Some recently studied applications include

adversarial classication [45, 46], energy resource allocation [47, 48], environmental protection [49,

50], cloud computing [51, 52], adversarial classication, ride sharing services [53], transportation

[54], wireless and network communication [55, 56].

The solution concept par excellence for pseudo-games is the

generalized Nash equilibrium

(GNE)

[44, 57, 58], i.e., a strategy prole at which each player’s strategy is feasible and no player can

improve their payos by unilaterally deviating to another strategy in the strategy set determined by

the other players’ strategies. The aforementioned applications speak to the importance of developing

algorithms that can approximate GNEs eciently in the largest class of pseudo-games possible.

Work in this direction, on the

generalized Nash equilibrium problem (GNEP)

, is progressing;

see, for example, [59, 57, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77]. Nonetheless,

there are still few, if any [78], GNE-nding algorithms with computational guarantees, even for

restricted classes of pseudo-games.

Exploitability

, or the Nikoida-Isoda potential [79], is dened as the sum of the players’ payo-

maximizing unilateral deviations w.r.t. a given strategy prole. Minimizing exploitability is a logical

approach to computing GNE, especially in cases where exploitability is convex, such as in the

pseudo-games that result from replacing each player’s payo function in a monotone pseudo-game

by a second-order Taylor approximation [80]. When the players’ payos are also Lipschitz-smooth

and strongly concave (in their own action), this approximation can be quite good, because the error

in the approximation will be bounded by the Lipschitz-smoothness and strong concavity constant.

Minimizing exploitability can thus be an eective way of nding GNEs in pseudo-games; however, to

our knowledge, no convergence rates for this approach are known, and few methods have been proposed

for pseudo-games whose exploitability is non-convex.

In this paper, we develop ecient exploitability-minimization algorithms that converge to variational

equilibrium (VE), a renement of GNE, exactly in a large class of pseudo-games with jointly convex

constraints (which includes, but is not limited to, two-player zero-sum,

-player pairwise zero-sum,

and a large class of monotone and bilinear pseudo-games, as well as Cournot oligopoly games); and

approximately, in Lipschitz-smooth pseudo-games with jointly convex constraints, where exact

computation is intractable. Our algorithms apply not only to games with discrete action spaces

but also to games with continuous action spaces; previous exploitability-minimization algorithms

concern only nite games [81, 82].

The

cumulative regret

of two strategy proles is dened as the sum, across all players, of the

change in utility due to unilateral deviations from one strategy prole to the other. Our algorithms

and their analysis rely on the following key insights: assuming jointly convex constraints (i.e.,

symmetric and convex-valued joint constraint correspondences) the exploitability-minimization

problem is equivalent to solving a cumulative regret min-max optimization problem, which although

non-convex-concave in general can nonetheless be solved by

gradient descent ascent (GDA)

methods

[25, 85] using a parametric variant of Moreau’s envelope theorem that we introduce

(Theorem 4.1). We apply this logic to develop two algorithms for Lipschitz-smooth pseudo-games

with jointly convex constraints. To the best of our knowledge, this makes ours the rst exploitability-

minimization methods with convergence rate guarantees in this class of pseudo-games.

The rst (EDA; Algorithm 1) is an extragradient method that converges in average iterates to an

-VE, and hence an

-GNE, in

O(1

/ε)

iterations in pseudo-games with convex-concave cumulative

regret, where

is the exploitability of the game (Theorem 3.4). This class of pseudo-games includes

zero-sum, potential, Cournot, a large class of monotone pseudo-games, and a class of bilinear

When the objective function is convex-concave, a saddle point is guaranteed to exist, in which case this

optimization problem can be solved using simultaneous-GDA-style algorithms (e.g., [21, 83, 84]).

pseudo-games, namely those with convex second-order approximations. Our second algorithm

(ADA; Algorithm 2) is a nested GDA algorithm, whose best iterate converges more generally to a

stationary point of (regularized) exploitability in all Lipschitz-smooth pseudo-games with jointly

convex constraints at the slightly slower rate of

O(log(1

/ε)

(Theorem 4.2). When the (regularized)

exploitability is additionally invex, our method also converges to a GNE in best iterates at the

same rate. This convergence can be strengthened to convergence in last iterates in

O(log(1

/ε)2)

iterations in strongly-convex exploitability pseudo-games and in a certain class of pseudo-games

with non-convex exploitability (Theorem 4.3).

Pseudo-games with jointly convex constraints are of interest in decentralized resource allocation

problems with feasibility (e.g., supply) constraints. Feasibility constraints are almost always joint,

and often also jointly convex, since we generally require the sum of the allocations to be less

than or equal to the supply. One of the many examples of such pseudo-games is wireless and

network communication, where agents communicate packets across a shared network, with limited

capacities [55]. The experiments we run in this paper suggest that our algorithms have the potential

to yield improved solutions to such pseudo-games as compared to other known approaches, and

thus open up avenues for practical applications of our work in the future.

Related Work

Following Arrow and Debreu’s introduction of GNE, Rosen [86] initiated the

study of the mathematical and computational properties of GNE in pseudo-games with jointly

convex constraints, proposing a projected gradient method to compute GNE. Thirty years later,

Uryas’ev and Rubinstein [87] developed the rst relaxation methods for nding GNEs, which were

improved upon in subsequent works [88, 89]. Two other types of algorithms were also introduced

to the literature: Newton-style methods [59, 64, 68, 69, 70, 90] and interior-point potential methods

[90]. Many of these approaches are based on minimizing the exploitability of the pseudo-game, but

others use variational inequality[91, 92] and Lemke methods [93].

More recently, novel methods that transform GNEP to NEP were analyzed. These models take the

form of either exact penalization methods, which lift the constraints into the objective function via

a penalty term [72, 73, 76, 77, 94], or augmented Lagrangian methods [71, 74, 76, 95], which do the

same, augmented by dual Lagrangian variables. Using these methods, Jordan, Lin, and Zampetakis

[78] provide the rst convergence rates to a

-GNE in monotone (resp. strongly monotone) pseudo-

games with jointly ane constraints in

O(1

/ε)

(

O(1

/√ε)

) iterations. These algorithms, despite being

highly ecient in theory, are numerically unstable in practice [78]. Nearly all of the aforementioned

approaches concerned pseudo-games with jointly convex constraints.

Exploitability minimization has also been a valuable tool in multi-agent reinforcement learning;

algorithms in this literature that aim to minimize exploitability are known as exploitability descent

algorithms [81]. Lockhart et al. [81] analyzed exploitability descent in two-player, zero-sum,

extensive-form games with nite action spaces. Variants of exploitability-descent have also been

combined with entropic regularization and homotopy methods to solve for NE in large games [82].

2 Preliminaries

Notation

We use caligraphic uppercase letters to denote sets (e.g.,

); bold lowercase letters

to denote vectors (e.g.,

p,π

); bold uppercase letters to denote matrices (e.g.,

) and lowercase

letters to denote scalar quantities (e.g.,

x, δ

). We denote the

th row vector of a matrix (e.g.,

)

by the corresponding bold lowercase letter with subscript

(e.g.,

xi)

. Similarly, we denote the

entry of a vector (e.g.,

) by the corresponding Roman lowercase letter with subscript

(e.g.,

xij

). We denote functions by a letter determined by the value of the function, e.g.,

if the

mapping is scalar-valued,

if the mapping is vector-valued, and

if the mapping is set-valued.

We denote the set of integers

{1, . . . , n}

[n]

, the set of natural numbers by

, the set of real

numbers by

, and the positive and strictly positive elements of a set by a

and

subscript,

respectively, e.g.,

R+

and

R++

. We dene the gradient

∇x:C1(X)→C0(X)

as the operator

which takes as input a functional

f:X → R

, and outputs a vector-valued function consisting of

the partial derivatives of

w.r.t.

. We denote the orthogonal projection onto a set

ΠC

, i.e.,

ΠC(x) = arg miny∈Ckx−yk2

. Finally, we write

X(a)

to denote the indicator function of the

set X, i.e., X(x)=0if x∈ X and X(x) = ∞if x/∈ X.

pseudo-game

[44]

= (n, A,X,h,u)

comprises

n∈N+

players, each

i∈[n]

of whom

chooses an

from an action space

. We denote the players’ joint action space by

A=×i∈[n]Ai

Each player

aims to maximize their continuous utility

ui:A → R

, which is concave in

by choosing a feasible action from a set of actions

Xi(a−i)⊆ Ai

determined by the actions

a−i∈ A−i⊂Rm

of the other players, where

Xi:A−i⇒Ai

is a non-empty, continuous, compact-

and convex-valued action correspondence, which for convenience we represent as

Xi(a−i) =

{ai∈ Ai|hik(ai,a−i)≥0,for all k∈[d]}

, where for all

i∈[n]

, and

k∈[d]

hik

is a

continuous and concave function in

, which denes the constraints. We denote the product

of the feasible action correspondences of all players by

X(a) =×i∈[n]Xi(a−i)

, which we note

is guaranteed to be non-empty, continuous, and compact-valued, but not necessarily convex-

valued. Additionally, overloading notation, we dene the set of jointly feasible strategy proles

X={a∈ A | hik(a)≥0,for all i∈[n], k ∈[d]}.

A two player pseudo-game is called

zero-sum

if for all

a∈ A

u1(a) = −u2(a)

. A pseudo-game

is called

monotone

if for all

a,b∈ A Pi∈[n]∇aiui(a)− ∇aiui(b)T(a−b)≤0

. A pseudo-

game is said to have

joint constraints

if there exists a function

g:A → Rd

s.t.

g=h1=. . . =hd

If additionally, for all

k∈[d]

gk(a)

is concave in

, then we say that the pseudo-game has

jointly

convex constraints

, as this assumption implies that

is a convex set. A

game

[1] is a pseudo-

game where, for all players

i∈[n]

is a constant correspondence, i.e., for all players

i∈[n],

Xi(a−i) = Xi(b−i)

, for all

a,b∈ A

. Moreover, while

X(a)

is convex, for all action proles

a∈ A

Xis not guaranteed to be convex unless one assumes joint convexity.

Given a pseudo-game

, an

generalized Nash equilibrium (GNE)

is strategy prole

a∗∈

X(a∗)

s.t. for all

i∈[n]

and

ai∈ Xi(a∗

−i)

ui(a∗)≥ui(ai,a∗

−i)−ε

. An

variational equilib-

rium (VE)

(or

normalized GNE

) of a pseudo-game is a strategy prole

a∗∈ X(a∗)

s.t. for all

i∈[n]

and

a∈ X

ui(a∗)≥ui(ai,a∗

−i)−ε

. We note that in the above denitions, one could just

as well write a∗∈ X(a∗)as a∗∈ X, as any xed point of the joint action correspondence is also

a jointly feasible action prole, and vice versa. A GNE (VE) is an

-GNE (VE) with

ε= 0

. Under

our assumptions, while GNE are guaranteed to exist in all pseudo-games by Arrow and Debreu’s

lemma on abstract economies [44], VE are only guaranteed to exist in pseudo-games with jointly

convex constraints [65]. Note that the set of

-VE of a pseudo-game is a subset of the set of the

-GNE, as

X(a∗)⊆ X

, for all

a∗

which are GNE of

. The converse, however, is not true, unless

A⊆X. Further, when Gis a game, GNE and VE coincide; we refer to this set simply as NE.

Mathematical Preliminaries

Finally, we dene several mathematical concepts that are relevant

to our convergence proofs. Given

A⊂Rn

, the function

f:A → R

is said to be

Lipschitz-

continuous

i

∀x1,x2∈ X,kf(x1)−f(x2)k ≤ `fkx1−x2k

. If the gradient of

`∇f

Lipschitz-continuous, we then refer to

`∇f

Lipschitz-smooth

. A function

f:X → R

is said

to be

invex

w.r.t. a function

h:X × X → X

f(x)−f(y)≥h(x,y)· ∇f(y)

, for all

x,y∈ X

A function

f:X → R

convex

if it is invex w.r.t.

h(x,y) = x−y

and concave if

−f

is convex.

A function

f:X → R

strongly convex (SC)

f(x1)≥f(x2) + h∇xf(x2),x1−x2i+

/2kx1−x1k2, and µ-strongly concave if −fis µ-strongly convex.

3 From Exploitability Minimization To Min-Max Optimization

In this section, we reformulate the exploitability minimization problem as a min-max optimization

problem. This reformulation is a consequence of the following observation, which is key to our

analysis: In pseudo-games for which a VE exists, e.g., pseudo-games with jointly convex constraints,

the quasi-optimization problem of minimizing exploitability, whose solutions characterize the set of

GNEs, reduces to the reduces to a standard min-max optimization problem (with independent constraint

sets), which characterizes the set of VEs, a subset of the GNEs. This new perspective allows us

to eciently solve GNEP by computing VEs in a large class of pseudo-games as an immediate

consequence of known results on the computation of saddle points in convex-concave min-max

optimization problems.

Given a pseudo-game

, we dene the

regret

felt by any player

i∈[n]

for an action

as compared to another action

, given the action prole

a−i

of other players, as follows:

Regreti(ai,bi;a−i) = ui(bi,a−i)−ui(ai,a−i)

. Additionally, the

cumulative regret

, or the

Nikaida-Isoda function

ψ:A × A → R

between two action proles

a∈ X

and

b∈ X

across all players in a pseudo-game is given by

ψ(a,b) = Pi∈[n]Regreti(ai,bi;a−i)

. Further,

the

exploitability

, or

Nikaido-Isoda potential function

ϕ:A → R

of an action prole

dened as

ϕ(a) = Pi∈[n]maxbi∈Xi(a−i)Regreti(ai,bi;a−i)

[57]. Notice that the max is taken

over Xi(a−i), since a player can only deviate within the set of available strategies.

A well-known [57]2result is that any unexploitable strategy prole in a pseudo-game is a GNE.

Lemma 3.1.

Given a pseudo-game

, for all

a∈ A

ϕ(a)≥0

. Additionally, a strategy prole

a∗∈ X(a∗)is a GNE i it achieves the lowerbound, i.e., ϕ(a∗) = 0.

This lemma tells us that we can reformulate GNEP as the quasi-optimization problem of minimizing

exploitability, i.e.,

mina∈X(a)ϕ(a)

. Here, “quasi” refers to the fact that a solution to this problem

is both a minimizer of

and a xed point

a∗

s.t.

a∗∈ X(a∗)

Despite this reformulation of

GNEP in terms of exploitability, no exploitability-minimization algorithms with convergence rate

guarantees are known. The unexploitability (!) of exploitability may be due to the fact that it is not

Lipschitz-smooth. The key insight that allows us to obtain convergence guarantees is that we treat

the GNE problem not as a quasi-minimization problem, but rather as a quasi-min-max optimization

problem, namely mina∈X(a)ϕ(a) = mina∈X(a)maxb∈X(a)ψ(a,b).

Observation 3.2. Given a pseudo-game G, for all a∈ A,ϕ(a) = maxb∈X(a)ψ(a,b).

Proof.

The per-player maximum operators can be pulled out of the sum, as the

th player’s best

action in hindsight is independent of the other players’ best actions in hindsight, since the action

prole ais xed:

ϕ(a) = X

i∈[n]

max

bi∈Xi(a−i)Regreti(ai,bi;a−i) = max

b∈X(a)X

i∈[n]

Regreti(ai,bi;a−i) = max

b∈X(a)ψ(a,b)

That is, the optimization problem in the

th summand is independent of the optimization

problem in the other summands, e.g., for any

y∈[0,1

/2]2

maxx1∈[0,1]:x1−y1≥0{x1y1}+

maxx2∈[0,1]:x2−y2≥0{x2y2}= maxx∈[0,1]2:x−y≥0x1y1+x2y2.

If we restrict our attention to VEs, a subset of GNEs, VE exploitability—hereafter exploitability,

for short—is conveniently expressed as

ϕ(a) = maxb∈X ψ(a,b)

. When a VE exists, e.g., in

pseudo-games with jointly convex constraints, this exploitability is guaranteed to achieve the

lower bound of 0 for some

a∈ X

. In such cases, formulation of exploitability minimization as a

quasi-min-max optimization problem reduces to a standard min-max optimization problem, namely

mina∈X maxb∈X ψ(a,b), which characterizes VEs.

This problem is well understood when

is a convex-concave objective function [21, 83, 84, 96].

Furthermore, the cumulative regret

is indeed convex-concave, i.e., convex in

and concave in

in many pseudo-games of interest: e.g., two-player zero-sum,

-player pairwise zero-sum, and a

large class of monotone and bilinear pseudo-games, as well as Cournot oligopoly games.

Going forward, we restrict our attention to pseudo-games satisfying the following assumptions.

Lipschitz-smoothness is a standard assumption in the convex optimization literature [97], while

joint convexity is a standard assumption in the GNE literature [57]. Furthermore, these are weaker

assumptions than ones made to obtain the few known results on convergence rates to GNEs [78].

Assumption 3.3.

The pseudo-game

has joint constraints

g:A → Rd

, and additionally, 1. (Lipschitz

smoothness and concavity) for any player

i∈[n]

, their utility function

`∇u

-Lipchitz smooth;

2. (Joint Convexity) gis component-wise concave.

Using the simple observation that every VE of a pseudo-game is the solution to a min-max opti-

mization problem we introduce our rst algorithm (EDA; Algorithm 1), an extragradient method

[83]. The algorithm works by interleaving extragradient ascent and descent steps: at iteration

given

a(t)

, it ascends on

ψ(a(t),·)

, thereby generating a better response

b(t+1)

, and then descends

ψ(·,b(t+1))

, thereby decreasing exploitability. We combine several known results about the

2yet hard to exploit(!)

This problem could also be formulated as minimizing over the set of jointly feasible actions, i.e.,

mina∈X ϕ(a); however, without assuming joint convexity, Xis not necessarily convex.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

ExploitabilityMinimizationinGamesandBeyondDenizalpGoktasDepartmentofComputerScienceBrownUniversityProvidence,RI02906,USAdenizalp_goktas@brown.eduAmyGreenwaldBrownUniversityProvidence,RI02906,USAamy_greenwald@brown.eduAbstractPseudo-gamesareanaturalandwell-knowngeneralizationofnormal-formgames,inwhic...

展开>> 收起<<

Exploitability Minimization in Games and Beyond Denizalp Goktas Department of Computer Science.pdf

共28页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Exploitability Minimization in Games and Beyond Denizalp Goktas Department of Computer Science

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: