Learning Rationalizable Equilibria in Multiplayer Games
Yuanhao Wang*1, Dingwen Kong*2, Yu Bai3, Chi Jin1
1Princeton University, 2Peking University, 3Salesforce Research
yuanhao@princeton.edu, dingwenk@pku.edu.cn
yu.bai@salesforce.com, chij@princeton.edu
October 21, 2022
Abstract
A natural goal in multiagent learning besides finding equilibria is to learn rationalizable behavior,
where players learn to avoid iteratively dominated actions. However, even in the basic setting of
multiplayer general-sum games, existing algorithms require a number of samples exponential in the
number of players to learn rationalizable equilibria under bandit feedback. This paper develops the first
line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated
Equilibria (CE) whose sample complexities are polynomial in all problem parameters including the
number of players. To achieve this result, we also develop a new efficient algorithm for the simpler task
of finding one rationalizable action profile (not necessarily an equilibrium), whose sample complexity
substantially improves over the best existing results of Wu et al. (2021). Our algorithms incorporate
several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a
correlated exploration scheme and adaptive learning rates, which may be of independent interest. We
complement our results with a sample complexity lower bound showing the sharpness of our guarantees.
1 Introduction
A common objective in multiagent learning is to find various equilibria, such as Nash equilibria (NE),
correlated equilibria (CE) and coarse correlated equilibria (CCE). Generally speaking, a player in equilibrium
lacks incentive to deviate assuming conformity of other players to the same equilibrium. Equilibrium learning
has been extensively studied in the literature of game theory and online learning, and no-regret based learners
can provably learn approximate CE and CCE with both computational and statistical efficiency (Stoltz, 2005; Cesa-Bianchi & Lugosi, 2006).
However, not all equilibria are created equal. As shown by Viossat & Zapechelnyuk (2013), a CCE can
be entirely supported on dominated actions—actions that are worse off than some other strategy in all
circumstances—which rational agents should apparently never play. Approximate CE also suffers from a
similar problem. As shown by Wu et al. (2021, Theorem 1), there are examples where an $\epsilon$-CE always plays iteratively dominated actions—actions that would be eliminated when iteratively deleting strictly dominated actions—unless $\epsilon$ is exponentially small. It is also shown that standard no-regret algorithms are indeed prone to finding such seemingly undesirable solutions (Wu et al., 2021). The intrinsic reason behind this is that CCE and approximate CE may not be rationalizable, and existing algorithms can indeed fail to find rationalizable solutions.

*Equal contribution.
Task | Sample Complexity
Find all rationalizable actions (Proposition 3) | $\Omega(A^{N-1})$
Find one rationalizable action profile (Theorem 4) | $\widetilde{O}(LNA/\Delta^2)$
Learn rationalizable equilibria: $\epsilon$-CCE (Theorem 7) | $\widetilde{O}(LNA/\Delta^2 + NA/\epsilon^2)$
Learn rationalizable equilibria: $\epsilon$-CE (Theorem 12) | $\widetilde{O}(LNA/\Delta^2 + NA^2/\min\{\Delta^2, \epsilon^2\})$

Table 1: Summary of main results. Here $N$ is the number of players, $A$ is the number of actions per player, $L < NA$ is the minimum elimination length, and $\Delta$ is the error we allow for rationalizability.
Different from equilibrium notions, rationalizability (Bernheim, 1984; Pearce, 1984) looks at the game from the perspective of a single player who has no knowledge of the actual strategies of other players and only assumes common knowledge of their rationality. A rationalizable strategy avoids strictly dominated actions and, assuming other players have also eliminated their dominated actions, iteratively avoids strictly dominated actions in the resulting subgame. Rationalizability is a central solution concept in game theory (Osborne & Rubinstein, 1994) and has found applications in auctions (Battigalli & Siniscalchi, 2003) and mechanism design (Bergemann et al., 2011).
If an (approximate) equilibrium only employs rationalizable actions, it would prevent irrational behavior such
as playing dominated actions. Such equilibria are arguably more reasonable than unrationalizable ones, and
constitute a stronger solution concept. This motivates us to consider the following open question:
Can we efficiently learn equilibria that are also rationalizable?
Despite its fundamental role in multiagent reasoning, rationalizability has rarely been studied from a learning perspective until recently, when Wu et al. (2021) gave the first algorithm for learning rationalizable action profiles from bandit feedback. However, the rationalizable action profiles found by their algorithm are not necessarily equilibria, and the problem of learning rationalizable CE and CCE remains a challenging open problem.
Due to the existence of unrationalizable equilibria, running standard CE or CCE learners will not guarantee
rationalizable solutions. On the other hand, one cannot hope to first identify all rationalizable actions and
then find an equilibrium on the subgame, since even determining whether an action is rationalizable requires
exponentially many samples (see Proposition 3). Therefore, achieving rationalizability and approximate
equilibria simultaneously is nontrivial and presents new algorithmic challenges.
In this work, we address the challenges above and give a positive answer to our main question. Our
contributions, summarized in Table 1, are the following:
- As a first step, we provide a simple yet sample-efficient algorithm for identifying a $\Delta$-rationalizable action profile under bandit feedback, using only $\widetilde{O}(LNA/\Delta^2)$ samples¹ in normal-form games with $N$ players, $A$ actions per player, and minimum elimination length $L$. This greatly improves the result of Wu et al. (2021) and is tight up to logarithmic factors when $L = O(1)$.
¹Throughout this paper, we use $\widetilde{O}$ to suppress logarithmic factors in $N$, $A$, $L$, $1/\Delta$, $1/\delta$, and $1/\epsilon$.
- Using the above algorithm as a subroutine, we develop exponential-weights-based algorithms that provably find a $\Delta$-rationalizable $\epsilon$-CCE using $\widetilde{O}(LNA/\Delta^2 + NA/\epsilon^2)$ samples, and a $\Delta$-rationalizable $\epsilon$-CE using $\widetilde{O}(LNA/\Delta^2 + NA^2/\min\{\Delta^2, \epsilon^2\})$ samples. To the best of our knowledge, these are the first guarantees for learning rationalizable approximate CCE and CE.
- We also provide reduction schemes that find $\Delta$-rationalizable $\epsilon$-CCE/CE using black-box algorithms for finding $\epsilon$-CCE/CE. Despite having slightly worse rates, these reductions can directly leverage progress in equilibrium finding, which may be of independent interest.
1.1 Related work
Rationalizability and iterative dominance elimination.
Rationalizability (Bernheim, 1984; Pearce, 1984) is a notion that captures rational reasoning in games and relaxes Nash equilibrium. It is closely related to the iterative elimination of dominated actions, which has been a focus of game-theory research since the 1950s (Luce & Raiffa, 1957). It can be shown that an action is rationalizable if and only if it survives iterative elimination of strictly dominated actions² (Pearce, 1984). There is also experimental evidence supporting iterative elimination of dominated strategies as a model of human reasoning (Camerer, 2011).
Equilibria learning in games.
There is a rich literature on applying online learning algorithms to learning equilibria in games. It is well known that if all agents have no regret, the resulting empirical average is an $\epsilon$-CCE (Young, 2004), while if all agents have no swap regret, the resulting empirical average is an $\epsilon$-CE (Hart & Mas-Colell, 2000; Cesa-Bianchi & Lugosi, 2006). Later works continuing this line of research include faster convergence rates (Syrgkanis et al., 2015; Chen & Peng, 2020; Daskalakis et al., 2021), last-iterate convergence guarantees (Daskalakis & Panageas, 2018; Wei et al., 2020), and extensions to extensive-form games (Celli et al., 2020; Bai et al., 2022b,a; Song et al., 2022) and Markov games (Song et al., 2021; Jin et al., 2021).
Computational and learning aspects of rationalizability.
Despite its conceptual importance, rationalizability and iterative dominance elimination are not well studied from a computational or learning perspective. For iterative strict dominance elimination in two-player games, Knuth et al. (1988) provided a cubic-time algorithm and proved that the problem is P-complete. The weak-dominance version of the problem was proven to be NP-complete by Conitzer & Sandholm (2005).
Hofbauer & Weibull (1996) showed that in a class of learning dynamics that includes replicator dynamics (the continuous-time variant of FTRL), all iteratively strictly dominated actions vanish over time, while Mertikopoulos & Moustakas (2010) proved similar results for stochastic replicator dynamics; however, neither work provides finite-time guarantees. Cohen et al. (2017) proved that Hedge eliminates dominated actions in finite time, but did not extend their results to the more challenging case of iteratively dominated actions.
The most closely related work in the literature is that of Wu et al. (2021) on learning rationalizable actions, which proposed the Exp3-DH algorithm to find a strategy mostly supported on rationalizable actions at a polynomial rate. Our Algorithm 2 accomplishes the same task with a faster rate, while our Algorithms 3 & 4 deal with the more challenging problems of finding $\epsilon$-CE/CCE that are also rationalizable. Although Exp3-DH is based on a no-regret algorithm, it does not enjoy regret or weighted-regret guarantees and thus does not provably find rationalizable equilibria.

²For this equivalence to hold, we need to allow dominance by mixed strategies, and correlated beliefs when there are more than two players. These conditions are met in the setting of this work.
2 Preliminary
An $N$-player normal-form game involves $N$ players whose joint action space is denoted by $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_N$, and is defined by utility functions $u_1, \cdots, u_N : \mathcal{A} \to [0,1]$. Let $A = \max_{i \in [N]} |\mathcal{A}_i|$ denote the maximum number of actions per player, let $x_i$ denote a mixed strategy of the $i$-th player (i.e., a distribution over $\mathcal{A}_i$), and let $x_{-i}$ denote a (correlated) mixed strategy of the other players (i.e., a distribution over $\prod_{j \neq i} \mathcal{A}_j$). We use $\Delta(S)$ to denote the set of probability distributions over the set $S$.
Learning from bandit feedback
We consider the bandit-feedback setting where, in each round, each player $i \in [N]$ chooses an action $a_i \in \mathcal{A}_i$ and then observes a random feedback $U_i \in [0,1]$ such that
$$\mathbb{E}[U_i \mid a_1, a_2, \cdots, a_N] = u_i(a_1, a_2, \cdots, a_N).$$
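To make this feedback model concrete, the following minimal Python sketch (ours, not part of the paper; the particular game and the Bernoulli noise model are illustrative assumptions) simulates one round of play: each player submits an action and observes only a noisy sample of her own utility, whose mean equals $u_i$ at the chosen action profile.

    import numpy as np

    class BanditNormalFormGame:
        """Minimal bandit-feedback simulator for an N-player normal-form game.

        utilities: list of N arrays, each of shape (|A_1|, ..., |A_N|) with entries
        in [0, 1]; utilities[i][a_1, ..., a_N] = u_i(a_1, ..., a_N).
        """

        def __init__(self, utilities, seed=0):
            self.utilities = utilities
            self.num_players = len(utilities)
            self.rng = np.random.default_rng(seed)

        def play(self, actions):
            """Each player i observes a Bernoulli sample U_i with mean u_i(actions)."""
            assert len(actions) == self.num_players
            means = [u[tuple(actions)] for u in self.utilities]
            # Bernoulli noise keeps U_i in [0, 1] with E[U_i | actions] = u_i(actions).
            return [float(self.rng.random() < m) for m in means]

    # Illustrative 2-player, 2-action game (numbers are made up).
    u1 = np.array([[1.0, 0.2], [0.6, 0.4]])
    u2 = np.array([[0.3, 0.9], [0.5, 0.1]])
    game = BanditNormalFormGame([u1, u2])
    print(game.play([0, 1]))  # noisy feedback for the action profile (a_1 = 0, a_2 = 1)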
2.1 Rationalizability
An action $a \in \mathcal{A}_i$ is said to be rationalizable if it could be the best response to some (possibly correlated) belief about the other players' strategies, assuming that they are also rational. In other words, the set of rationalizable actions is obtained by iteratively removing actions that could never be a best response. For finite normal-form games, this is in fact equivalent to the iterative elimination of strictly dominated actions³ (Osborne & Rubinstein, 1994, Lemma 60.1).
Definition 1 ($\Delta$-Rationalizability). Define
$$E^{1} := \bigcup_{i=1}^{N} \left\{a \in \mathcal{A}_i : \exists x \in \Delta(\mathcal{A}_i), \forall a_{-i},\ u_i(a, a_{-i}) \le u_i(x, a_{-i}) - \Delta\right\},$$
which is the set of $\Delta$-dominated actions for all players. Further define
$$E^{l} := \bigcup_{i=1}^{N} \left\{a \in \mathcal{A}_i : \exists x \in \Delta(\mathcal{A}_i), \forall a_{-i} \text{ s.t. } a_{-i} \cap E^{l-1} = \emptyset,\ u_i(a, a_{-i}) \le u_i(x, a_{-i}) - \Delta\right\},$$
which is the set of actions that would be eliminated by the $l$-th round. Define $L = \inf\{l : E^{l+1} = E^{l}\}$ as the minimum elimination length, and $E^{L}$ as the set of $\Delta$-iteratively dominated actions ($\Delta$-IDAs). Actions in $\bigcup_{i=1}^{N} \mathcal{A}_i \setminus E^{L}$ are said to be $\Delta$-rationalizable.
Notice that $E^{1} \subseteq \cdots \subseteq E^{L} = E^{L+1}$. Here $\Delta$ plays a similar role as the reward gap for best-arm identification in stochastic multi-armed bandits. We will henceforth use $\Delta$-rationalizability and survival of $L$ rounds of iterative dominance elimination (IDE) interchangeably⁴. Since one cannot eliminate all the actions of a player, $|\cup_{i=1}^{N} \mathcal{A}_i \setminus E^{L}| \ge N$; since each elimination round before round $L$ removes at least one action, this further implies $L \le N(A-1) < NA$.
³See, e.g., the Diamond-In-the-Rough (DIR) games in Wu et al. (2021, Definition 2) for a concrete example of iterative dominance elimination.
⁴Alternatively, one can also define $\Delta$-rationalizability by the iterative elimination of actions that are never a $\Delta$-best response, which is mathematically equivalent to Definition 1 (see Appendix A.1).
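To illustrate Definition 1, here is a small Python sketch (our own illustration rather than the paper's algorithm; it assumes the utility tensors are fully known, whereas the paper works from bandit feedback only). Each dominance check solves a small linear program (via scipy.optimize.linprog) over the player's mixed strategies against all surviving opponent profiles, and elimination is repeated until nothing changes. In the usage example, B is dominated for player 1, and R becomes dominated for player 2 only after B is removed, so the minimum elimination length is $L = 2$.

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def is_dominated(i, a, utilities, surviving, delta):
        """True if action a of player i is Delta-dominated by some mixed strategy,
        given that the other players only use actions in `surviving`."""
        u = utilities[i]
        K = u.shape[i]  # number of actions of player i
        others = [sorted(surviving[j]) for j in range(len(surviving)) if j != i]
        # Variables z = (x_1, ..., x_K, t); maximize the worst-case margin t.
        c = np.zeros(K + 1)
        c[-1] = -1.0
        A_ub, b_ub = [], []
        for prof in itertools.product(*others):
            row = np.zeros(K + 1)
            for b in range(K):
                full = list(prof)
                full.insert(i, b)
                row[b] = -u[tuple(full)]      # -sum_b x_b * u_i(b, a_{-i})
            row[-1] = 1.0                      # + t
            full_a = list(prof)
            full_a.insert(i, a)
            A_ub.append(row)
            b_ub.append(-u[tuple(full_a)])     # <= -u_i(a, a_{-i})
        A_eq = [np.append(np.ones(K), 0.0)]    # sum_b x_b = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * K + [(-1, 1)])
        return res.success and -res.fun >= delta

    def iterated_elimination(utilities, delta):
        """Iteratively delete Delta-dominated actions; returns the surviving sets."""
        N = len(utilities)
        surviving = [set(range(utilities[0].shape[i])) for i in range(N)]
        while True:
            removable = [{a for a in surviving[i]
                          if is_dominated(i, a, utilities, surviving, delta)}
                         for i in range(N)]
            if not any(removable):
                return surviving
            for i in range(N):
                surviving[i] -= removable[i]   # never empties a player's action set

    # Usage: B is dominated for player 1; once B is gone, R becomes dominated for player 2.
    u1 = np.array([[1.0, 1.0], [0.0, 0.0]])   # rows: {T, B}, columns: {L, R}
    u2 = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(iterated_elimination([u1, u2], delta=0.5))  # -> [{0}, {0}], i.e., only (T, L) survives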
2.2 Equilibria in games
We consider three common learning objectives, namely Nash Equilibrium (NE), Correlated Equilibrium (CE)
and Coarse Correlated Equilibrium (CCE).
Definition 2 (Nash Equilibrium). A strategy profile $(x_1, \cdots, x_N)$ is an $\epsilon$-Nash equilibrium if
$$u_i(x_i, x_{-i}) \ge u_i(a, x_{-i}) - \epsilon, \quad \forall a \in \mathcal{A}_i,\ \forall i \in [N].$$
Definition 3 (Correlated Equilibrium). A correlated strategy $\Pi \in \Delta(\mathcal{A})$ is an $\epsilon$-correlated equilibrium if $\forall i \in [N], \forall \phi: \mathcal{A}_i \to \mathcal{A}_i$,
$$\sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(a_i, a_{-i}) \;\ge\; \sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(\phi(a_i), a_{-i}) - \epsilon.$$
Definition 4 (Coarse Correlated Equilibrium). A correlated strategy $\Pi \in \Delta(\mathcal{A})$ is an $\epsilon$-CCE if $\forall i \in [N], \forall a' \in \mathcal{A}_i$,
$$\sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(a_i, a_{-i}) \;\ge\; \sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(a', a_{-i}) - \epsilon.$$
When $\epsilon = 0$, the above definitions give exact Nash equilibrium, correlated equilibrium, and coarse correlated equilibrium, respectively. It is well known that $\epsilon$-NE are $\epsilon$-CE, and $\epsilon$-CE are $\epsilon$-CCE.
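The definitions above are finite sums, so for small games they can be checked numerically. The Python sketch below (ours; the $2 \times 2$ game and the uniform correlated strategy are made up for illustration) computes the CCE gap (best gain from deviating to a fixed action) and the CE gap (best gain from a swap function $\phi$) of a correlated strategy $\Pi$; $\Pi$ is an $\epsilon$-CCE (resp. $\epsilon$-CE) exactly when the corresponding gap is at most $\epsilon$.

    import numpy as np

    def cce_gap(Pi, utilities):
        """Largest gain any player gets by ignoring Pi and playing a fixed action."""
        gap = 0.0
        for i, u in enumerate(utilities):
            on_path = float(np.sum(Pi * u))            # E_{a ~ Pi}[u_i(a)]
            marg = Pi.sum(axis=i)                      # marginal of Pi over a_{-i}
            best_dev = max(float(np.sum(marg * np.take(u, b, axis=i)))
                           for b in range(u.shape[i]))
            gap = max(gap, best_dev - on_path)
        return gap

    def ce_gap(Pi, utilities):
        """Largest gain any player gets from a swap function phi: A_i -> A_i."""
        gap = 0.0
        for i, u in enumerate(utilities):
            player_gain = 0.0
            for a in range(u.shape[i]):
                Pi_a = np.take(Pi, a, axis=i)          # mass on profiles recommending a
                on_path = float(np.sum(Pi_a * np.take(u, a, axis=i)))
                best_dev = max(float(np.sum(Pi_a * np.take(u, b, axis=i)))
                               for b in range(u.shape[i]))
                player_gain += best_dev - on_path      # best phi deviates per recommendation
            gap = max(gap, player_gain)
        return gap

    # Made-up 2x2 game (matching pennies) and the uniform correlated strategy.
    u1 = np.array([[1.0, 0.0], [0.0, 1.0]])
    u2 = np.array([[0.0, 1.0], [1.0, 0.0]])
    Pi = np.full((2, 2), 0.25)
    print(cce_gap(Pi, [u1, u2]), ce_gap(Pi, [u1, u2]))  # both 0: uniform play is a CCE and a CE here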
2.3 Connection between Equilibria and Rationalizability
Exact equilibria.
It is known that all actions in the support of an exact NE or CE are rationalizable (Osborne & Rubinstein, 1994, Lemma 56.2). However, one can easily construct an exact CCE that is fully supported on dominated (hence, unrationalizable) actions (see, e.g., Viossat & Zapechelnyuk, 2013, Fig. 3).
Approximate equilibria.
For an $\epsilon$-Nash equilibrium with $\epsilon < \mathrm{poly}(\Delta, 1/N, 1/A)$, one can show that the equilibrium is still mostly supported on rationalizable actions.

Proposition 1. If $x^\star = (x^\star_1, \cdots, x^\star_N)$ is an $\epsilon$-Nash with $\epsilon < \frac{\Delta^2}{24 N^2 A}$, then $\forall i$, $\Pr_{a \sim x^\star_i}[a \in E^{L}] \le 2L\epsilon/\Delta$.
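To see why this means the equilibrium is mostly supported on rationalizable actions, combining the stated bound with $L < NA$ from Section 2.1 and $\Delta \le 1$ (utilities lie in $[0,1]$) gives, as a quick check on our part,
$$\Pr_{a \sim x^\star_i}[a \in E^{L}] \;\le\; \frac{2L\epsilon}{\Delta} \;<\; \frac{2L}{\Delta} \cdot \frac{\Delta^2}{24 N^2 A} \;=\; \frac{L\Delta}{12 N^2 A} \;<\; \frac{\Delta}{12N} \;\le\; \frac{1}{12N},$$
so each player places at most an $O(1/N)$ fraction of her probability mass on iteratively dominated actions.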
Therefore, for two-player zero-sum games⁵, it is possible to run an approximate NE solver and automatically find a rationalizable $\epsilon$-NE. However, this method will induce a rather slow rate, and we will provide a much more efficient algorithm for finding rationalizable $\epsilon$-NE in Section 5.
The connection between CE and rationalizability becomes quite different when it comes to approximate equilibria, which are inevitable in the presence of noise. As shown by Wu et al. (2021, Theorem 1), an $\epsilon$-CE can be entirely supported on iteratively dominated actions unless $\epsilon = O(2^{-A})$. In other words, rationalizability is not guaranteed by running an approximate CE solver unless it is run at an extremely high accuracy. Furthermore, since an exact CCE can be fully supported on dominated actions, so can an $\epsilon$-CCE, regardless of how small $\epsilon$ is. Therefore, efficiently finding $\epsilon$-CE and $\epsilon$-CCE that are simultaneously rationalizable remains a challenging open problem.
⁵For multiplayer general-sum games, finding an approximate NE is computationally hard and takes $\Omega(2^N)$ samples in the worst case.