Learning Rationalizable Equilibria in Multiplayer Games
Yuanhao Wang*1, Dingwen Kong*2, Yu Bai3, Chi Jin1
1Princeton University, 2Peking University, 3Salesforce Research
yuanhao@princeton.edu, dingwenk@pku.edu.cn
yu.bai@salesforce.com, chij@princeton.edu
October 21, 2022
Abstract
A natural goal in multiagent learning besides finding equilibria is to learn rationalizable behavior,
where players learn to avoid iteratively dominated actions. However, even in the basic setting of
multiplayer general-sum games, existing algorithms require a number of samples exponential in the
number of players to learn rationalizable equilibria under bandit feedback. This paper develops the first
line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated
Equilibria (CE) whose sample complexities are polynomial in all problem parameters including the
number of players. To achieve this result, we also develop a new efficient algorithm for the simpler task
of finding one rationalizable action profile (not necessarily an equilibrium), whose sample complexity
substantially improves over the best existing results of Wu et al. (2021). Our algorithms incorporate
several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a
correlated exploration scheme and adaptive learning rates, which may be of independent interest. We
complement our results with a sample complexity lower bound showing the sharpness of our guarantees.
1 Introduction
A common objective in multiagent learning is to find various equilibria, such as Nash equilibria (NE),
correlated equilibria (CE) and coarse correlated equilibria (CCE). Generally speaking, a player in equilibrium
lacks incentive to deviate assuming conformity of other players to the same equilibrium. Equilibrium learning
has been extensively studied in the literature of game theory and online learning, and no-regret based learners
can provably learn approximate CE and CCE with both computational and statistical efficiency (Stoltz, 2005; Cesa-Bianchi & Lugosi, 2006).
However, not all equilibria are created equal. As shown by Viossat & Zapechelnyuk (2013), a CCE can
be entirely supported on dominated actions—actions that are worse off than some other strategy in all
circumstances—which rational agents should apparently never play. Approximate CE also suffers from a
similar problem. As shown by Wu et al. (2021, Theorem 1), there are examples where an $\epsilon$-CE always plays iteratively dominated actions—actions that would be eliminated when iteratively deleting strictly dominated actions—unless $\epsilon$ is exponentially small. It is also shown that standard no-regret algorithms are indeed prone to finding such seemingly undesirable solutions (Wu et al., 2021). The intrinsic reason behind this is that CCE and approximate CE may not be rationalizable, and existing algorithms can indeed fail to find rationalizable solutions.

*Equal contribution.
Task | Sample Complexity
Find all rationalizable actions (Proposition 3) | $\Omega(A^{N-1})$
Find one rationalizable action profile (Theorem 4) | $\widetilde{O}(LNA/\Delta^2)$
Learn rationalizable equilibria: $\epsilon$-CCE (Theorem 7) | $\widetilde{O}(LNA/\Delta^2 + NA/\epsilon^2)$
Learn rationalizable equilibria: $\epsilon$-CE (Theorem 12) | $\widetilde{O}(LNA/\Delta^2 + NA^2/\min\{\Delta^2, \epsilon^2\})$

Table 1: Summary of main results. Here $N$ is the number of players, $A$ is the number of actions per player, $L < NA$ is the minimum elimination length, and $\Delta$ is the error we allow for rationalizability.
Different from equilibrium notions, rationalizability (Bernheim, 1984; Pearce, 1984) looks at the game from the perspective of a single player who has no knowledge of the actual strategies of other players and only assumes common knowledge of their rationality. A rationalizable strategy avoids strictly dominated actions and, assuming other players have also eliminated their dominated actions, iteratively avoids strictly dominated actions in the resulting subgame. Rationalizability is a central solution concept in game theory (Osborne & Rubinstein, 1994) and has found applications in auctions (Battigalli & Siniscalchi, 2003) and mechanism design (Bergemann et al., 2011).
If an (approximate) equilibrium only employs rationalizable actions, it would prevent irrational behavior such
as playing dominated actions. Such equilibria are arguably more reasonable than unrationalizable ones, and
constitute a stronger solution concept. This motivates us to consider the following open question:
Can we efficiently learn equilibria that are also rationalizable?
Despite its fundamental role in multiagent reasoning, rationalizability has rarely been studied from a learning perspective until recently, when Wu et al. (2021) gave the first algorithm for learning rationalizable action profiles from bandit feedback. However, the rationalizable action profiles found by their algorithm are not necessarily equilibria, and the problem of learning rationalizable CE and CCE remains a challenging open problem.
Due to the existence of unrationalizable equilibria, running standard CE or CCE learners will not guarantee
rationalizable solutions. On the other hand, one cannot hope to first identify all rationalizable actions and
then find an equilibrium on the subgame, since even determining whether an action is rationalizable requires
exponentially many samples (see Proposition 3). Therefore, achieving rationalizability and approximate
equilibria simultaneously is nontrivial and presents new algorithmic challenges.
In this work, we address the challenges above and give a positive answer to our main question. Our
contributions, summarized in Table 1, are the following:
- As a first step, we provide a simple yet sample-efficient algorithm for identifying a $\Delta$-rationalizable action profile under bandit feedback, using only $\widetilde{O}(LNA/\Delta^2)$ samples¹ in normal-form games with $N$ players, $A$ actions per player, and minimum elimination length $L$. This greatly improves the result of Wu et al. (2021) and is tight up to logarithmic factors when $L = O(1)$.
¹Throughout this paper, we use $\widetilde{O}$ to suppress logarithmic factors in $N$, $A$, $L$, $1/\Delta$, $1/\delta$, and $1/\epsilon$.
- Using the above algorithm as a subroutine, we develop exponential-weights-based algorithms that provably find a $\Delta$-rationalizable $\epsilon$-CCE using $\widetilde{O}(LNA/\Delta^2 + NA/\epsilon^2)$ samples, and a $\Delta$-rationalizable $\epsilon$-CE using $\widetilde{O}(LNA/\Delta^2 + NA^2/\min\{\Delta^2, \epsilon^2\})$ samples. To the best of our knowledge, these are the first guarantees for learning rationalizable approximate CCE and CE.
- We also provide reduction schemes that find $\Delta$-rationalizable $\epsilon$-CCE/CE using black-box algorithms for finding $\epsilon$-CCE/CE. Despite having slightly worse rates, these reductions can directly leverage progress in equilibrium finding, which may be of independent interest.
1.1 Related work
Rationalizability and iterative dominance elimination.
Rationalizability (Bernheim, 1984; Pearce, 1984) is a notion that captures rational reasoning in games and relaxes Nash equilibrium. It is closely related to the iterative elimination of dominated actions, which has been a focus of game-theory research since the 1950s (Luce & Raiffa, 1957). It can be shown that an action is rationalizable if and only if it survives iterative elimination of strictly dominated actions² (Pearce, 1984). There is also experimental evidence supporting iterative elimination of dominated strategies as a model of human reasoning (Camerer, 2011).
Equilibria learning in games.
There is a rich literature on applying online learning algorithms to learning equilibria in games. It is well known that if all agents have no regret, the resulting empirical average is an $\epsilon$-CCE (Young, 2004), while if all agents have no swap regret, the resulting empirical average is an $\epsilon$-CE (Hart & Mas-Colell, 2000; Cesa-Bianchi & Lugosi, 2006). Later works continuing this line of research include faster convergence rates (Syrgkanis et al., 2015; Chen & Peng, 2020; Daskalakis et al., 2021), last-iterate convergence guarantees (Daskalakis & Panageas, 2018; Wei et al., 2020), and extensions to extensive-form games (Celli et al., 2020; Bai et al., 2022b,a; Song et al., 2022) and Markov games (Song et al., 2021; Jin et al., 2021).
Computational and learning aspects of rationalizability.
Despite its conceptual importance, rationalizability and iterative dominance elimination are not well studied from a computational or learning perspective. For iterative strict dominance elimination in two-player games, Knuth et al. (1988) provided a cubic-time algorithm and proved that the problem is P-complete. The weak-dominance version of the problem was proven to be NP-complete by Conitzer & Sandholm (2005).
Hofbauer & Weibull (1996) showed that in a class of learning dynamics that includes replicator dynamics (the continuous-time variant of FTRL), all iteratively strictly dominated actions vanish over time, while Mertikopoulos & Moustakas (2010) proved similar results for stochastic replicator dynamics; however, neither work provides finite-time guarantees. Cohen et al. (2017) proved that Hedge eliminates dominated actions in finite time, but did not extend their results to the more challenging case of iteratively dominated actions.
The most closely related work in the literature is that of Wu et al. (2021) on learning rationalizable actions, which proposed the Exp3-DH algorithm to find a strategy mostly supported on rationalizable actions at a polynomial rate. Our Algorithm 2 accomplishes the same task with a faster rate, while our Algorithms 3 & 4 deal with the more challenging problems of finding $\epsilon$-CE/CCE that are also rationalizable. Although Exp3-DH is based on a no-regret algorithm, it does not enjoy regret or weighted-regret guarantees and thus does not provably find rationalizable equilibria.

²For this equivalence to hold, we need to allow dominance by mixed strategies, and correlated beliefs when there are more than two players. These conditions are met in the setting of this work.
2 Preliminary
An $N$-player normal-form game involves $N$ players whose joint action space is denoted by $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_N$, and is defined by utility functions $u_1, \cdots, u_N : \mathcal{A} \to [0,1]$. Let $A = \max_{i \in [N]} |\mathcal{A}_i|$ denote the maximum number of actions per player, let $x_i$ denote a mixed strategy of the $i$-th player (i.e., a distribution over $\mathcal{A}_i$), and let $x_{-i}$ denote a (correlated) mixed strategy of the other players (i.e., a distribution over $\prod_{j \neq i} \mathcal{A}_j$). We use $\Delta(S)$ to denote the set of probability distributions over the set $S$.
Learning from bandit feedback
We consider the bandit-feedback setting where, in each round, each player $i \in [N]$ chooses an action $a_i \in \mathcal{A}_i$ and then observes a random feedback $U_i \in [0,1]$ such that
$$\mathbb{E}[U_i \mid a_1, a_2, \cdots, a_N] = u_i(a_1, a_2, \cdots, a_N).$$
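To make this feedback model concrete, the following minimal Python sketch (ours, not part of the paper; the particular game and the Bernoulli noise model are illustrative assumptions) simulates one round of play: each player submits an action and observes only a noisy sample of her own utility, whose mean equals $u_i$ at the chosen action profile.

    import numpy as np

    class BanditNormalFormGame:
        """Minimal bandit-feedback simulator for an N-player normal-form game.

        utilities: list of N arrays, each of shape (|A_1|, ..., |A_N|) with entries
        in [0, 1]; utilities[i][a_1, ..., a_N] = u_i(a_1, ..., a_N).
        """

        def __init__(self, utilities, seed=0):
            self.utilities = utilities
            self.num_players = len(utilities)
            self.rng = np.random.default_rng(seed)

        def play(self, actions):
            """Each player i observes a Bernoulli sample U_i with mean u_i(actions)."""
            assert len(actions) == self.num_players
            means = [u[tuple(actions)] for u in self.utilities]
            # Bernoulli noise keeps U_i in [0, 1] with E[U_i | actions] = u_i(actions).
            return [float(self.rng.random() < m) for m in means]

    # Illustrative 2-player, 2-action game (numbers are made up).
    u1 = np.array([[1.0, 0.2], [0.6, 0.4]])
    u2 = np.array([[0.3, 0.9], [0.5, 0.1]])
    game = BanditNormalFormGame([u1, u2])
    print(game.play([0, 1]))  # noisy feedback for the action profile (a_1 = 0, a_2 = 1)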
2.1 Rationalizability
An action $a \in \mathcal{A}_i$ is said to be rationalizable if it could be the best response to some (possibly correlated) belief about the other players' strategies, assuming that they are also rational. In other words, the set of rationalizable actions is obtained by iteratively removing actions that could never be a best response. For finite normal-form games, this is in fact equivalent to the iterative elimination of strictly dominated actions³ (Osborne & Rubinstein, 1994, Lemma 60.1).
Definition 1 ($\Delta$-Rationalizability). Define
$$E^{1} := \bigcup_{i=1}^{N} \left\{a \in \mathcal{A}_i : \exists x \in \Delta(\mathcal{A}_i), \forall a_{-i},\ u_i(a, a_{-i}) \le u_i(x, a_{-i}) - \Delta\right\},$$
which is the set of $\Delta$-dominated actions for all players. Further define
$$E^{l} := \bigcup_{i=1}^{N} \left\{a \in \mathcal{A}_i : \exists x \in \Delta(\mathcal{A}_i), \forall a_{-i} \text{ s.t. } a_{-i} \cap E^{l-1} = \emptyset,\ u_i(a, a_{-i}) \le u_i(x, a_{-i}) - \Delta\right\},$$
which is the set of actions that would be eliminated by the $l$-th round. Define $L = \inf\{l : E^{l+1} = E^{l}\}$ as the minimum elimination length, and $E^{L}$ as the set of $\Delta$-iteratively dominated actions ($\Delta$-IDAs). Actions in $\bigcup_{i=1}^{N} \mathcal{A}_i \setminus E^{L}$ are said to be $\Delta$-rationalizable.
Notice that $E^{1} \subseteq \cdots \subseteq E^{L} = E^{L+1}$. Here $\Delta$ plays a similar role as the reward gap for best-arm identification in stochastic multi-armed bandits. We will henceforth use $\Delta$-rationalizability and survival of $L$ rounds of iterative dominance elimination (IDE) interchangeably⁴. Since one cannot eliminate all the actions of a player, $|\cup_{i=1}^{N} \mathcal{A}_i \setminus E^{L}| \ge N$; since each elimination round before round $L$ removes at least one action, this further implies $L \le N(A-1) < NA$.
³See, e.g., the Diamond-In-the-Rough (DIR) games in Wu et al. (2021, Definition 2) for a concrete example of iterative dominance elimination.
⁴Alternatively, one can also define $\Delta$-rationalizability by the iterative elimination of actions that are never a $\Delta$-best response, which is mathematically equivalent to Definition 1 (see Appendix A.1).
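To illustrate Definition 1, here is a small Python sketch (our own illustration rather than the paper's algorithm; it assumes the utility tensors are fully known, whereas the paper works from bandit feedback only). Each dominance check solves a small linear program (via scipy.optimize.linprog) over the player's mixed strategies against all surviving opponent profiles, and elimination is repeated until nothing changes. In the usage example, B is dominated for player 1, and R becomes dominated for player 2 only after B is removed, so the minimum elimination length is $L = 2$.

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def is_dominated(i, a, utilities, surviving, delta):
        """True if action a of player i is Delta-dominated by some mixed strategy,
        given that the other players only use actions in `surviving`."""
        u = utilities[i]
        K = u.shape[i]  # number of actions of player i
        others = [sorted(surviving[j]) for j in range(len(surviving)) if j != i]
        # Variables z = (x_1, ..., x_K, t); maximize the worst-case margin t.
        c = np.zeros(K + 1)
        c[-1] = -1.0
        A_ub, b_ub = [], []
        for prof in itertools.product(*others):
            row = np.zeros(K + 1)
            for b in range(K):
                full = list(prof)
                full.insert(i, b)
                row[b] = -u[tuple(full)]      # -sum_b x_b * u_i(b, a_{-i})
            row[-1] = 1.0                      # + t
            full_a = list(prof)
            full_a.insert(i, a)
            A_ub.append(row)
            b_ub.append(-u[tuple(full_a)])     # <= -u_i(a, a_{-i})
        A_eq = [np.append(np.ones(K), 0.0)]    # sum_b x_b = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * K + [(-1, 1)])
        return res.success and -res.fun >= delta

    def iterated_elimination(utilities, delta):
        """Iteratively delete Delta-dominated actions; returns the surviving sets."""
        N = len(utilities)
        surviving = [set(range(utilities[0].shape[i])) for i in range(N)]
        while True:
            removable = [{a for a in surviving[i]
                          if is_dominated(i, a, utilities, surviving, delta)}
                         for i in range(N)]
            if not any(removable):
                return surviving
            for i in range(N):
                surviving[i] -= removable[i]   # never empties a player's action set

    # Usage: B is dominated for player 1; once B is gone, R becomes dominated for player 2.
    u1 = np.array([[1.0, 1.0], [0.0, 0.0]])   # rows: {T, B}, columns: {L, R}
    u2 = np.array([[1.0, 0.0], [0.0, 1.0]])
    print(iterated_elimination([u1, u2], delta=0.5))  # -> [{0}, {0}], i.e., only (T, L) survives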
2.2 Equilibria in games
We consider three common learning objectives, namely Nash Equilibrium (NE), Correlated Equilibrium (CE)
and Coarse Correlated Equilibrium (CCE).
Definition 2 (Nash Equilibrium). A strategy profile $(x_1, \cdots, x_N)$ is an $\epsilon$-Nash equilibrium if
$$u_i(x_i, x_{-i}) \ge u_i(a, x_{-i}) - \epsilon, \quad \forall a \in \mathcal{A}_i,\ \forall i \in [N].$$
Definition 3 (Correlated Equilibrium). A correlated strategy $\Pi \in \Delta(\mathcal{A})$ is an $\epsilon$-correlated equilibrium if $\forall i \in [N], \forall \phi: \mathcal{A}_i \to \mathcal{A}_i$,
$$\sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(a_i, a_{-i}) \;\ge\; \sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(\phi(a_i), a_{-i}) - \epsilon.$$
Definition 4 (Coarse Correlated Equilibrium). A correlated strategy $\Pi \in \Delta(\mathcal{A})$ is an $\epsilon$-CCE if $\forall i \in [N], \forall a' \in \mathcal{A}_i$,
$$\sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(a_i, a_{-i}) \;\ge\; \sum_{a_i \in \mathcal{A}_i, a_{-i} \in \mathcal{A}_{-i}} \Pi(a_i, a_{-i})\, u_i(a', a_{-i}) - \epsilon.$$
When $\epsilon = 0$, the above definitions give exact Nash equilibrium, correlated equilibrium, and coarse correlated equilibrium, respectively. It is well known that $\epsilon$-NE are $\epsilon$-CE, and $\epsilon$-CE are $\epsilon$-CCE.
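The definitions above are finite sums, so for small games they can be checked numerically. The Python sketch below (ours; the $2 \times 2$ game and the uniform correlated strategy are made up for illustration) computes the CCE gap (best gain from deviating to a fixed action) and the CE gap (best gain from a swap function $\phi$) of a correlated strategy $\Pi$; $\Pi$ is an $\epsilon$-CCE (resp. $\epsilon$-CE) exactly when the corresponding gap is at most $\epsilon$.

    import numpy as np

    def cce_gap(Pi, utilities):
        """Largest gain any player gets by ignoring Pi and playing a fixed action."""
        gap = 0.0
        for i, u in enumerate(utilities):
            on_path = float(np.sum(Pi * u))            # E_{a ~ Pi}[u_i(a)]
            marg = Pi.sum(axis=i)                      # marginal of Pi over a_{-i}
            best_dev = max(float(np.sum(marg * np.take(u, b, axis=i)))
                           for b in range(u.shape[i]))
            gap = max(gap, best_dev - on_path)
        return gap

    def ce_gap(Pi, utilities):
        """Largest gain any player gets from a swap function phi: A_i -> A_i."""
        gap = 0.0
        for i, u in enumerate(utilities):
            player_gain = 0.0
            for a in range(u.shape[i]):
                Pi_a = np.take(Pi, a, axis=i)          # mass on profiles recommending a
                on_path = float(np.sum(Pi_a * np.take(u, a, axis=i)))
                best_dev = max(float(np.sum(Pi_a * np.take(u, b, axis=i)))
                               for b in range(u.shape[i]))
                player_gain += best_dev - on_path      # best phi deviates per recommendation
            gap = max(gap, player_gain)
        return gap

    # Made-up 2x2 game (matching pennies) and the uniform correlated strategy.
    u1 = np.array([[1.0, 0.0], [0.0, 1.0]])
    u2 = np.array([[0.0, 1.0], [1.0, 0.0]])
    Pi = np.full((2, 2), 0.25)
    print(cce_gap(Pi, [u1, u2]), ce_gap(Pi, [u1, u2]))  # both 0: uniform play is a CCE and a CE here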
2.3 Connection between Equilibria and Rationalizability
Exact equilibria.
It is known that all actions in the support of an exact NE or CE are rationalizable (Osborne & Rubinstein, 1994, Lemma 56.2). However, one can easily construct an exact CCE that is fully supported on dominated (hence, unrationalizable) actions (see, e.g., Viossat & Zapechelnyuk, 2013, Fig. 3).
Approximate equilibria.
For an $\epsilon$-Nash equilibrium with $\epsilon < \mathrm{poly}(\Delta, 1/N, 1/A)$, one can show that the equilibrium is still mostly supported on rationalizable actions.

Proposition 1. If $x^\star = (x^\star_1, \cdots, x^\star_N)$ is an $\epsilon$-Nash with $\epsilon < \frac{\Delta^2}{24 N^2 A}$, then $\forall i$, $\Pr_{a \sim x^\star_i}[a \in E^{L}] \le 2L\epsilon/\Delta$.
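To see why this means the equilibrium is mostly supported on rationalizable actions, combining the stated bound with $L < NA$ from Section 2.1 and $\Delta \le 1$ (utilities lie in $[0,1]$) gives, as a quick check on our part,
$$\Pr_{a \sim x^\star_i}[a \in E^{L}] \;\le\; \frac{2L\epsilon}{\Delta} \;<\; \frac{2L}{\Delta} \cdot \frac{\Delta^2}{24 N^2 A} \;=\; \frac{L\Delta}{12 N^2 A} \;<\; \frac{\Delta}{12N} \;\le\; \frac{1}{12N},$$
so each player places at most an $O(1/N)$ fraction of her probability mass on iteratively dominated actions.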
Therefore, for two-player zero-sum games⁵, it is possible to run an approximate NE solver and automatically find a rationalizable $\epsilon$-NE. However, this method will induce a rather slow rate, and we will provide a much more efficient algorithm for finding rationalizable $\epsilon$-NE in Section 5.
The connection between CE and rationalizability becomes quite different when it comes to approximate equilibria, which are inevitable in the presence of noise. As shown by Wu et al. (2021, Theorem 1), an $\epsilon$-CE can be entirely supported on iteratively dominated actions unless $\epsilon = O(2^{-A})$. In other words, rationalizability is not guaranteed by running an approximate CE solver unless it is run at an extremely high accuracy. Furthermore, since an exact CCE can be fully supported on dominated actions, so can an $\epsilon$-CCE, regardless of how small $\epsilon$ is. Therefore, efficiently finding $\epsilon$-CE and $\epsilon$-CCE that are simultaneously rationalizable remains a challenging open problem.
⁵For multiplayer general-sum games, finding an approximate NE is computationally hard and takes $\Omega(2^N)$ samples in the worst case.