Tuning Rate of Strategy Revision in Population Games Shinkyu Park Abstract We investigate a multi-agent decision problem in

2025-04-24 0 0 533.01KB 8 页 10玖币
侵权投诉
Tuning Rate of Strategy Revision in Population Games
Shinkyu Park
Abstract We investigate a multi-agent decision problem in
population games where each agent in a population makes a
decision on strategy selection and revision to engage in repeated
games with others. The strategy revision is subject to time
delays which represent the time it takes for an agent revising
its strategy needs to spend before it can adopt a new strategy
and return back to the game. We discuss the effect of the time
delays on long-term behavior of the agents’ strategy revision. In
particular, when the time delays are large, the strategy revision
would exhibit oscillation and the agents spend substantial time
in “transitioning” between different strategies, which prevents
the agents from attaining the Nash equilibrium of the game. As
a main contribution of the paper, we propose an algorithm that
tunes the rate of the agents’ strategy revision and show such
tuning approach ensures convergence to the Nash equilibrium.
We validate our analytical results using simulations.
I. INTRODUCTION
Consider a multi-agent decision problem where each agent
selects a strategy to engage in repeated interactions with
other agents. Agents receive payoffs as a function of their
strategy profile – the distribution of their strategy selection –
and revise their strategy selection based on the payoffs. Such
problem setting is prevalent in many engineering applications
[1]–[10] ranging from demand response in smart grids to task
allocation in multi-robot systems research.
We adopt the population game formalism [11] to model the
process of the agents’ strategy revision and assess long-term
behavior of the revision process. Different from conventional
population game problems, in this work, we investigate the
scenario where the strategy revision is subject to time delays
for which every agent revising its strategy needs to spend a
fixed amount of time for transitioning to a new strategy. As
a case in point, in multi-robot task allocation applications
[8], [10], [12] where the strategy revision corresponds to
assignment of a new task to a robot, time delays in the
strategy revision reflect the requirement for each robot to
travel to a distant location to take on a new task.
When the agents’ strategy revision is subject to time
delays, as we discuss in the paper, their strategy profile
would exhibit oscillation implying that a portion of the
agent population persistently transitions between different
strategies. In the task allocation applications, this means that
the robots would spend substantial time on traveling between
distant locations which is a disadvantage since during the
traveling, they cannot carry out assigned tasks. The authors
of [8], [12] concretely discuss such phenomenon observed in
This work was supported by funding from King Abdullah University of
Science and Technology (KAUST).
Park is with the Electrical and Computer Engineering, King Abdullah
University of Science and Technology (KAUST), Thuwal 23955, Saudi
Arabia. shinkyu.park@kaust.edu.sa
their linear models, and empirical results presented in [10]
illustrate the oscillation of the strategy profile in population
game models.
As a main contribution, we propose an algorithm that
tunes the rate of the agents’ strategy revision to eliminate
the oscillation of the strategy profile, and prove that the
algorithm ensures convergence of the strategy profile to
the Nash equilibrium. Of relevance to this work are [13],
[14, Section 5.9] and references therein that investigate the
scenarios where payoff mechanisms underlying population
games are subject to time delays and hence the agents revise
their strategy selection based on time-delayed payoffs. Also,
[15] examines a relevant problem in which, when revising its
strategy, each agent needs to take on a sequence of sub-tasks
(sub-strategies) which would potentially cause time delays in
the agent’s strategy revision.
Our contributions are distinct from existing work in the
following aspects: (i) unlike [13], [14, Section 5.9] and
references therein, our problem formulation considers that
the agents’ strategy revision is subject to time delays, which
is different from the delays in payoff mechanisms, and (ii)
different from [8], [12], [15], our proposed approach ensures
convergence of the strategy profile to the Nash equilibrium,
which, in our problem formulation, requires that no agents
are in transition or taking on any sub-strategies. Below we
summarize the main results of the paper.
We model and analyze the effect of time delays in the
strategy revision using the population game formalism
and discuss long-term behavior of the agents’ strategy
profile. In particular, our analysis explains how the rate
of the agents’ strategy revision affects the convergence
of the strategy profile.
Based on the analysis, we propose an algorithm for tun-
ing the strategy revision rate. The algorithm judiciously
decreases the revision rate and ensures the convergence
of the strategy profile to the Nash equilibrium in con-
tractive population games. We illustrate our analytical
results using simulations.
The paper is organized as follows. In Section II, we intro-
duce the population game framework and strategy revision
model we adopt in this study. We also illustrate how the rate
of the agents’ strategy revision affects long-term behavior of
their strategy profile. In Section III, we propose an algorithm
that judiciously tunes the strategy revision rate and show that
the proposed algorithm ensures the convergence of the strat-
egy profile to the Nash equilibrium in the class of contractive
population games.1We illustrate our analytical results using
1See Definition 1 for its formal definition.
arXiv:2210.05472v2 [eess.SY] 10 Apr 2023
simulations with a numerical example in Section IV. We
conclude the paper with discussions and future plans in
Section V.
Notation: We denote by Rnthe set of n-dimensional
real vectors and by Rn
+the set of (element-wise) non-
negative n-dimensional real vectors. Throughout the paper,
we adopt the Euclidean norm for matrices and vectors.
II. PROBLEM DESCRIPTION
We describe the foundation of the population game for-
malism [11] and explain how we adopt it in our multi-agent
decision problem to study the effect of time delays in strategy
revision.
A. Population Games and Evolutionary Dynamics
Consider that a population of decision-making agents are
engaged in repeated strategic interactions with one another.
Each agent has a finite set {1,· · · , n}of strategies available
and can select one strategy at a time. Based on its strategy
selection, the agent receives a payoff determined by a payoff
mechanism underlying the interactions.
Adopting the conventional notation and formalism, we de-
note the state of the population by x= (x1,· · · , xn)Rn
+,
where xirepresents the portion of the population selecting
strategy iand xsums up to 1, i.e., Pn
i=1 xi= 1. A payoff
vector p= (p1,· · · , pn)Rnis determined by a continuous
function Fas p=F(x), where piis the payoff assigned to
the agents selecting strategy i. Assuming that there are a
large number of agents in the population, we can define the
set of viable population states as X={zRn
+|Pn
i=1 zi=1}.
Contractive population games are defined as follows.
Definition 1 (Contractive Population Game [16]): A
population game Fis called contractive if it holds that
(wz)T(F(w)− F(z)) 0,w, z X.(1)
Assumption 1: We assume that the differential map DF
of Fexists and is bounded: there is a positive constant BDF
satisfying kDF(z)k2BDF,zX.
Note that when DFexists, the requirement (1) can be cast
as
˜zTDF(z)˜z0,zX,˜zTX,(2)
where TXis the tangent space of X.
The Nash equilibrium of Fis defined as follows.
Definition 2 (Nash Equilibrium): An element zNE in Xis
called the Nash equilibrium of a population game Fif it
holds that
(zNE z)TF(zNE)0,zX.(3)
A population game Fcan have multiple Nash equilibria. We
denote by NE(F)the set of Nash equilibria. We adopt the
following example to illustrate our main results.2
2For a simple presentation of the paper, we use the RPS game for the
illustration purpose.
Example 1: Consider the Rock-Paper-Scissors (RPS)
game (n= 3) whose payoff function is given by
F(x) =
ax2+bx3
bx1ax3
ax1+bx2
,(4)
where a, b are positive constants. Note that (4) is a contrac-
tive game if ba, and the Nash equilibrium of Fis given
by (1/3,1/3,1/3) for any a, b > 0.
In our study, we consider that the agents are repeatedly
engaged in a same game and given an opportunity, they can
revise their strategy selection based on the Smith revision
protocol, originally presented in [9] and defined as follows.3
ρij (p) = %[pjpi]+:= (%(pjpi)if pjpi
0otherwise ,(5)
where %is a positive constant satisfying4
Pn
j=1 %[pjpi]+1and %[pjpi]+1.(6)
According to (5), each agent switches its strategy selection i
to jwith probability %[pjpi]+; otherwise, it stays with its
current strategy selection iwith probability 1Pn
j=1 %[pj
pi]+. Hence, higher the payoff associated with strategy j
more likely the agent selects it. The condition (6) is required
for such probabilistic strategy revision scheme to be well-
defined.
The point in time at which the agents’ strategy revision
takes place is determined by identical and independent
Poisson processes with parameter λ. In particular, each agent
can revise its strategy selection at each jump time of an
associated Poisson process. Since the parameter λdetermines
the rate of the strategy revision, we refer to it as the strategy
revision rate.
Let x(t)=(x1(t),· · · , xn(t)) be the population state at
time instant tdriven by the revision protocol (5), the Poisson
processes, and payoffs determined by a population game F.
By same discussions presented in [11, Chapter 10], as the
population size tends to infinity, the solution to the following
state equation approximates the population state x(t)with
arbitrary accuracy.
˙xi(t) = Vi(x(t), p(t))
:= λPn
j=1 xj(t)%[pi(t)pj(t)]+
xi(t)Pn
j=1 %[pj(t)pi(t)]+(7)
Adopting the same naming convention as in [17], we refer
to (7) as Evolutionary Dynamics Model (EDM).
B. δ-Passivity Tools for Convergence Analysis
A focal research theme in population games is in estab-
lishing convergence of the population state x(t), governed
by (7), to the Nash equilibrium set NE(F)of an underlying
3Although the analysis we present in this paper can be extended to other
class of strategy revision protocols. However, for a concise presentation of
our main result, we adopt the Smith protocol.
4Such constant %exists since Fis a continuous function and Xis a
compact subset of Rn.
摘要:

TuningRateofStrategyRevisioninPopulationGamesShinkyuParkAbstract—Weinvestigateamulti-agentdecisionprobleminpopulationgameswhereeachagentinapopulationmakesadecisiononstrategyselectionandrevisiontoengageinrepeatedgameswithothers.Thestrategyrevisionissubjecttotimedelayswhichrepresentthetimeittakesforan...

展开>> 收起<<
Tuning Rate of Strategy Revision in Population Games Shinkyu Park Abstract We investigate a multi-agent decision problem in.pdf

共8页,预览2页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:8 页 大小:533.01KB 格式:PDF 时间:2025-04-24

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 8
客服
关注