MPOGames: Efficient Multimodal Partially Observable Dynamic Games
Oswin So1,2,*, Paul Drews2, Thomas Balch2, Velin Dimitrov2, Guy Rosman2, Evangelos A. Theodorou1
Abstract— Game theoretic methods have become popular for planning and prediction in situations involving rich multi-agent interactions. However, these methods often assume the existence of a single local Nash equilibrium and are hence unable to handle uncertainty in the intentions of different agents. While maximum entropy (MaxEnt) dynamic games try to address this issue, practical approaches solve for MaxEnt Nash equilibria using linear-quadratic approximations which are restricted to unimodal responses and unsuitable for scenarios with multiple local Nash equilibria. By reformulating the problem as a POMDP, we propose MPOGames, a method for efficiently solving MaxEnt dynamic games that captures the interactions between local Nash equilibria. We show the importance of uncertainty-aware game theoretic methods via a two-agent merge case study. Finally, we demonstrate the real-time capabilities of our approach with hardware experiments on a 1/10th scale car platform.
I. INTRODUCTION AND RELATED WORK
In recent years, robots have seen increasing use in applications with multiple interacting agents. In these applications, actions taken by the robot cannot be considered in isolation and must take the responses of other agents into account, especially in areas such as autonomous driving [1], robotic arms [2] and surgical robotics [3], where failures can be catastrophic. For example, in the autonomous driving setting, two agents at a merge must negotiate and agree on which agent merges first (Fig. 1). Failure to consider the responses of the other agent could result in serious collisions.
Game-theoretic (GT) approaches have become increasingly popular for planning in multi-agent scenarios [4]–[12]. By assuming that agents act rationally, these approaches decouple the problem of planning and prediction into the problems of finding objective functions for each agent and solving for a Nash equilibrium of the resulting non-cooperative game. This provides a more principled and interpretable alternative for predicting other agents when compared to black-box approaches that use deep neural networks [13, 14].
In practice, assuming rational agents is too strict. Humans
face cognitive limitations and make irrational decisions that
are satisfactory but suboptimal, an idea known as “bounded
rationality” or “noisy rationality” [15,16]. Even autonomous
agents are rarely exactly optimal and often make suboptimal
decisions due to finite computational resources. One method
of addressing this disconnect is the Maximum Entropy
(MaxEnt) framework, which has been applied to diverse areas
1Georgia Institute of Technology.
2Toyota Research Institute.
*Corresponding author: oswinso@gatech.edu.
This work was supported by the Toyota Research Institute (TRI). This
article solely reflects the opinions and conclusions of its authors and not
TRI or any other Toyota entity. Their support is gratefully acknowledged.
2023 IEEE International Conference on Robotics and Automation (ICRA)
Fig. 1: The ego agent (left) seeks to merge into a lane safely
without collisions. With multiple local minima present,
MPOGames predicts likely plans for other agents, infers their
probabilities, then hedges against all possibilities. Methods
that fail to do so may result in catastrophic collisions.
such as inverse reinforcement learning [7,17], forecasting
[18] and biology [19]. In particular, this combination of
MaxEnt and GT has been proposed for learning objectives
for agents from a traffic dataset [7]. However, since the exact
solution is computationally intractable, the authors use a
linear quadratic (LQ) approximation of the game for which
a closed-form solution can be efficiently obtained. The LQ
approximation, popular among recent approaches (e.g., [5,
7]), only considers a single local minimum and consequently
produces unimodal predictions for each agent. This fails to
capture complex uncertainty-dependent hedging behaviors
which mediate between other agents’ possible plans in the
true solution of the MaxEnt game.
These hedging behaviors are closely related to partially observable Markov decision processes (POMDPs) in the single-player setting, where solutions must consider the distribution of states when solving for the optimal controls. Similar multi-agent scenarios have been considered via POMDPs in [20]–[22]. Also related are partially observable stochastic games, which are usually intractable [1] and mostly limited to the discrete setting [23]–[26].
One big challenge for methods that seek to address partial
observability and multimodality is the problem of computa-
tional efficiency and scalability (e.g., in the case of POMDPs).
Furthermore, only a few of the existing GT methods perform
experiments on hardware [10,11,27], where the robustness
of the method to noise and delays in the system is crucial. To
tackle both of these issues, we present MPOGames, a frame-
work for efficiently solving multimodal MaxEnt dynamic
games by reformulating the problem as a game of incomplete
information and efficiently finding approximate solutions to
the equivalent POMDP problem. We showcase the real-time
potential of our method via extensive hardware comparisons.
Our contributions in this work are as follows.
1) A computationally efficient framework for GT planning in multi-agent scenarios that produces more accurate solutions in the presence of multiple local minima compared to previous works.
2) A case study in simulation comparing how GT methods handle multimodality, which shows the importance of inference and handling uncertainty.
3) An analysis of the performance of GT methods in multimodal situations on a 1/10th scale car platform. These results demonstrate the robustness and tractability of MPOGames on hardware.
II. PRELIMINARIES
A. Discrete Dynamic Games and Nash Equilibria
We consider a discrete dynamic game with $N$ players. Let $u_t = [u^1_t, \dots, u^N_t] = [u^i_t, u^{\neg i}_t] \in \mathbb{R}^{n_u}$ denote the joint control vector for all players, where we use $(\cdot)^i$ and $(\cdot)^{\neg i}$ to denote the partition of $(\cdot)$ into the parts belonging to agent $i$ and to the other agents, respectively. We denote by $x_t \in \mathbb{R}^{n_x}$ the full state of the system at timestep $t$, with nonlinear dynamics

$$x_{t+1} = f(x_t, u^1_t, \dots, u^N_t) = f(x_t, u_t). \tag{1}$$
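For concreteness, here is a minimal sketch of the joint dynamics (1) for two agents with single-integrator dynamics, matching the toy example of Sec. II-C; the code and the time step `dt` are our own illustration, not from the paper:

```python
import numpy as np

def f(x, u, dt=1.0):
    """Joint single-integrator dynamics x_{t+1} = f(x_t, u_t) from (1).

    x: joint state [x^1, x^2]; u: joint control [u^1, u^2].
    Each agent i evolves as x^i_{t+1} = x^i_t + dt * u^i_t.
    """
    return np.asarray(x) + dt * np.asarray(u)

# One step from the toy example's initial state x^1_0 = x^2_0 = 0.
x1 = f(np.zeros(2), [0.5, -0.3])  # -> array([ 0.5, -0.3])
```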
Time indices are dropped when they are clear from the context. Each agent's objective is to minimize a corresponding finite-horizon cost function $J^i$ with running cost $l^i$ and terminal cost $\Phi^i$ with respect to the control trajectory $U = [u_1, \dots, u_{T-1}]$:

$$\min_{U^i \in \mathcal{U}^i} J^i(U^i, U^{\neg i}) = \min_{U^i \in \mathcal{U}^i} \Phi^i(x_T) + \sum_{t=1}^{T-1} l^i(x_t, u_t), \tag{2}$$

where $\mathcal{U}^i$ denotes the set of feasible control trajectories for agent $i$. Note that the objective function $J^i$ is a function of both $U^i$ and $U^{\neg i}$, but the optimization is only over $U^i$. Hence, the quantity of interest here is the Nash equilibrium (NE), i.e., controls $U^*$ such that for each agent $i$,

$$J^i(U^{i*}, U^{\neg i*}) \le J^i(U^i, U^{\neg i*}), \quad \forall\, U^i \in \mathcal{U}^i. \tag{3}$$
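As an illustration of (3) (not the solver used by MPOGames), a pure NE of a simple game can be found by iterated best response: each agent minimizes its own cost while the others' controls are held fixed, until no agent changes its response. The coupled quadratic costs below are our own hypothetical example, and convergence of this scheme is not guaranteed in general:

```python
from scipy.optimize import minimize_scalar

# Hypothetical one-step costs J^i(u^1, u^2); agent i only controls u^i.
J1 = lambda u1, u2: 0.5 * u1**2 + 1.5 * (u1 - u2) ** 2
J2 = lambda u1, u2: 0.5 * u2**2 + 1.5 * (u2 + u1 - 2.0) ** 2

u1, u2 = 0.0, 0.0
for _ in range(100):  # iterated best response
    u1 = minimize_scalar(lambda v: J1(v, u2)).x  # agent 1 best-responds
    u2 = minimize_scalar(lambda v: J2(u1, v)).x  # agent 2 best-responds

# At the fixed point, (3) holds: no agent can lower its cost unilaterally.
print(u1, u2)  # converges to (0.72, 0.96) for these costs
```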
B. Maximum Entropy Dynamic Games
Agents rarely act exactly optimally due to cognitive or computational limitations and instead settle for satisfactory decisions [16]. We can model this suboptimality via the noisy rationality model [15], where controls are stochastic according to the MaxEnt framework [17, 28]. Consider a stochastic control policy $\pi^i(u^i \mid x)$ and introduce an entropy regularization term into the original objective (2). We then consider the expected cost under all agents' policies $\pi$:

$$J^i(\pi) = \mathbb{E}_{u_t \sim \pi}\!\left[\Phi^i(x_T) + \sum_{t=1}^{T-1} l^i(x_t, u_t) - \frac{1}{\beta}\, \mathcal{H}[\pi^i(\cdot \mid x_t)]\right], \tag{4}$$

where $\beta > 0$ denotes the inverse temperature or rationality coefficient, and $\mathcal{H}[\pi]$ is the Shannon entropy of $\pi$, defined as

$$\mathcal{H}[\pi] := \mathbb{E}_{u \sim \pi}[-\log \pi(u)] = -\int \pi(u) \log \pi(u)\, du. \tag{5}$$

The resulting MaxEnt NE problem (referred to as the "Entropic Cost Equilibrium" in [7]) is then to find stochastic policies $\pi^*$ such that the following holds for each agent:

$$J^i(\pi^{i*}, \pi^{\neg i*}) \le J^i(\pi^i, \pi^{\neg i*}), \quad \forall\, \pi^i \in \Pi^i, \tag{6}$$

for feasible control policies $\Pi^i$ from any initial state. Let the value function $V^i$ for agent $i$ given the policies of the other agents $\pi^{\neg i}$ be

$$V^i_t(x_t) := \inf_{\pi^i} J^i(\pi^i, \pi^{\neg i}). \tag{7}$$

The optimal policy $\pi^{i*}$ takes the form [7, 9]

$$\pi^{i*}(u^i \mid x_t) = Z^i(x_t)^{-1} \exp\!\left(-\beta\, Q^i(x_t, u^i)\right), \tag{8}$$

where

$$Q^i(x_t, u^i_t) := \mathbb{E}_{\pi^{\neg i}}\!\left[V^i_{t+1}(x_{t+1}) + l^i(x_t, u_t)\right], \tag{9}$$

and $Z^i$ denotes the partition function

$$Z^i(x_t) := \int \exp\!\left(-\beta\, \mathbb{E}_{\pi^{\neg i}}\!\left[V^i_{t+1}(x_{t+1}) + l^i(x_t, u_t)\right]\right) du^i. \tag{10}$$

Then, the value function $V^i$ takes the form [29, Appendix A]

$$V^i_t(x_t) = -\frac{1}{\beta} \ln Z^i(x_t). \tag{11}$$
While this gives us an expression for $\pi^{i*}$ and $V^i$, it is generally intractable to compute the integral (10) and to solve for $\pi^*$ and $Z^i$ under general nonlinear costs and dynamics. One way of resolving this is to approximate the problem as a linear-quadratic (LQ) game [7, 9] in a manner similar to iLQR [30], an approach also taken in the iLQGames [5, 31] family of NE solvers. By doing so, the approximate game can be solved exactly [9, Lemmas 1 and 2].
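For intuition, in low dimensions the integral in (10) can be evaluated directly by quadrature instead of an LQ approximation. Below is a minimal sketch for a single-agent, one-step problem with scalar control; the cost, grid bounds, and $\beta$ are our own assumptions for illustration:

```python
import numpy as np

beta = 2.0                        # rationality coefficient in (4)
u = np.linspace(-5.0, 5.0, 2001)  # quadrature grid over the control
du = u[1] - u[0]

def Q(x, u):
    # One-step Q-value (9): running control cost plus terminal cost,
    # with deterministic single-integrator dynamics x' = x + u.
    return 0.5 * u**2 + 1.5 * (x + u - 1.0) ** 2

x = 0.0
w = np.exp(-beta * Q(x, u))
Z = np.sum(w) * du           # partition function (10) by quadrature
pi = w / Z                   # MaxEnt optimal policy (8), a density over u
V = -np.log(Z) / beta        # value function (11)
```

This brute-force evaluation scales exponentially with the control dimension, which is precisely why the LQ approximation is attractive despite the unimodality discussed next.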
C. LQ is Problematic for Multimodal MaxEnt NE
While taking LQ approximations yields a computationally efficient closed-form solution for both the original NE and the MaxEnt NE problems, this approximation can be especially problematic in the MaxEnt NE setting. Whereas the value function in the deterministic case does not depend on the cost function at states away from the NE, the MaxEnt value function does, due to the integral (10) that enters both (8) and (11).
Illustrative Toy Example. We illustrate this using a toy example where the MaxEnt NE can be solved for in closed form. Consider the following two-agent, single-timestep dynamic game with $x = [x^1, x^2] \in \mathbb{R}^2$ and $u = [u^1, u^2] \in \mathbb{R}^2$, with single-integrator dynamics and costs defined by

$$J^1 = \frac{1}{2}(u^1_0)^2 + \frac{3}{2}(x^1_1 - x^2_1)^2, \tag{12}$$

$$J^2 = \frac{1}{2}(u^2_0)^2 + \Phi^2(x^2_1), \tag{13}$$

$$\Phi^2(x^2_1) = \sigma_{\mathrm{SM}}\!\left(\tfrac{3}{2}(x^2_1 - 1)^2,\ \tfrac{3}{2}(x^2_1 + 1)^2 + \epsilon\right), \tag{14}$$

$$\sigma_{\mathrm{SM}}(a, b) := -\ln\!\left(e^{-a} + e^{-b}\right), \tag{15}$$

where $\epsilon > 0$ controls the "height" of the left local minimum (Fig. 2, top right) and $x^1_0 = x^2_0 = 0$. The $\sigma_{\mathrm{SM}}$ in (14) acts as a soft minimum of its two arguments.
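Since the game is single-timestep and agent 2's cost (13) does not depend on agent 1, agent 2's exact MaxEnt policy (8) can be evaluated on a grid. The sketch below ($\beta$ and $\epsilon$ are our own choices) shows that the policy is bimodal, with modes near $u^2 = \pm 1$, which no single LQ (Gaussian) approximation can represent:

```python
import numpy as np

beta, eps = 2.0, 0.5

def soft_min(a, b):
    # sigma_SM(a, b) = -ln(e^{-a} + e^{-b}) from (15)
    return -np.logaddexp(-a, -b)

def J2(u2):
    # Agent 2's cost (13)-(14) with x^2_0 = 0, so x^2_1 = u2.
    x2 = u2
    phi2 = soft_min(1.5 * (x2 - 1.0) ** 2, 1.5 * (x2 + 1.0) ** 2 + eps)
    return 0.5 * u2**2 + phi2

u2 = np.linspace(-4.0, 4.0, 2001)
w = np.exp(-beta * J2(u2))
pi2 = w / (np.sum(w) * (u2[1] - u2[0]))  # exact MaxEnt policy (8)
# pi2 has two modes (near +1 and -1); eps lowers the weight of the -1 mode.
```

Because $J^1$ in (12) couples agent 1 to $x^2_1$, agent 1's exact MaxEnt response must hedge between both of agent 2's modes, which is exactly the behavior lost under a unimodal LQ approximation.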