MPOGames: Efficient Multimodal Partially Observable Dynamic Games
Oswin So1,2,*, Paul Drews2, Thomas Balch2, Velin Dimitrov2, Guy Rosman2, Evangelos A. Theodorou1
Abstract— Game theoretic methods have become popular for planning and prediction in situations involving rich multi-agent interactions. However, these methods often assume the existence of a single local Nash equilibrium and are hence unable to handle uncertainty in the intentions of different agents. While maximum entropy (MaxEnt) dynamic games try to address this issue, practical approaches solve for MaxEnt Nash equilibria using linear-quadratic approximations which are restricted to unimodal responses and unsuitable for scenarios with multiple local Nash equilibria. By reformulating the problem as a POMDP, we propose MPOGames, a method for efficiently solving MaxEnt dynamic games that captures the interactions between local Nash equilibria. We show the importance of uncertainty-aware game theoretic methods via a two-agent merge case study. Finally, we demonstrate the real-time capabilities of our approach with hardware experiments on a 1/10th scale car platform.
I. INTRODUCTION AND RELATED WORK
In recent years, robots have seen increasing use in applications with multiple interacting agents. In these applications, actions taken by the robot cannot be considered in isolation and must take the responses of other agents into account, especially in areas such as autonomous driving [1], robotic arms [2] and surgical robotics [3], where failures can be catastrophic. For example, in the autonomous driving setting, two agents at a merge must negotiate and agree on which agent merges first (Fig. 1). Failure to consider the responses of the other agent could result in serious collisions.
Game-theoretic (GT) approaches have become increasingly popular for planning in multi-agent scenarios [4]–[12]. By assuming that agents act rationally, these approaches decouple the problem of planning and prediction into the problems of finding objective functions for each agent and solving for a Nash equilibrium of the resulting non-cooperative game. This provides a more principled and interpretable alternative for predicting other agents when compared to black-box approaches that use deep neural networks [13, 14].
In practice, assuming rational agents is too strict. Humans
face cognitive limitations and make irrational decisions that
are satisfactory but suboptimal, an idea known as “bounded
rationality” or “noisy rationality” [15,16]. Even autonomous
agents are rarely exactly optimal and often make suboptimal
decisions due to finite computational resources. One method
of addressing this disconnect is the Maximum Entropy
(MaxEnt) framework, which has been applied to diverse areas
1Georgia Institute of Technology.
2Toyota Research Institute.
*Corresponding author: oswinso@gatech.edu.
This work was supported by the Toyota Research Institute (TRI). This
article solely reflects the opinions and conclusions of its authors and not
TRI or any other Toyota entity. Their support is gratefully acknowledged.
2023 IEEE International Conference on Robotics and Automation (ICRA)
Fig. 1: The ego agent (left) seeks to merge into a lane safely
without collisions. With multiple local minima present,
MPOGames predicts likely plans for other agents, infers their
probabilities, then hedges against all possibilities. Methods
that fail to do so may result in catastrophic collisions.
such as inverse reinforcement learning [7,17], forecasting
[18] and biology [19]. In particular, this combination of
MaxEnt and GT has been proposed for learning objectives
for agents from a traffic dataset [7]. However, since the exact
solution is computationally intractable, the authors use a
linear quadratic (LQ) approximation of the game for which
a closed-form solution can be efficiently obtained. The LQ
approximation, popular among recent approaches (e.g., [5,
7]), only considers a single local minimum and consequently
produces unimodal predictions for each agent. This fails to
capture complex uncertainty-dependent hedging behaviors
which mediate between other agents’ possible plans in the
true solution of the MaxEnt game.
These hedging behaviors are closely related to partially observable Markov decision processes (POMDPs) in the single-player setting, where solutions must consider the distribution of states when solving for the optimal controls. Similar multi-agent scenarios have been considered via POMDPs in [20]–[22]. Also related are partially observable stochastic games, which are usually intractable [1] and mostly limited to the discrete setting [23]–[26].
One big challenge for methods that seek to address partial
observability and multimodality is the problem of computa-
tional efficiency and scalability (e.g., in the case of POMDPs).
Furthermore, only a few of the existing GT methods perform
experiments on hardware [10,11,27], where the robustness
of the method to noise and delays in the system is crucial. To
tackle both of these issues, we present MPOGames, a frame-
work for efficiently solving multimodal MaxEnt dynamic
games by reformulating the problem as a game of incomplete
information and efficiently finding approximate solutions to
the equivalent POMDP problem. We showcase the real-time
potential of our method via extensive hardware comparisons.
Our contributions in this work are as follows.
1) A computationally efficient framework for GT planning in multi-agent scenarios that produces more accurate solutions in the presence of multiple local minima compared to previous works.
2) A case study in simulation comparing how GT methods handle multimodality, which shows the importance of inference and handling uncertainty.
3) An analysis of the performance of GT methods in multimodal situations on a 1/10th scale car platform. These results demonstrate the robustness and tractability of MPOGames on hardware.
II. PRELIMINARIES
A. Discrete Dynamic Games and Nash Equilibria
We consider a discrete dynamic game with $N$ players. Let $u_t = [u^1_t, \dots, u^N_t] = [u^i_t, u^{\neg i}_t] \in \mathbb{R}^{n_u}$ denote the joint control vector for all players, where we use $(\cdot)^i$ and $(\cdot)^{\neg i}$ to denote the partition of $(\cdot)$ into the parts belonging to agent $i$ and to the other agents, respectively. We denote by $x_t \in \mathbb{R}^{n_x}$ the full state of the system at timestep $t$, with nonlinear dynamics

$$x_{t+1} = f(x_t, u^1_t, \dots, u^N_t) = f(x_t, u_t). \tag{1}$$
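For concreteness, here is a minimal sketch of the joint dynamics (1) for two agents with single-integrator dynamics, matching the toy example of Sec. II-C; the code and the time step `dt` are our own illustration, not from the paper:

```python
import numpy as np

def f(x, u, dt=1.0):
    """Joint single-integrator dynamics x_{t+1} = f(x_t, u_t) from (1).

    x: joint state [x^1, x^2]; u: joint control [u^1, u^2].
    Each agent i evolves as x^i_{t+1} = x^i_t + dt * u^i_t.
    """
    return np.asarray(x) + dt * np.asarray(u)

# One step from the toy example's initial state x^1_0 = x^2_0 = 0.
x1 = f(np.zeros(2), [0.5, -0.3])  # -> array([ 0.5, -0.3])
```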
Time indices are dropped when they are clear from the context. Each agent's objective is to minimize a corresponding finite-horizon cost function $J^i$ with running cost $l^i$ and terminal cost $\Phi^i$ with respect to the control trajectory $U = [u_1, \dots, u_{T-1}]$:

$$\min_{U^i \in \mathcal{U}^i} J^i(U^i, U^{\neg i}) = \min_{U^i \in \mathcal{U}^i} \Phi^i(x_T) + \sum_{t=1}^{T-1} l^i(x_t, u_t), \tag{2}$$

where $\mathcal{U}^i$ denotes the set of feasible control trajectories for agent $i$. Note that the objective function $J^i$ is a function of both $U^i$ and $U^{\neg i}$, but the optimization is only over $U^i$. Hence, the quantity of interest here is the Nash equilibrium (NE), i.e., controls $U^*$ such that for each agent $i$,

$$J^i(U^{i*}, U^{\neg i*}) \le J^i(U^i, U^{\neg i*}), \quad \forall\, U^i \in \mathcal{U}^i. \tag{3}$$
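As an illustration of (3) (not the solver used by MPOGames), a pure NE of a simple game can be found by iterated best response: each agent minimizes its own cost while the others' controls are held fixed, until no agent changes its response. The coupled quadratic costs below are our own hypothetical example, and convergence of this scheme is not guaranteed in general:

```python
from scipy.optimize import minimize_scalar

# Hypothetical one-step costs J^i(u^1, u^2); agent i only controls u^i.
J1 = lambda u1, u2: 0.5 * u1**2 + 1.5 * (u1 - u2) ** 2
J2 = lambda u1, u2: 0.5 * u2**2 + 1.5 * (u2 + u1 - 2.0) ** 2

u1, u2 = 0.0, 0.0
for _ in range(100):  # iterated best response
    u1 = minimize_scalar(lambda v: J1(v, u2)).x  # agent 1 best-responds
    u2 = minimize_scalar(lambda v: J2(u1, v)).x  # agent 2 best-responds

# At the fixed point, (3) holds: no agent can lower its cost unilaterally.
print(u1, u2)  # converges to (0.72, 0.96) for these costs
```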
B. Maximum Entropy Dynamic Games
Agents rarely act exactly optimally due to cognitive or computational limitations and instead settle for satisfactory decisions [16]. We can model this suboptimality via the noisy rationality model [15], where controls are stochastic according to the MaxEnt framework [17, 28]. Consider a stochastic control policy $\pi^i(u^i \mid x)$ and introduce an entropy regularization term into the original objective (2). We then consider the expected cost under all agents' policies $\pi$:

$$J^i(\pi) = \mathbb{E}_{u_t \sim \pi}\!\left[\Phi^i(x_T) + \sum_{t=1}^{T-1} l^i(x_t, u_t) - \frac{1}{\beta}\, \mathcal{H}[\pi^i(\cdot \mid x_t)]\right], \tag{4}$$

where $\beta > 0$ denotes the inverse temperature or rationality coefficient, and $\mathcal{H}[\pi]$ is the Shannon entropy of $\pi$, defined as

$$\mathcal{H}[\pi] := \mathbb{E}_{u \sim \pi}[-\log \pi(u)] = -\int \pi(u) \log \pi(u)\, du. \tag{5}$$

The resulting MaxEnt NE problem (referred to as the "Entropic Cost Equilibrium" in [7]) is then to find stochastic policies $\pi^*$ such that the following holds for each agent:

$$J^i(\pi^{i*}, \pi^{\neg i*}) \le J^i(\pi^i, \pi^{\neg i*}), \quad \forall\, \pi^i \in \Pi^i, \tag{6}$$

for feasible control policies $\Pi^i$ from any initial state. Let the value function $V^i$ for agent $i$ given the policies of the other agents $\pi^{\neg i}$ be

$$V^i_t(x_t) := \inf_{\pi^i} J^i(\pi^i, \pi^{\neg i}). \tag{7}$$

The optimal policy $\pi^{i*}$ takes the form [7, 9]

$$\pi^{i*}(u^i \mid x_t) = Z^i(x_t)^{-1} \exp\!\left(-\beta\, Q^i(x_t, u^i)\right), \tag{8}$$

where

$$Q^i(x_t, u^i_t) := \mathbb{E}_{\pi^{\neg i}}\!\left[V^i_{t+1}(x_{t+1}) + l^i(x_t, u_t)\right], \tag{9}$$

and $Z^i$ denotes the partition function

$$Z^i(x_t) := \int \exp\!\left(-\beta\, \mathbb{E}_{\pi^{\neg i}}\!\left[V^i_{t+1}(x_{t+1}) + l^i(x_t, u_t)\right]\right) du^i. \tag{10}$$

Then, the value function $V^i$ takes the form [29, Appendix A]

$$V^i_t(x_t) = -\frac{1}{\beta} \ln Z^i(x_t). \tag{11}$$
While this gives us an expression for $\pi^{i*}$ and $V^i$, it is generally intractable to compute the integral (10) and to solve for $\pi^*$ and $Z^i$ under general nonlinear costs and dynamics. One way of resolving this is to approximate the problem as a linear-quadratic (LQ) game [7, 9] in a manner similar to iLQR [30], an approach also taken in the iLQGames [5, 31] family of NE solvers. By doing so, the approximate game can be solved exactly [9, Lemmas 1 and 2].
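For intuition, in low dimensions the integral in (10) can be evaluated directly by quadrature instead of an LQ approximation. Below is a minimal sketch for a single-agent, one-step problem with scalar control; the cost, grid bounds, and $\beta$ are our own assumptions for illustration:

```python
import numpy as np

beta = 2.0                        # rationality coefficient in (4)
u = np.linspace(-5.0, 5.0, 2001)  # quadrature grid over the control
du = u[1] - u[0]

def Q(x, u):
    # One-step Q-value (9): running control cost plus terminal cost,
    # with deterministic single-integrator dynamics x' = x + u.
    return 0.5 * u**2 + 1.5 * (x + u - 1.0) ** 2

x = 0.0
w = np.exp(-beta * Q(x, u))
Z = np.sum(w) * du           # partition function (10) by quadrature
pi = w / Z                   # MaxEnt optimal policy (8), a density over u
V = -np.log(Z) / beta        # value function (11)
```

This brute-force evaluation scales exponentially with the control dimension, which is precisely why the LQ approximation is attractive despite the unimodality discussed next.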
C. LQ is Problematic for Multimodal MaxEnt NE
While taking LQ approximations yields a computationally efficient closed-form solution for both the original NE and the MaxEnt NE problems, this approximation can be especially problematic in the MaxEnt NE setting. Whereas the value function in the deterministic case does not depend on the cost function at states away from the NE, the MaxEnt value function does, due to the integral (10) that enters both (8) and (11).
Illustrative Toy Example. We illustrate this using a toy example where the MaxEnt NE can be solved for in closed form. Consider the following two-agent, single-timestep dynamic game with $x = [x^1, x^2] \in \mathbb{R}^2$ and $u = [u^1, u^2] \in \mathbb{R}^2$, with single-integrator dynamics and costs defined by

$$J^1 = \frac{1}{2}(u^1_0)^2 + \frac{3}{2}(x^1_1 - x^2_1)^2, \tag{12}$$

$$J^2 = \frac{1}{2}(u^2_0)^2 + \Phi^2(x^2_1), \tag{13}$$

$$\Phi^2(x^2_1) = \sigma_{\mathrm{SM}}\!\left(\tfrac{3}{2}(x^2_1 - 1)^2,\ \tfrac{3}{2}(x^2_1 + 1)^2 + \epsilon\right), \tag{14}$$

$$\sigma_{\mathrm{SM}}(a, b) := -\ln\!\left(e^{-a} + e^{-b}\right), \tag{15}$$

where $\epsilon > 0$ controls the "height" of the left local minimum (Fig. 2, top right) and $x^1_0 = x^2_0 = 0$. The $\sigma_{\mathrm{SM}}$ in (14) acts as a soft minimum of its two arguments.
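Since the game is single-timestep and agent 2's cost (13) does not depend on agent 1, agent 2's exact MaxEnt policy (8) can be evaluated on a grid. The sketch below ($\beta$ and $\epsilon$ are our own choices) shows that the policy is bimodal, with modes near $u^2 = \pm 1$, which no single LQ (Gaussian) approximation can represent:

```python
import numpy as np

beta, eps = 2.0, 0.5

def soft_min(a, b):
    # sigma_SM(a, b) = -ln(e^{-a} + e^{-b}) from (15)
    return -np.logaddexp(-a, -b)

def J2(u2):
    # Agent 2's cost (13)-(14) with x^2_0 = 0, so x^2_1 = u2.
    x2 = u2
    phi2 = soft_min(1.5 * (x2 - 1.0) ** 2, 1.5 * (x2 + 1.0) ** 2 + eps)
    return 0.5 * u2**2 + phi2

u2 = np.linspace(-4.0, 4.0, 2001)
w = np.exp(-beta * J2(u2))
pi2 = w / (np.sum(w) * (u2[1] - u2[0]))  # exact MaxEnt policy (8)
# pi2 has two modes (near +1 and -1); eps lowers the weight of the -1 mode.
```

Because $J^1$ in (12) couples agent 1 to $x^2_1$, agent 1's exact MaxEnt response must hedge between both of agent 2's modes, which is exactly the behavior lost under a unimodal LQ approximation.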