How Bad is Selﬁsh Driving Bounding the Inefﬁciency of Equilibria in Urban Driving Games Alessandro Zanardi Pier Giuseppe Sessa Nando K aslin Saverio Bolognani Andrea Censi Emilio Frazzoli

2025-05-06 2 0 784.02KB 11 页 10玖币

侵权投诉

How Bad is Selﬁsh Driving?

Bounding the Inefﬁciency of Equilibria in Urban Driving Games

Alessandro Zanardi∗, Pier Giuseppe Sessa∗, Nando K¨

aslin, Saverio Bolognani, Andrea Censi, Emilio Frazzoli

Abstract— We consider the interaction among agents engag-

ing in a driving task and we model it as general-sum game. This

class of games exhibits a plurality of different equilibria posing

the issue of equilibrium selection. While selecting the most

efﬁcient equilibrium (in term of social cost) is often impractical

from a computational standpoint, in this work we study the

(in)efﬁciency of any equilibrium players might agree to play.

More speciﬁcally, we bound the equilibrium inefﬁciency by

modeling driving games as particular type of congestion games

over spatio-temporal resources. We obtain novel guarantees that

reﬁne existing bounds on the Price of Anarchy (PoA) as a

function of problem-dependent game parameters. For instance,

the relative trade-off between proximity costs and personal

objectives such as comfort and progress. Although the obtained

guarantees concern open-loop trajectories, we observe efﬁcient

equilibria even when agents employ closed-loop policies trained

via decentralized multi-agent reinforcement learning.

I. INTRODUCTION

While autonomous vehicles begin to be deployed around

the world, it became evident that they often still miss the

magic touch to seamlessly integrate with other road users [1].

This has sparked a noticeable research interest toward the in-

teractive nature of the driving task [1]–[3]. To this end, game-

theoretical notions have been integrated in motion planning al-

gorithms [4], in learning policies [5], [6] and, more in general,

when explicitly reasoning about others’ reactive behavior [7].

Arguably, the hardness of driving interactions is to coor-

dinate on a certain equilibrium [1] – who goes ﬁrst when

resources are contended. Under mild assumptions, it has been

shown that there actually exist certain equilibria that shall

be preferred in terms of social efﬁciency [8] or in terms

of cost sharing [9]. At the same time, big strides forward

have been made for game-theoretical planners that have local

guarantees of convergence [5], [9], [10]. Combining these

two aspects, our work is motivated by the following question:

How inefﬁcient can an equilibrium be compared to another?

An answer would have several implications ranging from the

problem of equilibrium selection [11] to the importance of

global vs local, centralized vs decentralised solutions.

We consider the class of urban driving games and study

their efﬁciency, i.e., the cost of their equilibria with respect to

the social optimum. Under minor modeling assumptions, we

show that it is possible to derive analytical bounds for their

inefﬁciency. The resulting bound is a function of the relative

importance between personal objectives (e.g., a comfort cost

that depends only on the agent’s trajectory) and joint ones

(e.g., a proximity cost that depends on the joint trajectories of

the players). In addition, it depends on a problem-dependent

∗Equal contribution.

This work was supported by the Swiss National Science Foundation under

NCCR Automation, grant agreement 51NF40 180545.

Fig. 1. For agents engaging in the driving task there exist many topologically

different equilibria. The ﬁgure shows two distinct examples of learned

Nash Equilibrim policies. Some would result in better overall costs for the

individual agents but would require to ﬁnd global solutions which are often

impractical in these cases. We study and bound in terms of Price of Anarchy

the inefﬁciency arising in these type of games.

parameter which represents the agent’s sensitiveness to the

number of nearby vehicles. A satisfactory bound implies that

the agents can be self-interested without having to estimate

others’ degree of cooperativeness [12]; and that there is no

need for global coordination since decentralized and local

solutions would still achieve a satisfactory overall cost for

the system.

A. Related Work

Many works in the last years modeled driving interactions

as a general-sum game [8], [13]. And most–if not all–of the

devised solution methods provide guarantees only for local

convergence to Nash Equilibria; examples range from itera-

tive quadratic approximations [10], augmented Lagrangian

methods [9] and “Newtonesque” methods [14]. Since there

exists both a continuum of solutions but also qualitatively

different class of solutions (in the sense of topologically

different solutions [1, Sec. 3.3]), one cannot ignore the

problem of equilibrium selection. For some methods the

(local) choice is embedded in the method, converging for

example to Generalized Nash Equilibria [9]. In other mixed

context the choice is dictated by human drivers. In [15] for

instance, autonomous vehicles keep a belief over the possible

equilibria in a bid to favor the ones preferred by others.

Differently from these works, we study the inefﬁciency that

any equilibrium could have.

The study of games’ inefﬁciency ﬁnds a considerable body

of literature since the pioneering work in [16]. Efﬁciency

guarantees are often expressed in terms of PoA which quanti-

ﬁes the ratio between the social cost of the worst-performing

arXiv:2210.13064v1 [cs.RO] 24 Oct 2022

equilibrium with that of the socially optimal outcome. PoA

bounds have been derived for speciﬁc classes of games such

as congestion games [17], [18], utility games [19], [20], and

smooth games [21], exploiting various structures. However, to

the best of our knowledge, PoA in the urban driving setting

have not been studied in the literature. In this work, we

formulate driving games as a particular type of congestion

games and employ the PoA bounding techniques of [18].

However, differently from [18] we generalize and reﬁne the

obtained guarantees exploiting the speciﬁc driving games’ cost

structure which consists of joint but also personal objectives.

B. Contribution

We consider the problem of bounding the inefﬁciency of

equilibria that emerge in driving games, quantiﬁed via the

notion of PoA. To this end,

•

We formally show that driving games can be naturally

modeled as a particular type of congestion games, where

the agents compete for spatio-temporal resources. This

allows us to leverage existing PoA bounds for congestion

games and apply them to our class of driving games.

•

We further reﬁne the efﬁciency guarantees by exploiting

the speciﬁc cost structure of driving games. We derive

a novel and improved efﬁciency bound which depends on

the relative importance between personal and joint costs.

The obtained results can be of broader interest since they

apply to general congestion games with added personal

costs and, to the best of our knowledge, they constitute

the ﬁrst PoA bounds for driving games.

•

We conduct an experimental case study to evaluate the

inefﬁciency of several driving scenarios. We compute

equilibrium driving policies via multi-agent reinforcement

learning and utilize a systematic approach to empirically

approximate the associated PoAs. Conforming with our

intuition, the computed equilibria display a high efﬁciency

in all the considered scenarios and the resulting PoAs are

within the derived, albeit conservative, bounds.

II. PRELIMINARIES

A. Urban Driving Games

We consider the class of (urban) driving games akin to [8].

They are a particular subclass of general-sum games with few

peculiarities. Most importantly, the cost-structure in a driving

game allows to distinguish between joint and personal costs.

Joint costs depend on the state and actions of all the players,

e.g., proximity. Personal costs instead, depend only on the

states and actions of a speciﬁc player, e.g., a comfort objective

penalizing large accelerations. Furthermore, the resulting

game often enjoys the favorable structure of being a potential

game [22], [23], which in turn, guarantees convergence

of better-response schemes and, in some cases [8], social

efﬁciency of global minima.

For the provided analytic results we consider open-loop

strategies, where the players commit to the whole trajectory.

More formally, we consider a driving game deﬁned by the

tuple

G=hA,{Γi},{Ji}ii∈A

., where

is a ﬁnite set of

players,

Γi

is a discrete set of dynamically feasible trajectories,

and

is the cost structure for a player. In the remaining,

we denote without subindices the joint quantities, while the

subindex speciﬁes if a quantity is peculiar to a subset of the

players; e.g.,

−i

reads “all but Player

”. Thus, we denote the

trajectory choice of Player

γi∈Γi

, whereas

γ∈Qi∈A Γi

denotes the joint trajectories of all players.

In a driving game, the cost structure

of each player

penalizes – as a ﬁrst priority objective – colliding trajecto-

ries. Second, whenever game outcomes are not colliding, it

typically penalizes distances from neighbouring players (i.e.,

aproximity cost) and other personal objectives (i.e., comfort,

acceleration, time, etc.). This principle has been modeled with

a lexicographic ordered cost in [8], and with optimization

constraints in other works [9], [24].

In this work, we restrict the possible coupling costs among

the players to the ones involving distance. Therefore, the

overall cost for a player has the form

Ji(γ) = Jprox

i(γ) +

Jper

i(γi)

where the ﬁrst term is a proximity cost–collision at

the limit–and the second term is a personal cost. We further

specify the allowed proximity costs to have two properties:

(i) To be integrable over the trajectory;

(ii)

To be monotonically increasing as the distance decreases.

More formally, given a pair of players’ trajectories

γi

and

γj

we deﬁne

δ(γi, γj, t)

as the spatial distance between

γi

and

γj

at time

. Then, the proximity costs

Jprox

must satisfy the

following property.

Property 1.

For any player

and others’ trajectories

γ−i

, consider any pair of feasible trajectories

γi, γ0

i∈Γi

Then, if

δ(γ0

i, γj, t)≤δ(γi, γj, t),∀t, ∀j6=i

, it must hold

Jprox

j(γ0

i, γ−i)≥Jprox

j(γi, γ−i),∀j∈ A.

Intuitively, Property 1ensures that for a unilateral deviation

of player

, the proximity cost of every player (including

)

increases as player

chooses trajectories that are spatially

closer to the others. Overall, Property 1encompasses many

possible choices of proximity costs such as

Jprox

i(γ) =

−Pj6=iPtδ(γi, γj, t)α

, for a given degree coefﬁcient

α > 0

or of the kind

Jprox

i(γ) = Pj6=iPt

δ(γi,γj,t)α

. Additionally,

Property 1is still valid whenever only distances within a

certain thresholds are penalized as e.g. in

Jprox

i(γ) = X

j6=iX

t((δs−δ(γi, γj, t))αif δ(γi, γj, t)< δs,

0otherwise,

(1)

where α > 1and δsis some safety distance.

B. Game Equilibria and Efﬁcency

We consider Nash Equilibria (NE) as the solution concepts

of driving games, deﬁned as follows.

Deﬁnition 2

(Nash Equilibrium)

A trajectory proﬁle

is a

(pure) Nash equilibrium (NE) if ∀i∈ A,∀γ0

i∈Γi

Ji(γ)≤Ji(γ0

i, γ−i).(2)

We denote the set of NE outcomes as ΓNE ⊆Γ.

Different approaches have been proposed for computing

NE trajectories, e.g., [9], [10], [14]. Unfortunately, however,

the fact that NE can be computed does not tell us anything

about their quality (i.e., their efﬁciency). A common way to

quantify the efﬁciency of a game outcome is by measuring

its social cost:

Deﬁnition 3

(Social Cost)

The social cost

C(γ)

associated

with a speciﬁc outcome

of a game is deﬁned as the sum

over all the individual player costs:

C(γ):=X

i∈A

Ji(γ).(3)

A game is more efﬁcient if it results in a lower social cost.

In the context of urban driving, efﬁcient outcomes are usually

represented by trajectories allowing the players to reach their

goals, at a safe distance, and with minimal total consumption,

e.g., of time, acceleration, fuel, etc.

Because the players are self-interested, they aim at minimiz-

ing their individual costs

rather than

causing inefﬁciency

for the game. To measure the game inefﬁciency we adopt the

widely used notion of PoA [16].

Deﬁnition 4

(Price of Anarchy)

The Price of Anarchy (PoA)

is the ratio between the highest social cost at a NE and the

lowest social cost overall:

PoA =maxγ∈ΓNE C(γ)

minγ∈ΓC(γ)∈[1,∞).(4)

In general, providing PoA bounds is a hard task since these

must clearly depend on the speciﬁc game and players’ costs

structure. In the following, we show that driving games can be

naturally modeled as a particular type of congestion games and

suitable PoA bounds can be derived inheriting – and reﬁning

– existing guarantees for such speciﬁc games’ structure.

III. DRIVING GAMES AS CONGESTION GAMES

Inspired by the robotics literature, we look at the class of

driving games as agents competing for common resources –

in this case, portions of the road at speciﬁc time instances. In

this spirit, we show that the games presented in Sec. II can

be (re)modeled as congestion games which preserve the same

key properties, allowing us to derive inefﬁciency bounds.

Congestion games were ﬁrst introduced in [25] as games

where the agents’ strategy corresponds to selecting a subset of

the available resources. The use of each resource is penalized

by a monotonic load function such that, the more players select

that resource, the more cost they incur. Hence, the total cost for

a player results in the sum over the load costs of the selected re-

sources. We refer to [26] for a more pedagogical presentation.

A. Congestion Game Formulation

In the driving setting at hand, we consider the ﬁnite set of

resources given by a discretization of the road in both space

(i.e. a 2D grid) and time. We denote the set of spatio temporal

resources as

. Then, each trajectory

γi

can be mapped to

a corresponding strategy

γcg

i⊆ R

by its spatio-temporal

occupancy. Since we consider deterministic trajectories, each

agent either uses a resource or it does not. In other words,

the load an agent can put on a resource is binary. Hence,

the resulting load on resource

is deﬁned as

lr(γcg) =

Pi∈A 1r∈γcg

i∈N

, where

1•

is the indicator function. Each

resource then has a speciﬁc load-dependent cost function

Jr:N→R

. For the purpose of this work, we restrict

proximity h

space s

time t

player i

resource r

Fig. 2. The set of possible resources follows from a discretization in space,

time, and proximity levels. The ﬁgure depicts an example of the ﬁrst two

time steps of a “one-dimensional” road with three proximity levels, i.e.,

H= 3

. The resources that are used by player

are shaded in red and grow

along the proximity dimension.

to be a polynomial with non-negative coefﬁcients. The cost

which an agent incurs is then constructed as:

Jcg

i(γcg) = X

r∈γcg

Jr(lr(γcg)).

Intuitively, to minimize the above congestion cost, agents

are encouraged to choose non-overlapping trajectories and

thus the above formulation models – as a ﬁrst approximation

– driving games’ preferences. However, it is a very crude

approximation since it does not discriminate among non-

overlapping trajectories and thus cannot fully model prox-

imity costs. In the following, we show that augmenting the

resource set along a new dimension that we name “proximity

dimension” allows us to model various proximity costs that

respect Property 1. The inclusion of the personal costs is

instead straightforward (see end of this section) as already

shown in the literature [27].

Proximity levels: On top of the discretization in space

and time, we further consider a proximity dimension

h∈

{0, . . . , H −1}

so that a “copy” of the spatio-temporal

resources exists for each proximity level

, as shown in Fig. 2.

Intuitively, the trajectory of an agent progressively inﬂates

its occupancy along this dimension, such that, at the larger

proximity levels, resources can overlap even if the trajectories

do not physically overlap. This allows to model proximity

costs. More precisely, consider any given time

. Then,

the spatial resources occupied at each proximity level are

determined in the following manner: At the ﬁrst level (

h= 0

the trajectory

γi

uses only the resources associated to its

physical occupancy (we denote them as

[γcg

i(t)]0

). Then, for

each successive level

γi

uses the spatial resources that

are within a neighborhood (i.e., a ball) of a given radius

ρh

around the agent’s position

γi(t)

. We denote such resources

[γcg

i(t)]h

and assume for simplicity that

ρh> ρh−1

. Hence,

by letting

Jh(·)

represent the polynomial cost functions

associated to resources of proximity level

, the agents’

proximity cost can be written as:

Jcg

i(γcg) =

T−1

t=0

H−1

h=0 X

r∈[γcg

i(t)]h

Jh(lr(γcg)).(5)

In Fig. 2, an illustrative example is shown with two

additional proximity levels (i.e.,

H= 3

). According to this for-

mulation, resources at higher levels of proximity can overlap

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

HowBadisSelshDriving?BoundingtheInefciencyofEquilibriainUrbanDrivingGamesAlessandroZanardi,PierGiuseppeSessa,NandoK¨aslin,SaverioBolognani,AndreaCensi,EmilioFrazzoliAbstractWeconsidertheinteractionamongagentsengag-inginadrivingtaskandwemodelitasgeneral-sumgame.Thisclassofgamesexhibitsaplurality...

展开>> 收起<<

How Bad is Selﬁsh Driving Bounding the Inefﬁciency of Equilibria in Urban Driving Games Alessandro Zanardi Pier Giuseppe Sessa Nando K aslin Saverio Bolognani Andrea Censi Emilio Frazzoli.pdf

共11页,预览3页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

How Bad is Selﬁsh Driving Bounding the Inefﬁciency of Equilibria in Urban Driving Games Alessandro Zanardi Pier Giuseppe Sessa Nando K aslin Saverio Bolognani Andrea Censi Emilio Frazzoli

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: