How Bad is Selfish Driving Bounding the Inefficiency of Equilibria in Urban Driving Games Alessandro Zanardi Pier Giuseppe Sessa Nando K aslin Saverio Bolognani Andrea Censi Emilio Frazzoli

2025-05-06 0 0 784.02KB 11 页 10玖币
侵权投诉
How Bad is Selfish Driving?
Bounding the Inefficiency of Equilibria in Urban Driving Games
Alessandro Zanardi, Pier Giuseppe Sessa, Nando K¨
aslin, Saverio Bolognani, Andrea Censi, Emilio Frazzoli
Abstract We consider the interaction among agents engag-
ing in a driving task and we model it as general-sum game. This
class of games exhibits a plurality of different equilibria posing
the issue of equilibrium selection. While selecting the most
efficient equilibrium (in term of social cost) is often impractical
from a computational standpoint, in this work we study the
(in)efficiency of any equilibrium players might agree to play.
More specifically, we bound the equilibrium inefficiency by
modeling driving games as particular type of congestion games
over spatio-temporal resources. We obtain novel guarantees that
refine existing bounds on the Price of Anarchy (PoA) as a
function of problem-dependent game parameters. For instance,
the relative trade-off between proximity costs and personal
objectives such as comfort and progress. Although the obtained
guarantees concern open-loop trajectories, we observe efficient
equilibria even when agents employ closed-loop policies trained
via decentralized multi-agent reinforcement learning.
I. INTRODUCTION
While autonomous vehicles begin to be deployed around
the world, it became evident that they often still miss the
magic touch to seamlessly integrate with other road users [1].
This has sparked a noticeable research interest toward the in-
teractive nature of the driving task [1]–[3]. To this end, game-
theoretical notions have been integrated in motion planning al-
gorithms [4], in learning policies [5], [6] and, more in general,
when explicitly reasoning about others’ reactive behavior [7].
Arguably, the hardness of driving interactions is to coor-
dinate on a certain equilibrium [1] – who goes first when
resources are contended. Under mild assumptions, it has been
shown that there actually exist certain equilibria that shall
be preferred in terms of social efficiency [8] or in terms
of cost sharing [9]. At the same time, big strides forward
have been made for game-theoretical planners that have local
guarantees of convergence [5], [9], [10]. Combining these
two aspects, our work is motivated by the following question:
How inefficient can an equilibrium be compared to another?
An answer would have several implications ranging from the
problem of equilibrium selection [11] to the importance of
global vs local, centralized vs decentralised solutions.
We consider the class of urban driving games and study
their efficiency, i.e., the cost of their equilibria with respect to
the social optimum. Under minor modeling assumptions, we
show that it is possible to derive analytical bounds for their
inefficiency. The resulting bound is a function of the relative
importance between personal objectives (e.g., a comfort cost
that depends only on the agent’s trajectory) and joint ones
(e.g., a proximity cost that depends on the joint trajectories of
the players). In addition, it depends on a problem-dependent
Equal contribution.
This work was supported by the Swiss National Science Foundation under
NCCR Automation, grant agreement 51NF40 180545.
Fig. 1. For agents engaging in the driving task there exist many topologically
different equilibria. The figure shows two distinct examples of learned
Nash Equilibrim policies. Some would result in better overall costs for the
individual agents but would require to find global solutions which are often
impractical in these cases. We study and bound in terms of Price of Anarchy
the inefficiency arising in these type of games.
parameter which represents the agent’s sensitiveness to the
number of nearby vehicles. A satisfactory bound implies that
the agents can be self-interested without having to estimate
others’ degree of cooperativeness [12]; and that there is no
need for global coordination since decentralized and local
solutions would still achieve a satisfactory overall cost for
the system.
A. Related Work
Many works in the last years modeled driving interactions
as a general-sum game [8], [13]. And most–if not all–of the
devised solution methods provide guarantees only for local
convergence to Nash Equilibria; examples range from itera-
tive quadratic approximations [10], augmented Lagrangian
methods [9] and “Newtonesque” methods [14]. Since there
exists both a continuum of solutions but also qualitatively
different class of solutions (in the sense of topologically
different solutions [1, Sec. 3.3]), one cannot ignore the
problem of equilibrium selection. For some methods the
(local) choice is embedded in the method, converging for
example to Generalized Nash Equilibria [9]. In other mixed
context the choice is dictated by human drivers. In [15] for
instance, autonomous vehicles keep a belief over the possible
equilibria in a bid to favor the ones preferred by others.
Differently from these works, we study the inefficiency that
any equilibrium could have.
The study of games’ inefficiency finds a considerable body
of literature since the pioneering work in [16]. Efficiency
guarantees are often expressed in terms of PoA which quanti-
fies the ratio between the social cost of the worst-performing
arXiv:2210.13064v1 [cs.RO] 24 Oct 2022
equilibrium with that of the socially optimal outcome. PoA
bounds have been derived for specific classes of games such
as congestion games [17], [18], utility games [19], [20], and
smooth games [21], exploiting various structures. However, to
the best of our knowledge, PoA in the urban driving setting
have not been studied in the literature. In this work, we
formulate driving games as a particular type of congestion
games and employ the PoA bounding techniques of [18].
However, differently from [18] we generalize and refine the
obtained guarantees exploiting the specific driving games’ cost
structure which consists of joint but also personal objectives.
B. Contribution
We consider the problem of bounding the inefficiency of
equilibria that emerge in driving games, quantified via the
notion of PoA. To this end,
We formally show that driving games can be naturally
modeled as a particular type of congestion games, where
the agents compete for spatio-temporal resources. This
allows us to leverage existing PoA bounds for congestion
games and apply them to our class of driving games.
We further refine the efficiency guarantees by exploiting
the specific cost structure of driving games. We derive
a novel and improved efficiency bound which depends on
the relative importance between personal and joint costs.
The obtained results can be of broader interest since they
apply to general congestion games with added personal
costs and, to the best of our knowledge, they constitute
the first PoA bounds for driving games.
We conduct an experimental case study to evaluate the
inefficiency of several driving scenarios. We compute
equilibrium driving policies via multi-agent reinforcement
learning and utilize a systematic approach to empirically
approximate the associated PoAs. Conforming with our
intuition, the computed equilibria display a high efficiency
in all the considered scenarios and the resulting PoAs are
within the derived, albeit conservative, bounds.
II. PRELIMINARIES
A. Urban Driving Games
We consider the class of (urban) driving games akin to [8].
They are a particular subclass of general-sum games with few
peculiarities. Most importantly, the cost-structure in a driving
game allows to distinguish between joint and personal costs.
Joint costs depend on the state and actions of all the players,
e.g., proximity. Personal costs instead, depend only on the
states and actions of a specific player, e.g., a comfort objective
penalizing large accelerations. Furthermore, the resulting
game often enjoys the favorable structure of being a potential
game [22], [23], which in turn, guarantees convergence
of better-response schemes and, in some cases [8], social
efficiency of global minima.
For the provided analytic results we consider open-loop
strategies, where the players commit to the whole trajectory.
More formally, we consider a driving game defined by the
tuple
G=hA,{Γi},{Ji}ii∈A
., where
A
is a finite set of
players,
Γi
is a discrete set of dynamically feasible trajectories,
and
Ji
is the cost structure for a player. In the remaining,
we denote without subindices the joint quantities, while the
subindex specifies if a quantity is peculiar to a subset of the
players; e.g.,
i
reads “all but Player
i
”. Thus, we denote the
trajectory choice of Player
i
as
γiΓi
, whereas
γQi∈A Γi
denotes the joint trajectories of all players.
In a driving game, the cost structure
Ji
of each player
penalizes – as a first priority objective – colliding trajecto-
ries. Second, whenever game outcomes are not colliding, it
typically penalizes distances from neighbouring players (i.e.,
aproximity cost) and other personal objectives (i.e., comfort,
acceleration, time, etc.). This principle has been modeled with
a lexicographic ordered cost in [8], and with optimization
constraints in other works [9], [24].
In this work, we restrict the possible coupling costs among
the players to the ones involving distance. Therefore, the
overall cost for a player has the form
Ji(γ) = Jprox
i(γ) +
Jper
i(γi)
where the first term is a proximity cost–collision at
the limit–and the second term is a personal cost. We further
specify the allowed proximity costs to have two properties:
(i) To be integrable over the trajectory;
(ii)
To be monotonically increasing as the distance decreases.
More formally, given a pair of players’ trajectories
γi
and
γj
,
we define
δ(γi, γj, t)
as the spatial distance between
γi
and
γj
at time
t
. Then, the proximity costs
Jprox
i
must satisfy the
following property.
Property 1.
For any player
i
and others’ trajectories
γi
, consider any pair of feasible trajectories
γi, γ0
iΓi
.
Then, if
δ(γ0
i, γj, t)δ(γi, γj, t),t, j6=i
, it must hold
Jprox
j(γ0
i, γi)Jprox
j(γi, γi),j∈ A.
Intuitively, Property 1ensures that for a unilateral deviation
of player
i
, the proximity cost of every player (including
i
)
increases as player
i
chooses trajectories that are spatially
closer to the others. Overall, Property 1encompasses many
possible choices of proximity costs such as
Jprox
i(γ) =
Pj6=iPtδ(γi, γj, t)α
, for a given degree coefficient
α > 0
,
or of the kind
Jprox
i(γ) = Pj6=iPt
1
δ(γij,t)α
. Additionally,
Property 1is still valid whenever only distances within a
certain thresholds are penalized as e.g. in
Jprox
i(γ) = X
j6=iX
t((δsδ(γi, γj, t))αif δ(γi, γj, t)< δs,
0otherwise,
(1)
where α > 1and δsis some safety distance.
B. Game Equilibria and Efficency
We consider Nash Equilibria (NE) as the solution concepts
of driving games, defined as follows.
Definition 2
(Nash Equilibrium)
.
A trajectory profile
γ
is a
(pure) Nash equilibrium (NE) if i∈ A,γ0
iΓi
Ji(γ)Ji(γ0
i, γi).(2)
We denote the set of NE outcomes as ΓNE Γ.
Different approaches have been proposed for computing
NE trajectories, e.g., [9], [10], [14]. Unfortunately, however,
the fact that NE can be computed does not tell us anything
about their quality (i.e., their efficiency). A common way to
quantify the efficiency of a game outcome is by measuring
its social cost:
Definition 3
(Social Cost)
.
The social cost
C(γ)
associated
with a specific outcome
γ
of a game is defined as the sum
over all the individual player costs:
C(γ):=X
i∈A
Ji(γ).(3)
A game is more efficient if it results in a lower social cost.
In the context of urban driving, efficient outcomes are usually
represented by trajectories allowing the players to reach their
goals, at a safe distance, and with minimal total consumption,
e.g., of time, acceleration, fuel, etc.
Because the players are self-interested, they aim at minimiz-
ing their individual costs
Ji
rather than
C
causing inefficiency
for the game. To measure the game inefficiency we adopt the
widely used notion of PoA [16].
Definition 4
(Price of Anarchy)
.
The Price of Anarchy (PoA)
is the ratio between the highest social cost at a NE and the
lowest social cost overall:
PoA =maxγΓNE C(γ)
minγΓC(γ)[1,).(4)
In general, providing PoA bounds is a hard task since these
must clearly depend on the specific game and players’ costs
structure. In the following, we show that driving games can be
naturally modeled as a particular type of congestion games and
suitable PoA bounds can be derived inheriting – and refining
– existing guarantees for such specific games’ structure.
III. DRIVING GAMES AS CONGESTION GAMES
Inspired by the robotics literature, we look at the class of
driving games as agents competing for common resources –
in this case, portions of the road at specific time instances. In
this spirit, we show that the games presented in Sec. II can
be (re)modeled as congestion games which preserve the same
key properties, allowing us to derive inefficiency bounds.
Congestion games were first introduced in [25] as games
where the agents’ strategy corresponds to selecting a subset of
the available resources. The use of each resource is penalized
by a monotonic load function such that, the more players select
that resource, the more cost they incur. Hence, the total cost for
a player results in the sum over the load costs of the selected re-
sources. We refer to [26] for a more pedagogical presentation.
A. Congestion Game Formulation
In the driving setting at hand, we consider the finite set of
resources given by a discretization of the road in both space
(i.e. a 2D grid) and time. We denote the set of spatio temporal
resources as
R
. Then, each trajectory
γi
can be mapped to
a corresponding strategy
γcg
i⊆ R
by its spatio-temporal
occupancy. Since we consider deterministic trajectories, each
agent either uses a resource or it does not. In other words,
the load an agent can put on a resource is binary. Hence,
the resulting load on resource
r
is defined as
lr(γcg) =
Pi∈A 1rγcg
iN
, where
1
is the indicator function. Each
resource then has a specific load-dependent cost function
Jr:NR
. For the purpose of this work, we restrict
Jr
proximity h
space s
time t
player i
resource r
Fig. 2. The set of possible resources follows from a discretization in space,
time, and proximity levels. The figure depicts an example of the first two
time steps of a “one-dimensional” road with three proximity levels, i.e.,
H= 3
. The resources that are used by player
i
are shaded in red and grow
along the proximity dimension.
to be a polynomial with non-negative coefficients. The cost
which an agent incurs is then constructed as:
Jcg
i(γcg) = X
rγcg
i
Jr(lr(γcg)).
Intuitively, to minimize the above congestion cost, agents
are encouraged to choose non-overlapping trajectories and
thus the above formulation models – as a first approximation
– driving games’ preferences. However, it is a very crude
approximation since it does not discriminate among non-
overlapping trajectories and thus cannot fully model prox-
imity costs. In the following, we show that augmenting the
resource set along a new dimension that we name “proximity
dimension” allows us to model various proximity costs that
respect Property 1. The inclusion of the personal costs is
instead straightforward (see end of this section) as already
shown in the literature [27].
Proximity levels: On top of the discretization in space
and time, we further consider a proximity dimension
h
{0, . . . , H 1}
so that a “copy” of the spatio-temporal
resources exists for each proximity level
h
, as shown in Fig. 2.
Intuitively, the trajectory of an agent progressively inflates
its occupancy along this dimension, such that, at the larger
proximity levels, resources can overlap even if the trajectories
do not physically overlap. This allows to model proximity
costs. More precisely, consider any given time
t
. Then,
the spatial resources occupied at each proximity level are
determined in the following manner: At the first level (
h= 0
),
the trajectory
γi
uses only the resources associated to its
physical occupancy (we denote them as
[γcg
i(t)]0
). Then, for
each successive level
h
,
γi
uses the spatial resources that
are within a neighborhood (i.e., a ball) of a given radius
ρh
around the agent’s position
γi(t)
. We denote such resources
as
[γcg
i(t)]h
and assume for simplicity that
ρh> ρh1
. Hence,
by letting
Jh(·)
represent the polynomial cost functions
associated to resources of proximity level
h
, the agents’
proximity cost can be written as:
Jcg
i(γcg) =
T1
X
t=0
H1
X
h=0 X
r[γcg
i(t)]h
Jh(lr(γcg)).(5)
In Fig. 2, an illustrative example is shown with two
additional proximity levels (i.e.,
H= 3
). According to this for-
mulation, resources at higher levels of proximity can overlap
摘要:

HowBadisSelshDriving?BoundingtheInefciencyofEquilibriainUrbanDrivingGamesAlessandroZanardi,PierGiuseppeSessa,NandoK¨aslin,SaverioBolognani,AndreaCensi,EmilioFrazzoliAbstract—Weconsidertheinteractionamongagentsengag-inginadrivingtaskandwemodelitasgeneral-sumgame.Thisclassofgamesexhibitsaplurality...

展开>> 收起<<
How Bad is Selfish Driving Bounding the Inefficiency of Equilibria in Urban Driving Games Alessandro Zanardi Pier Giuseppe Sessa Nando K aslin Saverio Bolognani Andrea Censi Emilio Frazzoli.pdf

共11页,预览3页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:11 页 大小:784.02KB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 11
客服
关注