
equilibrium with that of the socially optimal outcome. PoA
bounds have been derived for specific classes of games such
as congestion games [17], [18], utility games [19], [20], and
smooth games [21], exploiting various structures. However, to
the best of our knowledge, PoA in the urban driving setting
have not been studied in the literature. In this work, we
formulate driving games as a particular type of congestion
games and employ the PoA bounding techniques of [18].
However, differently from [18] we generalize and refine the
obtained guarantees exploiting the specific driving games’ cost
structure which consists of joint but also personal objectives.
B. Contribution
We consider the problem of bounding the inefficiency of
equilibria that emerge in driving games, quantified via the
notion of PoA. To this end,
•
We formally show that driving games can be naturally
modeled as a particular type of congestion games, where
the agents compete for spatio-temporal resources. This
allows us to leverage existing PoA bounds for congestion
games and apply them to our class of driving games.
•
We further refine the efficiency guarantees by exploiting
the specific cost structure of driving games. We derive
a novel and improved efficiency bound which depends on
the relative importance between personal and joint costs.
The obtained results can be of broader interest since they
apply to general congestion games with added personal
costs and, to the best of our knowledge, they constitute
the first PoA bounds for driving games.
•
We conduct an experimental case study to evaluate the
inefficiency of several driving scenarios. We compute
equilibrium driving policies via multi-agent reinforcement
learning and utilize a systematic approach to empirically
approximate the associated PoAs. Conforming with our
intuition, the computed equilibria display a high efficiency
in all the considered scenarios and the resulting PoAs are
within the derived, albeit conservative, bounds.
II. PRELIMINARIES
A. Urban Driving Games
We consider the class of (urban) driving games akin to [8].
They are a particular subclass of general-sum games with few
peculiarities. Most importantly, the cost-structure in a driving
game allows to distinguish between joint and personal costs.
Joint costs depend on the state and actions of all the players,
e.g., proximity. Personal costs instead, depend only on the
states and actions of a specific player, e.g., a comfort objective
penalizing large accelerations. Furthermore, the resulting
game often enjoys the favorable structure of being a potential
game [22], [23], which in turn, guarantees convergence
of better-response schemes and, in some cases [8], social
efficiency of global minima.
For the provided analytic results we consider open-loop
strategies, where the players commit to the whole trajectory.
More formally, we consider a driving game defined by the
tuple
G=hA,{Γi},{Ji}ii∈A
., where
A
is a finite set of
players,
Γi
is a discrete set of dynamically feasible trajectories,
and
Ji
is the cost structure for a player. In the remaining,
we denote without subindices the joint quantities, while the
subindex specifies if a quantity is peculiar to a subset of the
players; e.g.,
−i
reads “all but Player
i
”. Thus, we denote the
trajectory choice of Player
i
as
γi∈Γi
, whereas
γ∈Qi∈A Γi
denotes the joint trajectories of all players.
In a driving game, the cost structure
Ji
of each player
penalizes – as a first priority objective – colliding trajecto-
ries. Second, whenever game outcomes are not colliding, it
typically penalizes distances from neighbouring players (i.e.,
aproximity cost) and other personal objectives (i.e., comfort,
acceleration, time, etc.). This principle has been modeled with
a lexicographic ordered cost in [8], and with optimization
constraints in other works [9], [24].
In this work, we restrict the possible coupling costs among
the players to the ones involving distance. Therefore, the
overall cost for a player has the form
Ji(γ) = Jprox
i(γ) +
Jper
i(γi)
where the first term is a proximity cost–collision at
the limit–and the second term is a personal cost. We further
specify the allowed proximity costs to have two properties:
(i) To be integrable over the trajectory;
(ii)
To be monotonically increasing as the distance decreases.
More formally, given a pair of players’ trajectories
γi
and
γj
,
we define
δ(γi, γj, t)
as the spatial distance between
γi
and
γj
at time
t
. Then, the proximity costs
Jprox
i
must satisfy the
following property.
Property 1.
For any player
i
and others’ trajectories
γ−i
, consider any pair of feasible trajectories
γi, γ0
i∈Γi
.
Then, if
δ(γ0
i, γj, t)≤δ(γi, γj, t),∀t, ∀j6=i
, it must hold
Jprox
j(γ0
i, γ−i)≥Jprox
j(γi, γ−i),∀j∈ A.
Intuitively, Property 1ensures that for a unilateral deviation
of player
i
, the proximity cost of every player (including
i
)
increases as player
i
chooses trajectories that are spatially
closer to the others. Overall, Property 1encompasses many
possible choices of proximity costs such as
Jprox
i(γ) =
−Pj6=iPtδ(γi, γj, t)α
, for a given degree coefficient
α > 0
,
or of the kind
Jprox
i(γ) = Pj6=iPt
1
δ(γi,γj,t)α
. Additionally,
Property 1is still valid whenever only distances within a
certain thresholds are penalized as e.g. in
Jprox
i(γ) = X
j6=iX
t((δs−δ(γi, γj, t))αif δ(γi, γj, t)< δs,
0otherwise,
(1)
where α > 1and δsis some safety distance.
B. Game Equilibria and Efficency
We consider Nash Equilibria (NE) as the solution concepts
of driving games, defined as follows.
Definition 2
(Nash Equilibrium)
.
A trajectory profile
γ
is a
(pure) Nash equilibrium (NE) if ∀i∈ A,∀γ0
i∈Γi
Ji(γ)≤Ji(γ0
i, γ−i).(2)
We denote the set of NE outcomes as ΓNE ⊆Γ.
Different approaches have been proposed for computing
NE trajectories, e.g., [9], [10], [14]. Unfortunately, however,
the fact that NE can be computed does not tell us anything
about their quality (i.e., their efficiency). A common way to
quantify the efficiency of a game outcome is by measuring
its social cost: