
MPOGames: Efficient Multimodal Partially Observable Dynamic Games
Oswin So1,2,∗, Paul Drews2, Thomas Balch2, Velin Dimitrov2, Guy Rosman2, Evangelos A. Theodorou1
Abstract— Game theoretic methods have become popular for
planning and prediction in situations involving rich multi-agent
interactions. However, these methods often assume the existence
of a single local Nash equilibrium and are hence unable to
handle uncertainty in the intentions of different agents. While
maximum entropy (MaxEnt) dynamic games try to address this
issue, practical approaches solve for MaxEnt Nash equilibria
using linear-quadratic approximations which are restricted to
unimodal responses and unsuitable for scenarios with multiple
local Nash equilibria. By reformulating the problem as a POMDP,
we propose MPOGames, a method for efficiently solving MaxEnt
dynamic games that captures the interactions between local
Nash equilibria. We show the importance of uncertainty-aware
game theoretic methods via a two-agent merge case study.
Finally, we demonstrate the real-time capability of our approach
with hardware experiments on a 1/10th scale car platform.
I. INTRODUCTION AND RELATED WORK
In recent years, robots have seen increasing use
in applications with multiple interacting agents. In these
applications, actions taken by the robot cannot be considered
in isolation and must take the responses of other agents into
account, especially in areas such as autonomous driving [1],
robotic arms [2] and surgical robotics [3] where failures
can be catastrophic. For example, in the autonomous driving
setting, two agents at a merge must negotiate and agree on
which agent merges first (Fig. 1). Failure to consider the
responses of the other agent could result in serious collisions.
Game-theoretic (GT) approaches have become increasingly
popular for planning in multi-agent scenarios [4–12]. By
assuming that agents act rationally, these approaches decouple
the problem of planning and prediction into the problems
of finding objective functions for each agent and solving for
a Nash equilibrium of the resulting non-cooperative game. This
provides a more principled and interpretable alternative to
the prediction of other agents when compared to black-box
approaches that use deep neural networks [13,14].
In practice, assuming rational agents is too strict. Humans
face cognitive limitations and make irrational decisions that
are satisfactory but suboptimal, an idea known as “bounded
rationality” or “noisy rationality” [15,16]. Even autonomous
agents are rarely exactly optimal and often make suboptimal
decisions due to finite computational resources. One method
of addressing this disconnect is the Maximum Entropy
(MaxEnt) framework, which has been applied to diverse areas
1Georgia Institute of Technology.
2Toyota Research Institute.
∗Corresponding Author. oswinso@gatech.edu.
This work was supported by the Toyota Research Institute (TRI). This
article solely reflects the opinions and conclusions of its authors and not
TRI or any other Toyota entity. Their support is gratefully acknowledged.
2023 IEEE International Conference on Robotics and Automation (ICRA)
Fig. 1: The ego agent (left) seeks to merge into a lane safely
without collisions. With multiple local minima present,
MPOGames predicts likely plans for other agents, infers their
probabilities, then hedges against all possibilities. Methods
that fail to do so may result in catastrophic collisions.
such as inverse reinforcement learning [7,17], forecasting
[18] and biology [19]. In particular, this combination of
MaxEnt and GT has been proposed for learning objectives
for agents from a traffic dataset [7]. However, since the exact
solution is computationally intractable, the authors use a
linear quadratic (LQ) approximation of the game for which
a closed-form solution can be efficiently obtained. The LQ
approximation, popular among recent approaches (e.g., [5,
7]), only considers a single local minimum and consequently
produces unimodal predictions for each agent. This fails to
capture complex uncertainty-dependent hedging behaviors
which mediate between other agents’ possible plans in the
true solution of the MaxEnt game.
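To make the noisy-rationality model concrete, MaxEnt formulations typically replace each agent's deterministic best response with a Boltzmann distribution over controls; one standard form (written here in illustrative notation, not notation taken from this paper) is

```latex
\pi^i(u^i \mid x) \;\propto\; \exp\!\left(-\frac{1}{\lambda}\, Q^i(x, u^i)\right),
```

where \(Q^i\) denotes agent \(i\)'s cost-to-go and the temperature \(\lambda > 0\) controls the degree of suboptimality: as \(\lambda \to 0\) the policy concentrates on the rational best response, while larger \(\lambda\) spreads probability mass across all local minima, which is precisely what a unimodal LQ approximation cannot capture.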
These hedging behaviors are closely related to partially
observable Markov decision processes (POMDPs) in the
single-player setting, where solutions must consider the
distribution of states when solving for the optimal controls.
Similar multi-agent scenarios have been considered via
POMDPs in [20–22]. Also related are partially observable
stochastic games which are usually intractable [1] and mostly
limited to the discrete setting [23–26].
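In the single-agent case, this connection can be seen through the belief-space Bellman equation: a POMDP is equivalent to an MDP over belief states \(b\) (a sketch in standard notation, not this paper's):

```latex
V(b_t) \;=\; \min_{u_t} \; \mathbb{E}_{z_{t+1}}\!\left[\, c(b_t, u_t) + V(b_{t+1}) \,\right],
```

where \(b_{t+1}\) is the Bayes-filter update of \(b_t\) given control \(u_t\) and observation \(z_{t+1}\). Hedging arises because the optimal \(u_t\) must perform well against every state carrying nonzero belief mass, rather than against a single point estimate.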
One major challenge for methods that seek to address partial
observability and multimodality is computational efficiency
and scalability (e.g., in the case of POMDPs).
Furthermore, only a few of the existing GT methods perform
experiments on hardware [10,11,27], where the robustness
of the method to noise and delays in the system is crucial. To
tackle both of these issues, we present MPOGames, a frame-
work for efficiently solving multimodal MaxEnt dynamic
games by reformulating the problem as a game of incomplete
information and efficiently finding approximate solutions to
the equivalent POMDP problem. We showcase the real-time
potential of our method via extensive hardware comparisons.
arXiv:2210.10814v2 [cs.GT] 23 May 2023