interaction relation between agents, which is a non-trivial
task for generative models that may require a large number
of samples. Third, it affords better efficiency by identifying
only the relevant agents influenced by the ego plans and
modifying their future trajectories in simulation, rather than
those of all agents.
Our contributions are as follows:
• We propose a learning-based simulation model,
InterSim, that rolls out realistic and consistent future
trajectories of multiple traffic agents based on explicit
interaction relations.
• We leverage a relation predictor to infer interaction
relations for better interpretability and simulation ef-
ficiency, and show how our simulator can be used to
manipulate different interaction situations by specifying
the relations.
• We train and evaluate our model on the Waymo Open
Motion Dataset, a publicly available real-world driving
benchmark, and demonstrate its advantage compared to
a state-of-the-art baseline in two simulation tasks.
II. RELATED WORK
In this section, we discuss relevant literature in three as-
pects: traffic simulation, behavior prediction, and interaction
modeling.
A. Traffic Simulation
Traffic simulation is an important task for intelligent
transportation systems, allowing for training and evaluating
driving models in a more scalable and safe way. Existing traf-
fic simulators render high-fidelity driving environments in the
context of racing [9] and urban driving [10], [11]. However,
they often simulate agent behaviors through heuristic-based
models that fail to cover diverse scenarios or interactions.
Recently, learning-based models have demonstrated great
success in simulating realistic and reactive agent behaviors
by learning driving patterns from real-world driving data. For
instance, [4] trains a deep neural network through a rasterized
representation derived from driving logs to simulate future
agent trajectories; [5] infers future agent intent and control
inputs to model stochastic traffic dynamics. While such
methods consider the past trajectories of all the agents at
once, they roll out each agent's future trajectory independently,
which may lead to inconsistent or colliding trajectories between
simulated agents in interactive scenarios.
In order to improve simulation consistency over multiple
interacting agents, [6] leverages a collision loss and [7]
proposes a rule-based fall-back layer to discourage or avoid
collisions. While such works often require hand-crafted
losses or post-processing filters, [8] proposes a multi-agent
behavior model that simulates joint agent behaviors directly
through an implicit latent variable that governs the agent
interactions. Compared to existing models, we propose a
relation-aware simulator that simulates diverse and realistic
interactive behaviors in a more straightforward and efficient
way by explicitly modeling interaction relations.
B. Behavior Prediction
Behavior prediction offers a natural solution to simulate
agent behaviors through the predicted trajectories given the
environmental context. Recent models have shown great success in
improving prediction accuracy by learning agent dynamics
and environmental context represented either as a vector
representation [12], [13] or a rasterized image [14], [15].
Due to uncertainty in human intent, the future trajectories
are multi-modal. To handle the multi-modality and improve
prediction coverage, a family of models has been proposed that first
predict high-level intent, such as goal targets [16]–[18], lanes
to follow [19], [20], maneuvers [21]–[23], and linguistic
descriptions [24], before predicting low-level trajectories that
are conditioned on the intent.
In this work, we take advantage of the goal-conditioned
models in the behavior prediction literature to simulate
realistic agent trajectories given the environmental context
and agent intent.
C. Interaction Modeling
Modeling interaction is an important task in motion pre-
diction and simulation when reasoning about multi-agent be-
haviors. While many existing approaches [?], [25]–[27] rely
on implicit latent variables to model interactions, we focus on
modeling and predicting explicit interaction relations in this
work for better interpretability. These explicit relations allow
us to produce and manipulate different types of interactive
scenarios.
In this work, we follow [28]–[30], which define agent relations
based on the pass and yield relationship and predict this
relationship as a classification problem through a separate
learning model. The predicted relations are useful in guid-
ing the motion predictor to generate consistent trajectories
among multiple agents, as shown by [30].
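As an illustration of the pass and yield relationship described above, the following is a minimal sketch of how such a label could be derived from a pair of trajectories, for instance to build targets for a relation classifier. The labeling rule used here (the agent that reaches the point of closest approach first is the one that passes), the function name pass_yield_label, and the distance threshold are illustrative assumptions and are not taken from [28]–[30].

```python
import numpy as np


def pass_yield_label(traj_a, traj_b, dist_threshold=2.0):
    """Label the relation between two agents from their (T, 2) trajectories.

    Hypothetical rule (an assumption, not the cited method): find the pair of
    time steps at which the two paths come closest. If they never come within
    dist_threshold, there is no interaction. Otherwise the agent that reaches
    that point at the earlier time step passes and the other yields.
    """
    traj_a = np.asarray(traj_a, dtype=float)
    traj_b = np.asarray(traj_b, dtype=float)

    # Distance between every position of A and every position of B.
    diff = traj_a[:, None, :] - traj_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # shape (T_a, T_b)

    # Time steps (one per agent) of the closest approach between the paths.
    t_a, t_b = np.unravel_index(np.argmin(dist), dist.shape)

    if dist[t_a, t_b] > dist_threshold:
        return "none"
    if t_a < t_b:
        return "a_passes"
    if t_b < t_a:
        return "b_passes"
    # Both agents reach the closest point at the same step: ordering is ambiguous.
    return "none"


# Agent A crosses the origin at step 5, agent B at step 8.
a = np.stack([np.arange(-5, 6), np.zeros(11)], axis=1)
b = np.stack([np.zeros(11), np.arange(-8, 3)], axis=1)
print(pass_yield_label(a, b))  # prints "a_passes"
```

A learned relation predictor would replace this hand-written rule at simulation time; the rule only shows what the three classes mean geometrically.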
When there exist potential conflicts between a novel ego
plan and the simulated trajectories of environment agents
given the predicted relations, we adopt conflict resolution
techniques that are widely used in planning [31], search [32],
and ordering [33].
III. PROBLEM FORMULATION
We formulate the problem of learning realistic interactive
behaviors for traffic simulation following [8]. Given map
states M and the observed states S of N traffic agents in a
scene, the goal is to roll out the future states of all agents Y
up to a finite horizon T.
Due to the computational complexity and memory constraints
of simulating joint behaviors over all traffic agents in
the scene, our model focuses on simulating agent behaviors
that are relevant to the ego plan, as the irrelevant agent
behaviors are often ignored by the ego planner. For an
irrelevant agent whose future trajectory stays the same given
a new ego plan at the next step, our simulator can simply
roll out its future trajectory from the data.
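The split between relevant and irrelevant agents can be sketched as follows. This is a minimal illustration under stated assumptions: the function name rollout_futures, the array layout (N agents, T steps, 2D positions), and the simulate_fn callable are hypothetical and stand in for whatever trajectory model the simulator uses.

```python
import numpy as np


def rollout_futures(logged_future, relevant_ids, simulate_fn):
    """Return future trajectories for all N agents over horizon T.

    logged_future: (N, T, 2) array of future positions replayed from the data.
    relevant_ids:  indices of agents judged to be influenced by the ego plan.
    simulate_fn:   hypothetical callable mapping an agent index to a new
                   (T, 2) trajectory that reacts to the ego plan.
    """
    # Start from the logged futures so irrelevant agents are replayed as-is.
    futures = np.array(logged_future, dtype=float, copy=True)

    # Only agents influenced by the ego plan get a newly simulated trajectory.
    for agent_id in relevant_ids:
        futures[agent_id] = simulate_fn(agent_id)

    return futures
```

With this structure, the cost of a simulation step scales with the number of relevant agents rather than with N.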
One key consideration in our problem is to faithfully
follow the agent’s original intent as much as possible. We
define such intent based on the goal location collected from