TrafficGen: Learning to Generate Diverse and Realistic Traffic Scenarios
Lan Feng§*, Quanyi Li†*, Zhenghao Peng♠*, Shuhan Tan‡, Bolei Zhou♠
§ETH Zurich, †The University of Edinburgh,
♠University of California, Los Angeles, ‡The University of Texas at Austin
Abstract— Diverse and realistic traffic scenarios are crucial
for evaluating the AI safety of autonomous driving systems in
simulation. This work introduces a data-driven method called
TrafficGen for traffic scenario generation. It learns from the
fragmented human driving data collected in the real world and
then generates realistic traffic scenarios. TrafficGen is an
autoregressive neural generative model with an encoder-decoder
architecture. In each autoregressive iteration, it first encodes the
current traffic context with an attention mechanism and then
decodes a vehicle's initial state, followed by its long
trajectory. We evaluate the trained model in terms of vehicle
placement and trajectories, and the experimental results show
that our method substantially improves over baselines in
generating traffic scenarios. After training, TrafficGen can also
augment existing traffic scenarios, by adding new vehicles and
extending the fragmented trajectories. We further demonstrate
that importing the generated scenarios into a simulator as an
interactive training environment improves the performance and
safety of a driving agent trained with reinforcement learning.
Model and data are available at
https://metadriverse.github.io/trafficgen.
I. INTRODUCTION
Autonomous driving (AD) is transforming our daily life
with promised benefits like safe transportation and efficient
mobility. One of the biggest hurdles for deploying AD
in the real world is to ensure the vehicles controlled by
algorithms operate safely and reliably in all kinds of traffic
scenarios. Before the real-world deployment of AD, the
simulation environment becomes an ideal testbed to evaluate
the reliability and safety of AD systems. However, most
existing simulators, such as CARLA [5] and SMARTS [32],
rely on hand-crafted maps and traffic-generation rules, and
the traffic scenarios available for testing are far too few
to emulate the complexity of the real world. As a
result, it is difficult to evaluate how AD systems make
safety-critical decisions and react to other traffic participants in
complex traffic scenarios. Thus, creating diverse and realistic
traffic scenarios in simulation becomes crucial for thoroughly
evaluating the AI safety of AD systems.
Existing methods tackle the challenge by replaying vehicle
trajectories collected from the real world [15, 16, 30].
Although trajectories replayed from real data preserve
the fidelity of the real world, this approach has two issues. First, it
requires a time-consuming data collection process, particularly
at large scale. Second, the trajectories of vehicles
in the commonly used driving datasets such as Waymo
Open Dataset [20] and Argoverse [2] are fragmented and
incomplete. Most of the trajectories span only a short period
of time because they are recorded from a moving data
collection vehicle whose view is often occluded by
other traffic participants. For instance, only 30% of the
trajectories collected in Waymo Motion Dataset [20] last
more than 10 seconds, and only 12% cover the whole
scenario. Thus, the replayed traffic scenarios are incomplete
and insufficient for a thorough evaluation of AD systems.
* Lan Feng, Quanyi Li and Zhenghao Peng contributed equally to this work.
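As a rough illustration of how coverage figures of this kind could be measured, the sketch below assumes each trajectory comes with a per-timestep validity mask sampled at a fixed interval. The function name `coverage_stats`, the `valid` array layout, the 0.1 s step, and the toy data are all illustrative assumptions; none of them is taken from the dataset's own tooling, and the numbers it prints are unrelated to the 30% and 12% reported above.

```python
import numpy as np

def coverage_stats(valid, dt=0.1, min_duration=10.0):
    """Return (fraction of tracks observed longer than min_duration,
    fraction of tracks observed at every step of the scenario).

    valid: bool array of shape (num_tracks, num_steps); True where the
           track is observed. This layout is an assumption of the sketch.
    dt:    assumed sampling interval in seconds.
    """
    valid = np.asarray(valid, dtype=bool)
    num_steps = valid.shape[1]
    observed_steps = valid.sum(axis=1)
    # Compare in whole steps to avoid floating-point edge cases.
    min_steps = round(min_duration / dt)
    frac_long = float(np.mean(observed_steps > min_steps))
    frac_full = float(np.mean(observed_steps == num_steps))
    return frac_long, frac_full

# Toy scenario: 4 tracks over 200 steps (20 s at the assumed 0.1 s step).
valid = np.zeros((4, 200), dtype=bool)
valid[0, :] = True        # observed for the whole scenario
valid[1, :150] = True     # observed for 15 s
valid[2, 20:70] = True    # observed for 5 s (appears late, leaves early)
valid[3, :80] = True      # observed for 8 s

frac_long, frac_full = coverage_stats(valid)
print(f"longer than 10 s: {frac_long:.0%}, full coverage: {frac_full:.0%}")
```

Here "duration" is taken to be the total observed time of a track; a definition based on the span between first and last observation would be an equally plausible choice.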
One solution for covering more traffic scenes is to design
test cases manually [10] or use heuristic methods like procedural
generation (PG) [8, 16] to create a huge number of test
cases. However, the generated scenarios do not adequately reflect
the complexity of real-world traffic and road structure. In
addition, these approaches require a substantial amount of human effort
and domain knowledge to design rules for placing vehicles,
determining their initial states, and setting trigger conditions
that ensure interaction between the ego vehicle and other
traffic participants [7].
Our goal is to enable the automatic generation of realistic,
complete, and diverse traffic scenarios by learning from real-world
data. We develop TrafficGen, a data-driven traffic scenario
generator that can synthesize both the initial states
of traffic vehicles and their long and complete trajectories.
TrafficGen learns from the fragmented and noisy trajectories
collected in real-world driving datasets such as the Waymo
Open Dataset [20]. TrafficGen follows an encoder-decoder
neural architecture. The encoder transforms the HD map and
states of vehicles into a scenario representation. The decoder
generates initial state distributions for placing vehicles and
long-term multi-modal probabilistic trajectories for realistic
simulation.
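The control flow of this encode-then-decode loop can be sketched as follows. This is a minimal sketch of the autoregressive structure only, following the description in the abstract: every function here is a hypothetical placeholder, not the authors' model. The encoder is replaced by mean pooling instead of attention, the initial-state head samples uniformly over free map points, and the trajectory head is a constant-velocity rollout. The sketch also assumes the map is given as an array of 2D lane points and that no more vehicles are requested than there are points.

```python
import numpy as np

def encode_context(map_points, vehicles):
    """Placeholder encoder: pools the map and the vehicles placed so far.
    (The paper's encoder uses attention; mean pooling is a stand-in.)"""
    map_code = map_points.mean(axis=0)
    if vehicles:
        veh_code = np.mean([v["state"] for v in vehicles], axis=0)
    else:
        veh_code = np.zeros(4)
    return np.concatenate([map_code, veh_code])

def decode_initial_state(context, map_points, free, rng):
    """Placeholder head: samples a free map point, a heading, and a speed.
    A learned model would condition the distribution on `context`;
    this stub ignores it and samples uniformly."""
    probs = free / free.sum()              # zero probability on taken points
    idx = int(rng.choice(len(map_points), p=probs))
    x, y = map_points[idx]
    heading = rng.uniform(-np.pi, np.pi)
    speed = rng.uniform(0.0, 15.0)         # m/s, arbitrary range for the sketch
    return idx, np.array([x, y, heading, speed])

def decode_trajectory(context, state, horizon, dt):
    """Placeholder head: constant-velocity rollout from the initial state."""
    x, y, heading, speed = state
    t = np.arange(1, horizon + 1) * dt
    return np.stack([x + speed * np.cos(heading) * t,
                     y + speed * np.sin(heading) * t], axis=1)

def generate_scenario(map_points, num_vehicles, horizon=50, dt=0.1, seed=0):
    """Autoregressive loop: each new vehicle is generated conditioned on
    the map and on all vehicles generated before it."""
    rng = np.random.default_rng(seed)
    map_points = np.asarray(map_points, dtype=float)
    free = np.ones(len(map_points), dtype=bool)
    vehicles = []
    for _ in range(num_vehicles):
        context = encode_context(map_points, vehicles)   # encode current scene
        idx, state = decode_initial_state(context, map_points, free, rng)
        trajectory = decode_trajectory(context, state, horizon, dt)
        free[idx] = False                                # mark point as occupied
        vehicles.append({"state": state, "trajectory": trajectory})
    return vehicles

# Usage: a straight 100 m lane sampled every 5 m, populated with 5 vehicles.
lane = np.stack([np.linspace(0.0, 100.0, 21), np.zeros(21)], axis=1)
scenario = generate_scenario(lane, num_vehicles=5, horizon=30, seed=1)
print(len(scenario), scenario[0]["trajectory"].shape)
```

The point of the sketch is the loop in `generate_scenario`: the context is re-encoded after every placement, so later vehicles can depend on earlier ones.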
After training, TrafficGen can generate diverse and realistic traffic
scenarios given HD maps, which greatly enlarges the
set of traffic scenarios available for AD testing. As shown
in Fig. 1, TrafficGen can also be used to edit and augment
existing scenarios by (1) generating new traffic on the
same HD map, (2) adding new vehicles and trajectories, and
(3) inpainting a trajectory segment into a longer one. The
generated scenarios are further imported into the simulator
to improve a driving agent trained with reinforcement
learning (RL). The experimental results show that the safety
of driving agents can be substantially improved when they are
trained on the generated scenarios with higher complexity
and traffic density.
II. RELATED WORK
The majority of the traffic scenarios in the existing driving
simulators are either pre-recorded in the real world [1, 15]
arXiv:2210.06609v2 [cs.RO] 5 Mar 2023