In TrAAD, we add a phase of training in addition to
traditional imitation learning for driving, where the vehicle
“learns to accelerate”. This phase involves maximizing the
overall traffic flow of a vehicle’s local lane, minimizing
the fuel consumption of all vehicles, and discouraging the
acceleration actions from being too jerky. Because our method
supervises acceleration via distillation, it is generalizable to
nearly any standard imitation learning framework, regard-
less of architecture or design. Our results show that our
method, when implemented on top of existing state-of-the-art
driving frameworks, improves traffic flow, reduces energy
consumption for the AV, and enhances the passenger’s ride
experience.
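As a concrete illustration only (the exact reward terms, fuel model, and weights used by TrAAD are specified later and may differ), such an acceleration objective can be sketched as a per-step reward

r_t \;=\; w_{\text{flow}}\,\frac{1}{|\mathcal{N}_t|}\sum_{i\in\mathcal{N}_t} v_i^{(t)} \;-\; w_{\text{fuel}}\sum_{i} E\!\big(v_i^{(t)}, a_i^{(t)}\big) \;-\; w_{\text{jerk}}\,\Big|\tfrac{a^{(t)}-a^{(t-1)}}{\Delta t}\Big|,

where \mathcal{N}_t denotes the vehicles in the ego vehicle’s local lane (mean speed serving as a proxy for flow), E(\cdot) is a fuel-consumption model, a^{(t)} is the ego acceleration, and the weights w are placeholders.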
In summary, we present the following key contributions:
1) A simulated traffic-annotated driving dataset for imitation learning for self-driving cars;
2) Use of gradients from differentiable traffic simulation to improve sample efficiency for autonomous vehicles;
3) A generalizable method for traffic-aware autonomous driving, which learns to control the vehicle via rewards based on societal traffic-based objectives.
Additional results, materials, code, datasets, and information
can be found on our project website.
II. RELATED WORKS
A. Autonomous Driving with Traffic Information
Zhu et al. recently proposed a method for safe, efficient,
and comfortable velocity control using RL [17]. Similarly to
one of our objectives, they aim to learn acceleration behavior
that exceeds the safety and comfort of human expert drivers.
One major difference is that our work complements existing
end-to-end autonomous driving systems that use multi-modal
sensor data: the learned acceleration behavior cooperates
with control behavior learned via imitation learning, rather
than being learned in a pure traffic simulation setting.
In addition, our objective directly optimizes over the
entire traffic state, not just the objectives for the autonomous
vehicle itself. The reward objectives of [17] are also inferred
from a partially-observed point of view. Other works have
considered learning driving behavior with passenger comfort
and safety in mind, but many do not directly involve traffic
state information beyond partially-observed settings [18]–[20].
Wegener et al. present a method for energy-efficient urban
driving via RL [21] in a partially-observed setting purely within
traffic simulation; however, their approach does not address
integration with current methods for more complex vehicle
control. In short, our method offers a more general approach for
learning a policy beneficial to both individual and societal
traffic objectives, while being easily integrated into existing
state-of-the-art end-to-end driving control methods.
B. Differentiable Microscopic Traffic Simulation
While differentiable physics simulation has been gaining
popularity in recent years, differentiable traffic simulation
is under-explored, especially in applications for autonomous
driving. In 2021, Andelfinger first introduced the potential of
differentiable agent-based traffic simulation, as well as tech-
niques to address discontinuities of control flow [22]. In his
work, Andelfinger highlights continuous solutions for discrete
or discontinuous operations such as conditional branching,
iteration, time-dependent behavior, or stochasticity in forward
simulation, ultimately enabling the use of automatic differen-
tiation (autodiff) libraries for applications such as traffic light
control. One key difference between our work and [22] is that
our implementation of differentiable simulation accounts for
learning agents acting independently from agents following a
car-following model, and is compatible with existing learning
frameworks. In addition, we optimize traffic-related learning
by defining analytical gradients rather than relying solely
on auto-differentiation. Most recently, Son et al. proposed
a novel differentiable hybrid traffic simulator that computes
gradients for both macroscopic, or fluid-like, representations
and agent-based microscopic representations, as well as the
transitions between them [23]. In our work, we focus solely
on microscopic agent-based simulation to maintain relevance
to autonomous driving frameworks.
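As an illustration of what such analytical gradients can look like (a minimal sketch under assumed parameters; TrAAD’s actual car-following model and gradient derivations are presented later and may differ), the Intelligent Driver Model (IDM) admits closed-form partial derivatives of the commanded acceleration with respect to the follower’s gap and the two speeds:

```python
import math

# Minimal sketch: closed-form IDM acceleration and its analytical partial
# derivatives. The parameters (v0, T, a_max, b, delta, s0) are standard IDM
# constants; this is illustrative, not TrAAD's exact formulation.
def idm_accel_and_grads(s, v, v_lead, v0=30.0, T=1.5, a_max=1.0, b=1.5,
                        delta=4.0, s0=2.0):
    """Return IDM acceleration and d(accel)/d(s, v, v_lead).

    s      : bumper-to-bumper gap to the leader [m]
    v      : follower (ego) speed [m/s]
    v_lead : leader speed [m/s]
    """
    dv = v - v_lead                                  # approach rate
    sqrt_ab = math.sqrt(a_max * b)
    s_star = s0 + v * T + v * dv / (2.0 * sqrt_ab)   # desired gap
    accel = a_max * (1.0 - (v / v0) ** delta - (s_star / s) ** 2)

    # Analytical partials (chain rule through s_star where needed).
    da_ds_star = -2.0 * a_max * s_star / s ** 2
    da_ds = 2.0 * a_max * s_star ** 2 / s ** 3
    ds_star_dv = T + (2.0 * v - v_lead) / (2.0 * sqrt_ab)
    da_dv = (-a_max * delta * v ** (delta - 1) / v0 ** delta
             + da_ds_star * ds_star_dv)
    da_dv_lead = da_ds_star * (-v / (2.0 * sqrt_ab))
    return accel, (da_ds, da_dv, da_dv_lead)


if __name__ == "__main__":
    a, grads = idm_accel_and_grads(s=20.0, v=25.0, v_lead=22.0)
    print(a, grads)
```

Closed-form partials like these can be evaluated per vehicle and per step without tracing the whole rollout through an autodiff library, which is one motivation for preferring them over pure auto-differentiation.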
C. Deep Learning with Traffic Simulation
Deep reinforcement learning has been used to address
futuristic and complex problems for control of autonomous
vehicles in traffic. One survey on Deep RL for motion
planning for autonomous vehicles by Aradi [24] delineates
challenges facing the application of DRL to traffic problems,
one of which is the long and potentially unsuccessful learning
process. This has been addressed in several ways through
curriculum learning [25]–[27], adversarial learning [28], [29],
or model-based action choice. In our work, we address this
issue via sample enhancement for on-policy deep reinforcement
learning: with differentiable traffic simulation and access to
gradients of the reward with respect to policy actions, we can
synthesize additional “helpful” samples during learning (see
the sketch at the end of this subsection). “FLOW” by Wu et al. [30]
presents a deep reinforcement learning (DRL) benchmarking
framework, built on the popular microscopic traffic simulator
SUMO [31]. Wu et al. provide motivation for integrating
traffic dynamics into autonomous driving objectives with
DRL, defining the problem/task as “mixed autonomy”. Novel
objectives for driving include reducing congestion, carbon
emissions, and other societal costs, all in anticipation of
future mixed autonomy traffic. Based on FLOW, Vinitsky et al.
published a series of benchmarks covering four main scenarios:
traffic light control, bottleneck throughput, intersection
capacity optimization, and control of merge on-ramp shock
waves [32]. We extend the environments
from FLOW’s DRL framework to be differentiable and show
benchmark results for enhanced DRL algorithms utilizing
traffic flow gradients for optimization.
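Below is a minimal sketch of the gradient-informed sample enhancement referenced earlier in this subsection; `traffic_step`, its toy reward, and the single-gradient-step perturbation are illustrative assumptions rather than TrAAD’s actual implementation. The point is only that, because the simulator is differentiable, a logged action can be nudged along the reward gradient and relabeled as an additional sample for the on-policy learner.

```python
import torch

# Hypothetical stand-in for one step of a differentiable traffic simulator
# that returns a scalar reward (higher mean speed, lower fuel use).
def traffic_step(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    mean_speed = state.mean() + 0.1 * action
    fuel = 0.05 * action ** 2
    return mean_speed - fuel

def enhance_sample(state, action, step_size=0.1):
    """Perturb an action along d(reward)/d(action) to create an extra,
    higher-reward sample for the on-policy learner."""
    action = action.clone().detach().requires_grad_(True)
    reward = traffic_step(state, action)
    reward.backward()                       # gradient through the simulator
    with torch.no_grad():
        better_action = action + step_size * action.grad
        better_reward = traffic_step(state, better_action)
    return better_action.detach(), better_reward.detach()

state = torch.tensor([12.0, 13.5, 11.8])    # e.g., speeds of nearby vehicles
action = torch.tensor(0.5)                  # logged ego acceleration [m/s^2]
new_action, new_reward = enhance_sample(state, action)
```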
III. BACKGROUND
A. Simulation-related Notation and Definitions
To integrate traffic simulation into learning and optimiza-
tion frameworks for autonomous driving, we need differen-
tiable forward simulation. Agent-based traffic simulation is