they do, reasoning about occlusions significantly improves
the plan quality; and BiVO is significantly more effective
than alternative learned models that were not designed for
planning (see Section VI-B).
In summary, the contributions of this paper are as follows:
• We introduce a generative model, BiVO, based on variational
autoencoders, that is able to produce trajectories of
occluded vehicles.
• We integrate BiVO into a fast sampling-based planning
algorithm and evaluate it in open- and closed-loop replay
simulation with the real-world nuScenes dataset.
• We demonstrate that integrating BiVO predictions into
planning leads to better motion plans in critical scenarios.
• To the best of our knowledge, we are the first to integrate
a learned occlusion model with a planning algorithm for
autonomous driving.
II. RELATED WORK
There is an extensive literature on detecting and reasoning
about occluded objects in robotics [2, 6]. In the context of
autonomous vehicles, occlusions can be of critical importance,
and prior work has proposed various methods that predict
and/or plan with occluded traffic agents.
Planning with occluded agents: Planning algorithms that
reason with occluded agents typically rely on handcrafted
occlusion models. For example, Orzechowski, Meyer, and
Lauer [15] propose an approach that predicts the presence
of a vehicle coming out of an occluded region and ensures the
existence of a fail-safe manoeuvre. Wang, Burger, and Stiller
[20] extend this work by eliminating some of the occluded
traffic by reasoning about the history of occlusions. Zhang
and Fisac [21] propose a method for navigating through
traffic with occluded regions by ensuring that a potentially
hidden pursuer never intersects the set of possible
inevitable collision states. Hanna et al. [8] use a model-driven
approach that infers a joint distribution over the state of the
occluded areas and the goals of other vehicles, using the
observed trajectories of the vehicles.
In contrast to these handcrafted approaches, we propose a
data-driven approach that learns a model of occluded agents
from real-world data.
Data-driven occlusion models: Learning-based models
for occluded object prediction include Schulter et al. [18],
Purkait, Zach, and Reid [17], and Han, Banfi, and Campbell
[7]. However, these models assume static objects or
environments, an assumption that does not hold in urban
driving. Some learned models can handle dynamic traffic
agents. Notably, Itkina et al. [9] use an autoencoder ar-
chitecture to infer the surroundings of visible objects and
later reconstruct them into occupancy grid maps that encode
the probability of occupied areas in 2D space. However,
it is not straightforward to integrate approaches that make
occupancy predictions of areas with existing planners, since
the predictions lack the information on how agents might
emerge out of occlusions and interfere with the ego vehicle.
In our work, we predict dynamic agents together with
their possible future trajectories instead of only occluded
areas. Our model’s predictions are key to integration with
existing downstream planners that make use of probabilistic
predictions of future trajectories.
III. TECHNICAL PRELIMINARIES
Variational Autoencoders: Variational autoencoders [11]
(VAEs) are generative models that aim to learn a density
function over some unobserved latent variables Z given a
dataset input x∈X. Given an unknown true posterior
p(z|x), VAEs approximate it with a parametric distribution
qθ(z|x). The KL-divergence from the parametric distribution
to the true posterior can be computed using:
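$$
\begin{aligned}
D_{\mathrm{KL}}\big(q_\theta(z \mid x) \,\|\, p(z \mid x)\big) &= \log p(x) \\
&\quad - \mathbb{E}_{z \sim q_\theta(z \mid x)}\big[\log p_u(x \mid z)\big] + D_{\mathrm{KL}}\big(q_\theta(z \mid x) \,\|\, p(z)\big),
\end{aligned}
$$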
where DKL is the KL divergence between two distributions,
and the log-evidence term log p(x) is constant with respect
to θ. The expectation and the KL-divergence terms (second
line) together form the negative evidence lower bound (ELBO).
Minimising the negative ELBO is therefore equivalent to
minimising the KL-divergence between the parametric
distribution and the true posterior.
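As a rough illustration of the two terms on the second line, the following sketch evaluates a single-sample negative ELBO in NumPy. It makes assumptions that are not stated in the text: a standard normal prior over z, a diagonal Gaussian for qθ(z|x), and a unit-variance Gaussian decoder, so that the reconstruction term reduces to a squared error up to an additive constant. The function names are hypothetical.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims.

    Assumes a standard normal prior, which the text does not specify.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def reparameterise(mu, log_var, rng):
    """Draw z ~ q(z|x) as mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(np.shape(mu))


def negative_elbo(x, x_recon, mu, log_var):
    """Single-sample negative ELBO: reconstruction term plus KL term.

    Assumes a unit-variance Gaussian decoder, so -log p(x|z) equals
    0.5 * ||x - x_recon||^2 up to an additive constant that is dropped.
    """
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon + gaussian_kl_to_standard_normal(mu, log_var)
```

Under these assumptions the KL term vanishes when the approximate posterior equals the prior (zero mean, unit variance), and the loss then reduces to the reconstruction error alone.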
States and trajectories: A state $s^i_t$ for a vehicle $i$ is
defined as the location, heading, velocity, and acceleration at
the current timestep $t$. A trajectory $x^i_{t:t+T}$ is a sequence of
states $s^i_t, s^i_{t+1}, \ldots, s^i_{t+T}$ that defines how an agent $i$ moved
in time $T$.
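The definitions above can be mirrored by a small data structure. This is only an illustrative sketch: the field names, the `State` class, and the `trajectory` helper are hypothetical, and representing the location as a planar (x, y) pair is an assumption. The point it makes concrete is that a trajectory from t to t+T contains T+1 states, since both endpoints are included.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class State:
    """State of one vehicle at one timestep (field names are illustrative)."""
    x: float             # location, assumed planar
    y: float
    heading: float       # radians
    velocity: float
    acceleration: float


def trajectory(states: Sequence[State], t: int, T: int) -> List[State]:
    """Return the states s_t, ..., s_{t+T} (inclusive) from a logged sequence."""
    if t < 0 or T < 0 or t + T >= len(states):
        raise ValueError("requested window falls outside the logged states")
    return list(states[t : t + T + 1])


# A vehicle moving along the x-axis, one logged state per timestep.
log = [State(x=float(k), y=0.0, heading=0.0, velocity=1.0, acceleration=0.0)
       for k in range(10)]
traj = trajectory(log, t=2, T=3)  # states at timesteps 2, 3, 4, 5
```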
Agents: We will refer to the controlled vehicle as “ego”.
Other vehicles that are not controlled by the planner, pedes-
trians, or other road users will be referred to as “agents”.
Agents are visible to the ego if they are in its line of
sight, or occluded if an obstacle blocks the ego’s view of them
(further details of this calculation are in Section VI-A).
Occupancy grid maps: Occupancy grid maps (OGMs) encode the
occupancy of an area. $M^{\mathrm{obs}}_i \in [0,1]^{H,W}$ is the $H \times W$ area
surrounding agent $i$ at a 1×1 meter resolution, and each grid
cell contains 1 if it is occupied or 0 if free. Locations that
are not visible with a direct line of sight from the position of
vehicle $i$ are marked as occluded with a value 0.5. $M^{\mathrm{gt}}_i$ is the
ground truth occupancy map of the same area.
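To make the three cell values concrete, the sketch below derives an observed map from a ground truth map by marching a straight line from the observer to every cell. This is a simplified stand-in and not the visibility calculation of Section VI-A: it assumes the observer sits at a single grid cell, that any occupied cell fully blocks the line of sight, and that a cell is occluded when an occupied cell lies strictly between it and the observer. The function names are hypothetical.

```python
import numpy as np


def line_cells(r0, c0, r1, c1):
    """Grid cells on the segment from (r0, c0) to (r1, c1), endpoints included."""
    n = max(abs(r1 - r0), abs(c1 - c0))
    if n == 0:
        return [(r0, c0)]
    rows = np.rint(np.linspace(r0, r1, n + 1)).astype(int)
    cols = np.rint(np.linspace(c0, c1, n + 1)).astype(int)
    return list(zip(rows.tolist(), cols.tolist()))


def observed_ogm(m_gt, observer):
    """Mark cells hidden behind occupied cells with 0.5; copy the rest from m_gt."""
    h, w = m_gt.shape
    r0, c0 = observer
    m_obs = m_gt.astype(float).copy()
    for r in range(h):
        for c in range(w):
            # Cells strictly between the observer and the target cell.
            between = line_cells(r0, c0, r, c)[1:-1]
            if any(m_gt[i, j] == 1 for i, j in between):
                m_obs[r, c] = 0.5
    return m_obs


# Observer on the left edge, an obstacle in the middle, a vehicle hidden behind it.
m_gt = np.zeros((5, 5))
m_gt[2, 2] = 1  # obstacle, directly visible
m_gt[2, 4] = 1  # vehicle hidden behind the obstacle
m_obs = observed_ogm(m_gt, observer=(2, 0))
```

In this toy example the obstacle keeps its value of 1 because nothing lies between it and the observer, while the hidden vehicle is reported as 0.5 in the observed map even though the ground truth map marks it as occupied.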
IV. BI-LEVEL VARIATIONAL OCCLUSION MODELS
The objective of our occlusion model is to generate likely
trajectories for agents emerging from occluded regions, given
a known map of occluded regions, the past and present state
of visible agents, and a lane graph.
Our approach, BiVO, is shown in Fig. 2. We break
down the problem into two subproblems and train a separate
conditional VAE (CVAE) model for each. Intuitively, the first
step locates the subspace of occluded areas that is likely to
contain hidden objects, and the second step infers how these
hidden objects may emerge from the occluded space. Overall,
BiVO parameterizes a distribution over trajectories that start
from known occluded regions and allows fast sampling from this
distribution for subsequent planning.
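The two-step structure can be sketched as nested sampling: first draw a start location inside the occluded region, then draw a trajectory conditioned on that start. The code below illustrates only this control flow. Both samplers are hypothetical stand-ins (uniform choice over occluded cells, and a constant-velocity rollout with random heading and speed), not the learned CVAEs, and they ignore the visible agents and lane graph that the actual model conditions on.

```python
import numpy as np


def sample_occluded_starts(m_obs, rng, n):
    """Stand-in for the first level: pick n occluded cells (value 0.5) uniformly."""
    occluded = np.argwhere(m_obs == 0.5)
    if len(occluded) == 0:
        raise ValueError("the observed map contains no occluded cells")
    idx = rng.integers(len(occluded), size=n)
    return occluded[idx].astype(float)  # (n, 2) grid coordinates (row, col)


def sample_trajectory(start, rng, horizon, dt=0.5):
    """Stand-in for the second level: constant-velocity rollout from `start`."""
    heading = rng.uniform(-np.pi, np.pi)
    speed = rng.uniform(0.0, 10.0)
    elapsed = np.arange(horizon + 1)[:, None] * dt  # (horizon + 1, 1)
    direction = np.array([np.cos(heading), np.sin(heading)])
    return start + elapsed * speed * direction      # (horizon + 1, 2)


def sample_occluded_trajectories(m_obs, rng, n, horizon):
    """Bi-level sampling: start locations first, then one trajectory per start."""
    starts = sample_occluded_starts(m_obs, rng, n)
    return np.stack([sample_trajectory(s, rng, horizon) for s in starts])
```

A planner that consumes sampled trajectories could call `sample_occluded_trajectories` once per planning step and treat each returned row as one hypothesis about a hidden agent.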