GLAD: Grounded Layered Autonomous Driving
for Complex Service Tasks
Yan Ding, Cheng Cui, Xiaohan Zhang, and Shiqi Zhang
Abstract— Given the current point-to-point navigation capa-
bilities of autonomous vehicles, researchers are looking into
complex service requests that require the vehicles to visit
multiple points of interest. In this paper, we develop a lay-
ered planning framework, called GLAD, for complex service
requests in autonomous urban driving. There are three layers
for service-level, behavior-level, and motion-level planning. The
layered framework is unique in its tight coupling, where the
different layers communicate user preferences, safety estimates,
and motion costs for system optimization. GLAD is visually
grounded by perceptual learning from a dataset of 13.8k
instances collected from driving behaviors. GLAD enables
autonomous vehicles to efficiently and safely fulfill complex
service requests. Experimental results from abstract and full
simulation show that our system outperforms competitive
baselines from the literature.
I. INTRODUCTION
Self-driving cars are changing people’s everyday lives.
Narrowly defined autonomous driving technology is con-
cerned with point-to-point navigation and obstacle avoid-
ance [1], where recent advances in perception and machine
learning have made significant achievements. In this paper,
we are concerned with urban driving scenarios, where vehi-
cles must follow traffic rules and social norms to perform
driving behaviors, such as merging lanes and parking on the
right. At the same time, the vehicles need to fulfill service
requests from end users. Consider the following scenario:
Emma asks her autonomous car to drive her home
after work. On her way home, Emma needs to pick
up her kid Lucas from school, stop at a gas station,
and visit a grocery store. In rush hour, driving in
some areas can be difficult. Lucas does not like the
gas smell, but he likes shopping with Emma.
The goal of Emma’s autonomous car is to efficiently and
safely fulfill her requests while respecting the preferences of
Emma (and her family). We say a service request is complex
if fulfilling it requires the vehicle to visit two or more
points of interest (POIs), such as a gas station and a grocery
store, each corresponding to a driving task. Facing such a
service request, a straightforward idea is to first sequence
the driving tasks of visiting different POIs, and then perform
behavioral and motion planning to complete those tasks.
However, this approach is less effective in practice, because
traffic conditions change unpredictably at execution time.
For instance, the vehicle might find it difficult to merge right
and park at a gas station because of unanticipated heavy
traffic. This observation motivates this work, which leverages
visual perception to bridge the communication gap between
different decision-making layers for urban driving.
(The authors are with the Department of Computer Science, SUNY Bing-
hamton, Binghamton NY 13902. {yding25, ccui7, xzhan244,
zhangs}@binghamton.edu)
In this paper, we develop Grounded Layered Autonomous
Driving (GLAD), a planning framework for urban driving
that includes three decision-making layers for service, be-
havior, and motion respectively. The service (top) layer is
for sequencing POIs to be visited in order to fulfill users’
service requests. User preferences, such as “Lucas likes
shopping with Emma”, can be incorporated into this layer.
The behavior (middle) layer plans driving behaviors, such as
“merge left” and “drive straight”. The motion (bottom) layer
aims to compute trajectories and follow them to realize the
middle-layer behaviors. GLAD is novel in its bidirectional
communication mechanism between different layers. For
example, the bottom layer reports motion cost estimates up to
the top two layers for plan optimization. The safety estimates
of different driving behaviors (middle layer) are reported to
the top layer, and the safety estimation is conditioned on
the motion trajectories in the bottom layer. An overview of
GLAD is presented in Fig. 1.
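To make the bidirectional communication concrete, the following minimal sketch shows how motion costs and safety estimates might flow upward between the three layers; the class names, the cost table, and the utility rule (safety divided by cost) are our own illustrative assumptions, not GLAD's released implementation:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    """Estimates passed upward between planning layers."""
    motion_cost: float  # from the motion (bottom) layer, e.g., seconds
    safety: float       # from the behavior (middle) layer, in [0, 1]

class MotionPlanner:
    def estimate(self, behavior: str) -> float:
        # Hypothetical per-behavior costs; in practice these would come
        # from trajectory computation in the bottom layer.
        costs = {"merge_left": 4.0, "drive_straight": 1.0, "park_right": 6.0}
        return costs.get(behavior, 5.0)

class BehaviorPlanner:
    def __init__(self, motion: MotionPlanner):
        self.motion = motion

    def evaluate(self, behavior: str) -> Feedback:
        cost = self.motion.estimate(behavior)        # bottom -> middle
        safety = 1.0 / (1.0 + 0.1 * cost)            # toy safety model
        return Feedback(motion_cost=cost, safety=safety)  # middle -> top

class ServicePlanner:
    def __init__(self, behavior: BehaviorPlanner):
        self.behavior = behavior

    def choose(self, candidates: list) -> str:
        # The top layer picks the behavior with the best
        # safety-discounted cost (here, utility = safety / cost).
        def utility(b: str) -> float:
            fb = self.behavior.evaluate(b)
            return fb.safety / fb.motion_cost
        return max(candidates, key=utility)

planner = ServicePlanner(BehaviorPlanner(MotionPlanner()))
best = planner.choose(["merge_left", "drive_straight"])  # -> "drive_straight"
```

Under these toy numbers, "drive_straight" wins because its lower motion cost yields both a higher safety estimate and a higher utility; the point of the sketch is only that estimates produced in lower layers inform the choice made at the top.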
“Grounding” is a concept that was initially developed in
the literature of symbolic reasoning [2]. In this work, the
vehicle’s behavioral planner (middle layer) relies on sym-
bolic rules, such as “If the current situation is safe, a merge
left behavior will move a vehicle to the lane on its left.”
While classical planning methods assume perfect information
about “X is safe,” an autonomous vehicle needs its perception
algorithms to visually ground such symbols in the real world.
We used the CARLA simulator [3] to collect a dataset of
13.8k instances, each including 16 images, for evaluating
the safety levels of driving behaviors. Learning from our
gathered dataset enables GLAD to visually ground symbolic
predicates for planning driving behaviors. We have compared
GLAD with baseline methods [4], [5] that support decision-
making at behavioral and motion levels. Results show that
GLAD produced the highest overall utility compared to the
baseline methods.
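The following sketch illustrates the grounding idea: a learned perception model maps camera images to a probability that a behavior is safe, and that probability is thresholded into a truth value the symbolic behavioral planner can consume. The class name, the threshold, and the placeholder probability are illustrative assumptions; a real system would use a trained model over the 16-image input described above:

```python
class SafetyGrounding:
    """Toy stand-in for a learned perception model that grounds the
    symbolic predicate "the current situation is safe" in images."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def predict_proba(self, images: list) -> float:
        # Placeholder logic: a real model (e.g., a CNN over the
        # 16-image instances in the dataset) would return a learned
        # probability that the behavior is safe to execute.
        return 0.8 if len(images) == 16 else 0.2

    def is_safe(self, images: list) -> bool:
        # Grounded truth value handed to the symbolic planner, which
        # can then fire rules such as "if safe, merge_left moves the
        # vehicle to the lane on its left."
        return self.predict_proba(images) >= self.threshold

grounder = SafetyGrounding()
state = {"safe": grounder.is_safe([None] * 16)}  # -> {"safe": True}
```

The design point is the interface: classical planners assume the truth value of "safe" is given, whereas here it is produced by perception at execution time, so the same symbolic rules apply while their preconditions track the actual traffic scene.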
II. BACKGROUND AND RELATED WORK
Service agents sometimes need more than one action
to fulfill service requests. Task planning methods aim to
sequence symbolic actions to complete such complex tasks.
There are at least two types of task planning, namely auto-
mated planning [6], [7] and planning under uncertainty [8],
that can be distinguished based on their assumptions about
the determinism of action outcomes. Automated planning
arXiv:2210.02302v1 [cs.RO] 5 Oct 2022