GLAD: Grounded Layered Autonomous Driving
for Complex Service Tasks
Yan Ding, Cheng Cui, Xiaohan Zhang, and Shiqi Zhang
Abstract— Given the current point-to-point navigation capa-
bilities of autonomous vehicles, researchers are looking into
complex service requests that require the vehicles to visit
multiple points of interest. In this paper, we develop a lay-
ered planning framework, called GLAD, for complex service
requests in autonomous urban driving. There are three layers
for service-level, behavior-level, and motion-level planning. The
layered framework is unique in its tight coupling, where the
different layers communicate user preferences, safety estimates,
and motion costs for system optimization. GLAD is visually
grounded by perceptual learning from a dataset of 13.8k
instances collected from driving behaviors. GLAD enables
autonomous vehicles to efficiently and safely fulfill complex
service requests. Experimental results from abstract and full
simulation show that our system outperforms competitive
baselines from the literature.
I. INTRODUCTION
Self-driving cars are changing people’s everyday lives.
Narrowly defined autonomous driving technology is con-
cerned with point-to-point navigation and obstacle avoid-
ance [1], where recent advances in perception and machine
learning have made significant achievements. In this paper,
we are concerned with urban driving scenarios, where vehi-
cles must follow traffic rules and social norms to perform
driving behaviors, such as merging lanes and parking on the
right. At the same time, the vehicles need to fulfill service
requests from end users. Consider the following scenario:
Emma asks her autonomous car to drive her home
after work. On her way home, Emma needs to pick
up her kid Lucas from school, stop at a gas station,
and visit a grocery store. In rush hour, driving in
some areas can be difficult. Lucas does not like the
gas smell, but he likes shopping with Emma.
The goal of Emma’s autonomous car is to efficiently and
safely fulfill her requests while respecting the preferences of
Emma (and her family). We say a service request is complex
if fulfilling it requires the vehicle to visit two or more
points of interest (POIs), such as a gas station and a grocery
store, each corresponding to a driving task. Facing such a
service request, a straightforward idea is to first sequence
the driving tasks of visiting different POIs, and then perform
behavioral and motion planning to complete those tasks.
However, this approach is less effective in practice, because
traffic conditions change unpredictably at execution time.
For instance, the vehicle might find it difficult to merge right
and park at a gas station because of unanticipated heavy
traffic. This observation motivates this work, which leverages
visual perception to bridge the communication gap between
different decision-making layers for urban driving.
(The authors are with the Department of Computer Science, SUNY Bing-
hamton, Binghamton NY 13902. {yding25, ccui7, xzhan244,
zhangs}@binghamton.edu)
In this paper, we develop Grounded Layered Autonomous
Driving (GLAD), a planning framework for urban driving
that includes three decision-making layers for service, be-
havior, and motion respectively. The service (top) layer is
for sequencing POIs to be visited in order to fulfill users’
service requests. User preferences, such as “Lucas likes
shopping with Emma”, can be incorporated into this layer.
The behavior (middle) layer plans driving behaviors, such as
“merge left” and “drive straight”. The motion (bottom) layer
aims to compute trajectories and follow them to realize the
middle-layer behaviors. GLAD is novel in its bidirectional
communication mechanism between different layers. For
example, the bottom layer reports motion cost estimates up to
the top two layers for plan optimization. The safety estimates
of different driving behaviors (middle layer) are reported to
the top layer, and the safety estimation is conditioned on
the motion trajectories in the bottom layer. An overview of
GLAD is presented in Fig. 1.
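To make the bidirectional communication concrete, the following minimal sketch shows how motion costs and safety estimates might flow upward between the three layers; the class names, the cost table, and the utility rule (safety divided by cost) are our own illustrative assumptions, not GLAD's released implementation:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    """Estimates passed upward between planning layers."""
    motion_cost: float  # from the motion (bottom) layer, e.g., seconds
    safety: float       # from the behavior (middle) layer, in [0, 1]

class MotionPlanner:
    def estimate(self, behavior: str) -> float:
        # Hypothetical per-behavior costs; in practice these would come
        # from trajectory computation in the bottom layer.
        costs = {"merge_left": 4.0, "drive_straight": 1.0, "park_right": 6.0}
        return costs.get(behavior, 5.0)

class BehaviorPlanner:
    def __init__(self, motion: MotionPlanner):
        self.motion = motion

    def evaluate(self, behavior: str) -> Feedback:
        cost = self.motion.estimate(behavior)        # bottom -> middle
        safety = 1.0 / (1.0 + 0.1 * cost)            # toy safety model
        return Feedback(motion_cost=cost, safety=safety)  # middle -> top

class ServicePlanner:
    def __init__(self, behavior: BehaviorPlanner):
        self.behavior = behavior

    def choose(self, candidates: list) -> str:
        # The top layer picks the behavior with the best
        # safety-discounted cost (here, utility = safety / cost).
        def utility(b: str) -> float:
            fb = self.behavior.evaluate(b)
            return fb.safety / fb.motion_cost
        return max(candidates, key=utility)

planner = ServicePlanner(BehaviorPlanner(MotionPlanner()))
best = planner.choose(["merge_left", "drive_straight"])  # -> "drive_straight"
```

Under these toy numbers, "drive_straight" wins because its lower motion cost yields both a higher safety estimate and a higher utility; the point of the sketch is only that estimates produced in lower layers inform the choice made at the top.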
“Grounding” is a concept that was initially developed in
the literature of symbolic reasoning [2]. In this work, the
vehicle’s behavioral planner (middle layer) relies on sym-
bolic rules, such as “If the current situation is safe, a merge
left behavior will move a vehicle to the lane on its left.”
While classical planning methods assume perfect information
about “X is safe,” an autonomous vehicle needs its perception
algorithms to visually ground such symbols in the real world.
We used the CARLA simulator [3] to collect a dataset of
13.8k instances, each including 16 images, for evaluating
the safety levels of driving behaviors. Learning from our
gathered dataset enables GLAD to visually ground symbolic
predicates for planning driving behaviors. We have compared
GLAD with baseline methods [4], [5] that support decision-
making at behavioral and motion levels. Results show that
GLAD produced the highest overall utility compared to the
baseline methods.
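The following sketch illustrates the grounding idea: a learned perception model maps camera images to a probability that a behavior is safe, and that probability is thresholded into a truth value the symbolic behavioral planner can consume. The class name, the threshold, and the placeholder probability are illustrative assumptions; a real system would use a trained model over the 16-image input described above:

```python
class SafetyGrounding:
    """Toy stand-in for a learned perception model that grounds the
    symbolic predicate "the current situation is safe" in images."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def predict_proba(self, images: list) -> float:
        # Placeholder logic: a real model (e.g., a CNN over the
        # 16-image instances in the dataset) would return a learned
        # probability that the behavior is safe to execute.
        return 0.8 if len(images) == 16 else 0.2

    def is_safe(self, images: list) -> bool:
        # Grounded truth value handed to the symbolic planner, which
        # can then fire rules such as "if safe, merge_left moves the
        # vehicle to the lane on its left."
        return self.predict_proba(images) >= self.threshold

grounder = SafetyGrounding()
state = {"safe": grounder.is_safe([None] * 16)}  # -> {"safe": True}
```

The design point is the interface: classical planners assume the truth value of "safe" is given, whereas here it is produced by perception at execution time, so the same symbolic rules apply while their preconditions track the actual traffic scene.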
II. BACKGROUND AND RELATED WORK
Service agents sometimes need more than one action
to fulfill service requests. Task planning methods aim to
sequence symbolic actions to complete such complex tasks.
There are at least two types of task planning, namely auto-
mated planning [6], [7] and planning under uncertainty [8],
that can be distinguished based on their assumptions about
the determinism of action outcomes. Automated planning
arXiv:2210.02302v1 [cs.RO] 5 Oct 2022