
MARL to utilize the shared spatial and temporal information among CAVs. Compared with the baseline model, our solution achieves higher safety metrics and overall returns in challenging scenarios.
•We introduce coordination mechanisms to MARL with information sharing and cooperative policy learning. Our experimental results show that cooperation among CAVs improves the collision-free rate and overall return.
II. RELATED WORK
a) Planning and Control of Autonomous Vehicles: End-to-end learning maps the observed environment directly to output control signals for steering angle and acceleration; it has been realized with CNN-based supervised learning [8] and CBF-based deep reinforcement learning [9], though both only consider lane-keeping without lane-changing behavior. Another popular approach is to separate the learning and control phases: learning methods produce a high-level decision, such as “go straight” or “go left” [10], or whether or not to yield to another vehicle [11]. It is also effective to first extract image features and then apply control based on these features [12]. However, the works mentioned above do not consider the connection between CAVs, whereas we consider how CAVs should use information sharing to improve the safety and efficiency of the system, and design an MARL-based algorithm such that CAVs cooperatively take actions in challenging driving scenarios.
b) GCN, Transformer and Deep MARL: How to design a neural network structure that exploits communication among CAVs to improve the system’s safety or efficiency in policy learning has not yet been addressed.
Recent advances like GCN [13] and Transformer [14], [15]
show their advantages in processing spatial and temporal
properties of data. We utilize a GCN-Transformer structure to
capture the spatial-temporal information of driving scenarios
to improve the coordination among CAVs. To the best of
our knowledge, we are the first to design a GCN-Transformer
structure-based deep constrained MARL framework to utilize
the shared information among CAVs. We validate that this
design improves the safety rates and total rewards for CAVs
in challenging scenarios with traffic hazards.
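As a rough illustration of how such a pipeline composes (a toy sketch, not the architecture proposed here), the snippet below chains a single normalized graph-convolution layer over the V2V communication graph with a plain self-attention step over one agent’s embedding sequence; the fully connected adjacency, feature sizes, and random weights are all illustrative assumptions.

```python
import numpy as np

def gcn_layer(X, A, W):
    # X: (n_agents, f_in) node features, A: adjacency with self-loops, W: (f_in, f_out)
    D = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    A_hat = D @ A @ D                       # symmetric normalization D^{-1/2} A D^{-1/2}
    return np.maximum(A_hat @ X @ W, 0.0)   # aggregate neighbors, then ReLU

def self_attention(H):
    # H: (T, d) sequence of one agent's GCN embeddings over T time steps
    scores = H @ H.T / np.sqrt(H.shape[1])                      # scaled dot products
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)               # row-wise softmax
    return weights @ H                                          # temporal mixing

# toy setting: 3 CAVs, 4 raw features, 8-dim embeddings, 5 time steps
rng = np.random.default_rng(0)
A = np.ones((3, 3))                          # fully connected V2V graph incl. self-loops
W = rng.standard_normal((4, 8))
frames = [gcn_layer(rng.standard_normal((3, 4)), A, W) for _ in range(5)]
seq = np.stack([f[0] for f in frames])       # agent 0's spatial embedding over time
out = self_attention(seq)
print(out.shape)                             # → (5, 8)
```

The GCN step mixes information across agents at each time step; the attention step then mixes across time, which is the spatial-then-temporal ordering described above.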
c) Constrained MDP and Safe RL: Existing multi-agent reinforcement learning (MARL) literature [16], [17], [18], [19] has not fully addressed the challenges faced by CAVs. A Constrained Markov Decision Process (CMDP) [20], [21] learns a policy that maximizes the total reward while keeping the total cost under certain constraints. However, the cost or the constraint does not explicitly represent all the safety requirements of physical dynamic systems and cannot be directly applied to solve CAV challenges. A recent advance with a formal safety guarantee is model predictive shielding (MPS), which also works for multi-agent systems [22], [23]. However, its safety guarantee assumes an accurate model of the vehicles, which is difficult to obtain in practice. Control Barrier Functions have been used to map unsafe actions to a safe action set in MARL [24], but that work does not consider how to design a spatial-temporal encoder actor or critic network structure for challenging scenarios with hazard vehicles. In this work, we integrate the strengths of both constrained MARL and a CBF-based safety shield to further improve the safety of CAVs under the threat of traffic hazards.
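To make the shielding idea concrete, here is a minimal toy sketch of filtering a discrete action set with a discrete-time CBF-style condition h(x') ≥ (1 − α)h(x); the one-step longitudinal gap model, the safety margin, and all numeric values are illustrative assumptions, not the safety shield used in this paper.

```python
def cbf_safe_actions(gap, rel_speed, actions, dt=1.0, margin=5.0, alpha=0.2):
    """Keep only accelerations whose one-step successor state satisfies
    the discrete-time CBF condition h(x') >= (1 - alpha) * h(x),
    with h = gap to the lead vehicle minus a safety margin."""
    h = gap - margin
    safe = []
    for a in actions:
        # toy longitudinal model: ego acceleration a changes the closing rate
        rel_speed_next = rel_speed + a * dt
        gap_next = gap - rel_speed_next * dt
        if gap_next - margin >= (1 - alpha) * h:
            safe.append(a)
    return safe

# candidate accelerations (m/s^2) with a closing rate of 2 m/s at an 8 m gap:
# only hard braking keeps the barrier from decaying too fast
print(cbf_safe_actions(gap=8.0, rel_speed=2.0, actions=[-3.0, 0.0, 2.0]))  # → [-3.0]
```

In a MARL setting such a filter sits between the learned policy and the actuator: the policy proposes an action, and the shield substitutes a safe one whenever the proposal fails the barrier condition.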
III. PROBLEM FORMULATION
A. Problem Description
We consider the cooperative policy-learning problem for CAVs in challenging scenarios that occur at a multi-lane urban intersection or on a multi-lane highway (as shown in Fig. 1). Other traffic participants include unconnected vehicles (UCVs) and a hazard vehicle (HAZV). Meanwhile, infrastructures that have sensing, communication, and computation abilities also play a supportive role for CAVs.
A CAV agent is primarily supported by its own observation o_i, the shared observation o_{N_i} from neighboring agents N_i via V2V communication, and the shared observation o_inf from the road infrastructure. Specifically, N_i provides extra sensor measurements and sensor-detection data, such as lane detection with camera images and object detection with LiDARs [25]. o_inf consists of messages broadcast to CAVs by road infrastructure, for example a radar unit that broadcasts the detected speed and location of nearby vehicles.
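A CAV’s augmented state can be pictured as a simple container holding the three observation sources; the class and field names below are hypothetical and only illustrate how o_i, o_{N_i}, and o_inf might be assembled in practice.

```python
from dataclasses import dataclass, field

@dataclass
class EgoObservation:
    location: tuple       # GPS location l_i
    velocity: float       # v_i
    acceleration: float   # alpha_i
    detections: list      # on-board camera / LiDAR object-detection results det_i

@dataclass
class AugmentedState:
    """State of CAV i assembled from the three observation sources."""
    ego: EgoObservation                                  # o_i: self-observation
    neighbors: dict = field(default_factory=dict)        # o_{N_i}: keyed by agent id
    infrastructure: list = field(default_factory=list)   # o_inf: roadside broadcasts

# hypothetical example: one V2V neighbor and one radar broadcast about the HAZV
ego = EgoObservation((10.0, 2.5), 8.0, 0.5, ["car_ahead"])
state = AugmentedState(ego)
state.neighbors[2] = EgoObservation((14.0, 2.5), 7.0, 0.0, ["hazard_left"])
state.infrastructure.append({"vehicle": "HAZV", "speed": 15.0, "loc": (30.0, 5.0)})
print(len(state.neighbors), len(state.infrastructure))  # → 1 1
```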
B. Constrained MARL Problem Formulation
A constrained MARL problem is defined as a tuple G = (S, A, P, {r_i}, {c_i}, G, γ), where G := (N, E) is the communication network of all CAV agents; S is the joint state space of all agents: S := S_1 × ··· × S_n. The state space of agent i, S_i = {o_i, o_{j∈N_i}, o_inf}, contains information from three sources: the self-observation o_i from vehicle i’s own odometers and sensors, the observations o_{j∈N_i} shared by other connected agents, and the observation o_inf shared by the infrastructure. The observation of each CAV is o_i = {(l_i, v_i, α_i), det_i}, where (l_i, v_i, α_i) are the GPS location, velocity, and acceleration of agent i, and det_i is the object-detection result from the vision-based sensors (on-board camera and 3D point-cloud LiDAR). The joint action set is A := A_1 × ··· × A_n, where A_i = {a_{i,1}, a_{i,2}, ..., a_{i,4+k}} is the discrete finite action space of agent i, and
•a_{i,1}: KEEP-LANE-SPEED - the CAV i maintains its current speed in the current lane.
•a_{i,2}: CHANGE-LANE-LEFT - the CAV i changes to its left lane. In the experiment, by taking a_{i,2} we set the target waypoint on the left lane.
•a_{i,3}: CHANGE-LANE-RIGHT - the CAV i changes to its right lane. In the experiment, by taking a_{i,3} we set the target waypoint on the right lane.
•a_{i,4}: BRAKE - in the experiment, the CAV i’s actuator computes a brake value brake_i^t ∈ [0, 0.5] at time t.
•a_{i,5}, a_{i,6}, ..., a_{i,4+k} are k discretized throttle intervals. Given the available throttle value set [0, 1] in the simulator, we set a_{i,4+j} = [(j−1)/k, j/k]. By choosing the action a_{i,5}, for example, the actuator of vehicle i will