Spatial-Temporal-Aware Safe Multi-Agent Reinforcement Learning of Connected Autonomous Vehicles in Challenging Scenarios
Zhili Zhang, Songyang Han, Jiangwei Wang, Fei Miao

Abstract: Communication technologies enable coordination among
connected and autonomous vehicles (CAVs). However, it remains unclear
how to utilize shared information to improve the safety and efficiency
of the CAV system. In this work, we propose a framework of constrained
multi-agent reinforcement learning (MARL) with a parallel safety shield
for CAVs in challenging driving scenarios. The coordination mechanisms
of the proposed MARL include information sharing and cooperative policy
learning, with a Graph Convolutional Network (GCN)-Transformer as a
spatial-temporal encoder that enhances the agent’s environment
awareness. The safety shield module, with Control Barrier Function
(CBF)-based safety checking, protects the agents from taking unsafe
actions. We design a constrained multi-agent advantage actor-critic
(CMAA2C) algorithm to train safe and cooperative policies for CAVs.
With experiments deployed in the CARLA simulator, we verify the
effectiveness of the safety checking, spatial-temporal encoder, and
coordination mechanisms designed in our method through comparative
experiments in several challenging scenarios with the defined hazard
vehicles (HAZV). Results show that our proposed methodology
significantly increases system safety and efficiency in challenging
scenarios.
I. INTRODUCTION
Wireless communication technologies such as WiFi and 5G cellular
networks enable vehicle-to-everything (V2X) communication and help an
autonomous vehicle obtain extra information about the driving
environment beyond its sensing capability [1], [2]. Shared information
captured by onboard sensors, such as camera- and LiDAR-based vision
information, can be used to improve the decision-making of connected
autonomous vehicles (CAVs) [3], [4], [5]. Shared basic safety messages
(BSMs) (velocity, position, heading angle, and yaw rate) benefit the
coordination and control decisions of CAVs in scenarios such as cross
intersections and lane-merging [6], [7].
However, it is not clear how information sharing benefits connected
autonomous vehicles in challenging scenarios. Without communication and
coordination, it is difficult for CAVs to react to traffic-rule-violating
behavior or sudden acceleration/deceleration maneuvers taken by the
hazard vehicle, as shown in Fig. 1. When an autonomous vehicle gets
extra knowledge about the environment via coordinated V2X
communication, how to design the neural network structure to utilize
the shared information with spatial and temporal features and how to
make prudent decisions to improve collaborative safety are unsolved
challenges.

Fig. 1. Intersection (upper) and Highway (lower) scenarios. 1a, 1d:
scenario initialization; 1b, 1e: successful cases of collaborative
collision-avoidance from test runs of our method; 1c, 1f: collision
cases from test runs of the baseline model. Connected autonomous
vehicles (CAVs) are in green; unconnected vehicles (UCVs) are in red;
the hazard vehicle (HAZV) is in red with a yellow triangle mark. The
hazard vehicle runs the red light in the Intersection scenario and
takes a sudden hard brake in the Highway scenario. Without the safety
shield or coordination, CAVs are likely to collide with the HAZV or
other vehicles, as in 1c, 1f.

This work was supported by NSF 1849246, NSF 1932250, NSF 2047354
grants. Z. Zhang, S. Han, and F. Miao are with the Department of
Computer Science and Engineering, J. Wang is with the Department of
Electrical and Computer Engineering, University of Connecticut, Storrs
Mansfield, CT, USA 06268. Email: {zhili.zhang, songyang.han,
jiangwei.wang, fei.miao}@uconn.edu.

In this work, we design a spatial-temporal-aware constrained MARL
framework with a parallel Safety Shield for cooperative policy-learning
of CAVs, to improve the safety and efficiency of the system using V2X
communication-based information-sharing. In particular, we consider
challenging driving scenarios with potential traffic hazard vehicles.
The complicated dynamics and interactions among CAVs under challenging
scenarios strongly motivate the design of a Safety Shield for the
actions and policies of MARL, introduced in Section IV-B. We further
introduce coordination mechanisms, as illustrated in Fig. 2b. We use
the prevailing Graph Convolutional Network (GCN) and Transformer
structures as spatial-temporal scene encoders (Fig. 2a) for each agent
to raise its situation awareness, as part of the actor-critic-cost
neural network of the MARL model. In summary, the main contributions of
this work are:
• We propose a framework of constrained MARL with the designed Safety
  Shield based on Control Barrier Functions (CBFs) and verify the
  significant improvement in collision-free rate with experiments.
• We design a GCN-Transformer encoder integrated with MARL to utilize
  the shared spatial and temporal information among CAVs. Compared with
  the baseline model, our solution achieves higher safety metrics and
  overall returns in challenging scenarios.
• We introduce coordination mechanisms to MARL with information-sharing
  and cooperative policy-learning. Our experiment results show that
  cooperation among CAVs improves the collision-free rate and overall
  return.

arXiv:2210.02300v3 [cs.RO] 13 Mar 2023

II. RELATED WORK
a) Planning and Control of Autonomous Vehicles: To learn the output
control signals for steering angle and acceleration directly from the
observed environment, end-to-end learning has been designed with
CNN-based supervised learning [8] and CBF-based deep reinforcement
learning [9], when only lane-keeping, without lane-changing behavior,
is considered. The other popular approach is to separate the learning
and control phases. Learning methods can give a high-level decision,
such as “go straight” or “go left” [10], or whether or not to yield to
another vehicle [11]. It also works to first extract image features and
then apply control based on these features [12]. However, the works
mentioned above do not consider the connection between CAVs, whereas we
consider how CAVs should use information sharing to improve the safety
and efficiency of the system, and design an MARL-based algorithm such
that CAVs cooperatively take actions under challenging driving
scenarios.
b) GCN, Transformer and Deep MARL: It has not been
addressed yet how to specifically design a neural network
structure to utilize the communication among CAVs to
improve the system’s safety or efficiency in policy learning.
Recent advances like GCN [13] and Transformer [14], [15]
show their advantages in processing spatial and temporal
properties of data. We utilize a GCN-Transformer structure to
capture the spatial-temporal information of driving scenarios
to improve the coordination among CAVs. To the best of
our knowledge, we are the first to design a GCN-Transformer
structure-based deep constrained MARL framework to utilize
the shared information among CAVs. We validate that this
design improves the safety rates and total rewards for CAVs
in challenging scenarios with traffic hazards.
c) Constrained MDP and Safe RL: Existing multi-
agent reinforcement learning (MARL) literature [16], [17],
[18], [19] has not fully solved the challenges for CAVs.
Constrained Markov Decision Process (CMDP) [20], [21]
learns a policy to maximize the total reward while maintaining
the total cost under certain constraints. However,
the cost or the constraint does not explicitly represent all
the safety requirements of physical dynamic systems and
cannot be directly applied to solve CAV challenges. The
recent advance with a formal safety guarantee is the model
predictive shielding (MPS) that also works for multi-agent
systems [22], [23]. However, their safety guarantee assumes
an accurate model of vehicles which is difficult to find in
reality. Control Barrier Functions are used to map unsafe
actions to a safe action set in MARL [24], but they do
not consider how to design a spatial-temporal encoder actor
or critic network structure for challenging scenarios with
hazard vehicles. In this work, we first integrate the strengths
of both constrained MARL and CBF-based safety shield to
further improve the safety of CAVs under the threat of traffic
hazards.
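
For reference, the CMDP objective mentioned above (maximize total reward while keeping total cost within a bound) is commonly written in the following generic single-agent form. This is a textbook-style sketch and not necessarily the exact formulation used in this work; the policy symbol $\pi$, the horizon, and the cost threshold $d$ are notation assumed here for illustration.

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \right] \le d
```

Here $r$ and $c$ are the reward and cost functions and $\gamma$ is the discount factor. The constraint bounds only the expected discounted cost, which is one way to see why it does not by itself encode every safety requirement of a physical system.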
III. PROBLEM FORMULATION
A. Problem Description
We consider the cooperative policy-learning problem for CAVs in
challenging scenarios that occur at a multi-lane urban intersection or
on a multi-lane highway (as shown in Fig. 1). Other traffic
participants include unconnected vehicles (UCVs) and a hazard vehicle
(HAZV). Meanwhile, infrastructure with sensing, communication, and
computation abilities also plays a supportive role for CAVs.
A CAV agent is primarily supported by its own observation $o_i$, the
shared observation $o_{\mathcal{N}_i}$ from neighboring agents
$\mathcal{N}_i$ based on V2V communication, and the shared observation
$o_{inf}$ from the road infrastructure. Specifically, $\mathcal{N}_i$
provides extra sensor measurements and sensor-detection data, such as
lane detection with camera images and object detection with LiDARs
[25]. $o_{inf}$ consists of messages broadcast to CAVs from road
infrastructure, such as a radar that broadcasts the detected speed and
location of nearby vehicles.
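
As a rough illustration of how the three information sources could be assembled for one agent, consider the minimal Python sketch below. It is not the authors' implementation: the class names `Observation` and `AgentState`, the helper `build_state`, and the dictionary-based formats for detections and infrastructure messages are all hypothetical. The per-vehicle fields follow the observation definition given in Section III-B.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


@dataclass
class Observation:
    """One vehicle's observation o_i = {(l_i, v_i, alpha_i), det_i}."""
    location: Vec        # l_i, e.g. a GPS position
    velocity: Vec        # v_i
    acceleration: Vec    # alpha_i
    detections: List[dict] = field(default_factory=list)  # det_i (hypothetical format)


@dataclass
class AgentState:
    """State of agent i assembled from the three information sources."""
    own: Observation                   # o_i
    neighbors: Dict[int, Observation]  # o_j for j in N_i, keyed by agent id
    infrastructure: List[dict]         # o_inf, broadcast messages (hypothetical format)


def build_state(agent_id, all_obs, neighbor_ids, infra_msgs):
    """Collect o_i, the shared o_j (j in N_i), and o_inf for one agent."""
    # Keep only neighbors that actually shared an observation this step.
    shared = {j: all_obs[j] for j in neighbor_ids
              if j != agent_id and j in all_obs}
    return AgentState(own=all_obs[agent_id], neighbors=shared,
                      infrastructure=list(infra_msgs))
```

For example, calling `build_state(0, obs, [1, 2], msgs)` would return a state whose `neighbors` field holds only the observations shared by agents 1 and 2. Silently skipping a neighbor that has no observation is a design choice of this sketch, not something the text specifies.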
B. Constrained MARL Problem Formulation
A Constrained MARL is defined as a tuple
$G = (\mathcal{S}, \mathcal{A}, P, \{r_i\}, \{c_i\}, \mathcal{G}, \gamma)$,
where $\mathcal{G} := (\mathcal{N}, \mathcal{E})$ is the communication
network of all CAV agents and $\mathcal{S}$ is the joint state space of
all agents: $\mathcal{S} := \mathcal{S}_1 \times \cdots \times \mathcal{S}_n$.
The state space of agent $i$,
$\mathcal{S}_i = \{o_i, o_{j \in \mathcal{N}_i}, o_{inf}\}$, contains
information from three sources: self-observation $o_i$ from vehicle
$i$'s own odometers and sensors, observation $o_{j \in \mathcal{N}_i}$
shared by other connected agents, and observation $o_{inf}$ shared by
infrastructure. The observation of each CAV is
$o_i = \{(l_i, v_i, \alpha_i), det_i\}$, where $(l_i, v_i, \alpha_i)$
is the GPS location, velocity, and acceleration of agent $i$, and
$det_i$ is the object detection result from the vision-based sensors
(on-board camera and 3D point-cloud LiDAR). The joint action set is
$\mathcal{A} := \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$, where
$\mathcal{A}_i = \{a_{i,1}, a_{i,2}, \cdots, a_{i,4+k}\}$ is the
discrete finite action space for agent $i$, and
• $a_{i,1}$: KEEP-LANE-SPEED - the CAV $i$ maintains its current speed
  in the current lane.
• $a_{i,2}$: CHANGE-LANE-LEFT - the CAV $i$ changes to its left lane. In
  the experiment, by taking $a_{i,2}$ we set the target waypoint on the
  left lane.
• $a_{i,3}$: CHANGE-LANE-RIGHT - the CAV $i$ changes to its right lane.
  In the experiment, by taking $a_{i,3}$ we set the target waypoint on
  the right lane.
• $a_{i,4}$: BRAKE. In the experiment, CAV $i$'s actuator will compute a
  brake value within the range $brake_i^t \in [0, 0.5]$ at time $t$.
• $a_{i,5}, a_{i,6}, \ldots, a_{i,4+k}$ are $k$ discretized throttle
  intervals. Given the available throttle value set in the simulator as
  $[0, 1]$, we set $a_{i,4+j} = [\frac{j-1}{k}, \frac{j}{k}]$. By
  choosing the action $a_{i,5}$, for example, the actuator of vehicle
  $i$ will
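
The discrete action space listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `throttle_interval`, `build_action_space`, and `decode_action`, and the `THROTTLE-j` labels, are hypothetical. How a concrete throttle value is chosen inside an interval is not specified in the text available here, so the sketch returns only the interval.

```python
from typing import List, Optional, Tuple

# The four non-throttle actions a_{i,1} .. a_{i,4}, in index order.
BASE_ACTIONS = ["KEEP-LANE-SPEED", "CHANGE-LANE-LEFT",
                "CHANGE-LANE-RIGHT", "BRAKE"]


def throttle_interval(j: int, k: int) -> Tuple[float, float]:
    """Interval [(j-1)/k, j/k] for throttle action a_{i,4+j}, j = 1..k."""
    if not 1 <= j <= k:
        raise ValueError("j must satisfy 1 <= j <= k")
    return ((j - 1) / k, j / k)


def build_action_space(k: int) -> List[str]:
    """Return the 4 + k action labels in index order (a_{i,1} first)."""
    return BASE_ACTIONS + [f"THROTTLE-{j}" for j in range(1, k + 1)]


def decode_action(index: int, k: int) -> Tuple[str, Optional[Tuple[float, float]]]:
    """Map a 1-based action index to (label, throttle interval or None)."""
    if not 1 <= index <= 4 + k:
        raise ValueError("action index out of range")
    if index <= 4:
        return BASE_ACTIONS[index - 1], None
    j = index - 4
    return f"THROTTLE-{j}", throttle_interval(j, k)
```

With $k = 4$, for instance, `decode_action(5, 4)` yields the first throttle interval $[0, 0.25]$ and `decode_action(8, 4)` yields $[0.75, 1]$, so the $k$ intervals together cover the simulator's throttle range $[0, 1]$.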