Graded-Q Reinforcement Learning with
Information-Enhanced State Encoder for
Hierarchical Collaborative Multi-Vehicle Pursuit
Yiying Yang, Xinhang Li, Zheng Yuan, Qinwen Wang, Chen Xu, and Lin Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
{yyying, lixinhang, yuanzheng, wangqinwen, chen.xu, zhanglin}@bupt.edu.cn
Abstract—Multi-vehicle pursuit (MVP), as a problem abstracted from various real-world scenarios, is becoming a hot research topic in Intelligent Transportation System (ITS). The combination of Artificial Intelligence (AI) and connected vehicles has greatly promoted research progress on MVP. However, existing works on MVP pay little attention to the importance of information exchange and cooperation among pursuing vehicles in complex urban traffic environments. This paper proposes a graded-Q reinforcement learning with information-enhanced state encoder (GQRL-IESE) framework to address this hierarchical collaborative multi-vehicle pursuit (HCMVP) problem. In the GQRL-IESE, a cooperative graded Q scheme is proposed to facilitate the decision-making of pursuing vehicles and improve pursuit efficiency. Each pursuing vehicle uses a deep Q network (DQN) to make decisions based on its encoded state, and a coordinated Q optimizing network then adjusts these individual decisions according to the current traffic information to obtain the globally optimal action set. In addition, an information-enhanced state encoder is designed to extract critical information from multiple perspectives and uses an attention mechanism to help each pursuing vehicle effectively determine its target. Extensive experimental results based on SUMO indicate that the proposed GQRL-IESE reduces the total number of pursuit timesteps by 47.64% on average compared with other methods, demonstrating its excellent pursuit efficiency. Code is available at https://github.com/ANT-ITS/GQRL-IESE.
Index Terms—cooperative multi-agent reinforcement learning,
hierarchical collaborative multi-vehicle pursuit, GQRL-IESE
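To make the graded-Q decision flow described in the abstract concrete before the formal exposition, the following minimal PyTorch-style sketch shows how per-vehicle DQN outputs could be adjusted by a coordinated Q optimizing network. All class names, layer sizes, and the concatenation-based aggregation are illustrative assumptions rather than the exact released implementation.

import torch
import torch.nn as nn

class VehicleDQN(nn.Module):
    """Per-vehicle Q-network acting on an encoded local state (illustrative)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, encoded_state: torch.Tensor) -> torch.Tensor:
        # Individual Q-values, shape (batch, n_actions).
        return self.net(encoded_state)

class CoordinatedQOptimizer(nn.Module):
    """Adjusts individual Q-values using global traffic information (illustrative)."""
    def __init__(self, n_vehicles: int, n_actions: int, traffic_dim: int, hidden: int = 128):
        super().__init__()
        self.n_vehicles, self.n_actions = n_vehicles, n_actions
        self.net = nn.Sequential(
            nn.Linear(n_vehicles * n_actions + traffic_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_vehicles * n_actions),
        )

    def forward(self, individual_q: torch.Tensor, traffic_info: torch.Tensor) -> torch.Tensor:
        # individual_q: (batch, n_vehicles, n_actions); traffic_info: (batch, traffic_dim).
        flat = individual_q.flatten(start_dim=1)
        adjusted = self.net(torch.cat([flat, traffic_info], dim=-1))
        return adjusted.view(-1, self.n_vehicles, self.n_actions)

# Joint action selection: each vehicle then acts greedily on its adjusted Q-values,
# e.g., actions = coordinator(q_individual, traffic_info).argmax(dim=-1)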
I. INTRODUCTION
The Intelligent Transportation System (ITS), as an essential
part of the smart city, is greatly facilitated by the development
of emerging technologies. The Internet of Vehicles (IoV) enables ITS to realize dynamic and intelligent traffic management [1], [2]. The pursuit-evasion game (PEG), as a realistic problem for studying the self-learning and autonomous control of multiple agents, has been extensively studied in many fields, such as spacecraft control [3] and robot control [4]. Multi-vehicle pursuit (MVP), as an embodiment of PEG in ITS, is subject to more constraints, such as complex road structures, additional traffic participants, and traffic rules. A patrol guide released by the New York City Police Department representatively describes an MVP game, in which multiple police vehicles cooperate to capture single or multiple suspect vehicles [5].
This work was supported by the National Natural Science Foundation of China (Grant No. 62071179) and project A02B01C01-201916D2.
Regarding MVP, several works have explored game theory-based methods. [6] focused on the multi-player pursuit game with malicious pursuers and constructed a nonzero-sum game framework to train pursuers with different emotional intentions to complete the task. [7] developed a model predictive control method to address the problem of the pursuers' limited information, in which each pursuer focused only on its opponents' information. [8] adopted a graph-theoretic method to learn the interactions between perception-limited agents and employed a minimax strategy to maintain safe operation when the system failed to reach the Nash equilibrium. However, it is difficult for these methods to construct a suitable objective function, and they pay little attention to cooperation among pursuers in dynamic traffic environments, which directly affects the effectiveness of the pursuit.
Cooperative multi-agent reinforcement learning (CoMARL) has been widely used in the coordinated control of multi-agent systems (MASs), such as traffic light control [9] and network resource allocation [10]. CoMARL aims to maximize the expected long-term common cumulative reward of all agents by learning a series of optimal policies or action sets [11].
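Concretely, for $N$ agents sharing a team reward $r_t$ with discount factor $\gamma \in [0, 1)$, this common objective can be stated in the standard form (a textbook formulation rather than a quotation from [11]):
$$\max_{\pi_1,\dots,\pi_N} \; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\Big|\, a_t^{i} \sim \pi_i(\cdot \mid o_t^{i}),\ i = 1,\dots,N\Big],$$
where each agent $i$ draws its action $a_t^{i}$ from policy $\pi_i$ conditioned on its local observation $o_t^{i}$.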
There is growing research interest in applying CoMARL to the MVP problem due to the powerful coordination mechanism and real-time decision-making ability of CoMARL. [12] developed a probabilistic reward-based reinforcement learning (RL) method based on multi-agent deep deterministic policy gradient (MADDPG), in which all pursuing agents are trained by a critic network, to accomplish the pursuit. [13] designed a target prediction network within a conventional multi-agent reinforcement learning framework to better assist the agents in decision-making. [14] introduced adversarial attack tricks and adversarial learning based on MADDPG to help agents learn more robust strategies. [15] added a Transformer on top of QMIX to learn from historical observations along both the temporal and team dimensions, thereby promoting pursuers to learn cooperative pursuit strategies. [16] developed a CoMARL framework combining collaborative exploration and attention-based QMIX to complete tasks coordinately, and the collaborative effectiveness of this framework was verified on a predator-prey scenario. However, these CoMARL methods for MVP operate in open or grid environments, and the complex traffic environments and traffic rule constraints will