Graded-Q Reinforcement Learning with
Information-Enhanced State Encoder for
Hierarchical Collaborative Multi-Vehicle Pursuit
Yiying Yang, Xinhang Li, Zheng Yuan, Qinwen Wang, Chen Xu, and Lin Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
{yyying, lixinhang, yuanzheng, wangqinwen, chen.xu, zhanglin}@bupt.edu.cn
Abstract—The multi-vehicle pursuit (MVP) problem, abstracted from various real-world scenarios, is becoming a hot research topic in Intelligent Transportation Systems (ITS). The combination of Artificial Intelligence (AI) and connected vehicles has greatly promoted research on MVP. However, existing works on MVP pay little attention to the importance of information exchange and cooperation among pursuing vehicles in complex urban traffic environments. This paper proposes a graded-Q reinforcement learning with information-enhanced state encoder (GQRL-IESE) framework to address this hierarchical collaborative multi-vehicle pursuit (HCMVP) problem. In the GQRL-IESE, a cooperative graded-Q scheme is proposed to facilitate the decision-making of pursuing vehicles and improve pursuit efficiency. Each pursuing vehicle uses a deep Q-network (DQN) to make decisions based on its encoded state. A coordinated Q optimizing network then adjusts these individual decisions according to the current environment traffic information to obtain the globally optimal action set. In addition, an information-enhanced state encoder is designed to extract critical information from multiple perspectives and employs an attention mechanism to help each pursuing vehicle effectively determine its target. Extensive experimental results based on SUMO indicate that the total pursuit timesteps of the proposed GQRL-IESE are on average 47.64% fewer than those of other methods, demonstrating the excellent pursuit efficiency of the GQRL-IESE. Code is open-sourced at https://github.com/ANT-ITS/GQRL-IESE.
Index Terms—cooperative multi-agent reinforcement learning,
hierarchical collaborative multi-vehicle pursuit, GQRL-IESE
I. INTRODUCTION
The Intelligent Transportation System (ITS), as an essential
part of the smart city, is greatly facilitated by the development
of emerging technologies. The Internet of Vehicles (IoV) enables ITS to realize dynamic and intelligent traffic management [1], [2]. The pursuit-evasion game (PEG), as a realistic
problem for studying the self-learning and autonomous control
of multiple agents, has been extensively studied in many fields,
such as spacecraft control [3] and robot control [4]. Multi-vehicle pursuit (MVP), as an embodiment of PEG in ITS, is subject to additional constraints, such as complex road structures, other traffic participants, and traffic rules. A patrol guide released by the New York City Police Department describes a representative MVP scenario, in which multiple police vehicles cooperate to capture one or more suspect vehicles [5].
This work was supported by the National Natural Science Foundation of
China (Grant No. 62071179) and Project A02B01C01-201916D2.
Regarding MVP, there have been some works on game-theory-based methods. [6] focused on the multi-player pursuit game with malicious pursuers and constructed a nonzero-sum game framework to train pursuers with different emotional intentions to complete the task. [7] developed a model predictive control method to address the limited information available to the pursuers, in which each pursuer focuses only on its opponents' information. [8] adopted a graph-theoretic method to learn the interactions between perception-limited agents and used a minimax strategy to maintain safe operation when the system fails to reach a Nash equilibrium. However, these methods find it difficult to construct suitable objective functions and pay little attention to cooperation among pursuers in dynamic traffic environments, which directly affects the effectiveness of the pursuit.
Cooperative multi-agent reinforcement learning (CoMARL) has been widely used in the coordinated control of multi-agent systems (MASs), such as traffic light control [9] and network resource allocation [10]. CoMARL aims to maximize all agents' expected long-term common cumulative reward by learning a series of optimal policies or action sets [11].
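Formally, writing the joint policy of the $N$ agents as $\boldsymbol{\pi}=(\pi_1,\dots,\pi_N)$ and the common team reward at timestep $t$ as $r_t$, this objective takes the standard discounted form (notation ours, for illustration; the paper's own formulation may differ in details):

$$\max_{\boldsymbol{\pi}}\; J(\boldsymbol{\pi}) = \mathbb{E}_{\boldsymbol{\pi}}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right],$$

where $\gamma \in [0,1)$ is the discount factor.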
There is growing research interest in applying CoMARL to the MVP problem due to CoMARL's powerful coordination mechanisms and real-time decision-making ability. [12] developed a probabilistic-reward-based reinforcement learning (RL) method based on multi-agent deep deterministic policy gradient (MADDPG), where all pursuing agents are trained by a shared critic network, to accomplish the pursuit. [13] designed a target prediction network within a conventional multi-agent reinforcement learning framework to better assist the agents in decision-making. [14] introduced adversarial attack tricks and adversarial learning based on MADDPG to help agents learn more robust strategies. [15] added a Transformer to QMIX and learned from historical observations along both the time and team dimensions, thereby promoting cooperative pursuit strategies among pursuers. [16] developed a CoMARL framework combining collaborative exploration and attention-QMIX to complete tasks coordinately, and the collaborative effectiveness of this framework was verified in a predator-prey scenario. However, these CoMARL methods for MVP operate in open or grid environments, and complex traffic environments and traffic-rule constraints will
bring them new challenges.

Fig. 1. The architecture of GQRL-IESE. (a) Complex urban traffic scene for HCMVP. (b) Information-enhanced state encoder (IESE). (c) DQN-based pursuit decision-making for multiple pursuing vehicles. (d) Coordinated Q optimizing network. (b) encodes the state observed in (a) and feeds the encoded state into (c) for decision-making. The decisions of the pursuing vehicles generated by (c) are not executed directly but are fed into (d) for evaluation in light of the current environment traffic information. The Q-matrix is then optimized and adjusted according to the evaluation results to obtain the current optimal action set.
In this paper, we propose a graded-Q reinforcement learning with information-enhanced state encoder (GQRL-IESE) framework for hierarchical collaborative multi-vehicle pursuit (HCMVP) in complex urban traffic environments. The architecture of the proposed GQRL-IESE is shown in Fig. 1. Compared with traditional RL, the proposed graded-Q RL framework enhances the cooperative decision-making ability of agents in MASs. In the GQRL-IESE, an information-enhanced state encoder (IESE) is designed and implemented to encode complex states and extract effective information. Moreover, equipped with a cooperative graded-Q scheme, the GQRL-IESE coordinates the decisions of the pursuing vehicles so that they complete tasks cooperatively and efficiently.
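To make this decision flow concrete, the following minimal sketch wires the three modules of Fig. 1(b)-(d) together. All class names, network sizes, and tensor shapes are our own assumptions for illustration; they are not taken from the paper or its released code.

```python
import torch
import torch.nn as nn

class IESE(nn.Module):
    """Information-enhanced state encoder (sketch): attends over per-entity
    features (pursuers, evaders, traffic) to produce one encoded state."""
    def __init__(self, feat_dim=8, emb_dim=32):
        super().__init__()
        self.embed = nn.Linear(feat_dim, emb_dim)
        self.attn = nn.MultiheadAttention(emb_dim, num_heads=4, batch_first=True)

    def forward(self, entities):                 # entities: (B, n_entities, feat_dim)
        h = torch.relu(self.embed(entities))
        h, _ = self.attn(h, h, h)                # attention across all entities
        return h.mean(dim=1)                     # (B, emb_dim) encoded state

class PursuerDQN(nn.Module):
    """Per-vehicle DQN mapping an encoded state to Q-values over actions."""
    def __init__(self, emb_dim=32, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)                   # (B, n_actions)

class CoordQNet(nn.Module):
    """Coordinated Q optimizing network (sketch): adjusts the stacked
    Q-matrix using global traffic information before action selection."""
    def __init__(self, n_pursuers=3, n_actions=4, traffic_dim=16):
        super().__init__()
        self.shape = (n_pursuers, n_actions)
        in_dim = n_pursuers * n_actions + traffic_dim
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_pursuers * n_actions))

    def forward(self, q_matrix, traffic):        # (B, N, A), (B, traffic_dim)
        x = torch.cat([q_matrix.flatten(1), traffic], dim=1)
        return self.net(x).view(-1, *self.shape) # adjusted Q-matrix (B, N, A)

# One decision step for three pursuers (batch size 1):
iese, coord = IESE(), CoordQNet()
dqns = [PursuerDQN() for _ in range(3)]
obs = torch.randn(3, 5, 8)                       # each pursuer observes 5 entities
q_rows = [dqn(iese(o.unsqueeze(0))) for dqn, o in zip(dqns, obs)]
q_matrix = torch.stack(q_rows, dim=1)            # (1, 3, 4) individual Q-values
actions = coord(q_matrix, torch.randn(1, 16)).argmax(dim=-1)  # global action set
```

Note that, as in Fig. 1(d), the executed joint action comes from the adjusted Q-matrix rather than from the raw per-vehicle Q-values; the actual IESE and coordinated Q network are detailed in Sections III and IV.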
The main contributions of this paper are as follows:
• This paper proposes a graded-Q reinforcement learning with information-enhanced state encoder framework to address the HCMVP problem in complex urban traffic environments.
• This paper designs an information-enhanced state encoder to extract crucial information from the multi-dimensional states of various pursuit participants, thus boosting the DQN-based decision-making of the pursuing vehicles (the standard DQN objective underlying these decisions is recalled after this list).
• This paper proposes a cooperative graded-Q scheme to facilitate cooperation among pursuing vehicles, which introduces a coordinated Q optimizing network that considers the current environment traffic information to improve the multi-agent pursuit policy.
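As referenced in the second contribution, each pursuing vehicle's decisions come from a DQN; its parameters $\theta$ are presumably trained with the standard temporal-difference loss, recalled here in textbook form (the paper's exact training objective may differ):

$$L(\theta) = \mathbb{E}_{(s,a,r,s')}\!\left[\Big(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\Big)^{2}\right],$$

where $\theta^{-}$ denotes the parameters of a periodically updated target network.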
The rest of this paper is organized as follows. Section II
presents an HCMVP problem and a detailed statement of the
proposed GQRL-IESE for HCMVP. Section III presents the structure of the information-enhanced state encoder. The details
of the proposed cooperative graded-Q scheme are given in
Section IV. Section V conducts experiments to verify the
performance of GQRL-IESE, and Section VI concludes this
paper.
II. AN INFORMATION-ENHANCED COOPERATIVE
REINFORCEMENT LEARNING FRAMEWORK FOR HCMVP
A. HCMVP Problem Statement Under Complex Urban Traffic
Environment
This paper focuses on the HCMVP problem in a complex urban traffic environment. Different from the traditional MVP problem, in which each pursuing vehicle makes pursuit decisions based only on its own information, the HCMVP problem focuses on the hierarchical optimization of cooperation and decision-making among pursuing vehicles. In the HCMVP problem, each pursuing vehicle can obtain the global position information of the other pursuing vehicles and of the evading vehicles through vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) communication. The goal of the HCMVP problem is for the pursuing vehicles to cooperatively capture all evading vehicles in as few timesteps as possible.