ADLIGHT: A UNIVERSAL APPROACH OF TRAFFIC SIGNAL
CONTROL WITH AUGMENTED DATA USING REINFORCEMENT
LEARNING
Maonan Wang
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen
Shanghai AI Laboratory, Shanghai, China
Email: maonanwang@link.cuhk.edu.cn
Yutong Xu
School of Data Science, The Chinese University of Hong Kong, Shenzhen
Email: yutongxu@link.cuhk.edu.cn
Xi Xiong
Shenzhen Research Institute of Big Data
Email: xiongxi@cuhk.edu.cn
Yuheng Kan
SenseTime Group Limited, Shanghai, China
Shanghai AI Laboratory, Shanghai, China
Email: kanyuheng@sensetime.com
Chengcheng Xu
SenseTime Group Limited, Shanghai, China
Email: xuchengcheng@sensetime.com
Man-On Pun
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen
Shenzhen Research Institute of Big Data
Email: simonpun@cuhk.edu.cn (Corresponding Author)
1 Abstract
Traffic signal control has the potential to reduce congestion in dynamic networks. Recent studies show that traffic
signal control with reinforcement learning (RL) methods can significantly reduce the average waiting time. However, a
shortcoming of existing methods is that they require model retraining for new intersections with different structures. In
this paper, we propose a novel reinforcement learning approach with augmented data (ADLight) to train a universal
model for intersections with different structures. We propose a new agent design incorporating features on movements
and actions with set current phase duration to allow the generalized model to have the same structure for different
intersections. A new data augmentation method named movement shuffle is developed to improve the generalization
performance. We also test the universal model with new intersections in Simulation of Urban MObility (SUMO). The
results show that the performance of our approach is close to the models trained in a single environment directly (only a
5% loss of average waiting time), and we can reduce more than 80% of training time, which saves a lot of computational
resources in scalable operations of traffic lights.
arXiv:2210.13378v2 [eess.SY] 18 Mar 2023
Wang, Xu, Xiong, Kan, Xu, and Pun
Keywords: traffic signal control, model generalization, reinforcement learning, data augmentation
2 Introduction
Traffic congestion has been one of the major challenges in many metropolitan areas. Congestion can not only increase
travel time in networks, but also result in environmental problems, e.g., fuel waste and pollutant emissions [1]. One
way to address this issue is to control traffic lights dynamically. Nowadays, most traffic lights are controlled with
pre-defined fixed-time plans [2]. Conventional traffic signal control methods, such as the Webster model [3] and
Self-Organizing Traffic Light Control (SOTL) [4], are also used in this domain. However, these methods rely heavily
on expert knowledge and certain assumptions about the traffic model, e.g., uniform traffic flow distribution [5], which is
not realistic in real applications [6].
To recognize traffic patterns from real traffic data, a number of reinforcement learning methods have been proposed
for this problem [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 6, 17, 18, 19, 20, 21, 22]. For example, [10] proposed a deep
Q-learning algorithm with a phase gate to control signals on a large-scale network, and [20] used the Advantage Actor
Critic (A2C) method to train the agent and leveraged demonstrations collected from classic methods to accelerate the
learning process. By directly interacting with the environment, an RL agent can learn from real experience how to adapt
to changes in traffic conditions.
Despite the significant improvement RL methods have brought to this domain, their major limitation is that the model
needs to be redesigned and trained from scratch when faced with intersections of different structures, e.g., different
approaching roads, different lanes, and different phases. Learning the optimal policy separately for each intersection
in a large-scale network takes considerable time. Some works design a universal model for different junctions
[13, 15, 22]; however, all of these methods use choose next phase as the action design, which raises safety concerns
because it breaks the traditional traffic cycle by choosing phases in an arbitrary order.
To address the problem above, we present ADLight, a novel approach that can be used to train a universal
reinforcement learning model with augmented data for the traffic signal control problem. To handle intersections
with different structures, we perform feature extraction by movement instead of by lane. The features of each
movement include not only traffic flows but also the movement itself. To ensure safety and generalization across
different phase structures, set current phase duration is chosen as the action in ADLight. With this design, the agent
selects an action from a list of pre-defined time periods to establish the duration of the current phase. A new data
augmentation method, movement shuffle, is developed in ADLight, and the results show that it improves the
performance significantly. Experiments are conducted on intersections with different approaching roads, different lanes,
and different phases in Simulation of Urban MObility (SUMO) [23]. The experiments show that the resulting model
from ADLight can achieve satisfactory performance in untrained intersections with different structures.
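
As a rough illustration of the movement-level state and the movement shuffle idea described above, the sketch below keeps one feature row per movement and permutes the order of the rows. This section does not specify the feature columns, the padding scheme, or how the augmentation is applied during training, so the column layout, the padding row, and the function name `movement_shuffle` are all assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np


def movement_shuffle(state, rng):
    """Return a copy of `state` with its movement rows in a random order.

    `state` has shape (num_movements, num_features): one row per movement.
    Only the row order changes; the features of each movement stay together.
    """
    perm = rng.permutation(state.shape[0])
    return state[perm]


# Hypothetical per-movement features: [flow, occupancy, is_green].
state = np.array([
    [0.30, 0.10, 1.0],
    [0.05, 0.02, 0.0],
    [0.20, 0.08, 0.0],
    [0.00, 0.00, 0.0],  # assumed padding row for an intersection with fewer movements
])

rng = np.random.default_rng(0)
augmented = movement_shuffle(state, rng)
```

Because every movement carries its own description in its row, a reordered state describes the same intersection, which is why a permutation of this kind can plausibly serve as augmentation.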
Our contributions can be summarized as follows:
  • We propose an agent design that allows the model to have the same structure at different intersections. This
    paper is, to the best of our knowledge, the first effort that utilizes the set current phase duration to create
    universal models for the traffic signal control problem.
  • A new data augmentation method for traffic signal control is proposed in our approach. The performance can
    be further improved by incorporating the data augmentation into the agent design.
  • We demonstrate the generalization performance of our ADLight at 11 intersections with different structures.
    The results show that when compared with the model trained on a single environment from scratch (which
    takes around $6 \times 10^6$ steps), the loss of the average waiting time in our generalization model can be
    reduced to 5% after only $1 \times 10^6$ steps by retraining the universal model. When a network has 1000
    intersections, we can save more than $5 \times 10^9$ interactions with SUMO, thereby greatly reducing
    computing resources consumption.
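
The savings figure in the last contribution can be checked with a short calculation. The sketch below assumes that the saving per intersection is simply the difference between the two step counts quoted above and that each step corresponds to one interaction with SUMO; the variable names are illustrative.

```python
steps_from_scratch = 6_000_000   # per intersection, trained from scratch
steps_retraining = 1_000_000     # per intersection, retraining the universal model
num_intersections = 1000

saved_per_intersection = steps_from_scratch - steps_retraining
saved_total = num_intersections * saved_per_intersection

print(f"saved per intersection: {saved_per_intersection:,}")
print(f"saved in total:         {saved_total:,}")
```

Under these assumptions the total is $1000 \times (6 \times 10^6 - 1 \times 10^6) = 5 \times 10^9$ interactions, which matches the order of magnitude stated in the text.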
The rest of the paper is organized as follows. We first summarize the related work on traffic signal control. The road and
traffic signal terminology is introduced in the following section. This is followed by the Methodology section, which
formally describes the ADLight. We then show the dataset and the experiments in the SUMO platform. Finally, we
provide the conclusions and future directions.
3 Related Work
Conventional Traffic Signal Control. Several classical methods have been developed to reduce the total delay for all
vehicles [24]. For example, the Webster [3] method is one of the most widely used methods for a single intersection.
It determines the optimum cycle length and phase split for a single intersection according to traffic volume.
In this research, the Webster method is used as a baseline for comparison with RL-based methods. Self-Organizing
Traffic Light Control (SOTL) [4] is a fully-actuated control algorithm. It decides whether to keep or change the current
phase based on whether the number of vehicles approaching the green signal is larger than a threshold. The detailed
rules for SOTL can be found in [24]. SCATS (Sydney Coordinated Adaptive Traffic System) [25] is an intelligent
transportation system that is widely used at more than 50,000 intersections in over 180 cities in 28 countries. It selects
from pre-defined traffic signal plans (i.e., cycle length, phase split and offsets) according to the data derived from
loop detectors or other road traffic sensors. However, these conventional methods usually rely on oversimplified
information or on assumptions about the traffic model (e.g., that the traffic flow is uniform during a certain period), or
they need expert knowledge to design the pre-defined signal plans.
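
To make the Webster baseline more concrete, the sketch below computes a cycle length and green split from traffic volumes. The paper does not restate the formula, so the code uses the commonly cited textbook form, $C_0 = (1.5L + 5)/(1 - Y)$, where $L$ is the total lost time per cycle and $Y$ is the sum of the critical flow ratios, and it splits the effective green time in proportion to those ratios. The function name and the input values are hypothetical.

```python
def webster_plan(flows, saturation_flows, lost_time):
    """Webster's optimum cycle length and green split for one intersection.

    flows            -- critical-lane demand per phase (veh/h)
    saturation_flows -- saturation flow per phase (veh/h)
    lost_time        -- total lost time per cycle (s)
    """
    ratios = [q / s for q, s in zip(flows, saturation_flows)]
    total = sum(ratios)
    if not 0.0 < total < 1.0:
        raise ValueError("sum of flow ratios must lie strictly between 0 and 1")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - total)
    effective_green = cycle - lost_time
    # Each phase receives green time in proportion to its flow ratio.
    greens = [effective_green * y / total for y in ratios]
    return cycle, greens


# Hypothetical two-phase intersection.
cycle, greens = webster_plan([600, 300], [1800, 1800], lost_time=10.0)
```

The plan depends only on average volumes, which illustrates why it cannot react to fluctuations within the period over which those volumes are measured.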
RL-based Traffic Signal Control. With the success of deep reinforcement learning (DRL) in different areas [26, 27],
more and more research studies are trying to use DRL to solve traffic signal problems [28]. A number of works
[7, 8, 9, 18, 10, 11, 12, 13, 14, 15, 16, 6] use value-based methods while others [17, 19, 20, 21, 22] use policy-based
methods. These works vary in the action designs, including choose next phase [11, 12, 13, 15, 6, 21, 22], keep or
change [8, 9, 10] and set current phase duration [7, 18, 19].
A few attempts at training general models for traffic signal control have been made in previous research. For
instance, the FRAP [13] method is proposed to adapt to new scenarios. [15] adapts the parameter sharing method
based on FRAP and shows strong performance on the scale of thousands of traffic lights. Nevertheless, both use
choose next phase as the action design, which cannot keep the original phase structure of the traffic lights. At each
decision step, the signal phase can be chosen from all possible phases formed by combining non-conflicting
movements rather than from the original phases, which ignores pedestrians and goes against driving habits.
AttendLight [22] incorporates the attention mechanism to train a universal model for intersections with different
structures and traffic flow distributions. Although it can maintain the phase structure, its action design is also choose
next phase, which may make traffic signals change in an arbitrary sequence. This is a concern in practice because it
can lead to unsafe situations for both drivers and pedestrians [14]. In sharp contrast, set current phase duration is
used in our research, which is more reasonable and efficient than choose next phase.
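
The difference between the action designs can be sketched in a few lines. In the hedged example below the phase order is fixed and cyclic, and the policy only picks how long the current phase lasts from a list of candidate durations. The candidate list, the function names, and the stand-in policy are assumptions for illustration; a trained agent would choose the index from the observed traffic state.

```python
from itertools import cycle, islice

DURATIONS = [10, 15, 20, 25, 30]  # hypothetical candidate green times


def run_fixed_cycle(phases, choose_duration, num_decisions):
    """Roll out a signal plan whose phase order is fixed and cyclic.

    At each decision the policy returns an index into DURATIONS for the
    current phase; it never selects which phase comes next.
    """
    plan = []
    for phase in islice(cycle(phases), num_decisions):
        action = choose_duration(phase)
        plan.append((phase, DURATIONS[action]))
    return plan


def toy_policy(phase):
    """Stand-in policy: longer green for p1 and p3, shorter for the others."""
    return 4 if phase in ("p1", "p3") else 1


plan = run_fixed_cycle(["p1", "p2", "p3", "p4"], toy_policy, num_decisions=6)
```

With this design the action space has the same size at every intersection regardless of how many phases it has, whereas a choose next phase agent needs one action per phase.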
4 Preliminary Definitions
A standard four-way intersection shown in Figure 1a is used as an example to illustrate the terminology. These concepts
can be easily extended to the intersections with different structures.
  • Traffic movement: A traffic movement is a connection from an incoming lane $l^{in}$ to an outgoing lane
    $l^{out}$, denoted as $l^{in} \to l^{out}$. For the common 4-way intersection in Figure 1a, there are a total of
    12 movements, including straight, left and right turns in four directions.
  • Movement signal: A movement signal is defined on the traffic movement. The green signal means the
    corresponding movement is allowed and the red signal indicates the movement is prohibited. In Figure 1a,
    movement signal 3 is green, indicating that vehicles can travel from east to west at this time. Although
    there are 12 movements in a 4-way intersection, as the right-turn traffic can pass regardless of the signal, only
    eight movement signals are used. Figure 1b shows the eight movement signals and the incoming and
    outgoing lanes associated with each signal. For example, $m_3$ indicates that vehicles can go from
    $l^{in}_{5}$ to $l^{out}_{10}$, $l^{out}_{11}$ and $l^{out}_{12}$.
  • Phase: A phase is a combination of movement signals. Figure 1c shows the four phases of the 4-way
    intersection. Each phase involves a set of movement signals. For example, phase-1 involves
    $m_1 = \{l^{in}_{2} \to l^{out}_{7},\ l^{in}_{2} \to l^{out}_{8},\ l^{in}_{2} \to l^{out}_{9}\}$ and
    $m_5 = \{l^{in}_{8} \to l^{out}_{1},\ l^{in}_{8} \to l^{out}_{2},\ l^{in}_{8} \to l^{out}_{3}\}$. It should be
    noted that the number of movement signals contained in different phases may vary.
  • Signal plan: A signal plan for an intersection is a sequence of phases and their corresponding durations.
    Here we denote a signal plan as $\{(p_1, t_1), (p_2, t_2), \cdots, (p_i, t_i), \cdots\}$, where $p_i$ and $t_i$
    represent a phase and the duration of this phase, respectively. Usually, the phase sequence is in a cyclic order;
    that is, the signal plan can be denoted as
    $\{(p_1, t_1), (p_2, t_2), \cdots, (p_N, t_N), (p_1, t_1), (p_2, t_2), \cdots, (p_N, t_N), \cdots\}$. Figure 1c
    shows a cycle-based signal plan in which the durations of the four phases are 50, 20, 50 and 20 as an example.
    For the signal plan optimization task, the common methods mainly adjust the duration $t_i$ of each phase
    $p_i$ to reduce the total waiting time of vehicles at the intersection.
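
The definitions above map naturally onto small data structures. The sketch below encodes the phase-1 example and the cyclic signal plan with durations 50, 20, 50 and 20. The class name `Movement`, the helper `phase_at`, and the phase labels are illustrative assumptions, and the durations are treated as plain time units because the text does not state the unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movement:
    incoming: int  # index of the incoming lane
    outgoing: int  # index of the outgoing lane


# Movement signals from the phase-1 example: one signal governs several movements.
m1 = frozenset(Movement(2, out) for out in (7, 8, 9))
m5 = frozenset(Movement(8, out) for out in (1, 2, 3))

# A phase is the set of movement signals that are green at the same time.
phase_1 = {"m1": m1, "m5": m5}

# Cyclic signal plan as (phase label, duration) pairs.
signal_plan = [("p1", 50), ("p2", 20), ("p3", 50), ("p4", 20)]
cycle_length = sum(duration for _, duration in signal_plan)


def phase_at(plan, t):
    """Return the phase that is active at time t under a repeating plan."""
    t %= sum(duration for _, duration in plan)
    for phase, duration in plan:
        if t < duration:
            return phase
        t -= duration
```

For instance, `phase_at(signal_plan, 60)` falls 10 time units into the second phase, and any time past `cycle_length` wraps around to the start of the cycle.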