ADLIGHT: A UNIVERSAL APPROACH OF TRAFFIC SIGNAL CONTROL WITH AUGMENTED DATA USING REINFORCEMENT LEARNING

Maonan Wang

School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen

Shanghai AI Laboratory, Shanghai, China

Email: maonanwang@link.cuhk.edu.cn

Yutong Xu

School of Data Science, The Chinese University of Hong Kong, Shenzhen

Email: yutongxu@link.cuhk.edu.cn

Xi Xiong

Shenzhen Research Institute of Big Data

Email: xiongxi@cuhk.edu.cn

Yuheng Kan

SenseTime Group Limited, Shanghai, China

Shanghai AI Laboratory, Shanghai, China

Email: kanyuheng@sensetime.com

Chengcheng Xu

SenseTime Group Limited, Shanghai, China

Email: xuchengcheng@sensetime.com

Man-On Pun

School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen

Shenzhen Research Institute of Big Data

Email: simonpun@cuhk.edu.cn (Corresponding Author)

1 Abstract

Traffic signal control has the potential to reduce congestion in dynamic networks. Recent studies show that traffic signal control with reinforcement learning (RL) methods can significantly reduce the average waiting time. However, a shortcoming of existing methods is that they require model retraining for new intersections with different structures. In this paper, we propose a novel reinforcement learning approach with augmented data (ADLight) to train a universal model for intersections with different structures. We propose a new agent design that incorporates features on movements and uses set current phase duration as the action, which allows the generalized model to have the same structure for different intersections. A new data augmentation method named movement shuffle is developed to improve the generalization performance. We also test the universal model on new intersections in Simulation of Urban MObility (SUMO). The results show that the performance of our approach is close to that of models trained directly in a single environment (with only a small loss in average waiting time), and that it reduces training time by more than 80%, which saves substantial computational resources in scalable operations of traffic lights.

arXiv:2210.13378v2 [eess.SY] 18 Mar 2023

Wang, Xu, Xiong, Kan, Xu, and Pun

Keywords: traffic signal control, model generalization, reinforcement learning, data augmentation

2 Introduction

Traffic congestion has been one of the major challenges in many metropolitan areas. Congestion not only increases travel time in networks but also causes environmental problems, e.g., fuel waste and pollutant emissions [ ]. One way to address this issue is to control traffic lights dynamically. Nowadays, most traffic lights are controlled with pre-defined fixed-time plans [ ]. Conventional traffic signal control methods, such as the Webster model [ ] and Self-Organizing Traffic Light Control (SOTL) [ ], are also used in this domain. However, these methods rely heavily on expert knowledge and on certain assumptions about the traffic model, e.g., a uniform traffic flow distribution [ ], which are not realistic in real applications [6].

To recognize traffic patterns from real traffic data, several reinforcement learning methods have been proposed to solve this problem [ ]. [ ] proposed a deep Q-learning algorithm with the phase gate to control signals on a large-scale network. [ ] used the Advantage Actor Critic (A2C) method to train the agent and leveraged demonstrations collected from classic methods to accelerate the learning process. By directly interacting with the environment, an RL agent can learn effectively how to adapt to changes in traffic conditions from real experience.

Despite the significant improvement achieved by RL methods in this domain, a major limitation is that the model needs to be redesigned and trained from scratch when faced with intersections of different structures, e.g., different approaching roads, different lanes, and different phases. Learning the optimal policy separately for each intersection in a large-scale network therefore takes a long time. Some works design a universal model for different junctions [ ]; however, all of these methods use choose next phase as the action design, which raises safety concerns because it breaks the traditional traffic cycle by choosing phases in a random order.

To address the problem above, we present ADLight, a novel approach for training a universal reinforcement learning model with augmented data for the traffic signal control problem. To handle intersections with different structures, we perform feature extraction by movement instead of by lane. The features of each movement include not only traffic flows but also information about the movement itself. To ensure safety and generalization across different phase structures, set current phase duration is chosen as the action in ADLight. With this design, the agent selects an action from a list of pre-defined time periods to establish the duration of the current phase. A new data augmentation method, movement shuffle, is developed in ADLight, and the results show that it can improve the performance significantly. Experiments are conducted in Simulation of Urban MObility (SUMO) [ ] on intersections with different approaching roads, different lanes, and different phases. The experiments show that the resulting model from ADLight can achieve satisfactory performance at untrained intersections with different structures.
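The movement-level state and the movement shuffle augmentation described above can be illustrated with a minimal sketch. This section does not specify the exact feature layout or the shuffle procedure, so everything below is an assumption: the state is taken to be a fixed number of movement rows (`MAX_MOVEMENTS`, a hypothetical value) padded with zeros, and the augmentation is taken to be a random permutation of those rows.

```python
import random
from typing import List, Optional

MAX_MOVEMENTS = 8  # assumed fixed number of movement slots in the state


def build_state(movement_features: List[List[float]],
                max_movements: int = MAX_MOVEMENTS) -> List[List[float]]:
    """Pad per-movement feature rows with zero rows up to a fixed size."""
    if len(movement_features) > max_movements:
        raise ValueError("more movements than available slots")
    width = len(movement_features[0]) if movement_features else 0
    missing = max_movements - len(movement_features)
    padding = [[0.0] * width for _ in range(missing)]
    return [list(row) for row in movement_features] + padding


def movement_shuffle(state: List[List[float]],
                     rng: Optional[random.Random] = None) -> List[List[float]]:
    """Return a copy of the state with its movement rows in random order."""
    if rng is None:
        rng = random.Random()
    shuffled = [list(row) for row in state]
    rng.shuffle(shuffled)
    return shuffled


# A three-movement intersection with two hypothetical features per movement.
state = build_state([[0.3, 1.0], [0.1, 0.0], [0.5, 1.0]])
augmented = movement_shuffle(state, random.Random(0))
```

Because every intersection is mapped to the same state shape, one network could in principle serve intersections with different numbers of lanes, and shuffling the rows discourages the model from relying on a particular movement ordering.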

Our contributions can be summarized as follows:

• We propose an agent design that allows the model to have the same structure at different intersections. This paper is, to the best of our knowledge, the first effort that utilizes set current phase duration to create universal models for the traffic signal control problem.

• A new data augmentation method for traffic signal control is proposed in our approach. The performance can be further improved by incorporating the data augmentation into the agent design.

• We demonstrate the generalization performance of ADLight at intersections with different structures. The results show that, compared with a model trained on a single environment from scratch (which takes around $6 \times 10^6$ steps), the loss of average waiting time of our generalized model can be reduced after only $1 \times 10^6$ steps of retraining the universal model. When a network has 1000 intersections, we can save more than $5 \times 10^9$ interactions with SUMO, thereby greatly reducing the consumption of computing resources.
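The savings quoted in the last contribution follow from simple arithmetic on the step counts given in the text. The short check below uses only those stated numbers; the variable names are illustrative.

```python
STEPS_FROM_SCRATCH = 6_000_000  # around 6 x 10^6 steps per intersection
STEPS_RETRAIN = 1_000_000       # 1 x 10^6 steps when retraining the universal model
NUM_INTERSECTIONS = 1000

saved_per_intersection = STEPS_FROM_SCRATCH - STEPS_RETRAIN
saved_total = saved_per_intersection * NUM_INTERSECTIONS
reduction = saved_per_intersection / STEPS_FROM_SCRATCH

print(saved_total)          # 5000000000
print(f"{reduction:.1%}")   # 83.3%
```

The per-intersection reduction of about 83% is consistent with the claim of saving more than 80% of training time.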

The rest of the paper is organized as follows. We first summarize the related work on traffic signal control. The road and traffic signal terminology is introduced in the following section. This is followed by the Methodology section, which formally describes ADLight. We then present the dataset and the experiments on the SUMO platform. Finally, we provide the conclusions and future directions.

3 Related Work

Conventional Traffic Signal Control.

Several classical methods have been developed to reduce the total delay for all vehicles [ ]. For example, the Webster method [ ] is one of the most widely used methods for a single intersection. It determines the optimum cycle length and phase split for a single intersection according to traffic volume. In this research, the Webster method is used as a baseline for comparison with RL-based methods. Self-Organizing Traffic Light Control (SOTL) [ ] is a fully-actuated control algorithm. It decides whether to keep or change the current phase based on whether the number of vehicles approaching the green signal is larger than a threshold. The detailed rules for SOTL can be found in [ ]. SCATS (Sydney Coordinated Adaptive Traffic System) [ ] is an intelligent transportation system that is used at more than 50,000 intersections in over 180 cities across multiple countries. It selects from pre-defined traffic signal plans (i.e., cycle length, phase split and offsets) according to data derived from loop detectors or other road traffic sensors. However, these conventional methods are usually based on oversimplified information or on assumptions about the traffic model (e.g., that the traffic flow is uniform during a certain period), or they need expert knowledge to design the pre-defined signal plans.
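To make the Webster baseline concrete, the sketch below uses the commonly cited textbook form of Webster's formula, $C = (1.5L + 5) / (1 - Y)$, where $L$ is the total lost time per cycle and $Y$ is the sum of the critical flow ratios, with green time split in proportion to each phase's flow ratio. The formula is not spelled out in this paper, and the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def webster_plan(flow_ratios: List[float],
                 lost_time: float) -> Tuple[float, List[float]]:
    """Return (cycle length, effective green per phase) in seconds.

    flow_ratios: critical flow ratio (demand / saturation flow) of each phase.
    lost_time:   total lost time per cycle in seconds.
    """
    total_ratio = sum(flow_ratios)
    if not 0.0 < total_ratio < 1.0:
        raise ValueError("sum of flow ratios must lie strictly between 0 and 1")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - total_ratio)
    effective_green = cycle - lost_time
    # Split the effective green time in proportion to each phase's flow ratio.
    greens = [effective_green * y / total_ratio for y in flow_ratios]
    return cycle, greens


# Two phases with equal demand and 10 s of lost time per cycle.
cycle_length, green_times = webster_plan([0.25, 0.25], lost_time=10.0)
```

The result depends entirely on measured traffic volume, which illustrates why such plans assume steady flow over the period for which they are computed.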

RL-based Traffic Signal Control.

With the success of deep reinforcement learning (DRL) in different areas [ ], more and more studies are trying to use DRL to solve traffic signal problems [ ]. A number of works [ ] use value-based methods while others [ ] use policy-based methods. These works vary in their action designs, including choose next phase [ ], keep or change [8, 9, 10] and set current phase duration [7, 18, 19].

A few attempts at training general models for traffic signal control have been made in previous research. For instance, the FRAP method [ ] is proposed to adapt to new scenarios. [ ] adapt the parameter sharing method based on FRAP and show strong performance on the scale of thousands of traffic lights. Nevertheless, they all use choose next phase as the action design, which cannot keep the original phase structure of traffic lights. At each decision step, the signal phase can be chosen from all possible phases formed by combining non-conflicting movements rather than from the original phases, which ignores pedestrians and goes against driving habits. AttendLight [ ] incorporates the attention mechanism to train a universal model for intersections with different structures and traffic flow distributions. Although it can maintain the phase structure, its action design is also choose next phase, which may make traffic signals change in a random sequence. This is a concern in practice, as it leads to unsafe situations for both drivers and pedestrians [ ]. In sharp contrast, set current phase duration is used in our research, which is more reasonable and efficient than choose next phase.
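The difference between the two action designs can be sketched without a simulator. In the illustrative code below, phases always advance in their fixed cyclic order and the policy only picks an index into a list of pre-defined durations. The duration values, the function name, and the toy policy are assumptions made for illustration, not the paper's configuration.

```python
from typing import Callable, List, Tuple

# Hypothetical pre-defined green durations in seconds.
DURATION_CHOICES: List[int] = [10, 15, 20, 25, 30]


def rollout_set_duration(num_phases: int,
                         policy: Callable[[int], int],
                         num_decisions: int) -> List[Tuple[int, int]]:
    """Build a signal plan where the policy sets only the phase duration."""
    plan = []
    phase = 0
    for _ in range(num_decisions):
        action = policy(phase)  # index into DURATION_CHOICES
        plan.append((phase, DURATION_CHOICES[action]))
        phase = (phase + 1) % num_phases  # the cycle order is never broken
    return plan


# Toy policy: pick a duration index from the phase index.
def toy_policy(phase: int) -> int:
    return phase % len(DURATION_CHOICES)


plan = rollout_set_duration(num_phases=4, policy=toy_policy, num_decisions=6)
# plan == [(0, 10), (1, 15), (2, 20), (3, 25), (0, 10), (1, 15)]
```

Under this design the size of the action space equals the number of duration choices and does not depend on how many phases an intersection has, whereas a choose next phase policy would need one output per candidate phase and could emit phases in any order.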

4 Preliminary Definitions

A standard four-way intersection, shown in Figure 1a, is used as an example to illustrate the terminology. These concepts can be easily extended to intersections with different structures.

• Traffic movement: A traffic movement is a connection from an incoming lane $l^{in}$ to an outgoing lane $l^{out}$, denoted as $l^{in} \to l^{out}$. For the common 4-way intersection in Figure 1a, there are a total of 12 movements, including straight, left and right turns in four directions.

• Movement signal: A movement signal is defined on the traffic movement. A green signal means the corresponding movement is allowed, and a red signal indicates the movement is prohibited. In Figure 1a, the east-to-west movement signal is green, indicating that vehicles can travel from east to west at this time. Although there are 12 movements in a 4-way intersection, only eight movement signals are used, because right-turn traffic can pass regardless of the signal. Figure 1b shows the eight movement signals and the incoming and outgoing lanes associated with each signal. For example, one of these signals indicates that vehicles can go from its associated incoming lane $l^{in}$ to the outgoing lanes $l^{out}_{10}$, $l^{out}_{11}$ and $l^{out}_{12}$.

• Phase: A phase is a combination of movement signals. Figure 1c shows the four phases of the 4-way intersection. Each phase involves a set of movement signals. For example, phase-1 involves $m_1$ and $m_5$, where $m_1$ contains the movements from incoming lane $l^{in}_2$ (including $l^{in}_2 \to l^{out}_7$ and $l^{in}_2 \to l^{out}_8$) and $m_5$ contains the movements from incoming lane $l^{in}_8$ (including $l^{in}_8 \to l^{out}_1$ and $l^{in}_8 \to l^{out}_2$). It should be noted that the number of movement signals contained in different phases may vary.

• Signal plan: A signal plan for an intersection is a sequence of phases and their corresponding durations. Here we denote a signal plan as $\{(p_1, t_1), (p_2, t_2), \cdots, (p_i, t_i), \cdots\}$, where $p_i$ and $t_i$ represent a phase and the duration of this phase, respectively. Usually, the phase sequence is in a cyclic order; that is, the signal plan can be denoted as $\{(p_1, t_1), (p_2, t_2), \cdots, (p_N, t_N), (p_1, t_1), (p_2, t_2), \cdots, (p_N, t_N), \cdots\}$. Figure 1c shows a cycle-based signal plan with an example duration for each phase. For the signal plan optimization task, common methods mainly adjust each phase duration $t_i$ to reduce the total waiting time of vehicles at the intersection.
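The definitions above map naturally onto small data structures. The sketch below counts the movements and movement signals of a four-way intersection and unrolls a cyclic signal plan. The direction and turn labels, the phase names, and the durations are hypothetical placeholders rather than the values shown in Figure 1.

```python
from itertools import cycle, islice
from typing import List, Tuple

DIRECTIONS = ["north", "east", "south", "west"]
TURNS = ["straight", "left", "right"]

# Every (approach, turn) pair is one group of traffic movements.
all_movements = [(d, t) for d in DIRECTIONS for t in TURNS]

# Right turns may pass regardless of the signal, so they need no movement signal.
signalled_movements = [(d, t) for d, t in all_movements if t != "right"]


def signal_plan(phases: List[str],
                durations: List[int],
                steps: int) -> List[Tuple[str, int]]:
    """Unroll the first `steps` entries of a cyclic plan (p_1, t_1), ..., (p_N, t_N), ..."""
    return list(islice(cycle(zip(phases, durations)), steps))


# Four phases with placeholder durations in seconds.
plan = signal_plan(["p1", "p2", "p3", "p4"], [30, 15, 30, 15], steps=6)
```

Here `len(all_movements)` is 12 and `len(signalled_movements)` is 8, matching the counts in the text, and the plan wraps around to `("p1", 30)` after the fourth phase.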
