DynamicLight: Two-Stage Dynamic Traffic Signal Timing

Liang Zhang 1, Yutong Zhang 2, Shubin Xie 1, Jianming Deng 1, Chen Li 3

Abstract

Reinforcement learning (RL) is increasingly applied to traffic signal control (TSC) as an effective approach. However, most existing RL methods are confined to a single-stage TSC framework that selects a traffic signal phase at fixed action intervals, which leads to inflexible and poorly adaptable phase durations. To address this limitation, we introduce a novel two-stage TSC framework named DynamicLight. The framework first applies a phase control strategy to determine the optimal traffic phase, then a duration control strategy to determine the corresponding phase duration. Experimental results show that DynamicLight outperforms state-of-the-art TSC models and exhibits strong model generalization. Additionally, various DynamicLight variants further demonstrate its robustness and potential for real-world implementation. The code is released at https://github.com/LiangZhang1996/DynamicLight.

1. Introduction

Signalized intersections are the primary type of road junction in urban landscapes, where traffic signal control (TSC) plays a pivotal role in ensuring effective traffic management. Established methods, exemplified by FixedTime (Koonce & Rodegerdts, 2008), GreenWave (Török & Kertész, 1996), SCATS (Lowrie, 1990), and SCOOT (Hunt et al., 1982), have been widely deployed in urban environments and have contributed significantly to mitigating traffic congestion.

Equal contribution. 1 State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China. 2 School of Artificial Intelligence, Beijing University of Posts and Telecommunications. 3 Graduate School of Informatics, Nagoya University, Chikusa, Nagoya 464-8601, Japan. Correspondence to: Jianming Deng <dengjm@lzu.edu.cn>.

Preprint.

With the rapid advancement of artificial intelligence and the growing abundance of traffic data in recent years, such as surveillance camera feeds, TSC has evolved substantially. In particular, reinforcement learning (RL) techniques have driven significant progress. For example, CoLight (Wei et al., 2019b) demonstrates exceptional performance and scalability in large-scale TSC, while AttendLight (Oroojlooy et al., 2020) handles a variety of intersection topologies. These developments underscore the potential of emerging technologies to reshape traffic management at signalized intersections.

Generally, RL-based methods enhance TSC performance through three primary approaches. First, some methods design effective state representations or reward functions, as exemplified by PressLight (Wei et al., 2019a) and Advanced-XLight (Zhang et al., 2022). These advances aim to improve decision-making within TSC, yielding a more responsive system. Second, some methods develop advanced neural networks, as in FRAP (Zheng et al., 2019a) and CoLight (Wei et al., 2019b), which significantly enhance transportation efficiency and pave the way for adaptive control systems that respond dynamically to the complexities of urban traffic. Third, some methods integrate advanced RL techniques, as in HiLight (Xu et al., 2021) and MetaLight (Zang et al., 2020). These approaches explore new directions in learning and adaptation to optimize traffic flow and alleviate congestion.

arXiv:2211.01025v2 [cs.LG] 2 May 2024

Despite these advances, prior studies have not adequately addressed the inherent limitations of the existing TSC control framework. Contemporary RL strategies for TSC predominantly employ a single-stage control framework: at each fixed action interval, the agent selects a signal phase, either maintaining the current phase or switching to a more suitable one. This mechanism mirrors the control loop of human-interactive games such as Atari (Mnih et al., 2013). However, such a single-stage framework has two primary limitations. First, the duration of each phase is largely dictated by the fixed action duration (Zhang et al., 2022; 2023), so phase durations lack flexibility and variability. Second, the duration of each phase cannot be ascertained until another phase is actuated. Therefore, there is a critical need for models that support dynamic phase durations. Overcoming these challenges is essential for unlocking the full potential of RL-based methods in TSC.
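The two limitations of single-stage control can be illustrated with a small sketch. It replays the per-interval decisions of a single-stage controller and recovers the phase durations they imply. The 15-second action interval, the decision sequence, and the helper name `implied_phase_durations` are illustrative assumptions, not values or code from the paper.

```python
from itertools import groupby


def implied_phase_durations(decisions, action_interval):
    """Collapse per-interval phase decisions into (phase, duration) runs.

    decisions: phase chosen at each fixed action interval (single-stage control).
    action_interval: seconds between two decisions (hypothetical value).
    """
    return [
        (phase, len(list(run)) * action_interval)
        for phase, run in groupby(decisions)
    ]


# Hypothetical decision sequence produced by a single-stage agent.
decisions = ["A", "A", "B", "C", "C", "C", "A"]
durations = implied_phase_durations(decisions, action_interval=15)
print(durations)  # [('A', 30), ('B', 15), ('C', 45), ('A', 15)]
```

Every recovered duration is a multiple of the action interval, and a run's length is known only once a different phase appears in the sequence, which matches the two limitations described above.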

This study introduces a novel two-stage framework named DynamicLight that improves on the single-stage framework and achieves dynamic phase durations. The improvement lies in a duration control strategy that actively determines the phase duration rather than letting it vary passively. Within this structure, one policy controls the traffic phase, while another determines the corresponding duration. This two-stage approach offers greater adaptability and responsiveness to dynamic traffic conditions. The main contributions are as follows:

• Two-stage dynamic TSC framework: We introduce DynamicLight, an efficient two-stage control framework that employs a dual-policy mechanism. The framework integrates phase selection and duration determination, allowing for dynamic phase durations in TSC.

• Robust scalability of DynamicLight: Various DynamicLight variants are created by replacing the phase control strategy with an alternative one. These variants validate the effectiveness and robustness of our framework, highlighting its practical applicability.

• Superior performance beyond state-of-the-art (SOTA) models: Experimental results show that DynamicLight surpasses SOTA TSC models, establishing a new benchmark for advanced traffic control systems.

2. Related Work

2.1. Traditional Methods

In real-world TSC, commonly applied traditional methods depend heavily on either manually crafted signal plans or rule-based systems.

FixedTime (Koonce & Rodegerdts, 2008) is a traffic signal timing strategy that regulates signal operations using predetermined values for cycle length, phase sequence, and phase split. GreenWave (Török & Kertész, 1996) analyzes the conditions under which the green-wave traffic theory applies and employs a two-phase signal control concept for optimization. This strategy lets vehicles pass through multiple consecutive intersections on green lights, optimizing traffic flow on main roads. Actuated control (Cools et al., 2013) introduced a self-organizing mechanism that responds dynamically to varying traffic conditions, enabling traffic signals to adapt autonomously based on pre-defined rules and real-time traffic data. Adaptive control systems, such as SCATS (Lowrie, 1990) and SCOOT (Hunt et al., 1982), select traffic plans based on real-time data obtained from loop sensors. Widely adopted in large urban settings, such systems significantly enhance traffic flow and responsiveness by adjusting dynamically to prevailing traffic conditions.

Recently, traditional optimization-based methods, such as Max Pressure (Varaiya, 2013) and MaxQueue-Length (Zhang et al., 2023), employed max-pressure and max-queue-length strategies to optimize TSC. These approaches have proven effective against complex congestion at urban intersections, substantially improving the overall efficiency of traffic management systems.

2.2. RL-based Methods

Several RL-based methods enhanced TSC performance by designing effective state representations or reward functions. LIT (Zheng et al., 2019b) introduced a streamlined approach to state and reward design that surpassed the performance of IntelliLight (Wei et al., 2018). PressLight (Wei et al., 2019a) improved on LIT and IntelliLight by integrating “pressure” into both the state and reward design, which proved effective in coordinating signals on arterial roadway networks. MPLight (Chen et al., 2020) enhanced FRAP (Zheng et al., 2019a) by incorporating “pressure” into the state representation and reward function design. AttentionLight (Zhang et al., 2023) employed queue length for both state representation and reward function, significantly surpassing FRAP. Advanced-XLight (Zhang et al., 2022) introduced effective running vehicle number and traffic movement pressure as state representations, demonstrating SOTA performance.

Furthermore, some RL-based methods have significantly enhanced TSC performance by developing sophisticated network structures. FRAP (Zheng et al., 2019a) constructs phase features and captures the intricate relationships arising from phase competition in TSC. CoLight (Wei et al., 2019b) uses a graph attention network (Velickovic et al., 2017) tailored to facilitate cooperation between intersections, improving TSC efficacy. AttendLight (Oroojlooy et al., 2020) uses an attention network to handle diverse intersection topologies.

Other RL-based methods adopted advanced RL techniques to enhance model performance. DemoLight (Xiong et al., 2019) utilized imitation learning (Ho & Ermon, 2016) to accelerate learning. HiLight (Xu et al., 2021) enabled each strategy to learn a high-level policy, optimizing the objective locally using hierarchical RL (Kulkarni et al., 2016). MetaLight (Zang et al., 2020) utilized meta-learning (Finn et al., 2017) to adapt efficiently and robustly to changing traffic scenarios.

All the aforementioned methods use a single-stage control framework. Under this framework, however, the duration of a phase is determined solely by the action duration, which leaves little variability. Moreover, the single-stage framework cannot pre-determine the duration of each phase before initiating the next one. In this study, we introduce DynamicLight, a two-stage framework designed to improve upon the single-stage framework and enable dynamic phase durations.

3. Preliminaries

Figure 1. Illustration of a standard intersection structure with four entry and four exit approaches (East, West, South, and North), each featuring three types of lanes (left, straight, and right). Subfigures depict (b) traffic movement signals, (c) signal phases, and (d) state representations for a comprehensive overview.

Traffic network. A traffic network is typically represented as a directed graph, with nodes corresponding to intersections and edges corresponding to roads. Figure 1(a) illustrates a standard intersection structure within this graph. Each road is composed of three types of lanes (i.e., turning left, going straight, and turning right), which are the fundamental units of vehicle movement and determine the trajectory of each vehicle passing through the intersection. An incoming lane is one on which vehicles enter the intersection, and an outgoing lane is one on which vehicles exit it.

Traffic movements and phases. A traffic movement refers to vehicles traveling through an intersection in a specific direction. In certain countries, vehicles turning right may proceed regardless of the signal but must first come to a stop at a red light, as indicated by the black signals in Figure 1(b). Additionally, each intersection has its own phase settings. A signal phase comprises a set of permitted traffic movements. As illustrated in Figure 1(c), each of the four signal phases controls two traffic movements that do not conflict with each other. Once a phase is activated, its duration is the period during which it remains active. To reflect the traffic environment comprehensively, the state representations are lane-based, as depicted in Figure 1(d).
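As a rough illustration of how a phase acts as a set of permitted movements, the sketch below maps each signalized movement to green or red under an active phase. The movement labels (approach letter plus `T` for through or `L` for left) and the assignment of movement pairs to phase types A to D are assumptions made for this example, since the text does not spell them out. Right turns are left out because they are described as not governed by the phase.

```python
# Hypothetical phase table: each phase permits two non-conflicting movements.
# Labels are illustrative, e.g. "WT" = westbound approach, through movement.
PHASES = {
    "A": frozenset({"WT", "ET"}),
    "B": frozenset({"NT", "ST"}),
    "C": frozenset({"WL", "EL"}),
    "D": frozenset({"NL", "SL"}),
}

# All movements that are controlled by some phase.
SIGNALIZED_MOVEMENTS = frozenset().union(*PHASES.values())


def signal_state(active_phase):
    """Return the light shown to every signalized movement under a phase."""
    permitted = PHASES[active_phase]
    return {
        movement: ("green" if movement in permitted else "red")
        for movement in sorted(SIGNALIZED_MOVEMENTS)
    }


state = signal_state("A")
print(state)
```

Keeping each phase as an immutable set makes it easy to check that exactly two movements are green at any time.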

Problem statement. In a multi-intersection TSC system, each intersection is managed by an RL agent. An agent observes the environment and takes actions involving phase and duration, for which it receives a reward. The objective of every agent is to learn an optimal policy that maximizes its cumulative reward. For ease of deployment, certain agents are designed to handle various intersection topologies, ensuring adaptability to different configurations and enhancing the versatility of the implemented system.

4. DynamicLight

DynamicLight, a two-stage TSC framework, utilizes one deep Q-network for both the phase and duration control to dynamically adjust phase durations. Specifically, the phase control determines the optimal traffic phase, while the duration control determines the duration of the selected phase. Figure 2 shows the overall architecture of DynamicLight.

4.1. DynamicLight Agent

State. Consider a TSC system with $N$ intersections $\{I_1, \cdots, I_N\}$. Here, $\mathcal{L}_{in}$ and $\mathcal{L}_{out}$ represent the sets of incoming and outgoing lanes, respectively, for a specific intersection. Seven state representations are utilized to describe the environment. Formally, let $S_l = [s_{l,1}, \cdots, s_{l,7}]$ denote the set of all state descriptors, where $s_{l,i}$ represents the $i$-th state representation on lane $l \in \mathcal{L}_{in}$. These state representations include the current phase ($s_{l,1}$), queue length ($s_{l,2}$), effective running vehicle number ($s_{l,3}$), and the number of vehicles under the segmented road (four segments of 100 meters each, i.e., $s_{l,4}$ to $s_{l,7}$).
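A minimal sketch of the per-lane state vector $S_l$ described above follows. The class name, field names, and the 0/1 encoding of the current phase are assumptions for illustration. Only the seven descriptors and their order come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LaneState:
    """Seven state descriptors for one incoming lane (hypothetical container)."""

    current_phase: int         # s_{l,1}; 0/1 encoding is an assumption
    queue_length: int          # s_{l,2}
    effective_running: int     # s_{l,3}
    segment_counts: List[int]  # s_{l,4}..s_{l,7}: vehicles in four 100 m segments

    def as_vector(self) -> List[int]:
        """Flatten the descriptors into [s_{l,1}, ..., s_{l,7}]."""
        if len(self.segment_counts) != 4:
            raise ValueError("expected counts for four 100 m segments")
        return [
            self.current_phase,
            self.queue_length,
            self.effective_running,
            *self.segment_counts,
        ]


lane = LaneState(
    current_phase=1,
    queue_length=3,
    effective_running=2,
    segment_counts=[4, 1, 0, 2],
)
vector = lane.as_vector()
print(vector)  # [1, 3, 2, 4, 1, 0, 2]
```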

Figure 2. Overview architecture of DynamicLight. (a) The TSC environment facilitates DynamicLight by providing state representations $S$, executing received actions $\langle a^p, a^d \rangle$, and generating new states $S'$ and rewards $r$. It serves as the essential interface for interaction, enabling the seamless flow of information and feedback between the agent and its environment. These transition tuples $\langle S, a^p, a^d, r, S' \rangle$ of an intersection are collected as the replay memory. (b) Feature fusion involves acquiring states from the environment and embedding them into lane features. Subsequently, the lane features undergo phase feature fusion through a multi-head self-attention (MHA) mechanism. (c) Phase control utilizes phase features as inputs and employs a deep network to approximate the Q-values. (d) Duration control selects the phase feature corresponding to the predicted phase action in (c) and embeds it to predict the Q-values. The phase action and duration action are determined using the argmax operation. Note that the networks in (b) and (c) are updated with mini-batches $\langle S, a^p, r, S' \rangle$ from the replay memory. Similarly, the networks in (b) and (d) are updated with mini-batches $\langle S, a^d, r, S' \rangle$.
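The replay memory described in the caption could be sketched as follows. It stores full transitions $\langle S, a^p, a^d, r, S' \rangle$ and projects a sampled batch into the two mini-batch forms used for the phase and duration updates. The class name, the capacity, and the choice to derive both mini-batches from the same sampled transitions are assumptions. The released code may organize this differently.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition",
    ["state", "phase_action", "duration_action", "reward", "next_state"],
)


class ReplayMemory:
    """Bounded buffer of <S, a^p, a^d, r, S'> transitions (hypothetical helper)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, phase_action, duration_action, reward, next_state):
        self.buffer.append(
            Transition(state, phase_action, duration_action, reward, next_state)
        )

    def sample(self, batch_size):
        """Sample transitions and split them into phase and duration mini-batches."""
        batch = random.sample(list(self.buffer), batch_size)
        phase_batch = [(t.state, t.phase_action, t.reward, t.next_state) for t in batch]
        duration_batch = [
            (t.state, t.duration_action, t.reward, t.next_state) for t in batch
        ]
        return phase_batch, duration_batch


memory = ReplayMemory(capacity=100)
for step in range(5):
    # Toy transitions: states are single-element lists, actions are fixed.
    memory.push([step], "A", 20, -float(step), [step + 1])

phase_batch, duration_batch = memory.sample(3)
```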

Action. Define the phase and duration action spaces as $A^p = \{a^p_1, a^p_2, \cdots\}$ and $A^d = \{a^d_1, a^d_2, \cdots\}$, respectively. Each element in $A^p$ corresponds to a specific signal phase type (e.g., Type A, B, C, or D), while each element in $A^d$ represents the duration time of a phase. In this study, we extensively explored the duration action space in Appendix A.1 and ultimately selected $A^d = \{10, 15, 20, 25, 30, 35, 40\}$ seconds. At an intersection, the agent selects a phase action $a^p$ as its initial phase and subsequently maintains it for the duration of $a^d$. These two actions control the signal phase of the intersection, and the agent receives a reward based on its decisions. Through interactions, each agent learns and refines its control policies over time.
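The two-stage decision can be sketched as two consecutive argmax steps: first over phase Q-values, then over duration Q-values for the chosen phase. The duration set is taken from the text. The Q-values, the function names, and the callback `duration_q_fn` standing in for the duration network are hypothetical.

```python
DURATION_ACTIONS = [10, 15, 20, 25, 30, 35, 40]  # A^d in seconds, from the text
PHASE_ACTIONS = ["A", "B", "C", "D"]             # phase types named in the text


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def select_actions(phase_q_values, duration_q_fn):
    """Stage 1 picks the phase; stage 2 picks a duration for that phase."""
    phase_idx = argmax(phase_q_values)
    duration_q_values = duration_q_fn(phase_idx)  # conditioned on the chosen phase
    duration_idx = argmax(duration_q_values)
    return PHASE_ACTIONS[phase_idx], DURATION_ACTIONS[duration_idx]


# Made-up Q-values standing in for network outputs.
phase_q = [-4.0, -1.5, -3.2, -2.8]
duration_q_table = {1: [-9.0, -7.5, -6.0, -6.4, -7.1, -8.3, -9.9]}

phase, duration = select_actions(phase_q, lambda idx: duration_q_table[idx])
print(phase, duration)  # B 20
```

Passing the duration estimator as a callback makes the ordering explicit: the duration Q-values cannot be computed until the phase index is known.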

Reward. Both the phase and duration controls utilize negative intersection queue length as their rewards, with the reward for controlling an intersection denoted as $-\sum_{l \in \mathcal{L}_{in}} s_{l,2}$. Intuitively, DynamicLight seeks to minimize the average travel time by maximizing the reward.
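As a small worked example of this reward, the sketch below sums the queue lengths over the incoming lanes of one intersection and negates the total. The lane names and queue values are made up for illustration.

```python
def intersection_reward(queue_lengths):
    """Negative total queue length over the incoming lanes of one intersection."""
    return -sum(queue_lengths)


# Hypothetical queue lengths (s_{l,2}) on four incoming lanes.
queues = {"north_through": 3, "south_through": 0, "east_left": 4, "west_left": 2}
reward = intersection_reward(queues.values())
print(reward)  # -9
```

A shorter total queue gives a reward closer to zero, so maximizing the reward pushes the agent toward clearing queues.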

4.2. Deep Q-Network Design

Feature fusion. The features of each state descriptor $s_{l,i}$ are initially embedded and concatenated to a lane feature:

$$F_l = \mathrm{Embed}\big(\mathrm{Embed}(s_{l,1}) \oplus \cdots \oplus \mathrm{Embed}(s_{l,7})\big), \tag{1}$$

where $\oplus$ denotes the concatenation operation. Various feature fusion methods, including addition (Zheng et al., 2019a), embedding with a multi-layer perceptron (MLP), and multi-head self-attention (MHA) (Vaswani et al., 2017), were explored in Appendix A.2. Finally, MHA was chosen due to its superior performance. Since each phase comprises two lanes ($F_{l_1}$ and $F_{l_2}$ as illustrated in Figure 2(c)), the averaged feature fusion for phase $p$ can be calculated by

$$F_p = \mathrm{Mean}\big(\mathrm{MHA}(F_{l_1} \oplus F_{l_2})\big). \tag{2}$$

Note that the fused phase feature $F_p$ serves as the input for Q-value prediction in both phase and duration controls.
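To give a feel for the computation in Equation (2), here is a deliberately simplified, dependency-free sketch. It replaces multi-head attention with a single attention head that uses identity projections (no learned weights), and it reads $F_{l_1} \oplus F_{l_2}$ as stacking the two lane features into a two-token sequence before averaging over tokens. Both choices are assumptions for illustration and not the authors' network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)]
        )
    return attended


def fuse_phase_feature(lane_feature_1, lane_feature_2):
    """Approximate F_p = Mean(Attention([F_l1, F_l2])), averaging over lane tokens."""
    attended = self_attention([lane_feature_1, lane_feature_2])
    dim = len(lane_feature_1)
    return [sum(tok[j] for tok in attended) / len(attended) for j in range(dim)]


# Toy lane features of dimension 3.
f_l1 = [1.0, 0.0, 2.0]
f_l2 = [0.0, 1.0, 2.0]
f_p = fuse_phase_feature(f_l1, f_l2)
```

The fused vector has the same dimension as a lane feature, so a phase is described by one vector regardless of which of its two lanes carries more traffic.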

Q-value prediction. All the fused phase features are modeled with MHA to capture their correlations, and the correlated features are embedded to generate Q-values for the phase control. Subsequently, the phase action with the maximum Q-value is selected. In practice, the phase control needs to complete its task before the duration control, as selecting an appropriate duration depends on the determined phase.

Next, the fused phase feature $F_p$ and the pre-determined phase action serve as inputs to the duration control. The four features are concatenated, and the result is multiplied by the representation of the phase action to extract the correspond-
