DynamicLight: Two-Stage Dynamic Traffic Signal Timing
Liang Zhang 1, Yutong Zhang 2, Shubin Xie 1, Jianming Deng 1, Chen Li 3
Abstract
Reinforcement learning (RL) is increasingly applied to traffic signal control (TSC) as an effective approach. However, most existing RL methodologies are confined to a single-stage TSC framework, primarily focusing on selecting an appropriate traffic signal phase at fixed action intervals, leading to inflexible and less adaptable phase durations. To address such limitations, we introduce a novel two-stage TSC framework named DynamicLight. This framework begins with a phase control strategy responsible for determining the optimal traffic phase, followed by a duration control strategy tasked with determining the corresponding phase duration. Experimental results show that DynamicLight outperforms state-of-the-art TSC models and exhibits exceptional model generalization capabilities. Additionally, the robustness and potential for real-world implementation of DynamicLight are further demonstrated and validated through various DynamicLight variants. The code is released at https://github.com/LiangZhang1996/DynamicLight.
1. Introduction
Signalized intersections dominate as the primary type of road junctions in urban landscapes, where traffic signal control (TSC) plays a pivotal role in ensuring effective traffic management. Established methods, exemplified by FixedTime (Koonce & Rodegerdts, 2008), GreenWave (Török & Kertész, 1996), SCATS (Lowrie, 1990), and SCOOT (Hunt et al., 1982), have undergone widespread implementation in urban environments, significantly contributing to the mitigation of traffic congestion.
* Equal contribution.
1 State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, China.
2 School of Artificial Intelligence, Beijing University of Posts and Telecommunications.
3 Graduate School of Informatics, Nagoya University, Chikusa, Nagoya 464-8601, Japan.
Correspondence to: Jianming Deng <dengjm@lzu.edu.cn>.
Preprint.
With the rapid advancement of artificial intelligence and the growing abundance of available traffic data in recent years, such as surveillance camera feeds, the practice of TSC has undergone substantial evolution. Among these changes, reinforcement learning (RL) techniques have driven significant progress within TSC. For example, CoLight (Wei et al., 2019b) demonstrates exceptional performance and scalability in large-scale TSC, while AttendLight (Oroojlooy et al., 2020), another innovative model, exhibits versatility in handling various intersection topologies. These pioneering developments underscore the transformative potential of emerging technologies in reshaping the future landscape of traffic management at signalized intersections.
Generally, RL-based methodologies enhance TSC performance through three primary approaches. First, some methods design effective state representations or reward functions, as exemplified by PressLight (Wei et al., 2019a) and Advanced-XLight (Zhang et al., 2022). These advancements aim to optimize decision-making processes within TSC, ensuring a more nuanced and responsive system. Second, developing advanced neural networks, as observed in FRAP (Zheng et al., 2019a) and CoLight (Wei et al., 2019b), significantly enhances transportation efficiency. These developments pave the way for more streamlined and adaptive control systems capable of responding dynamically to the complexities of urban traffic. Third, other methods integrate advanced RL techniques, such as HiLight (Xu et al., 2021) and MetaLight (Zang et al., 2020). These approaches explore new horizons in learning and adaptation, pushing the limits of optimizing traffic flow and alleviating congestion.
arXiv:2211.01025v2 [cs.LG] 2 May 2024
Despite these remarkable advancements, prior studies have not sufficiently addressed the inherent limitations of the existing TSC control framework. Contemporary advanced RL strategies for TSC predominantly employ a single-stage control framework. At each fixed action duration, an appropriate signal phase is determined, with the choice between maintaining the current phase or switching to a more suitable one. This mechanism mirrors control systems found in human-interactive games, such as Atari (Mnih et al., 2013). However, such a single-stage control framework exhibits two primary limitations. First, the duration of each phase is significantly influenced by the fixed action duration (Zhang et al., 2022; 2023), lacking sufficient flexibility and variability in phase durations. Second, the duration of each phase cannot be ascertained until another phase is actuated. Therefore, there exists a critical need to develop models capable of supporting dynamic phase durations. Recognizing and overcoming these challenges is essential for unlocking the full potential of RL-based methodologies in revolutionizing TSC.
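The two limitations can be made concrete with a minimal Python sketch. It is not taken from the paper: the function name, the phase labels, and the 15-second interval are illustrative assumptions. It replays the decisions of a single-stage controller that picks a phase at every fixed action interval and collapses them into the phase durations that actually result.

```python
from itertools import groupby


def realized_phase_durations(phase_choices, action_interval):
    """Collapse per-interval phase choices into (phase, duration) runs.

    phase_choices: phase picked at each fixed decision step (hypothetical labels).
    action_interval: fixed action duration in seconds (assumed value).
    """
    runs = []
    for phase, group in groupby(phase_choices):
        steps = sum(1 for _ in group)  # consecutive steps that kept this phase
        runs.append((phase, steps * action_interval))
    return runs


# A single-stage agent deciding every 15 s (values chosen for illustration).
choices = ["A", "A", "B", "C", "C", "C"]
print(realized_phase_durations(choices, 15))
# -> [('A', 30), ('B', 15), ('C', 45)]
```

Every realized duration is a multiple of the action interval, and the length of a run is only known once a different phase appears in the sequence, which is exactly the passive variation that a dedicated duration policy is meant to replace.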
This study introduces a novel two-stage framework named
DynamicLight to enhance the single-stage framework and
achieve dynamic phase duration. The improvement involves
integrating a duration control strategy that actively deter-
mines the phase duration, rather than allowing passive varia-
tion. Within such a new structure, one policy is dedicated to
controlling the traffic phase, while another is responsible for
determining the corresponding duration. This sophisticated
two-stage approach promises to introduce a higher degree
of adaptability and responsiveness to the dynamic nature of
traffic conditions, marking a significant advancement in the
realm of intelligent traffic management systems. The main
contributions are organized as follows:
• Two-stage dynamic TSC framework: We introduce DynamicLight, an efficient two-stage control framework employing a dual-policy mechanism. This framework seamlessly integrates phase selection and duration determination, allowing for dynamic phase durations in TSC.
• Robust scalability of DynamicLight: Various DynamicLight variants are created by replacing the phase control strategy with an alternative one. These variants validate the effectiveness and robustness of our framework, highlighting its practical applicability.
• Superior performance beyond state-of-the-art (SOTA) models: Experimental results show that DynamicLight surpasses SOTA TSC models, establishing a new benchmark for advanced traffic control systems.
2. Related Work
2.1. Traditional Methods
In the realm of real-world TSC, commonly applied tradi-
tional methods exhibit a significant dependence on either
manually crafted signal plans or rule-based systems.
FixedTime (Koonce & Rodegerdts,2008) is a traffic sig-
nal timing strategy that effectively regulates traffic signal
operations by relying on predetermined values for cycle
length, phase sequence, and phase split. GreenWave (Török & Kertész, 1996) analyzes the applicable conditions of the Green-Wave traffic theory, employing a two-phase signal control concept for optimization. This strategy
allows vehicles to pass through multiple intersections con-
secutively on green lights, optimizing traffic flow on main
roads. Actuated control (Cools et al.,2013) introduced a
self-organizing mechanism that dynamically responds to
varying traffic conditions. This innovation has improved
traffic flow by enabling traffic signals to autonomously adapt
based on pre-defined rules and real-time traffic data. Adap-
tive control systems, such as SCATS (Lowrie,1990) and
SCOOT (Hunt et al.,1982), employed a decision-making
process to select optimal traffic plans based on real-time
data obtained from loop sensors. Widely embraced in large
urban settings, such adaptive control systems significantly
enhance traffic flow and responsiveness by dynamically ad-
justing to the prevailing traffic conditions.
Recently, traditional optimization-based methodologies,
such as Max Pressure (Varaiya,2013) and MaxQueue-
Length (Zhang et al.,2023), employed max-pressure and
max queue-length strategies to optimize TSC. These ap-
proaches have demonstrated significant efficacy in tackling
complex congestion challenges at urban intersections, lead-
ing to a substantial enhancement in the overall efficiency of
traffic management systems.
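As a rough illustration of these two rule-based strategies, the sketch below selects a phase either by total pressure or by total queue length. It is a simplified reading rather than the cited authors' implementations: pressure is taken here as the queue on an incoming lane minus the queue on its outgoing lane, and the lane names, phase names, and counts are hypothetical.

```python
def max_pressure_phase(phases, queue_in, queue_out):
    """Return the phase whose movements have the largest total pressure.

    phases: dict mapping phase name -> list of (incoming_lane, outgoing_lane).
    queue_in / queue_out: dict mapping lane name -> number of queued vehicles.
    """
    def pressure(movements):
        # Simplified pressure: upstream queue minus downstream queue.
        return sum(queue_in[i] - queue_out[o] for i, o in movements)

    return max(phases, key=lambda name: pressure(phases[name]))


def max_queue_length_phase(phases, queue_in):
    """Return the phase whose incoming lanes hold the most queued vehicles."""
    return max(phases, key=lambda name: sum(queue_in[i] for i, _ in phases[name]))


# Hypothetical two-phase intersection.
phases = {
    "A": [("W_T", "E_out"), ("E_T", "W_out")],
    "B": [("N_T", "S_out"), ("S_T", "N_out")],
}
queue_in = {"W_T": 6, "E_T": 4, "N_T": 5, "S_T": 3}
queue_out = {"E_out": 5, "W_out": 4, "S_out": 0, "N_out": 1}

print(max_pressure_phase(phases, queue_in, queue_out))  # -> B
print(max_queue_length_phase(phases, queue_in))         # -> A
```

In this example the two rules disagree: phase A has the longer queues, but its downstream lanes are nearly as full, so the pressure rule prefers phase B.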
2.2. RL-based Methods
Several RL-based methodologies enhanced TSC performance by designing effective state representations or reward functions. LIT (Zheng et al., 2019b) made significant strides
in optimizing TSC by introducing a streamlined approach
to state and reward design. This innovative methodology
proved to be highly effective, surpassing the performance
of IntelliLight (Wei et al.,2018). PressLight (Wei et al.,
2019a) advanced the capabilities of LIT and IntelliLight
through the seamless integration of “pressure” into both
the state and reward design. This integration significantly
contributed to enhancing the overall TSC strategy, demon-
strating its effectiveness in coordinating signals on arterial
roadway networks. MPLight (Chen et al.,2020) enhanced
FRAP (Zheng et al.,2019a) by incorporating “pressure” in
the state representation and reward function design. Atten-
tionLight (Zhang et al.,2023) employed queue length for
both state representation and reward function, significantly
surpassing FRAP. Advanced-XLight (Zhang et al.,2022) in-
troduced effective running vehicle number and traffic move-
ment pressure as the state representations, demonstrating
SOTA performance.
Furthermore, some RL-based methods have significantly
enhanced TSC performance by developing sophisticated net-
work structures. FRAP (Zheng et al.,2019a) demonstrated
exceptional skill in crafting phase features and adeptly cap-
turing intricate relationships arising from phase competition
in TSC. CoLight (Wei et al.,2019b) harnessed the capabili-
ties of a graph attention network (Velickovic et al.,2017),
specifically tailored to facilitate seamless cooperation at
intersections, showcasing improved TSC efficacy. AttendLight (Oroojlooy et al., 2020) utilized an attention network to adeptly manage diverse intersection topologies.
Some other RL-based methodologies adopted advanced
RL techniques to enhance model performance. Demo-
Light (Xiong et al.,2019) utilized imitation learning (Ho
& Ermon,2016) to accelerate learning. HiLight (Xu et al.,
2021) enabled each strategy to learn a high-level policy, opti-
mizing the objective locally using hierarchical RL (Kulkarni
et al.,2016). MetaLight (Zang et al.,2020) utilized meta-
learning (Finn et al.,2017) to efficiently and robustly adapt
to changing traffic scenarios.
All the aforementioned methods utilized a single-stage con-
trol framework. However, the duration of a phase is solely
influenced by the action duration, leading to a lack of ad-
equate variability. Moreover, the single-stage framework
lacks the capability to pre-determine the duration of each
stage before initiating the next one. In this study, we in-
troduce DynamicLight, a two-stage framework designed
to improve upon the single-stage framework and enable
dynamic phase durations.
3. Preliminaries
Figure 1. Illustration of (a) a standard intersection structure with four entry and four exit approaches (East, West, South, and North), each featuring three types of lanes (left, straight, and right). Subfigures depict (b) traffic movement signals, (c) signal phases, and (d) state representations for a comprehensive overview.
Traffic network. A traffic network is typically represented as a directed graph, with nodes corresponding to intersections and edges corresponding to roads. Figure 1(a) illustrates a standard intersection structure within this
graph. Each road is composed of three types of lanes (i.e.,
turning left, going straight, and turning right), acting as the
fundamental units facilitating vehicle movement and deter-
mining the trajectory of each vehicle passing through the
intersection. An incoming lane serves as the entry point for
vehicles approaching the intersection, orchestrating the ini-
tial flow of traffic. An outgoing lane provides a designated
area for vehicles to seamlessly exit an intersection, thereby
enhancing the overall efficiency of the traffic network.
Traffic movements and phases. A traffic movement
refers to vehicles traveling at an intersection in a specific di-
rection. In certain countries, vehicles making a right turn are
allowed to proceed regardless of the signal but must come
to a stop at a red light, as indicated by the black signals in
Figure 1(b). Additionally, each intersection has its own
phase settings. A signal phase comprises a set of permitted
traffic movements. As illustrated in Figure 1(c), each of
the four signal phases controls two traffic movements that
do not conflict with each other. Once a phase is activated,
its duration is the period during which it remains active. To
comprehensively reflect the traffic environment, the state
representations are lane-based, as depicted in Figure 1(d).
Problem statement. In a multi-intersection TSC system,
each intersection is managed by an RL agent. An agent
observes the environment and takes actions involving phase
and duration, which lead to receiving a reward. The ob-
jective function for all agents is to learn an optimal policy
that maximizes their cumulative rewards. For ease of de-
ployment, certain agents are designed to handle various
intersection topologies, ensuring adaptability to different
configurations and enhancing the overall versatility of the
implemented system.
4. DynamicLight
DynamicLight, a two-stage TSC framework, utilizes one
deep Q-network for both the phase and duration control to
dynamically adjust phase durations. Specifically, the phase
control is responsible for determining the optimal traffic
phase, while the duration control is tasked with determining
the duration of the selected phase. Figure 2 shows the overall architecture of DynamicLight.
4.1. DynamicLight Agent
State. Consider a TSC system with $N$ intersections $\mathcal{I} = \{I_1, \cdots, I_N\}$. Here, $\mathcal{L}_{in}$ and $\mathcal{L}_{out}$ represent the sets of incoming and outgoing lanes, respectively, for a specific intersection. Seven state representations are utilized to describe the environment. Formally, let $S_l = [s_{l,1}, \cdots, s_{l,7}]$ denote the set of all state descriptors, where $s_{l,i}$, $l \in \mathcal{L}_{in}$, represents the $i$-th state representation on lane $l$. These state representations include the current phase ($s_{l,1}$), queue length ($s_{l,2}$), effective running vehicle number ($s_{l,3}$), and the number of vehicles under the segmented road (four segments of 100 meters each, i.e., $s_{l,4}$ to $s_{l,7}$).
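The seven descriptors can be assembled per incoming lane as in the following minimal sketch. Several details are assumptions made for illustration and are not stated in the text: segments are measured from the stop line, vehicles farther than 400 m are ignored, the current phase is encoded as a 0/1 flag for the lane, and the effective running vehicle number is supplied by the caller. The function and constant names are hypothetical.

```python
SEGMENT_LENGTH_M = 100  # each road segment covers 100 meters
NUM_SEGMENTS = 4        # descriptors s_{l,4} to s_{l,7}


def segment_counts(distances_to_stop_line):
    """Count vehicles in each 100 m segment of an incoming lane."""
    counts = [0] * NUM_SEGMENTS
    for distance in distances_to_stop_line:
        index = int(distance // SEGMENT_LENGTH_M)
        if 0 <= index < NUM_SEGMENTS:  # skip vehicles outside the 400 m window
            counts[index] += 1
    return counts


def lane_state(current_phase, queue_length, effective_running, distances):
    """Assemble [s_{l,1}, ..., s_{l,7}] for one incoming lane."""
    return [current_phase, queue_length, effective_running,
            *segment_counts(distances)]


# One lane with four vehicles at 5 m, 120 m, 399 m, and 450 m.
print(lane_state(1, 3, 2, [5.0, 120.0, 399.0, 450.0]))
# -> [1, 3, 2, 1, 1, 0, 1]
```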
Figure 2. Overview architecture of DynamicLight. (a) The TSC environment facilitates DynamicLight by providing state representations $S$, executing received actions $a^p, a^d$, and generating new states $S'$ and rewards $r$. It serves as the essential interface for interaction, enabling the seamless flow of information and feedback between the agent and its environment. These transition tuples $\langle S, a^p, a^d, r, S' \rangle$ at an intersection are collected as the replay memory. (b) Feature fusion involves acquiring states from the environment and embedding them into lane features. Subsequently, the lane features undergo phase feature fusion through a multi-head self-attention (MHA) mechanism. (c) Phase control utilizes phase features as inputs and employs a deep network to approximate the Q-values. (d) Duration control selects the phase feature corresponding to the predicted phase action in (c) and embeds it to predict the Q-values. The phase action and duration action are determined using the argmax operation. Note that the networks in (b) and (c) are updated with mini-batches $\langle S, a^p, r, S' \rangle$ from the replay memory. Similarly, the networks in (b) and (d) are updated with mini-batches $\langle S, a^d, r, S' \rangle$.
Action. Define the phase and duration action spaces as $A^p = \{a^p_1, a^p_2, \cdots, a^p_4\}$ and $A^d = \{a^d_1, a^d_2, \cdots, a^d_7\}$, respectively. Each element in $A^p$ corresponds to a specific signal phase type (e.g., Type A, B, C, or D), while each element in $A^d$ represents the duration time of a phase. In this study, we extensively explored the duration action space in Appendix A.1 and ultimately selected $A^d = \{10, 15, 20, 25, 30, 35, 40\}$ seconds. At an intersection, the agent selects a phase action $a^p_i$ as its initial phase and subsequently maintains it for the duration of $a^d_j$. These two actions control the signal phase of the intersection, and the agent receives a reward based on its decisions. Through $N_t$ interactions, each agent learns and refines its control policies over time.
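The two-stage choice can be sketched as two greedy argmax steps, the first over four phase Q-values and the second over seven duration Q-values for the chosen phase. The Q-values below are made-up numbers, and `select_actions` and the lookup table standing in for the duration network are hypothetical names used only for illustration.

```python
PHASES = ["A", "B", "C", "D"]               # four signal phase types
DURATIONS_S = [10, 15, 20, 25, 30, 35, 40]  # duration action space in seconds


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def select_actions(phase_q, duration_q):
    """Two-stage greedy selection.

    phase_q: four Q-values, one per phase.
    duration_q: callable mapping the chosen phase index to seven Q-values.
    """
    phase_index = argmax(phase_q)                      # stage 1: which phase
    duration_index = argmax(duration_q(phase_index))   # stage 2: for how long
    return PHASES[phase_index], DURATIONS_S[duration_index]


# Made-up Q-values standing in for the two networks' outputs.
phase_q = [0.1, 0.7, 0.3, 0.2]
duration_table = {
    0: [0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    1: [0.0, 0.2, 0.9, 0.4, 0.1, 0.0, 0.0],
    2: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3],
    3: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.2],
}
print(select_actions(phase_q, lambda p: duration_table[p]))
# -> ('B', 20)
```

Because the duration Q-values are conditioned on the chosen phase, the second stage cannot run until the first has finished.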
Reward. Both the phase and duration controls utilize negative intersection queue length as their rewards, with the reward for controlling an intersection denoted as $r = -\sum_{l} s_{l,2}$. Intuitively, DynamicLight seeks to minimize the average travel time by maximizing the reward.
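A small worked example of this reward, assuming the queue lengths of the incoming lanes are already available as a list (the function name and the numbers are illustrative only):

```python
def intersection_reward(queue_lengths):
    """Negative total queue length over the incoming lanes of one intersection."""
    return -sum(queue_lengths)


# Four incoming lanes with 3, 0, 5, and 2 queued vehicles.
print(intersection_reward([3, 0, 5, 2]))  # -> -10
# Shorter queues give a larger (less negative) reward.
print(intersection_reward([1, 0, 2, 0]))  # -> -3
```

The reward is never positive, and its maximum of zero is reached only when no vehicle is queued at the intersection.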
4.2. Deep Q-Network Design
Feature fusion. The features of each state descriptor $s_{l,i}$ are initially embedded and concatenated to a lane feature:
$$F_l = \mathrm{Embed}\big(\mathrm{Embed}(s_{l,1}) \,\|\, \cdots \,\|\, \mathrm{Embed}(s_{l,7})\big), \tag{1}$$
where $\|$ denotes the concatenation operation. Various feature fusion methods, including addition (Zheng et al., 2019a), embedding with a multi-layer perceptron (MLP), and multi-head self-attention (MHA) (Vaswani et al., 2017), were explored in Appendix A.2. Finally, MHA was chosen due to its superior performance. Since each phase comprises two lanes ($F_{l_1}$ and $F_{l_2}$ as illustrated in Figure 2(c)), the averaged feature fusion for phase $p$ can be calculated by
$$F_p = \mathrm{Mean}\big(\mathrm{MHA}(F_{l_1} \,\|\, F_{l_2})\big). \tag{2}$$
Note that the fused phase feature $F_p$ serves as the input for Q-value prediction in both phase and duration controls.
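The attend-then-average idea behind the phase feature can be sketched in plain Python. This is a deliberate simplification and not the paper's network: it uses a single attention head with identity query, key, and value projections instead of learned multi-head attention, and it treats the two lane features as a two-element sequence. All names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head scaled dot-product self-attention, identity projections."""
    dim = len(features[0])
    outputs = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        weights = softmax(scores)
        # Each output is the attention-weighted sum of all input features.
        outputs.append([sum(w * value[j] for w, value in zip(weights, features))
                        for j in range(dim)])
    return outputs


def phase_feature(lane_feature_1, lane_feature_2):
    """Fuse the two lane features of a phase: attention, then mean over lanes."""
    attended = self_attention([lane_feature_1, lane_feature_2])
    dim = len(lane_feature_1)
    return [sum(row[j] for row in attended) / len(attended) for j in range(dim)]


fused = phase_feature([1.0, 0.0], [0.0, 1.0])
print(len(fused))  # the fused feature keeps the lane feature dimension
```

Averaging after attention makes the fused feature independent of the order in which the two lanes are listed, a convenient property when the same network is applied to every phase.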
Q-value prediction. All the fused phase features are mod-
eled with MHA to capture their correlations, and the cor-
related features are embedded to generate Q-values for the
phase control. Subsequently, the phase action with the max-
imum Q-value is selected. In practice, the phase control
needs to complete its task before the duration control, as
selecting an appropriate duration depends on the determined
phase.
Next, the fused phase feature $F_p$ and the pre-determined
phase action serve as inputs to the duration control. The four
features are concatenated, and the result is multiplied by the
representation of the phase action to extract the correspond-