
DynamicLight: Two-Stage Dynamic Traffic Signal Timing
ing sufficient flexibility and variability in phase durations.
Second, the duration of each phase cannot be ascertained
until another phase is actuated. Therefore, there exists a
critical need to develop models capable of supporting dy-
namic phase durations. Recognizing and overcoming these
challenges is essential for unlocking the full potential of
RL-based methodologies in revolutionizing TSC.
This study introduces DynamicLight, a novel two-stage
framework that extends the single-stage framework to
achieve dynamic phase durations. The improvement is a
duration control strategy that actively determines the phase
duration rather than letting it vary passively. Within this
structure, one policy selects the traffic phase, while another
determines the corresponding duration. The two-stage
approach is intended to make the controller more adaptable
and responsive to changing traffic conditions. The main
contributions are summarized as follows:
• Two-stage dynamic TSC framework: Introducing
  DynamicLight, an efficient two-stage control framework
  employing a dual-policy mechanism. This framework
  seamlessly integrates phase selection and duration
  determination, allowing for dynamic phase durations in TSC.
• Robust scalability of DynamicLight: Various DynamicLight
  variants are created by replacing the phase control
  strategy with an alternative one. These variants validate
  the effectiveness and robustness of our framework,
  highlighting its practical applicability.
• Superior performance beyond state-of-the-art (SOTA)
  models: Experimental results show that DynamicLight
  surpassed SOTA TSC models, establishing a new
  benchmark for advanced traffic control systems.
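The dual-policy idea described above (one policy chooses the phase, a second chooses how long it runs) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the candidate duration set, and the hand-written rules standing in for the two policies are hypothetical and are not taken from DynamicLight, whose policies are learned.

```python
from typing import Sequence, Tuple

# Hypothetical candidate green times in seconds (an assumption, not from the paper).
DURATIONS: Tuple[int, ...] = (10, 15, 20, 25, 30)


def phase_policy(queue_per_phase: Sequence[float]) -> int:
    """Stage 1 stand-in: pick the phase with the longest queue.

    Ties go to the lowest phase index.
    """
    return max(range(len(queue_per_phase)), key=lambda p: queue_per_phase[p])


def duration_policy(queue_len: float, seconds_per_vehicle: float = 2.0) -> int:
    """Stage 2 stand-in: smallest candidate duration that could clear the queue.

    seconds_per_vehicle is an assumed discharge headway.
    """
    needed = queue_len * seconds_per_vehicle
    for duration in DURATIONS:
        if duration >= needed:
            return duration
    return DURATIONS[-1]  # cap at the longest candidate


def two_stage_action(queue_per_phase: Sequence[float]) -> Tuple[int, int]:
    """Run both stages: first choose the phase, then choose its duration."""
    phase = phase_policy(queue_per_phase)
    duration = duration_policy(queue_per_phase[phase])
    return phase, duration


if __name__ == "__main__":
    # Phase 1 has the longest queue (12 vehicles), so it gets 25 s of green.
    print(two_stage_action([3, 12, 7, 0]))  # (1, 25)
```

The point of the sketch is only the control flow: the duration decision is conditioned on the phase that was just selected, so the duration is determined actively rather than emerging from repeated re-selection of the same phase.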
2. Related Work
2.1. Traditional Methods
In the realm of real-world TSC, commonly applied tradi-
tional methods exhibit a significant dependence on either
manually crafted signal plans or rule-based systems.
FixedTime (Koonce & Rodegerdts, 2008) is a traffic signal
timing strategy that effectively regulates traffic signal
operations by relying on predetermined values for cycle
length, phase sequence, and phase split. GreenWave (Török
& Kertész, 1996) is designed to analyze applicable
conditions of the Green-Wave traffic theory, employing a
two-phase signal control concept for optimization. This
strategy allows vehicles to pass through multiple
intersections consecutively on green lights, optimizing
traffic flow on main roads. Actuated control (Cools et al.,
2013) introduced a self-organizing mechanism that
dynamically responds to varying traffic conditions. This
innovation has improved traffic flow by enabling traffic
signals to autonomously adapt based on pre-defined rules
and real-time traffic data. Adaptive control systems, such as
SCATS (Lowrie, 1990) and SCOOT (Hunt et al., 1982),
employed a decision-making process to select optimal traffic
plans based on real-time data obtained from loop sensors.
Widely embraced in large urban settings, such adaptive
control systems significantly enhance traffic flow and
responsiveness by dynamically adjusting to the prevailing
traffic conditions.
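As a rough illustration of the two simplest strategies in this paragraph, the following Python sketch looks up the active phase of a fixed-time plan and computes a green-wave offset between two consecutive intersections. It is a simplified sketch under stated assumptions: the function names and parameters are hypothetical, yellow and all-red intervals are ignored, and the offset is taken to be the free-flow travel time between intersections wrapped into one cycle.

```python
from typing import Sequence


def fixed_time_phase(t: int, greens: Sequence[int]) -> int:
    """Return the index of the phase that is green at time t (seconds).

    greens holds the green time of each phase in its fixed sequence;
    the cycle length is their sum (yellow/all-red are ignored here).
    """
    cycle_length = sum(greens)
    if cycle_length <= 0:
        raise ValueError("greens must sum to a positive cycle length")
    pos = t % cycle_length  # position inside the current cycle
    elapsed = 0
    for phase, green in enumerate(greens):
        elapsed += green
        if pos < elapsed:
            return phase
    raise ValueError("greens must contain non-negative durations")


def green_wave_offset(distance_m: float, speed_mps: float,
                      cycle_length: float) -> float:
    """Offset (seconds) of the downstream green relative to the upstream one.

    Assumes vehicles travel at a constant progression speed, so the
    offset is the travel time between intersections, modulo the cycle.
    """
    return (distance_m / speed_mps) % cycle_length


if __name__ == "__main__":
    plan = [30, 20, 10]                      # 60 s cycle, three phases
    print(fixed_time_phase(35, plan))        # 1
    print(green_wave_offset(300, 10, 60))    # 30.0
```

With a shared cycle length, starting each downstream green one travel time after the upstream green is what lets a platoon on the main road meet consecutive green lights.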
Recently, traditional optimization-based methodologies,
such as Max Pressure (Varaiya, 2013) and MaxQueue-Length
(Zhang et al., 2023), employed max-pressure and max
queue-length strategies to optimize TSC. These approaches
have demonstrated significant efficacy in tackling complex
congestion challenges at urban intersections, leading to a
substantial enhancement in the overall efficiency of traffic
management systems.
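The difference between the two selection rules mentioned above can be shown with a small Python sketch. It uses the simplified notion of pressure common in the RL-based TSC literature (queue on the incoming lane minus queue on the outgoing lane of a movement); the original max-pressure formulation also involves turning ratios and saturation flows, which are omitted here. The lane names, data layout, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Movement = Tuple[str, str]  # (incoming lane, outgoing lane)


def movement_pressure(queues: Dict[str, int], movement: Movement) -> int:
    """Simplified pressure: upstream queue minus downstream queue."""
    incoming, outgoing = movement
    return queues[incoming] - queues[outgoing]


def max_pressure_phase(queues: Dict[str, int],
                       phases: Dict[str, List[Movement]]) -> str:
    """Pick the phase whose permitted movements have the largest total pressure."""
    return max(phases,
               key=lambda p: sum(movement_pressure(queues, m) for m in phases[p]))


def max_queue_length_phase(queues: Dict[str, int],
                           phases: Dict[str, List[Movement]]) -> str:
    """Pick the phase whose incoming lanes hold the most queued vehicles."""
    return max(phases,
               key=lambda p: sum(queues[incoming] for incoming, _ in phases[p]))


if __name__ == "__main__":
    queues = {"N_in": 8, "S_in": 6, "E_in": 9, "W_in": 7,
              "N_out": 1, "S_out": 0, "E_out": 10, "W_out": 9}
    phases = {
        "NS": [("N_in", "S_out"), ("S_in", "N_out")],
        "EW": [("E_in", "W_out"), ("W_in", "E_out")],
    }
    # EW has more queued vehicles (16 vs 14), but its exits are congested,
    # so its pressure (-3) is lower than that of NS (13).
    print(max_pressure_phase(queues, phases))      # NS
    print(max_queue_length_phase(queues, phases))  # EW
```

In this example the two rules disagree: the queue-length rule serves the longer queue, while the pressure rule avoids sending vehicles into exits that are already full.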
2.2. RL-based Methods
Several RL-based methodologies enhanced TSC performance
by designing effective state representations and reward
functions. LIT (Zheng et al., 2019b) made significant strides
in optimizing TSC by introducing a streamlined approach
to state and reward design. This methodology proved to be
highly effective, surpassing the performance of IntelliLight
(Wei et al., 2018). PressLight (Wei et al., 2019a) advanced
the capabilities of LIT and IntelliLight by integrating
“pressure” into both the state and reward design. This
integration significantly contributed to enhancing the
overall TSC strategy, demonstrating its effectiveness in
coordinating signals on arterial roadway networks. MPLight
(Chen et al., 2020) enhanced FRAP (Zheng et al., 2019a) by
incorporating “pressure” in the state representation and
reward function design. AttentionLight (Zhang et al., 2023)
employed queue length for both state representation and
reward function, significantly surpassing FRAP.
Advanced-XLight (Zhang et al., 2022) introduced effective
running vehicle number and traffic movement pressure as
the state representations, demonstrating SOTA performance.
Furthermore, some RL-based methods have significantly
enhanced TSC performance by developing sophisticated
network structures. FRAP (Zheng et al., 2019a) demonstrated
exceptional skill in crafting phase features and adeptly
capturing intricate relationships arising from phase
competition in TSC. CoLight (Wei et al., 2019b) harnessed the
capabilities of a graph attention network (Velickovic et al., 2017),
specifically tailored to facilitate seamless cooperation at
intersections, showcasing improved TSC efficacy. Attend-