Wang, Xu, Xiong, Kan, Xu, and Pun
In this research, the Webster method is used as a baseline for comparison with RL-based methods. Self-Organizing
Traffic Light Control (SOTL) [4] is a fully-actuated control algorithm. It decides whether to keep or change the current
phase based on whether the number of vehicles approaching the green signal exceeds a threshold. The detailed
rules for SOTL can be found in [24]. SCATS (Sydney Coordinated Adaptive Traffic System) [25] is an intelligent
transportation system deployed at more than 50,000 intersections in over 180 cities in 28 countries. It selects
from pre-defined traffic signal plans (i.e., cycle length, phase split and offsets) according to data derived from
loop detectors or other road traffic sensors. However, these conventional methods usually rely on oversimplified
information or on assumptions about the traffic model (e.g., that the traffic flow is uniform during a certain period), or
they need expert knowledge to design the pre-defined signal plans.
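As a rough illustration of the threshold rule described for SOTL, the sketch below reduces the decision to a single comparison. The function name, its arguments, and the direction of the rule (keep the phase while the count exceeds the threshold) are assumptions made for illustration; the text does not state the direction, and the full rules in [24] include further conditions that are not modelled here.

```python
def sotl_keep_or_change(approaching_green: int, threshold: int) -> str:
    """Toy threshold rule, not the full SOTL algorithm.

    Assumption: the current phase is kept while the number of vehicles
    approaching the green signal is larger than the threshold.
    """
    return "keep" if approaching_green > threshold else "change"


if __name__ == "__main__":
    for count in (2, 5, 9):
        print(count, sotl_keep_or_change(count, threshold=5))
```

The comparison is strict, matching the phrase "larger than a threshold"; a count equal to the threshold therefore triggers a change under this assumption.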
RL-based Traffic Signal Control.
With the success of deep reinforcement learning (DRL) in different areas [26, 27],
a growing number of studies use DRL to solve traffic signal control problems [28]. A number of works
[7, 8, 9, 18, 10, 11, 12, 13, 14, 15, 16, 6] use value-based methods, while others [17, 19, 20, 21, 22] use policy-based
methods. These works vary in their action designs, including choose next phase [11, 12, 13, 15, 6, 21, 22], keep or
change [8, 9, 10] and set current phase duration [7, 18, 19].
Previous research has made a few attempts at training general models for traffic signal control. For
instance, the FRAP [13] method is proposed to adapt to new scenarios. The work in [15] applies parameter sharing
on top of FRAP and shows strong performance at the scale of thousands of traffic lights. Nevertheless, these methods all use
choose next phase as the action design, which cannot keep the original phase structure of traffic lights. At each action
decision step, the signal phase can be chosen from all possible phases formed by combining non-conflicting
movements rather than from the original phases, which ignores pedestrians and goes against driving habits. AttendLight
[22] incorporates the attention mechanism to train a universal model for intersections with different structures
and traffic flow distributions. Although it can maintain the phase structure, its action design is also choose next phase,
which may make traffic signals change in a random sequence. This is a practical concern because it leads to
unsafe situations for both drivers and pedestrians [14]. In sharp contrast, our research uses set current phase duration,
which is more reasonable and efficient than choose next phase.
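To make the contrast between the three action designs concrete, the following minimal sketch shows the phase sequence each design can produce at a four-phase intersection. The phase labels, function names, and action encodings are hypothetical and chosen only for illustration; they are not taken from any of the cited methods.

```python
PHASES = ["p1", "p2", "p3", "p4"]  # hypothetical phases in their fixed cyclic order


def choose_next_phase(choices):
    """Action = index of the phase to activate next; any order is possible."""
    return [PHASES[a] for a in choices]


def keep_or_change(actions, start=0):
    """Action = 0 (keep the current phase) or 1 (advance to the next phase)."""
    idx, seq = start, []
    for a in actions:
        idx = (idx + a) % len(PHASES)
        seq.append(PHASES[idx])
    return seq


def set_phase_duration(durations, start=0):
    """Action = duration of the current phase; phases advance in cyclic order."""
    return [(PHASES[(start + k) % len(PHASES)], d) for k, d in enumerate(durations)]


def follows_cycle(phase_seq):
    """True if every transition either stays put or moves one step along the cycle."""
    for prev, cur in zip(phase_seq, phase_seq[1:]):
        step = (PHASES.index(cur) - PHASES.index(prev)) % len(PHASES)
        if step not in (0, 1):
            return False
    return True


if __name__ == "__main__":
    print(follows_cycle(choose_next_phase([2, 0, 3, 1])))
    print(follows_cycle(keep_or_change([0, 1, 1, 0])))
    print(follows_cycle([p for p, _ in set_phase_duration([30, 15, 40, 20, 25])]))
```

Under these assumptions, only choose next phase can leave the cyclic order, which is the behaviour the paragraph above identifies as problematic in practice.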
4 Preliminary Definitions
The standard four-way intersection shown in Figure 1a is used as an example to illustrate the terminology. These concepts
can be easily extended to intersections with different structures.
• Traffic movement: A traffic movement is a connection from an incoming lane $l^{in}$ to an outgoing lane $l^{out}$,
denoted as $l^{in} \to l^{out}$. For the common 4-way intersection in Figure 1a, there are a total of 12 movements,
including straight, left and right turns in four directions.
• Movement signal: A movement signal is defined on a traffic movement. A green signal means the
corresponding movement is allowed, and a red signal means the movement is prohibited. In Figure 1a,
movement signal 3 is green, indicating that vehicles can travel from east to west at this time. Although
there are 12 movements in a 4-way intersection, only eight movement signals are used, because right-turn
traffic can pass regardless of the signal. Figure 1b shows the eight movement signals and the incoming and
outgoing lanes associated with each signal. For example, $m_3$ indicates that vehicles can go from
$l^{in}_5$ to $l^{out}_{10}$, $l^{out}_{11}$ and $l^{out}_{12}$.
• Phase: A phase is a combination of movement signals. Figure 1c shows the four phases of the 4-way
intersection. Each phase involves a set of movement signals. For example, phase-1 involves
$m_1 = \{l^{in}_2 \to l^{out}_7,\ l^{in}_2 \to l^{out}_8,\ l^{in}_2 \to l^{out}_9\}$ and
$m_5 = \{l^{in}_8 \to l^{out}_1,\ l^{in}_8 \to l^{out}_2,\ l^{in}_8 \to l^{out}_3\}$. It should be noted that the
number of movement signals contained in different phases may vary.
• Signal plan: A signal plan for an intersection is a sequence of phases and their corresponding durations.
Here we denote a signal plan as $\{(p_1, t_1), (p_2, t_2), \cdots, (p_i, t_i), \cdots\}$, where $p_i$ and $t_i$
represent a phase and the duration of this phase, respectively. Usually, the phase sequence is in a cyclic order;
that is, the signal plan can be denoted as
$\{(p_1, t_1), (p_2, t_2), \cdots, (p_N, t_N), (p_1, t_1), (p_2, t_2), \cdots, (p_N, t_N), \cdots\}$. Figure 1c shows a
cycle-based signal plan in which, as an example, the phase durations are 50, 20, 50 and 20. For the signal plan
optimization task, common methods mainly adjust the duration $t_i$ of each phase $p_i$ to
reduce the total waiting time of vehicles at the intersection.
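The definitions above can be sketched as simple data structures. The snippet below is only an illustration under stated assumptions: the approach and turn labels, the variable names, and the helper functions are hypothetical, and only the composition of phase-1 (signals m1 and m5) and the example durations 50, 20, 50 and 20 come from the text.

```python
from itertools import cycle, islice

# Hypothetical labels; the text identifies lanes and signals, not directions.
APPROACHES = ["north", "east", "south", "west"]
TURNS = ["left", "straight", "right"]

# One movement per (approach, turn): 4 approaches x 3 turns = 12 movements.
movements = [(a, t) for a in APPROACHES for t in TURNS]

# Right turns pass regardless of the signal, so only the others are signalled.
signalled = [m for m in movements if m[1] != "right"]

# Phase -> movement signals. Only phase-1 is spelled out in the text; the
# remaining phases are defined in Figure 1c and are not guessed here.
phase_signals = {"phase-1": {"m1", "m5"}}

# Cyclic signal plan [(p_i, t_i)] using the example durations from Figure 1c.
plan = [("phase-1", 50), ("phase-2", 20), ("phase-3", 50), ("phase-4", 20)]


def cycle_length(plan):
    """Total duration of one pass through all phases."""
    return sum(duration for _, duration in plan)


def unroll(plan, n):
    """First n (phase, duration) pairs of the repeating plan."""
    return list(islice(cycle(plan), n))


def phase_at(plan, t):
    """Phase that is active t time units after the cycle starts."""
    t %= cycle_length(plan)
    for phase, duration in plan:
        if t < duration:
            return phase
        t -= duration


if __name__ == "__main__":
    print(len(movements), len(signalled), cycle_length(plan))
    print(unroll(plan, 5))
    print(phase_at(plan, 60))
```

Representing the plan as an ordered list of (phase, duration) pairs keeps the phase order fixed, so an optimizer in this setting only needs to change the duration entries.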