2. Related Work
One classical model of scheduling problems is the Markov Decision Process (MDP), since most
scheduling problems make decisions more on the current state rather than on the entire history.
Of course, mathematically most scheduling problems are not Markovian, so there will be some
adjustments to the ordinary MDP model. Generalized semi-Markov Decision Process (GSMDP)
is one of the suitable adjusted models [1], where it allows the enabled event to trigger at any
time and optimize the makespan. When stochasticity is added to the model, GSMDP can use
continuous phase-type distributions to simulate the duration, such as the duration time of the
task [2].
Before the introduction of reinforcement learning (RL) to the scheduling problem, many
traditional approaches were developed. One of the famous scheduling problems is the Traveling
Salesmen Problem (TSP), where Lin-Kernighan-Helsgaun TSP solver was once the state-of-art
heuristic for it [3]. LKH solver transfers asymmetric problems into symmetric ones and uses
the amount of violated constraints to scale the penalty. Another popular problem related to
scheduling is the Resource-Constrained Project Scheduling Problem (RCPSP). Kolisch extended
the precedence based minimum slack priority rule (MSLK) to a precedence and resources based
slack priority rule to replace RSM to solve RCPSP [4]. If we relax the resource constraint
and add stochasticity to the activities themselves, we get the Stochastic Scheduling Problem
(SSP). There are many non-RL approaches to it. For example, the Rollout algorithm is used
to solve the Quiz Problem, which is one variation of SSP [5]. There are also many heuristic
approaches which describe different aspects of the stochastic features, which allows the usage of
a computationally tractable closed-loop algorithm to solve SSP [6].
Another well-known area of scheduling problems is the Job Shop Scheduling Problem (JSSP),
which we will further discuss it later in Section 4. Miloˇs introduced a classical model of
it, including job-based representation, operation-based representation, and disjunctive graph-
based representation [7], while actually there are many variations of JSSP. According to the
number of operations, the number of available machines for each operation, and whether
stochasticity is introduced to the problem, we have Parallel Machines Job Shop Scheduling
Problem (JSSP-PM), Flexible Job Shop Scheduling (FJSSP), and Stochastic Flexible Job Shop
Scheduling (SFJSSP) [8]. Wittrock introduced a non-RL algorithm to a variation of FJSSP,
which decompose the problem into three sub-problems (machine allocation, order sequencing,
and execution time) and each of these is solved using a fast heuristic [9]. There are also many
RL-based algorithms developed for solving JSSP. Rinciog and Meyer developed the possibility
of using RL-based algorithms to solve JSSP and gave a general design [10]. They separated
MDPs for production scheduling into operation sequencing, routing before sequencing, interlaced
routing and sequencing transport-centric routing and re-scheduling. They also classified RL
algorithms along three discrete axes: value, policy or actor-critic methods; model-free or model-
based methods; and on or off-policy methods. For more specific work, Cheng, Gen and Tsujimura
introduced nine most common representations (job-based, operation-based, preference list-based,
job pair relation-based, priority rule-based, disjunctive graph-based, completion time-based,
machine-based, and random keys) of JSSP that are suitable for further Genetic algorithm (GA)
work [11], and those representations are also suitable for other algorithms, both RL and non-
RL ones. They also gave a full review of different variations of GA that can be applied to
JSSP, including adapting problems to the genetic algorithms, adapting the genetic algorithms
to problems, and adapting both the genetic algorithms and problems [12]. Hasan, Sarker and
Cornforth also gave a detailed improved GA algorithm which includes phenotypes creation,
feasibility check, solution ranking or heuristic ordering, and generation updates (crossover and
mutations) [13]. Type-Aware Graph Attention (TAGAT) algorithm can also solve JSSP by
using ScheduleNet, a decentralized decision-making policy that can coordinate multiple agents
to complete tasks [14]. Besides those two algorithms, there are also many works on solving JSSP