loss, approximation-based methods [19] can be used where
the expected sum of future rewards for a newly discovered
state can be approximated based on known states (often with
the help of state abstraction when they are too complex to
directly compare). While they can address complex problems
in very large state spaces, they can only provide approximate
solutions.
One of the most commonly used tabular-based reinforce-
ment learning algorithms is Q-learning due to its simplicity
and guaranteed convergence to an optimal policy [17]. It stores
and iteratively updates the expected sum of future rewards
for each state-action pair in a table (a.k.a., Q-table) while
going through trials and errors. A properly updated Q-table
can therefore tell what is the best action to choose in a given
state. A more detailed explanation, including how to update
a Q-table to ensure the convergence, can be found in Sutton
and Barto [11].
B. Many-Objective Search
Many-objective search [20, 21] refers to solving multiple
objectives simultaneously, typically more than four, using
meta-heuristic search algorithms, such as MOEA/D [22] and
NSGA-III [23]. With the help of fitness functions that quantify
the goodness of candidate solutions in terms of individual ob-
jectives, the algorithms can deliver a set of solutions satisfying
as many objectives as possible.
In software testing, many-objective search has been applied
to solve testing problems with many test requirements. The
idea is to recast such testing problems into optimization
problems by carefully defining fitness functions for all ob-
jectives. However, since the number of objectives (i.e., test
requirements) is often much more than four (e.g., covering all
branches in a program), researchers have developed tailored
algorithms for testing. MOSA [12] is a well-known algorithm
dedicated to test suite generation with many objectives. It uses
an evolutionary algorithm to achieve each objective individ-
ually by effectively searching for uncovered objectives and
keeping an archive for the best test cases achieving objectives.
By defining fitness functions to measure the likelihood of
covering individual branches in a program, MOSA can ef-
fectively and efficiently generate a test suite covering as many
branches as possible [12]. FITEST [13] is another state-of-the-
art algorithm that extends MOSA to decrease the population
size as the number of uncovered objectives decreases to
improve efficiency, in contrast to MOSA that maintains the
same population size during the entire search.
III. PROBLEM DESCRIPTION
In this section, we provide a precise problem description
regarding the automated online testing of DNN-enabled sys-
tems (DES) by dynamically changing their environment during
simulation. As a working example, we use a DNN-enabled
autonomous driving system (DADS) to illustrate our main
points, but the description can be generalized to any DES.
In online testing, a DADS under test is embedded into and
interacts with its driving environment. However, because of the
risks and costs it entails, online testing is usually performed
with a simulator rather than on-road vehicle testing. Using a
simulator enables the control of the driving environment, such
as weather conditions, lighting conditions, and the behavior
of other actors (e.g., vehicles on the road and pedestrians).
During simulation, the DADS continuously interacts with
the environment by observing the environment via the ego
vehicle’s sensors (e.g., camera and LIDAR) and driving the
ego vehicle through commands (e.g., steering, throttle, and
braking). Due to the closed-loop interaction between the
DADS and its environment, simulation is an effective in-
strument to check if any requirements violation can occur
under realistic conditions. Notice that such violations can be
triggered by dynamically changing the driving environment
during simulation; for example, certain changes to the speed
of the vehicle in front, which can often occur in practice due
to impaired driving or sudden stops, can cause a collision. The
goal of DADS online testing, based on dynamically changing
the environment, is to find a minimal set of test cases, each
of them changing the environment in a different way, to cause
the DADS to violate as many requirements as possible.
Specifically, let dbe the DADS under test on board the
ego vehicle and E= (X, K)be the environment where
X={x1, x2, . . . }is a set of dynamic elements of the
simulation (e.g., actors other than the ego vehicle, and the
environment conditions) and Kis a set of static elements (e.g.,
roads, buildings, and trees). Each dynamic element x∈Xcan
be further decomposed into a sequence x=hx1, x2, . . . , xJi
where Jis the duration of the simulation and xjis the value
of xat time j∈ {1,2, . . . , J}(e.g., the position, speed,
and acceleration of the vehicle in front at time j). Based
on that, we can define Xj={xj|x∈X}as capturing
the set of values of all the dynamic elements in Xat time
jand Ej= (Xj, K)as indicating the set of values of all
environmental elements (both dynamic and static) in Eat time
j. Let a test case t=hX1, . . . , XJibe a sequence of values
of the dynamic elements for Jtime steps. By running din E
with t, using a simulator, at each time step j∈ {1,2, . . . , J},
dobserves the snapshot of Ej= (Xj, K)using the sensors
(e.g., camera and LIDAR) and generates driving commands
Cj(e.g., throttle, steering, and braking) for the ego vehicle.
Note that what dobserves from the same Evaries depending
on the ego vehicle’s dynamics (e.g., position, speed, and
acceleration) computed by C; for example, dobserves the
relative distance between the ego vehicle and the vehicle in
front, which naturally varies depending on the position of
the ego vehicle. For the next time step j+ 1, the simulator
computes Ej+1 = (Xj+1, K)according to tand generates
what dobserves from Ej+1 by taking into account Cj.
Given a set of requirements (safety, functional) R, the
degree of a violation for a requirement r∈Rproduced by
din Efor tat any time step j∈ {1, . . . , J}, denoted by
v(r, d, E, t, j), can be measured by monitoring the simulation
of din Efor t. For example, the distance between the ego
vehicle and the vehicle in front can be used to measure how
close they are from colliding. If v(r, d, E, t, j)is greater than