For a reasonable choice of parameters, the user either needs
expert knowledge of the target application or a preliminary
fuzzing campaign. Finally, MOPT fails to outperform AFL, the
fuzzer it is built on, in Google’s popular FuzzBench fuzzer
benchmark [27]. This makes designing and building a practical
mutation scheduler a challenging open problem.
This work. This paper focuses on one aspect of
the fuzzing process: finding (approximately) optimal mutation
scheduling strategies to improve fuzzing algorithmically.
Here, the challenging goal is to infer which mutation operator
is the optimal choice for the next fuzzing iteration. We address
this problem with DARWIN, a novel mutation-scheduling
algorithm to improve the general performance of mutational
fuzzers. DARWIN leverages an Evolution Strategy (ES), in a
setting similar to reinforcement learning, to approximate the
ideal probability distribution for mutation operator selection
and thus avoid wasting fuzzing iterations on suboptimal mutators.
The resulting probability distribution is not statically set
but learned during the fuzzing process and dynamically
adapted to the target program. DARWIN outperforms related
work significantly, not only in coverage but also in the
time to find bugs, without the user having to adjust any
target-specific parameters, which allows non-expert users to
leverage mutation scheduling.
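The underlying idea can be illustrated with a minimal (1+1)-ES sketch in Python. All names here (the mutator list, the toy fitness function, the step size) are hypothetical stand-ins for illustration only; DARWIN’s actual algorithm differs in its details:

```python
import random

random.seed(0)  # reproducible demo run

# Hypothetical mutator set; real fuzzers like AFL ship many more operators.
MUTATORS = ["bitflip", "byteflip", "arith", "interesting", "splice"]

def normalize(weights):
    """Project raw weights back onto a probability distribution."""
    clamped = [max(w, 1e-6) for w in weights]  # keep every mutator selectable
    total = sum(clamped)
    return [w / total for w in clamped]

def es_step(parent, fitness, sigma=0.1):
    """One (1+1)-ES iteration: perturb the distribution with Gaussian
    noise and keep the offspring only if its fitness does not decrease."""
    child = normalize([w + random.gauss(0, sigma) for w in parent])
    return child if fitness(child) >= fitness(parent) else parent

# Toy fitness standing in for observed coverage gain: pretend the
# "arith" mutator (index 2) is the most effective one on this target.
def toy_fitness(dist):
    return dist[2]

dist = normalize([1.0] * len(MUTATORS))  # start from a uniform distribution
for _ in range(500):
    dist = es_step(dist, toy_fitness)
```

After a few hundred iterations, the distribution concentrates probability mass on the mutator the fitness signal rewards, which is the effect a mutation scheduler aims for when fitness is measured as coverage gain on the real target.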
Challenges. Although we focus only on a specific phase
of the fuzzing process, namely the mutation selection in the
havoc phase, the problem of finding an optimal probability
distribution for mutation selection is highly challenging:
numerous mutation operators can be used, and their efficiency
varies depending on the target program, the current input, and
the state inherently implied by the current input. Furthermore,
the efficiency can vary depending on the non-deterministic
nature of each fuzzing run and the interplay between fuzzing
stages. Therefore, it is impossible to examine all possible
options exhaustively in the general case.
Contributions. Our DARWIN mutation scheduler and its
implementation based on AFL tackle all these challenges. To
summarize, our main contributions include:
•We present a novel mutation scheduling approach, DARWIN,
the first mutation scheduler that leverages a variant
of Evolution Strategy to optimize the probability distribution
of mutation operators. DARWIN dramatically improves
the efficiency of mutation selection while keeping
the execution speed constant. DARWIN can be applied
to any feedback-guided mutation-based fuzzer.
•We implemented a prototype of DARWIN by extend-
ing AFL with our mutation scheduling algorithm. By
modifying only three code lines in AFL to integrate
our DARWIN mutation scheduler, we show that DAR-
WIN’s design is easily adoptable by existing fuzzers. We
further highlight this by also integrating DARWIN in
EcoFuzz [67]. Moreover, we do not introduce any
additional user-configurable parameters, which avoids creating
adoption barriers.
•We thoroughly evaluate DARWIN against AFL as a base-
line and the most recent related work in this area, MOPT.
Our prototype significantly outperforms both fuzzers,
MOPT and AFL, in terms of code coverage reached in
the well-fuzzed GNU binutils suite. Next, DARWIN is
the first mutation scheduler to outperform its base fuzzer
in Google’s FuzzBench. Further, we evaluate DARWIN
on MAGMA, where DARWIN is the fastest fuzzer to
trigger 15 of the 21 bugs found. Finally, DARWIN
finds 20 unique bugs (including one previously unreported
bug), 66% more than AFL, across various real-world
targets.
•We thoroughly analyze the root causes of DARWIN’s
efficiency: we compare DARWIN to a static pre-optimized
mutation probability distribution, study the mutation
probability distribution over time, and introduce a metric
to measure a fuzzer’s effectiveness in scheduling
mutations. We show that DARWIN
needs fewer mutations than AFL to reach a coverage point
while achieving a higher execution speed than the state-
of-the-art MOPT fuzzer.
To foster future research in this area, we open-source our
fuzzer at https://github.com/TUDA-SSL/DARWIN.
II. BACKGROUND
This section presents the necessary background information
to understand the general concept of fuzzers, the workflow of
mutation-based fuzzers, and metaheuristic optimization.
A. Fuzzing
On a high level, fuzzers can be divided into mutational
fuzzers, which mutate existing testcases, and generational
fuzzers, which derive structured inputs. Mutational fuzzing requires a set (corpus) of
program inputs (seeds), which can, e.g., be obtained from
testcases or real inputs. These seeds are then mutated using
operations known from genetics, like inserting random errors
(bit flips), changing values to corner cases, or combining two
inputs to create a new input. As this way of input generation
does not enforce any constraints on the input, the generated
inputs are less likely to pass, e.g., initial parser checks or
checksums [63]. The process of mutation can be influenced in
two ways: 1) the location in the input that is mutated and 2)
the mutation that is applied; in both cases, the selection can
be made randomly or guided by a heuristic. Such a heuristic
can be, e.g., success measured in an increase of coverage or a
certain state that should be reached (where the target has been
tainted to find a clear path to that state). For example, the
popular AFL fuzzer uses the coverage metric of basic-block
transitions as a heuristic [29].
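As an illustration, AFL’s edge-coverage bookkeeping can be modeled in a few lines of Python. This is a simplified sketch of the shared-memory bitmap AFL maintains: the 64 KiB size matches AFL’s default, but hit-count bucketing and other details are omitted, and the function names are ours, not AFL’s:

```python
MAP_SIZE = 1 << 16  # AFL's default 64 KiB coverage bitmap

def record_trace(block_ids, bitmap):
    """Model of AFL's instrumentation: every basic-block transition
    (edge) hashes into one byte of a shared bitmap."""
    prev = 0
    for cur in block_ids:
        bitmap[(cur ^ prev) % MAP_SIZE] += 1
        prev = cur >> 1  # shift so that A->B and B->A map to different bytes
    return bitmap

def has_new_coverage(bitmap, virgin):
    """An input is 'interesting' if it exercises an edge never seen before."""
    new = any(bitmap[i] and not virgin[i] for i in range(MAP_SIZE))
    for i in range(MAP_SIZE):
        if bitmap[i]:
            virgin[i] = 1
    return new

virgin = bytearray(MAP_SIZE)
trace = record_trace([0x1234, 0x5678, 0x1234], bytearray(MAP_SIZE))
first = has_new_coverage(trace, virgin)   # new edges: keep the input
again = has_new_coverage(trace, virgin)   # same edges only: discard it
```

In the real fuzzer, AFL additionally buckets hit counts before comparing bitmaps, so an edge executed, say, ten times instead of once also counts as new behavior.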
B. Fuzzing Loop of Mutational Fuzzers
For mutational fuzzers, the so-called fuzzing loop, i.e.,
the part of the code where the loaded seeds are mutated
before being used as inputs for the program under test, can
usually be divided into three stages: the deterministic, havoc, and
splicing stages [56], [7], [11], [39], [4]. While some aspects are
AFL-specific, most concepts presented are implemented in a
similar way for other fuzzers.
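For illustration, the three stages can be sketched as a self-contained toy loop in Python. All function names and the one-byte "coverage" signal are simplified stand-ins of our own; real fuzzers interleave these stages with seed scheduling, trimming, and crash triage:

```python
import random

HAVOC_CYCLES = 4

def deterministic_mutations(seed):
    """Toy deterministic stage: flip every bit once, in a fixed order."""
    for i in range(len(seed) * 8):
        out = bytearray(seed)
        out[i // 8] ^= 1 << (i % 8)
        yield bytes(out)

def havoc(seed):
    """Toy havoc stage: a stack of random byte replacements."""
    out = bytearray(seed)
    for _ in range(random.randint(1, 8)):
        out[random.randrange(len(out))] = random.randrange(256)
    return bytes(out)

def splice(a, b):
    """Toy splicing stage: head of one seed joined with the tail of another."""
    cut = random.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]

def run_and_collect(target, inp, queue, seen):
    """Run the target and keep the input if its coverage signal is new."""
    coverage = target(inp)
    if coverage not in seen:
        seen.add(coverage)
        queue.append(inp)

def fuzzing_loop(queue, target, iterations=3):
    seen = set()
    for _ in range(iterations):
        seed = random.choice(queue)
        for inp in deterministic_mutations(seed):     # 1) deterministic
            run_and_collect(target, inp, queue, seen)
        for _ in range(HAVOC_CYCLES):                 # 2) havoc
            run_and_collect(target, havoc(seed), queue, seen)
        other = random.choice(queue)                  # 3) splicing + havoc
        run_and_collect(target, havoc(splice(seed, other)), queue, seen)
    return queue

# Toy target whose "coverage" depends only on the first input byte.
corpus = fuzzing_loop([b"seed"], lambda data: data[0] % 7)
```

The queue grows whenever an input triggers unseen "coverage", mirroring how feedback-guided fuzzers retain interesting inputs as new seeds.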
Deterministic stage. In the first stage, the deterministic
stage, a small set of mutations is applied to seeds in a predefined
order to create inputs for the target program; the
seeds are drawn from a queue of initial seeds provided by the
user. AFL uses code coverage as a heuristic to decide whether