DARWIN: Survival of the Fittest Fuzzing Mutators
Patrick Jauernig∗, Domagoj Jakobovic‡, Stjepan Picek§, Emmanuel Stapf∗, and Ahmad-Reza Sadeghi†
∗Technical University of Darmstadt, Germany, {patrick.jauernig, emmanuel.stapf}@sanctuary.dev
†Technical University of Darmstadt, Germany, ahmad.sadeghi@trust.tu-darmstadt.de
‡University of Zagreb, Croatia, domagoj.jakobovic@fer.hr
§Radboud University and TU Delft, The Netherlands, picek.stjepan@gmail.com
Abstract—Fuzzing is an automated software testing technique
broadly adopted by the industry. A popular variant is mutation-
based fuzzing, which discovers a large number of bugs in
practice. While the research community has studied mutation-
based fuzzing for years now, the algorithms’ interactions within
the fuzzer are highly complex and can, together with the
randomness in every instance of a fuzzer, lead to unpredictable
effects. Most efforts to improve this fragile interaction focused
on optimizing seed scheduling. However, real-world results like
Google’s FuzzBench highlight that these approaches do not
consistently show improvements in practice. Another approach
to improve the fuzzing process algorithmically is optimizing
mutation scheduling. Unfortunately, existing mutation scheduling
approaches also failed to convince due to a lack of real-world
improvements or too many user-controlled parameters whose
configuration requires expert knowledge about the target pro-
gram. This leaves the challenging problem of cleverly processing
test cases and achieving a measurable improvement unsolved.
We present DARWIN, a novel mutation scheduler and the first
to show fuzzing improvements in a realistic scenario without the
need to introduce additional user-configurable parameters, open-
ing this approach to the broad fuzzing community. DARWIN
uses an Evolution Strategy to systematically optimize and adapt
the probability distribution of the mutation operators during
fuzzing. We implemented a prototype based on the popular
general-purpose fuzzer AFL. DARWIN significantly outperforms
the state-of-the-art mutation scheduler and the AFL baseline in
our own coverage experiment, in FuzzBench, and by finding 15
out of 21 bugs the fastest in the MAGMA benchmark. Finally,
DARWIN found 20 unique bugs (including one novel bug), 66%
more than AFL, in widely-used real-world applications.
I. INTRODUCTION
Vulnerabilities caused by programming errors are still a
major threat to today’s programs [47]. An important class
of programming errors is memory corruption vulnerabilities,
where unexpected, malformed inputs can lead to uncontrolled
behavior in the program, which can often be abused by
attackers. A modern, cost-efficient strategy to uncover these
programming errors is automated software testing using fuzz
testing (commonly known as fuzzing). Fuzzing automatically
generates inputs from testcases and feeds them to the program
under test while monitoring the program. If a programming error
is triggered, the fuzzer detects that the program hangs
or crashes. Optionally, the observed control-flow changes can
serve as feedback for the next iteration, i.e., whether a new
path in the control flow (known as coverage) has been taken
due to the generated input. In recent years, fuzzers emerged as
an important topic in academic as well as industrial research
and are nowadays widely used for finding bugs in commercial
software [41], [28]. Projects like Google OSSFuzz [28] helped
to significantly increase the adoption rate by offering free com-
putation for fuzzing while still allowing security researchers,
who provide the fuzzers, to keep the bug bounty for discovered
vulnerabilities.
While fuzzers are responsible for discovering tremendous
amounts of bugs, even in operating system kernels [62], they
are still extensively researched, e.g., in the areas of making
targets available to fuzz testing [69], [20], improving fuzzers
using new algorithms [7], [11], [38], [39], [37], [51], and
leveraging new hardware features for performance or coverage
improvements [56], [13].
This paper focuses on the subject of algorithmic improve-
ments for mutational fuzzers, which leverage an existing set
of testcases (referred to as corpus) to constantly generate new
variants of these testcases by applying mutation operators
inspired by genetic mutations. Most notably, a significant
number of works focused on the effects of algorithmically
sampling a subset of optimal seeds from the corpus. The goals
of these works range from removing redundancy to creating
a minimal coverage-preserving corpus with small files [67],
[48], efficiently reaching specific locations in the control-flow
graph [7], [11], [70], or improving coverage in general [44].
While these approaches are designed to select from a large
number of possible testcases, in reality, testcases suitable for
fuzzing are often rare [29].
Aside from seed-selection algorithms, other approaches
have been proposed [37], [38], [51] that approximate which
byte positions in the testcase give the best results when being
mutated, but not which mutation operators to apply. Yet, this
problem is highly challenging, as one must first show 1) whether
mutation selection is actually target-dependent, 2) whether the
selection distribution should be static or dynamic, and 3) whether
introducing an optimization algorithm reduces execution speed so
much that its better mutation selection is outweighed.
The first approach to optimize the actual selection of muta-
tions (mutation scheduling) has been MOPT [39]. MOPT pro-
poses a variant of the Particle Swarm Optimization algorithm
(PSO) to learn a globally optimal mutation probability distri-
bution. However, MOPT's PSO algorithm maintains both local and
global best probability distributions, making finding the best
solution, and therefore the algorithm itself, complex and more
expensive to use during fuzzing. Similar to other algorithmic
improvements to fuzzing, finding a practical trade-off between
complexity and algorithmic improvements is challenging. All
additional algorithms have direct implications on execution
speed, and hence, reduce coverage over time. Further, MOPT
introduces various user-configurable parameters that steer the
optimization process directly, so the user needs to solve another
complex problem instead to avoid non-optimal scheduling.
arXiv:2210.11783v1 [cs.SE] 21 Oct 2022
For a reasonable choice of parameters, the user either needs
expert knowledge of the target application or a preliminary
fuzzing campaign. Finally, MOPT fails to outperform AFL,
on which it builds, in the popular FuzzBench fuzzer benchmark
by Google [27]. This makes designing and building a practical
mutation scheduler a challenging open problem.
This work. This paper focuses on one aspect of
the fuzzing process: finding (approx.) optimal mutation
scheduling strategies to improve fuzzing algorithmically.
Here, the challenging goal is to infer which mutation operator
is the optimal choice for the next fuzzing iteration. We address
this problem with DARWIN, a novel mutation-scheduling
algorithm to improve the general performance of mutational
fuzzers. DARWIN leverages an Evolution Strategy (ES), in a
setting similar to reinforcement learning, to approximate the ideal
probability distribution for the mutation operator selection
to avoid wasting fuzzing iterations on suboptimal mutators.
The resulting probability distribution is not statically set
but learned during the fuzzing process and dynamically
adapted to the target program. DARWIN outperforms related
work significantly, not only in coverage but also in the
time to find bugs, without the user having to adjust any
target-specific parameters, which allows non-expert users to
leverage mutation scheduling.
Challenges. Although we focus only on a specific phase
of the fuzzing process, namely the mutation selection in the
havoc phase, the problem of finding an optimal probability
distribution for mutation selection is highly challenging:
numerous mutation operators can be used, and their efficiency
varies depending on the target program, the current input, and
the state inherently implied by the current input. Furthermore,
the efficiency can vary depending on the non-deterministic
nature of each fuzzing run and the interplay between fuzzing
stages. Therefore, it is impossible to examine all possible
options exhaustively in the general case.
Contributions. Our DARWIN mutation scheduler and its
implementation based on AFL tackle all these challenges. To
summarize, our main contributions include:
We present a novel mutation scheduling approach, DAR-
WIN; the first mutation scheduler that leverages a variant
of Evolution Strategy to optimize the probability distri-
bution of mutation operators. DARWIN dramatically im-
proves the efficiency of mutation selection while keeping
the execution speed constant. DARWIN can be applied
to any feedback-guided mutation-based fuzzer.
We implemented a prototype of DARWIN by extend-
ing AFL with our mutation scheduling algorithm. By
modifying only three code lines in AFL to integrate
our DARWIN mutation scheduler, we show that DAR-
WIN's design is easily adoptable by existing fuzzers. We
further highlight this by also integrating DARWIN in
EcoFuzz [67]. Moreover, we do not introduce any
additional user-configurable parameters to avoid creating
adoption barriers.
We thoroughly evaluate DARWIN against AFL as a base-
line and the most recent related work in this area, MOPT.
Our prototype significantly outperforms both fuzzers,
MOPT and AFL, in terms of code coverage reached in
the well-fuzzed GNU binutils suite. Next, DARWIN is
the first mutation scheduler to outperform its base fuzzer
in Google’s Fuzzbench. Further, we evaluate DARWIN
on MAGMA, where we show that DARWIN triggers 15
out of the 21 bugs found the fastest. Finally, DARWIN
finds 20 unique bugs (including one previously unreported
bug), 66% more than AFL, across various real-world
targets.
We thoroughly analyze the root causes for DARWIN’s
efficiency by first comparing DARWIN to a static pre-
optimized mutation probability distribution, and further,
studying the mutation probability distribution over time,
and introducing a metric to measure a fuzzer’s effective-
ness in scheduling mutations. We show that DARWIN
needs fewer mutations than AFL to reach a coverage point
while achieving a higher execution speed than the state-
of-the-art MOPT fuzzer.
To foster future research in this area, we open-source our
fuzzer at https://github.com/TUDA-SSL/DARWIN.
II. BACKGROUND
This section presents the necessary background information
to understand the general concept of fuzzers, the workflow of
mutation-based fuzzers, and metaheuristic optimization.
A. Fuzzing
On a high level, fuzzers can be divided into mutational, i.e.,
mutating testcases, and generational, i.e., deriving structured
inputs, fuzzers. Mutational fuzzing requires a set (corpus) of
program inputs (seeds), which can, e.g., be obtained from
testcases or real inputs. These seeds are then mutated using
operations known from genetics, like inserting random errors
(bit flips), changing values to corner cases, or combining two
inputs to create a new input. As this way of input generation
does not follow any constraints on the input, the generated
inputs are unlikely to pass, e.g., initial parser checks or
checksums [63]. The process of mutation can be influenced in
two ways: 1) the location in the input that gets mutated and 2)
the mutation that is applied, whereby the selection can either
be made randomly or guided by a heuristic. Such a heuristic
can be, e.g., success measured in an increase of coverage or a
certain state that should be reached (where the target has been
tainted to find a clear path to that state). For example, the
popular AFL fuzzer uses the coverage metric of basic-block
transitions as a heuristic [29].
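As an illustration, AFL's basic-block-transition (edge) coverage heuristic can be sketched in a few lines. The map size and the right-shift of the previous location follow AFL's documented instrumentation; the toy execution trace is our own:

```python
MAP_SIZE = 1 << 16  # AFL's default coverage bitmap size

def record_edge(bitmap, prev_location, cur_location):
    """Record one basic-block transition the way AFL's instrumentation does."""
    idx = (cur_location ^ prev_location) % MAP_SIZE
    bitmap[idx] = (bitmap[idx] + 1) % 256  # 8-bit hit counter; overflows wrap
    # Shifting prev_location keeps the edge A->B distinct from B->A and
    # gives tight loops (A->A) a non-zero map index.
    return cur_location >> 1

bitmap = bytearray(MAP_SIZE)
prev = 0
for block_id in (0x1234, 0x5678, 0x1234):  # a toy execution trace
    prev = record_edge(bitmap, prev, block_id)

new_edges = sum(1 for b in bitmap if b)  # distinct edges hit by this trace
```

An input is considered interesting when running it touches map entries (or coarser hit-count buckets) that no earlier input has touched.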
B. Fuzzing Loop of Mutational Fuzzers
For mutational fuzzers, the so-called fuzzing loop, which
is the place in the code where the loaded seeds are mutated
before being used as inputs for the program under test, usually
can be divided into three stages, the deterministic, havoc, and
splicing stage [56], [7], [11], [39], [4]. While some aspects are
AFL-specific, most concepts presented are implemented in a
similar way for other fuzzers.
Deterministic stage. In the first stage, the deterministic
stage, a small set of mutations is applied to seeds in a prede-
fined order to create inputs for the target program, whereby the
seeds are drawn from a queue of initial seeds provided by the
user. AFL uses code coverage as a heuristic to decide whether
a mutated seed has been successful. If a seed increases the
code coverage, it is stored in the fuzzing queue. By reusing
successful seeds in the later iterations, the overall fuzzing
performance is improved. Measuring the code coverage is
achieved by instrumenting the binary of the program under
test such that the program is intercepted on every branch hit.
When an input leads to a crash of the program under test, the
user is notified since this indicates a bug in the program. The
first stage of the fuzzing loop with its deterministic mutation
scheme is slow and tends to contribute less to the overall
coverage [39]. Thus, AFL allows disabling the deterministic
stage entirely, which is especially beneficial for short fuzzing
runs [39] or to reduce noise in performance measurements of
the following stages.
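The "predefined order" can be illustrated with the simplest deterministic pass, a walking single-bit flip. This is a sketch only; AFL's real deterministic stage additionally walks multi-bit flips, byte flips, and arithmetic mutations:

```python
def deterministic_bitflips(seed: bytes):
    """Yield every single-bit-flip variant of the seed in a fixed order,
    as a deterministic fuzzing stage would."""
    for i in range(len(seed) * 8):
        flipped = bytearray(seed)
        flipped[i // 8] ^= 1 << (i % 8)  # flip exactly one bit
        yield bytes(flipped)

variants = list(deterministic_bitflips(b"ab"))  # 16 variants for 2 bytes
```

Because the order is fixed, two runs on the same seed produce the same variants, which is exactly what makes this stage deterministic (and slow on large inputs).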
Havoc stage. In the second stage of the fuzzing loop,
the non-deterministic havoc stage, randomly chosen mutations
are selected from a list of mutational operators [57], [52],
[29], [12]. In Table X, Appendix E, we list the mutations
used in AFL's havoc stage. The selected mutations are applied
to the inputs received from the deterministic stage or to the
mutated seeds from the fuzzing queue. When the generated
program inputs achieve new coverage, they are again saved
in the fuzzing queue. The fuzzing loop then returns to the
deterministic stage and selects the next element from the
fuzzing queue for the next iteration. The havoc stage is the
most generic stage and widely adopted by AFL-based and
other mutational fuzzers [52], [12], [29], [21], which is also
why our novel mutation scheduler DARWIN targets the havoc
stage.
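The stacked, randomly chosen mutations of the havoc stage can be sketched as follows. The two toy operators and their names are ours, not AFL's exact mutator list:

```python
import random

def bitflip(data: bytes, rng) -> bytes:
    """Flip one randomly chosen bit."""
    buf = bytearray(data)
    bit = rng.randrange(len(buf) * 8)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

def interesting_byte(data: bytes, rng) -> bytes:
    """Overwrite one byte with a corner-case value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.choice([0x00, 0x7F, 0x80, 0xFF])
    return bytes(buf)

MUTATORS = [bitflip, interesting_byte]

def havoc(seed: bytes, rng=None) -> bytes:
    """Apply a random stack of mutations, as the havoc stage does."""
    rng = rng or random.Random()
    out = seed
    for _ in range(1 << rng.randrange(1, 6)):  # stack 2..32 mutations
        out = rng.choice(MUTATORS)(out, rng)   # uniform mutator choice
    return out
```

The uniform `rng.choice` over `MUTATORS` is precisely the selection step that a mutation scheduler replaces with a learned probability distribution.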
Splicing stage. The last stage of the fuzzing loop, the
splicing stage, is only activated when none of the inputs in
the fuzzing queue led to new coverage in the havoc stage.
In the splicing stage, a crossover mutation of two inputs is
performed, which is then fed back to the havoc stage, which
again applies a random mutation on the input before testing it
on the target program.
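The splicing crossover itself is simple: take a prefix of one queue entry and a suffix of another, then hand the result back to the havoc stage. A sketch, with the cut-point choice as an illustrative assumption:

```python
import random

def splice(a: bytes, b: bytes, rng=None) -> bytes:
    """Crossover two inputs: a prefix of `a` joined with a suffix of `b`."""
    rng = rng or random.Random()
    if len(a) < 2 or len(b) < 2:
        return a  # too short to splice meaningfully
    cut_a = rng.randrange(1, len(a))  # keep at least one byte from each input
    cut_b = rng.randrange(1, len(b))
    return a[:cut_a] + b[cut_b:]
```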
C. Metaheuristics
While in the previous section, we mentioned several ap-
proaches to fuzzing, we did not discuss how such approaches
can actually find good solutions. This is because there exist no
specialized algorithms developed for that particular problem.
Instead, we need to rely on more general solving procedures.
Metaheuristics represent an intuitive choice since they encom-
pass problem-independent techniques used in a broad range
of applications. For example, we can consider the problem of
finding a suitable mutation schedule in the havoc stage as an
optimization problem. Since there is no explicit cost function
for this optimization problem, it cannot readily be paired with
classical optimization algorithms requiring gradient informa-
tion. In that case, metaheuristic algorithms, which do not pose
any requirements on the optimization problem, have proven
to be the method of choice in many engineering applications.
Metaheuristic techniques are commonly used in domains like
the automotive industry [22], medicine [1], scheduling [9],
adversarial examples [59], and implementation attacks [65].
Metaheuristics, in their original definition, represent solu-
tion finding methods that orchestrate an interaction between
local improvement and higher-level strategies to create a
process capable of escaping from local optima and performing
a robust search in a solution space [26]. A common division of
metaheuristic optimization algorithms is into single solution-
based and population-based metaheuristics [60]. Population-
based metaheuristics work on a population of solutions (e.g.,
Evolutionary Algorithms (EA) and swarm algorithms like Par-
ticle Swarm Optimization (PSO)). A population in this context
denotes a set of individuals used during an optimization pro-
cess, whereby an individual is a data structure that corresponds
to an element in the search space (a candidate solution). In
contrast, single solution-based metaheuristics manipulate and
transform a single solution (or a smaller number of solutions)
during the search.
Evolutionary algorithms occupy a prominent place among
metaheuristic algorithms, as they have been successfully ap-
plied to a large number of difficult optimization problems [24],
[54]. We depict pseudocode for the generic evolutionary al-
gorithm in Algorithm 3, Appendix A. In each iteration, the
algorithm applies a selection mechanism that emulates natural
selection. Based on their respective quality, usually denoted
as fitness, better individuals survive, while worse ones are
eliminated. The population then undergoes variation, creating
new genetic material as new individuals in the population.
Finally, all the individuals are reevaluated, and the process
is repeated until a specific termination criterion is met. Since
no knowledge is presumed about the nature of the solutions
in the current population, the termination is usually based on
the number of iterations, allotted time, or finding a solution of
acceptable quality.
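The generic loop described above (selection, variation, reevaluation, termination) can be sketched as follows. The callbacks and the toy objective are illustrative and do not reproduce the paper's Algorithm 3:

```python
import random

def evolve(fitness, init, mutate, pop_size=20, generations=60, rng=None):
    """Generic elitist evolutionary loop over `generations` iterations."""
    rng = rng or random.Random()
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)       # selection: fitter first
        survivors = population[: pop_size // 2]          # worse half eliminated
        offspring = [mutate(p, rng) for p in survivors]  # variation
        population = survivors + offspring               # reevaluated next round
    return max(population, key=fitness)

# Toy usage: maximize f(x) = -(x - 3)^2, optimum at x = 3
best = evolve(lambda x: -(x - 3) ** 2,
              init=lambda r: r.uniform(-10, 10),
              mutate=lambda x, r: x + r.gauss(0, 0.5),
              rng=random.Random(42))
```

Here the termination criterion is simply a fixed iteration budget, matching the common case where no knowledge about solution quality is presumed.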
Metaheuristic optimization algorithms balance diversifica-
tion and intensification properties; diversification enables the
discovery of promising areas in the search space and escaping
from local optima. Intensification aims to exploit a promising
area by concentrating on the current best solution and finding
better neighboring solutions. The interplay of these properties
determines the effectiveness of metaheuristic methods when
applied to a specific optimization problem.
III. CHALLENGES
Designing a mutation scheduling algorithm comes with a
number of challenges, as mutation scheduling is a fragile part
in the fuzzing process. These challenges are:
C.1: Optimal Mutation Selection. Finding an optimal prob-
ability distribution for mutation selection is challenging, as
the optimal distribution might change per target. Further, the
probability distribution might depend on the state implied by
a part of the input (that is not mutated). Hence, a mutation
scheduler needs to show that this mutation selection indeed
needs to adapt dynamically and, if so, show that iterative
adaption outperforms random selection.
C.2: Integrating an Optimization Algorithm. Properly se-
lecting a candidate algorithm for mutation scheduling is itself
highly challenging. However, integrating this algorithm into
the existing fuzzing process requires 1) a carefully designed
representation not only of the problem but also of the solution, to
avoid spending too much computation on encoding, and 2) finding
a parameter fit for the respective algorithm that fine-tunes
exploration versus intensification.
C.3: Easy Adoption and Reproducibility. A complex ap-
proach with a large number of user-tweakable parameters
Fig. 1. High-level overview showing how DARWIN iteratively optimizes the probability distribution for mutation selection and how the selected mutations
are applied to the testcases.
might achieve outstanding results. However, it will still not
be used in practice due to the difficulties in integrating the
approach into a fuzzer or because users fail to find good pa-
rameter values, and hence, they cannot achieve results similar
to the ones reported by the authors.
C.4: Performance Trade-off. Achieving an optimal trade-
off for the mutation selection scenario, which is our goal,
is complex. For instance, fuzzing approaches typically tune
the trade-off between performance and cleverness in seed
selection. Better seeds reach basic blocks guarded by complex
constraints, but optimizing seed selection with algorithms takes
additional time, and hence, decreases execution speed.
We designed DARWIN with these challenges in mind.
Next, we explain how we addressed these challenges through-
out the design, implementation, and evaluation of DARWIN.
IV. DARWIN DESIGN
DARWIN is a novel mutation scheduling algorithm
using an Evolution Strategy (ES) to find an optimal mutation
selection probability distribution to be applied during the
havoc stage. DARWIN does not merely determine a static
probability distribution but continuously adapts the distribution
throughout the fuzzing run based on coverage information.
Our approach, as depicted in Figure 1, comprises a well-
defined optimization module that does not need to expose
any parameters to the user of the fuzzer. In detail, a fuzzer
featuring DARWIN performs the following steps in the havoc
stage (each step is marked in Figure 1):
1) At the beginning of the havoc stage, the fuzzer selects
an input from the queue and randomly selects the next
mutation to apply. Initially, the probability distribution for
mutation selection is uniform.
2) After applying a mutation, the fuzzer decides whether it
should keep mutating this input or if the input should be
tested on the instrumented application.
3) After running the instrumented application with the se-
lected input, feedback is reported to assign a success score
to the test input and to the DARWIN Mutation Scheduler.
The mutation scheduler learns based on the reported
feedback and optimizes the probability distribution using
DARWIN’s Evolution Strategy.
4) Finally, the updated probability distribution is applied
for the next iteration.
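The feedback loop of steps 1-4 can be sketched as a simple hill-climbing Evolution Strategy over the probability vector. This is an illustrative (1+1)-ES-style sketch with made-up class and parameter names, not DARWIN's actual algorithm:

```python
import random

class MutationScheduler:
    """Illustrative ES-style scheduler over mutation probabilities.

    Sketch only: propose a perturbed distribution, keep it if the fuzzing
    feedback (e.g., coverage gain) improved. The names and the Gaussian
    step size are our assumptions, not DARWIN's actual parameters."""

    def __init__(self, n_mutators, rng=None):
        self.rng = rng or random.Random()
        self.best = [1.0 / n_mutators] * n_mutators  # step 1: start uniform
        self.candidate = list(self.best)
        self.best_score = 0.0

    def select_mutation(self):
        # Steps 1/2: weighted choice according to the candidate distribution.
        return self.rng.choices(range(len(self.candidate)),
                                weights=self.candidate)[0]

    def feedback(self, score):
        # Step 3: keep the candidate if it scored better than the incumbent.
        if score > self.best_score:
            self.best, self.best_score = list(self.candidate), score
        # Step 4: perturb the incumbent and renormalize for the next iteration.
        perturbed = [max(1e-6, p + self.rng.gauss(0, 0.05)) for p in self.best]
        total = sum(perturbed)
        self.candidate = [p / total for p in perturbed]
```

Because selection and update are both O(number of mutators), such a loop adds negligible cost per fuzzing iteration, which is the performance trade-off discussed in Challenge C.4.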
In the following, we explain the optimization process of
DARWIN’s Mutation Scheduler in more detail.
A. Metaheuristics and Mutation Scheduling
In the context of the complete fuzzing pipeline, we con-
centrate on improving the mutation scheduler, as illustrated in
Figure 1. The problem of finding a suitable mutation schedule
is considered here as an optimization problem, where the
candidate solution is a vector of relative mutation operator
probabilities. In a classical optimization scenario, a candidate
solution is refined through a series of iterations. In each
iteration, the candidate is evaluated, which is usually the
most time-consuming part of the optimization. Only after a
number of iterations, when a candidate of acceptable quality is
obtained, is the solution applied to the process being optimized.
In the case of fuzzing, however, the optimization is per-
formed concurrently with the process being optimized since
each candidate solution is used as it is being evaluated, and
the optimization is performed for each target independently.
Because of this, the optimization algorithm should be able to
provide fast convergence, i.e., achieve as large a performance
improvement as possible with as few evaluations as possible.
As mentioned in Section II-C, metaheuristic techniques
balance between diversification and intensification, with con-
flicting goals to evade local optima and, at the same time,
enable convergence to better quality solutions. Population-
based metaheuristics, such as Genetic Algorithm (GA) [43] or
Particle Swarm Optimization (PSO) [55], are generally focused
on diversification and can locate an optimum with a greater
probability. However, as mentioned above, the optimum in