
Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨
orz
1.3 Previous Work on Runtime Distributions
Before continuing, we proceed to report on related work regarding the analysis of runtime
distributions. We include here related work showing why knowledge of runtime distributions, as
we obtain it in this work, is immensely valuable.
The study [
FRV97
] presented empirical evidence for the fact that the distribution of the
effort (more precisely, the number of consistency checks) required for backtracking algorithms to
solve constraint satisfaction problems randomly generated at the 50 % satisfiable point can be
approximated by the Weibull distribution (in the satisfiable case) and the lognormal distribution
(in the unsatisfiable case). These results were later extended to a wider region around the
50 % satisfiable point [
RF97
]. It should be emphasized that this study created all instances
using the same generation model. This resulted in the creation of similar yet logically non-
equivalent formulas. We, however, firstly use different models to rule out any influence of the
generation model and secondly generate logically equivalent modifications of a base instance
(see Algorithm 1). This approach lends itself to the analysis of existing SLS solvers, like
GapSAT
.
The significant advantage is that the conducted work is not lost in the case of a restart: only
the logically equivalent instance could be changed while keeping the current assignment.
The runtime distributions of CDCL solvers was studies in [
KLW22
]. The authors empiri-
cally demonstrated that Weibull mixture distributions can accurately describe the multimodal
distributions found. They concluded that adding new clauses to a base instance has an inherent
effect of making runtimes long-tailed.
In [
GSCK00
], the cost profiles of combinatorial search procedures were studied. It was shown
that they are often characterized by Pareto-L´evy distributions and empirically demonstrated
how rapid randomized restarts can eliminate tail behavior. We, however, theoretically prove the
effectiveness of restarts for the larger class of long-tailed distributions.
The paper [
ATC13
] studied the solvers
Sparrow
and
CCASAT
and found that the lognormal
distribution is a good fit for the runtime distributions of randomly generated instances. For
this, the Kolmogorov–Smirnov statistic
supt∈R|ˆ
Fn
(
t
)
−F
(
t
)
|
was used. Although the KS-test
is very versatile, this comes with the disadvantage that its statistical power is relatively low.
The KS statistic is also nearly useless in the tails of a distribution: A high relative deviation
of the empirical from the theoretical cumulative distribution function in either tail results in
a very small absolute deviation. It should also be remarked that the paper studies only few
formulas in just two domains, ten randomly generated and nine crafted. Our work addressed
both shortcomings of this paper: The
χ2
-test gives equal importance to the goodness-of-fit
over the full support, and various instance domain models (both theoretical and applied) are
considered in this paper.
We want to stress the fact that studies on the runtime distribution of algorithms are quite
sparse, even though knowledge of the runtime distribution of an algorithm is extremely valuable:
•
Intuitively speaking, if the distribution is long-tailed, one knows there is a risk of ending
in the tail and experiencing very long runs; simultaneously, the knowledge that the time
the algorithm used thus far is in the tail of the distribution can be exploited to restart the
procedure (and create a new logically equivalent instance
F(2)
). We rigorously prove this
statement for all long-tailed algorithms.
•
Given the distribution of an algorithm’s sequential runtime, it was shown in [
ATC13
] how
to predict and quantify the algorithm’s expected speedup due to parallelization.
•
If the hardness distribution is known, experiments with a small number of instances can
lead to parameter estimations of the underlying distribution [FRV97].
5