Towards an Understanding of Long-Tailed Runtimes of SLS Algorithms Jan-Hendrik Lorenz

2025-05-06 0 0 1.76MB 50 页 10玖币
侵权投诉
Towards an Understanding of
Long-Tailed Runtimes of SLS Algorithms
Jan-Hendrik Lorenz and Florian W¨orz
Universit¨at Ulm, Germany
jan-hendrik.lorenz@alumni.uni-ulm.de,florian.woerz@uni-ulm.de
October 25, 2022
Abstract
The satisfiability problem (SAT) is one of the most famous problems in computer science.
Traditionally, its NP-completeness has been used to argue that SAT is intractable. However,
there have been tremendous practical advances in recent years that allow modern SAT
solvers to solve instances with millions of variables and clauses. A particularly successful
paradigm in this context is stochastic local search (SLS).
In most cases, there are different ways of formulating the underlying SAT problem.
While it is known that the precise formulation of the problem has a significant impact on
the runtime of solvers, finding a helpful formulation is generally non-trivial. The recently
introduced
GapSAT
solver [Lorenz and W¨orz 2020] demonstrated a successful way to improve
the performance of an SLS solver on average by learning additional information which logically
entails from the original problem. Still, there were also cases in which the performance slightly
deteriorated. This justifies in-depth investigations into how learning logical implications
affects runtimes for SLS algorithms.
In this work, we propose a method for generating logically equivalent problem formulations,
generalizing the ideas of
GapSAT
. This method allows a rigorous mathematical study of the
effect on the runtime of SLS SAT solvers. Initially, we conduct empirical investigations. If
the modification process is treated as random, Johnson SB distributions provide a perfect
characterization of the hardness. Since the observed Johnson SB distributions approach
lognormal distributions, our analysis also suggests that the hardness is long-tailed.
As a second contribution, we theoretically prove that restarts are useful for long-tailed
distributions. This implies that incorporating additional restarts can further refine all algo-
rithms employing above mentioned modification technique.
Since the empirical studies compellingly suggest that the runtime distributions follow
Johnson SB distributions, we also investigate this property on a theoretical basis. We
succeed in proving that the runtimes for the special case of Sch¨oning’s random walk algo-
rithm [Sconing 2002] are approximately Johnson SB distributed.
Keywords
Stochastic Local Search
·
Runtime Distribution
·
Statistical Analysis
·
Johnson SB
Distribution
·
Lognormal Distribution
·
Long-Tailed Distribution
·
Restarts
·
SAT Solving
·
Learned Clauses ·Logical Entailment
Previous Versions This is the full-length version of the paper in the ACM Journal
of Experimental Algorithmics (JEA). See the last section for a discussion.
1
arXiv:2210.13159v1 [cs.DS] 24 Oct 2022
Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨
orz
Contents
1 Introduction 3
1.1 Studying Runtime Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 OurResults............................................ 4
1.2.1 HardnessDistribution .................................. 4
1.2.2 Theoretical Arguments for the Hardness Distribution . . . . . . . . . . . . . . . . . 4
1.3 Previous Work on Runtime Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 OutlineofThisPaper ...................................... 6
2 Preliminaries 6
2.1 BasicNotation .......................................... 6
2.2 TheResolutionProofSystem .................................. 6
2.3 AShortProbabilityPrimer ................................... 7
2.4 ProbabilityDistributions..................................... 9
2.4.1 The Johnson SB Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.2 Lognormal Distributions as Embedded Model of Johnson SB Distributions . . . . 11
3 Evidence for Long-Tails in SLS Algorithms 11
3.1 Design of the Adjusted Logical Formula Algorithm Alfa . . . . . . . . . . . . . . . . . . . 12
3.2 Empirical Evaluation of the Hardness Distribution . . . . . . . . . . . . . . . . . . . . . . 13
3.2.1
Experimental Setup, Instance Types, and Solvers Used to Obtain Hardness Distri-
butionData........................................ 13
3.2.2 Experimental Results and Statistical Evaluation of the Hardness Distribution . . . 14
3.3 Restarts Are Useful For Long-Tailed Distributions . . . . . . . . . . . . . . . . . . . . . . 18
4 Theoretical Justifications for the Johnson SB Conjecture 21
4.1 ProofOverview.......................................... 22
4.2 Glossary of Notations and Notational Conventions . . . . . . . . . . . . . . . . . . . . . . 22
4.3 Analysis of the Runtime Distribution of the Algorithm . . . . . . . . . . . . . . . . . . . . 24
4.3.1 The Infinite Case NeverSel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.2 The Finite Case c < ................................. 31
4.3.3 CombiningBothCases.................................. 32
4.4 Analysis of the Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4.1 Distribution Analysis of the Random Variable P................... 33
4.4.2 Distribution Analysis of the Random Variable Q................... 35
4.4.3 Distribution Analysis of the Random Variable R................... 35
4.4.4 Concluding Remarks Regarding the Analysis of the Random Variables . . . . . . . 36
4.5 PuttingEverythingTogether .................................. 36
5 Conclusion 36
A The Finite Case 42
A.1 The Case Sel(c+ 1) = 1 of the First Factor D1in Line (19) ................ 44
A.2 The Case Sel(k) = 0 of the Factors in the Big Product of Line (19) ............ 45
A.3 PuttingTogetherBothCases .................................. 45
A.4 AllowingRestarts......................................... 46
B Connections Between Different Distributions 46
B.1 Embedded Models: Johnson SB Approaches Lognormal . . . . . . . . . . . . . . . . . . . 46
B.2 A Proof of Lemma 54 ...................................... 47
B.3 A Proof of Lemma 55 ...................................... 48
2
Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨
orz
1 Introduction
The satisfiability problem (SAT) asks to determine if a given propositional formula
F
has a
satisfying assignment or not. Since Cook’s
NP
-completeness proof of the problem [
Coo71
], SAT
is believed to be computationally intractable in the worst case. However, in the field of applied
SAT solving, there have been enormous improvements in the performance of SAT solvers in
the last 20 years. Motivated by these significant improvements, SAT solvers have been applied
to an increasing number of areas, including bounded model checking [
BCC+03
,
CBRZ01
],
cryptology [EPV08], and even bioinformatics [LM06], to name just a few.
Stochastic local search (SLS) is an especially successful algorithmic paradigm that many SAT
solvers employ [
BHvMW09
, Chapter 6]: There are solvers solely based on the SLS paradigm,
e. g., the solvers
probSAT
[
BS12
],
dimetheus
[
BM16
], and
YalSAT
[
Bie14
]; SLS has been used
in parallel solvers, e. g.,
Plingeling
[
Bie17
]; and is nowadays even a standard component of
sequential conflict-driven clause learning (CDCL) solvers, for example of
ReasonLS
[
CZ18
],
CaDiCaL
[
BFFH20
], the
Relaxed
family of solvers [
CZ19
,
CZ21
],
Kissat
[
BFH21
], newer
versions of
CryptoMiniSat
[
SNC09
], and
MergeSat
[
Man21
]. In [
CZFB22
], Cai et al. tightly
integrated SLS with three CDCL solvers, which significantly increased performance. The SLS
paradigm is furthermore frequently employed in solving MaxSAT (see e. g., [BBJM21]).
Broadly speaking, SLS solvers operate on complete assignments for a formula
F
. These
solvers are started with a randomly generated complete initial assignment
α
. If
α
satisfies
F
, a
solution is found. Otherwise, the SLS solver explores the neighborhood of the current assignment
by repeatedly flipping the value of some variable in the assignment when this variable is chosen
according to some underlying heuristic (e. g., aiming to minimize the number of unsatisfied
clauses by the assignment). That is, these solvers perform a random walk over the set of complete
assignments for the underlying formula.1
The success of SLS solvers is demonstrated by
probSAT
[
BS12
],
dimetheus
[
BM16
], and
YalSAT
[
Bie17
], winning several gold medals in the random track of previous SAT competitions.
SLS algorithms are also of interest from a theoretical perspective. For example, Sconing [
Sch02
]
describes an algorithm (called
SRWA
in the following) with an appealing worst-case guarantee.
Furthermore, we firmly believe that a better understanding of SLS will help in the design of
future CDCL–SLS hybrids.
1.1 Studying Runtime Distributions
Although SLS algorithms are highly successful in solving SAT instances, as witnessed by their
comparatively low mean runtime, they often show a high variation in the runtime required to
solve a fixed instance over repeated runs. However, measures like the mean or the variance cannot
capture the long-tailed behavior of difficult instances. Some authors (e. g., [
FRV97
,
GS97
,
RF97
])
thus shifted their focus to studying the runtime distributions of search algorithms, which helps
to understand these methods better and draw meaningful conclusions for the design of new
algorithms.
A relatively new algorithmic technique is considering modified versions of the input problem.
For example, in the mixed integer programming community, it is known that the performance
is sensitive to the used modification [
LRV16
]. A similar approach is also employed in some
backtracking SAT solvers (known as CDCL solvers [
MS96
,
MMZ+01
]) that learn additional
1
In contrast to CDCL solvers and resolution, which are complete algorithms that can prove the unsatisfiability
of a formula in a finite amount of steps, SLS solvers are incomplete, i. e., in general, they cannot output the
solution in a finite number of steps.
3
Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨
orz
information during their run. However, all successful SLS SAT solvers of the last decades work
on the original, unmodified instance.
In [
LW20
], the authors investigated the effect of modifying the input instance for SLS SAT
solvers. More specifically, they changed the input instance by adding new, logically equivalent
clauses to the problem. For this, a new solver, called
GapSAT
, was introduced. This new solver
is based on
probSAT
and uses the addition of new clauses as a preprocessing step, thus, yielding
a terraformed landscape. A comprehensive experimental evaluation found statistical evidence
that the performance of
probSAT
substantially increased with this modification technique.
However, the authors pointed to the fact that for some instances, the performance slightly
deteriorated when
probSAT
had access to these additional clauses, albeit all of them contained
useful information.
These experiments motivate to study the technique of adding new clauses in more detail.
In particular, it seems worthwhile to obtain a better understanding of the phenomenon that
adding new clauses improves the mean runtime, but there exist instances where adding clauses
can harm the performance of SLS.
Motivated by that, this work centers around studying the behavior of SLS solvers when these
solvers work on formulas that were extended by logical consequences of the initial formulation.
1.2 Our Results
1.2.1 Hardness Distribution
We study the runtime (or, more precisely, hardness) distribution of several SLS algorithms when
logical implications are added to an original formula. Central to all our investigations is the
basic elementary algorithm
Alfa
, that we introduce in this work. This algorithm is specifically
constructed in such a way that it is convenient to construct mathematical arguments after an
initial empirical analysis.
Our empirical evaluations suggest that the hardness distribution is long-tailed (called the
Weak Conjecture
). In fact, a stronger statement can be deduced: The data indicate that
the distribution follows a Johnson SB distribution (called the
Strong Conjecture
). We also
empirically show for our setting that this distribution converges to a lognormal distribution.
Since lognormal distributions are long-tailed, it is thus already established that if the Strong
Conjecture is true, the algorithm can be improved by restarts [
Lor18
]. We extend this result to
the case in which the Weak Conjecture is true: That is, we theoretically prove that restarts are
useful for the larger class of algorithms that exhibit a long-tailed distribution.
1.2.2 Theoretical Arguments for the Hardness Distribution
It should be highlighted how good the Johnson SB fit is for the observed data. The distribution
describes both typical and exceedingly low or high values exceptionally accurately. Only a
marginal absolute and relative error between the fits and the observations can be observed.
Moreover, this is true for all considered problem domains.
It is extraordinary that a simple parameterized distribution accurately describes the runtime
behavior of an entire group of algorithms (SLS solvers) on various domains. Since such behavior
is unlikely due to chance, we are pursuing theoretical explanations for this phenomenon. We
succeed in showing that the hardness distribution for the special case of Sconing’s random
walk algorithm
SRWA
is indeed approximately Johnson SB distributed, confirming the Strong
Conjecture in practice. To the best of our knowledge, there are no comparable works deriving
the runtime distribution for the full support.
4
Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨
orz
1.3 Previous Work on Runtime Distributions
Before continuing, we proceed to report on related work regarding the analysis of runtime
distributions. We include here related work showing why knowledge of runtime distributions, as
we obtain it in this work, is immensely valuable.
The study [
FRV97
] presented empirical evidence for the fact that the distribution of the
effort (more precisely, the number of consistency checks) required for backtracking algorithms to
solve constraint satisfaction problems randomly generated at the 50 % satisfiable point can be
approximated by the Weibull distribution (in the satisfiable case) and the lognormal distribution
(in the unsatisfiable case). These results were later extended to a wider region around the
50 % satisfiable point [
RF97
]. It should be emphasized that this study created all instances
using the same generation model. This resulted in the creation of similar yet logically non-
equivalent formulas. We, however, firstly use different models to rule out any influence of the
generation model and secondly generate logically equivalent modifications of a base instance
(see Algorithm 1). This approach lends itself to the analysis of existing SLS solvers, like
GapSAT
.
The significant advantage is that the conducted work is not lost in the case of a restart: only
the logically equivalent instance could be changed while keeping the current assignment.
The runtime distributions of CDCL solvers was studies in [
KLW22
]. The authors empiri-
cally demonstrated that Weibull mixture distributions can accurately describe the multimodal
distributions found. They concluded that adding new clauses to a base instance has an inherent
effect of making runtimes long-tailed.
In [
GSCK00
], the cost profiles of combinatorial search procedures were studied. It was shown
that they are often characterized by Pareto-L´evy distributions and empirically demonstrated
how rapid randomized restarts can eliminate tail behavior. We, however, theoretically prove the
effectiveness of restarts for the larger class of long-tailed distributions.
The paper [
ATC13
] studied the solvers
Sparrow
and
CCASAT
and found that the lognormal
distribution is a good fit for the runtime distributions of randomly generated instances. For
this, the Kolmogorov–Smirnov statistic
suptR|ˆ
Fn
(
t
)
F
(
t
)
|
was used. Although the KS-test
is very versatile, this comes with the disadvantage that its statistical power is relatively low.
The KS statistic is also nearly useless in the tails of a distribution: A high relative deviation
of the empirical from the theoretical cumulative distribution function in either tail results in
a very small absolute deviation. It should also be remarked that the paper studies only few
formulas in just two domains, ten randomly generated and nine crafted. Our work addressed
both shortcomings of this paper: The
χ2
-test gives equal importance to the goodness-of-fit
over the full support, and various instance domain models (both theoretical and applied) are
considered in this paper.
We want to stress the fact that studies on the runtime distribution of algorithms are quite
sparse, even though knowledge of the runtime distribution of an algorithm is extremely valuable:
Intuitively speaking, if the distribution is long-tailed, one knows there is a risk of ending
in the tail and experiencing very long runs; simultaneously, the knowledge that the time
the algorithm used thus far is in the tail of the distribution can be exploited to restart the
procedure (and create a new logically equivalent instance
F(2)
). We rigorously prove this
statement for all long-tailed algorithms.
Given the distribution of an algorithm’s sequential runtime, it was shown in [
ATC13
] how
to predict and quantify the algorithm’s expected speedup due to parallelization.
If the hardness distribution is known, experiments with a small number of instances can
lead to parameter estimations of the underlying distribution [FRV97].
5
摘要:

TowardsanUnderstandingofLong-TailedRuntimesofSLSAlgorithmsJan-HendrikLorenzandFlorianWorzUniversitatUlm,Germanyjan-hendrik.lorenz@alumni.uni-ulm.de,florian.woerz@uni-ulm.deOctober25,2022AbstractThesatis abilityproblem(SAT)isoneofthemostfamousproblemsincomputerscience.Traditionally,itsNP-completene...

展开>> 收起<<
Towards an Understanding of Long-Tailed Runtimes of SLS Algorithms Jan-Hendrik Lorenz.pdf

共50页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:50 页 大小:1.76MB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 50
客服
关注