Towards an Understanding of Long-Tailed Runtimes of SLS Algorithms Jan-Hendrik Lorenz

2025-05-06 0 0 1.76MB 50 页 10玖币

侵权投诉

Towards an Understanding of

Long-Tailed Runtimes of SLS Algorithms

Jan-Hendrik Lorenz and Florian W¨orz

Universit¨at Ulm, Germany

jan-hendrik.lorenz@alumni.uni-ulm.de,florian.woerz@uni-ulm.de

October 25, 2022

Abstract

The satisﬁability problem (SAT) is one of the most famous problems in computer science.

Traditionally, its NP-completeness has been used to argue that SAT is intractable. However,

there have been tremendous practical advances in recent years that allow modern SAT

solvers to solve instances with millions of variables and clauses. A particularly successful

paradigm in this context is stochastic local search (SLS).

In most cases, there are diﬀerent ways of formulating the underlying SAT problem.

While it is known that the precise formulation of the problem has a signiﬁcant impact on

the runtime of solvers, ﬁnding a helpful formulation is generally non-trivial. The recently

introduced

GapSAT

solver [Lorenz and W¨orz 2020] demonstrated a successful way to improve

the performance of an SLS solver on average by learning additional information which logically

entails from the original problem. Still, there were also cases in which the performance slightly

deteriorated. This justiﬁes in-depth investigations into how learning logical implications

aﬀects runtimes for SLS algorithms.

In this work, we propose a method for generating logically equivalent problem formulations,

generalizing the ideas of

GapSAT

. This method allows a rigorous mathematical study of the

eﬀect on the runtime of SLS SAT solvers. Initially, we conduct empirical investigations. If

the modiﬁcation process is treated as random, Johnson SB distributions provide a perfect

characterization of the hardness. Since the observed Johnson SB distributions approach

lognormal distributions, our analysis also suggests that the hardness is long-tailed.

As a second contribution, we theoretically prove that restarts are useful for long-tailed

distributions. This implies that incorporating additional restarts can further reﬁne all algo-

rithms employing above mentioned modiﬁcation technique.

Since the empirical studies compellingly suggest that the runtime distributions follow

Johnson SB distributions, we also investigate this property on a theoretical basis. We

succeed in proving that the runtimes for the special case of Sch¨oning’s random walk algo-

rithm [Sch¨oning 2002] are approximately Johnson SB distributed.

Keywords

Stochastic Local Search

Runtime Distribution

Statistical Analysis

Johnson SB

Distribution

Lognormal Distribution

Long-Tailed Distribution

Restarts

SAT Solving

Learned Clauses ·Logical Entailment

Previous Versions This is the full-length version of the paper in the ACM Journal

of Experimental Algorithmics (JEA). See the last section for a discussion.

arXiv:2210.13159v1 [cs.DS] 24 Oct 2022

Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨

orz

Contents

1 Introduction 3

1.1 Studying Runtime Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 OurResults............................................ 4

1.2.1 HardnessDistribution .................................. 4

1.2.2 Theoretical Arguments for the Hardness Distribution . . . . . . . . . . . . . . . . . 4

1.3 Previous Work on Runtime Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.4 OutlineofThisPaper ...................................... 6

2 Preliminaries 6

2.1 BasicNotation .......................................... 6

2.2 TheResolutionProofSystem .................................. 6

2.3 AShortProbabilityPrimer ................................... 7

2.4 ProbabilityDistributions..................................... 9

2.4.1 The Johnson SB Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.4.2 Lognormal Distributions as Embedded Model of Johnson SB Distributions . . . . 11

3 Evidence for Long-Tails in SLS Algorithms 11

3.1 Design of the Adjusted Logical Formula Algorithm Alfa . . . . . . . . . . . . . . . . . . . 12

3.2 Empirical Evaluation of the Hardness Distribution . . . . . . . . . . . . . . . . . . . . . . 13

3.2.1

Experimental Setup, Instance Types, and Solvers Used to Obtain Hardness Distri-

butionData........................................ 13

3.2.2 Experimental Results and Statistical Evaluation of the Hardness Distribution . . . 14

3.3 Restarts Are Useful For Long-Tailed Distributions . . . . . . . . . . . . . . . . . . . . . . 18

4 Theoretical Justiﬁcations for the Johnson SB Conjecture 21

4.1 ProofOverview.......................................... 22

4.2 Glossary of Notations and Notational Conventions . . . . . . . . . . . . . . . . . . . . . . 22

4.3 Analysis of the Runtime Distribution of the Algorithm . . . . . . . . . . . . . . . . . . . . 24

4.3.1 The Inﬁnite Case NeverSel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.3.2 The Finite Case c < ∞................................. 31

4.3.3 CombiningBothCases.................................. 32

4.4 Analysis of the Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.4.1 Distribution Analysis of the Random Variable P................... 33

4.4.2 Distribution Analysis of the Random Variable Q................... 35

4.4.3 Distribution Analysis of the Random Variable R................... 35

4.4.4 Concluding Remarks Regarding the Analysis of the Random Variables . . . . . . . 36

4.5 PuttingEverythingTogether .................................. 36

5 Conclusion 36

A The Finite Case 42

A.1 The Case Sel(c+ 1) = 1 of the First Factor D1in Line (19) ................ 44

A.2 The Case Sel(k) = 0 of the Factors in the Big Product of Line (19) ............ 45

A.3 PuttingTogetherBothCases .................................. 45

A.4 AllowingRestarts......................................... 46

B Connections Between Diﬀerent Distributions 46

B.1 Embedded Models: Johnson SB Approaches Lognormal . . . . . . . . . . . . . . . . . . . 46

B.2 A Proof of Lemma 54 ...................................... 47

B.3 A Proof of Lemma 55 ...................................... 48

Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨

orz

1 Introduction

The satisﬁability problem (SAT) asks to determine if a given propositional formula

has a

satisfying assignment or not. Since Cook’s

-completeness proof of the problem [

Coo71

], SAT

is believed to be computationally intractable in the worst case. However, in the ﬁeld of applied

SAT solving, there have been enormous improvements in the performance of SAT solvers in

the last 20 years. Motivated by these signiﬁcant improvements, SAT solvers have been applied

to an increasing number of areas, including bounded model checking [

BCC+03

CBRZ01

cryptology [EPV08], and even bioinformatics [LM06], to name just a few.

Stochastic local search (SLS) is an especially successful algorithmic paradigm that many SAT

solvers employ [

BHvMW09

, Chapter 6]: There are solvers solely based on the SLS paradigm,

e. g., the solvers

probSAT

[

BS12

dimetheus

[

BM16

], and

YalSAT

[

Bie14

]; SLS has been used

in parallel solvers, e. g.,

Plingeling

[

Bie17

]; and is nowadays even a standard component of

sequential conﬂict-driven clause learning (CDCL) solvers, for example of

ReasonLS

[

CZ18

CaDiCaL

[

BFFH20

], the

Relaxed∗

family of solvers [

CZ19

CZ21

Kissat

[

BFH21

], newer

versions of

CryptoMiniSat

[

SNC09

], and

MergeSat

[

Man21

]. In [

CZFB22

], Cai et al. tightly

integrated SLS with three CDCL solvers, which signiﬁcantly increased performance. The SLS

paradigm is furthermore frequently employed in solving MaxSAT (see e. g., [BBJM21]).

Broadly speaking, SLS solvers operate on complete assignments for a formula

. These

solvers are started with a randomly generated complete initial assignment

. If

satisﬁes

, a

solution is found. Otherwise, the SLS solver explores the neighborhood of the current assignment

by repeatedly ﬂipping the value of some variable in the assignment when this variable is chosen

according to some underlying heuristic (e. g., aiming to minimize the number of unsatisﬁed

clauses by the assignment). That is, these solvers perform a random walk over the set of complete

assignments for the underlying formula.1

The success of SLS solvers is demonstrated by

probSAT

[

BS12

dimetheus

[

BM16

], and

YalSAT

[

Bie17

], winning several gold medals in the random track of previous SAT competitions.

SLS algorithms are also of interest from a theoretical perspective. For example, Sch¨oning [

Sch02

]

describes an algorithm (called

SRWA

in the following) with an appealing worst-case guarantee.

Furthermore, we ﬁrmly believe that a better understanding of SLS will help in the design of

future CDCL–SLS hybrids.

1.1 Studying Runtime Distributions

Although SLS algorithms are highly successful in solving SAT instances, as witnessed by their

comparatively low mean runtime, they often show a high variation in the runtime required to

solve a ﬁxed instance over repeated runs. However, measures like the mean or the variance cannot

capture the long-tailed behavior of diﬃcult instances. Some authors (e. g., [

FRV97

GS97

RF97

])

thus shifted their focus to studying the runtime distributions of search algorithms, which helps

to understand these methods better and draw meaningful conclusions for the design of new

algorithms.

A relatively new algorithmic technique is considering modiﬁed versions of the input problem.

For example, in the mixed integer programming community, it is known that the performance

is sensitive to the used modiﬁcation [

LRV16

]. A similar approach is also employed in some

backtracking SAT solvers (known as CDCL solvers [

MS96

MMZ+01

]) that learn additional

In contrast to CDCL solvers and resolution, which are complete algorithms that can prove the unsatisﬁability

of a formula in a ﬁnite amount of steps, SLS solvers are incomplete, i. e., in general, they cannot output the

solution in a ﬁnite number of steps.

Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨

orz

information during their run. However, all successful SLS SAT solvers of the last decades work

on the original, unmodiﬁed instance.

In [

LW20

], the authors investigated the eﬀect of modifying the input instance for SLS SAT

solvers. More speciﬁcally, they changed the input instance by adding new, logically equivalent

clauses to the problem. For this, a new solver, called

GapSAT

, was introduced. This new solver

is based on

probSAT

and uses the addition of new clauses as a preprocessing step, thus, yielding

a terraformed landscape. A comprehensive experimental evaluation found statistical evidence

that the performance of

probSAT

substantially increased with this modiﬁcation technique.

However, the authors pointed to the fact that for some instances, the performance slightly

deteriorated when

probSAT

had access to these additional clauses, albeit all of them contained

useful information.

These experiments motivate to study the technique of adding new clauses in more detail.

In particular, it seems worthwhile to obtain a better understanding of the phenomenon that

adding new clauses improves the mean runtime, but there exist instances where adding clauses

can harm the performance of SLS.

Motivated by that, this work centers around studying the behavior of SLS solvers when these

solvers work on formulas that were extended by logical consequences of the initial formulation.

1.2 Our Results

1.2.1 Hardness Distribution

We study the runtime (or, more precisely, hardness) distribution of several SLS algorithms when

logical implications are added to an original formula. Central to all our investigations is the

basic elementary algorithm

Alfa

, that we introduce in this work. This algorithm is speciﬁcally

constructed in such a way that it is convenient to construct mathematical arguments after an

initial empirical analysis.

Our empirical evaluations suggest that the hardness distribution is long-tailed (called the

Weak Conjecture

). In fact, a stronger statement can be deduced: The data indicate that

the distribution follows a Johnson SB distribution (called the

Strong Conjecture

). We also

empirically show for our setting that this distribution converges to a lognormal distribution.

Since lognormal distributions are long-tailed, it is thus already established that if the Strong

Conjecture is true, the algorithm can be improved by restarts [

Lor18

]. We extend this result to

the case in which the Weak Conjecture is true: That is, we theoretically prove that restarts are

useful for the larger class of algorithms that exhibit a long-tailed distribution.

1.2.2 Theoretical Arguments for the Hardness Distribution

It should be highlighted how good the Johnson SB ﬁt is for the observed data. The distribution

describes both typical and exceedingly low or high values exceptionally accurately. Only a

marginal absolute and relative error between the ﬁts and the observations can be observed.

Moreover, this is true for all considered problem domains.

It is extraordinary that a simple parameterized distribution accurately describes the runtime

behavior of an entire group of algorithms (SLS solvers) on various domains. Since such behavior

is unlikely due to chance, we are pursuing theoretical explanations for this phenomenon. We

succeed in showing that the hardness distribution for the special case of Sch¨oning’s random

walk algorithm

SRWA

is indeed approximately Johnson SB distributed, conﬁrming the Strong

Conjecture in practice. To the best of our knowledge, there are no comparable works deriving

the runtime distribution for the full support.

Towards an Understanding of Long-Tailed Runtimes Jan-Hendrik Lorenz & Florian W¨

orz

1.3 Previous Work on Runtime Distributions

Before continuing, we proceed to report on related work regarding the analysis of runtime

distributions. We include here related work showing why knowledge of runtime distributions, as

we obtain it in this work, is immensely valuable.

The study [

FRV97

] presented empirical evidence for the fact that the distribution of the

eﬀort (more precisely, the number of consistency checks) required for backtracking algorithms to

solve constraint satisfaction problems randomly generated at the 50 % satisﬁable point can be

approximated by the Weibull distribution (in the satisﬁable case) and the lognormal distribution

(in the unsatisﬁable case). These results were later extended to a wider region around the

50 % satisﬁable point [

RF97

]. It should be emphasized that this study created all instances

using the same generation model. This resulted in the creation of similar yet logically non-

equivalent formulas. We, however, ﬁrstly use diﬀerent models to rule out any inﬂuence of the

generation model and secondly generate logically equivalent modiﬁcations of a base instance

(see Algorithm 1). This approach lends itself to the analysis of existing SLS solvers, like

GapSAT

The signiﬁcant advantage is that the conducted work is not lost in the case of a restart: only

the logically equivalent instance could be changed while keeping the current assignment.

The runtime distributions of CDCL solvers was studies in [

KLW22

]. The authors empiri-

cally demonstrated that Weibull mixture distributions can accurately describe the multimodal

distributions found. They concluded that adding new clauses to a base instance has an inherent

eﬀect of making runtimes long-tailed.

In [

GSCK00

], the cost proﬁles of combinatorial search procedures were studied. It was shown

that they are often characterized by Pareto-L´evy distributions and empirically demonstrated

how rapid randomized restarts can eliminate tail behavior. We, however, theoretically prove the

eﬀectiveness of restarts for the larger class of long-tailed distributions.

The paper [

ATC13

] studied the solvers

Sparrow

and

CCASAT

and found that the lognormal

distribution is a good ﬁt for the runtime distributions of randomly generated instances. For

this, the Kolmogorov–Smirnov statistic

supt∈R|ˆ

(

)

−F

(

)

was used. Although the KS-test

is very versatile, this comes with the disadvantage that its statistical power is relatively low.

The KS statistic is also nearly useless in the tails of a distribution: A high relative deviation

of the empirical from the theoretical cumulative distribution function in either tail results in

a very small absolute deviation. It should also be remarked that the paper studies only few

formulas in just two domains, ten randomly generated and nine crafted. Our work addressed

both shortcomings of this paper: The

χ2

-test gives equal importance to the goodness-of-ﬁt

over the full support, and various instance domain models (both theoretical and applied) are

considered in this paper.

We want to stress the fact that studies on the runtime distribution of algorithms are quite

sparse, even though knowledge of the runtime distribution of an algorithm is extremely valuable:

•

Intuitively speaking, if the distribution is long-tailed, one knows there is a risk of ending

in the tail and experiencing very long runs; simultaneously, the knowledge that the time

the algorithm used thus far is in the tail of the distribution can be exploited to restart the

procedure (and create a new logically equivalent instance

F(2)

). We rigorously prove this

statement for all long-tailed algorithms.

•

Given the distribution of an algorithm’s sequential runtime, it was shown in [

ATC13

] how

to predict and quantify the algorithm’s expected speedup due to parallelization.

•

If the hardness distribution is known, experiments with a small number of instances can

lead to parameter estimations of the underlying distribution [FRV97].

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

TowardsanUnderstandingofLong-TailedRuntimesofSLSAlgorithmsJan-HendrikLorenzandFlorianWorzUniversitatUlm,Germanyjan-hendrik.lorenz@alumni.uni-ulm.de,florian.woerz@uni-ulm.deOctober25,2022AbstractThesatisabilityproblem(SAT)isoneofthemostfamousproblemsincomputerscience.Traditionally,itsNP-completene...

展开>> 收起<<

Towards an Understanding of Long-Tailed Runtimes of SLS Algorithms Jan-Hendrik Lorenz.pdf

共50页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Towards an Understanding of Long-Tailed Runtimes of SLS Algorithms Jan-Hendrik Lorenz

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: