A nearly optimal randomized algorithm for explorable heap selection Sander Borst1 Daniel Dadush1 Sophie Huiberts2 and Danish Kashaev1

2025-04-27 0 0 660.41KB 20 页 10玖币

侵权投诉

A nearly optimal randomized algorithm for explorable

heap selection

Sander Borst∗1, Daniel Dadush∗1, Sophie Huiberts2, and Danish Kashaev1

1Centrum Wiskunde & Informatica (CWI), Amsterdam

2Columbia University, New York

{sander.borst,dadush,danish.kashaev}@cwi.nl,sophie@huiberts.me

September 12, 2024

Abstract

Explorable heap selection is the problem of selecting the nth smallest value in a binary

heap. The key values can only be accessed by traversing through the underlying inﬁnite

binary tree, and the complexity of the algorithm is measured by the total distance traveled in

the tree (each edge has unit cost). This problem was originally proposed as a model to study

search strategies for the branch-and-bound algorithm with storage restrictions by Karp, Saks

and Widgerson (FOCS ’86), who gave deterministic and randomized n·exp(O(√log n))

time algorithms using O(log(n)2.5)and O(√log n)space respectively. We present a new

randomized algorithm with running time O(nlog(n)3)against an oblivious adversary using

O(log n)space, substantially improving the previous best randomized running time at the

expense of slightly increased space usage. We also show an Ω(log(n)n/ log(log(n))) lower

bound for any algorithm that solves the problem in the same amount of space, indicating

that our algorithm is nearly optimal.

1 Introduction

Many important problems in theoretical computer science are fundamentally search problems.

The objective of these problems is to ﬁnd a certain solution from the search space. In this

paper we analyze a search problem that we call explorable heap selection. The problem is related

to the famous branch-and-bound algorithm and was originally proposed by Karp, Saks and

Widgerson [KSW86] to model node selection for branch-and-bound with low space-complexity.

Furthermore, as we will explain later, the problem remains practically relevant to branch-and-

bound even in the full space setting.

The explorable heap selection problem1is an online graph exploration problem for an agent

on a rooted (possibly inﬁnite) binary tree. The nodes of the tree are labeled by distinct real

numbers (the key values) that increase along every path starting from the root. The tree can

thus be thought of as a min-heap. Starting at the root, the agent’s objective is to select the nth

smallest value in the tree while minimizing the distance traveled, where each edge of the tree has

∗This project has received funding from the European Research Council (ERC) under the European Union’s

Horizon 2020 research and innovation programme (grant agreement QIP–805241)

1[KSW86] did not give the problem a name, so we have attempted to give a descriptive one here.

arXiv:2210.05982v2 [cs.DS] 11 Sep 2024

unit travel cost. The key value of a node is only revealed when the agent visits it, and thus the

problem has an online nature. When the agent learns the key value of a node, it still does not

know the rank of this value.

The selection problem for ordinary heaps, which allow for random access (i.e., jumping to

arbitrary nodes in the tree for “free”), has also been studied. In this model, it was shown

by [Fre93] that selecting the nth minimum can be achieved deterministically in O(n)time using

O(n)workspace. We note that in both models, Ω(n)is a natural lower bound. This is because

verifying that a value Lis the nth minimum requires Θ(n)time – one must at least inspect the

nnodes with value at most L– which can be done via straightforward depth-ﬁrst search.

A simple selection strategy is to use the best-ﬁrst rule2, which repeatedly explores the un-

explored node whose parent has the smallest key value. While this rule is optimal in terms of

the number of nodes that it explores, namely Θ(n), the distance traveled by the agent can be

far from optimal. In the worst-case, an agent using this rule will need to travel a distance of

Θ(n2)to ﬁnd the nth smallest value. A simple bad example for this rule is to consider a rooted

tree consisting of two paths (which one can extend to a binary tree), where the two paths are

consecutively labeled by all positive even and odd integers respectively. Moreover, the space

complexity becomes Ω(n)in general when using the best-ﬁrst rule, because essentially all the

explored nodes might need to be kept in memory. We note that irrespective of computational

considerations on the agent, either in terms of working memory or running time restrictions,

minimizing the total travel distance in explorable heap selection remains a challenging online

problem.

Improving on the best-ﬁrst strategy, Karp, Saks and Wigderson [KSW86] gave a randomized

algorithm with expected cost n·exp(O(plog(n))) using O(plog(n)) working space. They also

showed how to make the algorithm deterministic using O(log(n)2.5)space. In this work, our

main contribution is an improved randomized algorithm with expected cost O(nlog(n)3)using

O(log(n)) space. Given the Ω(n)lower bound, our travel cost is optimal up to logarithmic factors.

Furthermore we show that any algorithm for explorable heap selection that uses only sunits of

memory, must take at least n·logs(n)time in expectation. An interesting open problem is the

question whether a superlinear lower bound also holds without any restriction on the memory

usage.

To clarify the memory model, it is assumed that any key value and O(log n)bit integer can be

stored using O(1) space. We also assume that maintaining the current position in the tree does

not take up memory. Furthermore, we assume that key value comparisons and moving across

an edge of the tree require O(1) time. Under these assumptions, the running times of the above

algorithms happen to be proportional to their travel cost. Throughout the paper, we will thus

use travel cost and running time interchangeably.

Motivation The motivation to look at this problem comes from the branch-and-bound algo-

rithm. This is a well-known algorithm that can be used for solving many types of problems.

In particular, it is often used to solve integer linear programs (IPs), which are of the form

arg min{c⊤x:x∈Zn, Ax ≤b}. In that setting, branch-and-bound works by ﬁrst solving the

linear programming (LP) relaxation, which does not have integrality constraints. The value of

the solution to the relaxation forms a lower bound on the objective value of the original prob-

lem. Moreover, if this solution only has integral components, it is also optimal for the original

problem. Otherwise, the algorithm chooses a component xifor which the solution value ˆxiis

not integral. It then creates two new subproblems, by either adding the constraint xi≤ ⌊ˆxi⌋or

xi≥ ⌈ˆxi⌉. This operation is called branching. The tree of subproblems, in which the children of

2Frederickson’s algorithm [Fre93] is in fact a highly optimized variant of this rule

a problem are created by the branching operation, is called the branch-and-bound tree. Because

a subproblem contains more constraints than its parent, its objective value is greater or equal

to the one of its parent. The algorithm can also be used to solve mixed-integer linear programs

(MIPs), where some of the variables are allowed to be continuous.

At the core, the algorithm consists of two important components: the branching rule and the

node selection rule. The branching rule determines how to split up a problem into subproblems,

by choosing a variable to branch on. Substantial research has been done on branching rules, see,

e.g., [LS99, AKM05, LZ17, BDSV18].

The node selection rule decides which subproblem to solve next. Not much theoretical research

has been done on the choice of the node selection rule. Traditionally, the best-ﬁrst strategy is

thought to be optimal from a theoretical perspective because this rule minimizes the number

of nodes that need to be visited. However, a disadvantage of this rule is that searches using it

might use space proportional to the number of explored nodes, because all of them need to be

kept in memory. In contrast to this, a simple strategy like depth-ﬁrst search only needs to store

the current solution. Unfortunately, performing a depth-ﬁrst search can lead to an arbitrarily

bad running time. This was the original motivation for introducing the explorable heap selection

problem [KSW86]. By guessing the number Nof branch-and-bound nodes whose LP values are

at most that of the optimal IP solution (which can be done via successive doubling), a search

strategy for this problem can be directly interpreted as a node selection rule. The algorithm

that they introduced can therefore be used to implement branch-and-bound eﬃciently in only

Oplog(N)space.

Nowadays, computers have a lot of memory available. This usually makes it feasible to store

all explored nodes of the branch-and-bound tree in memory. However, many MIP-solvers still

make use of a hybrid method that consists of both depth-ﬁrst and best-ﬁrst searches. This is

not only done because depth-ﬁrst search uses less memory, but also because it is often faster.

Experimental studies have conﬁrmed that the depth-ﬁrst strategy is in many cases faster than

best-ﬁrst one [CP99]. This seems contradictory, because the running time of best-ﬁrst search is

often thought to be theoretically optimal.

In part, this contradiction can be explained by the fact that actual IP-solvers often employ

complementary techniques and heuristics on top of branch-and-bound, which might beneﬁt from

depth-ﬁrst searches. Additionally, a best-ﬁrst search can hop between diﬀerent parts of the tree,

while a depth ﬁrst search subsequently explores nodes that are very close to each other. In the

latter case, the LP-solver can start from a very similar state, which is known as warm starting.

This is faster for a variety of technical reasons [Ach09]. For example, this can be the case when

the LP-solver makes use of the LU-factorization of the optimal basis matrix [MJSS16]. Through

the use of dynamic algorithms, computing this can be done faster if a factorization for a similar

LP-basis is known [SS93]. Because of its large size, MIP-solvers will often not store the LU-

factorization for all nodes in the tree. This makes it beneﬁcial to move between similar nodes

in the branch-and-bound tree. Furthermore, moving from one part of the tree to another means

that the solver needs to undo and redo many bound changes, which also takes up time. Hence,

the amount of distance traveled between nodes in the tree is a metric that inﬂuences the running

time. This can also be observed when running the academic MIP-solver SCIP [Gle22].

The explorable heap selection problem captures these beneﬁts of locality by measuring the

running time in terms of the amount of travel through the tree. Therefore, we argue that this

problem is still relevant for the choice of a node selection rule, even if all nodes can be stored in

memory.

Related work The explorable heap selection problem was ﬁrst introduced in [KSW86]. Their

result was later applied to prove an upper bound on the parallel running time of branch-and-

bound [PPSV15].

When random access to the heap is provided at constant cost, selecting the n’th value in the

heap can be done by a deterministic algorithm in O(n)time by using an additional O(n)memory

for auxilliary data structures [Fre93].

The explorable heap selection problem can be thought of as a search game [AG03] and

bears some similarity to the cow path problem. In the cow path problem, an agent explores

an unweighted unlabeled graph in search of a target node. The location of the target node is

unknown, but when the agent visits a node they are told whether or not that node is the target.

The performance of an algorithm is judged by the ratio of the number of visited nodes to the

distance of the target from the agent’s starting point. In both the cow path problem and the

explorable heap selection problem, the cost of backtracking and retracing paths is an important

consideration. The cow path problem on inﬁnite b-ary trees was studied in [DCD95] under the

assumption that when present at a node the agent can obtain an estimate on that node’s distance

to the target.

Other explorable graph problems exist without a target, where typically the graph itself is

unknown at the outset. There is an extensive literature on exploration both in graphs and in

the plane [Ber98, Tho06]. In some of the used models the objective is to minimize the distance

traveled [BCGL23, BDHS23, MMS12, KP94]. Other models are about minimizing the amount

of used memory [DFKP04]. What distinguises the explorable heap selection problem from these

problems is the information that the graph is a heap and that the ordinal of the target is known.

This can allow an algorithm to rule out certain locations for the target. Because of this additional

information, the techniques used here do not seem to be applicable to these other problems.

Outline In Section 2 we formally introduce the explorable heap selection problem and any

notation we will use. In Section 3 we introduce a new algorithm for solving this problem and

provide a running time analysis. In Section 4 we give a lower bound on the complexity of solving

explorable heap selection using a limited amount of memory.

2 The explorable heap selection problem

We introduce in this section the formal model for the explorable heap selection problem. The

input to the algorithm is an inﬁnite binary tree T= (V, E), where each node v∈Vhas an asso-

ciated real value, denoted by val(v)∈R. We assume that all the values are distinct. Moreover,

for each node in the tree, the values of its children are larger than its own value. Hence, for every

v1, v2∈Vsuch that v1is an ancestor of v2, we have that val(v2)>val(v1). The binary tree T

is thus a heap.

The algorithmic problem we are interested in is ﬁnding the nth smallest value in this tree.

This may be seen as an online graph exploration problem where an agent can move in the tree

and learns the value of a node each time they explore it. At each time step, the agent resides at

a vertex v∈Vand may decide to move to either the left child, the right child or the parent of

v(if it exists, i.e. if vis not the root of the tree). Each traversal of an edge costs one unit of

time, and the complexity of an algorithm for this problem is thus measured by the total traveled

distance in the binary tree. The algorithm is also allowed to store values in memory.

We now introduce a few notations used throughout the paper.

•For a node v∈V, also per abuse of notation written v∈T, we denote by T(v)the subtree

of Trooted at v.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

AnearlyoptimalrandomizedalgorithmforexplorableheapselectionSanderBorst∗1,DanielDadush∗1,SophieHuiberts2,andDanishKashaev11CentrumWiskunde&Informatica(CWI),Amsterdam2ColumbiaUniversity,NewYork{sander.borst,dadush,danish.kashaev}@cwi.nl,sophie@huiberts.meSeptember12,2024AbstractExplorableheapselection...

展开>> 收起<<

A nearly optimal randomized algorithm for explorable heap selection Sander Borst1 Daniel Dadush1 Sophie Huiberts2 and Danish Kashaev1.pdf

共20页,预览4页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

A nearly optimal randomized algorithm for explorable heap selection Sander Borst1 Daniel Dadush1 Sophie Huiberts2 and Danish Kashaev1

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: