A nearly optimal randomized algorithm for explorable heap selection Sander Borst1 Daniel Dadush1 Sophie Huiberts2 and Danish Kashaev1

2025-04-27 0 0 660.41KB 20 页 10玖币
侵权投诉
A nearly optimal randomized algorithm for explorable
heap selection
Sander Borst1, Daniel Dadush1, Sophie Huiberts2, and Danish Kashaev1
1Centrum Wiskunde & Informatica (CWI), Amsterdam
2Columbia University, New York
{sander.borst,dadush,danish.kashaev}@cwi.nl,sophie@huiberts.me
September 12, 2024
Abstract
Explorable heap selection is the problem of selecting the nth smallest value in a binary
heap. The key values can only be accessed by traversing through the underlying infinite
binary tree, and the complexity of the algorithm is measured by the total distance traveled in
the tree (each edge has unit cost). This problem was originally proposed as a model to study
search strategies for the branch-and-bound algorithm with storage restrictions by Karp, Saks
and Widgerson (FOCS ’86), who gave deterministic and randomized n·exp(O(log n))
time algorithms using O(log(n)2.5)and O(log n)space respectively. We present a new
randomized algorithm with running time O(nlog(n)3)against an oblivious adversary using
O(log n)space, substantially improving the previous best randomized running time at the
expense of slightly increased space usage. We also show an Ω(log(n)n/ log(log(n))) lower
bound for any algorithm that solves the problem in the same amount of space, indicating
that our algorithm is nearly optimal.
1 Introduction
Many important problems in theoretical computer science are fundamentally search problems.
The objective of these problems is to find a certain solution from the search space. In this
paper we analyze a search problem that we call explorable heap selection. The problem is related
to the famous branch-and-bound algorithm and was originally proposed by Karp, Saks and
Widgerson [KSW86] to model node selection for branch-and-bound with low space-complexity.
Furthermore, as we will explain later, the problem remains practically relevant to branch-and-
bound even in the full space setting.
The explorable heap selection problem1is an online graph exploration problem for an agent
on a rooted (possibly infinite) binary tree. The nodes of the tree are labeled by distinct real
numbers (the key values) that increase along every path starting from the root. The tree can
thus be thought of as a min-heap. Starting at the root, the agent’s objective is to select the nth
smallest value in the tree while minimizing the distance traveled, where each edge of the tree has
This project has received funding from the European Research Council (ERC) under the European Union’s
Horizon 2020 research and innovation programme (grant agreement QIP–805241)
1[KSW86] did not give the problem a name, so we have attempted to give a descriptive one here.
1
arXiv:2210.05982v2 [cs.DS] 11 Sep 2024
unit travel cost. The key value of a node is only revealed when the agent visits it, and thus the
problem has an online nature. When the agent learns the key value of a node, it still does not
know the rank of this value.
The selection problem for ordinary heaps, which allow for random access (i.e., jumping to
arbitrary nodes in the tree for “free”), has also been studied. In this model, it was shown
by [Fre93] that selecting the nth minimum can be achieved deterministically in O(n)time using
O(n)workspace. We note that in both models, Ω(n)is a natural lower bound. This is because
verifying that a value Lis the nth minimum requires Θ(n)time – one must at least inspect the
nnodes with value at most L– which can be done via straightforward depth-first search.
A simple selection strategy is to use the best-first rule2, which repeatedly explores the un-
explored node whose parent has the smallest key value. While this rule is optimal in terms of
the number of nodes that it explores, namely Θ(n), the distance traveled by the agent can be
far from optimal. In the worst-case, an agent using this rule will need to travel a distance of
Θ(n2)to find the nth smallest value. A simple bad example for this rule is to consider a rooted
tree consisting of two paths (which one can extend to a binary tree), where the two paths are
consecutively labeled by all positive even and odd integers respectively. Moreover, the space
complexity becomes Ω(n)in general when using the best-first rule, because essentially all the
explored nodes might need to be kept in memory. We note that irrespective of computational
considerations on the agent, either in terms of working memory or running time restrictions,
minimizing the total travel distance in explorable heap selection remains a challenging online
problem.
Improving on the best-first strategy, Karp, Saks and Wigderson [KSW86] gave a randomized
algorithm with expected cost n·exp(O(plog(n))) using O(plog(n)) working space. They also
showed how to make the algorithm deterministic using O(log(n)2.5)space. In this work, our
main contribution is an improved randomized algorithm with expected cost O(nlog(n)3)using
O(log(n)) space. Given the Ω(n)lower bound, our travel cost is optimal up to logarithmic factors.
Furthermore we show that any algorithm for explorable heap selection that uses only sunits of
memory, must take at least n·logs(n)time in expectation. An interesting open problem is the
question whether a superlinear lower bound also holds without any restriction on the memory
usage.
To clarify the memory model, it is assumed that any key value and O(log n)bit integer can be
stored using O(1) space. We also assume that maintaining the current position in the tree does
not take up memory. Furthermore, we assume that key value comparisons and moving across
an edge of the tree require O(1) time. Under these assumptions, the running times of the above
algorithms happen to be proportional to their travel cost. Throughout the paper, we will thus
use travel cost and running time interchangeably.
Motivation The motivation to look at this problem comes from the branch-and-bound algo-
rithm. This is a well-known algorithm that can be used for solving many types of problems.
In particular, it is often used to solve integer linear programs (IPs), which are of the form
arg min{cx:xZn, Ax b}. In that setting, branch-and-bound works by first solving the
linear programming (LP) relaxation, which does not have integrality constraints. The value of
the solution to the relaxation forms a lower bound on the objective value of the original prob-
lem. Moreover, if this solution only has integral components, it is also optimal for the original
problem. Otherwise, the algorithm chooses a component xifor which the solution value ˆxiis
not integral. It then creates two new subproblems, by either adding the constraint xi≤ ⌊ˆxior
xi≥ ⌈ˆxi. This operation is called branching. The tree of subproblems, in which the children of
2Frederickson’s algorithm [Fre93] is in fact a highly optimized variant of this rule
2
a problem are created by the branching operation, is called the branch-and-bound tree. Because
a subproblem contains more constraints than its parent, its objective value is greater or equal
to the one of its parent. The algorithm can also be used to solve mixed-integer linear programs
(MIPs), where some of the variables are allowed to be continuous.
At the core, the algorithm consists of two important components: the branching rule and the
node selection rule. The branching rule determines how to split up a problem into subproblems,
by choosing a variable to branch on. Substantial research has been done on branching rules, see,
e.g., [LS99, AKM05, LZ17, BDSV18].
The node selection rule decides which subproblem to solve next. Not much theoretical research
has been done on the choice of the node selection rule. Traditionally, the best-first strategy is
thought to be optimal from a theoretical perspective because this rule minimizes the number
of nodes that need to be visited. However, a disadvantage of this rule is that searches using it
might use space proportional to the number of explored nodes, because all of them need to be
kept in memory. In contrast to this, a simple strategy like depth-first search only needs to store
the current solution. Unfortunately, performing a depth-first search can lead to an arbitrarily
bad running time. This was the original motivation for introducing the explorable heap selection
problem [KSW86]. By guessing the number Nof branch-and-bound nodes whose LP values are
at most that of the optimal IP solution (which can be done via successive doubling), a search
strategy for this problem can be directly interpreted as a node selection rule. The algorithm
that they introduced can therefore be used to implement branch-and-bound efficiently in only
Oplog(N)space.
Nowadays, computers have a lot of memory available. This usually makes it feasible to store
all explored nodes of the branch-and-bound tree in memory. However, many MIP-solvers still
make use of a hybrid method that consists of both depth-first and best-first searches. This is
not only done because depth-first search uses less memory, but also because it is often faster.
Experimental studies have confirmed that the depth-first strategy is in many cases faster than
best-first one [CP99]. This seems contradictory, because the running time of best-first search is
often thought to be theoretically optimal.
In part, this contradiction can be explained by the fact that actual IP-solvers often employ
complementary techniques and heuristics on top of branch-and-bound, which might benefit from
depth-first searches. Additionally, a best-first search can hop between different parts of the tree,
while a depth first search subsequently explores nodes that are very close to each other. In the
latter case, the LP-solver can start from a very similar state, which is known as warm starting.
This is faster for a variety of technical reasons [Ach09]. For example, this can be the case when
the LP-solver makes use of the LU-factorization of the optimal basis matrix [MJSS16]. Through
the use of dynamic algorithms, computing this can be done faster if a factorization for a similar
LP-basis is known [SS93]. Because of its large size, MIP-solvers will often not store the LU-
factorization for all nodes in the tree. This makes it beneficial to move between similar nodes
in the branch-and-bound tree. Furthermore, moving from one part of the tree to another means
that the solver needs to undo and redo many bound changes, which also takes up time. Hence,
the amount of distance traveled between nodes in the tree is a metric that influences the running
time. This can also be observed when running the academic MIP-solver SCIP [Gle22].
The explorable heap selection problem captures these benefits of locality by measuring the
running time in terms of the amount of travel through the tree. Therefore, we argue that this
problem is still relevant for the choice of a node selection rule, even if all nodes can be stored in
memory.
3
Related work The explorable heap selection problem was first introduced in [KSW86]. Their
result was later applied to prove an upper bound on the parallel running time of branch-and-
bound [PPSV15].
When random access to the heap is provided at constant cost, selecting the n’th value in the
heap can be done by a deterministic algorithm in O(n)time by using an additional O(n)memory
for auxilliary data structures [Fre93].
The explorable heap selection problem can be thought of as a search game [AG03] and
bears some similarity to the cow path problem. In the cow path problem, an agent explores
an unweighted unlabeled graph in search of a target node. The location of the target node is
unknown, but when the agent visits a node they are told whether or not that node is the target.
The performance of an algorithm is judged by the ratio of the number of visited nodes to the
distance of the target from the agent’s starting point. In both the cow path problem and the
explorable heap selection problem, the cost of backtracking and retracing paths is an important
consideration. The cow path problem on infinite b-ary trees was studied in [DCD95] under the
assumption that when present at a node the agent can obtain an estimate on that node’s distance
to the target.
Other explorable graph problems exist without a target, where typically the graph itself is
unknown at the outset. There is an extensive literature on exploration both in graphs and in
the plane [Ber98, Tho06]. In some of the used models the objective is to minimize the distance
traveled [BCGL23, BDHS23, MMS12, KP94]. Other models are about minimizing the amount
of used memory [DFKP04]. What distinguises the explorable heap selection problem from these
problems is the information that the graph is a heap and that the ordinal of the target is known.
This can allow an algorithm to rule out certain locations for the target. Because of this additional
information, the techniques used here do not seem to be applicable to these other problems.
Outline In Section 2 we formally introduce the explorable heap selection problem and any
notation we will use. In Section 3 we introduce a new algorithm for solving this problem and
provide a running time analysis. In Section 4 we give a lower bound on the complexity of solving
explorable heap selection using a limited amount of memory.
2 The explorable heap selection problem
We introduce in this section the formal model for the explorable heap selection problem. The
input to the algorithm is an infinite binary tree T= (V, E), where each node vVhas an asso-
ciated real value, denoted by val(v)R. We assume that all the values are distinct. Moreover,
for each node in the tree, the values of its children are larger than its own value. Hence, for every
v1, v2Vsuch that v1is an ancestor of v2, we have that val(v2)>val(v1). The binary tree T
is thus a heap.
The algorithmic problem we are interested in is finding the nth smallest value in this tree.
This may be seen as an online graph exploration problem where an agent can move in the tree
and learns the value of a node each time they explore it. At each time step, the agent resides at
a vertex vVand may decide to move to either the left child, the right child or the parent of
v(if it exists, i.e. if vis not the root of the tree). Each traversal of an edge costs one unit of
time, and the complexity of an algorithm for this problem is thus measured by the total traveled
distance in the binary tree. The algorithm is also allowed to store values in memory.
We now introduce a few notations used throughout the paper.
For a node vV, also per abuse of notation written vT, we denote by T(v)the subtree
of Trooted at v.
4
摘要:

AnearlyoptimalrandomizedalgorithmforexplorableheapselectionSanderBorst∗1,DanielDadush∗1,SophieHuiberts2,andDanishKashaev11CentrumWiskunde&Informatica(CWI),Amsterdam2ColumbiaUniversity,NewYork{sander.borst,dadush,danish.kashaev}@cwi.nl,sophie@huiberts.meSeptember12,2024AbstractExplorableheapselection...

展开>> 收起<<
A nearly optimal randomized algorithm for explorable heap selection Sander Borst1 Daniel Dadush1 Sophie Huiberts2 and Danish Kashaev1.pdf

共20页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:20 页 大小:660.41KB 格式:PDF 时间:2025-04-27

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 20
客服
关注