Improved Bi-point Rounding Algorithms and a Golden Barrier for k-Median

2025-05-08 0 0 1.98MB 24 页 10玖币
侵权投诉
IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN
BARRIER FOR k-MEDIAN
Kishen N Gowda
University of Maryland, College Park, US
kishen19@cs.umd.edu
Thomas Pensyl
tommy.pensyl@gmail.com
Aravind Srinivasan
University of Maryland, College Park, US
srin@cs.umd.edu
Khoa Trinh
Google
khoatrinh@google.com
ABSTRACT
The current best approximation algorithms for k-median rely on first obtaining a structured fractional
solution known as a bi-point solution, and then rounding it to an integer solution. We improve this
second step by unifying and refining previous approaches. We describe a hierarchy of increasingly-
complex partitioning schemes for the facilities, along with corresponding sets of algorithms and
factor-revealing non-linear programs. We prove that the third layer of this hierarchy is a 2.613-
approximation, improving upon the current best ratio of 2.675, while no layer can be proved better
than 2.588 under the proposed analysis.
On the negative side, we give a family of bi-point solutions which cannot be approximated better
than the square root of the golden ratio, even if allowed to open k+o(k)facilities. This gives a
barrier to current approaches for obtaining an approximation better than 2φ2.544. Altogether
we reduce the approximation gap of bi-point solutions by two thirds.
1 Introduction
We study the classical NP-hard k-median problem. Given a set of clients C, a set of facilities F, and a distance metric
dover C ∪F, the goal is to open a subset Sof kfacilities which minimizes the total distance from each client to its
closest facility in S. In addition to their natural applications in facility location (e.g., in the opening of health-care
facilities), such problems also naturally model clustering in various ways.
1.1 Related Work
Three main approaches have been applied toward constant factor approximations for k-median. The first is LP-
rounding, using a half-integral solution as an intermediate step. Charikar et al. [5] used this approach to give the
first constant-factor approximation of 62
3. Charikar and Li [6] later refined this approach with dependent rounding to
achieve a factor of 3.25.
The second approach is local search. Arya et al. [2] proved that the natural local search algorithm with 2
simultaneous
swaps is a tight (3 + )-approximation. Their analysis was later nicely simplified by Gupta and Tangwongsan [9].
Very recently, Cohen-Addad et al. [8] showed that this could actually be improved by using an auxiliary cost function
which discounts clients with multiple nearby open facilities. They proved that this algorithm with O(1
)ˆ(1
)ˆ(1
)
simultaneous swaps achieves an approximation factor of 2.836 + .
The third approach, and the one we focus on for the rest of the paper, is to first generate an intermediate fractional form
called a bi-point solution, and then round it to an integral solution. These two steps are usually analyzed independently,
each incurring a multiplicative loss in the approximation factor, which we will refer to as the bi-point generation factor
Aravind Srinivasan was supported in part by NSF grant CCF-1918749, as well as by research awards from Amazon and Google.
arXiv:2210.13395v1 [cs.DS] 24 Oct 2022
IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN
and the bi-point rounding factor respectively. Jain and Vazirani [12] introduced this approach, achieving a bi-point
generation factor of 3 via reduction to the related Uncapacitated Facility Location problem, and a bi-point rounding
factor of 2, resulting in a 6-approximation for k-median. Jain, Mahdian and Saberi [11] later reduced the bi-point
generation factor from 3to 2to get a 4-approximation; Charikar and Guha also gave an earlier 4–approximation by
analyzing both steps holistically [4].
Li and Svensson [13] improved the bi-point rounding factor to 1+3+
21.366 under the relaxation of being allowed
to open k+O1
facilities, resulting in a pseudo-approximation algorithm. Then, in a remarkable result, they showed
how to convert any α-pseudo-approximation algorithm which opens k+cfacilities into a proper (α+)-approximation
algorithm, by running it on a set of nO(c/)derived instances. This enabled conversion of their pseudo-approximation
into a (1 + 3 + )-approximation with runtime nO(1/2). Their algorithm was based on constructing stars of nearby
facilities which could be randomly rounded while still guaranteeing a nearby facility for each client.
Byrka et al. [3] refined this approach by categorizing stars by their size and relative distance, and then considering a
slew of algorithms which open facilities asymmetrically across each category. They then gave a factor-revealing non-
linear program (NLP) corresponding to taking the best of all algorithms, and solved the NLP with computer assistance
to get a bi-point rounding factor of 1.3371 + . They also re-framed the star-rounding technique in terms of bounding
positive correlation, and developed a dependent-rounding technique which reduced the number of facilities required
to k+Olog 1
, resulting in a runtime of nO(1
log 1
). They also exhibited a bi-point solution with an integrality gap
of 1+2
21.207 even when allowing for k+o(k)facilities to be opened, thus giving a new lower bound for the
bi-point rounding factor, resilient to Li and Svensson’s pseudo-approximation technique.
Lastly, during the writing of this paper, Cohen-Addad et al. [7] have released results refining the bi-point generation
factor to be slightly less than 2. The remainder of our paper does not reflect this result, though it would likely imply
some small improvement to the overall factor.
1.2 Our Work
Let us now formally define bi-point solutions and summarize the current-best bi-point generation result.
Definition 1.1 (Bi-point solution).Given a k-median instance I, a bi-point solution is a pair F1, F2⊆ F such that
|F1| ≤ k≤ |F2|, along with real numbers a, b 0,a+b= 1 such that a|F1|+b|F2|=k. The cost of this bi-point
solution is defined as aD1+bD2, where D1and D2are the total connection costs of F1and F2, respectively.
Theorem 1.2 ([11]).There exists an algorithm that, given a k-median instance, produces a bi-point solution of cost
at most 2·OP T in polynomial time, where OP T is the cost of optimal solution for the given instance.
In this paper, we will focus exclusively on improving the bi-point rounding factor, where a factor of αimmediately
implies a 2α-approximation for k-median, per Theorem 1.2.
A key ingredient in improving this factor will be the star-rounding algorithm of Li and Svensson (LS). One drawback
of their algorithm is that it requires separate algorithms when bis close to 0or 1, and this caveat propagates to any
dependent results. To avoid importing this complexity, we provide a streamlined version of LS that uses a somewhat-
sophisticated dependent-rounding procedure to remove any restriction on the value of b, by essentially unifying the
star-rounding and knapsack-type algorithms from LS. This more generic analysis requires opening slightly more fa-
cilities than LS, but we feel the gain in simplicity is worthwhile and hope others may find it similarly helpful in future
work. (Indeed, this technique would have removed the need for many edge-case algorithms and analysis in [3] as
well.)
Theorem 1.3. There exists a randomized algorithm SR that takes a bi-point solution and returns a set of
k+O1
log 1
facilities, with expected cost at most
(1 + )·[(1 b)D1+b(3 2b)D2)] .
To account for the extra facilities opened, we invoke the pseudo-approximation reduction of LS for a net run-time
of nO(1
2log 1
), and an additional additive penalty of incurred in the approximation factor. For convenience, we
generally omit the term when stating our approximation factors throughout the paper. (Alternatively, we may consider
as being absorbed into the rounding error in our non-exact approximation factors).
Now consider the previously mentioned bi-point rounding algorithms of Jain and Vazirani (JV) and of LS. JV is based
on F2-centric stars, while LS is based on F1-centric stars. In each case, the stars function to provide a worst case
backup bound for each client. Byrka et al. refined LS by partitioning stars according to a factor g(i)representing their
2
IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN
relative distance to each other. This exposed new possible backup bounds beyond those immediately provided by the
stars, thus enabling a larger variety of competing algorithms, and resulting in the improved approximation.
In this paper we apply a similar refinement, but to JV instead of LS. We find JV is more amenable to these techniques
for several reasons. First, the resulting backup bounds are more direct and thus cheaper, due to only needing to make
at most one “hop" to another facility. Second, we are able to apply the g(·)-based partitioning to all stars, instead of
just some, which also means the new backup bounds are available to all clients. Third, the resulting algorithms are
simpler and do not open more than kfacilities. This last property immediately implies that our new algorithms cannot
actually beat the integrality gap of 2. Nevertheless, when run in tandem with LS, we achieve results which appear to
completely subsume those in Byrka et al. (e.g., including their algorithms in our analysis does not improve the results
of this paper, at least experimentally).
Additionally, thanks to the simpler setting of F2-centric stars, we are able to push the technique further, defining and
analyzing a hierarchy of increasingly complex partitions based on the factor g(i). For each layer of the hierarchy,
we propose a set of candidate algorithms, and a factor-revealing NLP representing the best of all solutions. The NLP
complexity increases exponentially at each layer. However with computer-assisted methods, we are able to rigorously
prove that the third layer, in tandem with SR, provides a bi-point rounding factor of 1.3064 (improving over the
previous best factor of 1.3371). We also show the existence of a difficult instance for which our NLP cannot prove a
bi-point rounding factor smaller than 1.2943, for any layer of our hierarchy.
Theorem 1.4. There exists a randomized algorithm that, given a bi-point solution of cost aD1+bD2and  > 0,
opens at most k+O1
log 1
facilities and returns a solution of cost at most (1.3064 + )·(aD1+bD2).
Theorem 1.5. There exists a bi-point solution for which our NLP cannot prove a bi-point rounding factor better than
1.2943, for any layer of our hierarchy.
Theorem 1.2, Theorem 1.4 and the pseudo-approximation reduction of Li and Svensson [13, Theorem 4] together
imply our improved approximation factor.
Corollary 1.6. There exists a randomized algorithm that, given a k-median instance and  > 0, runs in time
nO(1
2log 1
)and returns a solution of cost at most (2.613 + )·OP T , where OP T is the cost of the optimal so-
lution of the given instance.
Lastly, we provide a bi-point solution with integrality gap φ1.272 where φis the golden ratio, even when allowing
for k+o(k)facilities to be opened. This improves on the previous gap of 1.207, giving an improved lower bound for
the bi-point approximation factor. Study of this instance inspired the algorithmic improvements in this paper; we hope
it can shed further insight into the true approximability of bi-point solutions.
Theorem 1.7. For every  > 0and C(k) = o(k), there exists a family of bi-point solutions with integrality gap
φ, even if we allow solutions that open k+C(k)facilities.
The rest of the paper is organized as follows. In Section 2, we present our bi-point rounding algorithm. Section 2.2
discusses the aforementioned variant of Li and Svensson’s star-rounding algorithm SR, Section 2.3 describes our
main hierarchy of partitioning schemes for facilities and the corresponding set of rounding algorithms, and Section 2.4
presents the NLP for obtaining the bi-point rounding factor and the results achieved for the second and third layers
of our hierarchy. In Section 3, we give a lower bound for our framework. In Section 4, we exhibit the pseudo-
approximation-resilient family of integrality-gap instances. We conclude with a discussion in Section 5.
2 Bi-point Rounding Algorithm
In this section, we define a set of bi-point rounding algorithms, and bound the total cost and facilities opened for each
algorithm. Our set of algorithms consists of a family of rounding algorithms obtained via a hierarchy of partition
schemes for the facilities, along with the star-rounding algorithm SR. Our top-level algorithm runs all of these
algorithms and returns the best solution obtained.
2.1 Preliminaries
For any client j∈ C, we use c(j)to denote the connection cost of j(to its closest open facility in S). For any set J
of clients, let c(J) := PvJc(v). We let i1(j)and i2(j)denote the closest facility to jin F1and F2, respectively.
When the context is clear, we shall drop the parameter jin the notation.
By an abuse of notation, for any facility i∈ F, we let i(and ¯
i) denote the event that facility iis opened (and closed) in
our solution (respectively). For ease of notation, we also drop the operator when joining these events. For example,
3
IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN
Pr[i1¯
i2]is the probability for the event that facility i1is opened and facility i2is closed. For any set X⊆ F, we let
σX(i)denote the closest facility to iin X(i.e., σX(i) := arg mini0Xd(i, i0)).
Lastly, we use [m]to denote the set of natural numbers {1,2,··· , m}, and [m1, m2]to denote the set {m1, m1+
1,··· , m2}. However, when we refer explicitly to the set [0,1], it means the standard set of reals between 0and 1
(inclusive).
2.2 Star-Rounding Algorithm
In this section, we define the star-rounding algorithm SR and prove Theorem 1.3, which will provide the same cost
guarantee as Li and Svensson’s algorithm, but with no restrictions on bi-point parameter b(see Definition 1.1). The
proof follows a similar structure to that of previous star-rounding algorithms[13, 3]. We will first need the following
dependent-rounding procedure from [10] 2.
Theorem 2.1 ([10, Theorem 2.1]).Symmetric Randomized Dependent Rounding. Given vectors x[0,1]nand
a= (a1, . . . , an)Rn, and tN, there exists a randomized algorithm (SRDR) which runs in expected O(n2)
time and returns a vector X[0,1]nwith at most tfractional values. Both the weighted sum and all the marginal
probabilities are preserved: PiaiXi=Piaixiwith probability one, and E[Xi] = xifor all i[n]. Let S, T be
disjoint subsets of [n]. Then we have the upper correlation bound:
E
Y
iS
XiY
jT
(1 Xj)
Y
iS
xiY
jT
(1 xj)
11/(t+1)
.(1)
This can generally be converted to a standard multiplicative (1 + )error bound by setting the parameter tto be
O1
log 1
QiSxiQjT(1xj), but in the special case below, we can actually set tindependently of the xi. This will
allow us to avoid restricting the domain of the algorithm.
Corollary 2.2. Pairwise positive correlation under uniform marginals. Suppose SRDR is run with tlog(1+1/)
log(1+).
For any distinct i, j such that xi=xj=bfor any value of b, we have
E[Xi(1 Xj)] (1 + )b(1 b).
Proof. Let S={i},T={j}. Then (1) simplifies to:
E[Xi(1 Xj)] (xi(1 xj))11/(t+1) =1
b(1 b)1/(t+1)
b(1 b).(2)
If min{b, 1b} ≥
1+, then 1
b(1b)(1+)2
and 1
t+1 log(1+)
log((1+)2/), so (2) is at most (1 + )b(1 b). Otherwise
min{b, 1b} ≤
1+, in which case we may directly bound by the marginals:
E[Xi(1 Xj)] min{E[Xi],E[1 Xj]}(3)
= min{b, 1b}=b(1 b)
1min{b, 1b}(1 + )b(1 b).
We now define the star-rounding algorithm SR. Form graph Gby drawing an edge from each facility in F2to its
closest facility in F1, resulting in a forest of F1-centric stars. For each iF1, let LiF2be the leaves of the star
rooted at i. Define vectors a, x with ai:= |Li| − 1and xi:= b, and set t:= log(1+1/)
log(1+)for some  > 0. Now run
SRDR(a, x, t)to obtain output vector X. Finally, for each iF1, open dXi|Li|e facilities uniformly at random from
Liand open iitself with probability d1Xie.
Lemma 2.3. SR opens at most k+O1
log 1
facilities, with probability one.
2This result appears exclusively in v1 of the referenced arXiv paper, but continues to be publicly available.
4
IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN
Proof. By Theorem 2.1, there are at most tfractionally-valued Xi(the rest being 0or 1). Also, Pi(|Li| − 1)Xi=
PiaiXi=Piaixi=Pi(|Li| − 1)band {Li}iF1partitions F2. Thus, the count of facilities opened is
X
iF1d1Xie+dXi|Li|eX
iF1
(1 Xi+Xi|Li|)+2t(with probability one)
=X
iF1
(1 b+b|Li|)) + 2t(with probability one)
= (1 b)|F1|+b|F2|+ 2t
=k+ 2t=k+O1
log 1
.
Lemma 2.4. For any two facilities i1F1and i2F2, we have
Pr[¯
i1]b, (4)
Pr[¯
i2]1b, (5)
Pr[¯
i1¯
i2](1 + )b(1 b).(6)
Proof. We shall use this simple fact: for any event Eand random variable Y,Pr[E|Y=y]f(y)implies that
Pr[E]E[f(Y)]. Let i3be the root of the star containing i2. Then we have that
Pr[¯
i1|Xi1=y]=1− d1ye ≤ y=Pr[¯
i1]E[Xi1] = b;
Pr[¯
i2|Xi3=z] = 1 dz|Li|e
|Li|1z=Pr[¯
i2]E[1 Xi3] = 1 b;
If i1and i2are in the same star, then “i1is closed ” =Xi1= 1 =i2is opened”, so Pr[¯
i1¯
i2] = 0.
Else the two facilities are chosen by independent processes (for fixed X), and we can apply Corollary 2.2 to
show
Pr[¯
i1¯
i2|Xi1=yXi3=z] = Pr[¯
i1|Xi1=y]·Pr[¯
i2|Xi3=z]y(1 z),
implying that
Pr[¯
i1¯
i2]E[Xi1(1 Xi3)] (1 + )b(1 b).
Lemma 2.5. For any client j, let i1and i2be the closest facilities in F1and F2, at distances d1and d2from j,
respectively. SR gives
E[c(j)] (1 + )((1 b)d1+b(3 2b)d2).
Proof. Let i3F1be the root of i2in G. At least one of i2or i3will be opened. By the triangle inequality,
d(j, i3)d(j, i2) + d(i2, i3)d(j, i2) + d(i2, i1)2d2+d1. If d2< d1, we bound the cost by connecting to the
first open facility in order of precedence (i2, i1, i3):
E[c(j)] Pr[i2]d2+ Pr[¯
i2i1]d1+ Pr[¯
i2¯
i1](2d2+d1)
=d2+ Pr[¯
i2](d1d2) + 2 Pr[¯
i2¯
i1]d2
d2+ (1 b)(d1d2) + 2(1 + )b(1 b)d2
= (1 b)d1+bd2+ (1 + )b(2 2b)d2.
Similarly, if d1< d2, we connect in order (i1, i2, i3):
E[c(j)] Pr[i1]d1+ Pr[¯
i1i2]d2+ Pr[¯
i1¯
i2](2d2+d1)
=d1+ Pr[¯
i1](d2d1) + Pr[¯
i2¯
i1](d1+d2)
d1+b(d2d1) + (1 + )b(1 b)(2d2)
= (1 b)d1+bd2+ (1 + )b(2 2b)d2.
5
摘要:

IMPROVEDBI-POINTROUNDINGALGORITHMSANDAGOLDENBARRIERFORk-MEDIANKishenNGowdaUniversityofMaryland,CollegePark,USkishen19@cs.umd.eduThomasPensyltommy.pensyl@gmail.comAravindSrinivasanUniversityofMaryland,CollegePark,USsrin@cs.umd.eduKhoaTrinhGooglekhoatrinh@google.comABSTRACTThecurrentbestapproximation...

展开>> 收起<<
Improved Bi-point Rounding Algorithms and a Golden Barrier for k-Median.pdf

共24页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:24 页 大小:1.98MB 格式:PDF 时间:2025-05-08

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 24
客服
关注