Improved Bi-point Rounding Algorithms and a Golden Barrier for k-Median

2025-05-08 1 0 1.98MB 24 页 10玖币

侵权投诉

IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN

BARRIER FOR k-MEDIAN

Kishen N Gowda

University of Maryland, College Park, US

kishen19@cs.umd.edu

Thomas Pensyl

tommy.pensyl@gmail.com

Aravind Srinivasan∗

University of Maryland, College Park, US

srin@cs.umd.edu

Khoa Trinh

Google

khoatrinh@google.com

ABSTRACT

The current best approximation algorithms for k-median rely on ﬁrst obtaining a structured fractional

solution known as a bi-point solution, and then rounding it to an integer solution. We improve this

second step by unifying and reﬁning previous approaches. We describe a hierarchy of increasingly-

complex partitioning schemes for the facilities, along with corresponding sets of algorithms and

factor-revealing non-linear programs. We prove that the third layer of this hierarchy is a 2.613-

approximation, improving upon the current best ratio of 2.675, while no layer can be proved better

than 2.588 under the proposed analysis.

On the negative side, we give a family of bi-point solutions which cannot be approximated better

than the square root of the golden ratio, even if allowed to open k+o(k)facilities. This gives a

barrier to current approaches for obtaining an approximation better than 2√φ≈2.544. Altogether

we reduce the approximation gap of bi-point solutions by two thirds.

1 Introduction

We study the classical NP-hard k-median problem. Given a set of clients C, a set of facilities F, and a distance metric

dover C ∪F, the goal is to open a subset Sof kfacilities which minimizes the total distance from each client to its

closest facility in S. In addition to their natural applications in facility location (e.g., in the opening of health-care

facilities), such problems also naturally model clustering in various ways.

1.1 Related Work

Three main approaches have been applied toward constant factor approximations for k-median. The ﬁrst is LP-

rounding, using a half-integral solution as an intermediate step. Charikar et al. [5] used this approach to give the

ﬁrst constant-factor approximation of 62

3. Charikar and Li [6] later reﬁned this approach with dependent rounding to

achieve a factor of 3.25.

The second approach is local search. Arya et al. [2] proved that the natural local search algorithm with 2

simultaneous

swaps is a tight (3 + )-approximation. Their analysis was later nicely simpliﬁed by Gupta and Tangwongsan [9].

Very recently, Cohen-Addad et al. [8] showed that this could actually be improved by using an auxiliary cost function

which discounts clients with multiple nearby open facilities. They proved that this algorithm with O(1

)ˆ(1

)

simultaneous swaps achieves an approximation factor of 2.836 + .

The third approach, and the one we focus on for the rest of the paper, is to ﬁrst generate an intermediate fractional form

called a bi-point solution, and then round it to an integral solution. These two steps are usually analyzed independently,

each incurring a multiplicative loss in the approximation factor, which we will refer to as the bi-point generation factor

∗Aravind Srinivasan was supported in part by NSF grant CCF-1918749, as well as by research awards from Amazon and Google.

arXiv:2210.13395v1 [cs.DS] 24 Oct 2022

IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN

and the bi-point rounding factor respectively. Jain and Vazirani [12] introduced this approach, achieving a bi-point

generation factor of 3 via reduction to the related Uncapacitated Facility Location problem, and a bi-point rounding

factor of 2, resulting in a 6-approximation for k-median. Jain, Mahdian and Saberi [11] later reduced the bi-point

generation factor from 3to 2to get a 4-approximation; Charikar and Guha also gave an earlier 4–approximation by

analyzing both steps holistically [4].

Li and Svensson [13] improved the bi-point rounding factor to 1+√3+

2≈1.366 under the relaxation of being allowed

to open k+O1

facilities, resulting in a pseudo-approximation algorithm. Then, in a remarkable result, they showed

how to convert any α-pseudo-approximation algorithm which opens k+cfacilities into a proper (α+)-approximation

algorithm, by running it on a set of nO(c/)derived instances. This enabled conversion of their pseudo-approximation

into a (1 + √3 + )-approximation with runtime nO(1/2). Their algorithm was based on constructing stars of nearby

facilities which could be randomly rounded while still guaranteeing a nearby facility for each client.

Byrka et al. [3] reﬁned this approach by categorizing stars by their size and relative distance, and then considering a

slew of algorithms which open facilities asymmetrically across each category. They then gave a factor-revealing non-

linear program (NLP) corresponding to taking the best of all algorithms, and solved the NLP with computer assistance

to get a bi-point rounding factor of 1.3371 + . They also re-framed the star-rounding technique in terms of bounding

positive correlation, and developed a dependent-rounding technique which reduced the number of facilities required

to k+Olog 1

, resulting in a runtime of nO(1

log 1

). They also exhibited a bi-point solution with an integrality gap

of 1+√2

2≈1.207 even when allowing for k+o(k)facilities to be opened, thus giving a new lower bound for the

bi-point rounding factor, resilient to Li and Svensson’s pseudo-approximation technique.

Lastly, during the writing of this paper, Cohen-Addad et al. [7] have released results reﬁning the bi-point generation

factor to be slightly less than 2. The remainder of our paper does not reﬂect this result, though it would likely imply

some small improvement to the overall factor.

1.2 Our Work

Let us now formally deﬁne bi-point solutions and summarize the current-best bi-point generation result.

Deﬁnition 1.1 (Bi-point solution).Given a k-median instance I, a bi-point solution is a pair F1, F2⊆ F such that

|F1| ≤ k≤ |F2|, along with real numbers a, b ≥0,a+b= 1 such that a|F1|+b|F2|=k. The cost of this bi-point

solution is deﬁned as aD1+bD2, where D1and D2are the total connection costs of F1and F2, respectively.

Theorem 1.2 ([11]).There exists an algorithm that, given a k-median instance, produces a bi-point solution of cost

at most 2·OP T in polynomial time, where OP T is the cost of optimal solution for the given instance.

In this paper, we will focus exclusively on improving the bi-point rounding factor, where a factor of αimmediately

implies a 2α-approximation for k-median, per Theorem 1.2.

A key ingredient in improving this factor will be the star-rounding algorithm of Li and Svensson (LS). One drawback

of their algorithm is that it requires separate algorithms when bis close to 0or 1, and this caveat propagates to any

dependent results. To avoid importing this complexity, we provide a streamlined version of LS that uses a somewhat-

sophisticated dependent-rounding procedure to remove any restriction on the value of b, by essentially unifying the

star-rounding and knapsack-type algorithms from LS. This more generic analysis requires opening slightly more fa-

cilities than LS, but we feel the gain in simplicity is worthwhile and hope others may ﬁnd it similarly helpful in future

work. (Indeed, this technique would have removed the need for many edge-case algorithms and analysis in [3] as

well.)

Theorem 1.3. There exists a randomized algorithm SR that takes a bi-point solution and returns a set of

k+O1

log 1

facilities, with expected cost at most

(1 + )·[(1 −b)D1+b(3 −2b)D2)] .

To account for the extra facilities opened, we invoke the pseudo-approximation reduction of LS for a net run-time

of nO(1

2log 1

), and an additional additive penalty of incurred in the approximation factor. For convenience, we

generally omit the term when stating our approximation factors throughout the paper. (Alternatively, we may consider

as being absorbed into the rounding error in our non-exact approximation factors).

Now consider the previously mentioned bi-point rounding algorithms of Jain and Vazirani (JV) and of LS. JV is based

on F2-centric stars, while LS is based on F1-centric stars. In each case, the stars function to provide a worst case

backup bound for each client. Byrka et al. reﬁned LS by partitioning stars according to a factor g(i)representing their

IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN

relative distance to each other. This exposed new possible backup bounds beyond those immediately provided by the

stars, thus enabling a larger variety of competing algorithms, and resulting in the improved approximation.

In this paper we apply a similar reﬁnement, but to JV instead of LS. We ﬁnd JV is more amenable to these techniques

for several reasons. First, the resulting backup bounds are more direct and thus cheaper, due to only needing to make

at most one “hop" to another facility. Second, we are able to apply the g(·)-based partitioning to all stars, instead of

just some, which also means the new backup bounds are available to all clients. Third, the resulting algorithms are

simpler and do not open more than kfacilities. This last property immediately implies that our new algorithms cannot

actually beat the integrality gap of 2. Nevertheless, when run in tandem with LS, we achieve results which appear to

completely subsume those in Byrka et al. (e.g., including their algorithms in our analysis does not improve the results

of this paper, at least experimentally).

Additionally, thanks to the simpler setting of F2-centric stars, we are able to push the technique further, deﬁning and

analyzing a hierarchy of increasingly complex partitions based on the factor g(i). For each layer of the hierarchy,

we propose a set of candidate algorithms, and a factor-revealing NLP representing the best of all solutions. The NLP

complexity increases exponentially at each layer. However with computer-assisted methods, we are able to rigorously

prove that the third layer, in tandem with SR, provides a bi-point rounding factor of 1.3064 (improving over the

previous best factor of 1.3371). We also show the existence of a difﬁcult instance for which our NLP cannot prove a

bi-point rounding factor smaller than 1.2943, for any layer of our hierarchy.

Theorem 1.4. There exists a randomized algorithm that, given a bi-point solution of cost aD1+bD2and  > 0,

opens at most k+O1

log 1

facilities and returns a solution of cost at most (1.3064 + )·(aD1+bD2).

Theorem 1.5. There exists a bi-point solution for which our NLP cannot prove a bi-point rounding factor better than

1.2943, for any layer of our hierarchy.

Theorem 1.2, Theorem 1.4 and the pseudo-approximation reduction of Li and Svensson [13, Theorem 4] together

imply our improved approximation factor.

Corollary 1.6. There exists a randomized algorithm that, given a k-median instance and  > 0, runs in time

nO(1

2log 1

)and returns a solution of cost at most (2.613 + )·OP T , where OP T is the cost of the optimal so-

lution of the given instance.

Lastly, we provide a bi-point solution with integrality gap √φ≈1.272 where φis the golden ratio, even when allowing

for k+o(k)facilities to be opened. This improves on the previous gap of 1.207, giving an improved lower bound for

the bi-point approximation factor. Study of this instance inspired the algorithmic improvements in this paper; we hope

it can shed further insight into the true approximability of bi-point solutions.

Theorem 1.7. For every  > 0and C(k) = o(k), there exists a family of bi-point solutions with integrality gap

√φ−, even if we allow solutions that open k+C(k)facilities.

The rest of the paper is organized as follows. In Section 2, we present our bi-point rounding algorithm. Section 2.2

discusses the aforementioned variant of Li and Svensson’s star-rounding algorithm SR, Section 2.3 describes our

main hierarchy of partitioning schemes for facilities and the corresponding set of rounding algorithms, and Section 2.4

presents the NLP for obtaining the bi-point rounding factor and the results achieved for the second and third layers

of our hierarchy. In Section 3, we give a lower bound for our framework. In Section 4, we exhibit the pseudo-

approximation-resilient family of integrality-gap instances. We conclude with a discussion in Section 5.

2 Bi-point Rounding Algorithm

In this section, we deﬁne a set of bi-point rounding algorithms, and bound the total cost and facilities opened for each

algorithm. Our set of algorithms consists of a family of rounding algorithms obtained via a hierarchy of partition

schemes for the facilities, along with the star-rounding algorithm SR. Our top-level algorithm runs all of these

algorithms and returns the best solution obtained.

2.1 Preliminaries

For any client j∈ C, we use c(j)to denote the connection cost of j(to its closest open facility in S). For any set J

of clients, let c(J) := Pv∈Jc(v). We let i1(j)and i2(j)denote the closest facility to jin F1and F2, respectively.

When the context is clear, we shall drop the parameter jin the notation.

By an abuse of notation, for any facility i∈ F, we let i(and ¯

i) denote the event that facility iis opened (and closed) in

our solution (respectively). For ease of notation, we also drop the ∧operator when joining these events. For example,

IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN

Pr[i1¯

i2]is the probability for the event that facility i1is opened and facility i2is closed. For any set X⊆ F, we let

σX(i)denote the closest facility to iin X(i.e., σX(i) := arg mini0∈Xd(i, i0)).

Lastly, we use [m]to denote the set of natural numbers {1,2,··· , m}, and [m1, m2]to denote the set {m1, m1+

1,··· , m2}. However, when we refer explicitly to the set [0,1], it means the standard set of reals between 0and 1

(inclusive).

2.2 Star-Rounding Algorithm

In this section, we deﬁne the star-rounding algorithm SR and prove Theorem 1.3, which will provide the same cost

guarantee as Li and Svensson’s algorithm, but with no restrictions on bi-point parameter b(see Deﬁnition 1.1). The

proof follows a similar structure to that of previous star-rounding algorithms[13, 3]. We will ﬁrst need the following

dependent-rounding procedure from [10] 2.

Theorem 2.1 ([10, Theorem 2.1]).Symmetric Randomized Dependent Rounding. Given vectors x∈[0,1]nand

a= (a1, . . . , an)∈Rn, and t∈N, there exists a randomized algorithm (SRDR) which runs in expected O(n2)

time and returns a vector X∈[0,1]nwith at most tfractional values. Both the weighted sum and all the marginal

probabilities are preserved: PiaiXi=Piaixiwith probability one, and E[Xi] = xifor all i∈[n]. Let S, T be

disjoint subsets of [n]. Then we have the upper correlation bound:

E

Y

i∈S

XiY

j∈T

(1 −Xj)

≤

Y

i∈S

xiY

j∈T

(1 −xj)



1−1/(t+1)

.(1)

This can generally be converted to a standard multiplicative (1 + )error bound by setting the parameter tto be

O1

log 1

Qi∈SxiQj∈T(1−xj), but in the special case below, we can actually set tindependently of the xi. This will

allow us to avoid restricting the domain of the algorithm.

Corollary 2.2. Pairwise positive correlation under uniform marginals. Suppose SRDR is run with t≥log(1+1/)

log(1+).

For any distinct i, j such that xi=xj=bfor any value of b, we have

E[Xi(1 −Xj)] ≤(1 + )b(1 −b).

Proof. Let S={i},T={j}. Then (1) simpliﬁes to:

E[Xi(1 −Xj)] ≤(xi(1 −xj))1−1/(t+1) =1

b(1 −b)1/(t+1)

b(1 −b).(2)

If min{b, 1−b} ≥ 

1+, then 1

b(1−b)≤(1+)2

and 1

t+1 ≤log(1+)

log((1+)2/), so (2) is at most (1 + )b(1 −b). Otherwise

min{b, 1−b} ≤ 

1+, in which case we may directly bound by the marginals:

E[Xi(1 −Xj)] ≤min{E[Xi],E[1 −Xj]}(3)

= min{b, 1−b}=b(1 −b)

1−min{b, 1−b}≤(1 + )b(1 −b).

We now deﬁne the star-rounding algorithm SR. Form graph Gby drawing an edge from each facility in F2to its

closest facility in F1, resulting in a forest of F1-centric stars. For each i∈F1, let Li⊆F2be the leaves of the star

rooted at i. Deﬁne vectors a, x with ai:= |Li| − 1and xi:= b, and set t:= log(1+1/)

log(1+)for some  > 0. Now run

SRDR(a, x, t)to obtain output vector X. Finally, for each i∈F1, open dXi|Li|e facilities uniformly at random from

Liand open iitself with probability d1−Xie.

Lemma 2.3. SR opens at most k+O1

log 1

facilities, with probability one.

2This result appears exclusively in v1 of the referenced arXiv paper, but continues to be publicly available.

IMPROVED BI-POINT ROUNDING ALGORITHMS AND A GOLDEN BARRIER FOR k-MEDIAN

Proof. By Theorem 2.1, there are at most tfractionally-valued Xi(the rest being 0or 1). Also, Pi(|Li| − 1)Xi=

PiaiXi=Piaixi=Pi(|Li| − 1)band {Li}i∈F1partitions F2. Thus, the count of facilities opened is

i∈F1d1−Xie+dXi|Li|e≤X

i∈F1

(1 −Xi+Xi|Li|)+2t(with probability one)

i∈F1

(1 −b+b|Li|)) + 2t(with probability one)

= (1 −b)|F1|+b|F2|+ 2t

=k+ 2t=k+O1

log 1

.

Lemma 2.4. For any two facilities i1∈F1and i2∈F2, we have

Pr[¯

i1]≤b, (4)

Pr[¯

i2]≤1−b, (5)

Pr[¯

i1¯

i2]≤(1 + )b(1 −b).(6)

Proof. We shall use this simple fact: for any event Eand random variable Y,Pr[E|Y=y]≤f(y)implies that

Pr[E]≤E[f(Y)]. Let i3be the root of the star containing i2. Then we have that

•Pr[¯

i1|Xi1=y]=1− d1−ye ≤ y=⇒Pr[¯

i1]≤E[Xi1] = b;

•Pr[¯

i2|Xi3=z] = 1 −dz|Li|e

|Li|≤1−z=⇒Pr[¯

i2]≤E[1 −Xi3] = 1 −b;

• If i1and i2are in the same star, then “i1is closed ” =⇒Xi1= 1 =⇒“i2is opened”, so Pr[¯

i1¯

i2] = 0.

Else the two facilities are chosen by independent processes (for ﬁxed X), and we can apply Corollary 2.2 to

show

Pr[¯

i1¯

i2|Xi1=y∧Xi3=z] = Pr[¯

i1|Xi1=y]·Pr[¯

i2|Xi3=z]≤y(1 −z),

implying that

Pr[¯

i1¯

i2]≤E[Xi1(1 −Xi3)] ≤(1 + )b(1 −b).

Lemma 2.5. For any client j, let i1and i2be the closest facilities in F1and F2, at distances d1and d2from j,

respectively. SR gives

E[c(j)] ≤(1 + )((1 −b)d1+b(3 −2b)d2).

Proof. Let i3∈F1be the root of i2in G. At least one of i2or i3will be opened. By the triangle inequality,

d(j, i3)≤d(j, i2) + d(i2, i3)≤d(j, i2) + d(i2, i1)≤2d2+d1. If d2< d1, we bound the cost by connecting to the

ﬁrst open facility in order of precedence (i2, i1, i3):

E[c(j)] ≤Pr[i2]d2+ Pr[¯

i2i1]d1+ Pr[¯

i2¯

i1](2d2+d1)

=d2+ Pr[¯

i2](d1−d2) + 2 Pr[¯

i2¯

i1]d2

≤d2+ (1 −b)(d1−d2) + 2(1 + )b(1 −b)d2

= (1 −b)d1+bd2+ (1 + )b(2 −2b)d2.

Similarly, if d1< d2, we connect in order (i1, i2, i3):

E[c(j)] ≤Pr[i1]d1+ Pr[¯

i1i2]d2+ Pr[¯

i1¯

i2](2d2+d1)

=d1+ Pr[¯

i1](d2−d1) + Pr[¯

i2¯

i1](d1+d2)

≤d1+b(d2−d1) + (1 + )b(1 −b)(2d2)

= (1 −b)d1+bd2+ (1 + )b(2 −2b)d2.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

IMPROVEDBI-POINTROUNDINGALGORITHMSANDAGOLDENBARRIERFORk-MEDIANKishenNGowdaUniversityofMaryland,CollegePark,USkishen19@cs.umd.eduThomasPensyltommy.pensyl@gmail.comAravindSrinivasanUniversityofMaryland,CollegePark,USsrin@cs.umd.eduKhoaTrinhGooglekhoatrinh@google.comABSTRACTThecurrentbestapproximation...

展开>> 收起<<

Improved Bi-point Rounding Algorithms and a Golden Barrier for k-Median.pdf

共24页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Improved Bi-point Rounding Algorithms and a Golden Barrier for k-Median

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: