Efficient and Stable Fully Dynamic Facility Location
Sayan Bhattacharya
Department of Computer Science
University of Warwick
Coventry, CV4 7AL, United Kingdom
s.bhattacharya@warwick.ac.uk
Silvio Lattanzi
Google Research
silviol@google.com
Nikos Parotsidis
Google Research
nikosp@google.com
Abstract
We consider the classic facility location problem in fully dynamic data streams, where elements can be both inserted and deleted. In this problem, one is interested in maintaining a stable, high-quality solution throughout the data stream while spending only a small amount of time per update (insertion or deletion). We study the problem and provide the first algorithm that simultaneously maintains a constant approximation and incurs polylogarithmic amortized recourse per update. We complement our theoretical results with an experimental analysis showing the practical efficiency of our method.
1 Introduction
Clustering is a fundamental problem in unsupervised learning with many practical applications in community detection, spam detection, image segmentation and many other areas. A central problem in this space, with a long and rich history in computer science and operations research Korte and Vygen [2018], is the facility location problem (which can also be seen as the Lagrangian relaxation of the classic $k$-median clustering problem). Due to its scientific importance and practical applications, the problem has been studied extensively and several approximation algorithms are known for it Guha and Khuller [1999], Jain et al. [2003], Li [2013]. In addition, the problem has also been extensively studied in various computational models, such as the streaming model Indyk [2004], Lammersen and Sohler [2008], Czumaj et al. [2013], the online model Meyerson [2001], Fotakis [2008] and the dynamic algorithm model Cygan et al. [2018], Goranci et al. [2018], Cohen-Addad et al. [2019], Guo et al. [2020], and many more.
Real-world applications nowadays often process evolving data sets: social networks continuously evolve over time, videos and pictures are constantly uploaded to and taken down from media platforms, news articles and blog posts are published or removed, and so on. For this reason, it is important to design algorithms that are able to maintain a stable and high-quality solution and that, at the same time, can process updates efficiently. As a consequence, the dynamic and online models have been extensively studied and several interesting results are known for classic learning problems Lattanzi and Vassilvitskii [2017], Chan et al. [2018], Cohen-Addad et al. [2019], Jaghargh et al. [2019], Lattanzi et al. [2020], Fichtenberger et al. [2021], Guo et al. [2021]. In this paper, we extend this line of work by studying the classic facility location problem in the fully dynamic setting.
Problem definition. The input to our problem consists of a collection $F$ of facilities and a collection $D$ of clients in a general metric space. For all $i \in F$, $j \in D$, we let $d_{ij}$ denote the distance between facility $i$ and client $j$.¹ These distances satisfy the triangle inequality. Let $f_i$ denote the opening cost of facility $i \in F$. Our goal is to open a subset of facilities $F' \subseteq F$ and then connect every client $j \in D$ to some open facility $i_j \in F'$, so as to minimise the objective $\sum_{i \in F'} f_i + \sum_{j \in D} d_{i_j j}$. The first sum in the objective denotes the sum of the opening costs of the facilities in $F'$, whereas the second sum denotes the sum of the connection costs for the clients. Throughout the rest of this paper, we will let $m = |F|$ and $n = |D|$ respectively denote the number of facilities and clients.

¹For simplicity, we assume that all the distances between facility $i$ and client $j$ are polynomially bounded.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
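To make the objective concrete, the following minimal Python sketch evaluates a candidate solution; the dictionaries `f`, `d` and `assignment` are illustrative names of our own choosing, not part of the formal model.

```python
def objective(f, d, assignment):
    """Evaluate a facility location solution.
    f[i]: opening cost of facility i, d[(i, j)]: distance between i and j,
    assignment[j]: the open facility that client j is connected to."""
    open_facilities = set(assignment.values())
    opening_cost = sum(f[i] for i in open_facilities)
    connection_cost = sum(d[(i, j)] for j, i in assignment.items())
    return opening_cost + connection_cost


# Example: one facility of cost 5 serving two clients at distances 1 and 2.
f = {"a": 5.0}
d = {("a", 1): 1.0, ("a", 2): 2.0}
assert objective(f, d, {1: "a", 2: "a"}) == 8.0
```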
We consider a dynamic setting, where the input to the problem keeps changing via a sequence of
updates. Each update inserts or deletes a client in the metric space. Throughout this sequence of
updates, we wish to maintain an approximately optimal solution to the current input instance with
small recourse. The recourse of the algorithm is defined as follows. Whenever we open a new facility
or close an already open facility, we incur one unit of facility-recourse. Similarly, whenever we
reassign a client from one facility to another, we incur one unit of client-recourse. The recourse of the
algorithm after an update is the sum of its facility-recourse and client-recourse. Note that the recourse of an algorithm naturally captures the consistency of the solution throughout the input updates. Having small recourse is therefore particularly important for real-world applications where a clustering is served to users or used in a machine learning pipeline Lattanzi and Vassilvitskii [2017]. Our main result in this paper is summarized in the theorem below.
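Before stating it, here is a small illustration of how recourse is charged between two consecutive solutions, represented (for this sketch only) as client-to-facility maps; how newly inserted or deleted clients are charged is left out of this toy computation.

```python
def recourse(before, after):
    """Facility-recourse plus client-recourse between two solutions,
    each given as a map from client to its assigned (open) facility."""
    facility_recourse = len(set(before.values()) ^ set(after.values()))  # opened or closed
    common = before.keys() & after.keys()
    client_recourse = sum(1 for j in common if before[j] != after[j])    # reassigned clients
    return facility_recourse + client_recourse


# Facility "c" is opened and client 2 is reassigned to it: recourse 1 + 1 = 2.
assert recourse({1: "a", 2: "a", 3: "b"}, {1: "a", 2: "c", 3: "b"}) == 2
```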
Theorem 1. There is a deterministic $O(1)$-approximation algorithm for the fully dynamic facility location problem that has $O(\log m)$ amortized recourse, under the natural assumption that the distances between facilities and clients and the facility opening costs are bounded by some polynomial in $m$.
Our Technique. To the best of our knowledge, this is the first algorithm for fully dynamic facility location that can handle non-uniform facility costs and maintains an $O(1)$ approximation with polylogarithmic recourse per update. To obtain our result, we consider a relaxed version of the natural greedy algorithm in the static setting. This relaxed greedy algorithm proceeds in iterations. In each iteration the algorithm picks a cluster, which specifies a set of currently unassigned clients and the facility they will get assigned to, that has approximately minimum average marginal cost. The algorithm stops when every client gets assigned to some facility. In the dynamic setting, we ensure that the solution maintained by our algorithm always corresponds to an output of the relaxed greedy algorithm (assuming the relaxed greedy algorithm receives the current input instance). To be a bit more precise, we formulate a set of invariants which capture the workings of the relaxed greedy algorithm, and we fix the invariants using a simple heuristic whenever one or more of them get violated in the dynamic setting. The approximation guarantee now follows from the observation that the relaxed greedy algorithm returns an $O(1)$ approximation in the static setting. The recourse bound, on the other hand, follows from a careful token-based argument that is inspired by the analysis of a fully dynamic greedy algorithm for minimum set cover Gupta et al. [2017]. In addition, using standard data structures it is easy to show that the amortized update time of our algorithm is $\tilde{O}(m)$. We note that this update time is near-optimal, as each arriving client needs to reveal its distances to the facilities; this also holds for uniform facility opening costs, as pointed out in Cohen-Addad et al. [2019].²
We also complement our theoretical analysis with an in-depth experimental study of the performance of our algorithm, showing that it is very efficient and effective in practice.
Related Works. Facility location and dynamic algorithms have been extensively studied in the algorithms and machine learning literature. Here we give an overview of closely related results.
Facility Location. The classic offline metric facility location problem has been extensively studied (see Korte and Vygen [2018]). The best-known approximation algorithm for the problem gives a 1.488 approximation Li [2013]. The problem is known to be NP-hard, and it cannot be approximated within a factor of 1.463 unless NP $\subseteq$ DTIME($n^{O(\log\log n)}$) Guha and Khuller [1999].
Dynamic algorithms for clustering and facility location. Recently, several dynamic algorithms have been introduced for clustering and facility location problems Lattanzi and Vassilvitskii [2017], Chan et al. [2018], Cygan et al. [2018], Goranci et al. [2018], Cohen-Addad et al. [2019], Henzinger et al. [2020], Guo et al. [2020], Goranci et al. [2021], Fichtenberger et al. [2021]. The recourse for clustering problems was studied in Lattanzi and Vassilvitskii [2017], and several papers followed up with interesting results on recourse and dynamic algorithms Chan et al. [2018], Cohen-Addad et al. [2019], Henzinger et al. [2020], Fichtenberger et al. [2021].
Efficient algorithms for the fully dynamic facility location problem are known either when the doubling dimension of the input is bounded Goranci et al. [2018, 2021] or when the facilities have uniform costs Cygan et al. [2018], Cohen-Addad et al. [2019]. When the facilities can have arbitrary non-uniform costs in a general metric space, prior to our work the best known fully dynamic algorithm achieved an $O(\log |F|)$ approximation with polylogarithmic recourse Guo et al. [2020].

²In some works, assumptions on the form of the input are made which allow avoiding an $\Omega(m)$ factor; see e.g. Guo et al. [2020], where each arriving client reports its nearest facility upon arrival.
2 Our Dynamic Algorithm
In Section 2.1, we present an intuitive overview of our algorithm. We formally describe our algorithm
in Section 2.2 and Section 2.3, and analyze its recourse and approximation guarantee in Section 2.4.
2.1 An Informal Overview of Our Algorithm
We start by recalling a natural greedy algorithm for the facility location problem in the static setting. This (static) algorithm is known to return a constant approximation for the problem Jain et al. [2003]. The algorithm proceeds in rounds. Throughout the duration of the algorithm, let $F^* \subseteq F$ and $D^* \subseteq D$ respectively denote the set of opened facilities and the set of clients that are already assigned to some open facility. Initially, before the start of the first round, we set $F^* \leftarrow \emptyset$ and $D^* \leftarrow \emptyset$. The next paragraph describes what happens during a given round of this algorithm.

Say that a cluster $C$ is an ordered pair $(i, A)$, where $i \in F$ and $A \subseteq D$. A cluster is either satellite or critical. If a cluster $C = (i, A)$ is satellite (resp. critical), then its cost is given by $\mathrm{cost}(C) := \sum_{j \in A} d_{ij}$ (resp. $\mathrm{cost}(C) := f_i + \sum_{j \in A} d_{ij}$). Thus, the only difference between a satellite cluster and a critical cluster is that the cost of the former (resp. latter) does not (resp. does) include the opening cost of the concerned facility. The average cost of a cluster $C = (i, A)$ is defined as $\mathrm{cost}_{avg}(C) := \mathrm{cost}(C)/|A|$. Let $\mathcal{C}_{ord}(F^*, D^*)$ be the set of all satellite clusters $C = (i, A)$ with $i \in F^*$ and $A \subseteq D \setminus D^*$, and let $\mathcal{C}_{crit}(F^*, D^*)$ be the set of all critical clusters $C = (i, A)$ with $i \in F \setminus F^*$ and $A \subseteq D \setminus D^*$. In the current round, we pick a cluster $C = (i, A) \in \mathcal{C}_{crit}(F^*, D^*) \cup \mathcal{C}_{ord}(F^*, D^*)$ with minimum average cost, set $F^* \leftarrow F^* \cup \{i\}$ and $D^* \leftarrow D^* \cup A$, and assign all the clients $j \in A$ to the facility $i$. At this point, if $D^* = D$, then we terminate the algorithm. Otherwise, we proceed to the next round.
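For illustration, here is one direct (and deliberately unoptimized) way to implement this greedy procedure in Python. It already uses two standard observations: a minimum-average-cost satellite cluster may be taken to contain a single client (as noted below), and for a closed facility the best critical cluster consists of a prefix of its closest unassigned clients. The dictionary-based interface (`f`, `d`) is our own choice and does not reflect the data structures of the dynamic algorithm.

```python
import math

def static_greedy(F, D, f, d):
    """Greedy for facility location: F, D are lists of facilities and clients,
    f[i] is an opening cost, d[(i, j)] a distance.  Returns the picked clusters
    as (facility, clients, is_critical) triples, in the order they are chosen."""
    opened, assigned, clusters = set(), set(), []
    while len(assigned) < len(D):
        best_avg, best = math.inf, None
        for i in F:
            unassigned = sorted((d[(i, j)], j) for j in D if j not in assigned)
            if i in opened:
                # Best satellite cluster of an open facility: its single closest client.
                dist, j = unassigned[0]
                if dist < best_avg:
                    best_avg, best = dist, (i, [j], False)
            else:
                # Best critical cluster of a closed facility: a prefix of closest clients.
                total = f[i]
                for t in range(1, len(unassigned) + 1):
                    total += unassigned[t - 1][0]
                    if total / t < best_avg:
                        best_avg = total / t
                        best = (i, [j for _, j in unassigned[:t]], True)
        i, A, critical = best
        opened.add(i)
        assigned.update(A)
        clusters.append((i, A, critical))
    return clusters
```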
Intuitively, in each round the above greedy algorithm picks a cluster with minimum average cost, which is defined in such a way that it takes into account the opening cost of a facility the first time some client gets assigned to it. Let $\mathcal{C}$ denote the collection of all clusters picked by this algorithm, across all the rounds. Note: (1) Every client $j \in D$ belongs to exactly one cluster in $\mathcal{C}$. (2) Let $\mathcal{C}(i) \subseteq \mathcal{C}$ denote the subset of clusters in $\mathcal{C}$ which contain a given facility $i \in F$. If $\mathcal{C}(i) \neq \emptyset$, then exactly one of the clusters in $\mathcal{C}(i)$ is critical. We refer to a collection $\mathcal{C}$ which satisfies these two properties as a clustering. It is easy to check that a clustering $\mathcal{C}$ defines a valid solution to the concerned instance of the facility location problem, where the objective value of the solution is given by $\mathrm{obj}(\mathcal{C}) := \sum_{C \in \mathcal{C}} \mathrm{cost}(C)$.
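Both properties, as well as $\mathrm{obj}(\mathcal{C})$, are easy to check mechanically. The sketch below represents a clustering as a list of `(facility, clients, is_critical)` triples; this representation is ours, chosen only for illustration.

```python
from collections import Counter

def is_clustering(clusters, D):
    """Check properties (1) and (2) above for a list of (facility, clients, is_critical) triples."""
    counts = Counter(j for _, A, _ in clusters for j in A)
    covers_once = set(counts) == set(D) and all(c == 1 for c in counts.values())
    critical_per_facility = Counter(i for i, _, crit in clusters if crit)
    one_critical = all(critical_per_facility[i] == 1 for i, _, _ in clusters)
    return covers_once and one_critical

def obj(clusters, f, d):
    """obj(C): the sum of the cluster costs."""
    return sum(sum(d[(i, j)] for j in A) + (f[i] if crit else 0.0)
               for i, A, crit in clusters)
```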
Note that if the above algorithm picks a satellite cluster $C = (i, A)$ in a given round, then w.l.o.g. we can assume that $|A| = 1$. This is because if $|A| > 1$, then we can only decrease the average cost of $C$ by deleting from $A$ all but the one client that is closest to facility $i$. Accordingly, from now on we will only consider those satellite clusters that have exactly one client.
Our dynamic algorithm is based on a relaxation of the above static greedy algorithm, where in each
round we have the flexibility of picking a cluster with approximately minimum average cost. The
output of this relaxed greedy algorithm corresponds to what we call a nice clustering (see Section 2.2
for a formal definition). We now present a high-level, informal overview of our dynamic algorithm.
Preprocessing: Upon receiving an input $(F, D)$ at preprocessing, we run the relaxed greedy algorithm, which returns a nice clustering $\mathcal{C}$. We assign each cluster $C \in \mathcal{C}$ a (not necessarily positive) integer level $\ell(C) \in \mathbb{Z}$ such that $\mathrm{cost}_{avg}(C) = \Theta(2^{\ell(C)})$. Thus, the level of a cluster encodes its average cost up to a constant multiplicative factor. Define the level of a client $j \in C$ to be $\ell(j) := \ell(C)$.
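One concrete way to realize such a level assignment (the particular rounding is our choice, made so that it is consistent with Invariant 1 below) is to take the smallest integer $\ell$ with $\mathrm{cost}_{avg}(C) < 2^{\ell}$:

```python
import math

def level_of(avg_cost):
    """Smallest integer l with avg_cost < 2**l, i.e. avg_cost lies in [2**(l-1), 2**l)."""
    return math.floor(math.log2(avg_cost)) + 1

assert level_of(1.0) == 1    # 1.0 lies in [2**0, 2**1)
assert level_of(0.3) == -1   # 0.3 lies in [2**-2, 2**-1)
```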
Handling an update: When a client $j$ gets deleted from $D$, we simply delete the concerned client from the cluster in $\mathcal{C}$ it appears in. Similarly, when a client $j$ gets inserted into $D$, we arbitrarily assign it to any open facility $i$ by creating a satellite cluster $C = (i, \{j\})$ at the minimum possible level $k = \Theta(\log d_{ij})$. At this point, we check if the clustering $\mathcal{C}$ being maintained by our algorithm still remains nice, i.e., whether $\mathcal{C}$ can correspond to an output of the relaxed greedy algorithm on the current input. If the answer is yes, then our dynamic algorithm is done with handling the current update. Thus, for the rest of this discussion, assume that the clustering $\mathcal{C}$ is no longer nice.
It turns out that in this event we can always identify a cluster $C = (i, A)$ (not necessarily part of $\mathcal{C}$) and a level $k \in \mathbb{Z}$ such that: $\mathrm{cost}_{avg}(C) = O(2^k)$ and $\ell(j) > k$ for all clients $j \in A$. We refer to such a cluster $C = (i, A)$ as a blocking cluster. Intuitively, the existence of a blocking cluster is a certificate that the current clustering $\mathcal{C}$ is not nice. This is because the relaxed greedy algorithm constructs the levels in a bottom-up manner, in increasing order of the average costs of the clusters that get added to the solution in successive rounds. Accordingly, the relaxed greedy algorithm would have added the cluster $C$ to its solution at level $k$ before proceeding to level $k + 1$, and this contradicts the fact that a client $j \in A$ appears at level $> k$ in the current clustering $\mathcal{C}$.
As long as there is a blocking cluster $C = (i, A)$ at some level $k$, we update the current clustering $\mathcal{C}$ by calling a subroutine FIX-BLOCKING$(C, k)$, which works as follows: (1) it adds the cluster $C$ to $\mathcal{C}$ at level $k$, and (2) it removes each client $j \in A$ from the cluster in $\mathcal{C}$ it belonged to prior to this step.
Next, note that because of step (2) above, some cluster $C' = (i', A')$ might lose one or more clients $j$ as they move from $C'$ to the newly formed cluster $C$, and this might increase the average cost of the cluster $C'$. Thus, we might potentially end up in a situation where a cluster $C' = (i', A')$ has $\mathrm{cost}_{avg}(C') \geq 2^{\ell(C')}$. As long as this happens to be the case, we call a subroutine FIX-LEVEL$(C')$ whose job is to increase the level of the affected cluster $C'$ by one.
Our algorithm repeatedly calls the subroutines FIX-BLOCKING$(\cdot, \cdot)$ and FIX-LEVEL$(\cdot)$ until there is no blocking cluster and every cluster $C \in \mathcal{C}$ has $\mathrm{cost}_{avg}(C) = \Theta(2^{\ell(C)})$. At this point, we are guaranteed that the current clustering is nice, and we are done with processing the concerned update.

The approximation ratio of our algorithm follows from the observation that a nice clustering corresponds to the output of the relaxed greedy algorithm in the static setting, which in turn gives an $O(1)$-approximation. We bound the amortized recourse by a careful token-based argument.
2.2 Nice clustering
In Section 2.1, we explained that a nice clustering corresponds to an output of the relaxed greedy algorithm in the static setting. We now give a formal definition of this concept by introducing four invariants. Specifically, we define a clustering $\mathcal{C}$ to be nice iff it satisfies Invariants 1, 2, 3 and 4. Every cluster $C \in \mathcal{C}$ is assigned a (not necessarily positive) integer level $\ell(C) \in \mathbb{Z}$. For any cluster $C = (i, A) \in \mathcal{C}$ and any client $j \in A$, we define the level of the client $j$ to be $\ell(j) := \ell(C)$.
Invariant 1. For every cluster $C \in \mathcal{C}$, we have $\mathrm{cost}_{avg}(C) < 2^{\ell(C)}$.
The relaxed greedy algorithm constructs these levels in a "bottom-up" manner: it assigns the relevant clusters to a level $k$ before moving on to level $k + 1$. This holds because the algorithm picks the clusters in (approximately) increasing order of their average costs. Also, before picking a satellite cluster $C = (i, \{j\})$, the algorithm ensures that facility $i$ is open. This leads to the invariant below.
Invariant 2. Consider any facility $i \in F$ with $\mathcal{C}(i) \neq \emptyset$. Then the unique critical cluster $C^* \in \mathcal{C}(i)$ satisfies $\ell(C^*) \leq \ell(C)$ for all $C \in \mathcal{C}(i)$.
The next invariant captures the fact that in an output of the relaxed greedy algorithm, if a client $j \in D$ gets assigned to a facility $i \in F$ then $\ell(j) = \Omega(\log d_{ij})$.
Invariant 3. For every cluster $C = (i, A) \in \mathcal{C}$ and every client $j \in A$, we have $\ell(j) \geq \kappa_{ij}$, where $\kappa_{ij}$ is the unique level s.t. $2^{\kappa_{ij} - 4} \leq d_{ij} < 2^{\kappa_{ij} - 3}$.
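For completeness, $\kappa_{ij}$ is a simple function of the distance; a one-line sketch of the computation implied by the inequality above:

```python
import math

def kappa(d_ij):
    """The unique integer k with 2**(k - 4) <= d_ij < 2**(k - 3)."""
    return math.floor(math.log2(d_ij)) + 4

assert kappa(5.0) == 6   # 2**2 = 4 <= 5 < 8 = 2**3
```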
Finally, we formulate an invariant which captures the fact that the relaxed greedy algorithm picks the
clusters in an (approximately) increasing order of their average costs. Towards this end, we define the
notion of a blocking cluster, as described below.
Definition 1. Consider any clustering $\mathcal{C}$, any level $k \in \mathbb{Z}$, and any cluster $C = (i, A)$ that does not necessarily belong to $\mathcal{C}$. The cluster $C$ is a blocking cluster at level $k$ w.r.t. $\mathcal{C}$ iff three conditions hold. (a) We have $\mathrm{cost}_{avg}(C) < 2^{k-3}$. (b) For every $j \in A$, we have $\ell(j) > k \geq \kappa_{ij}$, where $\kappa_{ij}$ is the unique level s.t. $2^{\kappa_{ij} - 4} \leq d_{ij} < 2^{\kappa_{ij} - 3}$. (c) If $C$ is a satellite cluster, then there is a critical cluster $C^* \in \mathcal{C}(i)$ with $\ell(C^*) \leq k$. Else if $C$ is a critical cluster, then $k \leq \ell(C')$ for all $C' \in \mathcal{C}(i)$.
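Checking Definition 1 for a given candidate cluster is mechanical; the sketch below assumes access to the current client levels and to the (level, is_critical) pairs of the clusters of facility $i$ in the current clustering, a representation we pick purely for illustration.

```python
import math

def is_blocking(C, k, level, clusters_of_i, f, d):
    """C = (i, A, is_critical); level[j] is the current level of client j;
    clusters_of_i lists (level, is_critical) pairs for the clusters of facility i
    in the current clustering."""
    i, A, is_critical = C
    kappa = lambda dist: math.floor(math.log2(dist)) + 4        # as in Invariant 3
    cost = sum(d[(i, j)] for j in A) + (f[i] if is_critical else 0.0)
    if not cost / len(A) < 2 ** (k - 3):                        # condition (a)
        return False
    if not all(level[j] > k >= kappa(d[(i, j)]) for j in A):    # condition (b)
        return False
    if is_critical:                                             # condition (c), critical case
        return all(k <= l for l, _ in clusters_of_i)
    return any(crit and l <= k for l, crit in clusters_of_i)    # condition (c), satellite case
```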
Intuitively, if $C = (i, A)$ is a blocking cluster at level $k$ w.r.t. $\mathcal{C}$, then the relaxed greedy algorithm should have formed the cluster $C$ at some level $k' < k$ before proceeding to construct the subsequent levels which currently contain all the clients in $A$. This leads us to the invariant below.³
Invariant 4. There is no blocking cluster w.r.t. the clustering $\mathcal{C}$ at any level $k \in \mathbb{Z}$.
Remark. Consider a cluster $C = (i, A) \in \mathcal{C}$ at level $\ell(C) = k$. If $\mathrm{cost}_{avg}(C)$ is smaller than $2^{k}$ by a sufficiently large constant factor, then it is not difficult to show that there exists a subset of clients $A' \subseteq A$ such that $C' = (i, A')$ is a blocking cluster at level $k - 1$. This, along with Invariant 1, implies that $\mathrm{cost}_{avg}(C) = \Theta(2^{\ell(C)})$ for all $C \in \mathcal{C}$.
The next theorem holds since a nice clustering corresponds to the output of the relaxed greedy algorithm, and since the (original) greedy algorithm is known to have an approximation ratio of $O(1)$. We defer the proof of Theorem 2 to Appendix B.

Theorem 2. Any nice clustering forms an $O(1)$-approximate optimal solution to the concerned input instance of the facility location problem.
2.3 Description of Our Dynamic Algorithm
In the dynamic setting, we simply maintain a nice clustering. To be more specific, w.l.o.g. we assume that $D = \emptyset$ at preprocessing. Subsequently, during each update, a client gets inserted into or deleted from the set $D$. We now describe how our algorithm handles an update.
Handling the deletion of a client $j$: Suppose that client $j$ belonged to the cluster $C = (i, A) \in \mathcal{C}$ just before getting deleted. We set $A \leftarrow A \setminus \{j\}$, and then we call the subroutine described in Figure 1.
Handling the insertion of a client $j$: We identify any open facility $i \in F$.⁴ Let $\kappa_{ij} \in \mathbb{Z}$ be the unique level such that $2^{\kappa_{ij} - 4} \leq d_{ij} < 2^{\kappa_{ij} - 3}$. Let $C = (i, A)$ be the unique critical cluster with facility $i$. If $\kappa_{ij} \leq \ell(C)$, then we set $A \leftarrow A \cup \{j\}$. In this case the client $j$ becomes part of the critical cluster $C$ at level $\ell(C)$. Else if $\kappa_{ij} > \ell(C)$, then we create a new satellite cluster $(i, \{j\})$ at level $\ell((i, \{j\})) = \kappa_{ij}$, and set $\mathcal{C} \leftarrow \mathcal{C} \cup \{(i, \{j\})\}$. Next, we call the subroutine described in Figure 1.
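Under an illustrative representation of the state (a list of cluster dictionaries plus a client-to-cluster map, bundled in a dict `state`; these names are ours), the two update handlers can be sketched as follows. The routine `fix_clustering`, implementing Figure 1, is passed in as a callback; it is described next and sketched after the figure.

```python
import math

def handle_insert(state, j, i, fix_clustering):
    """state = {"clusters": [...], "cluster_of": {...}, "f": ..., "d": ...};
    i is any currently open facility (footnote 4 covers the very first insertion)."""
    d = state["d"]
    k = math.floor(math.log2(d[(i, j)])) + 4                 # kappa_ij, as in Invariant 3
    C = next(c for c in state["clusters"] if c["facility"] == i and c["critical"])
    if k <= C["level"]:
        C["clients"].add(j)                                  # j joins i's critical cluster
        state["cluster_of"][j] = C
    else:                                                    # new satellite cluster (i, {j}) at level kappa_ij
        S = {"facility": i, "clients": {j}, "critical": False, "level": k}
        state["clusters"].append(S)
        state["cluster_of"][j] = S
    fix_clustering(state)                                    # Figure 1

def handle_delete(state, j, fix_clustering):
    state["cluster_of"].pop(j)["clients"].discard(j)
    fix_clustering(state)                                    # Figure 1
```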
The subroutine FIX-CLUSTERING$(\cdot)$: The insertion or deletion of a client might lead to a violation of Invariant 4 or Invariant 1 (the procedures described in the preceding two paragraphs ensure that Invariant 2 and Invariant 3 continue to remain satisfied). The subroutine FIX-CLUSTERING$(\cdot)$ handles this issue. It repeatedly checks whether Invariant 4 or Invariant 1 is violated, and accordingly calls FIX-BLOCKING$(C, k)$ or FIX-LEVEL$(C)$. Each of these last two subroutines has the property that it does not lead to any new violation of Invariant 2 or Invariant 3. Thus, when the WHILE loop in Figure 1 terminates, all the invariants are restored and we end up with a nice clustering.
1. WHILE either Invariant 4 or Invariant 1 is violated:
2.    IF Invariant 4 is violated, THEN
3.        Find a cluster $C = (i, A)$ that is blocking at level $k$ (say).
4.        Call the subroutine FIX-BLOCKING$(C, k)$.
5.    ELSE IF Invariant 1 is violated, THEN
6.        Find a cluster $C = (i, A)$ (say) that violates Invariant 1.
7.        Call the subroutine FIX-LEVEL$(C)$.

Figure 1: FIX-CLUSTERING$(\cdot)$.
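In the same illustrative representation, the loop of Figure 1 together with FIX-LEVEL (which only raises the level of the offending cluster by one) can be sketched as below. The search routine `find_blocking` and the routine `fix_blocking` (sketched after its formal description below) are assumed to be given; in practice one would bind them, e.g. with `functools.partial`, to obtain the single-argument callback used in the insertion and deletion sketches above. Efficiently finding a blocking cluster is exactly what our actual data structures are designed for, and is not reflected here.

```python
def avg_cost(C, f, d):
    total = sum(d[(C["facility"], j)] for j in C["clients"])
    return (total + (f[C["facility"]] if C["critical"] else 0.0)) / len(C["clients"])

def fix_level(C):
    C["level"] += 1                                   # FIX-LEVEL(C): raise the level by one

def fix_clustering(state, find_blocking, fix_blocking):
    """The WHILE loop of Figure 1.  find_blocking(state) returns (C, k) or None."""
    f, d = state["f"], state["d"]
    while True:
        blocking = find_blocking(state)               # Invariant 4 violated?
        if blocking is not None:
            fix_blocking(state, *blocking)
            continue
        bad = next((C for C in state["clusters"] if C["clients"]
                    and avg_cost(C, f, d) >= 2 ** C["level"]), None)
        if bad is not None:                           # Invariant 1 violated?
            fix_level(bad)
            continue
        return                                        # all invariants hold: the clustering is nice
```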
The subroutine FIX-BLOCKING$(C, k)$: This subroutine works as follows. We first add the cluster $C = (i, A)$ to the current clustering. Towards this end, we set $\ell(C) \leftarrow k$, $\mathcal{C} \leftarrow \mathcal{C} \cup \{C\}$, and perform the following operations for all clients $j \in A$.

• Let $C'_j = (i'_j, A'_j)$ be the cluster the client $j$ belonged to just before the concerned call to the subroutine FIX-BLOCKING$(C, k)$. Set $A'_j \leftarrow A'_j \setminus \{j\}$.
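In the same illustrative representation, these two steps amount to the following sketch (a full implementation would also discard any cluster that becomes empty, a detail we omit here):

```python
def fix_blocking(state, C_new, k):
    """FIX-BLOCKING(C, k): add the blocking cluster C_new at level k, then remove
    each of its clients from the cluster it belonged to just before this call."""
    C_new["level"] = k
    state["clusters"].append(C_new)
    for j in set(C_new["clients"]):
        old = state["cluster_of"][j]
        old["clients"].discard(j)
        state["cluster_of"][j] = C_new
```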
³The condition $\ell(j) > k \geq \kappa_{ij}$ guarantees that $C$, in some sense, is a minimal blocking cluster. This is because if we remove a client violating this condition from the cluster $C$, then $\mathrm{cost}_{avg}(C)$ can never become $> 2^{k-3}$. We will use this condition while analyzing the amortized recourse of our dynamic algorithm.
⁴If $j$ is the first client being inserted, then there is no existing open facility. In this case, we identify the facility $i \in F$ which minimizes $d_{ij} + f_i$, create a critical cluster $C = (i, \{j\})$ and assign it to the level $k \in \mathbb{Z}$ such that $d_{ij} + f_i \in [2^{k-4}, 2^{k-3})$.