Efficient and Stable Fully Dynamic Facility Location
Sayan Bhattacharya
Department of Computer Science
University of Warwick
Coventry, CV4 7AL, United Kingdom
s.bhattacharya@warwick.ac.uk
Silvio Lattanzi
Google Research
silviol@google.com
Nikos Parotsidis
Google Research
nikosp@google.com
Abstract
We consider the classic facility location problem in fully dynamic data streams, where elements can be both inserted and deleted. In this problem, one is interested in maintaining a stable, high-quality solution throughout the data stream while spending only a small amount of time per update (insertion or deletion). We study the problem and provide the first algorithm that simultaneously maintains a constant approximation and incurs polylogarithmic amortized recourse per update. We complement our theoretical results with an experimental analysis showing the practical efficiency of our method.
1 Introduction
Clustering is a fundamental problem in unsupervised learning with many practical applications in community detection, spam detection, image segmentation and many other areas. A central problem in this space, with a long and rich history in computer science and operations research Korte and Vygen [2018], is the facility location problem (which can also be seen as the Lagrangian relaxation of the classic $k$-median clustering problem). Due to its scientific importance and practical applications, the problem has been studied extensively and several approximation algorithms are known for it Guha and Khuller [1999], Jain et al. [2003], Li [2013]. In addition, the problem has also been extensively studied in various computational models, such as the streaming model Indyk [2004], Lammersen and Sohler [2008], Czumaj et al. [2013], the online model Meyerson [2001], Fotakis [2008] and the dynamic algorithm model Cygan et al. [2018], Goranci et al. [2018], Cohen-Addad et al. [2019], Guo et al. [2020], and many more.
Real-world applications nowadays often process evolving data sets: social networks continuously evolve over time, videos and pictures are constantly uploaded to and taken down from media platforms, news articles and blog posts are published or removed, and so on. For this reason, it is important to design algorithms that are able to maintain a stable and high-quality solution and that, at the same time, can process updates efficiently. As a consequence, the dynamic and online models have been extensively studied and several interesting results are known for classic learning problems Lattanzi and Vassilvitskii [2017], Chan et al. [2018], Cohen-Addad et al. [2019], Jaghargh et al. [2019], Lattanzi et al. [2020], Fichtenberger et al. [2021], Guo et al. [2021]. In this paper, we extend this line of work by studying the classic facility location problem in the fully dynamic setting.
Problem definition. The input to our problem consists of a collection $F$ of facilities and a collection $D$ of clients in a general metric space. For all $i \in F$, $j \in D$, we let $d_{ij}$ denote the distance between facility $i$ and client $j$.¹ These distances satisfy the triangle inequality. Let $f_i$ denote the opening cost of facility $i \in F$. Our goal is to open a subset of facilities $F' \subseteq F$ and then connect every client $j \in D$ to some open facility $i_j \in F'$, so as to minimise the objective $\sum_{i \in F'} f_i + \sum_{j \in D} d_{i_j j}$. The first sum in the objective denotes the sum of the opening costs of the facilities in $F'$, whereas the second sum denotes the sum of the connection costs for the clients. Throughout the rest of this paper, we will let $m = |F|$ and $n = |D|$ respectively denote the number of facilities and clients.

¹For simplicity, we assume that all the distances between facility $i$ and client $j$ are polynomially bounded.

36th Conference on Neural Information Processing Systems (NeurIPS 2022).
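To make the objective concrete, the following minimal Python sketch evaluates a candidate solution; the dictionaries `f`, `d` and `assignment` are illustrative names of our own choosing, not part of the formal model.

```python
def objective(f, d, assignment):
    """Evaluate a facility location solution.
    f[i]: opening cost of facility i, d[(i, j)]: distance between i and j,
    assignment[j]: the open facility that client j is connected to."""
    open_facilities = set(assignment.values())
    opening_cost = sum(f[i] for i in open_facilities)
    connection_cost = sum(d[(i, j)] for j, i in assignment.items())
    return opening_cost + connection_cost


# Example: one facility of cost 5 serving two clients at distances 1 and 2.
f = {"a": 5.0}
d = {("a", 1): 1.0, ("a", 2): 2.0}
assert objective(f, d, {1: "a", 2: "a"}) == 8.0
```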
We consider a dynamic setting, where the input to the problem keeps changing via a sequence of
updates. Each update inserts or deletes a client in the metric space. Throughout this sequence of
updates, we wish to maintain an approximately optimal solution to the current input instance with
small recourse. The recourse of the algorithm is defined as follows. Whenever we open a new facility
or close an already open facility, we incur one unit of facility-recourse. Similarly, whenever we
reassign a client from one facility to another, we incur one unit of client-recourse. The recourse of the
algorithm after an update is the sum of its facility-recourse and client-recourse. Note that the recourse of an algorithm naturally captures the consistency of the solution throughout the input updates. Having small recourse is therefore particularly important for real-world applications where a clustering is served to users or used in a machine learning pipeline Lattanzi and Vassilvitskii [2017]. Our main result in this paper is summarized in the theorem below.
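Before stating it, here is a small illustration of how recourse is charged between two consecutive solutions, represented (for this sketch only) as client-to-facility maps; how newly inserted or deleted clients are charged is left out of this toy computation.

```python
def recourse(before, after):
    """Facility-recourse plus client-recourse between two solutions,
    each given as a map from client to its assigned (open) facility."""
    facility_recourse = len(set(before.values()) ^ set(after.values()))  # opened or closed
    common = before.keys() & after.keys()
    client_recourse = sum(1 for j in common if before[j] != after[j])    # reassigned clients
    return facility_recourse + client_recourse


# Facility "c" is opened and client 2 is reassigned to it: recourse 1 + 1 = 2.
assert recourse({1: "a", 2: "a", 3: "b"}, {1: "a", 2: "c", 3: "b"}) == 2
```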
Theorem 1. There is a deterministic $O(1)$-approximation algorithm for the fully dynamic facility location problem that has $O(\log m)$ amortized recourse, under the natural assumption that the distances between facilities and clients and the facility opening costs are bounded by some polynomial in $m$.
Our Technique. To the best of our knowledge, this is the first algorithm for fully dynamic facility location that can handle non-uniform facility costs and maintains an $O(1)$ approximation with polylogarithmic recourse per update. To obtain our result, we consider a relaxed version of the natural greedy algorithm in the static setting. This relaxed greedy algorithm proceeds in iterations. In each iteration the algorithm picks a cluster, which specifies a set of currently unassigned clients and the facility they will get assigned to, that has approximately minimum average marginal cost. The algorithm stops when every client gets assigned to some facility. In the dynamic setting, we ensure that the solution maintained by our algorithm always corresponds to an output of the relaxed greedy algorithm (assuming the relaxed greedy algorithm receives the current input instance). To be a bit more precise, we formulate a set of invariants which capture the workings of the relaxed greedy algorithm, and we fix the invariants using a simple heuristic whenever one or more of them get violated in the dynamic setting. The approximation guarantee now follows from the observation that the relaxed greedy algorithm returns an $O(1)$ approximation in the static setting. The recourse bound, on the other hand, follows from a careful token-based argument that is inspired by the analysis of a fully dynamic greedy algorithm for minimum set cover Gupta et al. [2017]. In addition, using standard data structures it is easy to show that the amortized update time of our algorithm is $\tilde{O}(m)$. We note that this update time is near-optimal, as each arriving client needs to reveal its distances to the facilities; this also holds for uniform facility opening costs, as pointed out in Cohen-Addad et al. [2019].²
We also complement our theoretical analysis with an in-depth experimental study of the performance of our algorithm, showing that it is very efficient and effective in practice.
Related Works. Facility location and dynamic algorithms have been extensively studied in the algorithms and machine learning literature. Here we give an overview of closely related results.
Facility Location. The classic offline metric facility location problem has been extensively studied (see Korte and Vygen [2018]). The best-known approximation algorithm for the problem gives a 1.488 approximation Li [2013]. The problem is known to be NP-hard, and it cannot be approximated within a factor of 1.463 unless NP $\subseteq$ DTIME($n^{O(\log\log n)}$) Guha and Khuller [1999].
Dynamic algorithms for clustering and facility location. Recently, several dynamic algorithms have been introduced for clustering and facility location problems Lattanzi and Vassilvitskii [2017], Chan et al. [2018], Cygan et al. [2018], Goranci et al. [2018], Cohen-Addad et al. [2019], Henzinger et al. [2020], Guo et al. [2020], Goranci et al. [2021], Fichtenberger et al. [2021]. The recourse for clustering problems was studied in Lattanzi and Vassilvitskii [2017], and several papers followed up with interesting results on recourse and dynamic algorithms Chan et al. [2018], Cohen-Addad et al. [2019], Henzinger et al. [2020], Fichtenberger et al. [2021].
Efficient algorithms for the fully dynamic facility location problem are known either when the doubling dimension of the input is bounded Goranci et al. [2018, 2021] or when the facilities have uniform costs Cygan et al. [2018], Cohen-Addad et al. [2019]. When the facilities can have arbitrary non-uniform costs in a general metric space, prior to our work the best known fully dynamic algorithm achieved an $O(\log |F|)$ approximation with polylogarithmic recourse Guo et al. [2020].

²In some works, assumptions on the form of the input are made which allow avoiding an $\Omega(m)$ factor; see e.g. Guo et al. [2020], where each arriving client reports its nearest facility upon arrival.
2 Our Dynamic Algorithm
In Section 2.1, we present an intuitive overview of our algorithm. We formally describe our algorithm
in Section 2.2 and Section 2.3, and analyze its recourse and approximation guarantee in Section 2.4.
2.1 An Informal Overview of Our Algorithm
We start by recalling a natural greedy algorithm for the facility location problem in the static setting. This (static) algorithm is known to return a constant approximation for the problem Jain et al. [2003]. The algorithm proceeds in rounds. Throughout the duration of the algorithm, let $F^* \subseteq F$ and $D^* \subseteq D$ respectively denote the set of opened facilities and the set of clients that are already assigned to some open facility. Initially, before the start of the first round, we set $F^* \leftarrow \emptyset$ and $D^* \leftarrow \emptyset$. The next paragraph describes what happens during a given round of this algorithm.

Say that a cluster $C$ is an ordered pair $(i, A)$, where $i \in F$ and $A \subseteq D$. A cluster is either satellite or critical. If a cluster $C = (i, A)$ is satellite (resp. critical), then its cost is given by $\mathrm{cost}(C) := \sum_{j \in A} d_{ij}$ (resp. $\mathrm{cost}(C) := f_i + \sum_{j \in A} d_{ij}$). Thus, the only difference between a satellite cluster and a critical cluster is that the cost of the former (resp. latter) does not (resp. does) include the opening cost of the concerned facility. The average cost of a cluster $C = (i, A)$ is defined as $\mathrm{cost}_{avg}(C) := \mathrm{cost}(C)/|A|$. Let $\mathcal{C}_{ord}(F^*, D^*)$ be the set of all satellite clusters $C = (i, A)$ with $i \in F^*$ and $A \subseteq D \setminus D^*$, and let $\mathcal{C}_{crit}(F^*, D^*)$ be the set of all critical clusters $C = (i, A)$ with $i \in F \setminus F^*$ and $A \subseteq D \setminus D^*$. In the current round, we pick a cluster $C = (i, A) \in \mathcal{C}_{crit}(F^*, D^*) \cup \mathcal{C}_{ord}(F^*, D^*)$ with minimum average cost, set $F^* \leftarrow F^* \cup \{i\}$ and $D^* \leftarrow D^* \cup A$, and assign all the clients $j \in A$ to the facility $i$. At this point, if $D^* = D$, then we terminate the algorithm. Otherwise, we proceed to the next round.
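For illustration, here is one direct (and deliberately unoptimized) way to implement this greedy procedure in Python. It already uses two standard observations: a minimum-average-cost satellite cluster may be taken to contain a single client (as noted below), and for a closed facility the best critical cluster consists of a prefix of its closest unassigned clients. The dictionary-based interface (`f`, `d`) is our own choice and does not reflect the data structures of the dynamic algorithm.

```python
import math

def static_greedy(F, D, f, d):
    """Greedy for facility location: F, D are lists of facilities and clients,
    f[i] is an opening cost, d[(i, j)] a distance.  Returns the picked clusters
    as (facility, clients, is_critical) triples, in the order they are chosen."""
    opened, assigned, clusters = set(), set(), []
    while len(assigned) < len(D):
        best_avg, best = math.inf, None
        for i in F:
            unassigned = sorted((d[(i, j)], j) for j in D if j not in assigned)
            if i in opened:
                # Best satellite cluster of an open facility: its single closest client.
                dist, j = unassigned[0]
                if dist < best_avg:
                    best_avg, best = dist, (i, [j], False)
            else:
                # Best critical cluster of a closed facility: a prefix of closest clients.
                total = f[i]
                for t in range(1, len(unassigned) + 1):
                    total += unassigned[t - 1][0]
                    if total / t < best_avg:
                        best_avg = total / t
                        best = (i, [j for _, j in unassigned[:t]], True)
        i, A, critical = best
        opened.add(i)
        assigned.update(A)
        clusters.append((i, A, critical))
    return clusters
```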
Intuitively, in each round the above greedy algorithm picks a cluster with minimum average cost, which is defined in such a way that it takes into account the opening cost of a facility the first time some client gets assigned to it. Let $\mathcal{C}$ denote the collection of all clusters picked by this algorithm, across all the rounds. Note: (1) Every client $j \in D$ belongs to exactly one cluster in $\mathcal{C}$. (2) Let $\mathcal{C}(i) \subseteq \mathcal{C}$ denote the subset of clusters in $\mathcal{C}$ which contain a given facility $i \in F$. If $\mathcal{C}(i) \neq \emptyset$, then exactly one of the clusters in $\mathcal{C}(i)$ is critical. We refer to a collection $\mathcal{C}$ which satisfies these two properties as a clustering. It is easy to check that a clustering $\mathcal{C}$ defines a valid solution to the concerned instance of the facility location problem, where the objective value of the solution is given by $\mathrm{obj}(\mathcal{C}) := \sum_{C \in \mathcal{C}} \mathrm{cost}(C)$.
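Both properties, as well as $\mathrm{obj}(\mathcal{C})$, are easy to check mechanically. The sketch below represents a clustering as a list of `(facility, clients, is_critical)` triples; this representation is ours, chosen only for illustration.

```python
from collections import Counter

def is_clustering(clusters, D):
    """Check properties (1) and (2) above for a list of (facility, clients, is_critical) triples."""
    counts = Counter(j for _, A, _ in clusters for j in A)
    covers_once = set(counts) == set(D) and all(c == 1 for c in counts.values())
    critical_per_facility = Counter(i for i, _, crit in clusters if crit)
    one_critical = all(critical_per_facility[i] == 1 for i, _, _ in clusters)
    return covers_once and one_critical

def obj(clusters, f, d):
    """obj(C): the sum of the cluster costs."""
    return sum(sum(d[(i, j)] for j in A) + (f[i] if crit else 0.0)
               for i, A, crit in clusters)
```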
Note that if the above algorithm picks a satellite cluster $C = (i, A)$ in a given round, then w.l.o.g. we can assume that $|A| = 1$. This is because if $|A| > 1$, then we can only decrease the average cost of $C$ by deleting from $A$ all but the one client that is closest to facility $i$. Accordingly, from now on we will only consider those satellite clusters that have exactly one client.
Our dynamic algorithm is based on a relaxation of the above static greedy algorithm, where in each
round we have the flexibility of picking a cluster with approximately minimum average cost. The
output of this relaxed greedy algorithm corresponds to what we call a nice clustering (see Section 2.2
for a formal definition). We now present a high-level, informal overview of our dynamic algorithm.
Preprocessing: Upon receiving an input $(F, D)$ at preprocessing, we run the relaxed greedy algorithm, which returns a nice clustering $\mathcal{C}$. We assign each cluster $C \in \mathcal{C}$ a (not necessarily positive) integer level $\ell(C) \in \mathbb{Z}$ such that $\mathrm{cost}_{avg}(C) = \Theta(2^{\ell(C)})$. Thus, the level of a cluster encodes its average cost up to a constant multiplicative factor. Define the level of a client $j \in C$ to be $\ell(j) := \ell(C)$.
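One concrete way to realize such a level assignment (the particular rounding is our choice, made so that it is consistent with Invariant 1 below) is to take the smallest integer $\ell$ with $\mathrm{cost}_{avg}(C) < 2^{\ell}$:

```python
import math

def level_of(avg_cost):
    """Smallest integer l with avg_cost < 2**l, i.e. avg_cost lies in [2**(l-1), 2**l)."""
    return math.floor(math.log2(avg_cost)) + 1

assert level_of(1.0) == 1    # 1.0 lies in [2**0, 2**1)
assert level_of(0.3) == -1   # 0.3 lies in [2**-2, 2**-1)
```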
Handling an update: When a client $j$ gets deleted from $D$, we simply delete the concerned client from the cluster in $\mathcal{C}$ it appears in. Similarly, when a client $j$ gets inserted into $D$, we arbitrarily assign it to any open facility $i$ by creating a satellite cluster $C = (i, \{j\})$ at the minimum possible level $k = \Theta(\log d_{ij})$. At this point, we check if the clustering $\mathcal{C}$ being maintained by our algorithm still remains nice, i.e., whether $\mathcal{C}$ can correspond to an output of the relaxed greedy algorithm on the current input. If the answer is yes, then our dynamic algorithm is done with handling the current update. Thus, for the rest of this discussion, assume that the clustering $\mathcal{C}$ is no longer nice.
It turns out that in this event we can always identify a cluster $C = (i, A)$ (not necessarily part of $\mathcal{C}$) and a level $k \in \mathbb{Z}$ such that: $\mathrm{cost}_{avg}(C) = O(2^k)$ and $\ell(j) > k$ for all clients $j \in A$. We refer to such a cluster $C = (i, A)$ as a blocking cluster. Intuitively, the existence of a blocking cluster is a certificate that the current clustering $\mathcal{C}$ is not nice. This is because the relaxed greedy algorithm constructs the levels in a bottom-up manner, in increasing order of the average costs of the clusters that get added to the solution in successive rounds. Accordingly, the relaxed greedy algorithm would have added the cluster $C$ to its solution at level $k$ before proceeding to level $k + 1$, and this contradicts the fact that a client $j \in A$ appears at level $> k$ in the current clustering $\mathcal{C}$.
As long as there is a blocking cluster $C = (i, A)$ at some level $k$, we update the current clustering $\mathcal{C}$ by calling a subroutine FIX-BLOCKING$(C, k)$, which works as follows: (1) it adds the cluster $C$ to $\mathcal{C}$ at level $k$, and (2) it removes each client $j \in A$ from the cluster in $\mathcal{C}$ it belonged to prior to this step.
Next, note that because of step (2) above, some cluster $C' = (i', A')$ might lose one or more clients $j$ as they move from $C'$ to the newly formed cluster $C$, and this might increase the average cost of the cluster $C'$. Thus, we might potentially end up in a situation where a cluster $C' = (i', A')$ has $\mathrm{cost}_{avg}(C') \geq 2^{\ell(C')}$. As long as this happens to be the case, we call a subroutine FIX-LEVEL$(C')$ whose job is to increase the level of the affected cluster $C'$ by one.
Our algorithm repeatedly calls the subroutines FIX-BLOCKING$(\cdot, \cdot)$ and FIX-LEVEL$(\cdot)$ until there is no blocking cluster and every cluster $C \in \mathcal{C}$ has $\mathrm{cost}_{avg}(C) = \Theta(2^{\ell(C)})$. At this point, we are guaranteed that the current clustering is nice, and we are done with processing the concerned update.

The approximation ratio of our algorithm follows from the observation that a nice clustering corresponds to the output of the relaxed greedy algorithm in the static setting, which in turn gives an $O(1)$-approximation. We bound the amortized recourse by a careful token-based argument.
2.2 Nice clustering
In Section 2.1, we explained that a nice clustering corresponds to an output of the relaxed greedy algorithm in the static setting. We now give a formal definition of this concept by introducing four invariants. Specifically, we define a clustering $\mathcal{C}$ to be nice iff it satisfies Invariants 1, 2, 3 and 4. Every cluster $C \in \mathcal{C}$ is assigned a (not necessarily positive) integer level $\ell(C) \in \mathbb{Z}$. For any cluster $C = (i, A) \in \mathcal{C}$ and any client $j \in A$, we define the level of the client $j$ to be $\ell(j) := \ell(C)$.
Invariant 1. For every cluster $C \in \mathcal{C}$, we have $\mathrm{cost}_{avg}(C) < 2^{\ell(C)}$.
The relaxed greedy algorithm constructs these levels in a "bottom-up" manner: it assigns the relevant clusters to a level $k$ before moving on to level $k + 1$. This holds because the algorithm picks the clusters in (approximately) increasing order of their average costs. Also, before picking a satellite cluster $C = (i, \{j\})$, the algorithm ensures that facility $i$ is open. This leads to the invariant below.
Invariant 2. Consider any facility $i \in F$ with $\mathcal{C}(i) \neq \emptyset$. Then the unique critical cluster $C^* \in \mathcal{C}(i)$ satisfies $\ell(C^*) \leq \ell(C)$ for all $C \in \mathcal{C}(i)$.
The next invariant captures the fact that in an output of the relaxed greedy algorithm, if a client $j \in D$ gets assigned to a facility $i \in F$ then $\ell(j) = \Omega(\log d_{ij})$.
Invariant 3. For every cluster $C = (i, A) \in \mathcal{C}$ and every client $j \in A$, we have $\ell(j) \geq \kappa_{ij}$, where $\kappa_{ij}$ is the unique level s.t. $2^{\kappa_{ij} - 4} \leq d_{ij} < 2^{\kappa_{ij} - 3}$.
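For completeness, $\kappa_{ij}$ is a simple function of the distance; a one-line sketch of the computation implied by the inequality above:

```python
import math

def kappa(d_ij):
    """The unique integer k with 2**(k - 4) <= d_ij < 2**(k - 3)."""
    return math.floor(math.log2(d_ij)) + 4

assert kappa(5.0) == 6   # 2**2 = 4 <= 5 < 8 = 2**3
```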
Finally, we formulate an invariant which captures the fact that the relaxed greedy algorithm picks the
clusters in an (approximately) increasing order of their average costs. Towards this end, we define the
notion of a blocking cluster, as described below.
Definition 1. Consider any clustering $\mathcal{C}$, any level $k \in \mathbb{Z}$, and any cluster $C = (i, A)$ that does not necessarily belong to $\mathcal{C}$. The cluster $C$ is a blocking cluster at level $k$ w.r.t. $\mathcal{C}$ iff three conditions hold. (a) We have $\mathrm{cost}_{avg}(C) < 2^{k-3}$. (b) For every $j \in A$, we have $\ell(j) > k \geq \kappa_{ij}$, where $\kappa_{ij}$ is the unique level s.t. $2^{\kappa_{ij} - 4} \leq d_{ij} < 2^{\kappa_{ij} - 3}$. (c) If $C$ is a satellite cluster, then there is a critical cluster $C^* \in \mathcal{C}(i)$ with $\ell(C^*) \leq k$. Else if $C$ is a critical cluster, then $k \leq \ell(C')$ for all $C' \in \mathcal{C}(i)$.
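Checking Definition 1 for a given candidate cluster is mechanical; the sketch below assumes access to the current client levels and to the (level, is_critical) pairs of the clusters of facility $i$ in the current clustering, a representation we pick purely for illustration.

```python
import math

def is_blocking(C, k, level, clusters_of_i, f, d):
    """C = (i, A, is_critical); level[j] is the current level of client j;
    clusters_of_i lists (level, is_critical) pairs for the clusters of facility i
    in the current clustering."""
    i, A, is_critical = C
    kappa = lambda dist: math.floor(math.log2(dist)) + 4        # as in Invariant 3
    cost = sum(d[(i, j)] for j in A) + (f[i] if is_critical else 0.0)
    if not cost / len(A) < 2 ** (k - 3):                        # condition (a)
        return False
    if not all(level[j] > k >= kappa(d[(i, j)]) for j in A):    # condition (b)
        return False
    if is_critical:                                             # condition (c), critical case
        return all(k <= l for l, _ in clusters_of_i)
    return any(crit and l <= k for l, crit in clusters_of_i)    # condition (c), satellite case
```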
Intuitively, if $C = (i, A)$ is a blocking cluster at level $k$ w.r.t. $\mathcal{C}$, then the relaxed greedy algorithm should have formed the cluster $C$ at some level $k' < k$ before proceeding to construct the subsequent levels which currently contain all the clients in $A$. This leads us to the invariant below.³
Invariant 4. There is no blocking cluster w.r.t. the clustering $\mathcal{C}$ at any level $k \in \mathbb{Z}$.
Remark. Consider a cluster $C = (i, A) \in \mathcal{C}$ at level $\ell(C) = k$. If $\mathrm{cost}_{avg}(C)$ is smaller than $2^{k}$ by a sufficiently large constant factor, then it is not difficult to show that there exists a subset of clients $A' \subseteq A$ such that $C' = (i, A')$ is a blocking cluster at level $k - 1$. This, along with Invariant 1, implies that $\mathrm{cost}_{avg}(C) = \Theta(2^{\ell(C)})$ for all $C \in \mathcal{C}$.
The next theorem holds since a nice clustering corresponds to the output of the relaxed greedy algorithm, and since the (original) greedy algorithm is known to have an approximation ratio of $O(1)$. We defer the proof of Theorem 2 to Appendix B.

Theorem 2. Any nice clustering forms an $O(1)$-approximate optimal solution to the concerned input instance of the facility location problem.
2.3 Description of Our Dynamic Algorithm
In the dynamic setting, we simply maintain a nice clustering. To be more specific, w.l.o.g. we assume that $D = \emptyset$ at preprocessing. Subsequently, during each update, a client gets inserted into or deleted from the set $D$. We now describe how our algorithm handles an update.
Handling the deletion of a client $j$: Suppose that client $j$ belonged to the cluster $C = (i, A) \in \mathcal{C}$ just before getting deleted. We set $A \leftarrow A \setminus \{j\}$, and then we call the subroutine described in Figure 1.
Handling the insertion of a client $j$: We identify any open facility $i \in F$.⁴ Let $\kappa_{ij} \in \mathbb{Z}$ be the unique level such that $2^{\kappa_{ij} - 4} \leq d_{ij} < 2^{\kappa_{ij} - 3}$. Let $C = (i, A)$ be the unique critical cluster with facility $i$. If $\kappa_{ij} \leq \ell(C)$, then we set $A \leftarrow A \cup \{j\}$. In this case the client $j$ becomes part of the critical cluster $C$ at level $\ell(C)$. Else if $\kappa_{ij} > \ell(C)$, then we create a new satellite cluster $(i, \{j\})$ at level $\ell((i, \{j\})) = \kappa_{ij}$, and set $\mathcal{C} \leftarrow \mathcal{C} \cup \{(i, \{j\})\}$. Next, we call the subroutine described in Figure 1.
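Under an illustrative representation of the state (a list of cluster dictionaries plus a client-to-cluster map, bundled in a dict `state`; these names are ours), the two update handlers can be sketched as follows. The routine `fix_clustering`, implementing Figure 1, is passed in as a callback; it is described next and sketched after the figure.

```python
import math

def handle_insert(state, j, i, fix_clustering):
    """state = {"clusters": [...], "cluster_of": {...}, "f": ..., "d": ...};
    i is any currently open facility (footnote 4 covers the very first insertion)."""
    d = state["d"]
    k = math.floor(math.log2(d[(i, j)])) + 4                 # kappa_ij, as in Invariant 3
    C = next(c for c in state["clusters"] if c["facility"] == i and c["critical"])
    if k <= C["level"]:
        C["clients"].add(j)                                  # j joins i's critical cluster
        state["cluster_of"][j] = C
    else:                                                    # new satellite cluster (i, {j}) at level kappa_ij
        S = {"facility": i, "clients": {j}, "critical": False, "level": k}
        state["clusters"].append(S)
        state["cluster_of"][j] = S
    fix_clustering(state)                                    # Figure 1

def handle_delete(state, j, fix_clustering):
    state["cluster_of"].pop(j)["clients"].discard(j)
    fix_clustering(state)                                    # Figure 1
```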
The subroutine FIX-CLUSTERING$(\cdot)$: The insertion or deletion of a client might lead to a violation of Invariant 4 or Invariant 1 (the procedures described in the preceding two paragraphs ensure that Invariant 2 and Invariant 3 continue to remain satisfied). The subroutine FIX-CLUSTERING$(\cdot)$ handles this issue. It repeatedly checks whether Invariant 4 or Invariant 1 is violated, and accordingly calls FIX-BLOCKING$(C, k)$ or FIX-LEVEL$(C)$. Each of these last two subroutines has the property that it does not lead to any new violation of Invariant 2 or Invariant 3. Thus, when the WHILE loop in Figure 1 terminates, all the invariants are restored and we end up with a nice clustering.
1. WHILE either Invariant 4 or Invariant 1 is violated:
2.    IF Invariant 4 is violated, THEN
3.        Find a cluster $C = (i, A)$ that is blocking at level $k$ (say).
4.        Call the subroutine FIX-BLOCKING$(C, k)$.
5.    ELSE IF Invariant 1 is violated, THEN
6.        Find a cluster $C = (i, A)$ (say) that violates Invariant 1.
7.        Call the subroutine FIX-LEVEL$(C)$.

Figure 1: FIX-CLUSTERING$(\cdot)$.
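In the same illustrative representation, the loop of Figure 1 together with FIX-LEVEL (which only raises the level of the offending cluster by one) can be sketched as below. The search routine `find_blocking` and the routine `fix_blocking` (sketched after its formal description below) are assumed to be given; in practice one would bind them, e.g. with `functools.partial`, to obtain the single-argument callback used in the insertion and deletion sketches above. Efficiently finding a blocking cluster is exactly what our actual data structures are designed for, and is not reflected here.

```python
def avg_cost(C, f, d):
    total = sum(d[(C["facility"], j)] for j in C["clients"])
    return (total + (f[C["facility"]] if C["critical"] else 0.0)) / len(C["clients"])

def fix_level(C):
    C["level"] += 1                                   # FIX-LEVEL(C): raise the level by one

def fix_clustering(state, find_blocking, fix_blocking):
    """The WHILE loop of Figure 1.  find_blocking(state) returns (C, k) or None."""
    f, d = state["f"], state["d"]
    while True:
        blocking = find_blocking(state)               # Invariant 4 violated?
        if blocking is not None:
            fix_blocking(state, *blocking)
            continue
        bad = next((C for C in state["clusters"] if C["clients"]
                    and avg_cost(C, f, d) >= 2 ** C["level"]), None)
        if bad is not None:                           # Invariant 1 violated?
            fix_level(bad)
            continue
        return                                        # all invariants hold: the clustering is nice
```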
The subroutine FIX-BLOCKING$(C, k)$: This subroutine works as follows. We first add the cluster $C = (i, A)$ to the current clustering. Towards this end, we set $\ell(C) \leftarrow k$, $\mathcal{C} \leftarrow \mathcal{C} \cup \{C\}$, and perform the following operations for all clients $j \in A$.

• Let $C'_j = (i'_j, A'_j)$ be the cluster the client $j$ belonged to just before the concerned call to the subroutine FIX-BLOCKING$(C, k)$. Set $A'_j \leftarrow A'_j \setminus \{j\}$.
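In the same illustrative representation, these two steps amount to the following sketch (a full implementation would also discard any cluster that becomes empty, a detail we omit here):

```python
def fix_blocking(state, C_new, k):
    """FIX-BLOCKING(C, k): add the blocking cluster C_new at level k, then remove
    each of its clients from the cluster it belonged to just before this call."""
    C_new["level"] = k
    state["clusters"].append(C_new)
    for j in set(C_new["clients"]):
        old = state["cluster_of"][j]
        old["clients"].discard(j)
        state["cluster_of"][j] = C_new
```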
³The condition $\ell(j) > k \geq \kappa_{ij}$ guarantees that $C$, in some sense, is a minimal blocking cluster. This is because if we remove a client violating this condition from the cluster $C$, then $\mathrm{cost}_{avg}(C)$ can never become $> 2^{k-3}$. We will use this condition while analyzing the amortized recourse of our dynamic algorithm.
⁴If $j$ is the first client being inserted, then there is no existing open facility. In this case, we identify the facility $i \in F$ which minimizes $d_{ij} + f_i$, create a critical cluster $C = (i, \{j\})$ and assign it to the level $k \in \mathbb{Z}$ such that $d_{ij} + f_i \in [2^{k-4}, 2^{k-3})$.