Bures-Wasserstein Barycenters and Low-Rank Matrix Recovery
Tyler Maunu maunu@brandeis.edu
Brandeis University
Thibaut Le Gouic thibaut.le_gouic@math.cnrs.fr
École Centrale de Marseille
Philippe Rigollet rigollet@math.mit.edu
MIT
Abstract
We revisit the problem of recovering a low-rank positive semidefinite matrix from rank-one
projections using tools from optimal transport. More specifically, we show that a variational
formulation of this problem is equivalent to computing a Wasserstein barycenter. In turn,
this new perspective enables the development of new geometric first-order methods with
strong convergence guarantees in Bures-Wasserstein distance. Experiments on simulated
data demonstrate the advantages of our new methodology over existing methods.
1 Introduction
Recovering a low-rank matrix is a fundamental primitive across many settings, such as matrix completion (Fazel, 2002; Candès and Recht, 2009; Candès and Tao, 2010), phase retrieval (Candès et al., 2015), principal component analysis (Pearson, 1901; Hotelling, 1933), robust subspace recovery (Lerman and Maunu, 2018), and robust principal component analysis (Chandrasekaran et al., 2009; Candès et al., 2011; Xu et al., 2010). This line of work can be understood as a generalization of the classical compressed sensing question (Donoho, 2006; Candès et al., 2006), where the goal is the recovery of a sparse vector. That problem can be cast as a low-rank recovery problem over diagonal matrices. In all of these settings, the assumption of a low-rank structure is essential for efficient estimation and optimization in high-dimensional settings.
While the above applications all aim at recovering a low-rank matrix $S$, the observational (a.k.a. sensing) mechanism that governs access to $S$ comes in many forms. For the purpose of applications, it is often sufficient to focus on linear measurements of the form $\langle S, A \rangle$ for some given sensing matrix $A$. This setup covers a wide variety of applications ranging from covariance sketching (Chen et al., 2015) and low-rank matrix completion (Candès and Plan, 2010; Recht et al., 2010) to phase retrieval (Fienup, 1978; Candès et al., 2013; 2015) and quantum state tomography (Gross et al., 2010). New solutions to this problem can have many practical implications.
In this paper, we focus on a specific instantiation of this problem, where the measurement matrix $A = xx^\top$ is rank-one and positive semidefinite (PSD), so that $\langle xx^\top, S \rangle = x^\top S x$. This important case of the low-rank matrix recovery problem has received significant attention over the past few years (Cai and Zhang, 2015; Chen et al., 2015; Sanghavi et al., 2017; Li et al., 2019).
Assume that we observe
\[
y_i = x_i^\top S x_i, \qquad i = 1, \dots, n, \tag{1.1}
\]
where $S \in \mathbb{R}^{d \times d}$ is an unknown rank-$r$ PSD matrix and $x_1, \dots, x_n$ are i.i.d. from some distribution. Our goal is to recover or estimate $S$ from the pairs $(y_i, x_i)$, $i = 1, \dots, n$. Throughout, we denote by $\mathbb{S}^d_+$ the set of $d \times d$ PSD matrices and by $\mathbb{S}^d_{++}$ the set of positive definite (PD) matrices.
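As a concrete illustration of this observation model, here is a minimal numpy sketch that draws a rank-$r$ matrix $S$ and forms the measurements (1.1); the Gaussian sensing vectors and the specific dimensions are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 50, 3, 500                  # illustrative dimensions, not from the paper

# Ground-truth rank-r PSD matrix S = U* U*^T
U_star = rng.standard_normal((d, r))
S = U_star @ U_star.T

# Sensing vectors x_1, ..., x_n, drawn i.i.d. (here: standard Gaussian)
X = rng.standard_normal((n, d))

# Rank-one measurements y_i = x_i^T S x_i, as in (1.1)
y = np.einsum("ij,jk,ik->i", X, S, X)
```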
Finding a low-rank matrix $S$ subject to the constraints (1.1) is a semidefinite program (SDP) that can be solved in polynomial time using general-purpose solvers. Furthermore, the specific structure of this SDP may be leveraged to derive faster algorithms. Such solutions include the Burer-Monteiro approach to solving semidefinite programs (Burer and Monteiro, 2003) and nonconvex gradient descent methods for low-rank programs (Sanghavi et al., 2017). Often, these approaches result in nonconvex optimization programs for which theoretical results are limited.
In this work, we take a principled approach to solving this problem by eliciting convexity using a specific geometry on the space of PSD matrices. More precisely, we employ the Bures-Wasserstein (hereafter BW) geometry, which arose independently in optimal transport and quantum information theory (Bures, 1969; Bhatia et al., 2019). This geometry allows us to solve the original problem by computing a BW barycenter (Agueh and Carlier, 2011; Álvarez-Esteban et al., 2016; Chewi et al., 2020; Altschuler et al., 2021). In turn, we employ geodesic gradient descent on the BW manifold to compute said barycenter. We propose both full-gradient and stochastic-gradient methods that are guaranteed to efficiently recover a low-rank matrix. These methods have low computational cost (per-iteration complexity of $O(ndr)$ for gradient descent and $O(dr)$ for stochastic gradient descent), require minimal parameter tuning, are easily implemented, and show excellent practical performance. We demonstrate an example application to phase retrieval in Figure 1. In this setup, BW gradient descent recovers the image faster than Wirtinger Flow (WF) (Candès et al., 2015), and it requires no parameter tuning.
Main contributions. The main results of this paper are:
1. We prove that the barycenter of a certain distribution of rank-one Gaussians exactly recovers the
underlying low-rank matrix.
2. Using this connection, we adapt existing first-order algorithms for computing BW barycenters into novel geodesic gradient descent and stochastic geodesic gradient descent algorithms for solving the low-rank PSD matrix recovery problem.
3. Existing first-order algorithms for computing BW barycenters are only guaranteed to work for full
rank distributions. Since our method considers barycenters of rank-one PSD matrices, we develop
new theory and give a guarantee of local linear convergence in BW distance for the gradient descent
method. We also discuss initialization of our method.
4. We demonstrate the competitive edge of our algorithms in a few experimental settings.
Related Work. Many methods have been proposed to solve variants of the matrix recovery problem. Original ideas for this problem trace back to linear systems theory, low-rank matrix completion, low-dimensional Euclidean embeddings, and image compression (Recht et al., 2010).
We focus here on rank-one projections of positive semidefinite matrices as in (1.1). This setup is either specifically considered or a special case of a large number of works, including (Candès et al., 2015; Cai and Zhang, 2015; Chen et al., 2015; Zhong et al., 2015; Wang and Giannakis, 2016; Wang et al., 2017; Sanghavi et al., 2017; Li et al., 2019). These methods can be clustered into two families. The first aims at minimizing convex relaxations of an energy functional, often based on the nuclear norm (Cai and Zhang, 2015; Chen et al., 2015); such convex programs can be solved via standard solvers. Another family of methods directly imposes the low-rank constraint in a nonconvex formulation (Li et al., 2019), thus manipulating candidate matrices with smaller representations and thereby boosting computational efficiency; see Chi et al. (2019) for an overview of such algorithms. The special case where $S$ has rank one corresponds to the classical phase retrieval problem and has received much attention with dedicated algorithms (Fienup, 1978; Candès et al., 2015; Wang and Giannakis, 2016; Wang et al., 2017; Chi et al., 2019).

Figure 1: Recovered images after 140 iterations of Wirtinger Flow (Candès et al., 2015) and BW gradient descent (our method). BW gradient descent recovers a much sharper image within the same number of iterations; note that the per-iteration complexity of both methods is the same.
Note that the methods introduced in the present paper can be readily extended to the covariance recovery problem framed in Cai and Zhang (2015), which is similar to estimation in a random-effects model. Under this model, rather than (1.1), we observe $y_i = \langle x_i, w_i \rangle + \epsilon_i$, where $w_i \sim N(0, S)$ and $N(0, S)$ is the centered Gaussian distribution on $\mathbb{R}^d$ with PSD covariance matrix $S$. The goal here is to recover the covariance matrix $S$ of the weight vectors from observations of $(y_i, x_i)$, $i = 1, \dots, n$. This problem has natural connections to mixture-of-regressions models as well (De Veaux, 1989; Yi et al., 2014; Zhong et al., 2016; Sedghi et al., 2016).
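A minimal simulation sketch of this covariance-recovery observation model follows; the noise level and dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, n, sigma = 50, 3, 500, 0.1      # illustrative choices, including the noise level

U_star = rng.standard_normal((d, r))
S = U_star @ U_star.T                 # PSD covariance of the random effects

X = rng.standard_normal((n, d))                 # design vectors x_i
W = rng.standard_normal((n, r)) @ U_star.T      # w_i ~ N(0, S), since S = U* U*^T
eps = sigma * rng.standard_normal(n)            # observation noise
y = np.einsum("ij,ij->i", X, W) + eps           # y_i = <x_i, w_i> + eps_i
```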
In terms of the complexity of various methods, solving the semidefinite program using off-the-shelf solvers has complexity $O(nd^2 + d^3)$. Gradient descent on PSD matrices (see the initialization phase in Tu et al. (2016)) can solve this with per-iteration complexity $O(nd^2)$, and one can prove linear convergence under certain assumptions. The most directly comparable methods are nonconvex gradient descent methods (Li et al., 2019), which have per-iteration complexity $O(ndr)$ and utilize a Burer-Monteiro factorization (Burer and Monteiro, 2003). As we will see, our method also has per-iteration complexity $O(ndr)$.
Finally, we mention several works dedicated to the computation of the nonconvex Wasserstein barycenter problem (Agueh and Carlier, 2011; Álvarez-Esteban et al., 2016; Zemel and Panaretos, 2019; Chewi et al., 2020; Altschuler et al., 2021). No theoretical study in these works allows for rank deficiency.
Notation. Bold capital letters denote matrices while bold lower-case letters denote vectors. The Hilbert-Schmidt inner product is $\langle \cdot, \cdot \rangle$, and $\langle \cdot, \cdot \rangle_{\gamma_\Sigma} = \langle \cdot, \cdot \rangle_{\Sigma}$ is the Riemannian metric associated to the (fixed-rank) BW manifold at $\Sigma$. Their corresponding norms are written as $\|\cdot\|$ and $\|\cdot\|_{\Sigma}$, respectively. The orthogonal projection onto the column span of $\Sigma$ is $P_\Sigma = P_{\mathrm{Sp}(\Sigma)}$. Similarly, $P_\Sigma^\perp$ is the projection onto the null space of $\Sigma$ (the orthogonal complement of $\mathrm{Sp}(\Sigma)$). The Dirac distribution at a point $x$ is denoted by $\delta_x$.
Outline. We begin by outlining our approach to matrix recovery in Section 2. We then give the main theo-
retical results for our approach in Section 3. After this, we give experiments demonstrating these advantages
in Section 4, and discuss the limitations of our work in Section 5.
2 The Bures-Wasserstein Barycenter Approach
Recall that we aim to find the rank-$r$ matrix $S \in \mathbb{S}^d_+$ given the observations (1.1). We will use the notation $y_i = \mathcal{A}(S)_i = \langle x_i x_i^\top, S \rangle$, $i = 1, \dots, n$. To recover a low-rank matrix $S$ from these measurements, most past work has focused on some form of energy minimization. For example, some works have looked at convex nuclear norm minimization methods. Cai and Zhang (2015) and Chen et al. (2015) concurrently developed a nuclear norm minimization procedure that solves
\[
\min_{\Sigma \in \mathbb{S}^d_+,\ \mathcal{A}(\Sigma) = y} \operatorname{Tr}(\Sigma), \tag{2.1}
\]
where $y = [y_1, \dots, y_n]^\top$ and $\mathcal{A}(\Sigma) = [\mathcal{A}(\Sigma)_1, \dots, \mathcal{A}(\Sigma)_n]^\top$. One can directly solve the semidefinite program using standard convex optimization packages. A Lagrangian formulation of (2.1) yields the energy $\frac{1}{2}\|\mathcal{A}(\Sigma) - y\|_2^2 + \lambda \operatorname{Tr}(\Sigma)$, which can also be minimized using a variety of methods.
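As a sketch of how the constrained program (2.1) could be handed to an off-the-shelf solver, the following uses cvxpy (our illustrative library choice, not one specified in the paper); since $\Sigma$ is constrained to be PSD, minimizing the trace is equivalent to minimizing the nuclear norm. With noisy measurements one would instead minimize the Lagrangian form above. Solving requires an SDP-capable backend such as SCS.

```python
import cvxpy as cp
import numpy as np

def nuclear_norm_recovery(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve (2.1): minimize Tr(Sigma) over PSD Sigma subject to x_i^T Sigma x_i = y_i."""
    n, d = X.shape
    Sigma = cp.Variable((d, d), PSD=True)
    constraints = [X[i] @ Sigma @ X[i] == y[i] for i in range(n)]
    problem = cp.Problem(cp.Minimize(cp.trace(Sigma)), constraints)
    problem.solve()
    return Sigma.value
```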
In the following, we lay out our approach to the low-rank matrix recovery problem, which focuses on a
new energy minimization procedure. In Section 2.1 we discuss the common nonconvex approaches to matrix
recovery and outline our novel optimization program. Then, in Section 2.2, we discuss the BW barycenter
problem, and show how it recovers solutions to the energy minimization we propose. After this, in Section 2.3
we outline the first-order algorithms for computing BW barycenters. We finish in Section 2.4 by discussing a
regularization procedure that allows one to estimate higher-rank proxies, from which it is possible to recover $S$.
2.1 Nonconvex Approaches for Matrix Recovery
Suppose that we know an upper bound $r$ for the rank of the underlying matrix $S$. We could utilize this information in a nonconvex optimization program such as
\[
\min_{\Sigma \in \mathbb{S}^d_+,\ \operatorname{rank}(\Sigma) \le r} \ \frac{1}{2n}\|\mathcal{A}(\Sigma) - y\|_2^2. \tag{2.2}
\]
Without the rank restriction (i.e., $r = d$), this problem is in fact convex. For any fixed $r \le d$, we can parameterize the rank-$r$ matrices in $\mathbb{S}^d_+$ by $UU^\top$ for $U \in \mathbb{R}^{d \times r}$, which is now commonly referred to as the Burer-Monteiro factorization (Burer and Monteiro, 2003). We thus define the set of PSD matrices of rank at most $r$ using this factorization: $\mathbb{S}^{d,r}_+ := \{\Sigma \in \mathbb{S}^d_+ : \Sigma = UU^\top,\ U \in \mathbb{R}^{d \times r}\}$. With this parametrization, the matrix recovery problem in (2.2) is equivalent to
\[
\min_{U \in \mathbb{R}^{d \times r}} \ \frac{1}{2}\|\mathcal{A}(UU^\top) - y\|_2^2. \tag{2.3}
\]
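For reference, here is a minimal sketch of this generic nonconvex baseline: plain gradient descent on the factor $U$ for the least-squares energy (2.3). The step size and random initialization are illustrative assumptions, and this is not the BW method proposed in this paper; each iteration costs $O(ndr)$.

```python
import numpy as np

def burer_monteiro_gd(X, y, r, step=1e-3, n_iter=500, seed=0):
    """Plain gradient descent on f(U) = 0.5 * ||A(U U^T) - y||_2^2 with A(M)_i = x_i^T M x_i."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, r)) / np.sqrt(d)     # illustrative random initialization
    for _ in range(n_iter):
        XU = X @ U                                   # row i is x_i^T U, shape (n, r)
        residual = np.sum(XU ** 2, axis=1) - y       # A(U U^T)_i - y_i
        grad = 2.0 * X.T @ (residual[:, None] * XU)  # grad f(U) = 2 * sum_i residual_i * x_i x_i^T U
        U -= step * grad
    return U @ U.T
```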
While past work has focused on these least-squares formulations, there have not been many modifications of this energy. We propose the following modification of the energies (2.2) and (2.3):
\[
\min_{\Sigma \in \mathbb{S}^d_+,\ \operatorname{rank}(\Sigma) \le r} \ \frac{1}{2n}\big\|\sqrt{\mathcal{A}(\Sigma)} - \sqrt{y}\big\|_2^2 \;=\; \min_{U \in \mathbb{R}^{d \times r}} \ \frac{1}{2n}\big\|\sqrt{\mathcal{A}(UU^\top)} - \sqrt{y}\big\|_2^2, \tag{2.4}
\]
where the square roots are taken componentwise. As we demonstrate in the following sections, this problem has a natural solution as a BW barycenter.
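A short sketch of evaluating this modified energy under the Burer-Monteiro parameterization follows (the helper name is ours, for illustration only).

```python
import numpy as np

def sqrt_loss(U, X, y):
    """Evaluate (1/2n) * ||sqrt(A(U U^T)) - sqrt(y)||_2^2 with componentwise square roots, cf. (2.4)."""
    n = X.shape[0]
    a = np.sum((X @ U) ** 2, axis=1)   # A(U U^T)_i = ||U^T x_i||^2 >= 0
    return np.sum((np.sqrt(a) - np.sqrt(y)) ** 2) / (2 * n)
```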
2.2 The Bures-Wasserstein Barycenter Problem
To explain the connection of (2.4) to BW barycenters, we will first explain how BW space arises from the perspective of optimal transport (Villani, 2009). Let $\mathcal{P}_2(\mathbb{R}^d)$ be the set of all measures on $\mathbb{R}^d$ with finite second moment. The 2-Wasserstein distance between measures $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$ is defined by
\[
W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \pi} \|x - y\|^2, \tag{2.5}
\]
where $\Pi(\mu, \nu)$ denotes the set of all couplings between $\mu$ and $\nu$ (i.e., the set of all joint distributions on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$). The 2-Wasserstein distance defines a metric over $\mathcal{P}_2(\mathbb{R}^d)$, and the resulting geodesic metric space is referred to as 2-Wasserstein space.
Let $\mathcal{N}(\mathbb{R}^d)$ denote the set of Gaussian distributions on $\mathbb{R}^d$, and let $\mathcal{N}_0(\mathbb{R}^d)$ be the set of centered Gaussian distributions. Both are geodesically weakly convex subsets of 2-Wasserstein space, meaning that between any two points in these sets there always exists a 2-Wasserstein geodesic contained within the set. Letting $N(0, \Sigma) \in \mathcal{N}_0(\mathbb{R}^d)$ denote the Gaussian distribution on $\mathbb{R}^d$ with mean zero and covariance matrix $\Sigma \in \mathbb{S}^d_+$, the 2-Wasserstein distance between $N(0, \Sigma_0)$ and $N(0, \Sigma_1) \in \mathcal{N}_0(\mathbb{R}^d)$ has the explicit form
\[
W_2^2\big(N(0, \Sigma_0), N(0, \Sigma_1)\big) = \operatorname{Tr}\Big(\Sigma_0 + \Sigma_1 - 2\big(\Sigma_0^{1/2} \Sigma_1 \Sigma_0^{1/2}\big)^{1/2}\Big). \tag{2.6}
\]
Notice that this is purely a function of the covariance matrices, and so the Wasserstein distance induces a distance metric on PSD matrices called the Bures-Wasserstein distance (Bhatia et al., 2019). To refer to this distance over PSD matrices rather than over the Gaussian distributions, we will write
\[
d_{\mathrm{BW}}(\Sigma_0, \Sigma_1) = W_2\big(N(0, \Sigma_0), N(0, \Sigma_1)\big). \tag{2.7}
\]
More than just giving the set of PSD matrices a distance metric, this identification endows $\mathbb{S}^d_+$ with a natural Riemannian structure that it inherits from $(\mathcal{N}_0(\mathbb{R}^d), W_2)$.
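As an illustration, the closed form (2.6)-(2.7) can be evaluated numerically with a matrix square root; this naive sketch (using scipy.linalg.sqrtm) is only meant to mirror the formula, and for rank-deficient or ill-conditioned inputs one would want a more careful symmetric square root (e.g., via an eigendecomposition with clipped eigenvalues).

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein(Sigma0, Sigma1):
    """Bures-Wasserstein distance d_BW(Sigma0, Sigma1) via the closed form (2.6)."""
    root0 = sqrtm(Sigma0)
    cross = sqrtm(root0 @ Sigma1 @ root0)
    # sqrtm may return a complex array with tiny imaginary parts; keep the real part.
    d2 = np.trace(Sigma0 + Sigma1 - 2.0 * np.real(cross))
    return float(np.sqrt(max(np.real(d2), 0.0)))
```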
The barycenter problem seeks to generalize the notion of an average to non-Euclidean spaces. In the 2-Wasserstein barycenter problem, one seeks a solution to
\[
\min_{b \in \mathcal{P}_2(\mathbb{R}^d)} \ \frac{1}{2}\, \mathbb{E}_{\mu \sim Q}\, W_2^2(\mu, b), \tag{2.8}
\]
where $Q$ is a distribution over $\mathcal{P}_2(\mathbb{R}^d)$ with finite second moment, which we write as $Q \in \mathcal{P}_2(\mathcal{P}_2(\mathbb{R}^d))$. When $Q$ is supported on Gaussians, the minimum is achieved on Gaussians (Knott and Smith, 1994; Agueh and Carlier, 2011).