Bures-Wasserstein Barycenters and Low-Rank Matrix Recovery
Tyler Maunu maunu@brandeis.edu
Brandeis University
Thibaut Le Gouic thibaut.le_gouic@math.cnrs.fr
École Centrale de Marseille
Philippe Rigollet rigollet@math.mit.edu
MIT
Abstract
We revisit the problem of recovering a low-rank positive semidefinite matrix from rank-one
projections using tools from optimal transport. More specifically, we show that a variational
formulation of this problem is equivalent to computing a Wasserstein barycenter. In turn,
this new perspective enables the development of new geometric first-order methods with
strong convergence guarantees in Bures-Wasserstein distance. Experiments on simulated
data demonstrate the advantages of our new methodology over existing methods.
1 Introduction
Recovering a low-rank matrix is a fundamental primitive across many settings, such as matrix completion (Fazel, 2002; Candès and Recht, 2009; Candès and Tao, 2010), phase retrieval (Candès et al., 2015), principal component analysis (Pearson, 1901; Hotelling, 1933), robust subspace recovery (Lerman and Maunu, 2018), and robust principal component analysis (Chandrasekaran et al., 2009; Candès et al., 2011; Xu et al., 2010). This line of work can be understood as a generalization of the classical compressed sensing question (Donoho, 2006; Candès et al., 2006), where the goal is the recovery of a sparse vector. That problem can be cast as a low-rank recovery problem over diagonal matrices. In all of these settings, the assumption of a low-rank structure is essential for efficient estimation and optimization in high-dimensional settings.
While the above applications all aim at recovering a low-rank matrix $S$, the observational (a.k.a. sensing) mechanism that governs access to $S$ comes in many forms. For the purpose of applications, it is often sufficient to focus on linear measurements of the form $\langle S, A \rangle$ for some given sensing matrix $A$. This setup covers a wide variety of applications ranging from covariance sketching (Chen et al., 2015) and low-rank matrix completion (Candès and Plan, 2010; Recht et al., 2010) to phase retrieval (Fienup, 1978; Candès et al., 2013; 2015) and quantum state tomography (Gross et al., 2010). New solutions to this problem can have many practical implications.
In this paper, we focus on a specific instantiation of this problem, where the measurement matrix $A = xx^\top$ is rank-one and positive semidefinite (PSD), so that $\langle xx^\top, S \rangle = x^\top S x$. This important case of the low-rank matrix recovery problem has received significant attention over the past few years (Cai and Zhang, 2015; Chen et al., 2015; Sanghavi et al., 2017; Li et al., 2019).
Assume that we observe
\[
y_i = x_i^\top S x_i, \qquad i = 1, \dots, n, \tag{1.1}
\]
where $S \in \mathbb{R}^{d \times d}$ is an unknown rank-$r$ PSD matrix and $x_1, \dots, x_n$ are i.i.d. from some distribution. Our goal is to recover or estimate $S$ from the pairs $(y_i, x_i)$, $i = 1, \dots, n$. Throughout, we denote by $\mathbb{S}^d_+$ the set of $d \times d$ PSD matrices and by $\mathbb{S}^d_{++}$ the set of positive definite (PD) matrices.
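As a concrete illustration of this observation model, here is a minimal numpy sketch that draws a rank-$r$ matrix $S$ and forms the measurements (1.1); the Gaussian sensing vectors and the specific dimensions are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 50, 3, 500                  # illustrative dimensions, not from the paper

# Ground-truth rank-r PSD matrix S = U* U*^T
U_star = rng.standard_normal((d, r))
S = U_star @ U_star.T

# Sensing vectors x_1, ..., x_n, drawn i.i.d. (here: standard Gaussian)
X = rng.standard_normal((n, d))

# Rank-one measurements y_i = x_i^T S x_i, as in (1.1)
y = np.einsum("ij,jk,ik->i", X, S, X)
```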
Finding a low-rank matrix $S$ subject to the constraints (1.1) is a semidefinite program (SDP) that can be solved in polynomial time using general-purpose solvers. Furthermore, the specific structure of this SDP may be leveraged to derive faster algorithms. Such solutions include the Burer-Monteiro approach to solving semidefinite programs (Burer and Monteiro, 2003) and nonconvex gradient descent methods for low-rank programs (Sanghavi et al., 2017). Often, these approaches result in nonconvex optimization programs for which theoretical results are limited.
In this work, we take a principled approach to solving this problem by eliciting convexity using a specific geometry on the space of PSD matrices. More precisely, we employ the Bures-Wasserstein (hereafter BW) geometry, which arose independently in optimal transport and quantum information theory (Bures, 1969; Bhatia et al., 2019). This geometry allows us to solve the original problem by computing a BW barycenter (Agueh and Carlier, 2011; Álvarez-Esteban et al., 2016; Chewi et al., 2020; Altschuler et al., 2021). In turn, we employ geodesic gradient descent on the BW manifold to compute said barycenter. We propose both full-gradient and stochastic-gradient methods that are guaranteed to efficiently recover a low-rank matrix. These methods have low computational cost (per-iteration complexity of $O(ndr)$ for gradient descent and $O(dr)$ for stochastic gradient descent), require minimal parameter tuning, are easily implemented, and show excellent practical performance. We demonstrate an example application to phase retrieval in Figure 1. In this setup, BW gradient descent recovers the image faster than Wirtinger Flow (WF) (Candès et al., 2015), and it requires no parameter tuning.
Main contributions. The main results of this paper are:
1. We prove that the barycenter of a certain distribution of rank-one Gaussians exactly recovers the
underlying low-rank matrix.
2. Using this connection, we adapt existing first-order algorithms for computing BW barycenters into novel geodesic gradient descent and stochastic geodesic gradient descent algorithms for solving the low-rank PSD matrix recovery problem.
3. Existing first-order algorithms for computing BW barycenters are only guaranteed to work for full
rank distributions. Since our method considers barycenters of rank-one PSD matrices, we develop
new theory and give a guarantee of local linear convergence in BW distance for the gradient descent
method. We also discuss initialization of our method.
4. We demonstrate the competitive edge of our algorithms in a few experimental settings.
Related Work. Many methods have been proposed to solve variants of the matrix recovery problem. Original ideas for this problem trace back to linear systems theory, low-rank matrix completion, low-dimensional Euclidean embeddings, and image compression (Recht et al., 2010).
We focus here on rank-one projections of positive semidefinite matrices as in (1.1). This setup is either specifically considered or a special case of a large number of works, including (Candès et al., 2015; Cai and Zhang, 2015; Chen et al., 2015; Zhong et al., 2015; Wang and Giannakis, 2016; Wang et al., 2017; Sanghavi et al., 2017; Li et al., 2019). These methods can be clustered into two families. The first aims at minimizing convex relaxations of an energy functional, often based on the nuclear norm (Cai and Zhang, 2015; Chen et al., 2015); such convex programs can be solved via standard solvers. Another family of methods directly imposes the low-rank constraint in a nonconvex formulation (Li et al., 2019), thus manipulating candidate matrices with smaller representations and thereby boosting computational efficiency; see Chi et al. (2019) for an overview of such algorithms. The special case where $S$ has rank one corresponds to the classical phase retrieval problem and has received much attention with dedicated algorithms (Fienup, 1978; Candès et al., 2015; Wang and Giannakis, 2016; Wang et al., 2017; Chi et al., 2019).

Figure 1: Recovered images after 140 iterations of Wirtinger Flow (Candès et al., 2015) and BW gradient descent (our method). BW gradient descent recovers a much sharper image within the same number of iterations; note that the per-iteration complexity of both methods is the same.
Note that the methods introduced in the present paper can be readily extended to the covariance recovery problem framed in Cai and Zhang (2015), which is similar to estimation in a random-effects model. Under this model, rather than (1.1), we observe $y_i = \langle x_i, w_i \rangle + \epsilon_i$, where $w_i \sim N(0, S)$ and $N(0, S)$ is the centered Gaussian distribution on $\mathbb{R}^d$ with PSD covariance matrix $S$. The goal here is to recover the covariance matrix $S$ of the weight vectors from observations of $(y_i, x_i)$, $i = 1, \dots, n$. This problem has natural connections to mixture-of-regressions models as well (De Veaux, 1989; Yi et al., 2014; Zhong et al., 2016; Sedghi et al., 2016).
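A minimal simulation sketch of this covariance-recovery observation model follows; the noise level and dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, n, sigma = 50, 3, 500, 0.1      # illustrative choices, including the noise level

U_star = rng.standard_normal((d, r))
S = U_star @ U_star.T                 # PSD covariance of the random effects

X = rng.standard_normal((n, d))                 # design vectors x_i
W = rng.standard_normal((n, r)) @ U_star.T      # w_i ~ N(0, S), since S = U* U*^T
eps = sigma * rng.standard_normal(n)            # observation noise
y = np.einsum("ij,ij->i", X, W) + eps           # y_i = <x_i, w_i> + eps_i
```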
In terms of the complexity of various methods, solving the semidefinite program using off-the-shelf solvers has complexity $O(nd^2 + d^3)$. Gradient descent on PSD matrices (see the initialization phase in Tu et al. (2016)) can solve this with per-iteration complexity $O(nd^2)$, and one can prove linear convergence under certain assumptions. The most directly comparable methods are nonconvex gradient descent methods (Li et al., 2019), which have per-iteration complexity $O(ndr)$ and utilize a Burer-Monteiro factorization (Burer and Monteiro, 2003). As we will see, our method also has per-iteration complexity $O(ndr)$.
Finally, we mention several works dedicated to the computation of the nonconvex Wasserstein barycenter problem (Agueh and Carlier, 2011; Álvarez-Esteban et al., 2016; Zemel and Panaretos, 2019; Chewi et al., 2020; Altschuler et al., 2021). No theoretical study in these works allows for rank deficiency.
Notation. Bold capital letters denote matrices while bold lower-case letters denote vectors. The Hilbert-Schmidt inner product is $\langle \cdot, \cdot \rangle$, and $\langle \cdot, \cdot \rangle_{\gamma_\Sigma} = \langle \cdot, \cdot \rangle_{\Sigma}$ is the Riemannian metric associated to the (fixed-rank) BW manifold at $\Sigma$. Their corresponding norms are written as $\|\cdot\|$ and $\|\cdot\|_{\Sigma}$, respectively. The orthogonal projection onto the column span of $\Sigma$ is $P_\Sigma = P_{\mathrm{Sp}(\Sigma)}$. Similarly, $P_\Sigma^\perp$ is the projection onto the null space of $\Sigma$ (the orthogonal complement of $\mathrm{Sp}(\Sigma)$). The Dirac distribution at a point $x$ is denoted by $\delta_x$.
Outline. We begin by outlining our approach to matrix recovery in Section 2. We then give the main theo-
retical results for our approach in Section 3. After this, we give experiments demonstrating these advantages
in Section 4, and discuss the limitations of our work in Section 5.
2 The Bures-Wasserstein Barycenter Approach
Recall that we aim to find the rank-$r$ matrix $S \in \mathbb{S}^d_+$ given the observations (1.1). We will use the notation $y_i = \mathcal{A}(S)_i = \langle x_i x_i^\top, S \rangle$, $i = 1, \dots, n$. To recover a low-rank matrix $S$ from these measurements, most past work has focused on some form of energy minimization. For example, some works have looked at convex nuclear norm minimization methods. Cai and Zhang (2015) and Chen et al. (2015) concurrently developed a nuclear norm minimization procedure that solves
\[
\min_{\Sigma \in \mathbb{S}^d_+,\ \mathcal{A}(\Sigma) = y} \operatorname{Tr}(\Sigma), \tag{2.1}
\]
where $y = [y_1, \dots, y_n]^\top$ and $\mathcal{A}(\Sigma) = [\mathcal{A}(\Sigma)_1, \dots, \mathcal{A}(\Sigma)_n]^\top$. One can directly solve the semidefinite program using standard convex optimization packages. A Lagrangian formulation of (2.1) yields the energy $\frac{1}{2}\|\mathcal{A}(\Sigma) - y\|_2^2 + \lambda \operatorname{Tr}(\Sigma)$, which can also be minimized using a variety of methods.
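As a sketch of how the constrained program (2.1) could be handed to an off-the-shelf solver, the following uses cvxpy (our illustrative library choice, not one specified in the paper); since $\Sigma$ is constrained to be PSD, minimizing the trace is equivalent to minimizing the nuclear norm. With noisy measurements one would instead minimize the Lagrangian form above. Solving requires an SDP-capable backend such as SCS.

```python
import cvxpy as cp
import numpy as np

def nuclear_norm_recovery(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve (2.1): minimize Tr(Sigma) over PSD Sigma subject to x_i^T Sigma x_i = y_i."""
    n, d = X.shape
    Sigma = cp.Variable((d, d), PSD=True)
    constraints = [X[i] @ Sigma @ X[i] == y[i] for i in range(n)]
    problem = cp.Problem(cp.Minimize(cp.trace(Sigma)), constraints)
    problem.solve()
    return Sigma.value
```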
In the following, we lay out our approach to the low-rank matrix recovery problem, which focuses on a
new energy minimization procedure. In Section 2.1 we discuss the common nonconvex approaches to matrix
recovery and outline our novel optimization program. Then, in Section 2.2, we discuss the BW barycenter
problem, and show how it recovers solutions to the energy minimization we propose. After this, in Section 2.3
we outline the first-order algorithms for computing BW barycenters. We finish in Section 2.4 by discussing a
regularization procedure that allows one to estimate higher-rank proxies, from which it is possible to recover $S$.
2.1 Nonconvex Approaches for Matrix Recovery
Suppose that we know an upper bound $r$ for the rank of the underlying matrix $S$. We could utilize this information in a nonconvex optimization program such as
\[
\min_{\Sigma \in \mathbb{S}^d_+,\ \operatorname{rank}(\Sigma) \le r} \ \frac{1}{2n}\|\mathcal{A}(\Sigma) - y\|_2^2. \tag{2.2}
\]
Without the rank restriction (i.e., $r = d$), this problem is in fact convex. For any fixed $r \le d$, we can parameterize the rank-$r$ matrices in $\mathbb{S}^d_+$ by $UU^\top$ for $U \in \mathbb{R}^{d \times r}$, which is now commonly referred to as the Burer-Monteiro factorization (Burer and Monteiro, 2003). We thus define the set of PSD matrices of rank at most $r$ using this factorization: $\mathbb{S}^{d,r}_+ := \{\Sigma \in \mathbb{S}^d_+ : \Sigma = UU^\top,\ U \in \mathbb{R}^{d \times r}\}$. With this parametrization, the matrix recovery problem in (2.2) is equivalent to
\[
\min_{U \in \mathbb{R}^{d \times r}} \ \frac{1}{2}\|\mathcal{A}(UU^\top) - y\|_2^2. \tag{2.3}
\]
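For reference, here is a minimal sketch of this generic nonconvex baseline: plain gradient descent on the factor $U$ for the least-squares energy (2.3). The step size and random initialization are illustrative assumptions, and this is not the BW method proposed in this paper; each iteration costs $O(ndr)$.

```python
import numpy as np

def burer_monteiro_gd(X, y, r, step=1e-3, n_iter=500, seed=0):
    """Plain gradient descent on f(U) = 0.5 * ||A(U U^T) - y||_2^2 with A(M)_i = x_i^T M x_i."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d, r)) / np.sqrt(d)     # illustrative random initialization
    for _ in range(n_iter):
        XU = X @ U                                   # row i is x_i^T U, shape (n, r)
        residual = np.sum(XU ** 2, axis=1) - y       # A(U U^T)_i - y_i
        grad = 2.0 * X.T @ (residual[:, None] * XU)  # grad f(U) = 2 * sum_i residual_i * x_i x_i^T U
        U -= step * grad
    return U @ U.T
```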
While past work has focused on these least-squares formulations, there have not been many modifications of this energy. We propose the following modification of the energies (2.2) and (2.3):
\[
\min_{\Sigma \in \mathbb{S}^d_+,\ \operatorname{rank}(\Sigma) \le r} \ \frac{1}{2n}\big\|\sqrt{\mathcal{A}(\Sigma)} - \sqrt{y}\big\|_2^2 \;=\; \min_{U \in \mathbb{R}^{d \times r}} \ \frac{1}{2n}\big\|\sqrt{\mathcal{A}(UU^\top)} - \sqrt{y}\big\|_2^2, \tag{2.4}
\]
where the square roots are taken componentwise. As we demonstrate in the following sections, this problem has a natural solution as a BW barycenter.
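A short sketch of evaluating this modified energy under the Burer-Monteiro parameterization follows (the helper name is ours, for illustration only).

```python
import numpy as np

def sqrt_loss(U, X, y):
    """Evaluate (1/2n) * ||sqrt(A(U U^T)) - sqrt(y)||_2^2 with componentwise square roots, cf. (2.4)."""
    n = X.shape[0]
    a = np.sum((X @ U) ** 2, axis=1)   # A(U U^T)_i = ||U^T x_i||^2 >= 0
    return np.sum((np.sqrt(a) - np.sqrt(y)) ** 2) / (2 * n)
```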
2.2 The Bures-Wasserstein Barycenter Problem
To explain the connection of (2.4) to BW barycenters, we will first explain how BW space arises from the perspective of optimal transport (Villani, 2009). Let $\mathcal{P}_2(\mathbb{R}^d)$ be the set of all measures on $\mathbb{R}^d$ with finite second moment. The 2-Wasserstein distance between measures $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$ is defined by
\[
W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \mathbb{E}_{(x, y) \sim \pi} \|x - y\|^2, \tag{2.5}
\]
where $\Pi(\mu, \nu)$ denotes the set of all couplings between $\mu$ and $\nu$ (i.e., the set of all joint distributions on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$). The 2-Wasserstein distance defines a metric over $\mathcal{P}_2(\mathbb{R}^d)$, and the resulting geodesic metric space is referred to as 2-Wasserstein space.
Let $\mathcal{N}(\mathbb{R}^d)$ denote the set of Gaussian distributions on $\mathbb{R}^d$, and let $\mathcal{N}_0(\mathbb{R}^d)$ be the set of centered Gaussian distributions. Both are geodesically weakly convex subsets of 2-Wasserstein space, meaning that between any two points in these sets there always exists a 2-Wasserstein geodesic contained within the set. Letting $N(0, \Sigma) \in \mathcal{N}_0(\mathbb{R}^d)$ denote the Gaussian distribution on $\mathbb{R}^d$ with mean zero and covariance matrix $\Sigma \in \mathbb{S}^d_+$, the 2-Wasserstein distance between $N(0, \Sigma_0)$ and $N(0, \Sigma_1) \in \mathcal{N}_0(\mathbb{R}^d)$ has the explicit form
\[
W_2^2\big(N(0, \Sigma_0), N(0, \Sigma_1)\big) = \operatorname{Tr}\Big(\Sigma_0 + \Sigma_1 - 2\big(\Sigma_0^{1/2} \Sigma_1 \Sigma_0^{1/2}\big)^{1/2}\Big). \tag{2.6}
\]
Notice that this is purely a function of the covariance matrices, and so the Wasserstein distance induces a distance metric on PSD matrices called the Bures-Wasserstein distance (Bhatia et al., 2019). To refer to this distance over PSD matrices rather than over the Gaussian distributions, we will write
\[
d_{\mathrm{BW}}(\Sigma_0, \Sigma_1) = W_2\big(N(0, \Sigma_0), N(0, \Sigma_1)\big). \tag{2.7}
\]
More than just giving the set of PSD matrices a distance metric, this identification endows $\mathbb{S}^d_+$ with a natural Riemannian structure that it inherits from $(\mathcal{N}_0(\mathbb{R}^d), W_2)$.
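As an illustration, the closed form (2.6)-(2.7) can be evaluated numerically with a matrix square root; this naive sketch (using scipy.linalg.sqrtm) is only meant to mirror the formula, and for rank-deficient or ill-conditioned inputs one would want a more careful symmetric square root (e.g., via an eigendecomposition with clipped eigenvalues).

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_wasserstein(Sigma0, Sigma1):
    """Bures-Wasserstein distance d_BW(Sigma0, Sigma1) via the closed form (2.6)."""
    root0 = sqrtm(Sigma0)
    cross = sqrtm(root0 @ Sigma1 @ root0)
    # sqrtm may return a complex array with tiny imaginary parts; keep the real part.
    d2 = np.trace(Sigma0 + Sigma1 - 2.0 * np.real(cross))
    return float(np.sqrt(max(np.real(d2), 0.0)))
```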
The barycenter problem seeks to generalize the notion of an average to non-Euclidean spaces. In the 2-Wasserstein barycenter problem, one seeks a solution to
\[
\min_{b \in \mathcal{P}_2(\mathbb{R}^d)} \ \frac{1}{2}\, \mathbb{E}_{\mu \sim Q}\, W_2^2(\mu, b), \tag{2.8}
\]
where $Q$ is a distribution over $\mathcal{P}_2(\mathbb{R}^d)$ with finite second moment, which we write as $Q \in \mathcal{P}_2(\mathcal{P}_2(\mathbb{R}^d))$. When $Q$ is supported on Gaussians, the minimum is achieved on Gaussians (Knott and Smith, 1994; Agueh and Carlier, 2011).