Learning Graphical Factor Models with
Riemannian Optimization
Alexandre Hippert-Ferrer1[0000-0002-7740-5415], Florent Bouchard2[0000-0003-3003-7317], Ammar Mian3[0000-0003-1796-8707], Titouan Vayer4[0000-0002-8115-572X], and Arnaud Breloy5[0000-0002-3802-9015]
1 Univ Gustave Eiffel, IGN, ENSG, LASTIG, F-77454 Marne-la-Vallée, France
alexandre.hippert-ferrer@univ-eiffel.fr
2 Univ Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, Gif-sur-Yvette, France
3 Univ Savoie Mont Blanc, LISTIC, Annecy, France
4 Univ Lyon, Inria, CNRS, ENS de Lyon, UCB Lyon 1, LIP UMR 5668, F-69342, Lyon, France
5 Univ Paris-Nanterre, LEME, IUT Ville d’Avray, Ville d’Avray, France
Abstract. Graphical models and factor analysis are well-established tools in multivariate statistics. While these models can both be linked to structures exhibited by covariance and precision matrices, they are generally not jointly leveraged within graph learning processes. This paper therefore addresses this issue by proposing a flexible algorithmic framework for graph learning under low-rank structural constraints on the covariance matrix. The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution (a generalization of Gaussian graphical models to possibly heavy-tailed distributions), where the covariance matrix is optionally constrained to be structured as low-rank plus diagonal (low-rank factor model). The resolution of this class of problems is then tackled with Riemannian optimization, where we leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models. Numerical experiments on synthetic and real-world data sets illustrate the effectiveness of the proposed approach.
Keywords: Graph learning · Low-rank factor models · Riemannian optimization.
1 Introduction
Graphical models allow us to represent specific correlation structures between any two variables (entries) of multivariate observations. Inferring the topology of this structure directly from the data is referred to as graph learning, which has been increasingly leveraged in numerous applications, such as biology (Li and Gui, 2006; Smith et al., 2011; Stegle et al., 2015), finance (Marti et al., 2021), or signal processing (Shuman et al., 2013; Kalofolias, 2016).
Within Gaussian graphical models (GGMs), graph learning boils down to the problem of estimating the precision (inverse covariance) matrix of a Gaussian Markov random field (Friedman et al., 2008; Lake and Tenenbaum, 2010). In practice, achieving an accurate covariance matrix estimation is often a difficult task due to low sample support. Thus, it is common to introduce prior assumptions on the structure of this matrix that guarantee a correct estimation with fewer samples. A popular approach is related to low-rank factorizations, which relies on the assumption that the data is driven by an underlying low-dimensional linear model, corrupted by an independent perturbation. The resulting covariance matrix decomposition then involves a core that is a low-rank positive semi-definite matrix. Such a model is ubiquitous in statistics, and is, for example, at the heart of probabilistic principal component analysis (Tipping and Bishop, 1999), low-rank factor analysis (Robertson and Symons, 2007; Khamaru and Mazumder, 2019; Rubin and Thayer, 1982), and their many generalizations.
Since GGMs and low-rank factorizations share a common root in structured covariance (or precision) matrix estimation, it appears desirable to leverage both approaches in a unified graph learning formulation. On one hand, linear dimension reduction approaches rely on particular spectral structures that can be beneficial for graph learning (Kumar et al., 2020). On the other hand, it also opens the way to a graph-oriented view of sparse principal component analysis (Yoshida and West, 2010; Meng et al., 2014). Though theoretically appealing, such unification is challenging because it formulates optimization problems with objective functions and constraints that apply to both the covariance matrix and its inverse. Thus, deriving single-step learning algorithms for these models has only recently been addressed (Chandra et al., 2021).
In this paper, we propose a new family of methods for graph learning with low-rank constraints on the covariance matrix, hereafter referred to as graphical factor models (GFM). First, we reformulate graph learning as a problem that encompasses both elliptical distributions and low-rank factor models. The main interest of generalizing Gaussian graphical models to elliptical ones is to ensure robustness to underlying heavy-tailed distributions (Vogel and Fried, 2011; Finegold and Drton, 2014; Zhang et al., 2013; de Miranda Cardoso et al., 2021). Moreover, additionally considering low-rank factor models allows for an effective dimensionality reduction. The main novelty of our approach is to tackle the resulting class of constrained and penalized maximum likelihood estimation problems in a unified way with Riemannian optimization (Absil et al., 2008; Boumal, 2020). To do so, we leverage geometries of both the positive definite matrices (Bhatia, 2009) and the positive semi-definite matrices of fixed rank (Bonnabel and Sepulchre, 2009; Bouchard et al., 2021) that are well suited to the considered models. The corresponding tools allow us to develop optimization methods that ensure the desired structures for the covariance matrix, thus providing a flexible and numerically efficient framework for learning graphical factor models.
Finally, experiments1 conducted on synthetic and real-world data sets demonstrate the interest of considering both elliptical distributions and factor model structures in a graph learning process. We observe that the proposed algorithms lead to more interpretable graphs compared to unstructured models. Notably, the factor model approaches compare well with the current state of the art in Laplacian-constrained graph learning methods, which require setting the number of components as additional prior information (Kumar et al., 2020; de Miranda Cardoso et al., 2021). The interest of our method is twofold: i) it requires less supervision to unveil meaningful clusters in the conditional correlation structure of the data; ii) the computational bottleneck is greatly reduced, as the proposed algorithm iterations only require the thin SVD of a low-rank factor, rather than the full SVD of the Laplacian (or adjacency) matrix.

1 The code is available at: https://github.com/ahippert/graphfactormodel.
2 Background and proposed framework
2.1 Gaussian graphical and related models
Gaussian graphical models assume that each observation is a centered multivariate Gaussian random vector x = [x_1, ..., x_p]^⊤ with covariance matrix E[xx^⊤] = Σ, denoted x ∼ N(0, Σ). For the corresponding Gaussian Markov random field, an undirected graph is matched to the variables as follows: each variable corresponds to a vertex, and an edge is present between two vertices if the corresponding random variables are conditionally dependent given the others (Dempster, 1972; Lauritzen, 1996). The support of the precision matrix Θ = Σ^{−1} directly accounts for this conditional dependency, since

\[
\operatorname{corr}\left( x_q, x_\ell \mid x_{[[1,p]] \setminus \{q,\ell\}} \right) = -\,\Theta_{q\ell} \big/ \sqrt{\Theta_{qq}\,\Theta_{\ell\ell}}. \tag{1}
\]

Hence, a non-zero entry Θ_{qℓ} implies a conditional dependency between the variables x_q and x_ℓ, underlined by an edge between vertices q and ℓ of the graph.
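To make (1) concrete, here is a minimal NumPy sketch (our illustration, not code from the paper's repository) that recovers the conditional correlation matrix, and hence the graph support, from a given precision matrix:

```python
import numpy as np

def partial_correlations(Theta):
    """Conditional (partial) correlations from a precision matrix, as in (1)."""
    d = np.sqrt(np.diag(Theta))
    R = -Theta / np.outer(d, d)   # -Theta_ql / sqrt(Theta_qq * Theta_ll)
    np.fill_diagonal(R, 1.0)      # diagonal set to 1 by convention
    return R

def graph_support(Theta, tol=1e-8):
    """Adjacency matrix of the graph: edge (q, l) iff |Theta_ql| > tol."""
    A = (np.abs(Theta) > tol).astype(int)
    np.fill_diagonal(A, 0)
    return A
```

In practice Θ is unknown, which is precisely what the estimation problem below addresses.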
Within Gaussian graphical models, graph learning is therefore tied to the problem of estimating the precision matrix Θ from a set of observations {x_i}_{i=1}^n ∈ (R^p)^n. In order to exhibit such a correlation structure, a standard approach is to resort to regularized maximum likelihood estimation, i.e., solving the problem:

\[
\underset{\Theta \in \mathcal{S}^{++}_{p}}{\text{maximize}} \;\; \log\det(\Theta) - \operatorname{Tr}\{S\Theta\} - \lambda\, h(\Theta), \tag{2}
\]
where S = (1/n) ∑_{i=1}^{n} x_i x_i^⊤ is the sample covariance matrix, h is a regularization penalty, and λ ∈ R_+ is a regularization parameter. The ℓ1-norm is often used as a penalty in order to promote a sparse structure in Θ, which is at the foundation of the well-known GLasso algorithm (Friedman et al., 2008; Lake and Tenenbaum, 2010; Mazumder and Hastie, 2012; Fattahi and Sojoudi, 2019).
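For the Gaussian, ℓ1-penalized instance of (2), scikit-learn ships an off-the-shelf GLasso solver; the following usage sketch is given only to make the formulation concrete (it is not the method developed in this paper):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))    # n = 200 samples, p = 10 variables

# alpha plays the role of the regularization parameter lambda in (2)
model = GraphicalLasso(alpha=0.1).fit(X)
Theta_hat = model.precision_          # estimated (sparse) precision matrix
edges = np.abs(Theta_hat) > 1e-8      # support = candidate graph edges
```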
Many other convex or non-convex penalties have been considered in this setting (Lam and Fan, 2009; Shen et al., 2012; Benfenati et al., 2020). Depending on the assumptions, it can also be beneficial to consider penalties that promote certain types of structured sparsity patterns (Heinävaara et al., 2016; Tarzanagh and Michailidis, 2018). Another main example of structure within the conditional correlations is total positivity, also known as attractive Gaussian graphical models, which assumes Θ_{qℓ} ≤ 0, ∀ q ≠ ℓ (Fallat et al., 2017; Lauritzen et al., 2019). In attractive Gaussian graphical models, the identification of the precision matrix with the graph Laplacian (Chung, 1997) has also attracted recent interest in graph learning (Egilmez et al., 2017; Ying et al., 2020; Kumar et al., 2020).
2.2 Elliptical distributions
A first shortcoming of graph learning as formulated in (2) is its lack of robustness to outliers or heavy-tailed distributed samples. This is a consequence of the underlying Gaussian assumption, which cannot efficiently describe such data. A possible remedy is to consider a larger family of multivariate distributions. In this context, elliptical distributions (Anderson and Fang, 1990; Kai-Tai and Yao-Ting, 1990) offer a good alternative, as they are well known to provide robust estimators of the covariance matrix (Maronna, 1976; Tyler, 1987; Wald et al., 2019). In our present context, this framework has been successfully used to extend graphical models (Vogel and Fried, 2011; Finegold and Drton, 2014; Zhang et al., 2013; de Miranda Cardoso et al., 2021), as well as low-rank structured covariance matrix models (Zhao and Jiang, 2006; Zhou et al., 2019; Bouchard et al., 2021) for corrupted or heavy-tailed data.
A vector is said to follow a centered elliptically symmetric distribution with scatter matrix Σ and density generator g, denoted x ∼ ES(0, Σ, g), if its density has the form

\[
f(x) \propto \det(\Sigma)^{-1/2}\, g\!\left( x^{\top} \Sigma^{-1} x \right), \tag{3}
\]

which yields the negative log-likelihood

\[
\mathcal{L}(\Sigma) \propto \frac{1}{n} \sum_{i=1}^{n} \rho\!\left( x_i^{\top} \Sigma^{-1} x_i \right) + \frac{1}{2} \log |\Sigma| + \text{const.} \tag{4}
\]
for the sample set {x_i}_{i=1}^n, where ρ(t) = −log g(t). Notice that using g(t) = exp(−t/2) allows us to recover the Gaussian case. However, the density generator g enables more flexibility, notably to encompass many heavy-tailed multivariate distributions. Among popular choices, elliptical distributions include the multivariate t-distribution with ν > 2 degrees of freedom, which is obtained with g(t) = (1 + t/ν)^{−(ν+p)/2}. For this distribution, the parameter ν measures the rate of decay of the tails, and the Gaussian case corresponds to the limit ν → ∞.
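As an illustration of (4) for this choice of g (a sketch under the notation above, not the paper's implementation), the multivariate t negative log-likelihood with ρ(t) = ((ν + p)/2) log(1 + t/ν) can be evaluated as:

```python
import numpy as np

def student_t_nll(Sigma, X, nu):
    """Negative log-likelihood (4) for the multivariate t-distribution,
    up to an additive constant."""
    n, p = X.shape
    # quadratic forms x_i^T Sigma^{-1} x_i, computed for all samples at once
    quad = np.einsum("ij,ij->i", X, np.linalg.solve(Sigma, X.T).T)
    rho = 0.5 * (nu + p) * np.log1p(quad / nu)  # rho(t) = ((nu+p)/2) log(1 + t/nu)
    _, logdet = np.linalg.slogdet(Sigma)
    return rho.mean() + 0.5 * logdet
```

Replacing ρ with ρ(t) = t/2 in the same function would recover the Gaussian negative log-likelihood.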
2.3 Low-rank factor models
A second limitation of (2) is that it does not account for other potential structures exhibited by the covariance or precision matrices. This can be problematic when the sample support is low (n ≈ p, or n < p), as the sample covariance matrix is not a reliable estimate in these setups (Ledoit and Wolf, 2004; Smith, 2005; Vershynin, 2012). In this context, a popular approach consists in imposing low-rank structures on the covariance matrix. These structures arise from the assumption that the data is driven by an underlying low-dimensional linear model, i.e., x = Ws + ε, where W ∈ R^{p×k} is a rank-k factor loading matrix, and s ∈ R^k and ε ∈ R^p are independent random variables. The resulting covariance matrix is thus of the form Σ = H + Ψ, where H = W E[ss^⊤] W^⊤ belongs to the set of positive semi-definite matrices of rank k, denoted S^+_{p,k} = {Σ ∈ S_p : Σ ⪰ 0, rank(Σ) = k}. This model is well known in statistics and is, for example, at the core of probabilistic principal component analysis, which assumes Ψ ∝ I_p (Tipping and Bishop, 1999). Also notice that in Laplacian-constrained models, a rank-k precision matrix implies a (p−k)-component graph (Chung, 1997). Hence it is also interesting to leverage such spectral structures from the graph learning perspective (Kumar et al., 2020).
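To fix ideas, a small sketch generating data from this linear model (illustrative assumptions of ours: Gaussian factors and noise, arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 20, 3, 500

W = rng.standard_normal((p, k))                  # rank-k factor loading matrix
psi = rng.uniform(0.5, 1.5, size=p)              # diagonal noise variances (Psi)

S_lat = rng.standard_normal((n, k))              # latent factors s_i ~ N(0, I_k)
E = rng.standard_normal((n, p)) * np.sqrt(psi)   # noise eps_i ~ N(0, Psi)
X = S_lat @ W.T + E                              # x_i = W s_i + eps_i

Sigma = W @ W.T + np.diag(psi)                   # population covariance H + Psi
assert np.linalg.matrix_rank(W @ W.T) == k       # H is rank k
```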
In this paper, we will focus on the low-rank factor analysis models (Robertson and Symons, 2007; Rubin and Thayer, 1982; Khamaru and Mazumder, 2019), which consider the general case Ψ ∈ D^{++}_p, where D^{++}_p = {Σ = diag(d), d ∈ R^p with d > 0 entrywise} denotes the space of positive definite diagonal matrices. Given this model, the covariance matrix belongs to the space of rank-k plus diagonal matrices, denoted as

\[
\mathcal{M}_{p,k} = \left\{ \Sigma = H + \Psi \;:\; H \in \mathcal{S}^{+}_{p,k},\; \Psi \in \mathcal{D}^{++}_{p} \right\}. \tag{5}
\]

Notice that this parameterization reduces the dimension of the estimation problem from p(p+1)/2 down to p(k+1) − k(k−1)/2, which is why it is often used in regimes with few samples, or in high-dimensional settings.
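A quick sanity check on this parameter count (our illustration, using the formula stated above):

```python
def n_params_full(p):
    """Free parameters of an unstructured p x p covariance matrix."""
    return p * (p + 1) // 2

def n_params_factor(p, k):
    """Free parameters of a rank-k plus diagonal covariance matrix."""
    return p * (k + 1) - k * (k - 1) // 2

print(n_params_full(100), n_params_factor(100, 5))   # 5050 590
```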
2.4 Learning elliptical graphical factor models
We cast the problem of graph learning for elliptical graphical factor models as

\[
\begin{array}{cl}
\underset{\Sigma \in \mathcal{S}^{++}_{p}}{\text{minimize}} & \mathcal{L}(\Sigma) + \lambda\, h(\Sigma) \\[2pt]
\text{subject to} & \Sigma \in \mathcal{M}_{p,k},
\end{array} \tag{6}
\]

where L is the negative log-likelihood in (4), and M_{p,k} is the space of rank-k plus diagonal positive definite matrices in (5). The penalty h is a smooth function that promotes a sparse structure on the graph, and λ ∈ R_+ is a regularization parameter. Although this formulation accommodates many options, we focus on the usual element-wise penalty applied to the precision matrix Θ = Σ^{−1}, defined as:

\[
h(\Sigma) = \sum_{q \neq \ell} \phi\!\left( [\Sigma^{-1}]_{q\ell} \right). \tag{7}
\]
It is important to notice that the considered optimization framework will require h to be smooth. This is why we presently use a surrogate of the ℓ1-norm, defined as

\[
\phi(t) = \varepsilon \log\left( \cosh(t/\varepsilon) \right), \tag{8}
\]

with ε > 0 (the limit ε → 0 yields the ℓ1-norm). In practice, we use ε = 10^{−12}.
However, note that the output of the algorithm is not critically sensitive to this choice.
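The solvers developed in this paper exploit dedicated geometries of M_{p,k}; as a simplified illustration of the Riemannian approach only, the following sketch optimizes (6) without the factor-model constraint, i.e., over the full positive definite manifold, using the pymanopt library (assuming version ≥ 2.0) with autograd. All function names and parameter choices here are ours, not the authors' implementation:

```python
import autograd.numpy as anp
import pymanopt
from pymanopt.manifolds import SymmetricPositiveDefinite
from pymanopt.optimizers import SteepestDescent

def make_problem(X, nu, lam, eps=1e-12):
    """Objective (6) without the M_{p,k} constraint: Student-t negative
    log-likelihood (4) plus the smooth penalty (7)-(8), over SPD matrices."""
    n, p = X.shape
    manifold = SymmetricPositiveDefinite(p)

    @pymanopt.function.autograd(manifold)
    def cost(Sigma):
        Theta = anp.linalg.inv(Sigma)
        quad = anp.einsum("ij,jk,ik->i", X, Theta, X)   # x_i^T Theta x_i
        _, logdet = anp.linalg.slogdet(Sigma)
        nll = 0.5 * (nu + p) * anp.mean(anp.log1p(quad / nu)) + 0.5 * logdet
        # eps * log(cosh(t / eps)) in overflow-safe form:
        # log cosh x = logaddexp(x, -x) - log 2; diagonal terms vanish
        off = Theta - anp.diag(anp.diag(Theta))
        pen = anp.sum(eps * (anp.logaddexp(off / eps, -off / eps) - anp.log(2.0)))
        return nll + lam * pen

    return pymanopt.Problem(manifold, cost)

# Usage sketch:
# Sigma_hat = SteepestDescent().run(make_problem(X, nu=5.0, lam=0.1)).point
```

Handling the rank-k plus diagonal structure of M_{p,k} requires the fixed-rank geometries discussed in the remainder of the paper.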