Learning Graphical Factor Models with
Riemannian Optimization
Alexandre Hippert-Ferrer1[0000-0002-7740-5415], Florent Bouchard2[0000-0003-3003-7317], Ammar Mian3[0000-0003-1796-8707], Titouan Vayer4[0000-0002-8115-572X], and Arnaud Breloy5[0000-0002-3802-9015]
1 Univ Gustave Eiffel, IGN, ENSG, LASTIG, F-77454 Marne-la-Vallée, France
alexandre.hippert-ferrer@univ-eiffel.fr
2 Univ Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des signaux et systèmes, Gif-sur-Yvette, France
3 Univ Savoie Mont Blanc, LISTIC, Annecy, France
4 Univ Lyon, Inria, CNRS, ENS de Lyon, UCB Lyon 1, LIP UMR 5668, F-69342, Lyon, France
5 Univ Paris-Nanterre, LEME, IUT Ville d’Avray, Ville d’Avray, France
Abstract. Graphical models and factor analysis are well-established tools in multivariate statistics. While these models can both be linked to structures exhibited by covariance and precision matrices, they are generally not jointly leveraged within graph learning processes. This paper therefore addresses this issue by proposing a flexible algorithmic framework for graph learning under low-rank structural constraints on the covariance matrix. The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution (a generalization of Gaussian graphical models to possibly heavy-tailed distributions), where the covariance matrix is optionally constrained to be structured as low-rank plus diagonal (low-rank factor model). The resolution of this class of problems is then tackled with Riemannian optimization, where we leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models. Numerical experiments on synthetic and real-world data sets illustrate the effectiveness of the proposed approach.
Keywords: Graph learning · Low-rank factor models · Riemannian optimization.
1 Introduction
Graphical models allow us to represent specific correlation structures between any two variables (entries) of multivariate observations. Inferring the topology of this structure directly from the data is referred to as graph learning, which has been increasingly leveraged in numerous applications, such as biology (Li and Gui, 2006; Smith et al., 2011; Stegle et al., 2015), finance (Marti et al., 2021), or signal processing (Shuman et al., 2013; Kalofolias, 2016).
Within Gaussian graphical models (GGMs), graph learning boils down to the problem of estimating the precision (inverse covariance) matrix of a Gaussian Markov random field (Friedman et al., 2008; Lake and Tenenbaum, 2010). In practice, achieving an accurate covariance matrix estimation is often a difficult task due to low sample support. Thus, it is common to introduce prior assumptions on the structure of this matrix that guarantee a correct estimation with fewer samples. A popular approach is related to low-rank factorizations, which relies on the assumption that the data is driven by an underlying low-dimensional linear model, corrupted by an independent perturbation. The resulting covariance matrix decomposition then involves a core that is a low-rank positive semi-definite matrix. Such a model is ubiquitous in statistics, and is, for example, at the heart of probabilistic principal component analysis (Tipping and Bishop, 1999), low-rank factor analysis (Robertson and Symons, 2007; Khamaru and Mazumder, 2019; Rubin and Thayer, 1982), and their many generalizations.
Since GGMs and low-rank factorizations share a common root in structured covariance (or precision) matrix estimation, it appears desirable to leverage both approaches in a unified graph learning formulation. On one hand, linear dimension reduction approaches rely on particular spectral structures that can be beneficial for graph learning (Kumar et al., 2020). On the other hand, it also opens the way to a graph-oriented view of sparse principal component analysis (Yoshida and West, 2010; Meng et al., 2014). Though theoretically appealing, such unification is challenging because it formulates optimization problems with objective functions and constraints that apply to both the covariance matrix and its inverse. Thus, deriving single-step learning algorithms for these models has only recently been addressed (Chandra et al., 2021).
In this paper, we propose a new family of methods for graph learning with low-rank constraints on the covariance matrix, hereafter referred to as graphical factor models (GFM). First, we reformulate graph learning as a problem that encompasses both elliptical distributions and low-rank factor models. The main interest of generalizing Gaussian graphical models to elliptical ones is to ensure robustness to underlying heavy-tailed distributions (Vogel and Fried, 2011; Finegold and Drton, 2014; Zhang et al., 2013; de Miranda Cardoso et al., 2021). Moreover, additionally considering low-rank factor models allows for an effective dimensionality reduction. The main novelty of our approach is to tackle the resulting class of constrained and penalized maximum likelihood estimation problems in a unified way with Riemannian optimization (Absil et al., 2008; Boumal, 2020). To do so, we leverage geometries of both the positive definite matrices (Bhatia, 2009) and the positive semi-definite matrices of fixed rank (Bonnabel and Sepulchre, 2009; Bouchard et al., 2021) that are well suited to the considered models. The corresponding tools allow us to develop optimization methods that ensure the desired structures for the covariance matrix, thus providing a flexible and numerically efficient framework for learning graphical factor models.
Finally, experiments1 conducted on synthetic and real-world data sets demonstrate the interest of considering both elliptical distributions and factor model structures in a graph learning process. We observe that the proposed algorithms lead to more interpretable graphs compared to unstructured models. Notably, the factor model approaches compare well with the current state of the art in Laplacian-constrained graph learning methods, which require setting the number of components as additional prior information (Kumar et al., 2020; de Miranda Cardoso et al., 2021). The interest of our method is twofold: i) it requires less supervision to unveil meaningful clusters in the conditional correlation structure of the data; ii) the computational bottleneck is greatly reduced, as the proposed algorithm iterations only require the thin SVD of a low-rank factor, rather than the full SVD of the Laplacian (or adjacency) matrix.

1 The code is available at: https://github.com/ahippert/graphfactormodel.
2 Background and proposed framework
2.1 Gaussian graphical and related models
Gaussian graphical models assume that each observation is a centered multivariate Gaussian random vector x = [x_1, ..., x_p]^⊤ with covariance matrix E[xx^⊤] = Σ, denoted x ∼ N(0, Σ). For the corresponding Gaussian Markov random field, an undirected graph is matched to the variables as follows: each variable corresponds to a vertex, and an edge is present between two vertices if the corresponding random variables are conditionally dependent given the others (Dempster, 1972; Lauritzen, 1996). The support of the precision matrix Θ = Σ^{−1} directly accounts for this conditional dependency, since

\[
\operatorname{corr}\left( x_q, x_\ell \mid x_{[[1,p]] \setminus \{q,\ell\}} \right) = -\,\Theta_{q\ell} \big/ \sqrt{\Theta_{qq}\,\Theta_{\ell\ell}}. \tag{1}
\]

Hence, a non-zero entry Θ_{qℓ} implies a conditional dependency between the variables x_q and x_ℓ, underlined by an edge between vertices q and ℓ of the graph.
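To make (1) concrete, here is a minimal NumPy sketch (our illustration, not code from the paper's repository) that recovers the conditional correlation matrix, and hence the graph support, from a given precision matrix:

```python
import numpy as np

def partial_correlations(Theta):
    """Conditional (partial) correlations from a precision matrix, as in (1)."""
    d = np.sqrt(np.diag(Theta))
    R = -Theta / np.outer(d, d)   # -Theta_ql / sqrt(Theta_qq * Theta_ll)
    np.fill_diagonal(R, 1.0)      # diagonal set to 1 by convention
    return R

def graph_support(Theta, tol=1e-8):
    """Adjacency matrix of the graph: edge (q, l) iff |Theta_ql| > tol."""
    A = (np.abs(Theta) > tol).astype(int)
    np.fill_diagonal(A, 0)
    return A
```

In practice Θ is unknown, which is precisely what the estimation problem below addresses.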
Within Gaussian graphical models, graph learning is therefore tied to the problem of estimating the precision matrix Θ from a set of observations {x_i}_{i=1}^n ∈ (R^p)^n. In order to exhibit such a correlation structure, a standard approach is to resort to regularized maximum likelihood estimation, i.e., solving the problem:

\[
\underset{\Theta \in \mathcal{S}^{++}_{p}}{\text{maximize}} \;\; \log\det(\Theta) - \operatorname{Tr}\{S\Theta\} - \lambda\, h(\Theta), \tag{2}
\]
where S = (1/n) ∑_{i=1}^{n} x_i x_i^⊤ is the sample covariance matrix, h is a regularization penalty, and λ ∈ R_+ is a regularization parameter. The ℓ1-norm is often used as a penalty in order to promote a sparse structure in Θ, which is at the foundation of the well-known GLasso algorithm (Friedman et al., 2008; Lake and Tenenbaum, 2010; Mazumder and Hastie, 2012; Fattahi and Sojoudi, 2019).
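For the Gaussian, ℓ1-penalized instance of (2), scikit-learn ships an off-the-shelf GLasso solver; the following usage sketch is given only to make the formulation concrete (it is not the method developed in this paper):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))    # n = 200 samples, p = 10 variables

# alpha plays the role of the regularization parameter lambda in (2)
model = GraphicalLasso(alpha=0.1).fit(X)
Theta_hat = model.precision_          # estimated (sparse) precision matrix
edges = np.abs(Theta_hat) > 1e-8      # support = candidate graph edges
```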
Many other convex or non-convex penalties have been considered in this setting (Lam and Fan, 2009; Shen et al., 2012; Benfenati et al., 2020). Depending on the assumptions, it can also be beneficial to consider penalties that promote certain types of structured sparsity patterns (Heinävaara et al., 2016; Tarzanagh and Michailidis, 2018). Another main example of structure within the conditional correlations is total positivity, also known as attractive Gaussian graphical models, which assumes Θ_{qℓ} ≤ 0, ∀ q ≠ ℓ (Fallat et al., 2017; Lauritzen et al., 2019). In attractive Gaussian graphical models, the identification of the precision matrix with the graph Laplacian (Chung, 1997) has also attracted recent interest in graph learning (Egilmez et al., 2017; Ying et al., 2020; Kumar et al., 2020).
2.2 Elliptical distributions
A first shortcoming of graph learning as formulated in (2) is its lack of robustness to outliers or heavy-tailed distributed samples. This is a consequence of the underlying Gaussian assumption, which cannot efficiently describe such data. A possible remedy is to consider a larger family of multivariate distributions. In this context, elliptical distributions (Anderson and Fang, 1990; Kai-Tai and Yao-Ting, 1990) offer a good alternative, as they are well known to provide robust estimators of the covariance matrix (Maronna, 1976; Tyler, 1987; Wald et al., 2019). In our present context, this framework has been successfully used to extend graphical models (Vogel and Fried, 2011; Finegold and Drton, 2014; Zhang et al., 2013; de Miranda Cardoso et al., 2021), as well as low-rank structured covariance matrix models (Zhao and Jiang, 2006; Zhou et al., 2019; Bouchard et al., 2021) for corrupted or heavy-tailed data.
A vector is said to follow a centered elliptically symmetric distribution with scatter matrix Σ and density generator g, denoted x ∼ ES(0, Σ, g), if its density has the form

\[
f(x) \propto \det(\Sigma)^{-1/2}\, g\!\left( x^{\top} \Sigma^{-1} x \right), \tag{3}
\]

which yields the negative log-likelihood

\[
\mathcal{L}(\Sigma) \propto \frac{1}{n} \sum_{i=1}^{n} \rho\!\left( x_i^{\top} \Sigma^{-1} x_i \right) + \frac{1}{2} \log |\Sigma| + \text{const.} \tag{4}
\]
for the sample set {x_i}_{i=1}^n, where ρ(t) = −log g(t). Notice that using g(t) = exp(−t/2) allows us to recover the Gaussian case. However, the density generator g enables more flexibility, notably to encompass many heavy-tailed multivariate distributions. Among popular choices, elliptical distributions include the multivariate t-distribution with ν > 2 degrees of freedom, which is obtained with g(t) = (1 + t/ν)^{−(ν+p)/2}. For this distribution, the parameter ν measures the rate of decay of the tails, and the Gaussian case corresponds to the limit ν → ∞.
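As an illustration of (4) for this choice of g (a sketch under the notation above, not the paper's implementation), the multivariate t negative log-likelihood with ρ(t) = ((ν + p)/2) log(1 + t/ν) can be evaluated as:

```python
import numpy as np

def student_t_nll(Sigma, X, nu):
    """Negative log-likelihood (4) for the multivariate t-distribution,
    up to an additive constant."""
    n, p = X.shape
    # quadratic forms x_i^T Sigma^{-1} x_i, computed for all samples at once
    quad = np.einsum("ij,ij->i", X, np.linalg.solve(Sigma, X.T).T)
    rho = 0.5 * (nu + p) * np.log1p(quad / nu)  # rho(t) = ((nu+p)/2) log(1 + t/nu)
    _, logdet = np.linalg.slogdet(Sigma)
    return rho.mean() + 0.5 * logdet
```

Replacing ρ with ρ(t) = t/2 in the same function would recover the Gaussian negative log-likelihood.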
2.3 Low-rank factor models
A second limitation of (2) is that it does not account for other potential structures exhibited by the covariance or precision matrices. This can be problematic when the sample support is low (n ≈ p, or n < p), as the sample covariance matrix is not a reliable estimate in these setups (Ledoit and Wolf, 2004; Smith, 2005; Vershynin, 2012). In this context, a popular approach consists in imposing low-rank structures on the covariance matrix. These structures arise from the assumption that the data is driven by an underlying low-dimensional linear model, i.e., x = Ws + ε, where W ∈ R^{p×k} is a rank-k factor loading matrix, and s ∈ R^k and ε ∈ R^p are independent random variables. The resulting covariance matrix is thus of the form Σ = H + Ψ, where H = W E[ss^⊤] W^⊤ belongs to the set of positive semi-definite matrices of rank k, denoted S^+_{p,k} = {Σ ∈ S_p : Σ ⪰ 0, rank(Σ) = k}. This model is well known in statistics and is, for example, at the core of probabilistic principal component analysis, which assumes Ψ ∝ I_p (Tipping and Bishop, 1999). Also notice that in Laplacian-constrained models, a rank-k precision matrix implies a (p−k)-component graph (Chung, 1997). Hence it is also interesting to leverage such spectral structures from the graph learning perspective (Kumar et al., 2020).
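To fix ideas, a small sketch generating data from this linear model (illustrative assumptions of ours: Gaussian factors and noise, arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 20, 3, 500

W = rng.standard_normal((p, k))                  # rank-k factor loading matrix
psi = rng.uniform(0.5, 1.5, size=p)              # diagonal noise variances (Psi)

S_lat = rng.standard_normal((n, k))              # latent factors s_i ~ N(0, I_k)
E = rng.standard_normal((n, p)) * np.sqrt(psi)   # noise eps_i ~ N(0, Psi)
X = S_lat @ W.T + E                              # x_i = W s_i + eps_i

Sigma = W @ W.T + np.diag(psi)                   # population covariance H + Psi
assert np.linalg.matrix_rank(W @ W.T) == k       # H is rank k
```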
In this paper, we will focus on the low-rank factor analysis models (Robertson and Symons, 2007; Rubin and Thayer, 1982; Khamaru and Mazumder, 2019), which consider the general case Ψ ∈ D^{++}_p, where D^{++}_p = {Σ = diag(d), d ∈ R^p with d > 0 entrywise} denotes the space of positive definite diagonal matrices. Given this model, the covariance matrix belongs to the space of rank-k plus diagonal matrices, denoted as

\[
\mathcal{M}_{p,k} = \left\{ \Sigma = H + \Psi \;:\; H \in \mathcal{S}^{+}_{p,k},\; \Psi \in \mathcal{D}^{++}_{p} \right\}. \tag{5}
\]

Notice that this parameterization reduces the dimension of the estimation problem from p(p+1)/2 down to p(k+1) − k(k−1)/2, which is why it is often used in regimes with few samples, or in high-dimensional settings.
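A quick sanity check on this parameter count (our illustration, using the formula stated above):

```python
def n_params_full(p):
    """Free parameters of an unstructured p x p covariance matrix."""
    return p * (p + 1) // 2

def n_params_factor(p, k):
    """Free parameters of a rank-k plus diagonal covariance matrix."""
    return p * (k + 1) - k * (k - 1) // 2

print(n_params_full(100), n_params_factor(100, 5))   # 5050 590
```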
2.4 Learning elliptical graphical factor models
We cast the problem of graph learning for elliptical graphical factor models as

\[
\begin{array}{cl}
\underset{\Sigma \in \mathcal{S}^{++}_{p}}{\text{minimize}} & \mathcal{L}(\Sigma) + \lambda\, h(\Sigma) \\[2pt]
\text{subject to} & \Sigma \in \mathcal{M}_{p,k},
\end{array} \tag{6}
\]

where L is the negative log-likelihood in (4), and M_{p,k} is the space of rank-k plus diagonal positive definite matrices in (5). The penalty h is a smooth function that promotes a sparse structure on the graph, and λ ∈ R_+ is a regularization parameter. Although this formulation accommodates many options, we focus on the usual element-wise penalty applied to the precision matrix Θ = Σ^{−1}, defined as:

\[
h(\Sigma) = \sum_{q \neq \ell} \phi\!\left( [\Sigma^{-1}]_{q\ell} \right). \tag{7}
\]
It is important to notice that the considered optimization framework will require h to be smooth. This is why we presently use a surrogate of the ℓ1-norm, defined as

\[
\phi(t) = \varepsilon \log\left( \cosh(t/\varepsilon) \right), \tag{8}
\]

with ε > 0 (the limit ε → 0 yields the ℓ1-norm). In practice, we use ε = 10^{−12}.
However, note that the output of the algorithm is not critically sensitive to this choice.
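The solvers developed in this paper exploit dedicated geometries of M_{p,k}; as a simplified illustration of the Riemannian approach only, the following sketch optimizes (6) without the factor-model constraint, i.e., over the full positive definite manifold, using the pymanopt library (assuming version ≥ 2.0) with autograd. All function names and parameter choices here are ours, not the authors' implementation:

```python
import autograd.numpy as anp
import pymanopt
from pymanopt.manifolds import SymmetricPositiveDefinite
from pymanopt.optimizers import SteepestDescent

def make_problem(X, nu, lam, eps=1e-12):
    """Objective (6) without the M_{p,k} constraint: Student-t negative
    log-likelihood (4) plus the smooth penalty (7)-(8), over SPD matrices."""
    n, p = X.shape
    manifold = SymmetricPositiveDefinite(p)

    @pymanopt.function.autograd(manifold)
    def cost(Sigma):
        Theta = anp.linalg.inv(Sigma)
        quad = anp.einsum("ij,jk,ik->i", X, Theta, X)   # x_i^T Theta x_i
        _, logdet = anp.linalg.slogdet(Sigma)
        nll = 0.5 * (nu + p) * anp.mean(anp.log1p(quad / nu)) + 0.5 * logdet
        # eps * log(cosh(t / eps)) in overflow-safe form:
        # log cosh x = logaddexp(x, -x) - log 2; diagonal terms vanish
        off = Theta - anp.diag(anp.diag(Theta))
        pen = anp.sum(eps * (anp.logaddexp(off / eps, -off / eps) - anp.log(2.0)))
        return nll + lam * pen

    return pymanopt.Problem(manifold, cost)

# Usage sketch:
# Sigma_hat = SteepestDescent().run(make_problem(X, nu=5.0, lam=0.1)).point
```

Handling the rank-k plus diagonal structure of M_{p,k} requires the fixed-rank geometries discussed in the remainder of the paper.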