Adaptive Estimation of Graphical Models under Total Positivity
Jiaxi Ying∗ José Vinícius de M. Cardoso† Daniel P. Palomar‡
Abstract
We consider the problem of estimating (diagonally dominant) M-matrices as precision matrices in Gaussian graphical models. These models exhibit intriguing properties, such as the existence of the maximum likelihood estimator with merely two observations for M-matrices [Lauritzen et al., 2019; Slawski & Hein, 2015] and even one observation for diagonally dominant M-matrices [Truell et al., 2021]. We propose an adaptive multiple-stage estimation method that refines the estimate by solving a weighted ℓ1-regularized problem at each stage. Furthermore, we develop a unified framework based on the gradient projection method to solve the regularized problem, incorporating distinct projections to handle the constraints of M-matrices and diagonally dominant M-matrices. A theoretical analysis of the estimation error is provided. Our method outperforms state-of-the-art methods in precision matrix estimation and graph edge identification, as evidenced by synthetic and financial time-series data sets.
1 Introduction
Total positivity, as a strong form of positive dependence, has found applications in various fields such as factor analysis in psychometrics [Lauritzen et al., 2019], taxonomic reasoning [Lake & Tenenbaum, 2010], graph signal processing [Egilmez et al., 2017], and financial markets [Agrawal et al., 2020]. For instance, stock markets exemplify total positivity, as stocks generally exhibit positive dependence due to the influence of market factors [Agrawal et al., 2020]. A distribution that satisfies total positivity is also known as multivariate totally positive of order two (MTP2). For a more in-depth understanding, we refer the reader to Fallat et al. [2017]. In the Gaussian context, the concept of MTP2 becomes more straightforward: a Gaussian distribution is MTP2 if and only if the precision matrix (i.e., inverse covariance matrix) is an M-matrix [Karlin & Rinott, 1983]. In this paper, our focus lies on estimating (diagonally dominant) M-matrices as precision matrices in Gaussian graphical models.
Estimation of precision matrices in general Gaussian graphical models has been extensively studied in the literature. A well-known method, known as graphical lasso [Banerjee et al., 2008; d'Aspremont et al., 2008], is formulated as an ℓ1-regularized Gaussian maximum likelihood estimation. Numerous extensions of graphical lasso have been explored [Fan et al., 2014; Friedman et al., 2008; Hsieh et al., 2014; Lam & Fan, 2009; Loh & Wainwright, 2015, 2017; Ravikumar et al., 2011; Sun et al., 2018]. Other notable, though not exhaustive, works include graphical Dantzig [Yuan, 2010], CLIME [Cai et al., 2011], and TIGER [Liu & Wang, 2017]. Recent research has illuminated intriguing properties of incorporating an additional M-matrix constraint, proving advantageous in the realms of high-dimensional statistics and signal processing on graphs. Recent studies [Lauritzen et al., 2019, 2021; Slawski & Hein, 2015] show that the M-matrix constraint significantly reduces the sample size required for the maximum likelihood estimator (MLE) to exist.
∗Department of Mathematics, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, and HKUST Shenzhen-Hong Kong Collaborative Innovation Research Institute, Shenzhen, China; e-mail: jx.ying@connect.ust.hk.
†Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR; e-mail: jvdmc@connect.ust.hk.
‡Department of Electronic and Computer Engineering, Department of Industrial Engineering and Data Analytics, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR; e-mail: palomar@ust.hk.
arXiv:2210.15471v2 [stat.ML] 8 Jun 2023
Specifically, Lauritzen et al. [2019] and Slawski & Hein [2015] established that the MLE under the M-matrix constraint exists if the sample size n ≥ 2, regardless of the underlying dimension p, substantially reducing the n ≥ p requirement for general Gaussian distributions. Soloff et al. [2020] revealed that the M-matrix constraint, serving as implicit regularization, is vital for achieving the minimax rate of √(log p / n); without this constraint, the rate cannot exceed √(p/n). Remarkably, Lauritzen et al. [2021] proved that under the MTP2 constraint, the MLE in binary distributions may exist with only n = p observations, in contrast to the 2^p observations required in the absence of this constraint.
The diagonally dominant M-matrix constraint is increasingly drawing interest in Gaussian models. In these distributions, diagonally dominant M-matrices, as precision matrices, exhibit a property called log-L-concave (LLC) [Robeva et al., 2021] or strong MTP2 [Röttger et al., 2021]. This property is essential for analyzing positive dependence in multivariate Pareto distributions. A recent study [Truell et al., 2021] revealed that Gaussian models under LLC/strong MTP2 act as a convex relaxation for Brownian motion tree models (a class of Gaussian models on trees) and showed that the MLE exists almost surely when n = 1. This finding was established by connecting Laplacian constrained Gaussian Markov random fields (LGMRF) [Cardoso et al., 2021, 2022; Kumar et al., 2020; Ying et al., 2020b] and leveraging results on MLE existence in LGMRF with a single observation [Ying et al., 2021a].
The estimation of MTP2 graphical models has become a growing area of interest in the field of signal processing on graphs [Dong et al., 2019; Ortega et al., 2018; Shuman et al., 2013], which focuses on handling data defined on irregular graph domains. Precision matrices in MTP2 graphical models belong to the class of generalized Laplacian matrices [Biyikoglu et al., 2007], each of which is a symmetric matrix with non-positive off-diagonal entries that is associated with a graph. The eigenvectors of a generalized Laplacian define a graph Fourier transform [Shuman et al., 2013], supported by the discrete nodal domain theorem [Davies et al., 2001], which states that an eigenvector corresponding to a larger eigenvalue is associated with a higher frequency. We note that such a spectral property does not hold for general positive definite matrices.
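This frequency interpretation can be verified numerically on a small example. The path graph and the sign-change counting below are illustrative choices, not taken from the paper:

```python
import numpy as np

# Combinatorial Laplacian of a path graph on 5 nodes (a generalized Laplacian).
p = 5
L = 2.0 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)
L[0, 0] = L[-1, -1] = 1.0

evals, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order

def sign_changes(v, tol=1e-12):
    """Count sign changes of an eigenvector along the path; more sign
    changes correspond to a higher graph frequency."""
    s = np.sign(v[np.abs(v) > tol])
    return int(np.sum(s[1:] != s[:-1]))

# Eigenvectors associated with larger eigenvalues oscillate more.
changes = [sign_changes(evecs[:, i]) for i in range(p)]
```

For the path graph, the eigenvectors are discrete cosines, so the number of sign changes grows with the eigenvalue, matching the nodal domain intuition.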
One approach to estimating MTP2 graphical models is MLE [Lauritzen et al., 2019; Slawski & Hein, 2015], which implicitly promotes sparsity using the M-matrix constraint. However, it cannot adjust sparsity levels to specific values, a feature often desired in practice. The ℓ1-norm regularized MLE [Cai et al., 2021; Deng & So, 2020; Egilmez et al., 2017; Pavez & Ortega, 2016] offers improved sparsity control and can be solved using techniques like block coordinate descent [Egilmez et al., 2017; Pavez & Ortega, 2016], the proximal point algorithm [Deng & So, 2020], and projected Newton-like methods [Cai et al., 2021]. Notably, Pavez et al. [2018] found that graph components and some zero patterns can be determined through thresholded sample covariance matrices. Estimating diagonally dominant M-matrices as precision matrices is explored in Egilmez et al. [2017]. Additionally, Wang et al. [2020] proposed a graph structure learning algorithm based on conditional independence testing, eliminating the need for adjustment of tuning parameters.
This paper focuses on estimating (diagonally dominant) M-matrices as precision matrices in MTP2 Gaussian graphical models. The main contributions of this paper are threefold:

• We propose an adaptive multiple-stage estimation method that refines the estimate by solving a weighted ℓ1-regularized problem at each stage. Then we develop a unified framework based on the gradient projection method to solve the regularized problem, equipped with distinct projections to handle the constraints of M-matrices and diagonally dominant M-matrices.

• We provide a thorough theoretical analysis of the estimation error for our method, which comprises two components: optimization error and statistical error. The optimization error decays at a linear rate, highlighting the progressive refinement of the estimate across iterative stages, whereas the statistical error captures the intrinsic statistical rate.

• Experiments on synthetic and financial time-series data demonstrate that our method outperforms state-of-the-art methods, concurrently achieving lower precision matrix estimation error and higher graph edge selection accuracy. Additionally, we observe that incorporating the M-matrix constraint enhances robustness with respect to the selection of the regularization parameter.
Notation: Lower and upper case bold letters denote vectors and matrices, respectively. [p] denotes the set {1, . . . , p}. ‖x‖max = max_i |x_i|. S^p_++ denotes the set of p × p positive definite matrices. λmax(X) and λmin(X) denote the maximum and minimum eigenvalues of X, respectively. For a matrix X and a set S, X_S denotes a vector with dimension |S|, containing the entries of X indexed by S.
2 Preliminaries and Problem Formulation
We first introduce preliminaries about MTP2 Gaussian graphical models, then present the problem formulation.
2.1 MTP2 Gaussian Graphical Models
Let G = (V, E) be an undirected graph with the set of nodes V and the set of edges E. Associating a random vector x ∼ N(0, Σ) with graph G forms a Gaussian graphical model that satisfies the following properties:

Θij ≠ 0 ⇒ {i, j} ∈ E, for i ≠ j,
Θij = 0 ⟺ xi ⊥⊥ xj | x_{[p]\{i,j}},

where xi ⊥⊥ xj | x_{[p]\{i,j}} indicates that xi is conditionally independent of xj given the other random variables. Consequently, graph G characterizes the sparsity pattern of the precision matrix Θ, where missing edges represent conditional independence between random variables. Estimating the precision matrix is a crucial task in Gaussian graphical models.
We concentrate on estimating precision matrices for random variables following an MTP2 Gaussian distribution. A multivariate Gaussian distribution with a positive definite covariance matrix Σ is MTP2 if and only if its precision matrix Θ is a symmetric M-matrix [Karlin & Rinott, 1983], i.e., Θij ≤ 0 for all i ≠ j. This is equivalent to stating that all partial correlations are nonnegative, where the partial correlation between any two variables xi and xj conditioned on all other variables equals −Θij/√(Θii Θjj).
2.2 Problem Formulation
Consider a zero-mean random vector x = (x1, . . . , xp)⊤ following a Gaussian distribution x ∼ N(0, Σ). Our goal is to estimate the precision matrix Θ := Σ⁻¹, which is a (diagonally dominant) M-matrix, given n independent and identically distributed observations x(1), . . . , x(n) ∈ R^p.
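As a concrete setup, the sample covariance matrix for this zero-mean model can be formed directly from the observations; the tridiagonal ground-truth precision below is a hypothetical example, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 500

# Hypothetical ground-truth M-matrix precision (tridiagonal).
theta_true = 2.0 * np.eye(p) - 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
sigma_true = np.linalg.inv(theta_true)

# n i.i.d. observations x(1), ..., x(n) from N(0, Sigma).
X = rng.multivariate_normal(np.zeros(p), sigma_true, size=n)

# Sample covariance for a zero-mean model: (1/n) * sum_i x(i) x(i)^T.
S_hat = X.T @ X / n
```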
To estimate the precision matrix under MTP2 Gaussian distributions, we can solve the penalized Gaussian maximum likelihood estimation problem as follows:

minimize_{Θ∈S}  −log det(Θ) + tr(ΘΣ̂) + ∑_{i≠j} h_λ(|Θij|),    (1)

where h_λ is a sparsity penalty function, such as the ℓ1-norm considered in Egilmez et al. [2017], Σ̂ = (1/n) ∑_{i=1}^n x(i)(x(i))⊤ is the sample covariance matrix, and S is a feasible set. We consider two cases for S:
M^p = {Θ ∈ S^p_++ | Θij = Θji ≤ 0, ∀ i ≠ j},    (2)

and

M^p_D = {Θ ∈ S^p_++ | Θij = Θji ≤ 0, ∀ i ≠ j, Θ1 ≥ 0},    (3)

where M^p represents the set of all symmetric positive definite M-matrices, and M^p_D adds the constraint Θ1 ≥ 0 compared to M^p, leading Θ to be a diagonally dominant M-matrix. We propose an adaptive method for estimating M-matrices and diagonally dominant M-matrices.
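A direct membership test for the two feasible sets can be written as follows; this is a sketch, and the numerical tolerance handling is an implementation detail not specified in the paper:

```python
import numpy as np

def in_Mp(theta, tol=1e-10):
    """Theta in M^p: symmetric, positive definite, nonpositive off-diagonals."""
    theta = np.asarray(theta)
    if not np.allclose(theta, theta.T):
        return False
    if np.min(np.linalg.eigvalsh(theta)) <= 0:
        return False
    off = theta - np.diag(np.diag(theta))
    return bool(np.all(off <= tol))

def in_Mp_D(theta, tol=1e-10):
    """Theta in M^p_D: additionally requires theta @ 1 >= 0, i.e.
    nonnegative row sums (diagonal dominance)."""
    return in_Mp(theta, tol) and bool(np.all(theta.sum(axis=1) >= -tol))
```

Note that a positive definite M-matrix need not be diagonally dominant, which is why M^p_D is a strict subset of M^p.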
3 Proposed Method
In this section, we first propose an adaptive multiple-stage estimation method, then develop a unified
framework based on projected gradient descent for solving each stage. We incorporate distinct projections for
estimating M-matrices and diagonally dominant M-matrices as precision matrices.
3.1 Adaptive Estimation
Our method comprises multiple stages, where each stage refines the previous stage's estimate by solving a weighted ℓ1-regularized problem. Specifically, at the k-th stage, we obtain the estimate Θ̂^(k) by solving the following problem:

minimize_{Θ∈S}  −log det(Θ) + tr(ΘΣ̂) + ∑_{i≠j} λ^(k)_ij |Θij|,    (4)

where each λ^(k)_ij = p_λ(|Θ̂^(k−1)_ij|), with p_λ the weight-updating function, and Θ̂^(k−1) is the estimate obtained at the (k−1)-th stage.
When the estimate Θ̂^(k−1) from the previous stage exhibits a large coefficient for its (i, j) entry, it is reasonable to assign a small λ^(k)_ij in Problem (4). Therefore, p_λ should be monotonically non-increasing. In the initial stage, we set each λ^(1)_ij = λ, reducing Problem (4) to the well-studied ℓ1-regularized estimation [Egilmez et al., 2017; Ying et al., 2021b]. Our method refines the ℓ1-norm estimate in subsequent stages, offering an improvement over the standard ℓ1-norm approach.
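As a concrete (hypothetical) choice, p_λ can be taken as the derivative of the SCAD penalty, which is monotonically non-increasing in the entry magnitude as required; the paper leaves p_λ generic, so this is only an illustrative sketch:

```python
import numpy as np

def p_lambda_scad(x, lam, a=3.7):
    """Derivative of the SCAD penalty: lam for |x| <= lam, and
    (a*lam - |x|)_+ / (a - 1) for |x| > lam. Non-increasing in |x|."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= lam, lam,
                    np.maximum(a * lam - x, 0.0) / (a - 1.0))

# Stage-k weights from the previous estimate: entries with large magnitude
# receive small weights, so they are penalized less in the next stage.
theta_prev = np.array([[ 2.0, -0.8,   0.0],
                       [-0.8,  2.0, -0.05],
                       [ 0.0, -0.05,  2.0]])
lam = 0.5
weights = p_lambda_scad(theta_prev, lam)
```

Here the strong edge (magnitude 0.8) gets a smaller weight than the weak edge (magnitude 0.05), which is exactly the adaptive behavior described above.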
By choosing p_λ as the derivative of the sparsity penalty function h_λ in Problem (1), the sequence {Θ̂^(k)}_{k≥1} converges to a stationary point of Problem (1) with a nonconvex penalty. In this context, our method aligns with the local linear approximation method [Zou & Li, 2008] and the broader framework of multi-stage convex relaxation [Zhang, 2010b]. Nonconvex approaches may encounter issues with sensitivity to the regularization parameter selection, as illustrated in Figure 6. However, we demonstrate that incorporating the M-matrix constraint considerably enhances robustness with respect to the selection of the regularization parameter.
3.2 Gradient Projection Method
For each Θ ∈ S, where S = M^p or M^p_D, all off-diagonal entries are nonpositive, leading to the following representation of Problem (4):

minimize_{Θ∈S}  −log det(Θ) + tr(ΘΣ̂) − ∑_{i≠j} λ^(k)_ij Θij.    (5)
We design a gradient projection method to solve Problem (5). Initially, we perform a gradient descent step, Θ_t − η_t ∇f(Θ_t), with ∇f representing the gradient of the objective function f. As this step may exceed the constraint set S, we subsequently project it back onto S. However, such a projection onto S does not exist due to its nonclosedness, resulting from the nonclosedness of S^p_++. Instead, we express the set S as the intersection of a closed set S_d and the open set S^p_++:

S = S_d ∩ S^p_++.    (6)
We handle the constraints of S_d and S^p_++ separately. The closed set S_d constraint is manageable through projection, leading to the following projected gradient descent:

Θ_{t+1} = P_{S_d}(Θ_t − η_t ∇f(Θ_t)),    (7)

where P_{S_d} denotes the projection onto the set S_d with respect to the Frobenius norm.
The constraint S^p_++ cannot be managed by projection due to its nonclosedness. One efficient approach to enforce positive definiteness is using a line search procedure [Hsieh et al., 2014]. Specifically, at the t-th iteration, we try the step sizes η_t ∈ {σβ^0, σβ^1, σβ^2, . . .}, with β ∈ (0, 1) and σ > 0, until we find the smallest m ∈ N such that the iterate Θ_{t+1} in (7), with η_t = σβ^m, ensures Θ_{t+1} ∈ S^p_++ and satisfies:

f(Θ_{t+1}) ≤ f(Θ_t) − α η_t ‖G_{1/η_t}(Θ_t)‖²_F,    (8)
where α ∈ (0, 1), and G_{1/η_t}(Θ_t) = (1/η_t)(Θ_t − Θ_{t+1}). We present our algorithm in Algorithm 1, where Λ^(k) contains all weights λ^(k)_ij, and its diagonal entries are set to zero.
Theorem 3.1. The sequence {Θ_t}_{t≥0} established by Algorithm 1 converges to the optimal solution of Problem (5).
The proof is provided in Appendix B.2. It is important to note that the existence of the minimizer for Problem (5) is assumed throughout this paper, which can be guaranteed almost surely if the sample size n ≥ 2 for the case of S = M^p [Lauritzen et al., 2019; Slawski & Hein, 2015] and n ≥ 1 for the case of S = M^p_D [Truell et al., 2021].
Although our method needs to solve a sequence of log-determinant programs
(5)
, numerical results indicate
a rapid decrease in the number of iterations required for solving
(5)
as
k
increases (see Figure 8). This
accelerated convergence can be attributed to the use of the estimate from the previous stage as an initial
point, thereby providing a warm start for faster convergence.
We now present notable extensions of the proposed gradient projection algorithm. First, the algorithm
can be readily extended to solve Problem
(1)
with a nonconvex penalty. Second, the algorithm can be further
adapted to solve the following problem for the case of the multivariate t-distribution:

minimize_{Θ∈S}  −log det(Θ) + ((p + ν)/n) ∑_{i=1}^n log(1 + (1/ν)(x(i))⊤ Θ x(i)),

where ν denotes the number of degrees of freedom, and each x(i) is an observation. This formulation can be employed to learn positive partial correlation graphs. Notably, elliptical MTP2 distributions are highly restrictive, and the total positivity property is not applicable to t-distributions [Rossell & Zwiernik, 2021]. It
is worth mentioning that, unlike block coordinate descent [Egilmez et al.,2017] and projected Newton-like
methods [Cai et al.,2021], the proposed algorithm can handle these extensions, resulting in a more flexible
and versatile approach.
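A sketch of this t-distribution objective and its gradient, which the same gradient projection framework could use in place of the Gaussian one, is given below; the value of ν, the random data, and the finite-difference check are illustrative assumptions:

```python
import numpy as np

def t_objective(theta, X, nu):
    """Smooth part of the t-distribution objective (penalty omitted):
    -log det(theta) + (p + nu)/n * sum_i log(1 + x_i^T theta x_i / nu)."""
    n, p = X.shape
    quad = np.einsum('ij,jk,ik->i', X, theta, X)  # x_i^T theta x_i per sample
    return -np.linalg.slogdet(theta)[1] + (p + nu) / n * np.sum(np.log1p(quad / nu))

def t_gradient(theta, X, nu):
    """Gradient: -theta^{-1} + (p + nu)/n * sum_i x_i x_i^T / (nu + x_i^T theta x_i)."""
    n, p = X.shape
    quad = np.einsum('ij,jk,ik->i', X, theta, X)
    w = (p + nu) / (nu + quad)                    # per-sample reweighting
    return -np.linalg.inv(theta) + (X * w[:, None]).T @ X / n

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
theta = np.eye(3)
g = t_gradient(theta, X, nu=4.0)
```

The per-sample weights w down-weight large observations, which is what makes the t-based formulation robust to heavy tails.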
Algorithm 1 Solve Problem (5)
1: Input: Sample covariance matrix Σ̂, Λ^(k), σ, α, β.
2: for t = 0, 1, 2, . . . do
3:   Compute ∇f(Θ_t) = −Θ_t⁻¹ + Σ̂ − Λ^(k);
4:   m ← 0;
5:   repeat
6:     Update Θ_{t+1} = P_{S_d}(Θ_t − σβ^m ∇f(Θ_t));
7:     m ← m + 1;
8:   until Θ_{t+1} ∈ S^p_++ and
9:     f(Θ_{t+1}) ≤ f(Θ_t) − ασβ^m ‖G_{1/η_t}(Θ_t)‖²_F;
10: end for
11: Output: Θ̂^(k).
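To make the procedure concrete, below is a minimal NumPy sketch of Algorithm 1 for the case S = M^p, where the projection P_{S_d} reduces to symmetrizing and clipping the off-diagonal entries at zero; the step-size defaults and the tridiagonal test matrix are illustrative assumptions, not values from the paper:

```python
import numpy as np

def project_Sd(theta):
    """Projection onto Sd = {symmetric matrices with nonpositive
    off-diagonals}, the closed part of the M-matrix constraint set."""
    theta = 0.5 * (theta + theta.T)
    mask = ~np.eye(theta.shape[0], dtype=bool)
    theta[mask] = np.minimum(theta[mask], 0.0)
    return theta

def is_pd(theta):
    """Positive definiteness check via Cholesky factorization."""
    try:
        np.linalg.cholesky(theta)
        return True
    except np.linalg.LinAlgError:
        return False

def f_obj(theta, S_hat, Lam):
    """Objective of Problem (5); Lam has zero diagonal."""
    return (-np.linalg.slogdet(theta)[1] + np.trace(theta @ S_hat)
            - np.sum(Lam * theta))

def solve_stage(S_hat, Lam, theta0, sigma=1.0, alpha=0.3, beta=0.5,
                max_iter=1000, tol=1e-9):
    """One stage: projected gradient descent with backtracking line search
    to keep the iterates positive definite (cf. Algorithm 1)."""
    theta = theta0.copy()
    for _ in range(max_iter):
        grad = -np.linalg.inv(theta) + S_hat - Lam
        for m in range(100):                    # try eta = sigma * beta^m
            eta = sigma * beta ** m
            cand = project_Sd(theta - eta * grad)
            # sufficient decrease: f(cand) <= f(theta) - (alpha/eta)*||theta - cand||_F^2
            if is_pd(cand) and (f_obj(cand, S_hat, Lam) <=
                    f_obj(theta, S_hat, Lam)
                    - alpha / eta * np.sum((theta - cand) ** 2)):
                break
        else:
            return theta                        # line search safeguard (sketch only)
        if np.linalg.norm(cand - theta, 'fro') < tol:
            return cand
        theta = cand
    return theta

# Small demo on data from a hypothetical tridiagonal M-matrix precision.
rng = np.random.default_rng(0)
p, n = 4, 200
theta_true = 2.0 * np.eye(p) - 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta_true), size=n)
S_hat = X.T @ X / n
Lam = 0.1 * (np.ones((p, p)) - np.eye(p))       # uniform weights, zero diagonal
theta_hat = solve_stage(S_hat, Lam, np.eye(p))
```

For a later stage k, one would simply recompute `Lam` from the previous estimate via the weight-updating function and call `solve_stage` again with `theta_hat` as a warm start, mirroring the multi-stage scheme of Section 3.1.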
3.3 Computation of Projections
We present the computation of the projection P_{S_d} in (7) for both cases of estimating M-matrices and diagonally dominant M-matrices, i.e., S = M^p and M^p_D in Problem (5).