Adaptive Estimation of Graphical Models under Total Positivity
Jiaxi Ying∗ José Vinícius de M. Cardoso† Daniel P. Palomar‡
Abstract
We consider the problem of estimating (diagonally dominant) M-matrices as precision matrices in Gaussian graphical models. These models exhibit intriguing properties, such as the existence of the maximum likelihood estimator with merely two observations for M-matrices [Lauritzen et al., 2019; Slawski & Hein, 2015] and even one observation for diagonally dominant M-matrices [Truell et al., 2021]. We propose an adaptive multiple-stage estimation method that refines the estimate by solving a weighted ℓ1-regularized problem at each stage. Furthermore, we develop a unified framework based on the gradient projection method to solve the regularized problem, incorporating distinct projections to handle the constraints of M-matrices and diagonally dominant M-matrices. A theoretical analysis of the estimation error is provided. Our method outperforms state-of-the-art methods in precision matrix estimation and graph edge identification, as evidenced by synthetic and financial time-series data sets.
1 Introduction
Total positivity, as a strong form of positive dependence, has found applications in various fields such as factor analysis in psychometrics [Lauritzen et al., 2019], taxonomic reasoning [Lake & Tenenbaum, 2010], graph signal processing [Egilmez et al., 2017], and financial markets [Agrawal et al., 2020]. For instance, stock markets exemplify total positivity, as stocks generally exhibit positive dependence due to the influence of market factors [Agrawal et al., 2020]. A distribution that satisfies total positivity is also known as multivariate totally positive of order two (MTP2). For a more in-depth understanding, we refer the reader to Fallat et al. [2017]. In the Gaussian context, the concept of MTP2 becomes more straightforward: a Gaussian distribution is MTP2 if and only if the precision matrix (i.e., inverse covariance matrix) is an M-matrix [Karlin & Rinott, 1983]. In this paper, our focus lies on estimating (diagonally dominant) M-matrices as precision matrices in Gaussian graphical models.
Estimation of precision matrices in general Gaussian graphical models has been extensively studied in the literature. A well-known method, known as graphical lasso [Banerjee et al., 2008; d'Aspremont et al., 2008], is formulated as an ℓ1-regularized Gaussian maximum likelihood estimation. Numerous extensions of graphical lasso have been explored [Fan et al., 2014; Friedman et al., 2008; Hsieh et al., 2014; Lam & Fan, 2009; Loh & Wainwright, 2015, 2017; Ravikumar et al., 2011; Sun et al., 2018]. Other notable, though not exhaustive, works include graphical Dantzig [Yuan, 2010], CLIME [Cai et al., 2011], and TIGER [Liu & Wang, 2017]. Recent research has illuminated intriguing properties of incorporating an additional M-matrix constraint, proving advantageous in the realms of high-dimensional statistics and signal processing on graphs. Recent studies [Lauritzen et al., 2019, 2021; Slawski & Hein, 2015] show that the M-matrix constraint significantly reduces the sample size required for the maximum likelihood estimator (MLE) to exist.
∗Department of Mathematics, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, and HKUST Shenzhen-Hong Kong Collaborative Innovation Research Institute, Shenzhen, China; e-mail: jx.ying@connect.ust.hk.
†Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR; e-mail: jvdmc@connect.ust.hk.
‡Department of Electronic and Computer Engineering, Department of Industrial Engineering and Data Analytics, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR; e-mail: palomar@ust.hk.
arXiv:2210.15471v2 [stat.ML] 8 Jun 2023
Specifically, Lauritzen et al. [2019] and Slawski & Hein [2015] established that the MLE under the M-matrix constraint exists if the sample size n ≥ 2, regardless of the underlying dimension p, substantially reducing the n ≥ p requirement for general Gaussian distributions. Soloff et al. [2020] revealed that the M-matrix constraint, serving as implicit regularization, is vital for achieving the minimax rate of √(log p / n); without this constraint, the rate cannot exceed √(p/n). Remarkably, Lauritzen et al. [2021] proved that under the MTP2 constraint, the MLE in binary distributions may exist with only n = p observations, in contrast to the 2^p observations required in the absence of this constraint.
The diagonally dominant M-matrix constraint is increasingly drawing interest in Gaussian models. In these distributions, diagonally dominant M-matrices, as precision matrices, exhibit a property called log-L-concave (LLC) [Robeva et al., 2021] or strong MTP2 [Röttger et al., 2021]. This property is essential for analyzing positive dependence in multivariate Pareto distributions. A recent study [Truell et al., 2021] revealed that Gaussian models under LLC/strong MTP2 act as a convex relaxation for Brownian motion tree models (a class of Gaussian models on trees) and showed that the MLE exists almost surely when n = 1. This finding was established by connecting Laplacian constrained Gaussian Markov random fields (LGMRF) [Cardoso et al., 2021, 2022; Kumar et al., 2020; Ying et al., 2020b] and leveraging results on MLE existence in LGMRF with a single observation [Ying et al., 2021a].
The estimation of MTP2 graphical models has become a growing area of interest in the field of signal processing on graphs [Dong et al., 2019; Ortega et al., 2018; Shuman et al., 2013], which focuses on handling data defined on irregular graph domains. Precision matrices in MTP2 graphical models belong to the class of generalized Laplacian matrices [Biyikoglu et al., 2007], each of which is a symmetric matrix with non-positive off-diagonal entries that is associated with a graph. The eigenvectors of a generalized Laplacian define a graph Fourier transform [Shuman et al., 2013], supported by the discrete nodal domain theorem [Davies et al., 2001], which states that an eigenvector corresponding to a larger eigenvalue is associated with a higher frequency. We note that such a spectral property does not hold for general positive definite matrices.
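This frequency interpretation can be verified numerically on a small example. The path graph and the sign-change counting below are illustrative choices, not taken from the paper:

```python
import numpy as np

# Combinatorial Laplacian of a path graph on 5 nodes (a generalized Laplacian).
p = 5
L = 2.0 * np.eye(p) - np.eye(p, k=1) - np.eye(p, k=-1)
L[0, 0] = L[-1, -1] = 1.0

evals, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order

def sign_changes(v, tol=1e-12):
    """Count sign changes of an eigenvector along the path; more sign
    changes correspond to a higher graph frequency."""
    s = np.sign(v[np.abs(v) > tol])
    return int(np.sum(s[1:] != s[:-1]))

# Eigenvectors associated with larger eigenvalues oscillate more.
changes = [sign_changes(evecs[:, i]) for i in range(p)]
```

For the path graph, the eigenvectors are discrete cosines, so the number of sign changes grows with the eigenvalue, matching the nodal domain intuition.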
One approach to estimating MTP2 graphical models is MLE [Lauritzen et al., 2019; Slawski & Hein, 2015], which implicitly promotes sparsity using the M-matrix constraint. However, it cannot adjust sparsity levels to specific values, a feature often desired in practice. The ℓ1-norm regularized MLE [Cai et al., 2021; Deng & So, 2020; Egilmez et al., 2017; Pavez & Ortega, 2016] offers improved sparsity control and can be solved using techniques like block coordinate descent [Egilmez et al., 2017; Pavez & Ortega, 2016], the proximal point algorithm [Deng & So, 2020], and projected Newton-like methods [Cai et al., 2021]. Notably, Pavez et al. [2018] found that graph components and some zero patterns can be determined through thresholded sample covariance matrices. Estimating diagonally dominant M-matrices as precision matrices is explored in Egilmez et al. [2017]. Additionally, Wang et al. [2020] proposed a graph structure learning algorithm based on conditional independence testing, eliminating the need for adjustment of tuning parameters.
This paper focuses on estimating (diagonally dominant) M-matrices as precision matrices in MTP2 Gaussian graphical models. The main contributions of this paper are threefold:

• We propose an adaptive multiple-stage estimation method that refines the estimate by solving a weighted ℓ1-regularized problem at each stage. Then we develop a unified framework based on the gradient projection method to solve the regularized problem, equipped with distinct projections to handle the constraints of M-matrices and diagonally dominant M-matrices.

• We provide a thorough theoretical analysis of the estimation error for our method, which comprises two components: optimization error and statistical error. The optimization error decays at a linear rate, highlighting the progressive refinement of the estimate across iterative stages, whereas the statistical error captures the intrinsic statistical rate.

• Experiments on synthetic and financial time-series data demonstrate that our method outperforms state-of-the-art methods, concurrently achieving lower precision matrix estimation error and higher graph edge selection accuracy. Additionally, we observe that incorporating the M-matrix constraint enhances robustness with respect to the selection of the regularization parameter.
Notation: Lower and upper case bold letters denote vectors and matrices, respectively. [p] denotes the set {1, . . . , p}. ‖x‖max = max_i |x_i|. S^p_++ denotes the set of p × p positive definite matrices. λmax(X) and λmin(X) denote the maximum and minimum eigenvalues of X, respectively. For a matrix X and a set S, X_S denotes a vector with dimension |S|, containing the entries of X indexed by S.
2 Preliminaries and Problem Formulation
We first introduce preliminaries about MTP2 Gaussian graphical models, then present the problem formulation.
2.1 MTP2 Gaussian Graphical Models
Let G = (V, E) be an undirected graph with the set of nodes V and the set of edges E. Associating a random vector x ∼ N(0, Σ) with graph G forms a Gaussian graphical model that satisfies the following properties:

Θij ≠ 0 ⇒ {i, j} ∈ E, for i ≠ j,
Θij = 0 ⟺ xi ⊥⊥ xj | x_{[p]\{i,j}},

where xi ⊥⊥ xj | x_{[p]\{i,j}} indicates that xi is conditionally independent of xj given the other random variables. Consequently, graph G characterizes the sparsity pattern of the precision matrix Θ, where missing edges represent conditional independence between random variables. Estimating the precision matrix is a crucial task in Gaussian graphical models.
We concentrate on estimating precision matrices for random variables following an MTP2 Gaussian distribution. A multivariate Gaussian distribution with a positive definite covariance matrix Σ is MTP2 if and only if its precision matrix Θ is a symmetric M-matrix [Karlin & Rinott, 1983], i.e., Θij ≤ 0 for all i ≠ j. This is equivalent to stating that all partial correlations are nonnegative, where the partial correlation between any two variables xi and xj conditioned on all other variables equals −Θij/√(Θii Θjj).
2.2 Problem Formulation
Consider a zero-mean random vector x = (x1, . . . , xp)⊤ following a Gaussian distribution x ∼ N(0, Σ). Our goal is to estimate the precision matrix Θ := Σ⁻¹, which is a (diagonally dominant) M-matrix, given n independent and identically distributed observations x(1), . . . , x(n) ∈ R^p.
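As a concrete setup, the sample covariance matrix for this zero-mean model can be formed directly from the observations; the tridiagonal ground-truth precision below is a hypothetical example, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 500

# Hypothetical ground-truth M-matrix precision (tridiagonal).
theta_true = 2.0 * np.eye(p) - 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
sigma_true = np.linalg.inv(theta_true)

# n i.i.d. observations x(1), ..., x(n) from N(0, Sigma).
X = rng.multivariate_normal(np.zeros(p), sigma_true, size=n)

# Sample covariance for a zero-mean model: (1/n) * sum_i x(i) x(i)^T.
S_hat = X.T @ X / n
```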
To estimate the precision matrix under MTP2 Gaussian distributions, we can solve the penalized Gaussian maximum likelihood estimation problem as follows:

minimize_{Θ∈S}  −log det(Θ) + tr(ΘΣ̂) + ∑_{i≠j} h_λ(|Θij|),    (1)

where h_λ is a sparsity penalty function, such as the ℓ1-norm considered in Egilmez et al. [2017], Σ̂ = (1/n) ∑_{i=1}^n x(i)(x(i))⊤ is the sample covariance matrix, and S is a feasible set. We consider two cases for S:
M^p = {Θ ∈ S^p_++ | Θij = Θji ≤ 0, ∀ i ≠ j},    (2)

and

M^p_D = {Θ ∈ S^p_++ | Θij = Θji ≤ 0, ∀ i ≠ j, Θ1 ≥ 0},    (3)

where M^p represents the set of all symmetric positive definite M-matrices, and M^p_D adds the constraint Θ1 ≥ 0 compared to M^p, leading Θ to be a diagonally dominant M-matrix. We propose an adaptive method for estimating M-matrices and diagonally dominant M-matrices.
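A direct membership test for the two feasible sets can be written as follows; this is a sketch, and the numerical tolerance handling is an implementation detail not specified in the paper:

```python
import numpy as np

def in_Mp(theta, tol=1e-10):
    """Theta in M^p: symmetric, positive definite, nonpositive off-diagonals."""
    theta = np.asarray(theta)
    if not np.allclose(theta, theta.T):
        return False
    if np.min(np.linalg.eigvalsh(theta)) <= 0:
        return False
    off = theta - np.diag(np.diag(theta))
    return bool(np.all(off <= tol))

def in_Mp_D(theta, tol=1e-10):
    """Theta in M^p_D: additionally requires theta @ 1 >= 0, i.e.
    nonnegative row sums (diagonal dominance)."""
    return in_Mp(theta, tol) and bool(np.all(theta.sum(axis=1) >= -tol))
```

Note that a positive definite M-matrix need not be diagonally dominant, which is why M^p_D is a strict subset of M^p.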
3 Proposed Method
In this section, we first propose an adaptive multiple-stage estimation method, then develop a unified
framework based on projected gradient descent for solving each stage. We incorporate distinct projections for
estimating M-matrices and diagonally dominant M-matrices as precision matrices.
3.1 Adaptive Estimation
Our method comprises multiple stages, where each stage refines the previous stage's estimate by solving a weighted ℓ1-regularized problem. Specifically, at the k-th stage, we obtain the estimate Θ̂^(k) by solving the following problem:

minimize_{Θ∈S}  −log det(Θ) + tr(ΘΣ̂) + ∑_{i≠j} λ^(k)_ij |Θij|,    (4)

where each λ^(k)_ij = p_λ(|Θ̂^(k−1)_ij|), with p_λ the weight-updating function, and Θ̂^(k−1) is the estimate obtained at the (k−1)-th stage.
When the estimate Θ̂^(k−1) from the previous stage exhibits a large coefficient for its (i, j) entry, it is reasonable to assign a small λ^(k)_ij in Problem (4). Therefore, p_λ should be monotonically non-increasing. In the initial stage, we set each λ^(1)_ij = λ, reducing Problem (4) to the well-studied ℓ1-regularized estimation [Egilmez et al., 2017; Ying et al., 2021b]. Our method refines the ℓ1-norm estimate in subsequent stages, offering an improvement over the standard ℓ1-norm approach.
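As a concrete (hypothetical) choice, p_λ can be taken as the derivative of the SCAD penalty, which is monotonically non-increasing in the entry magnitude as required; the paper leaves p_λ generic, so this is only an illustrative sketch:

```python
import numpy as np

def p_lambda_scad(x, lam, a=3.7):
    """Derivative of the SCAD penalty: lam for |x| <= lam, and
    (a*lam - |x|)_+ / (a - 1) for |x| > lam. Non-increasing in |x|."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.where(x <= lam, lam,
                    np.maximum(a * lam - x, 0.0) / (a - 1.0))

# Stage-k weights from the previous estimate: entries with large magnitude
# receive small weights, so they are penalized less in the next stage.
theta_prev = np.array([[ 2.0, -0.8,   0.0],
                       [-0.8,  2.0, -0.05],
                       [ 0.0, -0.05,  2.0]])
lam = 0.5
weights = p_lambda_scad(theta_prev, lam)
```

Here the strong edge (magnitude 0.8) gets a smaller weight than the weak edge (magnitude 0.05), which is exactly the adaptive behavior described above.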
By choosing p_λ as the derivative of the sparsity penalty function h_λ in Problem (1), the sequence {Θ̂^(k)}_{k≥1} converges to a stationary point of Problem (1) with a nonconvex penalty. In this context, our method aligns with the local linear approximation method [Zou & Li, 2008] and the broader framework of multi-stage convex relaxation [Zhang, 2010b]. Nonconvex approaches may encounter issues with sensitivity to the regularization parameter selection, as illustrated in Figure 6. However, we demonstrate that incorporating the M-matrix constraint considerably enhances robustness with respect to the selection of the regularization parameter.
3.2 Gradient Projection Method
For each Θ ∈ S, where S = M^p or M^p_D, all off-diagonal entries are nonpositive, leading to the following representation of Problem (4):

minimize_{Θ∈S}  −log det(Θ) + tr(ΘΣ̂) − ∑_{i≠j} λ^(k)_ij Θij.    (5)
We design a gradient projection method to solve Problem (5). Initially, we perform a gradient descent step, Θ_t − η_t ∇f(Θ_t), with ∇f representing the gradient of the objective function f. As this step may exceed the constraint set S, we subsequently project it back onto S. However, such a projection onto S does not exist due to its nonclosedness, resulting from the nonclosedness of S^p_++. Instead, we express the set S as the intersection of a closed set S_d and the open set S^p_++:

S = S_d ∩ S^p_++.    (6)
We handle the constraints of S_d and S^p_++ separately. The closed set S_d constraint is manageable through projection, leading to the following projected gradient descent:

Θ_{t+1} = P_{S_d}(Θ_t − η_t ∇f(Θ_t)),    (7)

where P_{S_d} denotes the projection onto the set S_d with respect to the Frobenius norm.
The constraint S^p_++ cannot be managed by projection due to its nonclosedness. One efficient approach to enforce positive definiteness is using a line search procedure [Hsieh et al., 2014]. Specifically, at the t-th iteration, we try the step sizes η_t ∈ {σβ^0, σβ^1, σβ^2, . . .}, with β ∈ (0, 1) and σ > 0, until we find the smallest m ∈ N such that the iterate Θ_{t+1} in (7), with η_t = σβ^m, ensures Θ_{t+1} ∈ S^p_++ and satisfies:

f(Θ_{t+1}) ≤ f(Θ_t) − α η_t ‖G_{1/η_t}(Θ_t)‖²_F,    (8)
where α ∈ (0, 1), and G_{1/η_t}(Θ_t) = (1/η_t)(Θ_t − Θ_{t+1}). We present our algorithm in Algorithm 1, where Λ^(k) contains all weights λ^(k)_ij, and its diagonal entries are set to zero.
Theorem 3.1. The sequence {Θ_t}_{t≥0} established by Algorithm 1 converges to the optimal solution of Problem (5).
The proof is provided in Appendix B.2. It is important to note that the existence of the minimizer for Problem (5) is assumed throughout this paper, which can be guaranteed almost surely if the sample size n ≥ 2 for the case of S = M^p [Lauritzen et al., 2019; Slawski & Hein, 2015] and n ≥ 1 for the case of S = M^p_D [Truell et al., 2021].
Although our method needs to solve a sequence of log-determinant programs
(5)
, numerical results indicate
a rapid decrease in the number of iterations required for solving
(5)
as
k
increases (see Figure 8). This
accelerated convergence can be attributed to the use of the estimate from the previous stage as an initial
point, thereby providing a warm start for faster convergence.
We now present notable extensions of the proposed gradient projection algorithm. First, the algorithm
can be readily extended to solve Problem
(1)
with a nonconvex penalty. Second, the algorithm can be further
adapted to solve the following problem for the case of the multivariate t-distribution:

minimize_{Θ∈S}  −log det(Θ) + ((p + ν)/n) ∑_{i=1}^n log(1 + (1/ν)(x(i))⊤ Θ x(i)),

where ν denotes the number of degrees of freedom, and each x(i) is an observation. This formulation can be employed to learn positive partial correlation graphs. Notably, elliptical MTP2 distributions are highly restrictive, and the total positivity property is not applicable to t-distributions [Rossell & Zwiernik, 2021]. It
is worth mentioning that, unlike block coordinate descent [Egilmez et al.,2017] and projected Newton-like
methods [Cai et al.,2021], the proposed algorithm can handle these extensions, resulting in a more flexible
and versatile approach.
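A sketch of this t-distribution objective and its gradient, which the same gradient projection framework could use in place of the Gaussian one, is given below; the value of ν, the random data, and the finite-difference check are illustrative assumptions:

```python
import numpy as np

def t_objective(theta, X, nu):
    """Smooth part of the t-distribution objective (penalty omitted):
    -log det(theta) + (p + nu)/n * sum_i log(1 + x_i^T theta x_i / nu)."""
    n, p = X.shape
    quad = np.einsum('ij,jk,ik->i', X, theta, X)  # x_i^T theta x_i per sample
    return -np.linalg.slogdet(theta)[1] + (p + nu) / n * np.sum(np.log1p(quad / nu))

def t_gradient(theta, X, nu):
    """Gradient: -theta^{-1} + (p + nu)/n * sum_i x_i x_i^T / (nu + x_i^T theta x_i)."""
    n, p = X.shape
    quad = np.einsum('ij,jk,ik->i', X, theta, X)
    w = (p + nu) / (nu + quad)                    # per-sample reweighting
    return -np.linalg.inv(theta) + (X * w[:, None]).T @ X / n

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
theta = np.eye(3)
g = t_gradient(theta, X, nu=4.0)
```

The per-sample weights w down-weight large observations, which is what makes the t-based formulation robust to heavy tails.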
Algorithm 1 Solve Problem (5)
1: Input: Sample covariance matrix Σ̂, Λ^(k), σ, α, β.
2: for t = 0, 1, 2, . . . do
3:   Compute ∇f(Θ_t) = −Θ_t⁻¹ + Σ̂ − Λ^(k);
4:   m ← 0;
5:   repeat
6:     Update Θ_{t+1} = P_{S_d}(Θ_t − σβ^m ∇f(Θ_t));
7:     m ← m + 1;
8:   until Θ_{t+1} ∈ S^p_++ and
9:     f(Θ_{t+1}) ≤ f(Θ_t) − ασβ^m ‖G_{1/η_t}(Θ_t)‖²_F;
10: end for
11: Output: Θ̂^(k).
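To make the procedure concrete, below is a minimal NumPy sketch of Algorithm 1 for the case S = M^p, where the projection P_{S_d} reduces to symmetrizing and clipping the off-diagonal entries at zero; the step-size defaults and the tridiagonal test matrix are illustrative assumptions, not values from the paper:

```python
import numpy as np

def project_Sd(theta):
    """Projection onto Sd = {symmetric matrices with nonpositive
    off-diagonals}, the closed part of the M-matrix constraint set."""
    theta = 0.5 * (theta + theta.T)
    mask = ~np.eye(theta.shape[0], dtype=bool)
    theta[mask] = np.minimum(theta[mask], 0.0)
    return theta

def is_pd(theta):
    """Positive definiteness check via Cholesky factorization."""
    try:
        np.linalg.cholesky(theta)
        return True
    except np.linalg.LinAlgError:
        return False

def f_obj(theta, S_hat, Lam):
    """Objective of Problem (5); Lam has zero diagonal."""
    return (-np.linalg.slogdet(theta)[1] + np.trace(theta @ S_hat)
            - np.sum(Lam * theta))

def solve_stage(S_hat, Lam, theta0, sigma=1.0, alpha=0.3, beta=0.5,
                max_iter=1000, tol=1e-9):
    """One stage: projected gradient descent with backtracking line search
    to keep the iterates positive definite (cf. Algorithm 1)."""
    theta = theta0.copy()
    for _ in range(max_iter):
        grad = -np.linalg.inv(theta) + S_hat - Lam
        for m in range(100):                    # try eta = sigma * beta^m
            eta = sigma * beta ** m
            cand = project_Sd(theta - eta * grad)
            # sufficient decrease: f(cand) <= f(theta) - (alpha/eta)*||theta - cand||_F^2
            if is_pd(cand) and (f_obj(cand, S_hat, Lam) <=
                    f_obj(theta, S_hat, Lam)
                    - alpha / eta * np.sum((theta - cand) ** 2)):
                break
        else:
            return theta                        # line search safeguard (sketch only)
        if np.linalg.norm(cand - theta, 'fro') < tol:
            return cand
        theta = cand
    return theta

# Small demo on data from a hypothetical tridiagonal M-matrix precision.
rng = np.random.default_rng(0)
p, n = 4, 200
theta_true = 2.0 * np.eye(p) - 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta_true), size=n)
S_hat = X.T @ X / n
Lam = 0.1 * (np.ones((p, p)) - np.eye(p))       # uniform weights, zero diagonal
theta_hat = solve_stage(S_hat, Lam, np.eye(p))
```

For a later stage k, one would simply recompute `Lam` from the previous estimate via the weight-updating function and call `solve_stage` again with `theta_hat` as a warm start, mirroring the multi-stage scheme of Section 3.1.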
3.3 Computation of Projections
We present the computation of the projection P_{S_d} in (7) for both cases of estimating M-matrices and diagonally dominant M-matrices, i.e., S = M^p and M^p_D in Problem (5).