R-NL: Covariance Matrix Estimation for Elliptical
Distributions based on Nonlinear Shrinkage
Simon Hediger^a, Jeffrey Näf^b, Michael Wolf^a
aDepartment of Economics, University of Zurich, Switzerland
bPreMeDICaL, Inria-Inserm, Montpellier, France
May 4, 2023
Abstract
We combine Tyler’s robust estimator of the dispersion matrix with nonlinear
shrinkage. This approach delivers a simple and fast estimator of the dispersion matrix
in elliptical models that is robust against both heavy tails and high dimensions. We
prove convergence of the iterative part of our algorithm and demonstrate the favorable
performance of the estimator in a wide range of simulation scenarios. Finally, an
empirical application demonstrates its state-of-the-art performance on real data.
Keywords: Heavy Tails, Nonlinear Shrinkage, Portfolio Optimization
Corresponding author at: PreMeDICaL, Inria-Inserm, Montpellier, France.
E-mail address: jeffrey.naf@inria.fr.
arXiv:2210.14854v4 [stat.ME] 3 May 2023
1 Introduction
Many statistical applications rely on covariance matrix estimation. Two common challenges are (1) the presence of heavy tails and (2) the high-dimensional nature of the data. Both problems lead to suboptimal performance or even inconsistency of the usual sample covariance estimator Ŝ. Consequently, there is a vast literature on addressing these problems.
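To get a feel for problem (2): even when the population covariance matrix is the identity, the eigenvalues of the sample covariance matrix spread out substantially once p is of the same order as n. A minimal sketch of this effect (our own illustration, assuming numpy; the dimensions are chosen for speed):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 200, 300  # high-dimensional regime: p/n is not small

# Population covariance is the identity, so all true eigenvalues equal 1.
X = rng.standard_normal((n, p))
S = X.T @ X / n  # sample covariance matrix (mean known to be zero)

lam = np.linalg.eigvalsh(S)
# The sample eigenvalues spread far away from 1 (Marchenko-Pastur effect),
# roughly over [(1 - sqrt(p/n))^2, (1 + sqrt(p/n))^2].
print(round(lam.min(), 3), round(lam.max(), 3))
```

Shrinkage estimators counteract exactly this dispersion of the sample eigenvalues.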
Two prominent ways to address (1) are (Maronna’s) M-estimators of scatter (Kent and
Tyler (1991)), as well as truncation of the sample covariance matrix; for example, see Ke
et al. (2019). There also appear to be two main approaches to solving problem (2). The
first is to assume a specific structure on the covariance matrix to reduce the number of
parameters. One example of this is the “spiked covariance model”, as explored, e.g., in Johnstone (2001); Johnstone and Lu (2009); Donoho et al. (2018); a second is to assume
(approximate) sparsity and to use thresholding estimators (Bickel and Levina (2008a,b);
Rothman et al. (2009); Cai and Liu (2011)). We also refer to Ke et al. (2019), who present a range of general estimators under heavy tails and extend to the case n > p by assuming specific structures on the covariance matrix. If one is not willing to assume such structure,
a second approach is to leave the eigenvectors of the sample covariance matrix unchanged
and to only adapt the eigenvalues. This leads to the class of estimators of Stein (1975,
1986). Linear shrinkage (Ledoit and Wolf (2004)) as well as nonlinear shrinkage developed
in Ledoit and Wolf (2012, 2015, 2020, 2022b) are part of this class.
One promising line of research to address both problems at once is to extend (Maronna’s)
M-estimators of scatter (Kent and Tyler, 1991) with a form of shrinkage for high dimensions. This approach is in particular popular with a specific example of M-estimators called
“Tyler’s estimator” (Tyler (1987a)), which is derived in the context of elliptical distribu-
tions. Several papers have studied this approach, using a convex combination of the base
estimator and a target matrix, usually the (scaled) identity matrix. We generally refer
to such approaches as robust linear shrinkage estimators. For instance, Ollila and Tyler
(2014); Auguin et al. (2016); Ollila et al. (2021); Ashurbekova et al. (2021) combine the
linear shrinkage with Maronna’s M-estimators, whereas Abramovich and Spencer (2007);
Chen et al. (2011); Yang et al. (2014); Zhang and Wiesel (2016) do so with Tyler’s esti-
mator. Since this approach of combining linear shrinkage with a robust estimator entails
choosing a hyperparameter determining the amount of shrinkage, the second step often
consists of deriving some (asymptotically) optimal parameter that then can be estimated
from data. The approach results in estimation methods that are generally computation-
ally inexpensive and it also enables strong theoretical results on the convergence of the
underlying iterative algorithms.
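As a concrete illustration, one round of this idea can be sketched as a fixed-point iteration in the spirit of Chen et al. (2011): the Tyler-type update is combined convexly with the identity matrix, controlled by a shrinkage intensity ρ (the hyperparameter discussed above). The function name, defaults, and stopping rule below are ours, not taken from any of the cited papers:

```python
import numpy as np

def tyler_linear_shrinkage(X, rho, n_iter=100, tol=1e-8):
    """Tyler-type estimator with linear shrinkage toward the identity.

    X is an (n, p) data matrix, assumed centered; rho in (0, 1] is the
    shrinkage intensity, a tuning parameter in this class of methods.
    """
    n, p = X.shape
    Sigma = np.eye(p)
    for _ in range(n_iter):
        # Down-weight observations with large Mahalanobis-type norms.
        w = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)
        M = (p / n) * (X / w[:, None]).T @ X
        Sigma_new = (1 - rho) * M + rho * np.eye(p)
        # Normalize the trace to fix the (arbitrary) scale of Tyler's estimator.
        Sigma_new *= p / np.trace(Sigma_new)
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol:
            Sigma = Sigma_new
            break
        Sigma = Sigma_new
    return Sigma
```

With ρ close to 0 this approaches (trace-normalized) Tyler's estimator when p < n; the papers cited above differ mainly in how the intensity ρ is chosen from the data.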
Despite these advantages, several problems remain. First, the performance of these
robust methods sometimes does not exceed the performance of the basic linear shrinkage
estimator of Ledoit and Wolf (2004) in heavy-tailed models, except for small sample sizes n (say n < 100). In fact, the theoretical analysis of Couillet and McKay (2014); Auguin
et al. (2016) shows that robust M-estimators using linear shrinkage are asymptotically
equivalent to scaled versions of the linear shrinkage estimator of Ledoit and Wolf (2004).
Depending on how the data-adaptive hyperparameter is chosen, the performance can even
deteriorate quickly as the tails get lighter, as we demonstrate in our simulation study in
Section 4. Second, some robust methods cannot handle the case when the dimension p is larger than the sample size n, such as Ollila et al. (2021). Third, some methods propose a
choice of hyperparameter(s) through cross-validation, such as Yu et al. (2017); Yi and Tyler
(2021), which can be computationally expensive. In this paper, we address these problems
by developing a simple algorithm based on nonlinear shrinkage (Ledoit and Wolf (2012, 2015, 2020, 2022b)), inspired by the above robust approaches and the work of Hediger and Näf (2022). In essence, the algorithm applies the quadratic inverse shrinkage (QIS) method
of Ledoit and Wolf (2022b) to appropriately standardized data, thereby greatly increasing
its finite-sample performance in heavy-tailed models. Thus, we refer to the new method
as “Robust Nonlinear Shrinkage” (R-NL); in particular, we extend the proposal of Hediger and Näf (2022) from a parametric model to general elliptical distributions. This approach
includes an iteration over the space of orthogonal matrices, which we prove converges to a
stationary point. We motivate our approach using properties of elliptical distributions along
the lines of Chen et al. (2011); Zhang and Wiesel (2016); Ashurbekova et al. (2021) and
demonstrate the favorable performance of our method in a wide range of settings. Notably,
our approach (i) greatly improves the performance of (standard) nonlinear shrinkage in heavy-tailed settings; (ii) does not deteriorate when moving from heavy to Gaussian tails; (iii) can handle the case p > n; and (iv) does not require the choice of a tuning parameter.
The remainder of the article is organized as follows. Section 1.1 lists our contributions. Section 2 presents an example to motivate our methodology. Section 3 describes the proposed new methodology and provides results concerning the convergence of the new algorithm. Section 4 showcases the performance of our method in a simulation study using various settings for both p < n and p > n. Section 5 applies our method to financial data, illustrating the performance of the method on real data.
1.1 Contributions
To the best of our knowledge, no paper has so far attempted to combine nonlinear shrinkage
of Ledoit and Wolf (2012, 2015, 2020, 2022b) with Tyler’s method. As such, our approach
differs markedly from previous ones. It is partly based on an M-estimator interpretation,
but also adds the nonparametric nonlinear shrinkage approach. A downside of this approach
is that theoretical convergence results are harder to come by. Nonetheless, we are able to
show that the iterative part of our algorithm converges to a stationary point, a crucial
result for the practical usefulness of the algorithm.
Perhaps the closest paper to our method is Breloy et al. (2019), where the eigenvalues of Tyler’s estimator are iteratively shrunken towards predetermined target eigenvalues, with a parameter α determining the shrinkage strength. Through different objectives, they arrive at an algorithm from which the iterative part of our Algorithm 2 can be recovered when setting α = ∞. Additionally, using the eigenvalues from nonlinear shrinkage as the target eigenvalues, their method presents an alternative way of combining Tyler’s estimator with nonlinear shrinkage. Though they did not originally propose this, it was suggested by an anonymous reviewer. However, while the two algorithms overlap in the corner case of α = ∞, they arrive at their Algorithm 1 from a different angle than we do. Consequently, their theoretical results cannot be applied in our analysis. Moreover, they do not suggest how to choose the tuning parameter α. In Appendix A, simulations indicate that when the target eigenvalues are obtained from nonlinear shrinkage, setting α = ∞, and thus maximally shrinking towards the nonlinear shrinkage eigenvalues, is usually beneficial.
Table 1: Notation

Symbol          Description
n               Sample size
p               Dimensionality
Σ := Var(Y)     The covariance matrix of the random vector Y
Tr(A)           Trace of a square matrix A
‖A‖_F           Frobenius norm √Tr(AᵀA) of a square matrix A
H               Dispersion matrix
O               The orthogonal group
O_0             Equivalence class in O
U               Arbitrary element of O
V               Eigenvectors of H = VΛVᵀ
V^[ℓ]           ℓth iteration of the algorithm
V̂               Critical point/solution/estimate
𝒱               Subset of critical points of O
Λ               True ordered eigenvalues of H, up to scaling
Λ_0             Initial (shrunken) estimate of Λ
Λ_R             Final R-NL (shrunken) estimate of Λ
Λ̂               Eigenvalues of F(V̂)
Λ̂^[ℓ+1]         Eigenvalues of F(V^[ℓ])
diag(·)         Transforms a vector a ∈ ℝᵖ into a p×p diagonal matrix diag(a)
In addition, these simulations show that the updating of the eigenvalues that we propose after the iterations have converged can lead to an additional boost in performance over their method.
Whereas many of the aforementioned robust linear shrinkage papers have important
theoretical results, the empirical examination of their estimators in simulations and real
data applications is often limited. We attempt to give a more comprehensive empirical
overview in this paper. Contrary to most of the previous papers, we also consider a comparatively large sample size of n = 300 in our simulation study. Compared to six competing methods, our new approach displays superior performance over a wide range of scenarios.
We also provide a Matlab implementation of our method, as well as the code to replicate
all simulations on https://github.com/hedigers/RNL_Code.
2 Motivational Example
For a collection of n independent and identically distributed (i.i.d.) random vectors with values in ℝᵖ, let V̂ = (v̂₁, …, v̂ₚ) be the matrix of eigenvectors of the sample covariance matrix Ŝ. Nonlinear shrinkage, just as the linear shrinkage of Ledoit and Wolf (2004), only changes the eigenvalues of the sample covariance matrix, while keeping the eigenvectors V̂. That is, nonlinear shrinkage is also in the class of estimators of the form V̂∆V̂ᵀ, with ∆ diagonal, a class that goes back to Stein (1975, 1986). It is well known that

    argmin_{∆ diagonal} ‖Σ − V̂∆V̂ᵀ‖_F = diag((δ₁, …, δ_N)ᵀ),  with δⱼ := v̂ⱼᵀ Σ v̂ⱼ;
for example, see (Ledoit and Wolf, 2022a, Section 3.1). Nonlinear shrinkage takes the sample covariance matrix Ŝ as an input and outputs a shrunken estimate of Σ of the form V̂Λ₀V̂ᵀ, where Λ₀ = diag(δ̂₁, …, δ̂_N) is a diagonal matrix. Although there are different schemes to come up with the estimates {δ̂ⱼ}, each scheme uses as its only inputs p, n, and the set of eigenvalues of Ŝ. In this paper, we derive a new estimator that is not in the class of Stein (1975, 1986) but applies nonlinear shrinkage to a transformation of the data. It thereby implicitly uses more information than just the sample covariance matrix (together with p and n). Since we focus in the following on the class of elliptical distributions, we will differentiate between the dispersion matrix H and the covariance matrix Σ. The former will be defined in Section 3, but the main difference between the two population quantities is that Σ might not exist. If it does exist, Σ is simply given by cH, with c > 0 depending on the underlying distribution.
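The Frobenius-optimality of the entries δⱼ = v̂ⱼᵀ Σ v̂ⱼ can be checked numerically. The following sketch (our own, assuming numpy; all names are illustrative) confirms that perturbing the optimal diagonal away from these values only increases the Frobenius loss:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 50

# A population covariance matrix Sigma and a Gaussian sample from it.
A = rng.standard_normal((p, p))
Sigma = A @ A.T + np.eye(p)
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
S_hat = X.T @ X / n  # sample covariance matrix

# Eigenvectors of the sample covariance matrix.
_, V = np.linalg.eigh(S_hat)

# Frobenius-optimal diagonal given these eigenvectors: delta_j = v_j' Sigma v_j.
delta = np.array([V[:, j] @ Sigma @ V[:, j] for j in range(p)])

def loss(d):
    """Frobenius distance between Sigma and the eigenvector-constrained estimate."""
    return np.linalg.norm(Sigma - V @ np.diag(d) @ V.T, 'fro')

# Random perturbations of the optimal diagonal can only increase the loss.
for _ in range(100):
    assert loss(delta) <= loss(delta + 0.1 * rng.standard_normal(p))
```

The check works because, with V orthogonal, the loss equals ‖VᵀΣV − diag(d)‖_F, which is minimized entry-wise by dⱼ = (VᵀΣV)ⱼⱼ.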
To illustrate the advantage of our method, we now present a motivational toy example before moving on to the general methodology. We first consider a multivariate Gaussian distribution in dimension p = 200 with mean µ = 0 and covariance matrix Σ = H, where the (i, j) element of H is 0.7^|i−j|, as in Chen et al. (2011). We simulate n = 300 i.i.d. observations from this distribution. For j = 1, …, p, the left panel of Figure 1 displays the theoretical optimum δⱼ, the nonlinear shrinkage estimate v̂ⱼᵀ V̂Λ₀V̂ᵀ v̂ⱼ =: δ̂ⱼ, as well as v̂ⱼᵀ Ĥ v̂ⱼ, where Ĥ is the proposed R-NL estimator. Importantly, the estimated values are very close to the theoretical optimum δⱼ, j = 1, …, p, for both nonlinear shrinkage and our proposed method.
We next consider the same setting, but instead simulate from a multivariate t distribution with 4 degrees of freedom and dispersion matrix H, such that the covariance matrix Σ is 4/(4−2) · H. In particular, the v̂ⱼᵀ Ĥ v̂ⱼ are multiplied by c = 2 in this case to obtain an estimate of v̂ⱼᵀ Σ v̂ⱼ. (The value c = 2 would not be known in practice, but it is ‘fair’ to use it in this toy example, since doing so does not favor one estimation method over the other.) The right panel of Figure 1 displays the results. It can be seen that nonlinear shrinkage overestimates large values of δⱼ (by a lot) and underestimates small values of δⱼ; on the other hand, our new method does not have this problem and its performance (almost) matches the one from the Gaussian case.
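The multivariate t distribution used above is one member of the elliptical family treated in the next section: it can be simulated by drawing a direction ξ uniformly on the unit sphere and an independent radius R, and scaling by a square root of H. A small sketch (our own, assuming numpy; the matrix H, the choice ν = 10 so that the empirical check is stable, and the sample size are all illustrative) that also checks the relation Σ = ν/(ν−2) · H:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, nu = 3, 200_000, 10

# An illustrative positive-definite dispersion matrix H.
H = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])
L = np.linalg.cholesky(H)  # one valid square root H^{1/2}

# Elliptical construction: direction xi uniform on the unit sphere, with an
# independent radius R; for a multivariate t_nu, R = sqrt(chi2_p * nu / chi2_nu).
Z = rng.standard_normal((n, p))
xi = Z / np.linalg.norm(Z, axis=1, keepdims=True)
R = np.sqrt(rng.chisquare(p, n) * nu / rng.chisquare(nu, n))
Y = R[:, None] * (xi @ L.T)

# The covariance matrix is c * H with c = nu / (nu - 2).
c = nu / (nu - 2)
S = Y.T @ Y / n
print(np.max(np.abs(S / c - H)))  # close to zero
```

For ν = 4, as in the example above, the same construction gives c = 2, although the empirical check then converges more slowly because fourth moments no longer exist.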
3 Methodology
We assume to observe an i.i.d. sample Y := {Y₁, …, Yₙ} from a p-dimensional elliptical distribution. If Y has an elliptical distribution, it can be represented as

    Y =_d µ + R H^{1/2} ξ,    (1)

where R is a positive random variable, ξ is uniformly distributed on the p-dimensional unit sphere, independently of R, and =_d denotes equality in distribution (Cambanis et al., 1981). The dispersion matrix H is assumed to be symmetric positive-definite (pd), with