
The rank $r$ and the leading theoretical eigenvalues $(\ell_i)_{i=1}^r$, which we refer to as “spiked” eigenvalues, are
fixed and independent of $n$. Let $\lambda_i \equiv \lambda_{i,n}$ denote the eigenvalues of $S$, ordered decreasingly: $\lambda_1 \geq \cdots \geq \lambda_p$.
Inconsistency of $S$ under proportional growth stems from several phenomena absent under classical
fixed-$p$, large-$n$ asymptotic studies. Their discovery is due to Marchenko and Pastur [28]; Baik, Ben Arous, and
Péché [6]; Baik and Silverstein [5]; and Paul [31].
1. Eigenvalue spreading. In the standard normal case $\Sigma = I$, where $I \equiv I_p$ denotes the $p$-dimensional
identity matrix, the empirical spectral measure of $S$ converges under (1.1) weakly almost surely to
the Marchenko-Pastur distribution with parameter $\gamma$. For $\gamma \in (0,1]$, this distribution, or bulk, is
non-degenerate, absolutely continuous, and has support $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2] = [\lambda_-(\gamma), \lambda_+(\gamma)]$.
Intuitively, empirical eigenvalues, rather than concentrating near their theoretical counterparts (which
in this case are all simply 1), spread out across a fixed-size interval, preventing consistency of $S$ for $\Sigma$.
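2. Eigenvalue bias. As it turns out, the leading empirical eigenvalues $(\lambda_i)_{i=1}^r$ do not converge to their
theoretical counterparts $(\ell_i)_{i=1}^r$, rather, they are biased upwards. Under (1.1) and (1.2), for fixed $i \geq 1$,
$$\lambda_i \xrightarrow{\text{a.s.}} \lambda(\ell_i), \tag{1.3}$$
where $\lambda(\ell) \equiv \lambda(\ell, \gamma)$ is the “eigenvalue mapping” function, given piecewise by
$$\lambda(\ell) = \begin{cases} \ell + \dfrac{\gamma \ell}{\ell - 1}, & \ell > 1 + \sqrt{\gamma}, \\ (1 + \sqrt{\gamma})^2, & \ell \leq 1 + \sqrt{\gamma}. \end{cases} \tag{1.4}$$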
The transition point $\ell_+(\gamma) = 1 + \sqrt{\gamma}$ between the two behaviors is known as the Baik-Ben Arous-Péché
(BBP) transition. Below the transition, $1 < \ell \leq \ell_+(\gamma)$, “weak signal” leads to a limiting eigenvalue
independent of $\ell$. For fixed $i$ such that $\ell_i \leq \ell_+(\gamma)$, $\lambda_i$ tends to $\lambda_+(\gamma) = (1 + \sqrt{\gamma})^2$, the upper bulk-edge
of the Marchenko-Pastur distribution with parameter $\gamma$.
Above the transition, $\ell > \ell_+(\gamma)$, “strong signal” produces an empirical eigenvalue dependent on $\ell$,
though biased upwards. For fixed $i$ such that $\ell_i > \ell_+(\gamma)$, $\lambda_i$ “emerges from the bulk,” approaching a
limit $\lambda(\ell_i) > \ell_i$. This asymptotic bias in extreme eigenvalues is a further cause of inconsistency of $S$
in several loss measures, including operator norm loss.
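3. Eigenvector inconsistency. The eigenvectors $v_1, \ldots, v_p$ of $S$ do not align asymptotically with the
corresponding eigenvectors $u_1, \ldots, u_p$ of $\Sigma$. Under (1.1) and (1.2), assuming supercritical spiked
eigenvalues (those with $\ell_i > \ell_+(\gamma)$) are distinct, the limiting angles are deterministic and obey
$$|\langle u_i, v_j \rangle| \xrightarrow{\text{a.s.}} \delta_{ij} \cdot c(\ell_i), \qquad 1 \leq i, j \leq r; \tag{1.5}$$
here the “cosine” function $c(\ell) \equiv c(\ell, \gamma)$ is given piecewise by
$$c^2(\ell) = \begin{cases} \dfrac{1 - \gamma/(\ell - 1)^2}{1 + \gamma/(\ell - 1)}, & \ell > 1 + \sqrt{\gamma}, \\ 0, & \ell \leq 1 + \sqrt{\gamma}. \end{cases} \tag{1.6}$$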
Again, a phase transition occurs at $\ell_+(\gamma)$. This misalignment of empirical and theoretical eigenvectors
further contributes to inconsistency; this is easiest to see for Frobenius loss.
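As an informal numerical illustration of the three phenomena above, the following Python sketch evaluates the limits in (1.4) and (1.6) and compares them with a single simulated draw. It is not part of the original analysis. It assumes a single-spike Gaussian model with $\Sigma = \mathrm{diag}(\ell, 1, \ldots, 1)$, so that $u_1 = e_1$, and a sample covariance $S = X'X/n$ with known zero mean. The function names and the sizes $n$, $p$ are illustrative choices, not taken from the text.

```python
import numpy as np

def bulk_edges(gamma):
    """Support edges (lambda_-, lambda_+) of the Marchenko-Pastur bulk, gamma in (0, 1]."""
    return (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2

def eigenvalue_map(ell, gamma):
    """Almost-sure limit lambda(ell) of a leading sample eigenvalue, eq. (1.4)."""
    if ell > 1 + np.sqrt(gamma):
        return ell + gamma * ell / (ell - 1)   # above the BBP transition
    return (1 + np.sqrt(gamma)) ** 2           # stuck at the upper bulk edge

def cosine_sq(ell, gamma):
    """Limiting squared cosine c^2(ell) between u_i and v_i, eq. (1.6)."""
    if ell > 1 + np.sqrt(gamma):
        return (1 - gamma / (ell - 1) ** 2) / (1 + gamma / (ell - 1))
    return 0.0

def simulate_single_spike(n, p, ell, seed=0):
    """One draw from the assumed model Sigma = diag(ell, 1, ..., 1).

    Returns the eigenvalues of S in ascending order, the top eigenvalue,
    and the squared cosine between the top eigenvector and e_1.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(ell)                    # first coordinate has variance ell
    S = X.T @ X / n                            # assumption: mean known to be zero
    evals, evecs = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    top_cos_sq = evecs[0, -1] ** 2             # <e_1, v_1>^2
    return evals, evals[-1], top_cos_sq

if __name__ == "__main__":
    n, p, ell = 1600, 400, 4.0                 # illustrative sizes, gamma = p / n
    gamma = p / n
    evals, top_val, top_cos_sq = simulate_single_spike(n, p, ell)
    print("bulk edges (theory):   ", bulk_edges(gamma))
    print("bulk range (simulated):", (evals[0], evals[-2]))
    print("top eigenvalue: simulated", top_val, "limit", eigenvalue_map(ell, gamma))
    print("squared cosine: simulated", top_cos_sq, "limit", cosine_sq(ell, gamma))
```

Under these assumptions the non-spiked eigenvalues should fill out roughly the interval returned by `bulk_edges`, while the top eigenvalue and squared cosine should lie near, but not exactly at, their limits, since $n$ and $p$ are finite.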
1.2 Shrinkage Estimation
Charles Stein proposed eigenvalue shrinkage as an alternative to traditional covariance estimation [35, 36].
Let $S = V \Lambda V'$ be an eigendecomposition, where $V$ is orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$. Let
$\eta : [0,\infty) \to [0,\infty)$ denote a scalar “rule” or “nonlinearity” or “shrinker,” and adopt the convention
$\eta(\Lambda) \equiv \mathrm{diag}(\eta(\lambda_1), \ldots, \eta(\lambda_p))$.¹ Estimators of the form $\widehat{\Sigma}_\eta = V \eta(\Lambda) V'$ are studied in hundreds of papers; see the
works of Donoho, Gavish, and Johnstone [16] (and the extensive references therein) and Ledoit and Wolf
[24, 25]. Note that despite possible ambiguities in the choice of eigenvectors $V$, $\widehat{\Sigma}_\eta$ is well defined.²
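A minimal sketch of the construction $V \eta(\Lambda) V'$ in Python follows. The names `shrinkage_estimator` and `debias_rule` are hypothetical. The example rule, which inverts the eigenvalue mapping (1.4) above the bulk edge and sends bulk eigenvalues to 1, is only one illustrative choice of $\eta$ and is not claimed to be the rule advocated in this work.

```python
import numpy as np

def shrinkage_estimator(S, eta):
    """Return V eta(Lambda) V' for a symmetric PSD matrix S and a scalar rule eta."""
    evals, V = np.linalg.eigh(S)               # S = V diag(evals) V', V orthogonal
    evals = np.clip(evals, 0.0, None)          # guard against tiny negative round-off
    shrunk = np.array([eta(lam) for lam in evals])
    return (V * shrunk) @ V.T                  # equals V @ np.diag(shrunk) @ V.T

def debias_rule(lam, gamma):
    """Illustrative rule: invert the eigenvalue map (1.4) above the bulk edge."""
    lam_plus = (1 + np.sqrt(gamma)) ** 2
    if lam <= lam_plus:
        return 1.0                             # bulk eigenvalue, treated as noise
    b = lam + 1 - gamma
    # Larger root of ell^2 - b * ell + lam = 0, i.e. the ell with lambda(ell) = lam.
    return 0.5 * (b + np.sqrt(b * b - 4 * lam))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, ell = 1600, 400, 4.0                 # illustrative sizes, gamma = p / n
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(ell)                    # assumed single-spike model
    S = X.T @ X / n
    Sigma_hat = shrinkage_estimator(S, lambda lam: debias_rule(lam, p / n))
    print("top eigenvalue of S:        ", np.linalg.eigvalsh(S)[-1])
    print("top eigenvalue of Sigma_hat:", np.linalg.eigvalsh(Sigma_hat)[-1])
```

Because the product $V \eta(\Lambda) V'$ is unchanged when any column of $V$ is multiplied by $-1$, the result does not depend on the sign convention of the eigensolver; the identity rule $\eta(\lambda) = \lambda$ returns $S$ itself.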
¹ These are common synonyms in the shrinkage literature. Note that a nonlinearity may in fact act linearly, and a shrinker
need not act as a contraction.
² The signs of eigenvectors are arbitrary. In the case of degenerate eigenvalues, there is additional eigenvector ambiguity.