Fundamental limits in structured PCA,
and how to reach them
Jean Barbier†,∗, Francesco Camilli†,∗, Marco Mondelli⋄, Manuel Sáenz▽
† Abdus Salam International Center for Theoretical Physics, Italy
⋄ Institute of Science and Technology, Austria
▽ Universidad de La República, Uruguay
Abstract
How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide the first characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation-invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical mechanics. We thus propose a novel AMP, inspired by the theory of Adaptive Thouless-Anderson-Palmer equations, which saturates the theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at remarkable universality properties.
Corresponding authors: jbarbier@ictp.it, fcamilli@ictp.it
arXiv:2210.01237v2 [cs.IT] 2 Jun 2023
Contents
I Main part 3
1 Introduction 4
2 Setting and main results 6
2.1 Result 1: Information-theoretical limits . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Result 2: Optimality of PCA for rotationally invariant priors . . . . . . . . . . . 8
2.3 Result 3a: Optimal pre-processing of the data . . . . . . . . . . . . . . . . . . . 8
2.4 Result 3b: Bayes-optimal AMP . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3 Numerical results and discussion 11
3.1 BAMP vs the replica prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Universality of the rotational invariance assumption . . . . . . . . . . . . . . . . 12
4 Methods 12
4.1 Outline of the replica computation . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.2 Auxiliary AMP and Onsager coefficients . . . . . . . . . . . . . . . . . . . . . . 14
II Supplementary Information 16
5 Introduction, problem setting and main results 16
5.1 Introduction and related works . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.2 Probabilistic model of PCA with structured noise . . . . . . . . . . . . . . . . . 19
5.3 Concrete examples: the quartic and sestic ensembles . . . . . . . . . . . . . . . . 21
5.4 Main results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6 The inhomogeneous spherical integral 29
6.1 Definition and variational characterization . . . . . . . . . . . . . . . . . . . . . 30
6.2 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.2.1 Low-rank HCIZ integral . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.2.2 Low-rank spherical integral with external field and diagonal replica coupling 31
6.2.3 Low-rank spherical integral with non-diagonal replica coupling and replica
symmetric overlap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3 Derivation of the variational formula . . . . . . . . . . . . . . . . . . . . . . . . 33
7 Information-theoretic analysis by the replica method 35
7.1 An equivalent quadratic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
7.2 Replica symmetric free entropy using the inhomogeneous spherical integral . . . 38
7.3 Replica saddle point equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.4 The replica formula for the pure sestic potential . . . . . . . . . . . . . . . . . . 44
7.5 Replica saddle point equations for the pure sestic potential . . . . . . . . . . . . 46
7.6 Spectral PCA is optimal for rotation-invariant signals . . . . . . . . . . . . . . . 49
8 Sub-optimality of the previously proposed AMP 52
8.1 Analysis of the one-step AMP fixed point performance . . . . . . . . . . . . . . 53
8.2 Analysis of the replica Bayes-optimal fixed point . . . . . . . . . . . . . . . . . . 54
8.3 What is this sub-optimal AMP actually doing?
Mismatched estimation with Gaussian likelihood . . . . . . . . . . . . . . . . . . 56
9 Towards an optimal AMP: AdaTAP formalism 62
9.1 The AdaTAP single-instance free entropy . . . . . . . . . . . . . . . . . . . . . . 62
9.2 Saddle point: reduction to an Ising model, AdaTAP equations and optimal pre-
processing of the data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
9.3 Optimal pre-processing for the order 6 potential . . . . . . . . . . . . . . . . . . 65
9.4 Simplifying the AdaTAP equations by self-averaging of the Onsager reaction term 67
10 Approximate message passing, optimally 70
10.1 BAMP: Bayes-optimal AMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.2 Onsager coefficients and state evolution recursion . . . . . . . . . . . . . . . . . 72
10.3 AMP-AP: AMP with Alternating Posteriors . . . . . . . . . . . . . . . . . . . . 75
11 Numerics 76
11.1 Spectral properties of the pre-processed matrix J(Y) . . . . . . . . . . . . . . . 76
11.2 BAMP and AMP-AP improve over the existing AMP and match the replica
prediction for the MMSE, and empirical universality of the rotational invariance
assumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
A Approximation of non-polynomial potentials 83
B Learning the optimal pre-processing J(Y) 86
C Proofs for BAMP 88
C.1 Auxiliary AMP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
C.2 State evolution of auxiliary AMP . . . . . . . . . . . . . . . . . . . . . . . . . . 88
C.3 Proof of Theorem 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Part I
Main part
1 Introduction
The success of inference and learning algorithms depends strongly on the structure of the high-dimensional noisy data they process. Consequently, quantifying how this structure helps algorithms overcome the curse of dimensionality has become a central topic in statistics and machine learning. Classical examples include sparsity in compressed sensing [39], low-rank structure in matrix recovery [32], or community structure in community detection [1]. In all these models, structure is usually assumed only at the signal's level. But the decomposition of the data into "signal" (the component considered of interest) and "noise" (the rest) is often arbitrary and application-dependent. E.g., in the classification of "dogs/cats", the training images contain a lot of information unrelated to dogs and cats, e.g., on the notions of "inside/outside", "day/night", etc. Yet, this highly structured potential source of information is discarded as random noise (independent, Gaussian, etc.). Most of the research effort has thus focused on understanding how the signal structure alone helps in inferring it. In contrast, much less is known about the role of the noise structure and how to exploit it to improve inference.
Given their ubiquitous appearance in the statistics literature, spiked matrix models, which were originally formulated as models for probabilistic principal components analysis (PCA) [62], are now a paradigm in high-dimensional inference. Thanks to their universality features, they and their generalizations find numerous applications in other central problems, including community detection [1], group synchronization [99] and sub-matrix localization or high-dimensional clustering [72]. They thus offer the perfect benchmark to quantify the influence of noise structure. In this paper we focus on the following estimation problem: a statistician needs to extract a rank-one matrix (the spike) P := XX^⊺, X ∈ R^N, from the data

    Y = √(λ/N) P + Z ∈ R^{N×N},    (1)

with "noise" Z and signal-to-noise ratio (SNR) λ ≥ 0.
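As a concrete illustration, the following minimal sketch generates data from model (1) and runs the spectral (PCA) estimator on it. It uses the simplest unstructured (Wigner) baseline noise rather than the paper's structured ensembles, and the dimension N, SNR lam and ±1 signal prior are arbitrary choices for the demo, in a normalization where the spectral transition sits at λ = 1.

```python
import numpy as np

# Minimal sketch of model (1) with the simplest unstructured (Wigner) noise
# as a baseline; N, lam and the +/-1 prior are illustrative choices only.
rng = np.random.default_rng(0)
N, lam = 2000, 4.0
X = rng.choice([-1.0, 1.0], size=N)          # signal with ||X||^2 = N
G = rng.normal(size=(N, N))
Z = (G + G.T) / np.sqrt(2.0)                 # symmetric noise, Z_ij ~ N(0, 1)
Y = np.sqrt(lam / N) * np.outer(X, X) + Z    # data matrix as in Eq. (1)

# Spectral PCA: the top eigenvector of Y, rescaled, estimates X up to a sign.
vals, vecs = np.linalg.eigh(Y)               # eigenvalues in ascending order
x_hat = np.sqrt(N) * vecs[:, -1]
overlap = abs(x_hat @ X) / N                 # normalized correlation with X
```

In this normalization, for λ above the transition the top eigenvalue of Y/√N separates from the bulk edge at 2, and the overlap concentrates around √(1 − 1/λ); both effects are visible already at N = 2000.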
The spectral properties of finite-rank perturbations of large random matrices like (1) were intensively investigated in random matrix theory (see, e.g., [8, 9, 26]), showing the presence of a threshold phenomenon coined the BBP transition (in reference to the authors of [8]): when λ is large enough, the top eigenvalue of Y detaches from the bulk of eigenvalues. Its corresponding eigenvector then has a non-trivial projection onto the sought ground truth X, and can be used as its estimator. The problem has also been approached from the angle of Bayesian inference [68, 37, 12, 71]. In particular, besides the previous spectral estimator, there exists a whole family of iterative algorithms, known as approximate message passing (AMP), that can be tailored to take further advantage of prior structural information about the signal and noise. AMP algorithms were first proposed for estimation in linear models [63, 38], but have since been applied to a range of statistical estimation problems, including generalized linear models [16, 103] and low-rank matrix estimation [37, 86]. An attractive feature of AMP is that its performance in the high-dimensional limit can often be characterized by a succinct recursion called state evolution [24, 29]. Using the state evolution analysis, it has been proved that AMP achieves Bayes-optimal performance for some models [37, 86, 16], and a conjecture posits that for a wide range of estimation problems, AMP is optimal among polynomial-time algorithms [87].
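To make the notion of state evolution concrete, here is a sketch of the scalar recursion for Bayes-optimal AMP in the classical spiked Wigner model with i.i.d. Gaussian noise and a ±1 prior — a standard special case, not the structured-noise algorithm developed in this paper. The parameter gamma_t tracks the effective SNR of the AMP iterate; q_t = gamma_t / lam is the overlap parameter, with per-coordinate error 1 − q_t.

```python
import numpy as np

# State evolution for Bayes-AMP on the classical spiked Wigner model
# (i.i.d. Gaussian noise, +/-1 prior): gamma_{t+1} = lam * E[tanh^2(gamma_t
# + sqrt(gamma_t) G)], G ~ N(0,1). The expectation is estimated by Monte Carlo.
def state_evolution(lam, iters=50, n_mc=200_000, seed=0):
    g = np.random.default_rng(seed).normal(size=n_mc)
    gamma = 1e-3                      # small positive initialization
    for _ in range(iters):
        gamma = lam * np.mean(np.tanh(gamma + np.sqrt(gamma) * g) ** 2)
    return gamma
```

Below the transition (λ < 1 in this normalization) the recursion collapses to zero, so AMP achieves no overlap with the signal; above it, gamma converges to a non-trivial fixed point.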
The references mentioned above rely on the assumption of independent and identically distributed (i.i.d.) noise, often taken Gaussian, Z_ij = Z_ji ∼ N(0,1), under which (1) is the well-known spiked Wigner model [62]. This independence, or "absence of structure", in the noise greatly simplifies the analysis. In order to relax this property, we may seek inspiration from the statistical physics literature on disordered systems. An idea first brought forth in [23, 95] for the Sherrington-Kirkpatrick model, and later imported into high-dimensional inference [5, 57], is that of giving an inhomogeneous variance profile to the noise matrix elements (this idea is similar in spirit to the earlier notion of "spatially coupled systems" [47, 70] in coding theory; see [12] for its use in the present context). The procedure makes the (Z_ij) no longer identically distributed, but it leaves them independent. This is an important step towards more structure in the noise. Yet, the independence assumption is a rather strong one. In fact, [57] showed that a broad class of observation models, as long as the independence assumption holds, are information-theoretically equivalent to models with independent Gaussian noise.
One way to go beyond is to consider noise belonging to the wider class of rotationally invariant matrices. Since the appearance of the seminal works [80, 81, 97], there has been a remarkable development in this direction, as evidenced by the rapidly growing number of papers on spin glasses [92, 94, 77] and inference [53, 45, 110, 109] that take structured disorder into account, including the present one. Indeed, we hereby consider a spiked model in which the noise Z is drawn from an orthogonal matrix ensemble different from the Gaussian orthogonal ensemble (the only one with independent entries). Intuitively, the presence of dependencies in the noise should be an advantage for an algorithm sharp enough to see patterns within it and use them to retrieve the sought low-rank matrix. Going in that direction, [45] proposed a version of AMP designed for rotationally invariant noise (using earlier ideas of [94, 92]). Furthermore, in a recent work [14], some of the authors analysed a Bayes estimator and an AMP, both assuming Gaussian noise, whereas the actual noise in the data was drawn from a generic orthogonal matrix ensemble. However, besides intuition and the mentioned works, to the best of our knowledge there is little theoretical understanding of the true role played by noise structure in spiked matrix estimation, and more generally in inference. In particular, prior to our work there was no theoretical prediction of optimal performance against which to benchmark practical inference algorithms.
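A rotationally invariant noise matrix can be sketched as Z = O diag(d) O^⊺ with O Haar-distributed, which makes the entries of Z statistically dependent as soon as the spectrum is non-Gaussian. The eigenvalue law used below (uniform) is a placeholder for illustration; the paper's ensembles instead fix the spectral density through a low-order polynomial matrix potential.

```python
import numpy as np

# Sketch: sampling structured, rotationally invariant noise Z = O diag(d) O^T.
def haar_orthogonal(n, rng):
    # QR of a Gaussian matrix, with a sign fix on R's diagonal,
    # yields a Haar-distributed orthogonal matrix.
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(1)
N = 500
O = haar_orthogonal(N, rng)
d = rng.uniform(-2.0, 2.0, size=N)   # placeholder spectrum, not a paper ensemble
Z = (O * d) @ O.T                    # symmetric; entries are now correlated
```

Only when d is a semicircle law does this construction reduce to the Gaussian orthogonal ensemble with independent entries; any other spectrum induces dependencies across the matrix elements.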