Misspecified Phase Retrieval with Generative Priors
Zhaoqiang Liu
National University of Singapore
dcslizha@nus.edu.sg
Xinshao Wang
University of Oxford
xinshao.wang@eng.ox.ac.uk
Jiulong Liu
Chinese Academy of Sciences
jiulongliu@lsec.cc.ac.cn
Abstract

In this paper, we study phase retrieval under model misspecification and generative priors. In particular, we aim to estimate an $n$-dimensional signal $\mathbf{x}$ from $m$ i.i.d. realizations of the single index model $y = f(\mathbf{a}^T\mathbf{x})$, where $f$ is an unknown and possibly random nonlinear link function and $\mathbf{a} \in \mathbb{R}^n$ is a standard Gaussian vector. We make the assumption $\mathrm{Cov}[y, (\mathbf{a}^T\mathbf{x})^2] \neq 0$, which corresponds to the misspecified phase retrieval problem. In addition, the underlying signal $\mathbf{x}$ is assumed to lie in the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a two-step approach, for which the first step plays the role of spectral initialization and the second step iteratively refines the estimated vector produced by the first step. We show that both steps enjoy a statistical rate of order $\sqrt{(k \log L)\cdot(\log m)/m}$ under suitable conditions. Experiments on image datasets are performed to demonstrate that our approach performs on par with or even significantly outperforms several competing methods.
1 Introduction
Compressed sensing (CS) is perhaps the most popular instance of high-dimensional inverse problems, for which one has the linear measurement model
$$y = \mathbf{a}^T\mathbf{x} + \eta, \qquad (1)$$
where $\mathbf{a} \in \mathbb{R}^n$ is the sensing vector, $\mathbf{x} \in \mathbb{R}^n$ is the sparse signal to estimate, and $\eta$ represents additive noise. It is well known for CS that roughly $m = O(s \log(n/s))$ i.i.d. Gaussian measurements are sufficient to ensure the accurate recovery of a signal with $s$ non-zero entries [88, 1, 23, 11, 77].
Phase retrieval (PR) arises in numerous scientific areas including X-ray crystallography, acoustics, astronomy, microscopy, optics, wireless communications, and quantum information [13], where one cannot measure $\mathbf{a}^T\mathbf{x}$ directly, and can only record its magnitude. For example, the following noisy magnitude-only measurement model has been adopted in various prior works on sparse (real-valued) PR [92, 39, 25, 36, 9, 7]:
$$y = |\mathbf{a}^T\mathbf{x}| + \eta, \qquad (2)$$
where the signal $\mathbf{x}$ is assumed to be sparse, and the sensing vector $\mathbf{a}$ is assumed to be a standard Gaussian vector.
However, both the linear and magnitude-only measurement models in (1) and (2) are idealized views of the data generating process. To make the setup more general, one can utilize the following semi-parametric single index model (SIM) for general nonlinear models:
$$y = f(\mathbf{a}^T\mathbf{x}), \qquad (3)$$
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.05571v1 [stat.ML] 11 Oct 2022
where $f : \mathbb{R} \to \mathbb{R}$ is an unknown (possibly random) nonlinear link function, and $\mathbf{a}$ is typically assumed to be Gaussian. In addition, since the norm of $\mathbf{x} \in \mathbb{R}^n$ is absorbed into the SIM, for brevity, it is common to assume that $\mathbf{x}$ is a unit vector, i.e., $\|\mathbf{x}\|_2 = 1$. SIMs have long been studied in the conventional setting where the number of measurements $m > n$ [30, 81, 45]. In recent years, they have also been analyzed in the high-dimensional setting where $m \ll n$, mainly under the assumption that the underlying signal is sparse. Relevant works include but are not limited to [72, 63, 26, 73, 68].
For all these works, it is crucial to impose the following assumption on the SIM:
$$\mathrm{Cov}[y, \mathbf{a}^T\mathbf{x}] \neq 0. \qquad (4)$$
The pivotal condition in (4) is fairly generic and encompasses notable special examples such as noisy 1-bit measurements and general (non-binary) quantization schemes. However, it is not satisfied by PR models including (2) and the related models
$$y = |\mathbf{a}^T\mathbf{x} + \eta|, \qquad y = (\mathbf{a}^T\mathbf{x})^2 + \eta, \qquad (5)$$
where $\eta$ refers to zero-mean random noise that is independent of $\mathbf{a}$. In order to formalize a class of SIMs that encompasses the above-mentioned PR models (and more general models; see the discussion in Section 2.2) as special cases, misspecified phase retrieval (MPR) has been studied in [64, 98], with the condition (4) being replaced by the assumption
$$\mathrm{Cov}[y, (\mathbf{a}^T\mathbf{x})^2] \neq 0. \qquad (6)$$
It is worth mentioning that another motivation for studying MPR is that the theoretical analysis for PR typically relies on correct model specification, i.e., that the data points are indeed generated by the assumed model; the MPR model enables theoretical analysis under statistical model misspecification.
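To make the contrast between conditions (4) and (6) concrete, the following self-contained Python snippet (an illustration, not part of the paper's method) simulates the noiseless magnitude-only model $y = |\mathbf{a}^T\mathbf{x}|$ and checks empirically that the usual SIM condition (4) fails while the MPR condition (6) holds:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 200_000

# Unit-norm signal and i.i.d. standard Gaussian sensing vectors.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))

g = A @ x       # realizations of a^T x, each ~ N(0, 1)
y = np.abs(g)   # magnitude-only observations, model (2) without noise

# Condition (4): Cov[y, a^T x] vanishes because y is an even function of a^T x.
cov_linear = np.mean(y * g) - np.mean(y) * np.mean(g)
# Condition (6): Cov[y, (a^T x)^2] = E|g|^3 - E|g| = sqrt(2/pi) > 0.
cov_quadratic = np.mean(y * g**2) - np.mean(y) * np.mean(g**2)

print(cov_linear)     # approximately 0
print(cov_quadratic)  # approximately 0.798
```

Here the even symmetry of the link function kills the linear correlation, which is exactly why MPR replaces (4) with the quadratic condition (6).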
In both works [64, 98], the signal $\mathbf{x}$ is assumed to be sparse. Recently, motivated by tremendously successful applications of deep generative models and following the seminal work [6] on generative model based linear CS, it has become popular to study high-dimensional inverse problems under generative priors [76, 31, 32, 34, 96, 41, 2, 67, 95, 60, 40, 65]. In particular, instead of being sparse, the underlying signal is assumed to be contained in (or lie near) the range of a generative model. It has been empirically demonstrated in [6] and its various follow-up works (e.g., see [19, 85, 48] and a literature review in [78]) that, compared to sparsity based methods, corresponding generative model based algorithms require significantly fewer samples to recover the signal up to a given accuracy.

In this paper, we study the MPR problem under generative modeling assumptions.
1.1 Related Work
In this subsection, we summarize some relevant works on PR and SIMs, both with and without generative priors.
PR and SIM without generative priors: There is a large amount of literature providing practical algorithms for PR, including convex methods [66, 14, 12, 90, 4, 29] and empirically more competitive non-convex methods [27, 21, 62, 70]. In particular, in the seminal work [62] and a variety of its follow-up works [13, 10, 15, 92, 91, 99, 39, 82, 8], whether for general PR (with no priors on the signal) or sparse PR, two-step approaches have been proposed with provable guarantees. The first step consists of a spectral initialization method, and the second step is typically an iterative algorithm (such as alternating minimization or gradient descent) that further refines the initial guess of the first step. For general PR, the spectral initialization step turns out to be unnecessary, and optimal sample complexity guarantees can be established even when using random initialization [83, 16]. However, to the best of our knowledge, this is not the case for sparse PR. More specifically, all theoretically-guaranteed non-convex algorithms for sparse PR require spectral initialization, and this typically results in a sub-optimal sample complexity of $O(s^2 \log n)$ (instead of $O(s \log n)$), where $s$ refers to the number of non-zero entries of the signal to estimate.
MPR is also closely related to SIMs, which have been extensively studied in the conventional setting where $m > n$; see, e.g., [30, 35, 59]. High-dimensional SIMs have received a lot of attention in recent years, with theoretical guarantees for signal estimation and support recovery [22, 24, 74, 84, 55, 72, 73, 17, 97, 28, 93, 69, 20]. In particular, motivated by the idea that under the assumption (4), a SIM can be converted into a scaled linear measurement model with unconventional noise, the authors of [72] consider minimizing the linear least-squares objective function over a convex set. They show that a reliable estimate of the signal can be obtained by such a simple method despite the unknown nonlinear link function.
As mentioned earlier, MPR with sparse priors has been studied in [64, 98]. The work [64] implements a two-step procedure based on convex optimization, with the first step being a spectral initialization method. More specifically, a semidefinite program is solved to produce an initial vector, and then an $\ell_1$-regularized least squares problem is solved to obtain a refined estimator. Such a procedure suffers from high computational costs. A more efficient two-step approach, which is a simple variant of the thresholded Wirtinger flow method [10], is proposed in [98]. In the first step, identical to that in [10], the initial vector is calculated by a thresholded spectral method that first estimates the support of the sparse signal by thresholding and then performs the classic power method over the submatrix corresponding to the estimated support. In the second step, a thresholded gradient descent algorithm is employed. Both approaches in [64, 98] can attain the optimal statistical rate of order $\sqrt{(s \log n)/m}$, provided that the sensing vector is standard Gaussian and the number of samples satisfies $m = \Omega(s^2 \log n)$. In addition, the second step of the approach proposed in [98] is shown to achieve a linear convergence rate.
PR and SIM with generative priors: PR with generative priors has been studied in [31, 37, 38, 80, 3, 47]. More specifically, an approximate message passing algorithm is proposed in [3]. The authors of [31, 80] minimize the objective over the latent space in $\mathbb{R}^k$ using gradient descent, where $k$ is the latent dimension of the generative prior. The corresponding algorithms may easily get stuck in local minima since the explorable solution space is limited. Recovery guarantees for projected gradient descent algorithms over the ambient space in $\mathbb{R}^n$ for noiseless PR with pre-trained or untrained neural network priors have been provided in [37, 38]. No initialization methods are proposed in these works, making the assumption on the initial vector therein somewhat stringent. On the other hand, the authors of [47] propose a spectral initialization method for PR with generative priors and provide recovery guarantees with respect to globally optimal solutions of a corresponding optimization problem. The optimization problem is non-convex, and a projected power method is proposed in [50] to approximately find an optimal solution.
Generative model based SIMs have been studied in [51, 49, 46, 94]. The authors of [51, 49, 46] provide optimal sample complexity upper bounds under the assumption of Gaussian sensing vectors. However, their results rely on the assumption (4), which is not satisfied by widely adopted PR models. The SIM studied in [94] encompasses certain PR models as special cases, and the sensing vector can be non-Gaussian. However, the nonlinear link function $f$ is assumed to be differentiable, making it not directly applicable to the PR model in (2). Moreover, it is worth mentioning that in both works [94, 51], the recovery guarantees are with respect to globally optimal solutions of typically highly non-convex optimization problems. Attaining these optimal solutions is practically difficult.
1.2 Contributions
Throughout this paper, we assume that the signal $\mathbf{x}$ lies in the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. Our main contributions are as follows:

- We propose a two-step approach for MPR with generative priors. In particular, in the first step, we make use of the projected power method proposed in [50] to obtain a good initial vector for the iterative algorithm used in the second step.

- We show that under appropriate initialization, both steps attain a statistical rate of order $\sqrt{(k \log L)\cdot(\log m)/m}$, and the second step achieves a linear convergence rate. The initialization condition for the first step is mild in the sense that it allows the inner product between the starting point and the signal $\mathbf{x}$ to be a sufficiently small positive constant. In contrast, the initialization condition for the second step is more restrictive, making the first step necessary. Notably, unlike the works on MPR with sparse priors [64, 98] that require the sub-optimal $O(s^2 \log n)$ sample complexity, the sample complexity requirement for our recovery guarantees is $O((k \log L)\cdot(\log m))$, which is naturally conjectured to be near-optimal [52, 42].

- We perform numerical experiments on image datasets to corroborate our theoretical results. In particular, for the noisy magnitude-only measurement model (2), we observe that our approach gives reconstructions that are competitive with those of the alternating phase projected gradient descent (APPGD) algorithm proposed in [37], which is the corresponding state-of-the-art method, even though we do not utilize knowledge of the link function and allow for model misspecification. Moreover, for several closely related measurement models that satisfy the condition (6), our approach leads to superior reconstruction performance compared to all other competing methods, including APPGD.
2 Problem Formulation
In this section, we give an overview of the main assumptions that we adopt. Before proceeding, we present the notation used in this paper.
2.1 Notation
We use upper and lower case boldface letters to denote matrices and vectors respectively. For any positive integer $N$, we use the shorthand notation $[N] = \{1, 2, \dots, N\}$, and $\mathbf{I}_N$ represents the identity matrix in $\mathbb{R}^{N \times N}$. The support (set) of a vector is the index set of its non-zero entries. We use $\|\mathbf{X}\|_{2 \to 2}$ to represent the spectral norm of $\mathbf{X}$. We use $\mathcal{B}^k(r)$ to denote the radius-$r$ ball in $\mathbb{R}^k$, i.e., $\mathcal{B}^k(r) := \{\mathbf{z} \in \mathbb{R}^k : \|\mathbf{z}\|_2 \le r\}$, and $\mathcal{S}^{n-1}$ represents the unit sphere in $\mathbb{R}^n$, i.e., $\mathcal{S}^{n-1} := \{\mathbf{s} \in \mathbb{R}^n : \|\mathbf{s}\|_2 = 1\}$. We use $G$ to denote a pre-trained $L$-Lipschitz continuous generative model from $\mathcal{B}^k(r)$ to $\mathbb{R}^n$. We focus on the setting where $k \ll n$. For any set $S \subseteq \mathcal{B}^k(r)$, we write $G(S) = \{G(\mathbf{z}) : \mathbf{z} \in S\}$. Our goal is to estimate the signal $\mathbf{x} \in \mathrm{Range}(G) = G(\mathcal{B}^k(r))$ from realizations of the sensing vector $\mathbf{a}$ and the observation $y$ (generated according to the SIM in (3)).
For any two sequences of real values $\{a_j\}$ and $\{b_j\}$, we write $a_j = O(b_j)$ if there exist an absolute constant $C_1$ and a positive integer $j_1$ such that $|a_j| \le C_1 b_j$ for any $j > j_1$, and $a_j = \Omega(b_j)$ if there exist an absolute constant $C_2$ and a positive integer $j_2$ such that $|a_j| \ge C_2 b_j$ for any $j > j_2$. We use the generic notations $C$ and $C'$ to denote large positive constants, and we use $c$ to denote a small positive constant; their values may differ from line to line.
2.2 Setup
First, the following are the standard definitions of a sub-exponential random variable and the associated sub-exponential norm.

Definition 1. A random variable $X$ is said to be sub-exponential if there exists a positive constant $C$ such that $(\mathbb{E}[|X|^p])^{1/p} \le Cp$ for all $p \ge 1$, or equivalently, if there exists a positive constant $C'$ such that $\mathbb{P}(|X| > u) \le \exp(1 - u/C')$ for all $u \ge 0$. The sub-exponential norm of $X$ is defined as $\|X\|_{\psi_1} := \sup_{p \ge 1} p^{-1}(\mathbb{E}[|X|^p])^{1/p}$.
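As a quick numerical illustration of Definition 1 (our own, not from the paper), one can estimate $p^{-1}(\mathbb{E}[|X|^p])^{1/p}$ for $X = g^2$ with $g \sim \mathcal{N}(0, 1)$ and observe that the ratios stay bounded in $p$, so $X$ is sub-exponential:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal(2_000_000) ** 2   # X = g^2, a sub-exponential variable

# Monte-Carlo estimates of p^{-1} (E|X|^p)^{1/p} for several moment orders p.
# For this X we have E[X^p] = (2p-1)!!, so the ratio stays bounded (it tends
# to 2/e as p grows), certifying sub-exponential moment growth.
ratios = [np.mean(X**p) ** (1 / p) / p for p in range(1, 7)]
print(ratios)
```

A sub-Gaussian variable would instead satisfy the stronger bound with $\sqrt{p}$ in place of $p$; squaring a Gaussian is exactly what moves it from the sub-Gaussian to the sub-exponential class.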
We will focus on the following settings except where stated otherwise:

- The observations are independently generated according to the SIM (3), where $f$ is the link function that is unknown and possibly random.

- We have an $L$-Lipschitz continuous generative model $G : \mathcal{B}^k(r) \to \mathbb{R}^n$. For convenience, similarly to [48, 50], we assume that the generative model is normalized such that $\mathrm{Range}(G) \subseteq \mathcal{S}^{n-1}$. For a general (unnormalized) generative model, we may essentially consider its normalized version; see, e.g., [50, Remark 1].

- The signal $\mathbf{x}$ is contained in the range of $G$, i.e., $\mathbf{x} \in \mathrm{Range}(G) \subseteq \mathcal{S}^{n-1}$.

- The sensing vector $\mathbf{a} \in \mathbb{R}^n$ is standard Gaussian, i.e., $\mathbf{a} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_n)$.

- The random variable $y = f(\mathbf{a}^T\mathbf{x})$ is sub-exponential, with its sub-exponential norm denoted by $K_y$, i.e., $K_y := \|y\|_{\psi_1}$. In addition, we use $M_y$ to denote the expectation of $y$, i.e., $M_y := \mathbb{E}[y]$.
Remark 1. y
will be sub-exponential when
f(x)
comprises of
xc
plus lower order terms
with
c2
(since the product of two sub-Gaussian random variables is sub-exponential),
and therefore we will see that the
y
corresponding to all the measurement models presented
in our paper is sub-exponential. We remark that the assumption of sub-exponential
y
is
not essential and it can be easily relaxed. For example, when
y=xc
with
c
being an even
integer that is larger than
2
, there will be only a minor change in the order of the
log m
term in the sample complexity and statistical rate. However, for brevity, we follow [
64
,
98
]
and make the assumption of sub-exponential yto avoid non-essential complications.
We consider MPR and assume that¹
$$\nu := \mathrm{Cov}[y, (\mathbf{a}^T\mathbf{x})^2] > 0, \qquad (7)$$
or equivalently,
$$\nu := \mathrm{Cov}[f(g), g^2] > 0, \qquad (8)$$
where $g \sim \mathcal{N}(0, 1)$ is a standard normal random variable. Note that the assumption (8) is only with respect to the nonlinear link function $f$. The condition in (7) is satisfied by the PR models described in (2) and (5). It is also satisfied by related models such as [98]
$$y = |\mathbf{a}^T\mathbf{x}| + 2\tanh(|\mathbf{a}^T\mathbf{x}|) + \eta, \qquad y = 2(\mathbf{a}^T\mathbf{x})^2 + 3\sin(|\mathbf{a}^T\mathbf{x}|) + \eta. \qquad (9)$$
See [64, Proposition 3 and Remark 4] for more general examples.
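The equivalence of (7) and (8) follows because $\mathbf{a}^T\mathbf{x} \sim \mathcal{N}(0, 1)$ for unit-norm $\mathbf{x}$. The snippet below (illustrative only, not part of the paper's method) estimates $\nu$ both ways for the first model in (9) and checks that the two estimates agree and are positive:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 200_000

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
eta = 0.1 * rng.standard_normal(m)

f = lambda t: np.abs(t) + 2 * np.tanh(np.abs(t))   # first link function in (9)

# Estimate nu via (7), from the sensing vectors and noisy observations ...
proj = A @ x
y = f(proj) + eta
nu_7 = np.mean(y * proj**2) - np.mean(y) * np.mean(proj**2)

# ... and via (8), from fresh scalar draws g ~ N(0, 1) without noise; the
# independent zero-mean noise eta does not affect the covariance.
g = rng.standard_normal(m)
nu_8 = np.mean(f(g) * g**2) - np.mean(f(g)) * np.mean(g**2)

print(nu_7, nu_8)  # both positive, approximately equal
```

Because the noise $\eta$ is independent of $\mathbf{a}$ and zero-mean, it drops out of the covariance, which is why the noisy estimate via (7) matches the noiseless one via (8).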
3 Algorithm
In this section, we describe our two-step algorithm devised for MPR with generative priors. Suppose that we have $m$ i.i.d. realizations of $\mathbf{a}$ and $y$, namely $\mathbf{a}_1, \dots, \mathbf{a}_m$ and $y_1, \dots, y_m$. To estimate the signal $\mathbf{x}$, we consider the following two-step approach:
1. We perform $T_1$ iterations in the first step. In particular, let
$$\mathbf{V} = \frac{1}{m}\sum_{i=1}^{m} y_i\big(\mathbf{a}_i\mathbf{a}_i^T - \mathbf{I}_n\big). \qquad (10)$$
We perform the projected power method proposed in [50]: for $t = 0, 1, \dots, T_1 - 1$, let
$$\mathbf{w}^{(t+1)} = \mathcal{P}_G\big(\mathbf{V}\mathbf{w}^{(t)}\big), \qquad (11)$$
where $\mathcal{P}_G(\cdot)$ denotes the projection function onto $\mathrm{Range}(G)$,² and we obtain $\mathbf{x}^{(0)} := \mathbf{w}^{(T_1)}$. Similarly to [43, 47], we set the starting point $\mathbf{w}^{(0)}$ as the column of $\frac{1}{m}\sum_{i=1}^{m} y_i\mathbf{a}_i\mathbf{a}_i^T$ (i.e., a shifted version of $\mathbf{V}$) that corresponds to the largest diagonal entry. Note that it is easy to calculate that $\mathbb{E}[\mathbf{V}] = \nu\mathbf{x}\mathbf{x}^T$ (see, e.g., [47, Lemma 8]), each column of which is a scalar multiple of $\mathbf{x}$. This motivates the use of a shifted version of $\mathbf{V}$ to obtain the initialization vector.
2. We perform $T_2$ iterations in the second step. In particular, let
$$\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i \qquad (12)$$
be the empirical mean of the observations. We perform the following iterative procedure: for $t = 0, 1, 2, \dots, T_2 - 1$, let
$$\hat{\nu}^{(t)} = \frac{1}{m}\sum_{i=1}^{m} (y_i - \bar{y}) \cdot \big(\mathbf{a}_i^T\mathbf{x}^{(t)}\big)^2, \qquad (13)$$
$$\tilde{y}_i^{(t)} = (y_i - \bar{y}) \cdot \mathbf{a}_i^T\mathbf{x}^{(t)}, \quad i = 1, 2, \dots, m, \qquad (14)$$
$$\tilde{\mathbf{x}}^{(t+1)} = \mathbf{x}^{(t)} - \frac{\zeta}{m} \cdot \sum_{i=1}^{m} \big(\hat{\nu}^{(t)} \cdot \mathbf{a}_i^T\mathbf{x}^{(t)} - \tilde{y}_i^{(t)}\big)\mathbf{a}_i, \qquad (15)$$
$$\mathbf{x}^{(t+1)} = \mathcal{P}_G\big(\tilde{\mathbf{x}}^{(t+1)}\big), \qquad (16)$$
where $\zeta > 0$ is a tuning parameter, and $\hat{\nu}^{(t)}$ can be thought of as an approximation of $\nu$ defined in (7). The idea behind calculating $\tilde{y}_i^{(t)}$ is that, by comparing (4) and (6), we observe that to transform the MPR model into a conventional SIM, we may use $(y - \mathbb{E}[y])(\mathbf{a}^T\mathbf{x})$ to
¹ The case that $\nu < 0$ can be handled similarly by considering $-y$.
² That is, for any $\mathbf{s} \in \mathbb{R}^n$, $\mathcal{P}_G(\mathbf{s}) := G\big(\arg\min_{\mathbf{z} \in \mathcal{B}^k(r)} \|G(\mathbf{z}) - \mathbf{s}\|_2\big)$. Similarly to [79, 37, 38, 71, 50], we implicitly assume the exact projection in our analysis. In practice, approximate methods such as gradient- and GAN-based projections [79, 75] have been shown to work well.
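Putting the two steps together, here is a minimal end-to-end sketch (our own illustration). It uses a placeholder projection, plain normalization onto $\mathcal{S}^{n-1}$ in place of the exact projection $\mathcal{P}_G$ onto $\mathrm{Range}(G)$, so it demonstrates the algorithmic structure of (10)-(16) rather than recovery under a genuine generative prior:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, T1, T2, zeta = 30, 20_000, 25, 40, 0.5

# Ground-truth unit signal and misspecified PR observations from model (2).
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)

def P_G(s):
    # Placeholder for the projection onto Range(G): plain normalization onto
    # S^{n-1} (i.e., no low-dimensional prior). With a real generative model,
    # this would be an approximate projection, e.g., gradient descent over z.
    return s / np.linalg.norm(s)

ybar = y.mean()                                       # (12)
V = (A * y[:, None]).T @ A / m - ybar * np.eye(n)     # (10)

# Step 1: projected power method. Start from the column of the shifted
# matrix (1/m) sum_i y_i a_i a_i^T with the largest diagonal entry.
M = V + ybar * np.eye(n)
w = P_G(M[:, np.argmax(np.diag(M))])
for _ in range(T1):
    w = P_G(V @ w)                                    # (11)
xt = w                                                # x^(0)

# Step 2: iterative refinement (13)-(16).
for _ in range(T2):
    proj = A @ xt
    nu_hat = np.mean((y - ybar) * proj**2)            # (13)
    y_tilde = (y - ybar) * proj                       # (14)
    grad = A.T @ (nu_hat * proj - y_tilde) / m
    xt = P_G(xt - zeta * grad)                        # (15)-(16)

print(abs(w @ x), abs(xt @ x))   # both close to 1 (recovery up to global sign)
```

With this sphere-projection placeholder, step 1 reduces to the classic power method on $\mathbf{V}$, whose expectation is $\nu\mathbf{x}\mathbf{x}^T$; step 2 then refines the estimate by gradient steps on the implied least-squares objective. The global sign of $\mathbf{x}$ is unidentifiable under (6), hence the absolute correlations.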