
tion is subject to different noise models (Van Veen et al., 2018; Asim et al., 2020a; Whang et al., 2021a).
In practice, the reconstruction can be challenging because L_reg involves a deep generative prior. The success fundamentally relies on an effective optimization algorithm to find the global or a satisfactory local minimum of (1). However, the non-convex nature of inverse problems often makes gradient descent unprincipled and non-robust, e.g., to the initialization. In fact, even in the simpler setting where the forward operator is the identity map (corresponding to a denoising problem), solving (1) with a deep generative prior is NP-hard, as demonstrated in Lei et al. (2019). This establishes the complexity of solving inverse problems in general. On the other hand, even in specific cases, gradient descent may fail to find global optima, unlike when training an (over-parameterized) neural network. This is because inverse problems require a consistent (or under-parameterized) system that yields a unique solution. It is known both theoretically and empirically that the more over-parameterized the system is, the easier it is to find the global minima with first-order methods (Jacot et al., 2018; Du et al., 2019; Allen-Zhu et al., 2019).
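For reference, a typical instantiation of such a regularized objective with a flow-based prior can be written as follows, where A denotes the forward operator and y the measurement; the precise form of (1) may differ, and this display is only meant to fix notation for the discussion.

```latex
% Illustrative form of a regularized inverse-problem objective with a
% flow-based prior; the exact definition of (1) may differ from this sketch.
\min_{x \in \mathbb{R}^n} \ \|A x - y\|_2^2 \ + \ \lambda \, \mathcal{L}_{\mathrm{reg}}(x),
\qquad \text{e.g.,} \quad \mathcal{L}_{\mathrm{reg}}(x) = -\log p_G(x).
```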
In this paper, we overcome this difficulty by proposing a new principled optimization scheme for inverse problems with a deep generative prior. Our algorithm incrementally optimizes the reconstruction conditioning on a sequence of λ's that are gradually increased from 0 to a prespecified value. Intuitively, suppose we have found a satisfactory solution (e.g., the global optimum) x∗(λ) of (1). Then, after a small increase ∆λ in the hyperparameter, the new solution x∗(λ + ∆λ) should be close to x∗(λ) and easy to find when starting from x∗(λ). Our algorithm is related to amortized optimization (Amos, 2022) in that the difficulty and the high computational cost of finding x∗(λ) for the original inverse problem are amortized over a sequence of much easier tasks, where finding x∗(0) is feasible and the solution to one task facilitates solving the next. We refer to our method as Amortized Inverse Problem Optimization (AIPO).
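To make the scheme concrete, the following sketch (not the exact algorithm of Section 3) illustrates the warm-start structure: the reconstruction is optimized for a gradually increasing sequence of λ values, with each stage initialized at the previous stage's solution. The squared-error data-fidelity term, the linear λ schedule, and the names `forward_op`, `neg_log_prior`, `num_stages`, and `inner_steps` are illustrative placeholders.

```python
# Minimal sketch of warm-start continuation over lambda (illustrative only).
import torch

def amortized_solve(x_init, y, forward_op, neg_log_prior,
                    lam_target=1.0, num_stages=20, inner_steps=50, lr=1e-2):
    x = x_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for stage in range(num_stages + 1):
        lam = lam_target * stage / num_stages      # lambda grows from 0 to lam_target
        for _ in range(inner_steps):
            optimizer.zero_grad()
            data_fit = ((forward_op(x) - y) ** 2).sum()   # squared-error fidelity
            loss = data_fit + lam * neg_log_prior(x)      # regularized objective
            loss.backward()
            optimizer.step()
        # x now approximates x*(lam) and serves as the warm start for the next stage
    return x.detach()
```

The stage with λ = 0 corresponds to the easy problem whose solution x∗(0) initializes the continuation; each subsequent stage only needs to track a nearby solution rather than solve the hard target problem from scratch.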
It is noteworthy that AIPO differs from amortized optimization in the conventional sense, which uses learning to approximate/predict the solutions to similar problems (Amos, 2022). In stark contrast, we spread the difficulty of solving the original problem over a sequence of much easier tasks, each of which is still the optimization of an inverse problem objective function. We provide a theoretical underpinning of AIPO: under some conventional assumptions, AIPO is guaranteed to find the global minimum. A practical and efficient algorithm is also provided. Empirically, our algorithm exhibits superior performance in minimizing the loss of various inverse problems, including denoising, noisy compressed sensing, and inpainting. To the best of our knowledge, AIPO is the first principled and efficient algorithm for solving inverse problems with a flow-based prior.
The paper proceeds as follows. In Section 2, we provide background on normalizing flows and amortized optimization and introduce example inverse problems. In Section 3, we formally propose AIPO and provide a theoretical analysis. In Section 4, we illustrate our algorithm and show its outstanding performance compared to conventional methods that keep λ fixed during optimization. Section 5 concludes the paper. We defer the proofs, some technical details and experiment settings, and supplementary results to the appendix.
2. Background
We first provide an overview of normalizing flows that are
used as the generative prior in our setting. We also briefly
introduce amortized optimization based on learning and
highlight its difference from the proposed AIPO. In addition,
we showcase three representative inverse problem tasks, on
which our algorithm will be evaluated.
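For concreteness, the sketch below shows commonly used forward models for these three tasks; the measurement matrix, noise levels, and mask ratio are illustrative placeholders rather than the settings used in our experiments (which are deferred to the appendix).

```python
# Hedged sketch of the three forward operators: denoising, noisy compressed
# sensing, and inpainting. All constants below are placeholders.
import torch

n, m = 1024, 256                      # signal dimension and number of measurements
x = torch.randn(n)                    # a ground-truth signal (placeholder)

# Denoising: the forward operator is the identity map.
y_denoise = x + 0.1 * torch.randn(n)

# Noisy compressed sensing: y = Ax + noise with a random Gaussian A, m < n.
A = torch.randn(m, n) / m ** 0.5
y_cs = A @ x + 0.05 * torch.randn(m)

# Inpainting: observe only the unmasked coordinates of x.
mask = (torch.rand(n) > 0.3).float()  # keep roughly 70% of the entries
y_inpaint = mask * x
```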
2.1. Normalizing Flows
Normalizing flows (NFs) (Rezende & Mohamed, 2015; Papamakarios et al., 2021) are a family of generative models capable of representing an n-dimensional complex distribution by transforming it into a simple base distribution (e.g., a standard Gaussian or uniform distribution) of the same dimension. Compared to other generative models such as GANs (Goodfellow et al., 2014) and variational autoencoders (Kingma & Welling, 2013), NFs use a bijective (invertible) mapping and are computationally flexible in the sense that they admit efficient sampling from the distribution and exact likelihood evaluation.
To be more specific, let x ∈ R^n denote a data point that follows an unknown complex distribution, and let z ∈ R^n follow some pre-specified base distribution such as a standard Gaussian. An NF model learns a differentiable bijective function G: R^n → R^n such that x = G(z). To sample from the data distribution p_G(x), one can first generate z ∼ p(z) and then apply the transformation x = G(z). Moreover, the invertibility of G allows one to use the change-of-variable formula to calculate the likelihood of x by

log p_G(x) = log p(z) + log |det(J_{G^{-1}}(x))|,
where J_{G^{-1}} denotes the Jacobian matrix of the inverse mapping G^{-1} evaluated at x. To speed up the computation, G is usually composed of several simpler invertible functions that have triangular Jacobian matrices.
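As a toy illustration (not a full RealNVP or GLOW implementation), the sketch below composes affine coupling layers, whose triangular Jacobians reduce log |det(J_{G^{-1}}(x))| to a simple sum, and evaluates log p_G(x) exactly under a standard Gaussian base density. The layer sizes and the scale/shift network are arbitrary illustrative choices.

```python
# Toy change-of-variable likelihood with affine coupling layers (illustrative).
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style layer: transforms half of the coordinates conditioned on
    the other half, giving a triangular Jacobian. Real models alternate or
    permute which half is transformed; this toy version does not."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * (dim - self.half)))

    def inverse(self, x):
        # Map data x back toward latent coordinates, accumulating log|det J_{G^{-1}}|.
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        z2 = (x2 - t) * torch.exp(-s)   # invert the elementwise affine transform
        log_det = -s.sum(dim=1)         # triangular Jacobian: sum of log-diagonal terms
        return torch.cat([x1, z2], dim=1), log_det

def log_likelihood(x, layers):
    # Apply the inverse layers in reverse composition order, then use the
    # change-of-variable formula with a standard Gaussian base density p(z).
    z, total_log_det = x, 0.0
    for layer in reversed(layers):
        z, log_det = layer.inverse(z)
        total_log_det = total_log_det + log_det
    log_p_z = -0.5 * (z ** 2 + torch.log(torch.tensor(2 * torch.pi))).sum(dim=1)
    return log_p_z + total_log_det

layers = [AffineCoupling(dim=6) for _ in range(4)]
x = torch.randn(8, 6)                   # a batch of 8 samples in R^6
print(log_likelihood(x, layers).shape)  # one exact log-density per sample
```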
Typical NFs include RealNVP (Dinh et al., 2016) and GLOW (Kingma & Dhariwal, 2018). For more details of NF models, we refer the readers to the review by Papamakarios et al. (2021) and the