inverse problems can be intractable, as numerically approximating infinite-dimensional posterior distributions with complex structures requires an untenable number of numerical model solutions at different parameter samples, i.e., these problems suffer from the curse of dimensionality. Many mathematical and numerical techniques have been developed to mitigate the computational burden of these problems. Examples of such techniques include
(i) advanced sampling methods exploiting the intrinsic low-dimensionality [12,13] or derivatives [14–16] of
posterior distributions, (ii) direct posterior construction and statistical computation via Laplace approxima-
tion [17,18], deterministic quadrature [19,20], or transport maps [21–24], and (iii) surrogate modeling using
polynomial approximation [25,26] or model order reduction [27–29] combined with multilevel or multifidelity
methods [30–32].
Neural operators, or neural network representations of nonlinear maps between function spaces, have gained significant interest in recent years for their ability to represent the parameter-to-state maps defined by nonlinear parametric PDEs and to approximate these maps using a limited number of PDE solutions at different parameter samples [33–44]. Notable neural operators include POD-NN [44], DeepONet [38], the Fourier neural operator [45], and derivative-informed reduced basis neural networks [39]. The problem of approximating nonlinear maps between function spaces is often referred to as the operator learning problem, and numerically solving the operator learning problem by optimizing the neural network weights is referred to as training.
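For illustration, the operator learning problem can be written schematically as follows (the notation is generic and not necessarily that adopted later in this work): given a parameter-to-state map $\mathcal{G}$ defined by a parametric PDE and parameter samples $m^{(1)}, \dots, m^{(N)}$, typically drawn from the prior distribution, training seeks neural network weights $w$ of a neural operator $\mathcal{G}_w$ that minimize an empirical risk such as
\[
\min_{w} \; \frac{1}{N} \sum_{n=1}^{N} \big\| \mathcal{G}(m^{(n)}) - \mathcal{G}_{w}(m^{(n)}) \big\|^{2},
\]
where the norm is taken in the state space and each evaluation $\mathcal{G}(m^{(n)})$ corresponds to one numerical PDE solve.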
Neural operators are fast to evaluate and offer an alternative to existing surrogate modeling techniques for accelerating the posterior characterization of infinite-dimensional Bayesian inverse problems by replacing the nonlinear PDE solves with evaluations of trained neural operators. We explore this alternative surrogate modeling approach using neural operators in this work.
The direct deployment of trained neural operators as surrogates for the nonlinear PDE-based model transfers most of the computational cost from posterior characterization to the offline generation of training samples and neural network training. Moreover, in contrast to surrogate modeling approaches that approximate the parameter-to-observation or parameter-to-likelihood maps [25,46], neural operators approximate the parameter-to-state map, i.e., they learn the physical laws. As a result, they can be used as surrogates for a class of different Bayesian inverse problems whose models are governed by the same PDEs but involve different types of observations and noise models, thus further amortizing the cost of surrogate construction.
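To illustrate this amortization in schematic form (again with generic notation, e.g., assuming additive Gaussian observation noise), a surrogate posterior measure $\nu_w$ defined by the trained neural operator $\mathcal{G}_w$ can be written through its density with respect to the prior measure $\mu$ as
\[
\frac{d\nu_w}{d\mu}(m) \propto \exp\!\Big( -\tfrac{1}{2} \big\| y - \mathcal{O}\big(\mathcal{G}_{w}(m)\big) \big\|_{\Gamma^{-1}}^{2} \Big),
\]
where $y$ denotes the observed data, $\mathcal{O}$ the observation operator, and $\Gamma$ the noise covariance. The trained operator $\mathcal{G}_w$ enters only through its composition with $\mathcal{O}$ and the noise model, so the same surrogate can be reused when the data $y$, the observation operator $\mathcal{O}$, or the noise covariance $\Gamma$ change.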
While the drastic reduction of computational cost is advantageous, the accuracy of trained neural operators, as well as the accuracy of the posterior characterization they produce, needs to be examined. In theory, there are universal approximation results, such as those for DeepONet [38], Fourier neural operators [35], and reduced basis architectures [33,40], that imply the existence of neural operators within certain classes that approximate a given nonlinear map between function spaces arbitrarily well. In practice, however, constructing and training neural operators to satisfy a given accuracy requirement can be challenging and unreliable. One often observes an empirical accuracy ceiling: enriching the training data and enhancing the representation power of neural operators by increasing the inner-layer dimensions or the depth of the neural networks, as often suggested by universal approximation theories, do not guarantee improved performance. In fact, in certain cases, increasing the amount of training data or the depth of the networks can degrade performance.
These behaviors contrast with those of some other approximation methods, such as the finite element method with hp-refinement and surrogate modeling using polynomial approximation or model order reduction, for which theoretical results are well connected to numerical implementation for controlling and reducing approximation errors [47–50]. The unreliability of improving neural operator performance through training is a result of several confounding factors that are discussed in this work. It is demonstrated by the empirical studies in the recent work of de Hoop et al. [51], which report the cost–accuracy trade-offs of neural operators approximating the parameter-to-state maps of various nonlinear parametric PDEs.
The approximation error of a trained neural operator in the operator learning problem propagates to
the error in the solutions of Bayesian inverse problems when the trained neural operator is employed as a
surrogate. We demonstrate, by deriving an a priori bound, that the approximation error of a trained neural operator controls the error in the posterior distributions defined using the trained neural operator.
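Schematically, and stated here only for illustration (the precise assumptions, distance, and constant are given later in the paper), such a bound takes the form
\[
d\big(\nu, \nu_w\big) \;\le\; C \, \Big( \mathbb{E}_{\mu}\big[ \| \mathcal{G}(m) - \mathcal{G}_{w}(m) \|^{2} \big] \Big)^{1/2},
\]
where $\nu$ and $\nu_w$ are the posterior measures defined by the true parameter-to-state map $\mathcal{G}$ and the trained neural operator $\mathcal{G}_w$, $d$ is a suitable distance between probability measures, $\mu$ is the prior measure, and $C$ is a constant that depends on the problem setup.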
Additionally, the bounding constant shows that Bayesian inverse problems can be ill-conditioned with respect to the approximation error of neural operators in many scenarios, such as when the prior is uninformative, the data are high-dimensional, the noise corruption is small, or the model is inadequate. Our theoretical result suggests that, for many challenging Bayesian inverse problems, imposing accuracy requirements on their solutions may