Residual-Based Error Correction for Neural Operator Accelerated
Infinite-Dimensional Bayesian Inverse Problems
Lianghao Cao, Thomas O’Leary-Roseberry, Prashant K. Jha, J. Tinsley Oden, Omar Ghattas
Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, 201 E. 24th Street, C0200,
Austin, TX 78712, United States of America.
Abstract
We explore using neural operators, or neural network representations of nonlinear maps between function
spaces, to accelerate infinite-dimensional Bayesian inverse problems (BIPs) with models governed by nonlin-
ear parametric partial differential equations (PDEs). Neural operators have gained significant attention in
recent years for their ability to approximate the parameter-to-solution maps defined by PDEs using as train-
ing data solutions of PDEs at a limited number of parameter samples. The computational cost of BIPs can
be drastically reduced if a large number of PDE solves required for posterior characterization are replaced
with evaluations of trained neural operators. However, reducing the error in the resulting BIP solutions by lowering the approximation error of the neural operators during training can be challenging and unreliable. We
provide an a priori error bound result that implies certain BIPs can be ill-conditioned to the approximation
error of neural operators, thus leading to inaccessible accuracy requirements in training. To reliably deploy
neural operators in BIPs, we consider a strategy for enhancing the performance of neural operators, which
is to correct the prediction of a trained neural operator by solving a linear variational problem based on
the PDE residual. We show that a trained neural operator with error correction can achieve a quadratic
reduction of its approximation error, all while retaining substantial computational speedups of posterior
sampling when models are governed by highly nonlinear PDEs. The strategy is applied to two numerical
examples of BIPs based on a nonlinear reaction–diffusion problem and deformation of hyperelastic materials.
We demonstrate that posterior representations of the two BIPs produced using trained neural operators are
greatly and consistently enhanced by error correction.
Keywords: uncertainty quantification, partial differential equations, machine learning, neural networks,
operator learning, error analysis
Corresponding authors
Email addresses: lianghao@oden.utexas.edu (Lianghao Cao), tom.olearyroseberry@utexas.edu (Thomas O'Leary-Roseberry), prashant.jha@austin.utexas.edu (Prashant K. Jha), oden@oden.utexas.edu (J. Tinsley Oden), omar@oden.utexas.edu (Omar Ghattas)

arXiv:2210.03008v2 [math.NA] 19 Oct 2022

Contents
1 Introduction 2
  1.1 Related works 4
  1.2 Layout of the paper 5
2 Preliminaries 5
  2.1 Models governed by parametric partial differential equations 5
  2.2 Infinite-dimensional Bayesian inverse problems 6
  2.3 Numerical solutions of Bayesian inverse problems 7
3 Neural operators and approximation errors 8
  3.1 Operator learning with neural networks 8
  3.2 Sources and reduction of the approximation errors 10
  3.3 Propagation of the approximation errors in Bayesian inverse problems 11
4 Residual-based error correction of neural operator predictions 13
  4.1 The residual-based error correction problem 13
  4.2 Error correction of neural operator prediction 14
  4.3 Connection to goal-oriented a posteriori error estimation 15
  4.4 Discussion of computational costs 16
5 Numerical Examples 17
  5.1 Derivative-informed reduced basis neural operator 17
  5.2 Software 18
  5.3 Inverting a coefficient field of a nonlinear reaction–diffusion problem 18
    5.3.1 Numerical approximation and neural operator performance 19
    5.3.2 Bayesian inverse problem setting 20
    5.3.3 Posterior visualization and cost analysis 21
  5.4 Hyperelastic material properties discovery 23
    5.4.1 A model for hyperelastic material deformation 23
    5.4.2 Numerical approximation and neural operator performance 25
    5.4.3 Bayesian inverse problem setting 26
    5.4.4 Posterior visualization and cost analysis 26
6 Conclusion and Outlook 29
References 30
Appendix A The full statement and proof of Theorem 1 33
Appendix B The full statement of the corollary to the Newton–Kantorovich theorem 37
1. Introduction
Many mathematical models of physical systems are governed by parametric partial differential equations (PDEs), where the states of the systems are described by spatially and/or temporally varying functions given by PDE solutions, such as the evolution of temperature fields modeled by the heat equation and material deformation modeled by the nonlinear elasticity equation. The parameters of these models, such as thermal conductivity and Young's modulus, often characterize properties of the physical systems and cannot be directly determined; one has to solve inverse problems for that purpose, where the parameters are inferred from discrete and noisy observations of the states. To account in the solutions of inverse problems for uncertainties in the observations and for our prior knowledge of the parameters, represented by prior probability distributions, inverse problems are often formulated via Bayes' rule, i.e., as Bayesian inverse problems, whose solutions are the probability distributions of the parameters conditioned on the observations, i.e., the posterior probability distributions. In some scenarios, our prior knowledge of the parameters requires them to be treated as functions, leading to infinite-dimensional Bayesian inverse problems. These scenarios arise, for example, when the parameters are spatially varying with uncertain spatial structures. Bayesian inverse problems are fundamental to constructing predictive models [1–4], and the need for inferring parameters as functions arises in many areas of engineering, science, and medicine [5–10].
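To fix ideas, the posterior in the infinite-dimensional setting is typically characterized through its density with respect to the prior measure. The following schematic statement uses generic notation that may differ from the precise definitions given later in Section 2.2, and assumes, for illustration only, an additive Gaussian noise model:
$$
\frac{d\mu^{y}}{d\mu_{0}}(m) \;\propto\; \exp\!\big(-\Phi(m; y)\big),
\qquad
\Phi(m; y) \;=\; \tfrac{1}{2}\,\big\|\Gamma_{\mathrm{noise}}^{-1/2}\big(y - \mathcal{B}(\mathcal{F}(m))\big)\big\|^{2},
$$
where $\mu_0$ is the prior measure on the parameter space, $y$ denotes the noisy observations, $\mathcal{B}$ is the observation operator, $\mathcal{F}$ is the parameter-to-state (forward) map, and $\Gamma_{\mathrm{noise}}$ is the noise covariance.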
For models governed by large-scale, highly nonlinear parametric PDEs, numerical simulations are computationally expensive, as they involve solving high-dimensional linear systems iteratively many times to obtain solutions with the desired accuracy [11]. In these cases, solving infinite-dimensional Bayesian inverse problems can be intractable, as numerically approximating infinite-dimensional posterior distributions with complex structures requires an untenable number of numerical solutions at different parameters, i.e., these problems suffer from the curse of dimensionality. Many mathematical and numerical techniques have been developed to mitigate the computational burden of these problems. Examples of these techniques are (i) advanced sampling methods exploiting the intrinsic low-dimensionality [12,13] or derivatives [14–16] of posterior distributions, (ii) direct posterior construction and statistical computation via Laplace approximation [17,18], deterministic quadrature [19,20], or transport maps [21–24], and (iii) surrogate modeling using polynomial approximation [25,26] or model order reduction [27–29] combined with multilevel or multifidelity methods [30–32].
Neural operators, or neural network representations of nonlinear maps between function spaces, have
gained significant interest in recent years for their ability to represent the parameter-to-state maps defined
by nonlinear parametric PDEs, and approximate these maps using a limited number of PDE solutions at
samples of different parameters [33–44]. Notable neural operators include POD-NN [44], DeepONet [38],
Fourier neural operator [45], and derivative-informed reduced basis neural networks [39]. The problem
of approximating nonlinear maps is often referred to as the operator learning problem, and numerically
solving the operator learning problem by optimizing the neural network weights is referred to as training.
Neural operators are fast-to-evaluate and offer an alternative to the existing surrogate modeling techniques
for accelerating the posterior characterization of infinite-dimensional Bayesian inverse problems by replacing
the nonlinear PDE solves with evaluations of trained neural operators. We explore this alternative surrogate
modeling approach using neural operators in this work.
The direct deployment of trained neural operators as surrogates of the nonlinear PDE-based model
transfers most of the computational cost from posterior characterization to the offline generation of training
samples and neural network training. Moreover, in contrast to some of the surrogate modeling approaches
that approximate the parameter-to-observation or parameter-to-likelihood maps [25,46], neural operators
approximate the parameter-to-state map, or learn the physical laws. As a result, they can be used as
surrogates for a class of different Bayesian inverse problems with models governed by the same PDEs but with
different types of observations and noise models, thus further amortizing the cost of surrogate construction.
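The amortization point can be illustrated with a minimal sketch, not taken from the paper: a single parameter-to-state surrogate (here a trivial stand-in function) is composed with different observation operators and noise levels to define the likelihoods of different BIPs; only the cheap composition changes, not the surrogate. All names below are hypothetical.

```python
import numpy as np

# A minimal sketch of why a parameter-to-state surrogate amortizes across
# inverse problems: the same trained operator feeds likelihoods with different
# observation operators and noise levels. Not the paper's code.

def surrogate(m):
    """Stand-in for a trained neural operator m -> u(m)."""
    return np.tanh(m)

def gaussian_log_likelihood(y, B, noise_std, m):
    """Log-likelihood for y = B u + noise, reusing the same state surrogate."""
    u = surrogate(m)
    misfit = y - B @ u
    return -0.5 * np.sum((misfit / noise_std) ** 2)

rng = np.random.default_rng(0)
m = np.linspace(-1.0, 1.0, 50)            # a (discretized) parameter sample

# Two BIPs sharing the governing model but not the observations:
B_point = np.eye(50)[::5]                 # 10 pointwise observations
B_avg = np.ones((1, 50)) / 50.0           # one spatially averaged observation
y_point = B_point @ surrogate(m) + 0.01 * rng.standard_normal(10)
y_avg = B_avg @ surrogate(m) + 0.05 * rng.standard_normal(1)

print(gaussian_log_likelihood(y_point, B_point, 0.01, m))
print(gaussian_log_likelihood(y_avg, B_avg, 0.05, m))
```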
While the drastic reduction of computational cost is advantageous, the accuracy of trained neural op-
erators as well as the accuracy of the resulting posterior characterization produced by them needs to be
examined. In theory, there are universal approximation results, such as those for DeepONet [38], Fourier
neural operators [35], and reduced basis architectures [33,40], that imply the existence of neural operators
that approximate a given nonlinear map between function spaces within certain classes arbitrarily well. In
practice, however, constructing and training neural operators to satisfy a given accuracy can be challenging
and unreliable. One often observes an empirical accuracy ceiling: enriching the training data and enhancing the representation power of neural operators by increasing the inner-layer dimensions or the depth of the neural networks, as often suggested by universal approximation theories, do not guarantee improved performance. In fact, in certain cases, increasing the training data or the depth of the networks can lead to degraded performance.
These behaviors are contrary to some other approximation methods, such as the finite element method with
hp-refinement and surrogate modeling using polynomial approximation or model order reduction, for which
theoretical results are well connected to numerical implementations for controlling and reducing approximation errors [47–50]. The unreliability of improving neural operator performance via training is a result of several confounding reasons that are discussed in this work. It is demonstrated via empirical studies in the recent work by de Hoop et al. [51], which reports neural operator performance, measured by cost–accuracy trade-offs, in approximating the parameter-to-state maps of various nonlinear parametric PDEs.
The approximation error of a trained neural operator in the operator learning problem propagates to
the error in the solutions of Bayesian inverse problems when the trained neural operator is employed as a
surrogate. We demonstrate, through deriving an a priori bound, that the approximation error of a trained
neural operator controls the error in the posterior distributions defined using the trained neural operator.
Additionally, the bounding constant shows that Bayesian inverse problems can be ill-conditioned to the
approximation error of neural operators in many scenarios, such as when the prior is uninformative, data is
high-dimensional, noise corruption is small, or the models are inadequate. Our theoretical result suggests
that for many challenging Bayesian inverse problems, posing accuracy requirements on their solutions may lead to significantly tighter accuracy requirements on the trained neural operators, requirements that are practically unattainable due to the limitations of neural operator training.
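Schematically (the full statement and constants are given in Theorem 1 and Appendix A; the notation below is illustrative rather than the paper's), such a priori bounds take the form
$$
d\big(\mu^{y},\, \mu^{y}_{\mathrm{NN}}\big) \;\le\; C\,\big\|\mathcal{F} - \mathcal{F}_{\mathrm{NN}}\big\|_{L^{2}_{\mu_{0}}},
$$
where $d(\cdot,\cdot)$ is a distance between the posterior defined by the true forward map $\mathcal{F}$ and the one defined by the neural operator $\mathcal{F}_{\mathrm{NN}}$, the operator error is measured in expectation over the prior $\mu_0$, and the constant $C$ depends on the prior, the noise model, and the data. The ill-conditioning described above corresponds to regimes in which this constant becomes very large.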
In this work, we consider a strategy for reliably deploying a trained neural operator as a surrogate in,
but not limited to, infinite-dimensional Bayesian inverse problems. This strategy is inspired by a recent
work by Jha and Oden [52] on extending the goal-oriented a posteriori error estimation techniques [53–59]
to accelerate Bayesian calibration of high-fidelity models with a calibrated low-fidelity model. Instead of
directly using the prediction of the trained neural operator at a given parameter for likelihood evaluation,
we first solve a linear error correction problem based on the PDE residual evaluated at the neural oper-
ator prediction and then use the obtained solution for likelihood evaluation. We show that solving this
error-correction problem is equivalent to generating one Newton iteration under some mild conditions, and
a trained neural operator with error correction can achieve global, i.e., over the prior distribution, quadratic
error reduction when the approximation error of the trained neural operator is relatively small. We expect
that the significant accuracy improvement of a trained neural operator from the error correction leads to
a correspondingly significant improvement in the accuracy of the posterior characterization for challenging Bayesian inverse problems. This accuracy improvement is achieved while retaining substantial computational speedups, proportional to the expected number of iterative linear solves within a nonlinear PDE solve at parameters sampled from the posterior distribution.
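Schematically (the precise formulation is given in Section 4), given a parameter $m$, a neural operator prediction $\tilde{u}(m)$, and the PDE residual operator $\mathcal{R}$, the correction solves the linearized problem
$$
\text{find } \delta u \in \mathcal{U}_0 \text{ such that } \quad \partial_u \mathcal{R}\big(\tilde{u}(m), m\big)\,\delta u \;=\; -\,\mathcal{R}\big(\tilde{u}(m), m\big) \quad \text{in } \mathcal{U}_0',
$$
and uses the corrected state $\tilde{u}(m) + \delta u$ in place of $\tilde{u}(m)$ for likelihood evaluation; this is one Newton step for the residual equation, initialized at the neural operator prediction.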
To showcase the utility of the proposed strategy, two numerical examples are provided. In the first
example, we consider the inference of an uncertain coefficient field in an equilibrium nonlinear reaction–
diffusion problem with a cubic reaction term from discrete observations of the state. The second example
concerns the inference of Young’s modulus, as a spatially varying field, of a hyperelastic material from
discrete observations of its displacement in response to an external force. For both examples, trained neural
operators, despite reaching their empirical accuracy ceilings, fail to recover all distinctive features of the
posterior predictive means, whereas the error-corrected neural operators are consistently successful in such
tasks.
1.1. Related works
We next discuss some of the related works on error correction in surrogate modeling approaches for
Bayesian inverse problems. To the best of our knowledge, the existing works mainly focus on building data-
driven models of the approximation error of surrogate parameter-to-observation maps. The sampling-based
techniques for error correction presented in these works are different from the residual-based approach pro-
posed in this work. The term model error correction sometimes refers to numerical methods for representing
model inadequacy, which is beyond the scope of this work.
In the context of model order reduction, Arridge et al. [60] proposed an offline sampling approach for
constructing a normal approximation for the joint probability distribution of the error in surrogate-predicted
observations and the parameter over the prior distribution. The probability distribution of the error con-
ditioned on the parameter can then be directly used for correcting likelihood evaluations defined using an
additive Gaussian noise model. This approach simplifies the conditional dependence of the error on the
parameter, leading to unreliable performance, as pointed out by Manzoni et al. [61], who proposed two al-
ternative error models: one based on radial basis interpolation and the other on linear regression models.
Cui et al. [62] presented two methods for adaptively constructing error models during posterior sampling using delayed-acceptance Metropolis–Hastings: one is similar to that of Arridge et al. but with posterior
samples, and the other is a zeroth order error correction using the error evaluated at the current Markov
chain position.
Additionally, correcting errors in neural network surrogates is explored by Yan and Zhou [63] for large-
scale Bayesian inverse problems. They propose a strategy based on a predictor–corrector scheme using two
neural networks. The predictor is a deep neural network surrogate of the parameter-to-observable map
constructed offline. The corrector is a shallow neural network that takes the prediction of the surrogate as
input and produces a corrected prediction. The corrector is trained using a few model simulations produced
during posterior characterization.
1.2. Layout of the paper
The layout of the paper is as follows. In Section 2, infinite-dimensional Bayesian inverse problems and
their numerical solutions are introduced in an abstract Hilbert space setting. In Section 3, the operator
learning problem associated with neural operator approximation of nonlinear mappings in function spaces
is introduced. The sources and reduction of approximation errors in neural network training are discussed.
A result on a priori bound of the error in the posterior distributions of the Bayesian inverse problem
using the operator learning error is provided and interpreted. In Section 4, we introduce the residual-
based error correction problem and discuss its conditional equivalence to a Newton-step problem. Then the
error-corrected neural operator is proposed, and computational cost analysis for its use as a surrogate for
posterior sampling is provided. Connections of the error-correction problem to goal-oriented a posteriori
error estimation techniques are also taken up in the same section. In Section 5, the physical, mathematical,
and numerical settings for the two numerical examples of infinite-dimensional Bayesian inverse problems are
provided. The empirical accuracy of neural operators and error-corrected neural operators at different sizes
of training data is presented. Posterior mean estimates generated by the model, trained neural operators,
and neural operators with error correction are visualized and examined to understand the accuracy of
posterior sampling. The results of empirical and asymptotic cost analysis for the posterior sampling are also
showcased. The concluding remarks are given in Section 6.
2. Preliminaries
In this section, we introduce infinite-dimensional Bayesian inverse problems in an abstract Hilbert space
setting. We refer to [17,64,65] and references therein for a more detailed analysis and numerical imple-
mentation of infinite-dimensional Bayesian inverse problems. For general treatments of Bayesian inference
problems, see [66,67]. For a reference on the theory of probability in infinite-dimensional Hilbert spaces,
see [68].
2.1. Models governed by parametric partial differential equations
Consider a mathematical model that predicts the state $u \in \mathcal{U}$ of a physical system given a parameter $m \in \mathcal{M}$. We assume that the model is governed by partial differential equations (PDEs), and that $\mathcal{U}$ and $\mathcal{M}$ are infinite-dimensional separable real Hilbert spaces endowed with inner products $(\cdot,\cdot)_{\mathcal{U}}$ and $(\cdot,\cdot)_{\mathcal{M}}$, respectively. The state space $\mathcal{U}$ is a Sobolev space defined over a bounded, open, and sufficiently regular spatial domain $\Omega_u \subset \mathbb{R}^3$. It either consists of functions with ranges in a vector space of dimension $d_s \leq 3$, such as $H^1(\Omega_u; \mathbb{R}^{d_s})$, or of time-evolving functions, such as $L^2(0, T; H^1(\Omega_u; \mathbb{R}^{d_s}))$ with $T > 0$. The former is appropriate for boundary value problems (BVPs), while the latter is appropriate for initial and boundary value problems (IBVPs). We assume $\mathcal{M}$ consists of spatially varying scalar-valued functions defined over a set $\Omega_m \subseteq \Omega_u$. The parameter $m$ may appear in boundary conditions, initial conditions, forcing terms, or coefficients of the PDEs.
We specify the model as an abstract nonlinear variational problem as follows. Let $\mathcal{U}_0 \subseteq \mathcal{U}$ be a closed subspace that satisfies the homogenized strongly-enforced boundary and initial conditions of the PDEs. Let the solution set $\mathcal{V}_u \subseteq \mathcal{U}$ be an affine space of $\mathcal{U}_0$ that satisfies the strongly-enforced boundary conditions and initial conditions that possibly depend on $m$. The abstract nonlinear variational problem can be written as

Given $m \in \mathcal{M}$, find $u \in \mathcal{V}_u$ such that $\mathcal{R}(u, m) = 0 \in \mathcal{U}_0'$,   (1)

where $\mathcal{R} = \mathcal{R}(u, m)$ is a residual operator associated with the variational form, and $\mathcal{U}_0'$ is the dual space of the space of test functions $\mathcal{U}_0$. We assume that the residual operator is possibly nonlinear with respect to both the parameter and the state, and that the nonlinear variational problem has a unique solution for any $m \in \mathcal{M}$. As a result, we can define a solution operator $\mathcal{F}: \mathcal{M} \to \mathcal{V}_u$, or the forward operator, of the model, i.e.,

$\mathcal{R}(\mathcal{F}(m), m) = 0 \quad \forall m \in \mathcal{M}.$   (2)
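To make the abstract problem (1)–(2) and the residual-based correction described in the introduction concrete, below is a minimal, self-contained sketch assuming a 1D finite-difference discretization of a cubic reaction–diffusion problem, loosely modeled on the first numerical example. It is illustrative only, not the paper's implementation (the paper uses finite element discretizations and the software described in Section 5.2), and the "neural operator prediction" is replaced by an artificially perturbed solution.

```python
import numpy as np

# Toy 1D finite-difference discretization of a cubic reaction-diffusion problem:
#   -(exp(m) u')' + u^3 = f  on (0, 1),  u(0) = u(1) = 0.
n = 200                                   # number of grid cells
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)

m = 0.5 * np.sin(2 * np.pi * x)           # a fixed parameter sample m(x)
f = 10.0 * np.ones(n - 1)                 # forcing at interior nodes
kappa = np.exp(0.5 * (m[:-1] + m[1:]))    # exp(m) at cell midpoints

def residual(u_int):
    """Discrete residual R(u, m) at interior nodes (u_int excludes boundary values)."""
    u = np.concatenate(([0.0], u_int, [0.0]))       # enforce u(0) = u(1) = 0
    flux = kappa * (u[1:] - u[:-1]) / h             # exp(m) u' at cell midpoints
    return -(flux[1:] - flux[:-1]) / h + u_int**3 - f

def jacobian(u_int):
    """Jacobian dR/du of the discrete residual (tridiagonal, assembled densely)."""
    J = np.zeros((n - 1, n - 1))
    np.fill_diagonal(J, (kappa[:-1] + kappa[1:]) / h**2 + 3.0 * u_int**2)
    off = -kappa[1:-1] / h**2
    J += np.diag(off, 1) + np.diag(off, -1)
    return J

# Forward operator F(m): solve R(u, m) = 0 with Newton's method.
u = np.zeros(n - 1)
for _ in range(20):
    du = np.linalg.solve(jacobian(u), -residual(u))
    u += du
    if np.linalg.norm(du) < 1e-12:
        break

# Pretend a trained neural operator returned a prediction with roughly 5% error.
rng = np.random.default_rng(0)
u_nn = u * (1.0 + 0.05 * rng.standard_normal(u.shape))

# Residual-based error correction: one linear solve at the predicted state.
u_corr = u_nn + np.linalg.solve(jacobian(u_nn), -residual(u_nn))

err = lambda v: np.linalg.norm(v - u) / np.linalg.norm(u)
print(f"relative error before correction: {err(u_nn):.2e}")
print(f"relative error after  correction: {err(u_corr):.2e}")
```

For this mildly nonlinear toy problem, the corrected state is typically several orders of magnitude more accurate than the perturbed prediction, in line with the quadratic error reduction discussed in Section 4, while costing only a single linear solve per parameter sample.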