Residual-Based Error Correction for Neural Operator Accelerated
Infinite-Dimensional Bayesian Inverse Problems
Lianghao Cao, Thomas O’Leary-Roseberry, Prashant K. Jha, J. Tinsley Oden, Omar Ghattas
Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, 201 E. 24th Street, C0200,
Austin, TX 78712, United States of America.
Abstract
We explore using neural operators, or neural network representations of nonlinear maps between function
spaces, to accelerate infinite-dimensional Bayesian inverse problems (BIPs) with models governed by nonlin-
ear parametric partial differential equations (PDEs). Neural operators have gained significant attention in
recent years for their ability to approximate the parameter-to-solution maps defined by PDEs using as train-
ing data solutions of PDEs at a limited number of parameter samples. The computational cost of BIPs can
be drastically reduced if a large number of PDE solves required for posterior characterization are replaced
with evaluations of trained neural operators. However, reducing the error in the resulting BIP solutions by lowering the approximation error of the neural operators during training can be challenging and unreliable. We
provide an a priori error bound result that implies certain BIPs can be ill-conditioned to the approximation
error of neural operators, thus leading to inaccessible accuracy requirements in training. To reliably deploy
neural operators in BIPs, we consider a strategy for enhancing the performance of neural operators, which
is to correct the prediction of a trained neural operator by solving a linear variational problem based on
the PDE residual. We show that a trained neural operator with error correction can achieve a quadratic
reduction of its approximation error, all while retaining substantial computational speedups of posterior
sampling when models are governed by highly nonlinear PDEs. The strategy is applied to two numerical
examples of BIPs based on a nonlinear reaction–diffusion problem and deformation of hyperelastic materials.
We demonstrate that posterior representations of the two BIPs produced using trained neural operators are
greatly and consistently enhanced by error correction.
Keywords: uncertainty quantification, partial differential equations, machine learning, neural networks,
operator learning, error analysis
Corresponding authors
Email addresses: lianghao@oden.utexas.edu (Lianghao Cao), tom.olearyroseberry@utexas.edu (Thomas O'Leary-Roseberry), prashant.jha@austin.utexas.edu (Prashant K. Jha), oden@oden.utexas.edu (J. Tinsley Oden), omar@oden.utexas.edu (Omar Ghattas)

arXiv:2210.03008v2 [math.NA] 19 Oct 2022

Contents
1 Introduction 2
  1.1 Related works 4
  1.2 Layout of the paper 5
2 Preliminaries 5
  2.1 Models governed by parametric partial differential equations 5
  2.2 Infinite-dimensional Bayesian inverse problems 6
  2.3 Numerical solutions of Bayesian inverse problems 7
3 Neural operators and approximation errors 8
  3.1 Operator learning with neural networks 8
  3.2 Sources and reduction of the approximation errors 10
  3.3 Propagation of the approximation errors in Bayesian inverse problems 11
4 Residual-based error correction of neural operator predictions 13
  4.1 The residual-based error correction problem 13
  4.2 Error correction of neural operator prediction 14
  4.3 Connection to goal-oriented a posteriori error estimation 15
  4.4 Discussion of computational costs 16
5 Numerical Examples 17
  5.1 Derivative-informed reduced basis neural operator 17
  5.2 Software 18
  5.3 Inverting a coefficient field of a nonlinear reaction–diffusion problem 18
    5.3.1 Numerical approximation and neural operator performance 19
    5.3.2 Bayesian inverse problem setting 20
    5.3.3 Posterior visualization and cost analysis 21
  5.4 Hyperelastic material properties discovery 23
    5.4.1 A model for hyperelastic material deformation 23
    5.4.2 Numerical approximation and neural operator performance 25
    5.4.3 Bayesian inverse problem setting 26
    5.4.4 Posterior visualization and cost analysis 26
6 Conclusion and Outlook 29
References 30
Appendix A The full statement and proof of Theorem 1 33
Appendix B The full statement of the corollary to the Newton–Kantorovich theorem 37
1. Introduction
Many mathematical models of physical systems are governed by parametric partial differential equations (PDEs), where the states of the systems are described by spatially and/or temporally varying functions given by PDE solutions, such as the evolution of temperature fields modeled by the heat equation and material deformation modeled by the nonlinear elasticity equation. The parameters of these models, such as thermal conductivity and Young's modulus, often characterize properties of the physical systems and cannot be directly determined; one has to solve inverse problems for that purpose, where the parameters are inferred from discrete and noisy observations of the states. To account in the solutions of inverse problems for uncertainties in the observations and for our prior knowledge of the parameters, represented by prior probability distributions, inverse problems are often formulated via Bayes' rule, i.e., as Bayesian inverse problems, whose solutions are the probability distributions of the parameters conditioned on the observations, i.e., the posterior probability distributions. In some scenarios, our prior knowledge of the parameters requires them to be treated as functions, leading to infinite-dimensional Bayesian inverse problems. These scenarios arise, for example, when the parameters are spatially varying with uncertain spatial structures. Bayesian inverse problems are fundamental to constructing predictive models [1–4], and the need for inferring parameters as functions arises in many areas of engineering, science, and medicine [5–10].
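To fix ideas, the posterior in the infinite-dimensional setting is typically characterized through its density with respect to the prior measure. The following schematic statement uses generic notation that may differ from the precise definitions given later in Section 2.2, and assumes, for illustration only, an additive Gaussian noise model:
$$
\frac{d\mu^{y}}{d\mu_{0}}(m) \;\propto\; \exp\!\big(-\Phi(m; y)\big),
\qquad
\Phi(m; y) \;=\; \tfrac{1}{2}\,\big\|\Gamma_{\mathrm{noise}}^{-1/2}\big(y - \mathcal{B}(\mathcal{F}(m))\big)\big\|^{2},
$$
where $\mu_0$ is the prior measure on the parameter space, $y$ denotes the noisy observations, $\mathcal{B}$ is the observation operator, $\mathcal{F}$ is the parameter-to-state (forward) map, and $\Gamma_{\mathrm{noise}}$ is the noise covariance.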
For models governed by large-scale, highly nonlinear parametric PDEs, numerical simulations are computationally expensive, as they involve solving high-dimensional linear systems iteratively many times to obtain solutions with the desired accuracy [11]. In these cases, solving infinite-dimensional Bayesian inverse problems can be intractable, as numerically approximating infinite-dimensional posterior distributions with complex structures requires an untenable number of numerical solutions at different parameters, i.e., these problems suffer from the curse of dimensionality. Many mathematical and numerical techniques have been developed to mitigate the computational burden of these problems. Examples of these techniques are (i) advanced sampling methods exploiting the intrinsic low-dimensionality [12,13] or derivatives [14–16] of posterior distributions, (ii) direct posterior construction and statistical computation via Laplace approximation [17,18], deterministic quadrature [19,20], or transport maps [21–24], and (iii) surrogate modeling using polynomial approximation [25,26] or model order reduction [27–29] combined with multilevel or multifidelity methods [30–32].
Neural operators, or neural network representations of nonlinear maps between function spaces, have
gained significant interest in recent years for their ability to represent the parameter-to-state maps defined
by nonlinear parametric PDEs, and approximate these maps using a limited number of PDE solutions at
samples of different parameters [33–44]. Notable neural operators include POD-NN [44], DeepONet [38],
Fourier neural operator [45], and derivative-informed reduced basis neural networks [39]. The problem
of approximating nonlinear maps is often referred to as the operator learning problem, and numerically
solving the operator learning problem by optimizing the neural network weights is referred to as training.
Neural operators are fast-to-evaluate and offer an alternative to the existing surrogate modeling techniques
for accelerating the posterior characterization of infinite-dimensional Bayesian inverse problems by replacing
the nonlinear PDE solves with evaluations of trained neural operators. We explore this alternative surrogate
modeling approach using neural operators in this work.
The direct deployment of trained neural operators as surrogates of the nonlinear PDE-based model
transfers most of the computational cost from posterior characterization to the offline generation of training
samples and neural network training. Moreover, in contrast to some of the surrogate modeling approaches
that approximate the parameter-to-observation or parameter-to-likelihood maps [25,46], neural operators
approximate the parameter-to-state map, or learn the physical laws. As a result, they can be used as
surrogates for a class of different Bayesian inverse problems with models governed by the same PDEs but with
different types of observations and noise models, thus further amortizing the cost of surrogate construction.
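The amortization point can be illustrated with a minimal sketch, not taken from the paper: a single parameter-to-state surrogate (here a trivial stand-in function) is composed with different observation operators and noise levels to define the likelihoods of different BIPs; only the cheap composition changes, not the surrogate. All names below are hypothetical.

```python
import numpy as np

# A minimal sketch of why a parameter-to-state surrogate amortizes across
# inverse problems: the same trained operator feeds likelihoods with different
# observation operators and noise levels. Not the paper's code.

def surrogate(m):
    """Stand-in for a trained neural operator m -> u(m)."""
    return np.tanh(m)

def gaussian_log_likelihood(y, B, noise_std, m):
    """Log-likelihood for y = B u + noise, reusing the same state surrogate."""
    u = surrogate(m)
    misfit = y - B @ u
    return -0.5 * np.sum((misfit / noise_std) ** 2)

rng = np.random.default_rng(0)
m = np.linspace(-1.0, 1.0, 50)            # a (discretized) parameter sample

# Two BIPs sharing the governing model but not the observations:
B_point = np.eye(50)[::5]                 # 10 pointwise observations
B_avg = np.ones((1, 50)) / 50.0           # one spatially averaged observation
y_point = B_point @ surrogate(m) + 0.01 * rng.standard_normal(10)
y_avg = B_avg @ surrogate(m) + 0.05 * rng.standard_normal(1)

print(gaussian_log_likelihood(y_point, B_point, 0.01, m))
print(gaussian_log_likelihood(y_avg, B_avg, 0.05, m))
```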
While the drastic reduction of computational cost is advantageous, the accuracy of trained neural op-
erators as well as the accuracy of the resulting posterior characterization produced by them needs to be
examined. In theory, there are universal approximation results, such as those for DeepONet [38], Fourier
neural operators [35], and reduced basis architectures [33,40], that imply the existence of neural operators
that approximate a given nonlinear map between function spaces within certain classes arbitrarily well. In
practice, however, constructing and training neural operators to satisfy a given accuracy can be challenging
and unreliable. One often observes an empirical accuracy ceiling: enriching the training data and enhancing the representation power of neural operators by increasing the inner-layer dimensions or the depth of the neural networks, as often suggested by universal approximation theories, do not guarantee improved performance. In fact, in certain cases, increasing the training data or the depth of the networks can lead to degraded performance.
These behaviors are contrary to some other approximation methods, such as the finite element method with
hp-refinement and surrogate modeling using polynomial approximation or model order reduction, for which
theoretical results are well connected to numerical implementations for controlling and reducing approximation errors [47–50]. The unreliability of improving neural operator performance via training is a result of several confounding reasons that are discussed in this work. It is demonstrated via empirical studies in the recent work by de Hoop et al. [51], which reports neural operator performance, measured by cost–accuracy trade-offs, in approximating the parameter-to-state maps of various nonlinear parametric PDEs.
The approximation error of a trained neural operator in the operator learning problem propagates to
the error in the solutions of Bayesian inverse problems when the trained neural operator is employed as a
surrogate. We demonstrate, through deriving an a priori bound, that the approximation error of a trained
neural operator controls the error in the posterior distributions defined using the trained neural operator.
Additionally, the bounding constant shows that Bayesian inverse problems can be ill-conditioned to the
approximation error of neural operators in many scenarios, such as when the prior is uninformative, data is
high-dimensional, noise corruption is small, or the models are inadequate. Our theoretical result suggests
that for many challenging Bayesian inverse problems, posing accuracy requirements on their solutions may lead to significantly tighter accuracy requirements on the trained neural operators, requirements that are practically unattainable due to the limitations of neural operator training.
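Schematically (the full statement and constants are given in Theorem 1 and Appendix A; the notation below is illustrative rather than the paper's), such a priori bounds take the form
$$
d\big(\mu^{y},\, \mu^{y}_{\mathrm{NN}}\big) \;\le\; C\,\big\|\mathcal{F} - \mathcal{F}_{\mathrm{NN}}\big\|_{L^{2}_{\mu_{0}}},
$$
where $d(\cdot,\cdot)$ is a distance between the posterior defined by the true forward map $\mathcal{F}$ and the one defined by the neural operator $\mathcal{F}_{\mathrm{NN}}$, the operator error is measured in expectation over the prior $\mu_0$, and the constant $C$ depends on the prior, the noise model, and the data. The ill-conditioning described above corresponds to regimes in which this constant becomes very large.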
In this work, we consider a strategy for reliably deploying a trained neural operator as a surrogate in,
but not limited to, infinite-dimensional Bayesian inverse problems. This strategy is inspired by a recent
work by Jha and Oden [52] on extending the goal-oriented a posteriori error estimation techniques [53–59]
to accelerate Bayesian calibration of high-fidelity models with a calibrated low-fidelity model. Instead of
directly using the prediction of the trained neural operator at a given parameter for likelihood evaluation,
we first solve a linear error correction problem based on the PDE residual evaluated at the neural oper-
ator prediction and then use the obtained solution for likelihood evaluation. We show that solving this
error-correction problem is equivalent to generating one Newton iteration under some mild conditions, and
a trained neural operator with error correction can achieve global, i.e., over the prior distribution, quadratic
error reduction when the approximation error of the trained neural operator is relatively small. We expect
that the significant accuracy improvement of a trained neural operator from the error correction leads to
a correspondingly significant improvement in the accuracy of the posterior characterization for challenging Bayesian inverse problems. This accuracy improvement is achieved while retaining substantial computational speedups, proportional to the expected number of iterative linear solves within a nonlinear PDE solve at parameters sampled from the posterior distribution.
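Schematically (the precise formulation is given in Section 4), given a parameter $m$, a neural operator prediction $\tilde{u}(m)$, and the PDE residual operator $\mathcal{R}$, the correction solves the linearized problem
$$
\text{find } \delta u \in \mathcal{U}_0 \text{ such that } \quad \partial_u \mathcal{R}\big(\tilde{u}(m), m\big)\,\delta u \;=\; -\,\mathcal{R}\big(\tilde{u}(m), m\big) \quad \text{in } \mathcal{U}_0',
$$
and uses the corrected state $\tilde{u}(m) + \delta u$ in place of $\tilde{u}(m)$ for likelihood evaluation; this is one Newton step for the residual equation, initialized at the neural operator prediction.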
To showcase the utility of the proposed strategy, two numerical examples are provided. In the first
example, we consider the inference of an uncertain coefficient field in an equilibrium nonlinear reaction–
diffusion problem with a cubic reaction term from discrete observations of the state. The second example
concerns the inference of Young’s modulus, as a spatially varying field, of a hyperelastic material from
discrete observations of its displacement in response to an external force. For both examples, trained neural
operators, despite reaching their empirical accuracy ceilings, fail to recover all distinctive features of the
posterior predictive means, whereas the error-corrected neural operators are consistently successful in such
tasks.
1.1. Related works
We next discuss some of the related works on error correction in surrogate modeling approaches for
Bayesian inverse problems. To the best of our knowledge, the existing works mainly focus on building data-
driven models of the approximation error of surrogate parameter-to-observation maps. The sampling-based
techniques for error correction presented in these works are different from the residual-based approach pro-
posed in this work. The term model error correction sometimes refers to numerical methods for representing
model inadequacy, which is beyond the scope of this work.
In the context of model order reduction, Arridge et al. [60] proposed an offline sampling approach for
constructing a normal approximation for the joint probability distribution of the error in surrogate-predicted
observations and the parameter over the prior distribution. The probability distribution of the error con-
ditioned on the parameter can then be directly used for correcting likelihood evaluations defined using an
additive Gaussian noise model. This approach simplifies the conditional dependence of the error on the
parameter, leading to unreliable performance, as pointed out by Manzoni et al. [61], who proposed two al-
ternative error models: one based on radial basis interpolation and the other on linear regression models.
Cui et al. [62] presented two methods for adaptively constructing error models during posterior sampling using delayed-acceptance Metropolis–Hastings: one is similar to that of Arridge et al. but with posterior
samples, and the other is a zeroth order error correction using the error evaluated at the current Markov
chain position.
Additionally, correcting errors in neural network surrogates is explored by Yan and Zhou [63] for large-
scale Bayesian inverse problems. They propose a strategy based on a predictor–corrector scheme using two
neural networks. The predictor is a deep neural network surrogate of the parameter-to-observable map
constructed offline. The corrector is a shallow neural network that takes the prediction of the surrogate as
input and produces a corrected prediction. The corrector is trained using a few model simulations produced
during posterior characterization.
1.2. Layout of the paper
The layout of the paper is as follows. In Section 2, infinite-dimensional Bayesian inverse problems and
their numerical solutions are introduced in an abstract Hilbert space setting. In Section 3, the operator
learning problem associated with neural operator approximation of nonlinear mappings in function spaces
is introduced. The sources and reduction of approximation errors in neural network training are discussed.
A result on a priori bound of the error in the posterior distributions of the Bayesian inverse problem
using the operator learning error is provided and interpreted. In Section 4, we introduce the residual-
based error correction problem and discuss its conditional equivalence to a Newton-step problem. Then the
error-corrected neural operator is proposed, and computational cost analysis for its use as a surrogate for
posterior sampling is provided. Connections of the error-correction problem to goal-oriented a posteriori
error estimation techniques are also taken up in the same section. In Section 5, the physical, mathematical,
and numerical settings for the two numerical examples of infinite-dimensional Bayesian inverse problems are
provided. The empirical accuracy of neural operators and error-corrected neural operators at different sizes
of training data is presented. Posterior mean estimates generated by the model, trained neural operators,
and neural operators with error correction are visualized and examined to understand the accuracy of
posterior sampling. The results of empirical and asymptotic cost analysis for the posterior sampling are also
showcased. The concluding remarks are given in Section 6.
2. Preliminaries
In this section, we introduce infinite-dimensional Bayesian inverse problems in an abstract Hilbert space
setting. We refer to [17,64,65] and references therein for a more detailed analysis and numerical imple-
mentation of infinite-dimensional Bayesian inverse problems. For general treatments of Bayesian inference
problems, see [66,67]. For a reference on the theory of probability in infinite-dimensional Hilbert spaces,
see [68].
2.1. Models governed by parametric partial differential equations
Consider a mathematical model that predicts the state $u \in \mathcal{U}$ of a physical system given a parameter $m \in \mathcal{M}$. We assume that the model is governed by partial differential equations (PDEs), and that $\mathcal{U}$ and $\mathcal{M}$ are infinite-dimensional separable real Hilbert spaces endowed with inner products $(\cdot,\cdot)_{\mathcal{U}}$ and $(\cdot,\cdot)_{\mathcal{M}}$, respectively. The state space $\mathcal{U}$ is a Sobolev space defined over a bounded, open, and sufficiently regular spatial domain $\Omega_u \subset \mathbb{R}^3$. It either consists of functions with ranges in a vector space of dimension $d_s \leq 3$, such as $H^1(\Omega_u; \mathbb{R}^{d_s})$, or of time-evolving functions, such as $L^2(0, T; H^1(\Omega_u; \mathbb{R}^{d_s}))$ with $T > 0$. The former is appropriate for boundary value problems (BVPs), while the latter is appropriate for initial and boundary value problems (IBVPs). We assume $\mathcal{M}$ consists of spatially varying scalar-valued functions defined over a set $\Omega_m \subseteq \Omega_u$. The parameter $m$ may appear in boundary conditions, initial conditions, forcing terms, or coefficients of the PDEs.
We specify the model as an abstract nonlinear variational problem as follows. Let $\mathcal{U}_0 \subseteq \mathcal{U}$ be a closed subspace that satisfies the homogenized strongly-enforced boundary and initial conditions of the PDEs. Let the solution set $\mathcal{V}_u \subseteq \mathcal{U}$ be an affine space of $\mathcal{U}_0$ that satisfies the strongly-enforced boundary conditions and initial conditions that possibly depend on $m$. The abstract nonlinear variational problem can be written as

Given $m \in \mathcal{M}$, find $u \in \mathcal{V}_u$ such that $\mathcal{R}(u, m) = 0 \in \mathcal{U}_0'$,   (1)

where $\mathcal{R} = \mathcal{R}(u, m)$ is a residual operator associated with the variational form, and $\mathcal{U}_0'$ is the dual space of the space of test functions $\mathcal{U}_0$. We assume that the residual operator is possibly nonlinear with respect to both the parameter and the state, and that the nonlinear variational problem has a unique solution for any $m \in \mathcal{M}$. As a result, we can define a solution operator $\mathcal{F}: \mathcal{M} \to \mathcal{V}_u$, or the forward operator, of the model, i.e.,

$\mathcal{R}(\mathcal{F}(m), m) = 0 \quad \forall m \in \mathcal{M}.$   (2)
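To make the abstract problem (1)–(2) and the residual-based correction described in the introduction concrete, below is a minimal, self-contained sketch assuming a 1D finite-difference discretization of a cubic reaction–diffusion problem, loosely modeled on the first numerical example. It is illustrative only, not the paper's implementation (the paper uses finite element discretizations and the software described in Section 5.2), and the "neural operator prediction" is replaced by an artificially perturbed solution.

```python
import numpy as np

# Toy 1D finite-difference discretization of a cubic reaction-diffusion problem:
#   -(exp(m) u')' + u^3 = f  on (0, 1),  u(0) = u(1) = 0.
n = 200                                   # number of grid cells
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)

m = 0.5 * np.sin(2 * np.pi * x)           # a fixed parameter sample m(x)
f = 10.0 * np.ones(n - 1)                 # forcing at interior nodes
kappa = np.exp(0.5 * (m[:-1] + m[1:]))    # exp(m) at cell midpoints

def residual(u_int):
    """Discrete residual R(u, m) at interior nodes (u_int excludes boundary values)."""
    u = np.concatenate(([0.0], u_int, [0.0]))       # enforce u(0) = u(1) = 0
    flux = kappa * (u[1:] - u[:-1]) / h             # exp(m) u' at cell midpoints
    return -(flux[1:] - flux[:-1]) / h + u_int**3 - f

def jacobian(u_int):
    """Jacobian dR/du of the discrete residual (tridiagonal, assembled densely)."""
    J = np.zeros((n - 1, n - 1))
    np.fill_diagonal(J, (kappa[:-1] + kappa[1:]) / h**2 + 3.0 * u_int**2)
    off = -kappa[1:-1] / h**2
    J += np.diag(off, 1) + np.diag(off, -1)
    return J

# Forward operator F(m): solve R(u, m) = 0 with Newton's method.
u = np.zeros(n - 1)
for _ in range(20):
    du = np.linalg.solve(jacobian(u), -residual(u))
    u += du
    if np.linalg.norm(du) < 1e-12:
        break

# Pretend a trained neural operator returned a prediction with roughly 5% error.
rng = np.random.default_rng(0)
u_nn = u * (1.0 + 0.05 * rng.standard_normal(u.shape))

# Residual-based error correction: one linear solve at the predicted state.
u_corr = u_nn + np.linalg.solve(jacobian(u_nn), -residual(u_nn))

err = lambda v: np.linalg.norm(v - u) / np.linalg.norm(u)
print(f"relative error before correction: {err(u_nn):.2e}")
print(f"relative error after  correction: {err(u_corr):.2e}")
```

For this mildly nonlinear toy problem, the corrected state is typically several orders of magnitude more accurate than the perturbed prediction, in line with the quadratic error reduction discussed in Section 4, while costing only a single linear solve per parameter sample.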