NONLINEAR RECONSTRUCTION FOR OPERATOR LEARNING OF
PDES WITH DISCONTINUITIES
A PREPRINT
Samuel Lanthaler
Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
slanth@caltech.edu
Roberto Molinaro
Seminar for Applied Mathematics
ETH Zurich
Zurich, Switzerland
roberto.molinaro@math.ethz.ch
Patrik Hadorn
Seminar for Applied Mathematics
ETH Zurich
Zurich, Switzerland
Siddhartha Mishra
Seminar for Applied Mathematics
ETH Zurich
Zurich, Switzerland
siddhartha.mishra@math.ethz.ch
October 4, 2022
ABSTRACT
A large class of hyperbolic and advection-dominated PDEs can have solutions with discontinuities.
This paper investigates, both theoretically and empirically, the operator learning of PDEs with
discontinuous solutions. We rigorously prove, in terms of lower approximation bounds, that methods
which entail a linear reconstruction step (e.g. DeepONet or PCA-Net) fail to efficiently approximate
the solution operator of such PDEs. In contrast, we show that certain methods employing a non-
linear reconstruction mechanism can overcome these fundamental lower bounds and approximate
the underlying operator efficiently. The latter class includes Fourier Neural Operators and a novel
extension of DeepONet termed shift-DeepONet. Our theoretical findings are confirmed by empirical
results for the advection equation, the inviscid Burgers' equation and the compressible Euler equations of aerodynamics.
1 Introduction
Many interesting phenomena in physics and engineering are described by partial differential equations (PDEs) with
discontinuous solutions. The most common types of such PDEs are nonlinear hyperbolic systems of conservation laws
(Dafermos, 2005), such as the Euler equations of aerodynamics, the shallow-water equations of oceanography and
MHD equations of plasma physics. It is well-known that solutions of these PDEs develop finite-time discontinuities
such as shock waves, even when the initial and boundary data are smooth. Other examples include the propagation of
waves with jumps in linear transport and wave equations, crack and fracture propagation in materials (Sun & Jin, 2012),
moving interfaces in multiphase flows (Drew & Passman, 1998) and motion of very sharp gradients as propagating
fronts and traveling wave solutions for reaction-diffusion equations (Smoller, 2012). Approximating such (propagating)
discontinuities in PDEs is considered to be extremely challenging for traditional numerical methods (Hesthaven, 2018)
as resolving them could require very small grid sizes. Although bespoke numerical methods such as high-resolution
finite-volume methods, discontinuous Galerkin finite-element and spectral viscosity methods (Hesthaven, 2018) have
successfully been used in this context, their very high computational cost prohibits their extensive use, particularly for
many-query problems such as uncertainty quantification (UQ), optimal control and (Bayesian) inverse problems (Lye et al., 2020), necessitating the
design of fast machine learning-based surrogates.
As the task at hand in this context is to learn the underlying solution operator that maps input functions (initial and
boundary data) to output functions (solution at a given time), recently developed operator learning methods can be
employed in this infinite-dimensional setting (Higgins, 2021). These methods include operator networks (Chen &
Chen, 1995) and their deep version, DeepONet (Lu et al., 2019, 2021), where two sets of neural networks (branch and
trunk nets) are combined in a linear reconstruction procedure to obtain an infinite-dimensional output. DeepONets
have been very successfully used for different PDEs (Lu et al., 2021; Mao et al., 2020b; Cai et al., 2021; Lin et al.,
2021). An alternative framework is provided by neural operators (Kovachki et al., 2021a), wherein the affine functions
within DNN hidden layers are generalized to infinite-dimensions by replacing them with kernel integral operators as in
(Li et al., 2020a; Kovachki et al., 2021a; Li et al., 2020b). A computationally efficient form of neural operators is the
Fourier Neural Operator (FNO) (Li et al., 2021a), where a translation invariant kernel is evaluated in Fourier space,
leading to many successful applications for PDEs (Li et al., 2021a,b; Pathak et al., 2022).
Currently available theoretical results for operator learning (e.g. Lanthaler et al. (2022); Kovachki et al. (2021a,b);
De Ryck & Mishra (2022b); Deng et al. (2022)) leverage the regularity (or smoothness) of solutions of the PDE to
prove that frameworks such as DeepONet, FNO and their variants approximate the underlying operator efficiently.
Although such regularity holds for many elliptic and parabolic PDEs, it is obviously destroyed when discontinuities
appear in the solutions of the PDEs such as in the hyperbolic PDEs mentioned above. Thus, a priori, it is unclear if
existing operator learning frameworks can efficiently approximate PDEs with discontinuous solutions. This explains the
paucity of theoretical and (to a lesser extent) empirical work on operator learning of PDEs with discontinuous solutions
and provides the rationale for the current paper where,
• using a lower bound, we rigorously prove approximation error estimates to show that operator learning architectures such as DeepONet (Lu et al., 2021) and PCA-Net (Bhattacharya et al., 2021), which entail a linear reconstruction step, fail to efficiently approximate solution operators of prototypical PDEs with discontinuities. In particular, the approximation error only decays, at best, linearly in network size.
• We rigorously prove that using a nonlinear reconstruction procedure within an operator learning architecture can lead to the efficient approximation of prototypical PDEs with discontinuities. In particular, the approximation error can decay exponentially in network size, even after discontinuity formation. This result is shown for two types of architectures with nonlinear reconstruction, namely the widely used Fourier Neural Operator (FNO) of (Li et al., 2021a) and a novel variant of DeepONet that we term shift-DeepONet.
• We supplement the theoretical results with extensive experiments where FNO and shift-DeepONet are shown to consistently outperform DeepONet and other baselines for PDEs with discontinuous solutions such as linear advection, the inviscid Burgers' equation, and both the one- and two-dimensional versions of the compressible Euler equations of gas dynamics.
2 Methods
Setting. Given compact domains $D \subset \mathbb{R}^d$ and $U \subset \mathbb{R}^{d'}$, we consider the approximation of operators $\mathcal{G} : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} \subset L^2(D)$ and $\mathcal{Y} \subset L^2(U)$ are the input and output function spaces. In the following, we will focus on the case where $\bar{u} \mapsto \mathcal{G}(\bar{u})$ maps initial data $\bar{u}$ to the solution, at some time $t > 0$, of an underlying time-dependent PDE. We assume the input $\bar{u}$ to be sampled from a probability measure $\mu \in \mathrm{Prob}(\mathcal{X})$.
DeepONet. DeepONet (Lu et al., 2021) will be our prototype for operator learning frameworks with linear reconstruction. To define them, let $x := (x_1, \dots, x_m) \in D$ be a fixed set of sensor points. Given an input function $\bar{u} \in \mathcal{X}$, we encode it by the point values $\mathcal{E}(\bar{u}) = (\bar{u}(x_1), \dots, \bar{u}(x_m)) \in \mathbb{R}^m$. DeepONet is formulated in terms of two neural networks: the first is the branch-net $\beta$, which maps the point values $\mathcal{E}(\bar{u})$ to coefficients, resulting in a mapping
\[
\beta : \mathbb{R}^m \to \mathbb{R}^p, \quad \mathcal{E}(\bar{u}) \mapsto (\beta_1(\mathcal{E}(\bar{u})), \dots, \beta_p(\mathcal{E}(\bar{u}))). \tag{2.1}
\]
The second neural network is the so-called trunk-net $\tau(y) = (\tau_1(y), \dots, \tau_p(y))$, which is used to define a mapping
\[
\tau : U \to \mathbb{R}^p, \quad y \mapsto (\tau_1(y), \dots, \tau_p(y)). \tag{2.2}
\]
While the branch net provides the coefficients, the trunk net provides the "basis" functions in an expansion of the output function of the form
\[
\mathcal{N}_{\mathrm{DON}}(\bar{u})(y) = \sum_{k=1}^{p} \beta_k(\bar{u})\, \tau_k(y), \qquad \bar{u} \in \mathcal{X},\ y \in U, \tag{2.3}
\]
with $\beta_k(\bar{u}) = \beta_k(\mathcal{E}(\bar{u}))$. The resulting mapping $\mathcal{N}_{\mathrm{DON}} : \mathcal{X} \to \mathcal{Y}$, $\bar{u} \mapsto \mathcal{N}_{\mathrm{DON}}(\bar{u})$, is a DeepONet.
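For concreteness, the linear reconstruction (2.1)–(2.3) can be written down in a few lines. The following is a minimal PyTorch sketch, not the implementation used in the paper; the hidden widths, depths and ReLU activations are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal DeepONet (2.3): N(u)(y) = sum_k beta_k(E(u)) * tau_k(y)."""

    def __init__(self, m: int, p: int, d_y: int = 1, width: int = 128):
        super().__init__()
        # Branch net beta: R^m -> R^p, acting on the sensor values E(u), cf. (2.1).
        self.branch = nn.Sequential(
            nn.Linear(m, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, p),
        )
        # Trunk net tau: U (subset of R^{d_y}) -> R^p, acting on query points y, cf. (2.2).
        self.trunk = nn.Sequential(
            nn.Linear(d_y, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, p),
        )

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, m) point values E(u); y: (n_query, d_y) query locations.
        beta = self.branch(u_sensors)     # (batch, p) coefficients
        tau = self.trunk(y)               # (n_query, p) "basis" functions
        # Linear reconstruction: the output lies in span{tau_1, ..., tau_p}.
        return beta @ tau.T               # (batch, n_query)
```

For a one-dimensional output domain, a call such as `DeepONet(m=128, p=64)(u_sensors, y)` returns the expansion (2.3) evaluated at all query points `y` for each input in the batch.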
Although DeepONets were shown to be universal in the class of measurable operators (Lanthaler et al., 2022), the following fundamental lower bound on the approximation error was also established.
Proposition 2.1 (Lanthaler et al. (2022, Thm. 3.4)). Let $\mathcal{X}$ be a separable Banach space, $\mathcal{Y}$ a separable Hilbert space, and let $\mu$ be a probability measure on $\mathcal{X}$. Let $\mathcal{G} : \mathcal{X} \to \mathcal{Y}$ be a Borel measurable operator with $\mathbb{E}_{\bar{u} \sim \mu}[\|\mathcal{G}(\bar{u})\|_{\mathcal{Y}}^2] < \infty$. Then the following lower approximation bound holds for any DeepONet $\mathcal{N}_{\mathrm{DON}}$ with trunk-/branch-net dimension $p$:
\[
\mathcal{E}(\mathcal{N}_{\mathrm{DON}}) = \Big( \mathbb{E}_{\bar{u} \sim \mu} \|\mathcal{N}_{\mathrm{DON}}(\bar{u}) - \mathcal{G}(\bar{u})\|_{\mathcal{Y}}^2 \Big)^{1/2} \ \ge\ \mathcal{E}_{\mathrm{opt}} := \sqrt{\sum_{j > p} \lambda_j}, \tag{2.4}
\]
where the optimal error $\mathcal{E}_{\mathrm{opt}}$ is written in terms of the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$ of the covariance operator $\Gamma_{\mathcal{G}_\# \mu} := \mathbb{E}_{u \sim \mathcal{G}_\# \mu}[u \otimes u]$ of the push-forward measure $\mathcal{G}_\# \mu$.
We refer to SM A for relevant background on the underlying principal component analysis (PCA) and covariance operators. The same lower bound (2.4) in fact holds for any operator approximation of the form $\mathcal{N}(\bar{u}) = \sum_{k=1}^{p} \beta_k(\bar{u}) \tau_k$, where $\beta_k : \mathcal{X} \to \mathbb{R}$ are arbitrary functionals. In particular, this bound continues to hold for e.g. the PCA-Net architecture of Hesthaven & Ubbiali (2018); Bhattacharya et al. (2021). We will refer to any operator learning architecture of this form as a method with "linear reconstruction", since the output function $\mathcal{N}(\bar{u})$ is restricted to the linear $p$-dimensional space spanned by $\tau_1, \dots, \tau_p \in \mathcal{Y}$. In particular, DeepONets are based on linear reconstruction.
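The optimal error $\mathcal{E}_{\mathrm{opt}}$ in (2.4) can be estimated from samples of $\mathcal{G}_\#\mu$ by an empirical PCA. Below is a minimal NumPy sketch; representing functions in $\mathcal{Y}$ by their values on a uniform grid with quadrature weight `dx` is an assumption made purely for illustration:

```python
import numpy as np

def optimal_linear_error(samples: np.ndarray, p: int, dx: float) -> float:
    """Monte-Carlo estimate of E_opt = sqrt(sum_{j > p} lambda_j) in (2.4).

    samples: (n, N) array, n draws of G(u) evaluated on an N-point grid of U;
    dx: quadrature weight approximating the L2(U) inner product.
    """
    n = samples.shape[0]
    # Eigenvalues of the empirical (uncentered) covariance E[u (x) u] via the
    # n x n Gram matrix; its nonzero spectrum matches that of the covariance.
    gram = (samples @ samples.T) * dx / n
    lam = np.sort(np.linalg.eigvalsh(gram))[::-1]   # lambda_1 >= lambda_2 >= ...
    return float(np.sqrt(max(lam[p:].sum(), 0.0)))
```

The slower the eigenvalues $\lambda_j$ decay, the larger $p$ must be to drive this error down; this is precisely the obstruction for transport-dominated problems discussed next.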
shift-DeepONet. The lower bound (2.4) shows that there are fundamental barriers to the expressive power of operator learning methods based on linear reconstruction. This is of particular relevance for problems in which the optimal lower bound $\mathcal{E}_{\mathrm{opt}}$ in (2.4) exhibits a slow decay in terms of the number of basis functions $p$, due to the slow decay of the eigenvalues $\lambda_j$ of the covariance operator. It is well-known that even linear advection- or transport-dominated problems can suffer from such a slow decay of the eigenvalues (Ohlberger & Rave, 2013; Dahmen et al., 2014; Taddei et al., 2015; Peherstorfer, 2020), which could hinder the application of linear-reconstruction-based operator learning methods to this very important class of problems. In view of these observations, it is thus desirable to develop a nonlinear variant of DeepONet which can overcome such a lower bound in the context of transport-dominated problems. We propose such an extension below.
A shift-DeepONet $\mathcal{N}_{\mathrm{sDON}} : \mathcal{X} \to \mathcal{Y}$ is an operator of the form
\[
\mathcal{N}_{\mathrm{sDON}}(\bar{u})(y) = \sum_{k=1}^{p} \beta_k(\bar{u})\, \tau_k\big( A_k(\bar{u}) \cdot y + \gamma_k(\bar{u}) \big), \tag{2.5}
\]
where the input function $\bar{u}$ is encoded by evaluation at the sensor points, $\mathcal{E}(\bar{u}) \in \mathbb{R}^m$. We retain the DeepONet branch- and trunk-nets $\beta$, $\tau$ defined in (2.1), (2.2), respectively, and we have introduced a scale-net $A = (A_k)_{k=1}^{p}$, consisting of matrix-valued functions
\[
A_k : \mathbb{R}^m \to \mathbb{R}^{d' \times d'}, \quad \mathcal{E}(\bar{u}) \mapsto A_k(\bar{u}) := A_k(\mathcal{E}(\bar{u})),
\]
and a shift-net $\gamma = (\gamma_k)_{k=1}^{p}$, with
\[
\gamma_k : \mathbb{R}^m \to \mathbb{R}^{d'}, \quad \mathcal{E}(\bar{u}) \mapsto \gamma_k(\bar{u}) := \gamma_k(\mathcal{E}(\bar{u})).
\]
All components of a shift-DeepONet are represented by deep neural networks, potentially with different activation functions.
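A minimal PyTorch sketch of the forward pass (2.5), written for a one-dimensional output domain ($d' = 1$) so that each $A_k$ reduces to a scalar; the network widths, activations and the $\mathcal{O}(p^2)$ trunk evaluation are illustrative simplifications, not the paper's implementation:

```python
import torch
import torch.nn as nn

def mlp(din: int, dout: int, width: int = 128) -> nn.Sequential:
    """Small fully connected network used for every sub-network below."""
    return nn.Sequential(nn.Linear(din, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, dout))

class ShiftDeepONet(nn.Module):
    """N(u)(y) = sum_k beta_k(u) * tau_k(A_k(u) * y + gamma_k(u)), with d' = 1."""

    def __init__(self, m: int, p: int):
        super().__init__()
        self.branch = mlp(m, p)   # beta:  R^m -> R^p
        self.scale = mlp(m, p)    # A:     R^m -> R^p (one scalar A_k per k, since d' = 1)
        self.shift = mlp(m, p)    # gamma: R^m -> R^p
        self.trunk = mlp(1, p)    # tau:   R   -> R^p

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, m) sensor values E(u); y: (n_query,) query points in U.
        beta = self.branch(u_sensors)                                # (batch, p)
        A = self.scale(u_sensors)                                    # (batch, p)
        gamma = self.shift(u_sensors)                                # (batch, p)
        # Input-dependent trunk argument z[b, q, k] = A_k(u_b) * y_q + gamma_k(u_b).
        z = A[:, None, :] * y[None, :, None] + gamma[:, None, :]    # (batch, nq, p)
        # Evaluate the p-output trunk at every scalar argument and keep the k-th
        # output for the k-th argument (simple but O(p^2); acceptable for a sketch).
        tau = self.trunk(z.unsqueeze(-1))                            # (batch, nq, p, p)
        tau = torch.diagonal(tau, dim1=-2, dim2=-1)                  # (batch, nq, p)
        return torch.einsum('bk,bqk->bq', beta, tau)                 # (batch, nq)
```

The only difference from the DeepONet sketch above is that the trunk argument is rescaled and shifted in an input-dependent way, which is what breaks the linear-reconstruction structure.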
Since shift-DeepONets reduce to DeepONets for the particular choice $A \equiv 1$ and $\gamma \equiv 0$, the universality of DeepONets (Theorem 3.1 of Lanthaler et al. (2022)) is clearly inherited by shift-DeepONets. However, as shift-DeepONets do not use a linear reconstruction (the trunk nets in (2.5) depend on the input through the scale and shift nets), the lower bound (2.4) does not directly apply, leaving room for shift-DeepONet to efficiently approximate transport-dominated problems, especially in the presence of discontinuities.
Fourier neural operators (FNO). An FNO $\mathcal{N}_{\mathrm{FNO}}$ (Li et al., 2021a) is a composition
\[
\mathcal{N}_{\mathrm{FNO}} : \mathcal{X} \to \mathcal{Y}, \quad \mathcal{N}_{\mathrm{FNO}} = Q \circ \mathcal{L}_L \circ \dots \circ \mathcal{L}_1 \circ R, \tag{2.6}
\]
consisting of a "lifting operator" $\bar{u}(x) \mapsto R(\bar{u}(x), x)$, where $R$ is represented by a (shallow) neural network $R : \mathbb{R}^{d_u} \times \mathbb{R}^{d} \to \mathbb{R}^{d_v}$, with $d_u$ the number of components of the input function, $d$ the dimension of the domain and $d_v$ the "lifting dimension" (a hyperparameter), followed by $L$ hidden layers $\mathcal{L}_\ell : v_\ell(x) \mapsto v_{\ell+1}(x)$ of the form
\[
v_{\ell+1}(x) = \sigma\big( W_\ell \cdot v_\ell(x) + b_\ell(x) + (\mathcal{K}_\ell v_\ell)(x) \big),
\]
with $W_\ell \in \mathbb{R}^{d_v \times d_v}$ a weight matrix (residual connection), $x \mapsto b_\ell(x) \in \mathbb{R}^{d_v}$ a bias function, and with a convolution operator $(\mathcal{K}_\ell v_\ell)(x) = \int_{\mathbb{T}^d} \kappa_\ell(x - y)\, v_\ell(y)\, dy$, expressed in terms of a (learnable) integral kernel $x \mapsto \kappa_\ell(x) \in \mathbb{R}^{d_v \times d_v}$. The output function is finally obtained by a linear projection layer $v_{L+1}(x) \mapsto \mathcal{N}_{\mathrm{FNO}}(\bar{u})(x) = Q \cdot v_{L+1}(x)$.
The convolution operators $\mathcal{K}_\ell$ add the indispensable non-local dependence of the output on the input function. Given values on an equidistant Cartesian grid, the evaluation of $\mathcal{K}_\ell v_\ell$ can be efficiently carried out in Fourier space based on the discrete Fourier transform (DFT), leading to a representation
\[
\mathcal{K}_\ell v_\ell = \mathcal{F}_N^{-1}\big( P_\ell(k) \cdot \mathcal{F}_N v_\ell(k) \big),
\]
where $\mathcal{F}_N v_\ell(k)$ denotes the Fourier coefficients of the DFT of $v_\ell(x)$, computed based on the given $N$ grid values in each direction, $P_\ell(k) \in \mathbb{C}^{d_v \times d_v}$ is a complex Fourier multiplication matrix indexed by $k \in \mathbb{Z}^d$, and $\mathcal{F}_N^{-1}$ denotes the inverse DFT. In practice, only a finite number of Fourier modes can be computed, and hence we introduce a hyperparameter $k_{\max} \in \mathbb{N}$, such that the Fourier coefficients of $b_\ell(x)$ as well as the Fourier multipliers, $\hat{b}_\ell(k) \equiv 0$ and $P_\ell(k) \equiv 0$, vanish whenever $|k| > k_{\max}$. In particular, with fixed $k_{\max}$ the DFT and its inverse can be efficiently computed in $\mathcal{O}((k_{\max} N)^d)$ operations (i.e. linear in the total number of grid points). The output space of FNO (2.6) is manifestly non-linear as it is not spanned by a fixed number of basis functions. Hence, FNO constitute a nonlinear reconstruction method.
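A minimal one-dimensional PyTorch sketch of a single hidden layer of the form above, with the spectral convolution implemented via the real FFT and the Fourier multipliers $P_\ell(k)$ truncated at $k_{\max}$; the choice of ReLU for $\sigma$, the constant bias inside the pointwise term and the initialization scale are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FNOLayer1d(nn.Module):
    """One hidden FNO layer v -> sigma(W v + b + K v), with K a spectral convolution."""

    def __init__(self, d_v: int, k_max: int):
        super().__init__()
        self.W = nn.Linear(d_v, d_v)   # pointwise term W_l v + b_l (constant-in-x bias)
        # Complex Fourier multipliers P_l(k) for the retained modes 0 <= k <= k_max.
        self.P = nn.Parameter(torch.randn(k_max + 1, d_v, d_v, dtype=torch.cfloat) / d_v)
        self.k_max = k_max

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, N, d_v), values of v_l on an equidistant periodic grid of size N,
        # assuming k_max + 1 <= N // 2 + 1.
        v_hat = torch.fft.rfft(v, dim=1)                    # (batch, N//2 + 1, d_v)
        out_hat = torch.zeros_like(v_hat)
        k = self.k_max + 1
        # Multiply the retained low modes by P_l(k); all higher modes are zeroed out.
        out_hat[:, :k] = torch.einsum('kij,bkj->bki', self.P, v_hat[:, :k])
        Kv = torch.fft.irfft(out_hat, n=v.shape[1], dim=1)  # back to physical space
        return torch.relu(self.W(v) + Kv)                   # sigma(W v + b + K v)
```

Stacking $L$ such layers between a lifting network $R$ and a projection $Q$ gives the composition (2.6).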
3 Theoretical Results
Context. Our aim in this section is to rigorously prove that the nonlinear reconstruction methods (shift-DeepONet, FNO) efficiently approximate operators stemming from discontinuous solutions of PDEs, whereas linear reconstruction methods (DeepONet, PCA-Net) fail to do so. To this end, we follow standard practice in numerical analysis of PDEs (Hesthaven, 2018) and choose two prototypical PDEs that are widely used to analyze numerical methods for transport-dominated PDEs. These are the linear transport or advection equation and the nonlinear inviscid Burgers' equation, which is the prototypical example for hyperbolic conservation laws. The exact operators and the corresponding approximation results with both linear and nonlinear reconstruction methods are described below. The computational complexity of the models is expressed in terms of hyperparameters such as the model size, which are described in detail in SM B.
Linear Advection Equation. We consider the one-dimensional linear advection equation
\[
\partial_t u + a\, \partial_x u = 0, \qquad u(\cdot, t = 0) = \bar{u}, \tag{3.1}
\]
on a $2\pi$-periodic domain $D = \mathbb{T}$, with constant speed $a \in \mathbb{R}$. The underlying operator is $\mathcal{G}_{\mathrm{adv}} : L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}) \to L^1(\mathbb{T}) \cap L^\infty(\mathbb{T})$, $\bar{u} \mapsto \mathcal{G}_{\mathrm{adv}}(\bar{u}) := u(\cdot, T)$, obtained by solving the PDE (3.1) with initial data $\bar{u}$ up to some final time $t = T$. We note that $\mathcal{X} = L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}) \subset L^2(\mathbb{T})$. As input measure $\mu \in \mathrm{Prob}(\mathcal{X})$, we consider random input functions $\bar{u} \sim \mu$ given by the square (box) wave of height $h$, width $w$ and centered at $\xi$,
\[
\bar{u}(x) = h\, \mathbf{1}_{[-w/2, +w/2]}(x - \xi), \tag{3.2}
\]
where $h \in [\underline{h}, \bar{h}]$, $w \in [\underline{w}, \bar{w}]$ and $\xi \in [0, 2\pi]$ are independent and uniformly distributed. The constants $0 < \underline{h} \le \bar{h}$, $0 < \underline{w} \le \bar{w}$ are fixed.
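Since $\mathcal{G}_{\mathrm{adv}}(\bar{u}) = \bar{u}(\cdot - aT)$ is an exact periodic shift, input–output pairs for this benchmark can be generated directly. A small NumPy sketch follows; the grid size, parameter ranges, speed and final time are illustrative placeholders, not the values used in the experiments:

```python
import numpy as np

def sample_box_waves(n: int, N: int = 512, h_range=(0.5, 1.5), w_range=(0.5, 2.0),
                     a: float = 1.0, T: float = 1.0, seed: int = 0):
    """Draw n box-wave inputs (3.2) on a uniform grid of [0, 2*pi) together with
    the exact advected outputs G_adv(u)(x) = u(x - a*T) (a periodic shift)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
    h = rng.uniform(*h_range, size=(n, 1))
    w = rng.uniform(*w_range, size=(n, 1))
    xi = rng.uniform(0.0, 2 * np.pi, size=(n, 1))

    def box(center):
        # Indicator of a width-w interval around `center`, evaluated periodically.
        d = np.angle(np.exp(1j * (x[None, :] - center)))   # signed periodic distance
        return h * (np.abs(d) <= w / 2)

    u0 = box(xi)             # inputs  u_bar(x), box centered at xi
    uT = box(xi + a * T)     # outputs G_adv(u_bar)(x), box centered at xi + a*T
    return x, u0, uT
```

For instance, `x, u0, uT = sample_box_waves(1024)` yields 1024 input–output pairs on a 512-point grid.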
DeepONet fails at approximating $\mathcal{G}_{\mathrm{adv}}$ efficiently. Our first rigorous result is the following lower bound on the error incurred by DeepONet (2.3) in approximating $\mathcal{G}_{\mathrm{adv}}$.

Theorem 3.1. Let $p, m \in \mathbb{N}$. There exists a constant $C > 0$, independent of $m$ and $p$, such that for any DeepONet $\mathcal{N}_{\mathrm{DON}}$ (2.3), with $\sup_{\bar{u} \sim \mu} \|\mathcal{N}_{\mathrm{DON}}(\bar{u})\|_{L^\infty} \le M < \infty$, we have the lower bound
\[
\mathcal{E} = \mathbb{E}_{\bar{u} \sim \mu}\big[ \|\mathcal{G}_{\mathrm{adv}}(\bar{u}) - \mathcal{N}_{\mathrm{DON}}(\bar{u})\|_{L^1} \big] \ \ge\ \frac{C}{\min(m, p)}.
\]
Consequently, to achieve $\mathcal{E}(\mathcal{N}_{\mathrm{DON}}) \le \epsilon$ with DeepONet, we need $p, m \gtrsim \epsilon^{-1}$ trunk and branch net basis functions and sensor points, respectively, entailing that $\mathrm{size}(\mathcal{N}_{\mathrm{DON}}) \gtrsim pm \gtrsim \epsilon^{-2}$ (cp. SM B).
The detailed proof is presented in SM C.2. It relies on two facts. First, following Lanthaler et al. (2022), one observes that translation invariance of the problem implies that the Fourier basis is optimal for spanning the output space. As the underlying functions are discontinuous, the corresponding eigenvalues of the covariance operator for the push-forward measure decay, at most, quadratically in $p$. Consequently, the lower bound (2.4) leads to a linear decay of the error in terms of the number of trunk net basis functions. Second, roughly speaking, the linear decay of the error in terms of sensor points is a consequence of the fact that one needs a sufficient number of sensor points to resolve the underlying discontinuous inputs.
Shift-DeepONet approximates $\mathcal{G}_{\mathrm{adv}}$ efficiently. Next, and in contrast to the previous result on DeepONet, we have the following efficient approximation result for shift-DeepONet (2.5).

Theorem 3.2. There exists a constant $C > 0$, such that for any $\epsilon > 0$, there exists a shift-DeepONet $\mathcal{N}_{\mathrm{sDON}}^{\epsilon}$ (2.5) such that
\[
\mathcal{E} = \mathbb{E}_{\bar{u} \sim \mu}\big[ \|\mathcal{G}_{\mathrm{adv}}(\bar{u}) - \mathcal{N}_{\mathrm{sDON}}^{\epsilon}(\bar{u})\|_{L^1} \big] \le \epsilon, \tag{3.3}
\]
with uniformly bounded $p \le C$, and with the number of sensor points $m \le C \epsilon^{-1}$. Furthermore, we have
\[
\mathrm{width}(\mathcal{N}_{\mathrm{sDON}}^{\epsilon}) \le C, \quad \mathrm{depth}(\mathcal{N}_{\mathrm{sDON}}^{\epsilon}) \le C \log(\epsilon^{-1})^2, \quad \mathrm{size}(\mathcal{N}_{\mathrm{sDON}}^{\epsilon}) \le C \epsilon^{-1}.
\]
The detailed proof, presented in SM C.3, is based on the fact that for each input, the exact solution can be completely determined in terms of three variables, i.e., the height $h$, width $w$ and shift $\xi$ of the box wave (3.2). Given an input $\bar{u}$, we explicitly construct neural networks for inferring each of these variables with high accuracy. These neural networks are then combined together to yield a shift-DeepONet that approximates $\mathcal{G}_{\mathrm{adv}}$ with the desired complexity. The nonlinear dependence of the trunk net in shift-DeepONet (2.5) on the input is the key to encoding the shift in the box wave (3.2), and this demonstrates the necessity of nonlinear reconstruction in this context.
FNO approximates $\mathcal{G}_{\mathrm{adv}}$ efficiently. Finally, we state an efficient approximation result for $\mathcal{G}_{\mathrm{adv}}$ with FNO (2.6) below.

Theorem 3.3. For any $\epsilon > 0$, there exists an FNO $\mathcal{N}_{\mathrm{FNO}}^{\epsilon}$ (2.6), such that
\[
\mathbb{E}_{\bar{u} \sim \mu}\big[ \|\mathcal{G}_{\mathrm{adv}}(\bar{u}) - \mathcal{N}_{\mathrm{FNO}}^{\epsilon}(\bar{u})\|_{L^1} \big] \le \epsilon,
\]
with grid size $N \le C \epsilon^{-1}$, and with Fourier cut-off $k_{\max}$, lifting dimension $d_v$, depth and size:
\[
k_{\max} = 1, \quad d_v \le C, \quad \mathrm{depth}(\mathcal{N}_{\mathrm{FNO}}^{\epsilon}) \le C \log(\epsilon^{-1})^2, \quad \mathrm{size}(\mathcal{N}_{\mathrm{FNO}}^{\epsilon}) \le C \log(\epsilon^{-1})^2.
\]
A priori, one recognizes that $\mathcal{G}_{\mathrm{adv}}$ can be represented by Fourier multipliers (see SM C.4). Consequently, a single linear FNO layer would in principle suffice to approximate $\mathcal{G}_{\mathrm{adv}}$. However, the size of this FNO would be exponentially larger than the bound in Theorem 3.3. To obtain a more efficient approximation, one needs to leverage the nonlinear reconstruction within FNO layers. This is provided in the proof, presented in SM C.4, where the underlying height, width and shift of the box-wave inputs (3.2) are approximated with high accuracy by FNO layers. These are then combined with a novel representation formula for the solution to yield the desired FNO.
Comparison. Observing the complexity bounds in Theorems 3.1, 3.2 and 3.3, we note that the DeepONet size scales at least quadratically, $\mathrm{size} \gtrsim \epsilon^{-2}$, in terms of the error in approximating $\mathcal{G}_{\mathrm{adv}}$, whereas for shift-DeepONet and FNO, this scaling is only linear and logarithmic, respectively. Thus, we rigorously prove that for this problem, the nonlinear reconstruction methods (FNO and shift-DeepONet) can be more efficient than DeepONet and other methods based on linear reconstruction. Moreover, FNO is shown to have a smaller approximation error than even shift-DeepONet for a similar model size.
Inviscid Burgers' equation. Next, we consider the inviscid Burgers' equation in one space dimension, which is considered the prototypical example of nonlinear hyperbolic conservation laws (Dafermos, 2005):
\[
\partial_t u + \partial_x\left( \tfrac{1}{2} u^2 \right) = 0, \qquad u(\cdot, t = 0) = \bar{u}, \tag{3.4}
\]
on the $2\pi$-periodic domain $D = \mathbb{T}$. It is well-known that discontinuities in the form of shock waves can appear in finite time even for smooth $\bar{u}$. Consequently, solutions of (3.4) are interpreted in the sense of distributions and entropy conditions are imposed to ensure uniqueness (Dafermos, 2005). Thus, the underlying solution operator is $\mathcal{G}_{\mathrm{Burg}} : L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}) \to L^1(\mathbb{T}) \cap L^\infty(\mathbb{T})$, $\bar{u} \mapsto \mathcal{G}_{\mathrm{Burg}}(\bar{u}) := u(\cdot, T)$, with $u$ being the entropy solution of (3.4) at final time $T$. Given $\xi \sim \mathrm{Unif}([0, 2\pi])$, we define the random field
\[
\bar{u}(x) := \sin(x - \xi), \tag{3.5}
\]
and we define the input measure $\mu \in \mathrm{Prob}(L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}))$ as the law of $\bar{u}$. We emphasize that the difficulty in approximating the underlying operator $\mathcal{G}_{\mathrm{Burg}}$ arises even though the input functions are smooth, in fact analytic. This is in contrast to the linear advection equation.
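Reference input–output pairs for $\mathcal{G}_{\mathrm{Burg}}$ can be generated numerically; below is a minimal first-order Godunov finite-volume sketch for the entropy solution of (3.4) with initial data (3.5). The resolution and CFL number are illustrative assumptions, and the paper's training data may well be produced by a different (e.g. higher-order) solver:

```python
import numpy as np

def burgers_entropy_solution(xi: float, T: float = 1.0, N: int = 1024, cfl: float = 0.45):
    """Approximate the entropy solution of (3.4) at time T for u0(x) = sin(x - xi),
    using a first-order Godunov finite-volume scheme on the 2*pi-periodic domain."""
    dx = 2 * np.pi / N
    x = (np.arange(N) + 0.5) * dx                 # cell centers
    u = np.sin(x - xi)                            # initial data (3.5)

    def godunov_flux(ul, ur):
        # Exact Riemann flux for the convex flux f(u) = u^2 / 2.
        fl, fr = 0.5 * ul**2, 0.5 * ur**2
        return np.where(ul > ur, np.maximum(fl, fr),       # shock: max over [ur, ul]
                        np.where(ul > 0, fl,                # rarefaction moving right
                                 np.where(ur < 0, fr, 0.0)))  # moving left / sonic point

    t = 0.0
    while t < T:
        dt = min(cfl * dx / max(np.abs(u).max(), 1e-12), T - t)
        F = godunov_flux(u, np.roll(u, -1))       # flux at interface i + 1/2
        u = u - dt / dx * (F - np.roll(F, 1))     # conservative update
        t += dt
    return x, u
```

Sampling $\xi \sim \mathrm{Unif}([0, 2\pi])$ and calling this routine once per sample gives pairs $(\bar{u}, \mathcal{G}_{\mathrm{Burg}}(\bar{u}))$ on the grid, shocks included.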