NONLINEAR RECONSTRUCTION FOR OPERATOR LEARNING OF
PDES WITH DISCONTINUITIES
A PREPRINT
Samuel Lanthaler
Computing and Mathematical Sciences
California Institute of Technology
Pasadena, CA, USA
slanth@caltech.edu
Roberto Molinaro
Seminar for Applied Mathematics
ETH Zurich
Zurich, Switzerland
roberto.molinaro@math.ethz.ch
Patrik Hadorn
Seminar for Applied Mathematics
ETH Zurich
Zurich, Switzerland
Siddhartha Mishra
Seminar for Applied Mathematics
ETH Zurich
Zurich, Switzerland
siddhartha.mishra@math.ethz.ch
October 4, 2022
ABSTRACT
A large class of hyperbolic and advection-dominated PDEs can have solutions with discontinuities.
This paper investigates, both theoretically and empirically, the operator learning of PDEs with
discontinuous solutions. We rigorously prove, in terms of lower approximation bounds, that methods
which entail a linear reconstruction step (e.g. DeepONet or PCA-Net) fail to efficiently approximate
the solution operator of such PDEs. In contrast, we show that certain methods employing a non-
linear reconstruction mechanism can overcome these fundamental lower bounds and approximate
the underlying operator efficiently. The latter class includes Fourier Neural Operators and a novel
extension of DeepONet termed shift-DeepONet. Our theoretical findings are confirmed by empirical
results for the advection equation, the inviscid Burgers' equation and the compressible Euler equations of aerodynamics.
1 Introduction
Many interesting phenomena in physics and engineering are described by partial differential equations (PDEs) with
discontinuous solutions. The most common types of such PDEs are nonlinear hyperbolic systems of conservation laws
(Dafermos, 2005), such as the Euler equations of aerodynamics, the shallow-water equations of oceanography and
MHD equations of plasma physics. It is well-known that solutions of these PDEs develop finite-time discontinuities
such as shock waves, even when the initial and boundary data are smooth. Other examples include the propagation of
waves with jumps in linear transport and wave equations, crack and fracture propagation in materials (Sun & Jin, 2012),
moving interfaces in multiphase flows (Drew & Passman, 1998) and motion of very sharp gradients as propagating
fronts and traveling wave solutions for reaction-diffusion equations (Smoller, 2012). Approximating such (propagating)
discontinuities in PDEs is considered to be extremely challenging for traditional numerical methods (Hesthaven, 2018)
as resolving them could require very small grid sizes. Although bespoke numerical methods such as high-resolution
finite-volume methods, discontinuous Galerkin finite-element and spectral viscosity methods (Hesthaven, 2018) have
successfully been used in this context, their very high computational cost prohibits their extensive use, particularly for
many-query problems such as uncertainty quantification (UQ), optimal control and (Bayesian) inverse problems (Lye et al., 2020), necessitating the
design of fast machine learning-based surrogates.
As the task at hand in this context is to learn the underlying solution operator that maps input functions (initial and
boundary data) to output functions (solution at a given time), recently developed operator learning methods can be
employed in this infinite-dimensional setting (Higgins, 2021). These methods include operator networks (Chen &
Chen, 1995) and their deep version, DeepONet (Lu et al., 2019, 2021), where two sets of neural networks (branch and
trunk nets) are combined in a linear reconstruction procedure to obtain an infinite-dimensional output. DeepONets
have been very successfully used for different PDEs (Lu et al., 2021; Mao et al., 2020b; Cai et al., 2021; Lin et al.,
2021). An alternative framework is provided by neural operators (Kovachki et al., 2021a), wherein the affine functions
within DNN hidden layers are generalized to infinite-dimensions by replacing them with kernel integral operators as in
(Li et al., 2020a; Kovachki et al., 2021a; Li et al., 2020b). A computationally efficient form of neural operators is the
Fourier Neural Operator (FNO) (Li et al., 2021a), where a translation invariant kernel is evaluated in Fourier space,
leading to many successful applications for PDEs (Li et al., 2021a,b; Pathak et al., 2022).
Currently available theoretical results for operator learning (e.g. Lanthaler et al. (2022); Kovachki et al. (2021a,b);
De Ryck & Mishra (2022b); Deng et al. (2022)) leverage the regularity (or smoothness) of solutions of the PDE to
prove that frameworks such as DeepONet, FNO and their variants approximate the underlying operator efficiently.
Although such regularity holds for many elliptic and parabolic PDEs, it is obviously destroyed when discontinuities
appear in the solutions of the PDEs such as in the hyperbolic PDEs mentioned above. Thus, a priori, it is unclear if
existing operator learning frameworks can efficiently approximate PDEs with discontinuous solutions. This explains the
paucity of theoretical and (to a lesser extent) empirical work on operator learning of PDEs with discontinuous solutions
and provides the rationale for the current paper where,
• using a lower bound, we rigorously prove approximation error estimates to show that operator learning architectures such as DeepONet (Lu et al., 2021) and PCA-Net (Bhattacharya et al., 2021), which entail a linear reconstruction step, fail to efficiently approximate solution operators of prototypical PDEs with discontinuities. In particular, the approximation error only decays, at best, linearly in network size.
• We rigorously prove that using a nonlinear reconstruction procedure within an operator learning architecture can lead to the efficient approximation of prototypical PDEs with discontinuities. In particular, the approximation error can decay exponentially in network size, even after discontinuity formation. This result is shown for two types of architectures with nonlinear reconstruction, namely the widely used Fourier Neural Operator (FNO) of (Li et al., 2021a) and a novel variant of DeepONet that we term shift-DeepONet.
• We supplement the theoretical results with extensive experiments where FNO and shift-DeepONet are shown to consistently outperform DeepONet and other baselines for PDEs with discontinuous solutions such as linear advection, the inviscid Burgers' equation, and both the one- and two-dimensional versions of the compressible Euler equations of gas dynamics.
2 Methods
Setting. Given compact domains $D \subset \mathbb{R}^d$ and $U \subset \mathbb{R}^{d'}$, we consider the approximation of operators $\mathcal{G} : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} \subset L^2(D)$ and $\mathcal{Y} \subset L^2(U)$ are the input and output function spaces. In the following, we will focus on the case where $\bar{u} \mapsto \mathcal{G}(\bar{u})$ maps initial data $\bar{u}$ to the solution, at some time $t > 0$, of an underlying time-dependent PDE. We assume the input $\bar{u}$ to be sampled from a probability measure $\mu \in \mathrm{Prob}(\mathcal{X})$.
DeepONet. DeepONet (Lu et al., 2021) will be our prototype for operator learning frameworks with linear reconstruction. To define them, let $x := (x_1, \dots, x_m) \in D$ be a fixed set of sensor points. Given an input function $\bar{u} \in \mathcal{X}$, we encode it by the point values $\mathcal{E}(\bar{u}) = (\bar{u}(x_1), \dots, \bar{u}(x_m)) \in \mathbb{R}^m$. DeepONet is formulated in terms of two neural networks: the first is the branch-net $\beta$, which maps the point values $\mathcal{E}(\bar{u})$ to coefficients, resulting in a mapping
\[
\beta : \mathbb{R}^m \to \mathbb{R}^p, \quad \mathcal{E}(\bar{u}) \mapsto (\beta_1(\mathcal{E}(\bar{u})), \dots, \beta_p(\mathcal{E}(\bar{u}))). \tag{2.1}
\]
The second neural network is the so-called trunk-net $\tau(y) = (\tau_1(y), \dots, \tau_p(y))$, which is used to define a mapping
\[
\tau : U \to \mathbb{R}^p, \quad y \mapsto (\tau_1(y), \dots, \tau_p(y)). \tag{2.2}
\]
While the branch net provides the coefficients, the trunk net provides the "basis" functions in an expansion of the output function of the form
\[
\mathcal{N}_{\mathrm{DON}}(\bar{u})(y) = \sum_{k=1}^{p} \beta_k(\bar{u})\, \tau_k(y), \qquad \bar{u} \in \mathcal{X},\ y \in U, \tag{2.3}
\]
with $\beta_k(\bar{u}) = \beta_k(\mathcal{E}(\bar{u}))$. The resulting mapping $\mathcal{N}_{\mathrm{DON}} : \mathcal{X} \to \mathcal{Y}$, $\bar{u} \mapsto \mathcal{N}_{\mathrm{DON}}(\bar{u})$, is a DeepONet.
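For concreteness, the linear reconstruction (2.1)–(2.3) can be written down in a few lines. The following is a minimal PyTorch sketch, not the implementation used in the paper; the hidden widths, depths and ReLU activations are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal DeepONet (2.3): N(u)(y) = sum_k beta_k(E(u)) * tau_k(y)."""

    def __init__(self, m: int, p: int, d_y: int = 1, width: int = 128):
        super().__init__()
        # Branch net beta: R^m -> R^p, acting on the sensor values E(u), cf. (2.1).
        self.branch = nn.Sequential(
            nn.Linear(m, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, p),
        )
        # Trunk net tau: U (subset of R^{d_y}) -> R^p, acting on query points y, cf. (2.2).
        self.trunk = nn.Sequential(
            nn.Linear(d_y, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, p),
        )

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, m) point values E(u); y: (n_query, d_y) query locations.
        beta = self.branch(u_sensors)     # (batch, p) coefficients
        tau = self.trunk(y)               # (n_query, p) "basis" functions
        # Linear reconstruction: the output lies in span{tau_1, ..., tau_p}.
        return beta @ tau.T               # (batch, n_query)
```

For a one-dimensional output domain, a call such as `DeepONet(m=128, p=64)(u_sensors, y)` returns the expansion (2.3) evaluated at all query points `y` for each input in the batch.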
Although DeepONets were shown to be universal in the class of measurable operators (Lanthaler et al., 2022), the following fundamental lower bound on the approximation error was also established.
Proposition 2.1 (Lanthaler et al. (2022, Thm. 3.4)). Let $\mathcal{X}$ be a separable Banach space, $\mathcal{Y}$ a separable Hilbert space, and let $\mu$ be a probability measure on $\mathcal{X}$. Let $\mathcal{G} : \mathcal{X} \to \mathcal{Y}$ be a Borel measurable operator with $\mathbb{E}_{\bar{u} \sim \mu}[\|\mathcal{G}(\bar{u})\|_{\mathcal{Y}}^2] < \infty$. Then the following lower approximation bound holds for any DeepONet $\mathcal{N}_{\mathrm{DON}}$ with trunk-/branch-net dimension $p$:
\[
\mathcal{E}(\mathcal{N}_{\mathrm{DON}}) = \Big( \mathbb{E}_{\bar{u} \sim \mu} \|\mathcal{N}_{\mathrm{DON}}(\bar{u}) - \mathcal{G}(\bar{u})\|_{\mathcal{Y}}^2 \Big)^{1/2} \ \ge\ \mathcal{E}_{\mathrm{opt}} := \sqrt{\sum_{j > p} \lambda_j}, \tag{2.4}
\]
where the optimal error $\mathcal{E}_{\mathrm{opt}}$ is written in terms of the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$ of the covariance operator $\Gamma_{\mathcal{G}_\# \mu} := \mathbb{E}_{u \sim \mathcal{G}_\# \mu}[u \otimes u]$ of the push-forward measure $\mathcal{G}_\# \mu$.
We refer to SM A for relevant background on the underlying principal component analysis (PCA) and covariance operators. The same lower bound (2.4) in fact holds for any operator approximation of the form $\mathcal{N}(\bar{u}) = \sum_{k=1}^{p} \beta_k(\bar{u}) \tau_k$, where $\beta_k : \mathcal{X} \to \mathbb{R}$ are arbitrary functionals. In particular, this bound continues to hold for e.g. the PCA-Net architecture of Hesthaven & Ubbiali (2018); Bhattacharya et al. (2021). We will refer to any operator learning architecture of this form as a method with "linear reconstruction", since the output function $\mathcal{N}(\bar{u})$ is restricted to the linear $p$-dimensional space spanned by $\tau_1, \dots, \tau_p \in \mathcal{Y}$. In particular, DeepONets are based on linear reconstruction.
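The optimal error $\mathcal{E}_{\mathrm{opt}}$ in (2.4) can be estimated from samples of $\mathcal{G}_\#\mu$ by an empirical PCA. Below is a minimal NumPy sketch; representing functions in $\mathcal{Y}$ by their values on a uniform grid with quadrature weight `dx` is an assumption made purely for illustration:

```python
import numpy as np

def optimal_linear_error(samples: np.ndarray, p: int, dx: float) -> float:
    """Monte-Carlo estimate of E_opt = sqrt(sum_{j > p} lambda_j) in (2.4).

    samples: (n, N) array, n draws of G(u) evaluated on an N-point grid of U;
    dx: quadrature weight approximating the L2(U) inner product.
    """
    n = samples.shape[0]
    # Eigenvalues of the empirical (uncentered) covariance E[u (x) u] via the
    # n x n Gram matrix; its nonzero spectrum matches that of the covariance.
    gram = (samples @ samples.T) * dx / n
    lam = np.sort(np.linalg.eigvalsh(gram))[::-1]   # lambda_1 >= lambda_2 >= ...
    return float(np.sqrt(max(lam[p:].sum(), 0.0)))
```

The slower the eigenvalues $\lambda_j$ decay, the larger $p$ must be to drive this error down; this is precisely the obstruction for transport-dominated problems discussed next.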
shift-DeepONet. The lower bound (2.4) shows that there are fundamental barriers to the expressive power of operator learning methods based on linear reconstruction. This is of particular relevance for problems in which the optimal lower bound $\mathcal{E}_{\mathrm{opt}}$ in (2.4) exhibits a slow decay in terms of the number of basis functions $p$, due to the slow decay of the eigenvalues $\lambda_j$ of the covariance operator. It is well-known that even linear advection- or transport-dominated problems can suffer from such a slow decay of the eigenvalues (Ohlberger & Rave, 2013; Dahmen et al., 2014; Taddei et al., 2015; Peherstorfer, 2020), which could hinder the application of linear-reconstruction-based operator learning methods to this very important class of problems. In view of these observations, it is thus desirable to develop a nonlinear variant of DeepONet which can overcome such a lower bound in the context of transport-dominated problems. We propose such an extension below.
A shift-DeepONet $\mathcal{N}_{\mathrm{sDON}} : \mathcal{X} \to \mathcal{Y}$ is an operator of the form
\[
\mathcal{N}_{\mathrm{sDON}}(\bar{u})(y) = \sum_{k=1}^{p} \beta_k(\bar{u})\, \tau_k\big( A_k(\bar{u}) \cdot y + \gamma_k(\bar{u}) \big), \tag{2.5}
\]
where the input function $\bar{u}$ is encoded by evaluation at the sensor points, $\mathcal{E}(\bar{u}) \in \mathbb{R}^m$. We retain the DeepONet branch- and trunk-nets $\beta$, $\tau$ defined in (2.1), (2.2), respectively, and we have introduced a scale-net $A = (A_k)_{k=1}^{p}$, consisting of matrix-valued functions
\[
A_k : \mathbb{R}^m \to \mathbb{R}^{d' \times d'}, \quad \mathcal{E}(\bar{u}) \mapsto A_k(\bar{u}) := A_k(\mathcal{E}(\bar{u})),
\]
and a shift-net $\gamma = (\gamma_k)_{k=1}^{p}$, with
\[
\gamma_k : \mathbb{R}^m \to \mathbb{R}^{d'}, \quad \mathcal{E}(\bar{u}) \mapsto \gamma_k(\bar{u}) := \gamma_k(\mathcal{E}(\bar{u})).
\]
All components of a shift-DeepONet are represented by deep neural networks, potentially with different activation functions.
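A minimal PyTorch sketch of the forward pass (2.5), written for a one-dimensional output domain ($d' = 1$) so that each $A_k$ reduces to a scalar; the network widths, activations and the $\mathcal{O}(p^2)$ trunk evaluation are illustrative simplifications, not the paper's implementation:

```python
import torch
import torch.nn as nn

def mlp(din: int, dout: int, width: int = 128) -> nn.Sequential:
    """Small fully connected network used for every sub-network below."""
    return nn.Sequential(nn.Linear(din, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, dout))

class ShiftDeepONet(nn.Module):
    """N(u)(y) = sum_k beta_k(u) * tau_k(A_k(u) * y + gamma_k(u)), with d' = 1."""

    def __init__(self, m: int, p: int):
        super().__init__()
        self.branch = mlp(m, p)   # beta:  R^m -> R^p
        self.scale = mlp(m, p)    # A:     R^m -> R^p (one scalar A_k per k, since d' = 1)
        self.shift = mlp(m, p)    # gamma: R^m -> R^p
        self.trunk = mlp(1, p)    # tau:   R   -> R^p

    def forward(self, u_sensors: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # u_sensors: (batch, m) sensor values E(u); y: (n_query,) query points in U.
        beta = self.branch(u_sensors)                                # (batch, p)
        A = self.scale(u_sensors)                                    # (batch, p)
        gamma = self.shift(u_sensors)                                # (batch, p)
        # Input-dependent trunk argument z[b, q, k] = A_k(u_b) * y_q + gamma_k(u_b).
        z = A[:, None, :] * y[None, :, None] + gamma[:, None, :]    # (batch, nq, p)
        # Evaluate the p-output trunk at every scalar argument and keep the k-th
        # output for the k-th argument (simple but O(p^2); acceptable for a sketch).
        tau = self.trunk(z.unsqueeze(-1))                            # (batch, nq, p, p)
        tau = torch.diagonal(tau, dim1=-2, dim2=-1)                  # (batch, nq, p)
        return torch.einsum('bk,bqk->bq', beta, tau)                 # (batch, nq)
```

The only difference from the DeepONet sketch above is that the trunk argument is rescaled and shifted in an input-dependent way, which is what breaks the linear-reconstruction structure.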
Since shift-DeepONets reduce to DeepONets for the particular choice $A \equiv 1$ and $\gamma \equiv 0$, the universality of DeepONets (Theorem 3.1 of Lanthaler et al. (2022)) is clearly inherited by shift-DeepONets. However, as shift-DeepONets do not use a linear reconstruction (the trunk nets in (2.5) depend on the input through the scale and shift nets), the lower bound (2.4) does not directly apply, leaving room for shift-DeepONet to efficiently approximate transport-dominated problems, especially in the presence of discontinuities.
Fourier neural operators (FNO). An FNO $\mathcal{N}_{\mathrm{FNO}}$ (Li et al., 2021a) is a composition
\[
\mathcal{N}_{\mathrm{FNO}} : \mathcal{X} \to \mathcal{Y}, \quad \mathcal{N}_{\mathrm{FNO}} = Q \circ \mathcal{L}_L \circ \dots \circ \mathcal{L}_1 \circ R, \tag{2.6}
\]
consisting of a "lifting operator" $\bar{u}(x) \mapsto R(\bar{u}(x), x)$, where $R$ is represented by a (shallow) neural network $R : \mathbb{R}^{d_u} \times \mathbb{R}^{d} \to \mathbb{R}^{d_v}$, with $d_u$ the number of components of the input function, $d$ the dimension of the domain and $d_v$ the "lifting dimension" (a hyperparameter), followed by $L$ hidden layers $\mathcal{L}_\ell : v_\ell(x) \mapsto v_{\ell+1}(x)$ of the form
\[
v_{\ell+1}(x) = \sigma\big( W_\ell \cdot v_\ell(x) + b_\ell(x) + (\mathcal{K}_\ell v_\ell)(x) \big),
\]
with $W_\ell \in \mathbb{R}^{d_v \times d_v}$ a weight matrix (residual connection), $x \mapsto b_\ell(x) \in \mathbb{R}^{d_v}$ a bias function, and with a convolution operator $(\mathcal{K}_\ell v_\ell)(x) = \int_{\mathbb{T}^d} \kappa_\ell(x - y)\, v_\ell(y)\, dy$, expressed in terms of a (learnable) integral kernel $x \mapsto \kappa_\ell(x) \in \mathbb{R}^{d_v \times d_v}$. The output function is finally obtained by a linear projection layer $v_{L+1}(x) \mapsto \mathcal{N}_{\mathrm{FNO}}(\bar{u})(x) = Q \cdot v_{L+1}(x)$.
The convolution operators $\mathcal{K}_\ell$ add the indispensable non-local dependence of the output on the input function. Given values on an equidistant Cartesian grid, the evaluation of $\mathcal{K}_\ell v_\ell$ can be efficiently carried out in Fourier space based on the discrete Fourier transform (DFT), leading to a representation
\[
\mathcal{K}_\ell v_\ell = \mathcal{F}_N^{-1}\big( P_\ell(k) \cdot \mathcal{F}_N v_\ell(k) \big),
\]
where $\mathcal{F}_N v_\ell(k)$ denotes the Fourier coefficients of the DFT of $v_\ell(x)$, computed based on the given $N$ grid values in each direction, $P_\ell(k) \in \mathbb{C}^{d_v \times d_v}$ is a complex Fourier multiplication matrix indexed by $k \in \mathbb{Z}^d$, and $\mathcal{F}_N^{-1}$ denotes the inverse DFT. In practice, only a finite number of Fourier modes can be computed, and hence we introduce a hyperparameter $k_{\max} \in \mathbb{N}$, such that the Fourier coefficients of $b_\ell(x)$ as well as the Fourier multipliers, $\hat{b}_\ell(k) \equiv 0$ and $P_\ell(k) \equiv 0$, vanish whenever $|k| > k_{\max}$. In particular, with fixed $k_{\max}$ the DFT and its inverse can be efficiently computed in $\mathcal{O}((k_{\max} N)^d)$ operations (i.e. linear in the total number of grid points). The output space of FNO (2.6) is manifestly non-linear as it is not spanned by a fixed number of basis functions. Hence, FNO constitute a nonlinear reconstruction method.
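A minimal one-dimensional PyTorch sketch of a single hidden layer of the form above, with the spectral convolution implemented via the real FFT and the Fourier multipliers $P_\ell(k)$ truncated at $k_{\max}$; the choice of ReLU for $\sigma$, the constant bias inside the pointwise term and the initialization scale are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FNOLayer1d(nn.Module):
    """One hidden FNO layer v -> sigma(W v + b + K v), with K a spectral convolution."""

    def __init__(self, d_v: int, k_max: int):
        super().__init__()
        self.W = nn.Linear(d_v, d_v)   # pointwise term W_l v + b_l (constant-in-x bias)
        # Complex Fourier multipliers P_l(k) for the retained modes 0 <= k <= k_max.
        self.P = nn.Parameter(torch.randn(k_max + 1, d_v, d_v, dtype=torch.cfloat) / d_v)
        self.k_max = k_max

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, N, d_v), values of v_l on an equidistant periodic grid of size N,
        # assuming k_max + 1 <= N // 2 + 1.
        v_hat = torch.fft.rfft(v, dim=1)                    # (batch, N//2 + 1, d_v)
        out_hat = torch.zeros_like(v_hat)
        k = self.k_max + 1
        # Multiply the retained low modes by P_l(k); all higher modes are zeroed out.
        out_hat[:, :k] = torch.einsum('kij,bkj->bki', self.P, v_hat[:, :k])
        Kv = torch.fft.irfft(out_hat, n=v.shape[1], dim=1)  # back to physical space
        return torch.relu(self.W(v) + Kv)                   # sigma(W v + b + K v)
```

Stacking $L$ such layers between a lifting network $R$ and a projection $Q$ gives the composition (2.6).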
3 Theoretical Results
Context. Our aim in this section is to rigorously prove that the nonlinear reconstruction methods (shift-DeepONet, FNO) efficiently approximate operators stemming from discontinuous solutions of PDEs, whereas linear reconstruction methods (DeepONet, PCA-Net) fail to do so. To this end, we follow standard practice in numerical analysis of PDEs (Hesthaven, 2018) and choose two prototypical PDEs that are widely used to analyze numerical methods for transport-dominated PDEs. These are the linear transport or advection equation and the nonlinear inviscid Burgers' equation, which is the prototypical example for hyperbolic conservation laws. The exact operators and the corresponding approximation results with both linear and nonlinear reconstruction methods are described below. The computational complexity of the models is expressed in terms of hyperparameters such as the model size, which are described in detail in SM B.
Linear Advection Equation. We consider the one-dimensional linear advection equation
\[
\partial_t u + a\, \partial_x u = 0, \qquad u(\cdot, t = 0) = \bar{u}, \tag{3.1}
\]
on a $2\pi$-periodic domain $D = \mathbb{T}$, with constant speed $a \in \mathbb{R}$. The underlying operator is $\mathcal{G}_{\mathrm{adv}} : L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}) \to L^1(\mathbb{T}) \cap L^\infty(\mathbb{T})$, $\bar{u} \mapsto \mathcal{G}_{\mathrm{adv}}(\bar{u}) := u(\cdot, T)$, obtained by solving the PDE (3.1) with initial data $\bar{u}$ up to some final time $t = T$. We note that $\mathcal{X} = L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}) \subset L^2(\mathbb{T})$. As input measure $\mu \in \mathrm{Prob}(\mathcal{X})$, we consider random input functions $\bar{u} \sim \mu$ given by the square (box) wave of height $h$, width $w$ and centered at $\xi$,
\[
\bar{u}(x) = h\, \mathbf{1}_{[-w/2, +w/2]}(x - \xi), \tag{3.2}
\]
where $h \in [\underline{h}, \bar{h}]$, $w \in [\underline{w}, \bar{w}]$ and $\xi \in [0, 2\pi]$ are independent and uniformly distributed. The constants $0 < \underline{h} \le \bar{h}$, $0 < \underline{w} \le \bar{w}$ are fixed.
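Since $\mathcal{G}_{\mathrm{adv}}(\bar{u}) = \bar{u}(\cdot - aT)$ is an exact periodic shift, input–output pairs for this benchmark can be generated directly. A small NumPy sketch follows; the grid size, parameter ranges, speed and final time are illustrative placeholders, not the values used in the experiments:

```python
import numpy as np

def sample_box_waves(n: int, N: int = 512, h_range=(0.5, 1.5), w_range=(0.5, 2.0),
                     a: float = 1.0, T: float = 1.0, seed: int = 0):
    """Draw n box-wave inputs (3.2) on a uniform grid of [0, 2*pi) together with
    the exact advected outputs G_adv(u)(x) = u(x - a*T) (a periodic shift)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
    h = rng.uniform(*h_range, size=(n, 1))
    w = rng.uniform(*w_range, size=(n, 1))
    xi = rng.uniform(0.0, 2 * np.pi, size=(n, 1))

    def box(center):
        # Indicator of a width-w interval around `center`, evaluated periodically.
        d = np.angle(np.exp(1j * (x[None, :] - center)))   # signed periodic distance
        return h * (np.abs(d) <= w / 2)

    u0 = box(xi)             # inputs  u_bar(x), box centered at xi
    uT = box(xi + a * T)     # outputs G_adv(u_bar)(x), box centered at xi + a*T
    return x, u0, uT
```

For instance, `x, u0, uT = sample_box_waves(1024)` yields 1024 input–output pairs on a 512-point grid.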
DeepONet fails at approximating $\mathcal{G}_{\mathrm{adv}}$ efficiently. Our first rigorous result is the following lower bound on the error incurred by DeepONet (2.3) in approximating $\mathcal{G}_{\mathrm{adv}}$.

Theorem 3.1. Let $p, m \in \mathbb{N}$. There exists a constant $C > 0$, independent of $m$ and $p$, such that for any DeepONet $\mathcal{N}_{\mathrm{DON}}$ (2.3), with $\sup_{\bar{u} \sim \mu} \|\mathcal{N}_{\mathrm{DON}}(\bar{u})\|_{L^\infty} \le M < \infty$, we have the lower bound
\[
\mathcal{E} = \mathbb{E}_{\bar{u} \sim \mu}\big[ \|\mathcal{G}_{\mathrm{adv}}(\bar{u}) - \mathcal{N}_{\mathrm{DON}}(\bar{u})\|_{L^1} \big] \ \ge\ \frac{C}{\min(m, p)}.
\]
Consequently, to achieve $\mathcal{E}(\mathcal{N}_{\mathrm{DON}}) \le \epsilon$ with DeepONet, we need $p, m \gtrsim \epsilon^{-1}$ trunk and branch net basis functions and sensor points, respectively, entailing that $\mathrm{size}(\mathcal{N}_{\mathrm{DON}}) \gtrsim pm \gtrsim \epsilon^{-2}$ (cp. SM B).
The detailed proof is presented in SM C.2. It relies on two facts. First, following Lanthaler et al. (2022), one observes that translation invariance of the problem implies that the Fourier basis is optimal for spanning the output space. As the underlying functions are discontinuous, the corresponding eigenvalues of the covariance operator for the push-forward measure decay, at most, quadratically in $p$. Consequently, the lower bound (2.4) leads to a linear decay of the error in terms of the number of trunk net basis functions. Second, roughly speaking, the linear decay of the error in terms of sensor points is a consequence of the fact that one needs a sufficient number of sensor points to resolve the underlying discontinuous inputs.
Shift-DeepONet approximates $\mathcal{G}_{\mathrm{adv}}$ efficiently. Next, and in contrast to the previous result on DeepONet, we have the following efficient approximation result for shift-DeepONet (2.5).

Theorem 3.2. There exists a constant $C > 0$, such that for any $\epsilon > 0$, there exists a shift-DeepONet $\mathcal{N}_{\mathrm{sDON}}^{\epsilon}$ (2.5) such that
\[
\mathcal{E} = \mathbb{E}_{\bar{u} \sim \mu}\big[ \|\mathcal{G}_{\mathrm{adv}}(\bar{u}) - \mathcal{N}_{\mathrm{sDON}}^{\epsilon}(\bar{u})\|_{L^1} \big] \le \epsilon, \tag{3.3}
\]
with uniformly bounded $p \le C$, and with the number of sensor points $m \le C \epsilon^{-1}$. Furthermore, we have
\[
\mathrm{width}(\mathcal{N}_{\mathrm{sDON}}^{\epsilon}) \le C, \quad \mathrm{depth}(\mathcal{N}_{\mathrm{sDON}}^{\epsilon}) \le C \log(\epsilon^{-1})^2, \quad \mathrm{size}(\mathcal{N}_{\mathrm{sDON}}^{\epsilon}) \le C \epsilon^{-1}.
\]
The detailed proof, presented in SM C.3, is based on the fact that for each input, the exact solution can be completely determined in terms of three variables, i.e., the height $h$, width $w$ and shift $\xi$ of the box wave (3.2). Given an input $\bar{u}$, we explicitly construct neural networks for inferring each of these variables with high accuracy. These neural networks are then combined together to yield a shift-DeepONet that approximates $\mathcal{G}_{\mathrm{adv}}$ with the desired complexity. The nonlinear dependence of the trunk net in shift-DeepONet (2.5) on the input is the key to encoding the shift in the box wave (3.2), and this demonstrates the necessity of nonlinear reconstruction in this context.
FNO approximates $\mathcal{G}_{\mathrm{adv}}$ efficiently. Finally, we state an efficient approximation result for $\mathcal{G}_{\mathrm{adv}}$ with FNO (2.6) below.

Theorem 3.3. For any $\epsilon > 0$, there exists an FNO $\mathcal{N}_{\mathrm{FNO}}^{\epsilon}$ (2.6), such that
\[
\mathbb{E}_{\bar{u} \sim \mu}\big[ \|\mathcal{G}_{\mathrm{adv}}(\bar{u}) - \mathcal{N}_{\mathrm{FNO}}^{\epsilon}(\bar{u})\|_{L^1} \big] \le \epsilon,
\]
with grid size $N \le C \epsilon^{-1}$, and with Fourier cut-off $k_{\max}$, lifting dimension $d_v$, depth and size:
\[
k_{\max} = 1, \quad d_v \le C, \quad \mathrm{depth}(\mathcal{N}_{\mathrm{FNO}}^{\epsilon}) \le C \log(\epsilon^{-1})^2, \quad \mathrm{size}(\mathcal{N}_{\mathrm{FNO}}^{\epsilon}) \le C \log(\epsilon^{-1})^2.
\]
A priori, one recognizes that $\mathcal{G}_{\mathrm{adv}}$ can be represented by Fourier multipliers (see SM C.4). Consequently, a single linear FNO layer would in principle suffice to approximate $\mathcal{G}_{\mathrm{adv}}$. However, the size of this FNO would be exponentially larger than the bound in Theorem 3.3. To obtain a more efficient approximation, one needs to leverage the nonlinear reconstruction within FNO layers. This is provided in the proof, presented in SM C.4, where the underlying height, width and shift of the box-wave inputs (3.2) are approximated with high accuracy by FNO layers. These are then combined with a novel representation formula for the solution to yield the desired FNO.
Comparison. Observing the complexity bounds in Theorems 3.1, 3.2 and 3.3, we note that the DeepONet size scales at least quadratically, $\mathrm{size} \gtrsim \epsilon^{-2}$, in terms of the error in approximating $\mathcal{G}_{\mathrm{adv}}$, whereas for shift-DeepONet and FNO, this scaling is only linear and logarithmic, respectively. Thus, we rigorously prove that for this problem, the nonlinear reconstruction methods (FNO and shift-DeepONet) can be more efficient than DeepONet and other methods based on linear reconstruction. Moreover, FNO is shown to have a smaller approximation error than even shift-DeepONet for a similar model size.
Inviscid Burgers' equation. Next, we consider the inviscid Burgers' equation in one space dimension, which is considered the prototypical example of nonlinear hyperbolic conservation laws (Dafermos, 2005):
\[
\partial_t u + \partial_x\left( \tfrac{1}{2} u^2 \right) = 0, \qquad u(\cdot, t = 0) = \bar{u}, \tag{3.4}
\]
on the $2\pi$-periodic domain $D = \mathbb{T}$. It is well-known that discontinuities in the form of shock waves can appear in finite time even for smooth $\bar{u}$. Consequently, solutions of (3.4) are interpreted in the sense of distributions and entropy conditions are imposed to ensure uniqueness (Dafermos, 2005). Thus, the underlying solution operator is $\mathcal{G}_{\mathrm{Burg}} : L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}) \to L^1(\mathbb{T}) \cap L^\infty(\mathbb{T})$, $\bar{u} \mapsto \mathcal{G}_{\mathrm{Burg}}(\bar{u}) := u(\cdot, T)$, with $u$ being the entropy solution of (3.4) at final time $T$. Given $\xi \sim \mathrm{Unif}([0, 2\pi])$, we define the random field
\[
\bar{u}(x) := \sin(x - \xi), \tag{3.5}
\]
and we define the input measure $\mu \in \mathrm{Prob}(L^1(\mathbb{T}) \cap L^\infty(\mathbb{T}))$ as the law of $\bar{u}$. We emphasize that the difficulty in approximating the underlying operator $\mathcal{G}_{\mathrm{Burg}}$ arises even though the input functions are smooth, in fact analytic. This is in contrast to the linear advection equation.
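Reference input–output pairs for $\mathcal{G}_{\mathrm{Burg}}$ can be generated numerically; below is a minimal first-order Godunov finite-volume sketch for the entropy solution of (3.4) with initial data (3.5). The resolution and CFL number are illustrative assumptions, and the paper's training data may well be produced by a different (e.g. higher-order) solver:

```python
import numpy as np

def burgers_entropy_solution(xi: float, T: float = 1.0, N: int = 1024, cfl: float = 0.45):
    """Approximate the entropy solution of (3.4) at time T for u0(x) = sin(x - xi),
    using a first-order Godunov finite-volume scheme on the 2*pi-periodic domain."""
    dx = 2 * np.pi / N
    x = (np.arange(N) + 0.5) * dx                 # cell centers
    u = np.sin(x - xi)                            # initial data (3.5)

    def godunov_flux(ul, ur):
        # Exact Riemann flux for the convex flux f(u) = u^2 / 2.
        fl, fr = 0.5 * ul**2, 0.5 * ur**2
        return np.where(ul > ur, np.maximum(fl, fr),       # shock: max over [ur, ul]
                        np.where(ul > 0, fl,                # rarefaction moving right
                                 np.where(ur < 0, fr, 0.0)))  # moving left / sonic point

    t = 0.0
    while t < T:
        dt = min(cfl * dx / max(np.abs(u).max(), 1e-12), T - t)
        F = godunov_flux(u, np.roll(u, -1))       # flux at interface i + 1/2
        u = u - dt / dx * (F - np.roll(F, 1))     # conservative update
        t += dt
    return x, u
```

Sampling $\xi \sim \mathrm{Unif}([0, 2\pi])$ and calling this routine once per sample gives pairs $(\bar{u}, \mathcal{G}_{\mathrm{Burg}}(\bar{u}))$ on the grid, shocks included.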