
employed in this infinite-dimensional setting (Higgins, 2021). These methods include operator networks (Chen &
Chen, 1995) and their deep version, DeepONet (Lu et al., 2019, 2021), where two sets of neural networks (branch and
trunk nets) are combined in a linear reconstruction procedure to obtain an infinite-dimensional output. DeepONets
have been very successfully used for different PDEs (Lu et al., 2021; Mao et al., 2020b; Cai et al., 2021; Lin et al.,
2021). An alternative framework is provided by neural operators (Kovachki et al., 2021a), wherein the affine functions
within DNN hidden layers are generalized to infinite dimensions by replacing them with kernel integral operators, as in
(Li et al., 2020a; Kovachki et al., 2021a; Li et al., 2020b). A computationally efficient form of neural operators is the
Fourier Neural Operator (FNO) (Li et al., 2021a), where a translation invariant kernel is evaluated in Fourier space,
leading to many successful applications for PDEs (Li et al., 2021a,b; Pathak et al., 2022).
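To fix ideas, a single Fourier layer of this type can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the channel width, the number of retained modes, and the one-dimensional setting are our own simplifying assumptions, not the implementation of Li et al. (2021a).

    import torch

    class SpectralConv1d(torch.nn.Module):
        # One Fourier layer: transform to Fourier space, multiply the lowest
        # `modes` frequencies by learned complex weights, transform back.
        # Assumes modes <= n_grid // 2 + 1 for the grid size used in forward().
        def __init__(self, width: int, modes: int):
            super().__init__()
            self.modes = modes
            scale = 1.0 / (width * width)
            self.weights = torch.nn.Parameter(
                scale * torch.randn(width, width, modes, dtype=torch.cfloat))

        def forward(self, u):                      # u: (batch, width, n_grid), real-valued
            u_hat = torch.fft.rfft(u, dim=-1)      # kernel is applied in Fourier space
            out_hat = torch.zeros_like(u_hat)
            out_hat[..., :self.modes] = torch.einsum(
                "bim,iom->bom", u_hat[..., :self.modes], self.weights)
            return torch.fft.irfft(out_hat, n=u.shape[-1], dim=-1)

Because the learned multipliers act mode-by-mode, the layer realizes a convolution with a translation-invariant kernel, which is what makes the evaluation in Fourier space efficient.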
Currently available theoretical results for operator learning (e.g. Lanthaler et al. (2022); Kovachki et al. (2021a,b);
De Ryck & Mishra (2022b); Deng et al. (2022)) leverage the regularity (or smoothness) of solutions of the PDE to
prove that frameworks such as DeepONet, FNO and their variants approximate the underlying operator efficiently.
Although such regularity holds for many elliptic and parabolic PDEs, it is obviously destroyed when discontinuities
appear in the solutions of the PDEs such as in the hyperbolic PDEs mentioned above. Thus, a priori, it is unclear if
existing operator learning frameworks can efficiently approximate PDEs with discontinuous solutions. This explains the
paucity of theoretical and (to a lesser extent) empirical work on operator learning of PDEs with discontinuous solutions
and provides the rationale for the current paper, where
• using a lower bound, we rigorously prove approximation error estimates to show that operator learning architectures such as DeepONet (Lu et al., 2021) and PCA-Net (Bhattacharya et al., 2021), which entail a linear reconstruction step, fail to efficiently approximate solution operators of prototypical PDEs with discontinuities. In particular, the approximation error only decays, at best, linearly in network size.
• We rigorously prove that using a nonlinear reconstruction procedure within an operator learning architecture can lead to the efficient approximation of prototypical PDEs with discontinuities. In particular, the approximation error can decay exponentially in network size, even after discontinuity formation. This result is shown for two types of architectures with nonlinear reconstruction, namely the widely used Fourier Neural Operator (FNO) of Li et al. (2021a) and a novel variant of DeepONet that we term shift-DeepONet.
• We supplement the theoretical results with extensive experiments where FNO and shift-DeepONet are shown to consistently outperform DeepONet and other baselines for PDEs with discontinuous solutions such as linear advection, inviscid Burgers’ equation, and both the one- and two-dimensional versions of the compressible Euler equations of gas dynamics.
2 Methods
Setting. Given compact domains $D \subset \mathbb{R}^d$ and $U \subset \mathbb{R}^{d'}$, we consider the approximation of operators $\mathcal{G}\colon \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} \subset L^2(D)$ and $\mathcal{Y} \subset L^2(U)$ are the input and output function spaces. In the following, we focus on the case where $\bar{u} \mapsto \mathcal{G}(\bar{u})$ maps initial data $\bar{u}$ to the solution at some time $t > 0$ of an underlying time-dependent PDE. We assume the input $\bar{u}$ to be sampled from a probability measure $\mu \in \mathrm{Prob}(\mathcal{X})$.
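As a concrete instance of this setting, consider the following minimal sketch under our own simplifying assumptions (linear advection with constant speed on a uniform periodic grid, anticipating one of the test problems considered later), in which $\mathcal{G}$ is the map from initial data to the advected solution at time $t$:

    import numpy as np

    def advection_operator(u0_vals, t=0.5, speed=1.0):
        """Map grid samples of periodic initial data u0 on [0, 1) to samples of
        the exact solution u(., t) = u0(. - speed * t) of u_t + speed * u_x = 0.
        The shift is rounded to a whole number of grid cells for simplicity."""
        n = u0_vals.shape[-1]
        return np.roll(u0_vals, int(round(speed * t * n)), axis=-1)

    # draw one input from a simple illustrative measure mu: a random step function
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 256, endpoint=False)
    u0 = np.where(x < rng.uniform(0.25, 0.75), 1.0, 0.0)   # discontinuous initial data
    u_t = advection_operator(u0, t=0.5)                     # samples of G(u0)

Random step functions like the one above already produce the discontinuous solutions that are the focus of this paper.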
DeepONet. DeepONet (Lu et al., 2021) will be our prototype for operator learning frameworks with linear reconstruction. To define it, let $x_1, \dots, x_m \in D$ be a fixed set of sensor points. Given an input function $\bar{u} \in \mathcal{X}$, we encode it by the point values $\mathcal{E}(\bar{u}) = (\bar{u}(x_1), \dots, \bar{u}(x_m)) \in \mathbb{R}^m$. DeepONet is formulated in terms of two neural networks. The first is the branch net $\beta$, which maps the point values $\mathcal{E}(\bar{u})$ to coefficients $\beta(\mathcal{E}(\bar{u})) = (\beta_1(\mathcal{E}(\bar{u})), \dots, \beta_p(\mathcal{E}(\bar{u})))$, resulting in a mapping
$$\beta\colon \mathbb{R}^m \to \mathbb{R}^p, \quad \mathcal{E}(\bar{u}) \mapsto (\beta_1(\mathcal{E}(\bar{u})), \dots, \beta_p(\mathcal{E}(\bar{u}))). \tag{2.1}$$
The second neural network is the so-called trunk net $\tau(y) = (\tau_1(y), \dots, \tau_p(y))$, which is used to define a mapping
$$\tau\colon U \to \mathbb{R}^p, \quad y \mapsto (\tau_1(y), \dots, \tau_p(y)). \tag{2.2}$$
While the branch net provides the coefficients, the trunk net provides the “basis” functions in an expansion of the output function of the form
$$\mathcal{N}_{\mathrm{DON}}(\bar{u})(y) = \sum_{k=1}^{p} \beta_k(\bar{u})\,\tau_k(y), \qquad \bar{u} \in \mathcal{X},\; y \in U, \tag{2.3}$$
with $\beta_k(\bar{u}) = \beta_k(\mathcal{E}(\bar{u}))$. The resulting mapping $\mathcal{N}_{\mathrm{DON}}\colon \mathcal{X} \to \mathcal{Y},\; \bar{u} \mapsto \mathcal{N}_{\mathrm{DON}}(\bar{u})$ is a DeepONet.
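In code, the construction (2.1)–(2.3) amounts to the following minimal PyTorch sketch; the layer widths, depths, and activation are illustrative placeholders, not the configurations used in our experiments.

    import torch

    class DeepONet(torch.nn.Module):
        # Linear-reconstruction operator network: branch net (2.1), trunk net (2.2),
        # combined through the expansion (2.3).
        def __init__(self, m: int, p: int, dim_y: int = 1, width: int = 128):
            super().__init__()
            self.branch = torch.nn.Sequential(          # beta: R^m -> R^p
                torch.nn.Linear(m, width), torch.nn.ReLU(),
                torch.nn.Linear(width, p))
            self.trunk = torch.nn.Sequential(           # tau: U (subset of R^dim_y) -> R^p
                torch.nn.Linear(dim_y, width), torch.nn.ReLU(),
                torch.nn.Linear(width, p))

        def forward(self, u_sensors, y):
            # u_sensors: (batch, m) point values E(u);  y: (n_query, dim_y) query points
            beta = self.branch(u_sensors)               # (batch, p)
            tau = self.trunk(y)                         # (n_query, p)
            return beta @ tau.T                         # N_DON(u)(y) = sum_k beta_k(u) tau_k(y)

Training then consists of minimizing, e.g., a mean-square error between $\mathcal{N}_{\mathrm{DON}}(\bar{u})(y)$ and $\mathcal{G}(\bar{u})(y)$ over samples $\bar{u} \sim \mu$ and query points $y \in U$. Crucially, the output is always a linear combination of the $p$ trunk functions; this is the linear reconstruction step referred to above.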
Although DeepONets were shown to be universal in the class of measurable operators (Lanthaler et al., 2022), the
following fundamental lower bound on the approximation error was also established,