neural network architectures and minimizing the network energy, or (ii) treating the whole neural
network as the basic approximation unit, with parameters trained to minimize a specialized error
function that includes the differential equation itself as well as its boundary and initial conditions.
In the first category, neurons output the discretized solution values over a set of grid points, and
minimizing the network energy drives the neuronal values towards the solution at the mesh points.
The neural network energy is the residual of the finite discretization, summed over all neurons [22]. A strong feature is that the convergence properties of the underlying finite discretization are preserved; however, the computational cost grows with increasing resolution and dimensionality. Early examples include [15, 10, 9].
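To make the idea concrete, the following minimal sketch (our own illustrative example, not the specific formulation of [22]) treats the neuron values as the discrete solution of a 1D Poisson problem, -u'' = f on (0, 1) with u(0) = u(1) = 0, and minimizes the summed finite-difference residual as the network energy:

import jax.numpy as jnp
from jax.scipy.optimize import minimize

N = 31                                       # interior grid points
h = 1.0 / (N + 1)
x = jnp.linspace(h, 1.0 - h, N)
f = jnp.sin(jnp.pi * x)                      # example right-hand side

def network_energy(u):
    # The "neurons" hold the discrete solution values u_i; the energy is the
    # squared finite-difference residual, summed over all neurons.
    u_pad = jnp.pad(u, 1)                    # homogeneous Dirichlet boundary values
    res = -(u_pad[2:] - 2.0 * u_pad[1:-1] + u_pad[:-2]) / h**2 - f
    return jnp.sum(res**2)

# Minimizing the energy drives the neuronal values toward the mesh solution.
u = minimize(network_energy, jnp.zeros(N), method="BFGS").x

Because the energy is exactly the discretization residual, the minimizer inherits the accuracy of the underlying scheme, while the number of unknowns grows with the grid size.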
The second strategy, proposed by Lagaris et al. [21], relies on the function approximation capabilities of neural networks. Encoding the solution everywhere in the domain within a neural network offers a mesh-free, compact, and memory-efficient surrogate model for the solution function, which can be used in subsequent inference tasks. This method has recently re-emerged as physics-informed neural networks (PINNs) [35] and is widely used. Despite their advantages, these methods lack the controllable convergence properties of traditional numerical discretizations and are biased towards the lower-frequency features of the solutions [41, 34, 20].
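For illustration, a minimal PINN-style loss for u'' = -π² sin(πx) on (0, 1) with homogeneous Dirichlet conditions can be written as follows (a sketch with our own assumed architecture and sampling, not the exact setup of [35]):

import jax
import jax.numpy as jnp

def mlp(params, x):
    # Small fully connected network u_theta(x); the architecture is an assumption.
    h = jnp.tanh(params["W1"] * x + params["b1"])
    return jnp.dot(params["W2"], h) + params["b2"]

def pde_residual(params, x):
    # Residual of u'' + pi^2 sin(pi x) = 0, with u'' from automatic differentiation.
    u_xx = jax.grad(jax.grad(lambda z: mlp(params, z)))(x)
    return u_xx + jnp.pi**2 * jnp.sin(jnp.pi * x)

def pinn_loss(params, x_interior):
    # Specialized error function: PDE residual at collocation points
    # plus the boundary conditions u(0) = u(1) = 0.
    r = jax.vmap(lambda x: pde_residual(params, x))(x_interior)
    bc = mlp(params, 0.0) ** 2 + mlp(params, 1.0) ** 2
    return jnp.mean(r ** 2) + bc

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = {"W1": jax.random.normal(k1, (32,)), "b1": jnp.zeros(32),
          "W2": jax.random.normal(k2, (32,)) / 32.0, "b2": jnp.zeros(())}
x_col = jax.random.uniform(jax.random.PRNGKey(1), (256,))   # random collocation points
grads = jax.grad(pinn_loss)(params, x_col)                   # one gradient step would follow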
Hybridization frameworks seek to combine the performance of neural network inference on modern
accelerated hardware with the guaranteed accuracy of traditional discretizations developed by the
scientific community. Hybridization efforts are either algorithmic or architectural.
One important algorithmic method is the deep Galerkin method (DGM) [38], a neural network extension of the mesh-free Galerkin method where the solution is represented as a deep neural network rather than a linear combination of basis functions. Being mesh-free, it enables the solution of high-dimensional problems by training the neural network model to satisfy the differential operator and its initial and boundary conditions on a randomly sampled set of points, rather than on an exponentially large grid. Although the number of points can be very large in higher dimensions, the training is done sequentially on smaller batches of data points, and second-order derivatives are calculated by a scalable Monte Carlo method. Another important algorithmic method is the deep Ritz method [42]: it approximates the trial function with a deep neural network, evaluates the variational functional with numerical quadrature rules, and minimizes it by stochastic gradient descent.
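As a sketch of the deep Ritz idea (with an assumed trial network, a penalty-based boundary treatment, and our own example problem, not the exact formulation of [42]), the variational functional of a 1D Poisson problem can be estimated by Monte Carlo quadrature as follows:

import jax
import jax.numpy as jnp

def deep_ritz_loss(u, params, x_batch):
    # u(params, x) is any differentiable neural trial function (for instance the small
    # MLP of the PINN sketch above). For -u'' = f on (0, 1) with u(0) = u(1) = 0, the
    # variational functional E[u] = int_0^1 ( 0.5 * u'(x)^2 - f(x) * u(x) ) dx is
    # approximated by Monte Carlo quadrature on a random batch, and the Dirichlet
    # condition is enforced with a penalty term.
    f = lambda x: jnp.pi ** 2 * jnp.sin(jnp.pi * x)                 # example source term
    u_x = jax.vmap(jax.grad(u, argnums=1), in_axes=(None, 0))(params, x_batch)
    u_val = jax.vmap(u, in_axes=(None, 0))(params, x_batch)
    energy = jnp.mean(0.5 * u_x ** 2 - f(x_batch) * u_val)
    penalty = u(params, 0.0) ** 2 + u(params, 1.0) ** 2
    return energy + 500.0 * penalty                                 # penalty weight is a free choice

# Each stochastic gradient descent step draws a fresh batch of quadrature points:
#   x_batch = jax.random.uniform(key, (4096,))
#   grads = jax.grad(deep_ritz_loss, argnums=1)(u, params, x_batch)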
Architectural hybridization is based on differentiable numerical linear algebra. One emerging class involves implementing differentiable finite discretization solvers and embedding them in neural architectures, which enables end-to-end differentiable, gradient-based optimization. Differentiable solvers have been developed in JAX [7] for fluid dynamics problems, e.g. Phi-Flow [17], JAX-CFD [19], and JAX-FLUIDS [2]. These methods are suitable for inverse problems where an unknown field is modeled by the neural network, while the model influence is propagated by the differentiable solver into a measurable residual [33, 12, 26]. We also note the classic strategy for solving inverse problems, the adjoint method, which obtains the gradient of the loss without differentiating through the solver [1]; however, deriving analytic expressions for the adjoint equations can be tedious or impractical. Other notable uses of differentiable solvers include modeling and correcting the solution errors of finite discretizations [40], and learning and controlling differentiable systems [13, 16].
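The inverse-problem pattern can be sketched with a toy differentiable solver in JAX (an illustrative example with assumed names and a trivial heat-equation solver; the cited libraries provide full-featured implementations):

import jax
import jax.numpy as jnp

def heat_solve(u0, dt=1e-4, dx=1.0 / 64, steps=200):
    # Explicit time stepping of u_t = u_xx with a periodic finite-difference Laplacian.
    def step(u, _):
        lap = (jnp.roll(u, -1) - 2.0 * u + jnp.roll(u, 1)) / dx ** 2
        return u + dt * lap, None
    u_final, _ = jax.lax.scan(step, u0, None, length=steps)
    return u_final

def field_net(params, x):
    # Tiny network modeling the unknown field; x has shape (num_points, 1).
    return jnp.tanh(x @ params["W1"] + params["b1"]) @ params["W2"]

def loss(params, x_grid, measurements):
    u0 = field_net(params, x_grid)                 # unknown field from the network
    return jnp.mean((heat_solve(u0) - measurements) ** 2)

# jax.grad(loss) propagates the measurable residual back through the solver to the
# network parameters:
#   grads = jax.grad(loss)(params, x_grid, measured_final_state)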
Neural networks are not only universal approximators of continuous functions, but also of nonlinear operators [8]. Although this fact has been leveraged by many authors through data-driven strategies for learning differential operators [25, 3, 24, 23], researchers have also demonstrated that differentiable solvers can effectively train nonlinear operators without any data, in a completely physics-driven fashion; see the section on learning inverse transforms in [33].
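One common construction in this operator-learning literature is a DeepONet-style branch/trunk network, sketched below (layer sizes and names are our own assumptions, not a specific published model):

import jax.numpy as jnp

def operator_net(params, f_sensors, y):
    # The branch net encodes the input function f sampled at m fixed sensor locations;
    # the trunk net encodes the query coordinate y; the output is their inner product.
    branch = jnp.tanh(f_sensors @ params["Wb"] + params["bb"])   # (m,) -> (p,)
    trunk = jnp.tanh(y * params["Wt"] + params["bt"])            # scalar -> (p,)
    return jnp.dot(branch, trunk)

Such a network can be fit to input/output function pairs, or, as in the physics-driven setting above, trained through a differentiable solver without any data.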
We propose a novel framework for solving PDEs with deep neural networks that lifts any existing mesh-based finite discretization method off its underlying grid and extends it into a mesh-free, embarrassingly parallel method applicable to high-dimensional problems on unstructured random points. In addition, discontinuous solutions can be readily considered.
2 Problem statement
We illustrate our approach by considering a closed irregular interface (Γ) that partitions an interior (Ω−) and an exterior (Ω+) subdomain (see figure 2). The coupled solutions u± ∈ Ω± satisfy the Helmholtz equation k±u± − ∇·(µ±∇u±) = f± with jump conditions [u] = α and [µ ∂n u] = β,