A Deep Fourier Residual Method for solving PDEs using
Neural Networks
Jamie M. Taylor1, David Pardo2,3,4, and Ignacio Muga2,5
1Department of Quantitative Methods, CUNEF University, Madrid, Spain
2Basque Center for Applied Mathematics (BCAM), Bilbao, Bizkaia, Spain
3University of the Basque Country (UPV/EHU), Leioa, Spain
4Ikerbasque (Basque Foundation for Sciences), Bilbao, Spain
5Instituto de Matemáticas, Pontificia Universidad Católica de Valparaíso, Chile.
Abstract
When using Neural Networks as trial functions to numerically solve PDEs, a key choice to be
made is the loss function to be minimised, which should ideally correspond to a norm of the error.
In multiple problems, this error norm coincides with, or is equivalent to, the $H^{-1}$-norm of the
residual; however, it is often difficult to compute it accurately. This work assumes rectangular
domains and proposes the use of a Discrete Sine/Cosine Transform to accurately and efficiently
compute the $H^{-1}$ norm. The resulting Deep Fourier-based Residual (DFR) method efficiently
and accurately approximates solutions to PDEs. This is particularly useful when solutions lack
$H^2$ regularity and methods involving strong formulations of the PDE fail. We observe that the
$H^1$-error is highly correlated with the discretised loss during training, which permits accurate
error estimation via the loss.
1 Introduction
The use of Deep Learning techniques employing Neural Networks (NNs) has been successful in
solving a wide range of data-based problems across fields such as image processing, healthcare,
and autonomous cars [1, 2, 20, 23, 34, 43, 52, 54]. Recently, there has been a surge of interest
in the use of neural networks as function spaces that can be employed to obtain numerical
solutions of Partial Differential Equations (PDEs) [5, 9, 37, 40, 47, 48]. Owing to the universal
approximation theorem, and its variants in Sobolev spaces [16, 24, 25, 30], it is known that a
sufficiently wide or deep NN is able to approximate any given continuous function on a compact
domain with arbitrary accuracy, and thus NNs make suitable function spaces for solving PDEs.
The use of automatic differentiation (autodiff) [4] facilitates efficient numerical evaluation of
derivatives, which allows algorithmic differentiation of the neural network itself, as well as the
use of gradient-based optimisation techniques such as Stochastic Gradient Descent (SGD) [8]
and Adam [31] in order to minimise appropriate loss functions over a space of neural networks.
A quantitative version of the Universal Approximation Theorem [3] demonstrates that NNs
can approximate functions without suffering the curse of dimensionality, requiring far fewer degrees of
freedom to approximate functions with high-dimensional inputs than classical piecewise-linear
function spaces, making them an attractive function space for solving PDEs, in particular
in high-dimensional problems. Beyond solving single instances of PDEs, NNs have shown a
capacity to learn operators that solve families of parametrised PDEs, allowing rapid “online”
evaluation of solutions after an “offline” training of the network [13, 22, 32, 35, 36].
The flexibility of NNs to solve many classes of PDEs is owed to a general and simple framework,
whereby one chooses an appropriate architecture for the NN, a loss function whose minimiser
should be an exact solution of the PDE, and an optimisation procedure to attempt to
minimise the loss function. In this article, we focus on the choice of loss function when solving
PDEs with NN function spaces. Previous works have considered losses based on strong
[26, 44, 53] and weak [27, 28, 29] formulations of the PDE. However, the choice of a perfect loss
function is generally not obvious, as in practice solutions will only reach local minima, and the
loss and error may have distinct or unknown convergence rates as one approaches either a local
minimiser or a practically unattainable global minimiser.
Generally, a PDE operator can be described by a (possibly nonlinear) map $R: X \to Y$,
where $X, Y$ are Banach spaces. The PDE then takes the form
\[
R(u) = 0. \qquad (1)
\]
For example, Poisson's equation, $-\Delta u(x) = f(x)$, on a domain $\Omega$ with homogeneous Dirichlet
boundary conditions and $f \in L^2(\Omega)$ may be interpreted in strong form via the map
$R_s : H^2(\Omega) \cap H^1_0(\Omega) \to L^2(\Omega)$ given by
\[
R_s(u) = \Delta u + f, \qquad (2)
\]
or in weak form via the map $R_w : H^1_0(\Omega) \to H^{-1}(\Omega) := [H^1_0(\Omega)]'$ given by
\[
\langle R_w(u), v\rangle_{H^{-1}\times H^1_0} = \int_\Omega \nabla u(x)\cdot \nabla v(x) - f(x)v(x)\,\mathrm{d}x. \qquad (3)
\]
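The two formulations are consistent: if $u \in H^2(\Omega)\cap H^1_0(\Omega)$, integrating (3) by parts (the boundary term vanishes since $v \in H^1_0(\Omega)$) gives
\[
\langle R_w(u), v\rangle_{H^{-1}\times H^1_0} = \int_\Omega \big(-\Delta u(x) - f(x)\big)\,v(x)\,\mathrm{d}x = -\int_\Omega R_s(u)(x)\,v(x)\,\mathrm{d}x,
\]
so that $R_w(u) = 0$ in $H^{-1}(\Omega)$ precisely when $R_s(u) = 0$ in $L^2(\Omega)$.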
When employing NNs to numerically solve PDEs, a loss is often selected as the norm of the
PDE residual in $Y$, that is,
\[
\mathcal{L}(u) := \|R(u)\|_Y. \qquad (4)
\]
One is generally confronted with two issues within this framework. The first is that norms on
function spaces are generally given by integrals and thus a quadrature rule must be employed
in order to numerically approximate $\mathcal{L}(u)$. In contrast to polynomial-based function spaces, an
exact quadrature rule is generally unobtainable. Moreover, a poor choice of quadrature rule can
lead to a form of “overfitting” and poor approximation of solutions [46]. The second issue is
that in infinite-dimensional spaces not all norms are equivalent, and thus the choice of norm on
$Y$ can directly affect the convergence of the error during training. Ideally, we should employ
norms on $X$ and $Y$ which are compatible in the sense that the $X$-norm of the error is equivalent
to the $Y$-norm of the residual, leading to a residual minimisation method [9, 10, 15].
Related to this second issue, progress has been made in the direction of a priori and a
posteriori error estimates that allow estimation of the error via the loss [6, 7, 18, 38, 50, 51].
These works rely on coercivity-type estimates of the error in terms of the exact norms, as well as
control of quadrature and training errors.
Many PDEs can be expressed in weak form via (1) with $X = U$ a space of trial functions,
and $Y = V'$, where $V$ is the space of test functions. That is,
\[
\langle R(u), v\rangle_{V'\times V} = 0 \quad \forall v \in V. \qquad (5)
\]
We commonly consider cases where $R$ represents a linear and inhomogeneous PDE, and thus
may be expressed in the form
\[
\langle R(u), v\rangle_{V'\times V} = b(u, v) - f(v), \qquad (6)
\]
where $f \in V'$ and $b: U \times V \to \mathbb{R}$ is a bilinear form. It is clear that the PDE, in weak form, is
equivalent to the statement that $\|R(u)\| = 0$ for any norm on $V'$. The most natural norm is
the dual norm, induced by the norm on $V$, defined via
\[
\|f\|_{V'} = \sup_{v \in V\setminus\{0\}} \frac{|f(v)|}{\|v\|_V}. \qquad (7)
\]
The advantage of employing the dual norm on $V'$ is that, under certain assumptions that
we will outline in more detail in Section 3.1, one can relate the dual norm of the residual to the
norm of the error. Specifically,
\[
\frac{1}{M}\|R(u)\|_{V'} \le \|u - u^*\|_U \le \frac{1}{\gamma}\|R(u)\|_{V'},
\]
where $u$ is a candidate solution, $u^*$ is the exact solution, and $M, \gamma$ are positive, problem-dependent
constants. This allows $\|R(u)\|_{V'}$ to be used as an error estimator, without needing to know the exact solution.
In addition, if we can find a way to numerically approximate the dual norm, we can employ this
as a loss function to be minimised over a trial function space.
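As a concrete illustration, consider Poisson's equation above with $U = V = H^1_0(\Omega)$ equipped with the norm $\|v\|_V = \|\nabla v\|_{L^2(\Omega)}$. Then $\langle R_w(u), v\rangle = \int_\Omega \nabla(u - u^*)\cdot\nabla v\,\mathrm{d}x$, so
\[
\|R_w(u)\|_{V'} = \sup_{\|\nabla v\|_{L^2}=1}\int_\Omega \nabla(u - u^*)\cdot\nabla v\,\mathrm{d}x = \|\nabla(u - u^*)\|_{L^2(\Omega)},
\]
i.e. $M = \gamma = 1$ and the dual norm of the residual coincides exactly with the energy-norm error.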
We propose a Deep Fourier Residual (DFR) method to approximate the error of candidate
solutions of PDEs in $H^1$ via an approximation of the dual norm of the residual of the PDE
operator. The dual norm is then employed as a loss function to be minimised. The advantage
of such a method is that the resulting norm is equivalent to the $H^1$-error of the solutions for
certain well-posed problems.
We consider several numerical examples, comparing the DFR approach to other losses employed
to solve differential equations using NNs. Our numerical examples exhibit strong correlation
between the proposed loss and the $H^1$-error during the training process. For sufficiently
regular problems, our DFR method is qualitatively equivalent to existing methods in the literature
(Section 4.1.2) [27, 44]. However, in less regular problems, our method leads to significantly
more accurate solutions, both for an equation that admits a smooth solution with large gradients
(Section 4.1.3), and for an elliptic equation with discontinuous parameters (Section 4.1.4).
Indeed, methods based on the strong formulation of the PDE, such as PINNs [44], cannot be
implemented for such applications. The DFR method is shown to be advantageous both when
solutions admit $H^1\setminus H^2$ regularity, and in regular problems where the forcing term has a large
discrepancy between its $L^2$ and $H^{-1}$ norms. We then consider further numerical experiments
which demonstrate the DFR method's capability in a linear equation with a point source (Section 4.2.1),
a nonlinear ODE (Section 4.2.2), and a 2D linear problem (Section 4.2.3).
The DFR method is currently limited to rectangular domains where each face has either
a Dirichlet or a Neumann boundary condition. We rely on a Fourier-type representation of
the $H^{-1}$ norm that can be computed efficiently using the one-dimensional Discrete Cosine
Transform and Discrete Sine Transform (DCT/DST), which are based on the Fast Fourier
Transform (FFT), in each coordinate direction. Generally, an extension of our techniques to
PDEs on arbitrary domains $\Omega$ would require access to an orthonormal basis of $H^1(\Omega)$, which
may prove more costly to obtain than solving the PDE itself. Furthermore, the DST/DCT
takes advantage of the FFT, which allows an inexpensive evaluation of the loss and would not
be available in general domains. Possibilities for the extension of the DFR method to arbitrary
domains include methods analogous to embedded domain methods [19, 21, 33, 39, 41, 45, 49],
which embed domains with complex geometry into a simpler fictitious computational domain. It
is also possible to adapt ideas from Goal-Oriented adaptivity (e.g., [42]) to the proposed DFR
method, although this is postponed to future work.
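As a minimal illustration of the FFT-based transforms the method relies on (not the DFR loss itself, which is defined in Section 3.3), the following sketch uses SciPy's type-I DST to approximate the Fourier-sine coefficients of a function sampled on a uniform grid in $(0,1)$; the grid size and sample function are placeholder choices.

```python
import numpy as np
from scipy.fft import dst

# Approximate the Fourier-sine coefficients
#   c_k = \int_0^1 f(x) sin(k*pi*x) dx,  k = 1, ..., N,
# of a sampled function using the FFT-based type-I DST.
N = 511                                  # number of interior sample points
x = np.arange(1, N + 1) / (N + 1)        # uniform interior grid on (0, 1)
f = x * (1.0 - x)                        # placeholder sample values

# SciPy's type-I DST returns 2 * sum_n f(x_n) sin(k*pi*n/(N+1)), so dividing
# by 2(N+1) gives a trapezoid-rule approximation of c_k.
c = dst(f, type=1) / (2.0 * (N + 1))

# Check the first coefficient against the exact value 4/pi^3.
print(c[0], 4.0 / np.pi**3)
```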
The structure of the paper is as follows. In Section 2 we cover some preliminary concepts.
The theoretical groundwork for the definition of the DFR method is presented in Section 3, with
our proposed loss defined in Section 3.3. Section 4.1 contains numerical examples comparing our
proposed loss function with the VPINN and collocation losses, which are roughly equivalent in
regular problems, but we demonstrate that the DFR method greatly outperforms VPINNs
and PINNs when solutions are less regular. In Section 4.2 we consider further numerical experiments
that demonstrate the DFR method in equations with a point source, nonlinearities, and in 2D.
Finally, concluding remarks are made in Section 5.
2 Preliminaries
2.1 Neural Networks
Neural networks are functions expressed as compositions of more elementary functions. In the
simplest case of a fully connected feed-forward NN, an $M$-layer neural network is described by
$M$ layer functions, $L_i : \mathbb{R}^{N_i} \to \mathbb{R}^{N_{i+1}}$, of the form
\[
L_i(x) = \sigma_i(A_i x + b_i), \qquad (8)
\]
where $A_i$ is an $N_{i+1}\times N_i$ matrix, $b_i \in \mathbb{R}^{N_{i+1}}$, and $\sigma_i$ is an activation function that may depend
on the layer index $i$ and acts component-wise on vectors. A fully connected feed-forward neural
network is a function $\tilde{u}: \mathbb{R}^{N_1} \to \mathbb{R}^{N_{M+1}}$ defined by
\[
\tilde{u}(x) = L_M \circ L_{M-1} \circ \dots \circ L_1(x). \qquad (9)
\]
The final activation function $\sigma_M$ is taken to be the identity, $\sigma_M(x) = x$. The parameters $A_i, b_i$,
known as the weights and biases of the network, parametrise the neural network. Optimisation
over a neural network space with fixed architecture corresponds to identifying the optimal values
of these trainable parameters.
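As an illustration, a minimal NumPy sketch of the forward pass (8)-(9) might look as follows; the tanh activation, the random initialisation, and the layer widths are placeholder choices rather than those used later in the paper.

```python
import numpy as np

# Minimal sketch (not the authors' implementation) of the forward pass (8)-(9):
# a fully connected feed-forward network with tanh activations on the hidden
# layers and the identity on the final layer.
def init_layers(widths, seed=0):
    """widths = [N_1, ..., N_{M+1}]; returns the weights A_i and biases b_i."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_out, n_in)) / np.sqrt(n_in),
             np.zeros(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def u_tilde(params, x):
    """Evaluate the network (9) at a single input vector x."""
    for i, (A, b) in enumerate(params):
        x = A @ x + b
        if i < len(params) - 1:          # sigma_M is the identity
            x = np.tanh(x)
    return x

params = init_layers([1, 20, 20, 1])     # one input, two hidden layers, one output
print(u_tilde(params, np.array([0.5])))
```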
In the context of NNs for PDEs, we often need to impose Dirichlet boundary conditions on our
candidate solutions. In this work, we will do this by introducing a cutoff function. That is, if we
wish to consider functions $u: \Omega \to \mathbb{R}$ such that $u|_{\Gamma_D} = u_0$ on a subset of the
boundary $\Gamma_D \subset \partial\Omega$, we take $\tilde{u}$ to be of the form (9) and define
\[
u(x) = \varphi_1(x)\,\tilde{u}(x) + \varphi_2(x), \qquad (10)
\]
where $\varphi_1$ is a function satisfying $\varphi_1|_{\Gamma_D} = 0$ and $\varphi_1 > 0$ on $\bar{\Omega}\setminus\Gamma_D$, and $\varphi_2|_{\Gamma_D} = u_0$.
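For example, on $\Omega = (0,1)$ with $\Gamma_D = \{0, 1\}$ and $u_0 = 0$, one admissible (illustrative) choice is $\varphi_1(x) = x(1-x)$ and $\varphi_2 \equiv 0$, as in the following sketch.

```python
import numpy as np

# Minimal sketch of (10) on Omega = (0, 1) with Gamma_D = {0, 1} and u_0 = 0:
# phi_1(x) = x(1 - x) vanishes on Gamma_D and is positive inside, and
# phi_2 = 0 lifts the (here homogeneous) Dirichlet data. Both choices are
# illustrative; any functions satisfying the stated conditions would do.
def apply_dirichlet_bc(u_tilde_vals, x):
    phi1 = x * (1.0 - x)
    phi2 = 0.0
    return phi1 * u_tilde_vals + phi2

x = np.linspace(0.0, 1.0, 5)
u_tilde_vals = np.ones_like(x)               # stand-in for the network output
print(apply_dirichlet_bc(u_tilde_vals, x))   # vanishes at x = 0 and x = 1
```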
We include a schematic of this architecture in Figure 1.

[Figure 1: NN architecture. The input $x$ is passed through the trainable hidden layers to produce the network output $\tilde{u}(x)$, after which the boundary conditions are applied to obtain $u(x)$.]
2.2 PINN and VPINN losses
Whilst there are many discrete losses employed when solving PDEs via NNs, in this section we
outline two particular cases, the PINN (collocation) and the VPINN losses, which are based
on strong and weak formulations of the PDE, respectively. These methods will be used for
comparison in the numerical experiments of Section 4.1.
2.2.1 Collocation
We assume that the strong form of the residual can be represented in the form
\[
\mathcal{L}u(x) = 0 \quad (x \in \Omega), \qquad \mathcal{G}u(x) = 0 \quad (x \in \partial\Omega). \qquad (11)
\]
The collocation method considers discretisations of the $L^2$ norms of $\mathcal{L}u$ and $\mathcal{G}u$ as the loss
function to be minimised, according to an appropriate quadrature rule. Explicitly, we consider
the loss
\[
\mathcal{L}_{\mathrm{col}}(u) := \frac{1}{K_1}\sum_{i=1}^{K_1} \omega_i\,|\mathcal{L}u(x_i)|^2 + \frac{1}{K_2}\sum_{i=1}^{K_2} \omega_i^b\,\big|\mathcal{G}u(x_i^b)\big|^2, \qquad (12)
\]
where $(x_i)_{i=1}^{K_1}$ and $(\omega_i)_{i=1}^{K_1}$ are quadrature points in $\Omega$ and quadrature weights, respectively,
which may be taken via a Monte Carlo or a deterministic quadrature scheme. Similarly, $(x_i^b)_{i=1}^{K_2}$
and $(\omega_i^b)_{i=1}^{K_2}$ are quadrature points and weights on the boundary.
In PDEs with low regularity, the strong form of the PDE does not hold, and minimisers of
Eq. (12) will not accurately represent the PDE. Despite this limitation, the collocation (PINN)
method is one of the most attractive methods for regular problems, as it is simple to implement
using autodiff algorithms. Furthermore, by using Monte Carlo integration techniques, integrals
can be estimated in high dimensions without suffering from the curse of dimensionality.
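A minimal one-dimensional sketch of the loss (12) is given below for the model problem $-u''(x) = f(x)$ on $(0,1)$ with $u(0) = u(1) = 0$, uniform Monte Carlo weights $\omega_i = 1$, and a hand-written candidate whose derivatives are known in closed form; in an actual PINN implementation the derivatives of the network would be obtained via autodiff.

```python
import numpy as np

# Hypothetical 1D example of the collocation loss (12) for
#   -u''(x) = f(x) on (0, 1),  u(0) = u(1) = 0,
# evaluated at the candidate u(x) = x(1-x)/2, whose second derivative is -1.
f = lambda x: np.ones_like(x)            # forcing term, so this candidate is exact
u_xx = lambda x: -np.ones_like(x)        # second derivative of the candidate

rng = np.random.default_rng(0)
x_int = rng.uniform(0.0, 1.0, size=1000) # Monte Carlo points in Omega, weights 1
Lu = -u_xx(x_int) - f(x_int)             # strong-form interior residual Lu
Gu = np.array([0.0, 0.0])                # boundary residual: u(0), u(1)

loss_col = np.mean(np.abs(Lu) ** 2) + np.mean(np.abs(Gu) ** 2)
print(loss_col)                          # ~0, since the candidate solves the PDE
```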
2.2.2 VPINNs
VPINNs employ a loss that uses the weak formulation of the PDE. They correspond to a
Petrov-Galerkin method where the trial space is given by NNs. Given a set of test functions
$(v_k)_{k=1}^K$, a candidate solution $u$, and the residual $R(u) \in V'$ given in weak form, the loss is
defined as
\[
\mathcal{L}_{\mathrm{VP}}(u) = \sum_{k=1}^{K} \big|\langle R(u), v_k\rangle_{V'\times V}\big|^2. \qquad (13)
\]
In [27], this method was shown to be advantageous over the classical PINN method, both
in terms of accuracy and speed. A particular application within their work, relevant to this
manuscript, was to consider ODEs on $[0,1]$ with a NN architecture consisting of a single
hidden layer with sine activation function, and test functions $v_k(x) = \sin(k\pi x)$. For this implementation,
the authors were able to perform an exact quadrature to evaluate $\langle R(u), v_k\rangle_{V'\times V}$,
which was employed in their loss function. In other implementations within their article, Legendre
polynomials are considered as test functions. Whilst not directly commented on within their
work, in their implementation with sine test functions, the norm may be interpreted as a discretisation
of the $L^2$-norm of the strong form of the residual. As they consider the test functions
$(v_k)_{k=1}^K$ to form a subset of an orthonormal basis of $L^2$, if there exists a strong-form residual
$\mathcal{L}u \in L^2$ such that
\[
\langle R(u), v\rangle_{V'\times V} = \langle \mathcal{L}u, v\rangle_{L^2}
\]
for all $v \in H^1_0$, we observe that
\[
\sum_{k=1}^{K} \big|\langle R(u), v_k\rangle_{V'\times V}\big|^2 = \sum_{k=1}^{K} \langle \mathcal{L}u, v_k\rangle_{L^2}^2 \approx \|\mathcal{L}u\|_{L^2}^2. \qquad (14)
\]
In particular, for sufficiently regular problems, this implies that $\mathcal{L}_{\mathrm{VP}}$ and $\mathcal{L}_{\mathrm{col}}$ each correspond
to distinct discretisations of the same loss, i.e., the $L^2$-norm of the strong-form residual. The
significant difference, however, is that the discretisation (13) is always well defined, even if the
residual cannot be represented by an $L^2$ function, and we will observe the consequences of this
distinction in Section 4.1.4, employing sine-based test functions, as in [27].
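A minimal sketch of the loss (13) for the same model problem $-u'' = f$ on $(0,1)$, with sine test functions as in [27] but a simple midpoint quadrature in place of their exact quadrature, is given below; the candidate and its derivative are hand-written placeholders rather than a network.

```python
import numpy as np

# Minimal sketch of the VPINN loss (13) for -u'' = f on (0,1) with homogeneous
# Dirichlet data and test functions v_k(x) = sin(k*pi*x). Here
#   <R(u), v_k> = \int_0^1 u'(x) v_k'(x) - f(x) v_k(x) dx
# is approximated by a midpoint rule; the candidate's derivative is given in
# closed form (autodiff would supply it for a network candidate).
K, Q = 10, 2000                          # number of test functions / quadrature points
x = (np.arange(Q) + 0.5) / Q             # midpoint quadrature nodes, weight 1/Q
f = lambda x: np.ones_like(x)
u_x = lambda x: 0.5 - x                  # derivative of the candidate u = x(1-x)/2

k = np.arange(1, K + 1)[:, None]         # test-function indices, shape (K, 1)
v_x = k * np.pi * np.cos(k * np.pi * x)  # v_k'(x), shape (K, Q)
v = np.sin(k * np.pi * x)                # v_k(x),  shape (K, Q)

residuals = ((u_x(x) * v_x - f(x) * v) / Q).sum(axis=1)   # <R(u), v_k>
loss_vp = np.sum(np.abs(residuals) ** 2)
print(loss_vp)                           # ~0: the candidate solves the PDE
```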