Comparison of neural closure models for discretised PDEs
Hugo Melchers3, Daan Crommelin1,2, Barry Koren3, Vlado Menkovski3, and Benjamin
Sanderse1,*
1Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The
Netherlands
2Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Science Park
105-107, 1098 XG, The Netherlands
3Eindhoven University of Technology, De Zaale, 5600 MB, Eindhoven, The Netherlands
*Corresponding author. E-mail address: b.sanderse@cwi.nl
May 19, 2023
Abstract
Neural closure models have recently been proposed as a method for efficiently approximating
small scales in multiscale systems with neural networks. The choice of loss function and associated
training procedure has a large effect on the accuracy and stability of the resulting neural closure
model. In this work, we systematically compare three distinct procedures: “derivative fitting”, “tra-
jectory fitting” with discretise-then-optimise, and “trajectory fitting” with optimise-then-discretise.
Derivative fitting is conceptually the simplest and computationally the most efficient approach and
is found to perform reasonably well on one of the test problems (Kuramoto-Sivashinsky) but poorly
on the other (Burgers). Trajectory fitting is computationally more expensive but is more robust
and is therefore the preferred approach. Of the two trajectory fitting procedures, the discretise-
then-optimise approach produces more accurate models than the optimise-then-discretise approach.
While the optimise-then-discretise approach can still produce accurate models, care must be taken
in choosing the length of the trajectories used for training, in order to train the models on long-term
behaviour while still producing reasonably accurate gradients during training. Two existing theo-
rems are interpreted in a novel way that gives insight into the long-term accuracy of a neural closure
model based on how accurate it is in the short term.
Abbreviations
AD Automatic differentiation
CNN Convolutional neural network
FOM Full-order model
LES Large eddy simulation
MOR Model order reduction
MSE Mean-square error
NN Neural network
ODE Ordinary differential equation
PDE Partial differential equation
POD Proper orthogonal decomposition
RANS Reynolds-averaged Navier-Stokes
RMSE Root-mean-square error
RNN Recurrent neural network
ROM Reduced-order modelling
VPT Valid prediction time
Work performed while at Centrum Wiskunde & Informatica
arXiv:2210.14675v2 [cs.LG] 18 May 2023
1 Introduction
A number of real-world phenomena, such as fluid flows, can be modelled numerically as a system of
partial differential equations (PDEs). Such PDEs are typically solved by discretising them in space,
yielding ordinary differential equations (ODEs) over a large number of variables. These full-order models
(FOMs) are generally very accurate, but can be computationally expensive to solve. A remedy against
this high computational cost is to use ‘truncated’ models. These do not directly resolve all spatial and/or
temporal scales of the true solution of the underlying PDE, thereby lowering the dimensionality of the
model. Approaches to create lower dimensional models include reduced-order modelling (ROM [31]), as
well as large eddy simulation (LES [30]) and Reynolds-averaged Navier-Stokes (RANS [2]) for fluid flow
problems, specifically. In such a truncated model, one or more closure terms appear, which represent
the effects that are not directly resolved by the reduced-order model. For a recent overview of closure
modelling for reduced-order models, see Ahmed et al. [1].
While closure terms can in some cases be derived from theory (for example, for LES), this is generally
not the case. When they cannot be derived from theory, a recent approach is to use a machine learning
model to learn the closure term from data. In this approach a specific type of machine learning model is
used, called a neural closure model [10]. The overall idea is to approximate a PDE or large ODE system
by a smaller ODE system, and to train a neural network to correct for the approximation error in the
resulting ODE system. Neural closure models are a special form of neural ODEs [4], which have been the
subject of extensive research in the past years, for example by Finlay et al. [7] and Massaroli et al. [21].
A number of different approaches for training neural ODEs and neural closure models are available.
An important distinction is between approaches that compare predicted and actual time-derivatives of
the ODE (“derivative fitting”), and approaches that compare predicted and actual solutions (“trajectory
fitting”). Trajectory fitting itself can be done in two ways, depending on whether the optimisation
problem for the neural network is formulated as continuous in time and then discretised using an ODE
solver (optimise-then-discretise), or formulated as discrete in time (discretise-then-optimise).
In several recent studies, neural closure models have been applied to fluid flow problems. The con-
sidered approaches include derivative fitting [9, 26, 20, 3], discretise-then-optimise [18], and optimise-
then-discretise [33, 20]. Derivative fitting is also used on a comparable but distinct problem by San
and Maulik [32]. There, Burgers’ equation is solved using model order reduction (MOR) by means of
proper orthogonal decomposition (POD), resulting in an approximate ODE for which the closure term
is approximated by a neural network.
Training neural ODEs efficiently and accurately has been the subject of some previous research.
However, in the context of neural closure models, most of this earlier work either does not consider
certain relevant aspects or is not directly applicable. For example, Onken and Ruthotto [24] compare
discretise-then-optimise and optimise-then-discretise for pure neural ODEs (i.e. ODEs in which the right-
hand side only consists of a neural network term). They omit a derivative fitting approach since such
an approach is not applicable in the contexts considered there. Ma et al. [19] compare a wide variety
of training approaches for neural ODEs, but with an emphasis on the computational efficiency of the
different training approaches rather than on the accuracy of the resulting models. Roesch et al. [29]
compare trajectory fitting and derivative fitting approaches, considering only pure neural ODEs on two
very small ODE systems. As a result, the papers mentioned above do not provide a conclusive basis for
general recommendations on how to train neural closure models.
The purpose of this paper is to perform a systematic comparison of different approaches for construct-
ing neural closure models. Compared to other works, the experiments performed here are not aimed
at showing the efficacy of neural closure models for a particular problem type, but rather at making
general recommendations regarding different approaches for neural closure models. To this end, neural
closure models are trained on data from two different discretised PDEs, in a variety of ways. One of
these PDEs, the Kuramoto-Sivashinsky equation, is chaotic and discretised into a stiff ODE system. This
gives rise to additional challenges when training neural closure models. The results of this paper confirm
that discretise-then-optimise approaches are generally preferable to optimise-then-discretise approaches.
Furthermore, derivative fitting is found to be unpredictable, producing excellent models on one test set,
but very poor models on the other. We give theoretical support to our results by reinterpreting two fun-
damental theorems from the fields of dynamical systems and time integration in terms of neural closure
models.
This paper is organised as follows: Section 2 describes a number of different approaches that are
available for training neural closure models. Section 3 gives a number of theoretical results that can
be used to predict the short-term and long-term accuracy of models based on how they are trained
and what error they achieve during training. Section 4 performs a number of numerical experiments
in which the same neural closure model is trained in multiple ways on the same two test equations,
and the accuracy of the resulting models is compared. Finally, Section 5 provides conclusions and
recommendations. The code used to perform the numerical experiments from Section 4 is available
online at https://github.com/HugoMelchers/neural-closure-models.
2 Preliminaries: approaches for neural ODEs
In this paper, neural networks will be used as closure models for discretised PDEs. Here, a time-evolution
of the form $\frac{\partial u}{\partial t} = F(u)$ is discretised into an ODE system $\frac{\mathrm{d}u}{\mathrm{d}t} = f(u)$, $u \in \mathbb{R}^{N_x}$, such that taking
progressively finer discretisations (resulting in larger values of $N_x$) produces more accurate solutions.
However, instead of taking a very fine discretisation, a relatively coarse discretisation will be used and
a neural network (NN) closure term will be added to correct for the spatial discretisation error. This
neural network depends not only on the vector $u$, but also on a vector $\vartheta$ of trainable parameters:

$$\frac{\mathrm{d}u}{\mathrm{d}t} = f(u) + \mathrm{NN}(u; \vartheta). \tag{1}$$
Some of the theory regarding neural closure models also applies to neural ODEs, in which the neural
network is the only term in the right-hand side:
$$\frac{\mathrm{d}u}{\mathrm{d}t} = \mathrm{NN}(u; \vartheta). \tag{2}$$
In both cases, the result is a system of ODEs over a vector u(t), in which the right-hand side depends
not only on u(t) but also on some trainable parameters ϑ:
$$\frac{\mathrm{d}u}{\mathrm{d}t} = g(u; \vartheta). \tag{3}$$
Note that in these models, the ODE is assumed to be autonomous, i.e., the right-hand side is assumed
to be independent of t. However, the work presented in this paper can be extended to non-autonomous
ODEs, by extending the neural network to depend on $t$ or on some time-dependent control signal as
well as on u(t), and by including this additional data in the training data set. The general form (3)
covers more model types than just the neural ODEs and neural closure models of equations (1) and (2).
Specifically, the output of the neural network does not have to be one of the terms in the right-hand
side function, but can also be included in other ways. For example, Beck et al. [3] train neural networks
to predict the eddy viscosity in an LES closure term, rather than to predict the entire closure term, in
order to ensure stability of the resulting model.
In this work, the specific form (1) is used, with the exception that the output of the neural network
is passed through a simple linear function ∆fwd, listed as a non-trainable layer in Tables 4 and 5, which
ensures that the solutions of the neural ODE satisfy conservation of momentum.
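
As an illustration of the closure form (1), the sketch below assembles such a right-hand side in PyTorch. The coarse right-hand side `f_coarse`, the small convolutional architecture, and the optional `conserve` map (standing in for the momentum-conserving layer mentioned above) are placeholder assumptions made for this sketch; they are not the architecture of Tables 4 and 5 or the implementation used in this paper.

```python
import torch
import torch.nn as nn

class NeuralClosureRHS(nn.Module):
    """Sketch of the closure right-hand side g(u; theta) = f(u) + NN(u; theta) of eq. (1)."""

    def __init__(self, f_coarse, conserve=None):
        super().__init__()
        self.f_coarse = f_coarse                    # coarse discretisation f(u); placeholder callable
        self.conserve = conserve or (lambda x: x)   # stand-in for a fixed momentum-conserving map
        # A small periodic CNN acting on a batch of coarse states u of shape [batch, N_x].
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2, padding_mode="circular"),
            nn.GELU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2, padding_mode="circular"),
            nn.GELU(),
            nn.Conv1d(32, 1, kernel_size=5, padding=2, padding_mode="circular"),
        )

    def forward(self, u):
        closure = self.net(u.unsqueeze(1)).squeeze(1)      # NN(u; theta)
        return self.f_coarse(u) + self.conserve(closure)   # f(u) plus the (corrected) closure term
```

Such a module exposes $g(u; \vartheta)$ as a function that is differentiable with respect to both $u$ and the network parameters, which is all that the training procedures below require.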
Training a neural network corresponds to minimising a certain loss function, which must be chosen
ahead of time. Some loss functions are such that their gradients, which are used by the optimiser, can
be computed in different ways. In this section, an overview of different available approaches is given.
2.1 Derivative fitting
With derivative fitting, also referred to as non-intrusive training [28], the loss function compares the
predicted and actual time-derivatives (i.e. right-hand sides) of the neural ODE. In this paper, the loss
function used for derivative fitting will be a mean-square error (MSE):
$$\mathrm{Loss}\!\left(\vartheta, u_{\mathrm{ref}}, \frac{\mathrm{d}u_{\mathrm{ref}}}{\mathrm{d}t}\right) = \frac{1}{N_x N_s N_p} \sum_{i=1}^{N_s} \sum_{j=1}^{N_p} \left\| \frac{\mathrm{d}u^{(j)}_{\mathrm{ref}}}{\mathrm{d}t}(t_i) - g\!\left(u^{(j)}_{\mathrm{ref}}(t_i); \vartheta\right) \right\|_2^2. \tag{4}$$
Here, $N_x$ is the size of the vector $u_{\mathrm{ref}}$, $N_s$ is the number of snapshots in each trajectory of the training
data, and $N_p$ is the number of trajectories (i.e. ODE solutions). The value and time-derivative of the
$j$th trajectory at time $t_i$ are given by $u^{(j)}_{\mathrm{ref}}(t_i)$ and $\frac{\mathrm{d}u^{(j)}_{\mathrm{ref}}}{\mathrm{d}t}(t_i)$, respectively.
The main advantage of derivative fitting is that in order to compute the gradient of the loss function
with respect to ϑ, one only has to differentiate through the neural network itself. This makes derivative
fitting a relatively simple approach to use.

Figure 1: A visual comparison of the two types of neural ODE training: given a reference trajectory
(solid), one can train the neural ODE to match the time-derivative of the trajectory (dotted lines), or to
result in accurate ODE solutions (dashed line and arrows).

A disadvantage of derivative fitting is that the training data
must consist of not just the values $u$, but also their time derivatives $\frac{\mathrm{d}u}{\mathrm{d}t}$. This data is not always available,
for example in cases where the trajectories u(t) are obtained as measurements from a physical experiment.
In this work, the training data is obtained through a high-resolution numerical algorithm. Hence, the
derivatives to be used for training are available. In cases where exact derivatives are not available, they
can be estimated from the available data for u(t) itself, as described by Roesch et al. [29]. While they
obtain good results with approximated derivatives, in general it is to be expected that replacing exact
time-derivatives with approximations will decrease the accuracy of the trained neural network.
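
In code, a derivative-fitting step is little more than a batched evaluation of $g$ followed by the mean-square error of (4). The sketch below assumes hypothetical snapshot tensors `u_ref` and `dudt_ref`, and a model such as the `NeuralClosureRHS` sketch above; it is not the training loop used in this paper.

```python
import torch

def derivative_fitting_loss(g, u_ref, dudt_ref):
    """Mean-square error between reference time-derivatives and the model right-hand side, as in (4).

    g               : the right-hand side g(u; theta), e.g. a NeuralClosureRHS instance
    u_ref, dudt_ref : tensors of (assumed) shape [N_p, N_s, N_x] holding the snapshots
                      u_ref^(j)(t_i) and their exact time-derivatives
    """
    n_p, n_s, n_x = u_ref.shape
    pred = g(u_ref.reshape(n_p * n_s, n_x))              # g(u_ref^(j)(t_i); theta) for all snapshots
    residual = dudt_ref.reshape(n_p * n_s, n_x) - pred   # mismatch in the time-derivative
    return residual.pow(2).sum() / (n_x * n_s * n_p)     # normalisation of eq. (4)

# One optimisation step (sketch):
#   optimiser = torch.optim.Adam(g.parameters())
#   loss = derivative_fitting_loss(g, u_ref, dudt_ref)
#   loss.backward(); optimiser.step(); optimiser.zero_grad()
```

Because the loss involves only a single evaluation of $g$ per snapshot and $f$ does not depend on $\vartheta$, backpropagation only has to pass through the neural network once per sample, which is what makes this the cheapest of the three approaches.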
2.2 Trajectory fitting: Discretise-then-optimise
An alternative to derivative fitting is trajectory fitting, also referred to as intrusive training [28], em-
bedded training [20], or a solver-in-the-loop setup [37]. Here, the loss function compares the predicted
and actual trajectories of the neural ODE. Unless otherwise specified, trajectory fitting will also be done
with the MSE loss function:
$$\mathrm{Loss}(\vartheta, u_{\mathrm{ref}}) = \frac{1}{N_x N_t N_p} \sum_{i=1}^{N_t} \sum_{j=1}^{N_p} \left\| u^{(j)}(t_i) - u^{(j)}_{\mathrm{ref}}(t_i) \right\|_2^2, \tag{5}$$

$$\text{where } \frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = g\!\left(u^{(j)}; \vartheta\right) \text{ and } u^{(j)}(0) = u^{(j)}_{\mathrm{ref}}(0). \tag{6}$$
Trajectory fitting involves applying an ODE solver to the neural closure model to perform $N_t$ time steps,
where $N_t$ is a hyper-parameter that must be chosen ahead of time. Computing the gradient of the loss
function involves differentiating through the ODE solving process and can be done in two separate ways.
One way to do this is by directly differentiating through the computations of the ODE solver. This
approach is called discretise-then-optimise.
In the discretise-then-optimise approach, the neural ODE is embedded in an ODE solver, for example
an explicit Runge-Kutta method. In such an ODE solver, the next solution snapshot u(t+ ∆t) is
computed from u(t) by performing one step of the ODE solver, which generally involves applying the
internal neural network multiple times (depending on the number of stages of the ODE solver). This
is repeated to obtain a predicted trajectory over a time interval of length $T = N_t\,\Delta t$. Since all the
computations done by an ODE solver are differentiable, one can simply compute the required gradient
by differentiating through all the time steps performed by the ODE solver. The discretise-then-optimise
approach effectively transforms a neural ODE into a discrete model, in which the time series is predicted
by advancing the solution by a fixed time step $\Delta t$ at a time. As such, any training approach that can
be applied to discrete models of the form u(t+ ∆t) = model(u(t)) can also be applied to neural ODEs
trained using this approach.
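
As a concrete (assumed) instance of discretise-then-optimise, the sketch below rolls the closure model out with a fixed-step classical fourth-order Runge-Kutta method and accumulates the trajectory loss (5); calling `.backward()` on the result then differentiates through every stage of every time step. The solver choice and tensor layout are illustrative assumptions, not necessarily those of the paper.

```python
import torch

def rk4_step(g, u, dt):
    """One classical fourth-order Runge-Kutta step for du/dt = g(u; theta)."""
    k1 = g(u)
    k2 = g(u + 0.5 * dt * k1)
    k3 = g(u + 0.5 * dt * k2)
    k4 = g(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def trajectory_fitting_loss(g, u_ref, dt):
    """Discretise-then-optimise loss (5): roll the model out for N_t steps from u_ref(t_0).

    u_ref : tensor of (assumed) shape [N_p, N_t + 1, N_x]; u_ref[:, 0] holds the initial conditions.
    """
    n_p, n_snapshots, n_x = u_ref.shape
    n_t = n_snapshots - 1
    u = u_ref[:, 0]                                # u^(j)(0) = u_ref^(j)(0), eq. (6)
    loss = u.new_zeros(())
    for i in range(1, n_t + 1):
        u = rk4_step(g, u, dt)                     # advance the predicted trajectory by one step
        loss = loss + (u - u_ref[:, i]).pow(2).sum()
    return loss / (n_x * n_t * n_p)                # normalisation of eq. (5)
```

Reverse-mode AD stores all intermediate states of the rollout, so memory use grows with $N_t$; this is the cost that the adjoint approach of Section 2.3 can avoid.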
Table 1: An overview of the differences between the three training approaches outlined in Section 2.

                                             Derivative      Trajectory fitting:        Trajectory fitting:
                                             fitting         discretise-then-optimise   optimise-then-discretise
Differentiability required                   NN              NN, f, ODE solver          NN, f
Accuracy of loss function gradients          Exact           Exact                      Approximate
Learns long-term accuracy                    No              Yes                        Yes
Requires time-derivatives of training data   Yes             No                         No
Computational cost                           Low             High                       High
2.3 Trajectory fitting: Optimise-then-discretise
Differentiating through the computations of the ODE solver is not always a possibility, for example if
the ODE solver is implemented as black-box software. In such cases, trajectory fitting with the loss
function (5) can still be used by computing gradients with the optimise-then-discretise approach. In
this approach, the required gradients are computed either by extending the ODE with more variables
that store derivative information, or by solving a second “adjoint” ODE backwards in time after the
original “forward” ODE solution is computed. These two methods can be considered continuous-time
analogues to forward-mode and reverse-mode automatic differentiation (AD), respectively.
The adjoint ODE approach was popularised by Chen et al. [4], who demonstrate that on some
problems the adjoint ODE approach can be used to train a neural ODE with much lower memory usage
than other approaches. Ma et al. [19] find that the adjoint ODE approach is computationally more
efficient than the forward mode approach for ODEs with more than 100 variables and parameters. As
such, a description of the forward mode approach is omitted here. The adjoint ODE approach is the
optimise-then-discretise approach that will be tested here. This approach can be implemented in three
different ways. The implementation used in this work is the interpolating adjoint method, in which the
gradient of the loss function is computed by first solving the forward ODE (3) to obtain the trajectory
u(t), and then solving the adjoint ODE system
$$\frac{\mathrm{d}}{\mathrm{d}t} y^\top = -y^\top \frac{\partial}{\partial u} g(u; \vartheta), \qquad y(T) = 0, \tag{7a}$$

$$\frac{\mathrm{d}}{\mathrm{d}t} z^\top = -y^\top \frac{\partial}{\partial \vartheta} g(u; \vartheta), \qquad z(T) = 0, \tag{7b}$$
from $t = T$ backwards in time until $t = 0$, performing discrete updates to $y(t)$ at times $t_i$, $i =
N_t, N_t - 1, \ldots, 2, 1$. After the adjoint ODE system is solved, the gradient $\frac{\mathrm{d}\,\mathrm{Loss}}{\mathrm{d}\vartheta}$ is given by $z(0)$. For
implementation details and an overview of other optimise-then-discretise methods, see Chapter 3 of
Melchers [22].
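
To make the structure of (7a)-(7b) concrete, the following sketch integrates the adjoint system backwards in time with a single explicit Euler step per observation interval and a piecewise-constant stand-in for the trajectory $u(t)$. The interpolating adjoint method described above uses a proper ODE solver and an interpolant of the forward solution, so this is only an illustration of the mechanics under simplifying assumptions, not the paper's implementation; the argument names are hypothetical.

```python
import torch

def adjoint_gradient(g, u_snapshots, t_obs, dloss_du):
    """Illustrative optimise-then-discretise gradient via the adjoint system (7a)-(7b).

    g            : the right-hand side g(u; theta), e.g. a NeuralClosureRHS instance
    u_snapshots  : tensor of (assumed) shape [N_t + 1, N_p, N_x] with the forward solution at the
                   observation times (a crude piecewise-constant replacement for the interpolant)
    t_obs        : observation times t_0 < t_1 < ... < t_{N_t}
    dloss_du     : per-snapshot loss gradients dLoss/du(t_i) for i = 1, ..., N_t, shape [N_t, N_p, N_x]
    """
    params = [p for p in g.parameters() if p.requires_grad]
    y = torch.zeros_like(u_snapshots[-1])               # y(T) = 0
    z = [torch.zeros_like(p) for p in params]           # z(T) = 0, one block per parameter tensor
    for i in range(len(t_obs) - 1, 0, -1):
        y = y + dloss_du[i - 1]                          # discrete update to y at observation time t_i
        dt = t_obs[i] - t_obs[i - 1]
        u = u_snapshots[i].detach().requires_grad_(True)
        rhs = g(u)
        # Vector-Jacobian products y^T dg/du and y^T dg/dtheta via reverse-mode AD through a single
        # evaluation of g (the ODE solver itself is never differentiated).
        grads = torch.autograd.grad(rhs, [u] + params, grad_outputs=y)
        # One explicit Euler step of (7a)-(7b), backwards over [t_{i-1}, t_i].
        y = y + dt * grads[0]
        z = [zk + dt * gk for zk, gk in zip(z, grads[1:])]
    return z                                             # dLoss/dtheta = z(0), per parameter tensor
```

The only quantities required from automatic differentiation are the vector-Jacobian products $y^\top \partial g/\partial u$ and $y^\top \partial g/\partial \vartheta$ for individual evaluations of $g$, which is why Table 1 lists only the network and $f$, but not the ODE solver, as needing to be differentiable for this approach.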
Note that the two trajectory fitting approaches, i.e. discretise-then-optimise and optimise-then-
discretise, both require choosing $N_t$, the number of time steps that the solution prediction is computed
for. As will be described in Section 3, choosing $N_t$ either too small or too large may have negative conse-
quences for the accuracy of the trained model. For the optimise-then-discretise approach, the gradients
used by the optimiser are computed as the solution of an ODE over a time span of $T = N_t\,\Delta t$. Since the
numerically computed ODE solution is inexact, choosing a larger value of $N_t$ will generally result in less
accurate gradients, which may also decrease the accuracy of the trained model.
2.4 Algorithm comparison
An overview of the advantages and disadvantages of different approaches is given in Table 1. Here, the
term ‘long-term’ refers to the accuracy of predictions when solving the ODE over multiple time steps as
opposed to only considering the instantaneous error in the time-derivative of the solution. Note that the
computational cost will not be compared in this work; the goal is to compare the accuracy of the resulting
models. Performance measurements of different training procedures will not be given here since the code
used to perform the numerical experiments in this work was not written with computational efficiency in
mind, and since training was not performed on recent hardware. However, derivative fitting is expected
to be computationally more efficient due to the fact that it does not require differentiating through the ODE solver.