Comparison of neural closure models for discretised PDEs
Hugo Melchers3, Daan Crommelin1,2, Barry Koren3, Vlado Menkovski3, and Benjamin
Sanderse1,*
1Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The
Netherlands
2Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Science Park
105-107, 1098 XG, The Netherlands
3Eindhoven University of Technology, De Zaale, 5600 MB, Eindhoven, The Netherlands
*Corresponding author. E-mail address: b.sanderse@cwi.nl
May 19, 2023
Abstract
Neural closure models have recently been proposed as a method for efficiently approximating
small scales in multiscale systems with neural networks. The choice of loss function and associated
training procedure has a large effect on the accuracy and stability of the resulting neural closure
model. In this work, we systematically compare three distinct procedures: “derivative fitting”, “tra-
jectory fitting” with discretise-then-optimise, and “trajectory fitting” with optimise-then-discretise.
Derivative fitting is conceptually the simplest and computationally the most efficient approach and
is found to perform reasonably well on one of the test problems (Kuramoto-Sivashinsky) but poorly
on the other (Burgers). Trajectory fitting is computationally more expensive but is more robust
and is therefore the preferred approach. Of the two trajectory fitting procedures, the discretise-
then-optimise approach produces more accurate models than the optimise-then-discretise approach.
While the optimise-then-discretise approach can still produce accurate models, care must be taken
in choosing the length of the trajectories used for training, in order to train the models on long-term
behaviour while still producing reasonably accurate gradients during training. Two existing theo-
rems are interpreted in a novel way that gives insight into the long-term accuracy of a neural closure
model based on how accurate it is in the short term.
Abbreviations
AD Automatic differentiation
CNN Convolutional neural network
FOM Full-order model
LES Large eddy simulation
MOR Model order reduction
MSE Mean-square error
NN Neural network
ODE Ordinary differential equation
PDE Partial differential equation
POD Proper orthogonal decomposition
RANS Reynolds-averaged Navier-Stokes
RMSE Root-mean-square error
RNN Recurrent neural network
ROM Reduced-order modelling
VPT Valid prediction time
Work performed while at Centrum Wiskunde & Informatica
arXiv:2210.14675v2 [cs.LG] 18 May 2023
1 Introduction
A number of real-world phenomena, such as fluid flows, can be modelled numerically as a system of
partial differential equations (PDEs). Such PDEs are typically solved by discretising them in space,
yielding ordinary differential equations (ODEs) over a large number of variables. These full-order models
(FOMs) are generally very accurate, but can be computationally expensive to solve. A remedy against
this high computational cost is to use ‘truncated’ models. These do not directly resolve all spatial and/or
temporal scales of the true solution of the underlying PDE, thereby lowering the dimensionality of the
model. Approaches to create lower dimensional models include reduced-order modelling (ROM [31]), as
well as large eddy simulation (LES [30]) and Reynolds-averaged Navier-Stokes (RANS [2]) for fluid flow
problems, specifically. In such a truncated model, one or more closure terms appear, which represent
the effects that are not directly resolved by the reduced-order model. For a recent overview of closure
modelling for reduced-order models, see Ahmed et al. [1].
While closure terms can in some cases be derived from theory (for example, for LES), this is generally
not the case. When they cannot be derived from theory, a recent approach is to use a machine learning
model to learn the closure term from data. In this approach a specific type of machine learning model is
used, called a neural closure model [10]. The overall idea is to approximate a PDE or large ODE system
by a smaller ODE system, and to train a neural network to correct for the approximation error in the
resulting ODE system. Neural closure models are a special form of neural ODEs [4], which have been the
subject of extensive research in the past years, for example by Finlay et al. [7] and Massaroli et al. [21].
A number of different approaches for training neural ODEs and neural closure models are available.
An important distinction is between approaches that compare predicted and actual time-derivatives of
the ODE (“derivative fitting”), and approaches that compare predicted and actual solutions (“trajectory
fitting”). Trajectory fitting itself can be done in two ways, depending on whether the optimisation
problem for the neural network is formulated as continuous in time and then discretised using an ODE
solver (optimise-then-discretise), or formulated as discrete in time (discretise-then-optimise).
In several recent studies, neural closure models have been applied to fluid flow problems. The con-
sidered approaches include derivative fitting [9, 26, 20, 3], discretise-then-optimise [18], and optimise-
then-discretise [33, 20]. Derivative fitting is also used on a comparable but distinct problem by San
and Maulik [32]. There, Burgers’ equation is solved using model order reduction (MOR) by means of
proper orthogonal decomposition (POD), resulting in an approximate ODE for which the closure term
is approximated by a neural network.
Training neural ODEs efficiently and accurately has been the subject of some previous research.
However, in the context of neural closure models, most of this earlier work either does not consider
certain relevant aspects or is not directly applicable. For example, Onken and Ruthotto [24] compare
discretise-then-optimise and optimise-then-discretise for pure neural ODEs (i.e. ODEs in which the right-
hand side only consists of a neural network term). They omit a derivative fitting approach since such
an approach is not applicable in the contexts considered there. Ma et al. [19] compare a wide variety
of training approaches for neural ODEs, but with an emphasis on the computational efficiency of the
different training approaches rather than on the accuracy of the resulting models. Roesch et al. [29]
compare trajectory fitting and derivative fitting approaches, considering only pure neural ODEs on two
very small ODE systems. As a result, the papers mentioned above do not provide a conclusive basis for
general recommendations on how to train neural closure models.
The purpose of this paper is to perform a systematic comparison of different approaches for construct-
ing neural closure models. Compared to other works, the experiments performed here are not aimed
at showing the efficacy of neural closure models for a particular problem type, but rather at making
general recommendations regarding different approaches for neural closure models. To this end, neural
closure models are trained on data from two different discretised PDEs, in a variety of ways. One of
these PDEs, the Kuramoto-Sivashinsky equation, is chaotic and discretised into a stiff ODE system. This
gives rise to additional challenges when training neural closure models. The results of this paper confirm
that discretise-then-optimise approaches are generally preferable to optimise-then-discretise approaches.
Furthermore, derivative fitting is found to be unpredictable, producing excellent models on one test set,
but very poor models on the other. We give theoretical support to our results by reinterpreting two fun-
damental theorems from the fields of dynamical systems and time integration in terms of neural closure
models.
This paper is organised as follows: Section 2 describes a number of different approaches that are
available for training neural closure models. Section 3 gives a number of theoretical results that can
be used to predict the short-term and long-term accuracy of models based on how they are trained
and what error they achieve during training. Section 4 performs a number of numerical experiments
in which the same neural closure model is trained in multiple ways on the same two test equations,
and the accuracy of the resulting models is compared. Finally, Section 5 provides conclusions and
recommendations. The code used to perform the numerical experiments from Section 4 is available
online at https://github.com/HugoMelchers/neural-closure-models.
2 Preliminaries: approaches for neural ODEs
In this paper, neural networks will be used as closure models for discretised PDEs. Here, a time-evolution
of the form $\frac{\partial u}{\partial t} = F(u)$ is discretised into an ODE system $\frac{\mathrm{d}u}{\mathrm{d}t} = f(u)$, $u \in \mathbb{R}^{N_x}$, such that taking
progressively finer discretisations (resulting in larger values of $N_x$) produces more accurate solutions.
However, instead of taking a very fine discretisation, a relatively coarse discretisation will be used and
a neural network (NN) closure term will be added to correct for the spatial discretisation error. This
neural network depends not only on the vector $u$, but also on a vector $\vartheta$ of trainable parameters:

$$\frac{\mathrm{d}u}{\mathrm{d}t} = f(u) + \mathrm{NN}(u; \vartheta). \tag{1}$$
Some of the theory regarding neural closure models also applies to neural ODEs, in which the neural
network is the only term in the right-hand side:
$$\frac{\mathrm{d}u}{\mathrm{d}t} = \mathrm{NN}(u; \vartheta). \tag{2}$$
In both cases, the result is a system of ODEs over a vector u(t), in which the right-hand side depends
not only on u(t) but also on some trainable parameters ϑ:
$$\frac{\mathrm{d}u}{\mathrm{d}t} = g(u; \vartheta). \tag{3}$$
Note that in these models, the ODE is assumed to be autonomous, i.e., the right-hand side is assumed
to be independent of t. However, the work presented in this paper can be extended to non-autonomous
ODEs, by extending the neural network to depend on $t$ or on some time-dependent control signal as
well as on u(t), and by including this additional data in the training data set. The general form (3)
covers more model types than just the neural ODEs and neural closure models of equations (1) and (2).
Specifically, the output of the neural network does not have to be one of the terms in the right-hand
side function, but can also be included in other ways. For example, Beck et al. [3] train neural networks
to predict the eddy viscosity in an LES closure term, rather than to predict the entire closure term, in
order to ensure stability of the resulting model.
In this work, the specific form (1) is used, with the exception that the output of the neural network
is passed through a simple linear function ∆fwd, listed as a non-trainable layer in Tables 4 and 5, which
ensures that the solutions of the neural ODE satisfy conservation of momentum.
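
As an illustration of the closure form (1), the sketch below assembles such a right-hand side in PyTorch. The coarse right-hand side `f_coarse`, the small convolutional architecture, and the optional `conserve` map (standing in for the momentum-conserving layer mentioned above) are placeholder assumptions made for this sketch; they are not the architecture of Tables 4 and 5 or the implementation used in this paper.

```python
import torch
import torch.nn as nn

class NeuralClosureRHS(nn.Module):
    """Sketch of the closure right-hand side g(u; theta) = f(u) + NN(u; theta) of eq. (1)."""

    def __init__(self, f_coarse, conserve=None):
        super().__init__()
        self.f_coarse = f_coarse                    # coarse discretisation f(u); placeholder callable
        self.conserve = conserve or (lambda x: x)   # stand-in for a fixed momentum-conserving map
        # A small periodic CNN acting on a batch of coarse states u of shape [batch, N_x].
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2, padding_mode="circular"),
            nn.GELU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2, padding_mode="circular"),
            nn.GELU(),
            nn.Conv1d(32, 1, kernel_size=5, padding=2, padding_mode="circular"),
        )

    def forward(self, u):
        closure = self.net(u.unsqueeze(1)).squeeze(1)      # NN(u; theta)
        return self.f_coarse(u) + self.conserve(closure)   # f(u) plus the (corrected) closure term
```

Such a module exposes $g(u; \vartheta)$ as a function that is differentiable with respect to both $u$ and the network parameters, which is all that the training procedures below require.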
Training a neural network corresponds to minimising a certain loss function, which must be chosen
ahead of time. Some loss functions are such that their gradients, which are used by the optimiser, can
be computed in different ways. In this section, an overview of different available approaches is given.
2.1 Derivative fitting
With derivative fitting, also referred to as non-intrusive training [28], the loss function compares the
predicted and actual time-derivatives (i.e. right-hand sides) of the neural ODE. In this paper, the loss
function used for derivative fitting will be a mean-square error (MSE):
$$\mathrm{Loss}\!\left(\vartheta, u_{\mathrm{ref}}, \frac{\mathrm{d}u_{\mathrm{ref}}}{\mathrm{d}t}\right) = \frac{1}{N_x N_s N_p} \sum_{i=1}^{N_s} \sum_{j=1}^{N_p} \left\| \frac{\mathrm{d}u^{(j)}_{\mathrm{ref}}}{\mathrm{d}t}(t_i) - g\!\left(u^{(j)}_{\mathrm{ref}}(t_i); \vartheta\right) \right\|_2^2. \tag{4}$$
Here, $N_x$ is the size of the vector $u_{\mathrm{ref}}$, $N_s$ is the number of snapshots in each trajectory of the training
data, and $N_p$ is the number of trajectories (i.e. ODE solutions). The value and time-derivative of the
$j$th trajectory at time $t_i$ are given by $u^{(j)}_{\mathrm{ref}}(t_i)$ and $\frac{\mathrm{d}u^{(j)}_{\mathrm{ref}}}{\mathrm{d}t}(t_i)$, respectively.
The main advantage of derivative fitting is that in order to compute the gradient of the loss function
with respect to ϑ, one only has to differentiate through the neural network itself. This makes derivative
fitting a relatively simple approach to use.

Figure 1: A visual comparison of the two types of neural ODE training: given a reference trajectory
(solid), one can train the neural ODE to match the time-derivative of the trajectory (dotted lines), or to
result in accurate ODE solutions (dashed line and arrows).

A disadvantage of derivative fitting is that the training data
must consist of not just the values $u$, but also their time derivatives $\frac{\mathrm{d}u}{\mathrm{d}t}$. This data is not always available,
for example in cases where the trajectories u(t) are obtained as measurements from a physical experiment.
In this work, the training data is obtained through a high-resolution numerical algorithm. Hence, the
derivatives to be used for training are available. In cases where exact derivatives are not available, they
can be estimated from the available data for u(t) itself, as described by Roesch et al. [29]. While they
obtain good results with approximated derivatives, in general it is to be expected that replacing exact
time-derivatives with approximations will decrease the accuracy of the trained neural network.
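
In code, a derivative-fitting step is little more than a batched evaluation of $g$ followed by the mean-square error of (4). The sketch below assumes hypothetical snapshot tensors `u_ref` and `dudt_ref`, and a model such as the `NeuralClosureRHS` sketch above; it is not the training loop used in this paper.

```python
import torch

def derivative_fitting_loss(g, u_ref, dudt_ref):
    """Mean-square error between reference time-derivatives and the model right-hand side, as in (4).

    g               : the right-hand side g(u; theta), e.g. a NeuralClosureRHS instance
    u_ref, dudt_ref : tensors of (assumed) shape [N_p, N_s, N_x] holding the snapshots
                      u_ref^(j)(t_i) and their exact time-derivatives
    """
    n_p, n_s, n_x = u_ref.shape
    pred = g(u_ref.reshape(n_p * n_s, n_x))              # g(u_ref^(j)(t_i); theta) for all snapshots
    residual = dudt_ref.reshape(n_p * n_s, n_x) - pred   # mismatch in the time-derivative
    return residual.pow(2).sum() / (n_x * n_s * n_p)     # normalisation of eq. (4)

# One optimisation step (sketch):
#   optimiser = torch.optim.Adam(g.parameters())
#   loss = derivative_fitting_loss(g, u_ref, dudt_ref)
#   loss.backward(); optimiser.step(); optimiser.zero_grad()
```

Because the loss involves only a single evaluation of $g$ per snapshot and $f$ does not depend on $\vartheta$, backpropagation only has to pass through the neural network once per sample, which is what makes this the cheapest of the three approaches.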
2.2 Trajectory fitting: Discretise-then-optimise
An alternative to derivative fitting is trajectory fitting, also referred to as intrusive training [28], em-
bedded training [20], or a solver-in-the-loop setup [37]. Here, the loss function compares the predicted
and actual trajectories of the neural ODE. Unless otherwise specified, trajectory fitting will also be done
with the MSE loss function:
$$\mathrm{Loss}(\vartheta, u_{\mathrm{ref}}) = \frac{1}{N_x N_t N_p} \sum_{i=1}^{N_t} \sum_{j=1}^{N_p} \left\| u^{(j)}(t_i) - u^{(j)}_{\mathrm{ref}}(t_i) \right\|_2^2, \tag{5}$$

$$\text{where } \frac{\mathrm{d}u^{(j)}}{\mathrm{d}t} = g\!\left(u^{(j)}; \vartheta\right) \text{ and } u^{(j)}(0) = u^{(j)}_{\mathrm{ref}}(0). \tag{6}$$
Trajectory fitting involves applying an ODE solver to the neural closure model to perform $N_t$ time steps,
where $N_t$ is a hyper-parameter that must be chosen ahead of time. Computing the gradient of the loss
function involves differentiating through the ODE solving process and can be done in two separate ways.
One way to do this is by directly differentiating through the computations of the ODE solver. This
approach is called discretise-then-optimise.
In the discretise-then-optimise approach, the neural ODE is embedded in an ODE solver, for example
an explicit Runge-Kutta method. In such an ODE solver, the next solution snapshot u(t+ ∆t) is
computed from u(t) by performing one step of the ODE solver, which generally involves applying the
internal neural network multiple times (depending on the number of stages of the ODE solver). This
is repeated to obtain a predicted trajectory over a time interval of length $T = N_t\,\Delta t$. Since all the
computations done by an ODE solver are differentiable, one can simply compute the required gradient
by differentiating through all the time steps performed by the ODE solver. The discretise-then-optimise
approach effectively transforms a neural ODE into a discrete model, in which the time series is predicted
by advancing the solution by a fixed time step $\Delta t$ at a time. As such, any training approach that can
be applied to discrete models of the form u(t+ ∆t) = model(u(t)) can also be applied to neural ODEs
trained using this approach.
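
As a concrete (assumed) instance of discretise-then-optimise, the sketch below rolls the closure model out with a fixed-step classical fourth-order Runge-Kutta method and accumulates the trajectory loss (5); calling `.backward()` on the result then differentiates through every stage of every time step. The solver choice and tensor layout are illustrative assumptions, not necessarily those of the paper.

```python
import torch

def rk4_step(g, u, dt):
    """One classical fourth-order Runge-Kutta step for du/dt = g(u; theta)."""
    k1 = g(u)
    k2 = g(u + 0.5 * dt * k1)
    k3 = g(u + 0.5 * dt * k2)
    k4 = g(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def trajectory_fitting_loss(g, u_ref, dt):
    """Discretise-then-optimise loss (5): roll the model out for N_t steps from u_ref(t_0).

    u_ref : tensor of (assumed) shape [N_p, N_t + 1, N_x]; u_ref[:, 0] holds the initial conditions.
    """
    n_p, n_snapshots, n_x = u_ref.shape
    n_t = n_snapshots - 1
    u = u_ref[:, 0]                                # u^(j)(0) = u_ref^(j)(0), eq. (6)
    loss = u.new_zeros(())
    for i in range(1, n_t + 1):
        u = rk4_step(g, u, dt)                     # advance the predicted trajectory by one step
        loss = loss + (u - u_ref[:, i]).pow(2).sum()
    return loss / (n_x * n_t * n_p)                # normalisation of eq. (5)
```

Reverse-mode AD stores all intermediate states of the rollout, so memory use grows with $N_t$; this is the cost that the adjoint approach of Section 2.3 can avoid.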
Table 1: An overview of the differences between the three training approaches outlined in Section 2.

                                             Derivative      Trajectory fitting:        Trajectory fitting:
                                             fitting         discretise-then-optimise   optimise-then-discretise
Differentiability required                   NN              NN, f, ODE solver          NN, f
Accuracy of loss function gradients          Exact           Exact                      Approximate
Learns long-term accuracy                    No              Yes                        Yes
Requires time-derivatives of training data   Yes             No                         No
Computational cost                           Low             High                       High
2.3 Trajectory fitting: Optimise-then-discretise
Differentiating through the computations of the ODE solver is not always a possibility, for example if
the ODE solver is implemented as black-box software. In such cases, trajectory fitting with the loss
function (5) can still be used by computing gradients with the optimise-then-discretise approach. In
this approach, the required gradients are computed either by extending the ODE with more variables
that store derivative information, or by solving a second “adjoint” ODE backwards in time after the
original “forward” ODE solution is computed. These two methods can be considered continuous-time
analogues to forward-mode and reverse-mode automatic differentiation (AD), respectively.
The adjoint ODE approach was popularised by Chen et al. [4], who demonstrate that on some
problems the adjoint ODE approach can be used to train a neural ODE with much lower memory usage
than other approaches. Ma et al. [19] find that the adjoint ODE approach is computationally more
efficient than the forward mode approach for ODEs with more than 100 variables and parameters. As
such, a description of the forward mode approach is omitted here. The adjoint ODE approach is the
optimise-then-discretise approach that will be tested here. This approach can be implemented in three
different ways. The implementation used in this work is the interpolating adjoint method, in which the
gradient of the loss function is computed by first solving the forward ODE (3) to obtain the trajectory
u(t), and then solving the adjoint ODE system
$$\frac{\mathrm{d}}{\mathrm{d}t} y^\top = -y^\top \frac{\partial}{\partial u} g(u; \vartheta), \qquad y(T) = 0, \tag{7a}$$

$$\frac{\mathrm{d}}{\mathrm{d}t} z^\top = -y^\top \frac{\partial}{\partial \vartheta} g(u; \vartheta), \qquad z(T) = 0, \tag{7b}$$
from $t = T$ backwards in time until $t = 0$, performing discrete updates to $y(t)$ at times $t_i$, $i =
N_t, N_t - 1, \ldots, 2, 1$. After the adjoint ODE system is solved, the gradient $\frac{\mathrm{d}\,\mathrm{Loss}}{\mathrm{d}\vartheta}$ is given by $z(0)$. For
implementation details and an overview of other optimise-then-discretise methods, see Chapter 3 of
Melchers [22].
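
To make the structure of (7a)-(7b) concrete, the following sketch integrates the adjoint system backwards in time with a single explicit Euler step per observation interval and a piecewise-constant stand-in for the trajectory $u(t)$. The interpolating adjoint method described above uses a proper ODE solver and an interpolant of the forward solution, so this is only an illustration of the mechanics under simplifying assumptions, not the paper's implementation; the argument names are hypothetical.

```python
import torch

def adjoint_gradient(g, u_snapshots, t_obs, dloss_du):
    """Illustrative optimise-then-discretise gradient via the adjoint system (7a)-(7b).

    g            : the right-hand side g(u; theta), e.g. a NeuralClosureRHS instance
    u_snapshots  : tensor of (assumed) shape [N_t + 1, N_p, N_x] with the forward solution at the
                   observation times (a crude piecewise-constant replacement for the interpolant)
    t_obs        : observation times t_0 < t_1 < ... < t_{N_t}
    dloss_du     : per-snapshot loss gradients dLoss/du(t_i) for i = 1, ..., N_t, shape [N_t, N_p, N_x]
    """
    params = [p for p in g.parameters() if p.requires_grad]
    y = torch.zeros_like(u_snapshots[-1])               # y(T) = 0
    z = [torch.zeros_like(p) for p in params]           # z(T) = 0, one block per parameter tensor
    for i in range(len(t_obs) - 1, 0, -1):
        y = y + dloss_du[i - 1]                          # discrete update to y at observation time t_i
        dt = t_obs[i] - t_obs[i - 1]
        u = u_snapshots[i].detach().requires_grad_(True)
        rhs = g(u)
        # Vector-Jacobian products y^T dg/du and y^T dg/dtheta via reverse-mode AD through a single
        # evaluation of g (the ODE solver itself is never differentiated).
        grads = torch.autograd.grad(rhs, [u] + params, grad_outputs=y)
        # One explicit Euler step of (7a)-(7b), backwards over [t_{i-1}, t_i].
        y = y + dt * grads[0]
        z = [zk + dt * gk for zk, gk in zip(z, grads[1:])]
    return z                                             # dLoss/dtheta = z(0), per parameter tensor
```

The only quantities required from automatic differentiation are the vector-Jacobian products $y^\top \partial g/\partial u$ and $y^\top \partial g/\partial \vartheta$ for individual evaluations of $g$, which is why Table 1 lists only the network and $f$, but not the ODE solver, as needing to be differentiable for this approach.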
Note that the two trajectory fitting approaches, i.e. discretise-then-optimise and optimise-then-
discretise, both require choosing $N_t$, the number of time steps that the solution prediction is computed
for. As will be described in Section 3, choosing $N_t$ either too small or too large may have negative conse-
quences for the accuracy of the trained model. For the optimise-then-discretise approach, the gradients
used by the optimiser are computed as the solution of an ODE over a time span of $T = N_t\,\Delta t$. Since the
numerically computed ODE solution is inexact, choosing a larger value of $N_t$ will generally result in less
accurate gradients, which may also decrease the accuracy of the trained model.
2.4 Algorithm comparison
An overview of the advantages and disadvantages of different approaches is given in Table 1. Here, the
term ‘long-term’ refers to the accuracy of predictions when solving the ODE over multiple time steps as
opposed to only considering the instantaneous error in the time-derivative of the solution. Note that the
computational cost will not be compared in this work; the goal is to compare the accuracy of the resulting
models. Performance measurements of different training procedures will not be given here since the code
used to perform the numerical experiments in this work was not written with computational efficiency in
mind, and since training was not performed on recent hardware. However, derivative fitting is expected
to be computationally more efficient due to the fact that it does not require differentiating through the ODE solver.