1 Introduction
A number of real-world phenomena, such as fluid flows, can be modelled numerically as a system of
partial differential equations (PDEs). Such PDEs are typically solved by discretising them in space,
yielding ordinary differential equations (ODEs) over a large number of variables. These full-order models
(FOMs) are generally very accurate, but can be computationally expensive to solve. A remedy for this high computational cost is to use ‘truncated’ models. These do not directly resolve all spatial and/or
temporal scales of the true solution of the underlying PDE, thereby lowering the dimensionality of the
model. Approaches to create lower dimensional models include reduced-order modelling (ROM [31]), as
well as large eddy simulation (LES [30]) and Reynolds-averaged Navier-Stokes (RANS [2]) for fluid flow
problems, specifically. In such a truncated model, one or more closure terms appear, representing the effects that the truncated model does not directly resolve. For a recent overview of closure
modelling for reduced-order models, see Ahmed et al. [1].
While closure terms can in some cases be derived from theory (for example, for LES), this is generally not possible. In such cases, a recent approach is to use a machine learning model to learn the closure term from data. A specific type of machine learning model is
used, called a neural closure model [10]. The overall idea is to approximate a PDE or large ODE system
by a smaller ODE system, and to train a neural network to correct for the approximation error in the
resulting ODE system. Neural closure models are a special form of neural ODEs [4], which have been the
subject of extensive research in recent years, for example by Finlay et al. [7] and Massaroli et al. [21].
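To fix ideas, the following is a minimal sketch of this structure in Python. All names are hypothetical: the coarse right-hand side is a toy periodic diffusion stencil, and a tiny untrained fully connected network stands in for the learned closure term that would correct the approximation error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coarse-grained right-hand side: diffusion on a periodic grid.
def f_coarse(u, dx=0.1, nu=0.01):
    return nu * (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2

# Tiny fully connected network standing in for the learned closure term.
W1, b1 = rng.normal(size=(16, 8), scale=0.1), np.zeros(16)
W2, b2 = rng.normal(size=(8, 16), scale=0.1), np.zeros(8)

def closure(u):
    return W2 @ np.tanh(W1 @ u + b1) + b2

# Neural closure model: coarse physics plus neural correction term.
def rhs(u):
    return f_coarse(u) + closure(u)

# Forward-Euler roll-out of the closed model.
def solve(u0, dt=1e-3, steps=100):
    u = u0.copy()
    for _ in range(steps):
        u = u + dt * rhs(u)
    return u

u0 = np.sin(2 * np.pi * np.arange(8) / 8)
uT = solve(u0)
```

In a pure neural ODE the right-hand side would consist of the network term alone; the closure model instead keeps the known coarse physics and learns only the correction.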
A number of different approaches for training neural ODEs and neural closure models are available.
An important distinction is between approaches that compare predicted and actual time-derivatives of
the ODE (“derivative fitting”), and approaches that compare predicted and actual solutions (“trajectory
fitting”). Trajectory fitting itself can be done in two ways, depending on whether the optimisation
problem for the neural network is formulated as continuous in time and then discretised using an ODE
solver (optimise-then-discretise), or formulated as discrete in time (discretise-then-optimise).
In several recent studies, neural closure models have been applied to fluid flow problems. The considered approaches include derivative fitting [9, 26, 20, 3], discretise-then-optimise [18], and optimise-then-discretise [33, 20]. Derivative fitting is also used on a comparable but distinct problem by San
and Maulik [32]. There, Burgers’ equation is solved using model order reduction (MOR) by means of
proper orthogonal decomposition (POD), resulting in an approximate ODE for which the closure term
is approximated by a neural network.
Training neural ODEs efficiently and accurately has been the subject of some previous research.
However, in the context of neural closure models, most of this earlier work either does not consider
certain relevant aspects or is not directly applicable. For example, Onken and Ruthotto [24] compare
discretise-then-optimise and optimise-then-discretise for pure neural ODEs (i.e. ODEs in which the right-
hand side only consists of a neural network term). They omit a derivative fitting approach since such
an approach is not applicable in the contexts considered there. Ma et al. [19] compare a wide variety of training approaches for neural ODEs, albeit with an emphasis on the computational efficiency of the approaches rather than on the accuracy of the resulting models. Roesch et al. [29] compare trajectory fitting and derivative fitting approaches, but consider only pure neural ODEs on two very small ODE systems. As a result, the papers mentioned above do not provide a conclusive basis for general recommendations on how to train neural closure models.
The purpose of this paper is to perform a systematic comparison of different approaches for constructing neural closure models. Compared to other works, the experiments performed here are not aimed
at showing the efficacy of neural closure models for a particular problem type, but rather at making
general recommendations regarding different approaches for neural closure models. To this end, neural
closure models are trained on data from two different discretised PDEs, in a variety of ways. One of
these PDEs, the Kuramoto-Sivashinsky equation, is chaotic and discretised into a stiff ODE system. This
gives rise to additional challenges when training neural closure models. The results of this paper confirm
that discretise-then-optimise approaches are generally preferable to optimise-then-discretise approaches.
Furthermore, derivative fitting is found to be unpredictable, producing excellent models on one test set,
but very poor models on the other. We give theoretical support to our results by reinterpreting two fundamental theorems from the fields of dynamical systems and time integration in terms of neural closure models.
This paper is organised as follows: Section 2 describes a number of different approaches that are
available for training neural closure models. Section 3 gives a number of theoretical results that can
be used to predict the short-term and long-term accuracy of models based on how they are trained