Online model error correction with neural networks – preprint – October 26, 2022
show that the new method is effective and yields a more accurate model error correction than the usual offline learning
approach. The results show the potential of incorporating data assimilation and machine learning tightly, and pave the
way towards an application to the Integrated Forecasting System used for operational numerical weather prediction at
the European Centre for Medium-Range Weather Forecasts.
Key points
• Weak-constraint 4D-Var variants can be used to train neural networks for online model error correction.
• Online learning yields a more accurate model error correction than offline learning.
• The new, simplified method, developed in the incremental 4D-Var framework, can be easily applied in
operational weather models.
1 Introduction: machine learning for model error correction
In the geosciences, data assimilation (DA) is used to increase the quality of forecasts by providing accurate initial
conditions (Kalnay, 2003; Reich and Cotter, 2015; Law et al., 2015; Asch et al., 2016; Carrassi et al., 2018; Evensen
et al., 2022). The initial conditions are obtained by combining all sources of information in a mathematically optimal
way, in particular information from the dynamical model and information from sparse and noisy observations. There
are two main classes of DA methods. In variational DA, the core of the methods is to minimise a cost function, usually
using gradient-based optimisation techniques, to estimate the system state. Examples include 3D- and 4D-Var. In
statistical DA, the methods rely on sampled error statistics to perform sequential updates of the state estimate.
The most popular examples are the ensemble Kalman filter (EnKF) and the particle filter.
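
To make the variational idea concrete, the following minimal sketch minimises a 3D-Var-type cost function, J(x) = 1/2 (x - xb)^T B^-1 (x - xb) + 1/2 (y - Hx)^T R^-1 (y - Hx), by plain gradient descent. The two-variable state, the linear observation operator `H`, the error covariances and the fixed step size are illustrative assumptions and are not taken from any operational system.

```python
import numpy as np

def cost_and_grad(x, xb, B_inv, y, H, R_inv):
    """Return J(x) and its gradient for a linear observation operator H."""
    dxb = x - xb          # departure from the background
    dy = y - H @ x        # departure from the observations
    cost = 0.5 * dxb @ B_inv @ dxb + 0.5 * dy @ R_inv @ dy
    grad = B_inv @ dxb - H.T @ R_inv @ dy
    return cost, grad

def minimise(xb, B_inv, y, H, R_inv, step=0.1, n_iter=500):
    """Fixed-step gradient descent, started from the background state."""
    x = xb.copy()
    for _ in range(n_iter):
        _, grad = cost_and_grad(x, xb, B_inv, y, H, R_inv)
        x = x - step * grad
    return x

# Toy setup (assumed values): two state variables, only the first is observed.
xb = np.array([1.0, 0.0])        # background state
B_inv = np.eye(2)                # inverse background error covariance
H = np.array([[1.0, 0.0]])       # observation operator
R_inv = np.array([[4.0]])        # inverse observation error covariance (variance 0.25)
y = np.array([2.0])              # observation

xa = minimise(xb, B_inv, y, H, R_inv)
# The closed-form analysis for this toy setup is [1.8, 0.0].
```

In this linear, Gaussian toy case the minimiser coincides with the Kalman analysis xb + K(y - H xb); operational systems rely on far more efficient optimisers than fixed-step gradient descent.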
Most of the time, DA methods are applied with the perfect model assumption: this is called strong-constraint DA.
However, despite the significant efforts of modellers, geoscientific models remain affected by errors
(Dee, 2005), for example due to unresolved small-scale processes. This is why there is growing interest within the DA
community in weak-constraint (WC) methods, i.e. DA methods that relax the perfect model assumption (Trémolet, 2006).
This has led, for example, to the iterative ensemble Kalman filter in the presence of additive noise (Sakov et al., 2018)
in statistical DA, and to the forcing formulation of WC 4D-Var (Laloyaux et al., 2020a) in variational DA. In practice,
the DA control vector has to be extended to include the model error in addition to the system state. The downside of this
approach is the potentially significant increase in the problem’s dimension, since the model trajectory is no longer
uniquely determined by the initial condition. By construction, WC 4D-Var is an online model error correction method,
meaning that the model error is estimated during the assimilation process and is only valid for the states in the current
assimilation window.
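
The extension of the control vector can be illustrated with a minimal sketch. It assumes, for illustration only, an additive model error that is held constant over the assimilation window; the toy model `model_step`, the helper `trajectory` and all numerical values are hypothetical.

```python
import numpy as np

def model_step(x, dt=0.1):
    """Hypothetical toy model (linear decay) standing in for the physical model."""
    return x - dt * x

def trajectory(x0, n_steps, forcing=None):
    """Integrate the model; if `forcing` is given, add it after every step."""
    x = np.asarray(x0, dtype=float)
    states = [x]
    for _ in range(n_steps):
        x = model_step(x)
        if forcing is not None:
            x = x + forcing  # additive model error, constant over the window
        states.append(x)
    return np.stack(states)

n = 3
x0 = np.ones(n)          # initial state
w = np.full(n, 0.05)     # model error forcing (assumed value)

# Strong constraint: the control vector is the initial state only.
control_sc = x0
# Weak constraint: the control vector also carries the model error.
control_wc = np.concatenate([x0, w])

traj_sc = trajectory(control_sc, n_steps=5)
traj_wc = trajectory(control_wc[:n], n_steps=5, forcing=control_wc[n:])
print(control_sc.size, control_wc.size)  # 3 6
```

With the same initial state, the two trajectories differ as soon as the forcing is non-zero, which is why the initial condition alone no longer determines the trajectory and why the size of the control vector grows.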
In parallel, following the renewed impetus of machine learning (ML) applications (LeCun et al., 2015; Goodfellow
et al., 2016; Chollet, 2018), data-driven approaches are increasingly common in the geosciences. The goal of these
approaches (e.g., Brunton et al., 2016; Hamilton et al., 2016; Lguensat et al., 2017; Pathak et al., 2018a; Dueben and
Bauer, 2018; Fablet et al., 2018; Scher and Messori, 2019; Weyn et al., 2019; Arcomano et al., 2020, among many
others) is to learn a surrogate of the dynamical model using supervised learning, i.e. by minimising a loss function
which measures the discrepancy between the surrogate model predictions and an observation dataset. In order to take
into account sparse and noisy observations, ML techniques can be combined with DA (Abarbanel et al., 2018; Bocquet
et al., 2019; Brajard et al., 2020; Bocquet et al., 2020; Arcucci et al., 2021). The idea is to take the best of both worlds:
DA techniques are used to estimate the state of the system from the observations, and ML techniques are used to
estimate the surrogate model from the estimated state. In practice, the hybrid DA and ML methods can be used both for
full model emulation and model error correction (Rasp et al., 2018; Pathak et al., 2018b; Bolton and Zanna, 2019; Jia
et al., 2019; Watson, 2019; Bonavita and Laloyaux, 2020; Brajard et al., 2021; Gagne et al., 2020; Wikner et al., 2020;
Farchi et al., 2021a,b; Chen et al., 2022). In the first case, the surrogate model is entirely learned from observations,
while in the latter case, the surrogate model is hybrid: a physical, knowledge-based model is corrected by a statistical
model, e.g. a neural network (NN), which is learned from observations. Even though from a technical point of view it
can arguably be more difficult to implement, model error correction has many advantages over full model emulation: by
leveraging the long history of numerical modelling, one can hope to end up with an easier learning problem (Watson,
2019; Farchi et al., 2021b).
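
A hybrid surrogate model of this kind can be sketched as follows. The additive form (physical step plus NN output), the toy model `physical_step`, the one-hidden-layer network and the synthetic data are all assumptions made for illustration. No training is performed: the output bias is set by hand to show that the NN only has to represent the residual error of the physical model.

```python
import numpy as np

def physical_step(x, dt=0.1):
    """Hypothetical knowledge-based model (toy linear decay)."""
    return x - dt * x

def init_nn(n_state, n_hidden, rng):
    """One-hidden-layer network; the output layer starts at zero (no correction)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_state)),
        "b1": np.zeros(n_hidden),
        "W2": np.zeros((n_state, n_hidden)),
        "b2": np.zeros(n_state),
    }

def nn_correction(params, x):
    """Statistical model error correction."""
    h = np.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"]

def hybrid_step(params, x):
    """Hybrid surrogate: physical model plus NN correction (assumed additive)."""
    return physical_step(x) + nn_correction(params, x)

def mse_loss(params, inputs, targets):
    """Supervised loss: discrepancy between surrogate predictions and the dataset."""
    preds = np.stack([hybrid_step(params, x) for x in inputs])
    return np.mean((preds - targets) ** 2)

rng = np.random.default_rng(0)
params = init_nn(n_state=3, n_hidden=8, rng=rng)
inputs = rng.normal(size=(10, 3))
# Synthetic "truth": the physical model plus a constant bias of 0.05.
targets = np.stack([physical_step(x) for x in inputs]) + 0.05

loss_before = mse_loss(params, inputs, targets)  # uncorrected model, about 0.0025
params["b2"] = np.full(3, 0.05)                  # hand-set correction, not trained
loss_after = mse_loss(params, inputs, targets)   # bias removed, about 0
```

Because the physical model already accounts for most of the dynamics, the correction to be learned here is a single constant, whereas a full emulator would have to learn the decay dynamics as well.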
Most of the current hybrid DA-ML methods use offline learning strategies: the surrogate model (or model error
correction) is learned using a large dataset of observations (or analyses) and should be generalisable to other situations
(i.e. outside the dataset). There are two main reasons for this choice. First, surrogate modelling requires a large amount
of data to provide accurate results – certainly more than what is available in a single assimilation update with online