
2 A. A. POPOV AND A. SANDU
The multifidelity ensemble Kalman filter (MFEnKF) [7,23,25,26] circumvents numerical
difficulties present in the MLEnKF through a robust use of linear control variate theory. The
MFEnKF also extends the idea of model coarseness to arbitrary non-linear couplings between
high fidelity (fine level) and low fidelity (coarse level) model states, allowing the use of various
types of reduced order models (ROMs) to form a model hierarchy.
This work further extends the EnKF ideas and brings two novel contributions. (i) First,
it extends model hierarchies to model trees and model forests, covering the situation where the
collection of models cannot neatly form a model hierarchy. (ii) Second, it extends the multifi-
delity ensemble Kalman filter to the model forest Kalman filter allowing data assimilation to
make use of model forests in a rigorous way.
Given one high fidelity model and a collection of low fidelity models, it is not always
possible to organize them in a strict model hierarchy. Following this observation we introduce
the first key contribution of this work (i); we generalize the idea of model hierarchies to
model trees, where one model is allowed to have multiple low fidelity models on the same level
below it; the low fidelity models are surrogates for the high fidelity one, but they may not
have a direct relationship with each other. This results in a tree structure of models with the
high fidelity model acting as the root. We further extend model trees by leveraging the idea of
model averaging [8]. Assuming that we have a collection of model trees, each with their own
high fidelity model at the root, we organize them in a “model forest” and build an averaging
procedure over all the trees in the forest.
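The structures described above can be sketched as a small data structure. This is only an illustrative sketch, not the construction used later in the paper: the names `ModelNode` and `ModelForest`, the scalar averaging weights (assumed here to sum to one), and the particular arrangement of models in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelNode:
    """One model in a tree; its children are lower fidelity surrogates of it."""
    name: str
    children: List["ModelNode"] = field(default_factory=list)

    def is_hierarchy(self) -> bool:
        """True when every model has at most one surrogate below it (a chain)."""
        return len(self.children) <= 1 and all(
            child.is_hierarchy() for child in self.children
        )

    def models(self) -> List[str]:
        """Names of all models in the tree, root first (pre-order traversal)."""
        names = [self.name]
        for child in self.children:
            names.extend(child.models())
        return names


@dataclass
class ModelForest:
    """A collection of model trees with one averaging weight per tree.

    The weights are a hypothetical stand-in for the averaging procedure;
    they are assumed to sum to one.
    """
    trees: List[ModelNode]
    weights: List[float]

    def __post_init__(self) -> None:
        if len(self.trees) != len(self.weights):
            raise ValueError("need exactly one weight per tree")
        if abs(sum(self.weights) - 1.0) > 1e-12:
            raise ValueError("weights are assumed to sum to one")

    def average(self, per_tree_values: List[float]) -> float:
        """Weighted average of one scalar quantity per tree."""
        return sum(w * v for w, v in zip(self.weights, per_tree_values))


# A strict model hierarchy: each model has a single surrogate below it.
chain = ModelNode("fine", [ModelNode("coarse", [ModelNode("coarser")])])

# A model tree: two surrogates on the same level, unrelated to each other.
tree = ModelNode("high fidelity", [ModelNode("ROM"), ModelNode("surrogate")])

# A model forest: several trees, each with its own high fidelity root.
forest = ModelForest(trees=[chain, tree], weights=[0.5, 0.5])
```

In this sketch a model hierarchy is the special case of a tree in which `is_hierarchy()` holds, and a single tree is the special case of a forest with one tree of weight one.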
By bringing together the ideas of the MFEnKF with that of model forests, we make the
second key contribution (ii) of this work: we replace the MFEnKF with the model forest
ensemble Kalman filter, which shares the acronym MFEnKF because, as we show, the former is
a special case of the latter.
Numerical tests on the quasi-geostrophic equations with a quadratic reduced order model
and an autoencoder-based surrogate show that our proposed extension significantly decreases
the number of high fidelity model runs required to achieve a certain level of analysis accuracy.
This paper is organized as follows. Relevant background information, including the sequential
data-assimilation problem, model hierarchies, model averages, and the multifidelity
ensemble Kalman filter, is presented in Section 2. The extension of model hierarchies to
model trees and the extension of model averages to model forests are described in Section 3.
Next, the extension of the multifidelity ensemble Kalman filter to the model forest Kalman
filter is explained in Section 4. The quasi-geostrophic equations and two surrogate models are
detailed in Section 5. Numerical experiments on various model trees and model forests are
presented in Section 6. Finally, some closing remarks are stated in Section 7.
2. Background. We review relevant background on data assimilation, including model
hierarchies, linear control variates, model averaging, and the multifidelity ensemble Kalman
filter.
2.1. Data Assimilation. Let $X^t_i$ denote the state of some natural process at time $t_i$, where
the superscript $t$ represents ground-truth. Assume that we have some prior information about
this state represented by the distribution of the random variable $X^b_i$. Assume also that we