Data assimilation methods for flow reconstruction can be classified based on a number of
criteria; perhaps the most important among them, in fundamental studies of turbulence, is
whether the governing equations are enforced. If the governing equations are not enforced, a statistical
or dynamical model is adopted instead. For example, linear stochastic estimation [7, 8] uses
prior knowledge of two-point correlations to estimate the full velocity field from observations;
the extended Kalman filter provides an optimal update to the state by combining a prediction from
the linearized equations with the observations [9, 10], while the ensemble Kalman filter advances
an ensemble of solutions and optimally weights them based on the observations to estimate
the new state [11].
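As an illustration of this class, the following is a minimal numpy sketch of the stochastic (perturbed-observation) ensemble Kalman filter analysis step; the function interface, the assumption of a linear observation operator, and the variable names are illustrative choices rather than the formulation of [11].

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng=None):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X : (n, N) ensemble of state vectors, one member per column
    y : (m,)   observation vector
    H : (m, n) linear observation operator
    R : (m, m) observation-error covariance
    """
    rng = np.random.default_rng() if rng is None else rng
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)      # ensemble anomalies
    HA = H @ A                                 # anomalies in observation space
    P_xy = A @ HA.T / (N - 1)                  # state-observation cross-covariance
    P_yy = HA @ HA.T / (N - 1) + R             # innovation covariance
    K = P_xy @ np.linalg.inv(P_yy)             # Kalman gain
    # Perturb the observation for each member, then weight the innovations by the gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (Y - H @ X)                 # analysis ensemble
```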
Some recent machine-learning algorithms also belong in this class: neural
networks learn a mapping from limited measurements to the full flow field based on training
data from simulations, without direct enforcement of the Navier-Stokes equations. In such
instances, there is no guarantee that predictions, especially outside the training space, satisfy
the governing equations. For example, Fukami et al. [12] trained a convolutional neural network
to map from coarse-grained to full-resolution velocity fields based on training data from two-
dimensional homogeneous turbulence; in Gundersen et al. [13], a semi-conditional variational
auto-encoder was developed to perform flow reconstruction from sparse measurements in a
probabilistic framework, predicting the full flow field as well as its uncertainty. In these
methods, the predicted fields do not satisfy the governing equations, and some of the
estimated flow structures may be artifacts of generalization errors of the surrogate model
rather than physical structures governed by the Navier-Stokes equations.
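To make this class concrete, the following is a minimal PyTorch sketch of a purely data-driven super-resolution mapping of the kind described above; the architecture, layer widths, and upscaling factor are illustrative assumptions, not the networks of [12] or [13].

```python
import torch
import torch.nn as nn

class SuperResolutionCNN(nn.Module):
    """Maps a coarse-grained two-component velocity field to a finer grid.
    The architecture is illustrative; it is not the network of [12]."""

    def __init__(self, upscale=8, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=upscale, mode="bilinear", align_corners=False),
            nn.Conv2d(2, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2, kernel_size=3, padding=1),
        )

    def forward(self, coarse):            # coarse: (batch, 2, H, W)
        return self.net(coarse)           # fine:   (batch, 2, upscale*H, upscale*W)

# Training minimizes a purely data-driven mismatch against reference snapshots,
# e.g. nn.MSELoss()(model(coarse_batch), dns_batch); no Navier-Stokes constraint
# enters the loss, which is the limitation noted above.
```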
The second class of methods aims to predict a trajectory of the flow in state space that
both satisfies the governing equations and optimally reproduces the measurements. In this class,
four-dimensional adjoint variational data assimilation [14, 15], or 4DVar, casts the problem as an
optimization constrained by the governing equations. Starting from an estimate of the unknown
flow state, a forward simulation produces the full flow trajectory over the time horizon where
measurements are available. The disparity between the measurements and their estimates from
the simulation defines the loss function, and also features in the adjoint equations, which are
solved backward in time. A complete forward-adjoint loop yields the gradient of the loss with
respect to the unknown flow state, which is then used to improve the estimate of the state.
Since the governing equations are nonlinear, the procedure is repeated until convergence, at which point
the optimal flow state is identified and accurately reproduces the entire history of observations.
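A minimal sketch of this forward-adjoint optimization loop is given below, assuming user-supplied `forward`, `adjoint`, and `observe` operators; these interfaces and the plain gradient-descent update are hypothetical simplifications, not a specific solver implementation.

```python
import numpy as np

def assimilate_4dvar(u0_guess, forward, adjoint, observe, measurements,
                     n_iter=100, step=1e-2):
    """Gradient-descent form of the adjoint-variational (4DVar) loop.

    Hypothetical operator interfaces, supplied by the user:
      forward(u0)        -> list of states over the assimilation window
      observe(u)         -> model prediction of the measurement at one time
      adjoint(traj, res) -> gradient of the loss with respect to u0, obtained
                            from one backward-in-time adjoint solve forced by
                            the measurement residuals res
    """
    u0 = np.copy(u0_guess)
    loss = np.inf
    for _ in range(n_iter):
        traj = forward(u0)                                 # one forward solve
        res = [observe(u) - m for u, m in zip(traj, measurements)]
        loss = 0.5 * sum(np.sum(r ** 2) for r in res)      # observation mismatch
        grad = adjoint(traj, res)                          # one backward adjoint solve
        u0 = u0 - step * grad                              # update the state estimate
    return u0, loss
```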
The method has been adopted in a wide range of applications, including prediction of scalar
sources from remote measurements [16], estimation of transitional and turbulent Taylor-Couette
flows from limited observations [17], and estimation of turbulence in channel flow [18, 19, 20, 21].
An important property of 4DVar is that the computational cost of evaluating the gradient of
the loss function is that of a single forward-adjoint loop, independent of the size of the control vector being
optimized. This efficiency is an advantage compared to other optimization algorithms
that rely only on the forward model. For example, in ensemble-variational (EnVar) approaches
[22, 23] the gradient of the loss function is evaluated from an ensemble of forward solutions
whose size, and hence the associated computational cost, are proportional to the dimension of
the control vector. Mons et al. [24] compared 4DVar, the EnVar method, and the ensemble
Kalman filter, and concluded that 4DVar is the most accurate.
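To make the cost comparison concrete, the sketch below estimates the gradient from an ensemble of perturbed forward solves; the least-squares form and the function interface are illustrative assumptions rather than the specific EnVar formulation of [22, 23]. Each ensemble member costs one forward simulation, whereas 4DVar obtains the gradient from a single forward-adjoint loop regardless of the control-vector dimension.

```python
import numpy as np

def ensemble_gradient(loss, c, n_ens, sigma=1e-3, rng=None):
    """Estimate the gradient of J(c) from an ensemble of perturbed forward solves.

    `loss` wraps one full forward simulation plus the measurement mismatch, so
    each of the n_ens perturbations costs one forward solve; a useful estimate
    requires n_ens to grow with the dimension of the control vector c.
    """
    rng = np.random.default_rng() if rng is None else rng
    J0 = loss(c)
    dC = sigma * rng.standard_normal((n_ens, c.size))       # control perturbations
    dJ = np.array([loss(c + dc) - J0 for dc in dC])         # one forward solve each
    grad, *_ = np.linalg.lstsq(dC, dJ, rcond=None)          # fit dJ ≈ dC @ grad
    return grad
```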
In this study, the adjoint reconstruction of the flow field will be adopted as the benchmark for
evaluating the performance of the physics-informed machine-learning approach.
Recent innovation in machine learning has presented new opportunities for data assimilation
and flow estimation [25, 26, 27, 28, 29]. Our primary focus will be on physics-informed neural
networks (PINNs). Similar to the adjoint-variational approach, flow estimation using PINNs
is formulated as a minimization problem. The network takes the spatial and temporal
coordinates as inputs, and its outputs are the flow variables at those coordinates. The loss
function for training the PINN comprises different parts: (a) The first part is due to
the mismatch between the network predictions and the available flow measurements; (b) The