Muzellec et al. (2020) leverage optimal transport to define a loss function for missing value
imputation. Yoon and Sull (2020) propose a novel Generative Adversarial Multiple Im-
putation Network (GAMIN) for highly missing data. In this literature, latent spaces
are used to represent high-dimensional observations, but they are not identifiable because
they may vary with parameter initialization. In addition, all of these missing data
models require the true values to be partially observed.
However, relatively little research has focused on estimating the realizations of latent
variables, which are unobserved, that is, completely missing. In the economics literature,
the Kalman filter and structural vector autoregressions have often been used to estimate
the realizations of latent variables, such as potential output (Kuttner, 1994), the natural
rate of interest (Laubach and Williams, 2003; Holston, Laubach, and Williams, 2017),
and the natural rate of unemployment (King and Morley, 2007), but that literature makes
parametric assumptions about the dynamics of the latent variables and thus falls under
the estimation of models with latent variables.
In our setting, we argue that the conditional independence restrictions imply the
local identification of the true values. This allows us to provide an estimator in the
continuous case. Our method is nonparametric in the sense that we do not assume that
the distribution of the variables belongs to a parametric family, as in the widely used
VAEs (Kingma and Welling, 2013), which use the so-called Evidence Lower Bound
(ELBO) to provide a tractable unbiased Monte Carlo estimator. VAEs focus on the
estimation of a parametric model. In this paper, we focus on estimating the true values
in each observation in the sample without imposing a parametric structure on the
distributions.
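For comparison, the ELBO maximized by VAEs can be written in its standard form (the notation below is generic and not taken from this paper):
\[
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right),
\]
where the decoder $p_\theta$ and the encoder $q_\phi$ are parametric families, so the resulting representation of the latent variable is tied to the chosen parametrization.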
Our loss function is a distance between two nonparametric density functions, with
and without imposing the conditional independence restriction. Such a distance is based on a powerful
nonparametric identification result in the measurement error literature due to Hu and
Schennach (2008). (See Hu (2017) and Schennach (2020) for a review.) It shows that the
joint distribution of a latent variable and its measurements is uniquely determined
by the joint distribution of the observed measurements under a key conditional in-
dependence assumption, together with other technical restrictions. To measure the
distance between two density functions, we choose the Kullback–Leibler divergence
(Kullback and Leibler, 1951), which plays a leading role in machine learning and
neuroscience (Pérez-Cruz, 2008).
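As a concrete illustration (the symbols below are generic and not this paper's notation), if $f$ denotes the joint density of the observed measurements and $g$ denotes the density implied by imposing the conditional independence restriction, the loss takes the form
\[
D_{\mathrm{KL}}(f\,\|\,g) \;=\; \int f(w)\,\log\frac{f(w)}{g(w)}\,dw,
\]
which is nonnegative and equals zero if and only if $f = g$ almost everywhere, that is, when the conditional independence structure is consistent with the observed joint density.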
A large literature has studied the estimation of the Kullback–Leibler divergence
(Darbellay and Vajda, 1999; Moreno, Ho, and Vasconcelos, 2003; Wang, Kulkarni, and
Verdú, 2005; Lee and Park, 2006; Wang, Kulkarni, and Verdú, 2006; Nguyen, Wainwright,
and Jordan, 2010; Nowozin, Cseke, and Tomioka,