
art Copula-based baselines.
2 Background
2.1 Related work
Unsupervised learning of multivariate distributions
has seen tremendous progress over recent years, particularly
in the case of PDF modeling. Classical
methods in the literature include kernel density estimation
(KDE) [34, 35], histogram density estimation
(HDE), and orthogonal series density estimation
(OSDE) [11, 13, 41]. All of the aforementioned methods,
however, are inefficient for datasets of higher dimensionality.
Neural network-based approaches for
distribution estimation have recently shown promising
results in high-dimensional problems. Autoregressive
(AR) models such as [30, 36] decompose the joint
distribution into a product of conditionals, where
each conditional is modeled by a parametric distribution
(e.g., a Gaussian or mixture of Gaussians in the
continuous case). Normalizing flows (NFs) [33] represent
a density through an invertible transformation
of latent variables with known density.
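The two factorizations above can be sketched with toy instances: a Gaussian AR model, $p(\mathbf{x}) = \prod_n p(x_n \mid x_{<n})$, and an affine normalizing flow with log-density $\log p(\mathbf{x}) = \log p_z(f(\mathbf{x})) + \log|\det J_f(\mathbf{x})|$. This is a hypothetical illustration with fixed linear/affine maps, not the models of the cited works (which use neural networks):

```python
import numpy as np

def ar_log_density(x, W, sigma=1.0):
    """Toy AR Gaussian model: W is strictly lower-triangular, so the
    mean of x_n depends only on the prefix x_1, ..., x_{n-1}."""
    mu = W @ x  # prefix-dependent conditional means
    return np.sum(-0.5 * ((x - mu) / sigma) ** 2
                  - 0.5 * np.log(2 * np.pi * sigma ** 2))

def affine_flow_log_density(x, a, b):
    """Toy NF: the invertible elementwise map z = a * x + b sends x to a
    standard normal latent z; add the log-Jacobian-determinant term."""
    z = a * x + b
    log_pz = np.sum(-0.5 * z ** 2 - 0.5 * np.log(2 * np.pi))
    log_det = np.sum(np.log(np.abs(a)))  # Jacobian of an elementwise map
    return log_pz + log_det

x = np.array([0.2, -1.0, 0.5])
W = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.1, -0.3, 0.0]])
print(ar_log_density(x, W))
print(affine_flow_log_density(x, np.ones(3), np.zeros(3)))
```

With $W = 0$ and $a = \mathbf{1}$, $b = \mathbf{0}$, both sketches reduce to the i.i.d. standard normal log-density, which is a quick sanity check.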
On the downside, AR models are inherently sensitive
to the order of the variables/features, while the
strong architectural constraints of NFs can limit
model expressiveness. Most importantly, AR and
NF models do not yield an explicit estimate of the density
function; they are 'oracles' that can be queried to
output an estimate of the density at any given input
point, or to generate samples of the sought density;
the difference is important. Therefore, given
a trained model, calculating expectations, marginals,
and conditional distributions is not straightforward
with these methods. The same holds for generative
adversarial networks (GANs) [14], as they do not allow
for likelihood evaluation on held-out data. Furthermore,
deep multivariate CDF-based models such
as [6] do not address model identifiability, and cannot
guarantee recovery of the true latent factors that
generated the observed samples.
Tensor modeling of distributions: Tensor
models for estimating distributions have been proposed
for both discrete and continuous variables. In
the discrete case, the work in [22] showed that any joint
PMF can be represented as an N-way probability
tensor, and that by introducing a CPD model, every multivariate
PMF can be represented by a latent variable
naive Bayes model with a finite number of latent
states. For continuous random vectors, the joint
PDF can no longer be directly represented by a tensor.
Earlier work [40] has dealt with latent variable
models, but not general distributions. In contrast to
prior work [40, 21], we do not assume a multivariate
mixture model of non-parametric product distributions
in this paper. Another line of
work (see [2, 3]) proposed a "universal" approach for
smooth, compactly supported multivariate densities
by representing the underlying density in terms of a
finite tensor of leading Fourier coefficients. Our work
requires less restrictive assumptions, as it also applies to
discrete or mixed random variables, of possibly
unbounded support.
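The discrete-case result can be illustrated concretely: a rank-$R$ CPD of an $N$-way probability tensor is exactly a naive Bayes model with $R$ latent states, $P(i_1, \ldots, i_N) = \sum_{r} \lambda_r \prod_n A_n(i_n, r)$. A minimal sketch (illustrative, not the cited implementation) for $N = 3$:

```python
import numpy as np

# Rank-R CPD of a 3-way probability tensor as a naive Bayes model:
# lam[r] = P(H = r);  A[n][i, r] = P(X_n = i | H = r).
rng = np.random.default_rng(1)
R, I = 3, 4                                   # latent states, alphabet size
lam = rng.dirichlet(np.ones(R))               # prior over the latent variable
A = [rng.dirichlet(np.ones(I), size=R).T for _ in range(3)]  # columns sum to 1

# Joint PMF tensor: P[i, j, k] = sum_r lam[r] * A1[i,r] * A2[j,r] * A3[k,r]
P = np.einsum('r,ir,jr,kr->ijk', lam, *A)
assert np.isclose(P.sum(), 1.0)               # a valid joint PMF

# Marginalization reduces to summing over modes of the tensor.
marginal_X1 = P.sum(axis=(1, 2))
print(marginal_X1)
```

Because every factor column is a valid conditional PMF and the loading vector is a valid prior, the reconstructed tensor is automatically nonnegative and sums to one; marginals and conditionals come out as cheap tensor contractions.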
2.2 Notation, Definitions, and Preliminaries
We use the symbols $x$, $X$, $\underline{X}$ for vectors, matrices,
and tensors, respectively. We use the notation $x(n)$,
$X(:, n)$, $\underline{X}(:, :, n)$ to refer to a particular element of
a vector, a column of a matrix, and a slab of a tensor.
The symbols $\circ$, $\otimes$, $\circledast$, $\odot$ denote the outer, Kronecker,
Hadamard, and Khatri-Rao (column-wise Kronecker)
products, respectively. The vectorization operator is
denoted as $\mathrm{vec}(X)$, $\mathrm{vec}(\underline{X})$ for a matrix and a tensor,
respectively [39]. Additionally, $\mathrm{diag}(x) \in \mathbb{R}^{I \times I}$ denotes
the diagonal matrix with the elements of vector
$x \in \mathbb{R}^{I}$ on its diagonal. The symbols $\|x\|_1$, $\|x\|_2$, $\|X\|_F$,
and $d_{TV}$ correspond to the $L_1$ norm, $L_2$ norm, Frobenius
norm, and total variation distance, respectively. The total variation
distance between distributions $p$ and $q$ is defined
as $d_{TV}(p, q) = \frac{1}{2}\|p - q\|_1$.
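The matrix products and the total variation distance above can be sketched in NumPy (an assumed illustration, not part of the paper):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

kron = np.kron(A, B)          # Kronecker product, shape (4, 4)
hadamard = A * B              # elementwise (Hadamard) product, shape (2, 2)
# Khatri-Rao product: column-wise Kronecker product of two matrices
khatri_rao = np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(2)])

# Total variation distance between two discrete distributions:
# d_TV(p, q) = 0.5 * ||p - q||_1
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
d_tv = 0.5 * np.abs(p - q).sum()   # = 0.1 for these p and q
print(kron.shape, khatri_rao.shape, d_tv)
```

Note the shapes: for $A \in \mathbb{R}^{I \times R}$ and $B \in \mathbb{R}^{J \times R}$, the Khatri-Rao product $A \odot B$ is $IJ \times R$, whereas the full Kronecker product $A \otimes B$ is $IJ \times R^2$.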
Given an $N$-dimensional random vector $\mathbf{X} :=
[X_1, \ldots, X_N]^T$, $\mathbf{X} \sim F_X$ will denote that the random
vector $\mathbf{X}$ follows distribution $F_X$. $\mathbb{1}(A)$ is the indicator
function of event $A$, i.e., it is 1 if and only if $A$
is true. The set of integers $\{1, \ldots, N\}$ is denoted as
$[N]$. Given $M$ data samples, $\mathcal{D} = \{\mathbf{x}_m\}_{m=1}^{M}$ denotes
the given dataset.