a complex time-intensive task.
The key asset of quantum generative modelling is that
quantum measurements (collapse to one of the eigenstates of
a measurement operator) provide a new sample with each
shot. Depending on the hardware platform, generating
one sample can take on the order of a few hundred microseconds to several milliseconds [18–24]. For large
probability distributions, represented by entangled 50+
qubit registers, performing the classical inversion is in-
deed much more costly. However, the challenge often comes from the inference side, where quantum circuits are required to match specific distributions. Typically,
training of quantum generative models utilizes gradient-
based parametric learning, similarly to training of deep
neural networks [25]. Parameterized quantum circuits
(also referred to as quantum neural networks, or QNNs) are trained by estimating gradients with respect to the gate parameters θ. In particular, the gradient of the full QGM loss has to be estimated with respect to θ. For quantum devices
this can be done by the parameter-shift rule and its gen-
eralization [26–28], where the number of circuit evaluations increases linearly with the number of parameters. The
overall training cost corresponds to the measurement of
the loss function at each iteration step. In the case of
a quantum circuit Born machine, the loss may correspond to the Kullback-Leibler (KL) divergence [9], the Sinkhorn divergence [29], or the maximum mean discrepancy (MMD) [30], and may require extensive sampling to resolve the loss
as an average. For quantum generative adversarial net-
works (QGAN), the loss minimization is substituted by a minimax game [31,32], requiring multi-circuit estimation. In all cases, convergence is not guaranteed due to the exponential vanishing of gradients [33].
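For concreteness, a minimal sketch of parameter-shift gradient estimation is given below. It assumes gates with Pauli-type generators (so that a shift of π/2 applies) and mocks the hardware expectation value with a classical toy function; it is illustrative rather than the implementation used in this work.
\begin{verbatim}
import numpy as np

def parameter_shift_gradient(expval, thetas, shift=np.pi / 2):
    """Estimate d<L>/d(theta_k) for every parameter via the parameter-shift rule.

    Requires 2 * len(thetas) expectation-value evaluations, i.e. the cost
    grows linearly with the number of parameters.
    """
    grad = np.zeros_like(thetas)
    for k in range(len(thetas)):
        shifted_plus, shifted_minus = thetas.copy(), thetas.copy()
        shifted_plus[k] += shift
        shifted_minus[k] -= shift
        grad[k] = 0.5 * (expval(shifted_plus) - expval(shifted_minus))
    return grad

# Illustrative stand-in for a measured loss expectation <L(thetas)>
def toy_expval(thetas):
    return np.sum(np.cos(thetas))

thetas = np.array([0.1, 0.7, 1.3])
print(parameter_shift_gradient(toy_expval, thetas))  # equals -sin(thetas) here
\end{verbatim}
Each iteration thus costs two circuit estimations per parameter, on top of the shots needed to resolve every expectation value.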
In this work we investigate the possibility of training
the parameters of quantum generative models classically,
while still retaining the quantum advantage in sampling
[34]. For instance, classical training has been demonstrated for a different paradigm, Gaussian Boson Sampling (GBS) devices [35], but only under certain conditions, namely a fixed initial set of samples and non-universal operation. For digital quantum computing, results from previous works [36,37] motivate the possibility that estimating the probability density classically is feasible without losing the sampling complexity. We explore this possibility in more detail and build on it to develop methods for training circuits classically to output a desired distribution using a gradient-based approach. We demonstrate numerically that our method is feasible for up to 30 qubits on a regular desktop computer.
We explore different families of quantum circuits in detail and numerically study their sampling complexity and expressivity. For expressivity studies in
particular, we look at training a Differentiable Quantum
Generative Model (DQGM) [38–40] architecture which
allows training in the latent (or ‘frequency’) space, and
sampling in the bit-basis. This presents a good test-
ing ground for applying the proposed method to explicit
quantum generative models. We also show that QCBMs
can be trained classically for certain distributions, while sampling from them remains classically hard. Our protocols contribute towards tools for achieving a practical advantage in sampling once the target distributions are chosen carefully. We highlight the differences between well-known strategies for QGM and the method discussed in this paper in Fig. 1.
II. METHODS
A. Preliminaries: QCBM and DQGM as implicit
vs explicit generative models
Generally, there are two types of generative models.
Explicit generative models assume direct access (or ‘inference’) to probability density functions (PDFs). At the same time, implicit generative models are described by hidden parametric distributions, where samples are produced by transforming randomness via an inversion procedure. These two types of models have crucial differences. For example, training explicit models involves loss functions that measure the distance between a given PDF, $p_{\text{target}}(x)$, and a model PDF, $p_{\text{model}}(x)$, for example with a mean square error (MSE) loss, which is defined as
\begin{equation}
\mathcal{L}_{\text{MSE}} = \sum_x \left| p_{\text{model}}(x) - p_{\text{target}}(x) \right|^2, \tag{1}
\end{equation}
where explicit knowledge of $p_{\text{target}}(x)$ is used.
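As an illustration of this explicit-access setting, the following sketch builds the model PDF from a small statevector simulation and evaluates the MSE loss of Eq. (1); the three-qubit circuit, its parameters, and the Gaussian target are illustrative choices, not those used in the paper.
\begin{verbatim}
import numpy as np

n = 3                         # number of qubits (illustrative)
dim = 2 ** n

def ry(theta):
    """Single-qubit RY rotation matrix."""
    return np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                     [np.sin(theta / 2),  np.cos(theta / 2)]])

# Toy circuit U: a Hadamard followed by an RY rotation on each qubit
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
thetas = np.array([0.3, 1.1, -0.7])   # gate parameters (illustrative)
U = np.eye(1)
for th in thetas:
    U = np.kron(U, ry(th) @ H)

# Apply U to |0...0> and read off the explicit PDF via the Born rule
zero_state = np.zeros(dim)
zero_state[0] = 1.0
psi = U @ zero_state
p_model = np.abs(psi) ** 2

# Discretized Gaussian target PDF over bitstrings x = 0, ..., 2^n - 1
x = np.arange(dim)
p_target = np.exp(-(x - dim / 2) ** 2 / 8.0)
p_target /= p_target.sum()

# Explicit MSE loss of Eq. (1), using full knowledge of both PDFs
loss_mse = np.sum(np.abs(p_model - p_target) ** 2)
print(f"L_MSE = {loss_mse:.6f}")
\end{verbatim}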
On the other hand, training implicit models involves comparing the samples generated by the model with given data samples (e.g., with an MMD loss [30]). The MMD loss is defined as
\begin{equation}
\mathcal{L}_{\text{MMD}} = \mathbb{E}_{x \sim p_{\text{model}},\, y \sim p_{\text{model}}}\!\left[ K(x, y) \right] - 2\, \mathbb{E}_{x \sim p_{\text{model}},\, y \sim p_{\text{target}}}\!\left[ K(x, y) \right] + \mathbb{E}_{x \sim p_{\text{target}},\, y \sim p_{\text{target}}}\!\left[ K(x, y) \right], \tag{2}
\end{equation}
where $K(x, y)$ is an appropriate kernel function. The MMD loss measures the distance between two probability distributions using samples drawn from the respective distributions, as shown in the above equation. In the context of QGM, the QCBM is an excellent example of implicit training, where typically an MMD-like loss function is used.
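To make the sample-based estimation of Eq. (2) concrete, a minimal sketch with a Gaussian kernel is given below; the kernel bandwidth and the synthetic one-dimensional sample sets are illustrative assumptions.
\begin{verbatim}
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-(x - y)^2 / (2 sigma^2)) for 1D sample arrays."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2 * sigma ** 2))

def mmd_loss(samples_model, samples_target, sigma=1.0):
    """Empirical (biased) estimate of the MMD loss in Eq. (2)."""
    k_mm = gaussian_kernel(samples_model, samples_model, sigma).mean()
    k_mt = gaussian_kernel(samples_model, samples_target, sigma).mean()
    k_tt = gaussian_kernel(samples_target, samples_target, sigma).mean()
    return k_mm - 2.0 * k_mt + k_tt

# Illustrative usage with synthetic samples standing in for model and data
rng = np.random.default_rng(0)
samples_model = rng.normal(loc=0.0, scale=1.0, size=500)
samples_target = rng.normal(loc=0.5, scale=1.0, size=500)
print(f"MMD estimate: {mmd_loss(samples_model, samples_target):.4f}")
\end{verbatim}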
On the other hand, recent work showcases how explicit quantum models such as the DQGM [38] and Quantum Quantile Mechanics [11] benefit from functional access to the model probability distributions, allowing input-differentiable quantum models [39,40] to solve stochastic differential equations or to model distributions with differential constraints.
Let us consider a quantum state $|\Psi\rangle$ created by applying a quantum circuit $\hat{U}$ (which can be parameterized) to a zero basis state. For a general $\hat{U}$, simulating the output PDF values that follow the Born rule $p_{\text{model}}(x) = |\langle x|\Psi\rangle|^2$ and producing samples from $|\Psi\rangle$ are both classically hard. But what if estimating the PDF for