
where δkp is the Kronecker delta. From the Fourier
expansion Eq. (6) and ak= 2 Im(ck), it follows that
hakiθ= 0
hakapiθ= 2 δkph|ck|2iθ,(21)
meaning that Cais a diagonal matrix with entries
ha2
kiθ= 2h|ck|2iθgiven by the expected squares of
the Fourier coefficients.
What remains is determining ha2
kiθ. It is worth
pointing out that while underestimating ha2
kiθcan
lead to suboptimal results, even significantly overesti-
mating the amplitudes will still outperform methods,
where no prior assumptions are made, meaning rough
estimates of ha2
kiθare already sufficient for good per-
formance. One way of estimating them is to use al-
ready existing empirical measurement data from pre-
vious optimization rounds or initial calibration. The
coefficients can be estimated using a Fourier fit. An-
other strategy involves numerically simulating smaller
system sizes and extrapolating to the actual size used
in the VQA.
If the applied unitaries in the VQA are known,
ha2
kiθcan sometimes be derived theoretically. For in-
stance in appendix D.2, we derive analytically exact
results for a VQA with a single layer (L= 1). For a
small constant circuit depth, ha2
kiθcan be computed
efficiently, using Monte-Carlo sampling algorithms,
even for large system sizes. This is convenient, as
the case for deep circuits, under certain assumptions,
ha2
kiθcan be approximated again using only the spec-
tral composition of the generator. This is shown in
the following.
Ergodic limit – barren plateaus
VQA optimization routines have to overcome a gen-
eral phenomenon known as barren plateaus. This term
describes the tendency of a gradient in VQAs to be
exponentially suppressed in the system size with in-
creasing circuit depth and for almost all parameters
θ. This phenomenon has been extensively studied
and while mitigation techniques are proposed [6–8],
it appears to be unavoidable, at least in the general
setup.
For the rigorous analysis of barren plateaus, it is
beneficial to use the language of unitary t-designs.
Effectively, with increasing circuit depth, the overall
applied gate will appear more and more like a Haar-
random unitary, with respect to which the derivative
is suppressed by the Hilbert space dimension. This
assumes that the underlying generators describe a
universal gate set. For this effect to occur, we do
not need convergence to the Haar measure but con-
vergence to a unitary 2-design, which is significantly
quicker [21,22]. For our purposes, such approximate
unitary 2-designs are sufficient since ha2
kiθcan be ex-
pressed as a polynomial in U, U†of degree (2,2). If
this condition is met, we can replace the expectation
value over all angles by the expectation value over all
unitaries. Hence, this condition can be summarized
by the ergodic assumption
hΓ(U(θ))iθ=ZΓ(U)dU , (22)
which holds for any polynomial Γ(U)of degree at most
(2,2) and where the integral is taken w.r.t. the Haar
probability measure on the unitary group.
Under this assumption we derive in appendix Cthat
ha2
kiθ=ξd
σ2
O
dX
i≥j:µk=λi−λj
Tr[Pi/d] Tr[Pj/d](23)
with σ2
O:= Tr[O2/d]−Tr[O/d]2, where dis the Hilbert
space dimension and ξdis a constant close to 1that
depends only on d.Tr[Pi/d]is the relative multi-
plicity of the eigenvalue λi. Notably, the factor of 1
d
shows the exponential suppression of the derivative in
the system size, meaning that gate sets drawn from
a 2-design experience barren plateaus. For L→ ∞
this result confirms the assumption that the relative
frequency of an eigenvalue difference µkin the spec-
trum of the generator determines the expected size
of its respective Fourier coefficient. We will analyze
the strengths and limitations of this approach with an
example in section 5.1.
4 Allocation methods
In this section, we derive several allocation methods
for a given measurement budget. A Python imple-
mentation of these methods is available on GitHub
[23].
The estimation algorithm requires the values w,m
and x. We have already seen that making assump-
tions about the ensemble of configurations allows us
to estimate the error using the second-moment matrix
Cafrom Eq. (20) and an a priori shot noise estimate
σ2. In the following, we devise explicit measurement
procedures by making use of the knowledge of these
quantities. In section 4.1, we show that using convex
optimization procedures, one can derive an optimal
measurement strategy, which we call Bayesian linear
gradient estimator (BLGE).
In section 4.2, we then consider the case, when
the number of total measurements goes to infinity.
We call this method unbiased linear gradient estima-
tor (ULGE), as it does not require access to the es-
timates of Caand σ2and yields an estimate without
systematic error. ULGE is an equivalent formulation
of a known method in literature [14]. In section 4.2.1,
we show strong similarity between ULGE and another
popular generalized PSR found in the literature [12].
Finally, in section 4.3, we restrict the number of mea-
surement positions nxto 1 and derive a strategy that
is optimal under this constraint. We call this method
single Bayesian linear gradient estimator (SLGE). We
5