Fast gradient estimation for variational quantum algorithms

2025-04-26 0 0 1.28MB 26 页 10玖币
侵权投诉
Fast gradient estimation for variational quantum algorithms
Lennart Bittel, Jens Watty, and Martin Kliesch
Institute for Theoretical Physics, Heinrich Heine University Düsseldorf, Germany
Many optimization methods for training
variational quantum algorithms are based on
estimating gradients of the cost function. Due
to the statistical nature of quantum measure-
ments, this estimation requires many circuit
evaluations, which is a crucial bottleneck of the
whole approach. We propose a new gradient
estimation method to mitigate this measure-
ment challenge and reduce the required mea-
surement rounds. Within a Bayesian frame-
work and based on the generalized parame-
ter shift rule, we use prior information about
the circuit to find an estimation strategy that
minimizes expected statistical and systematic
errors simultaneously. We demonstrate that
this approach can significantly outperform tra-
ditional gradient estimation methods, reduc-
ing the required measurement rounds by up
to an order of magnitude for a common QAOA
setup. Our analysis also shows that an estima-
tion via finite differences can outperform the
parameter shift rule in terms of gradient ac-
curacy for small and moderate measurement
budgets.
1 Introduction
It has been demonstrated that quantum devices
can outperform classical computers on computational
problems specifically tailored to the hardware [1,2].
While this has been an important milestone, the ulti-
mate goal is a useful quantum advantage, i.e. a similar
speedup for a problem with relevant applications. The
central practical challenge is that only noisy and inter-
mediate scale quantum (NISQ) hardware is available
for the foreseeable future [3]. This restriction means
that quantum devices have limited qubit numbers and
can only run short quantum circuits, as the quantum
computation must be finished before noise effects be-
come too dominant. For this reason, great efforts are
being made to design quantum algorithms in a NISQ-
friendly way. One central idea in this effort is to trade
an increased number of circuit evaluations and addi-
tional classical computation for reduced qubit num-
bers and lower circuit depths.
One of the leading approaches toward achieving
Lennart Bittel: lennart.bittel@uni-duesseldorf.de
Jens Watty: jens.schneider@uni-duesseldorf.de
Martin Kliesch: science@mkliesch.eu
Classical
optimization
routine
Allocator
Quantum Computer
U1(θ1)U2(θ2)U3(θ3)UM
|Ψ0i
Low level control
Post processing
Estimator
θ{θi, mi}
{yi, σ2
i}
{ωi}
ˆ
G(θ)
(m, Ca, σ)
θ0
θ, G(θ)
Figure 1: Sketch of the gradient estimation for VQA opti-
mization routines as shown in algorithm 1.
useful quantum advantages is given by variational
quantum algorithms (VQAs). They address the prob-
lem of estimating the ground state energy of a quan-
tum many-body Hamiltonian via a variational opti-
mization, as follows. The quantum part of VQAs
is implemented using parametrized quantum circuits
(PQCs), which are used to prepare the variational
quantum states. In order to interface it with a clas-
sical computer, the energy and the energy gradient
w.r.t. the variational parameters are typically esti-
mated. Then a classical computer, which typically
runs some gradient descent-based algorithm, is used
to minimize the energy via repeated parameter up-
dates and estimations of the energy functional.
Several challenges occur in this approach. First, on
the classical computation side, the optimization might
reach a barren plateau for the objective function [4]
or get stuck in local minima [5]. Barren plateaus
can sometimes be avoided by using smart initializa-
tions for the parameters [6]. Moreover, sophisticated
constructions of the quantum circuit family can help
to bypass such problematic regions in the parameter
space [7,8]. Local minima can at least partially be
avoided using natural gradients [9]. Second, the mea-
surement effort of the quantum computer can pose a
critical bottleneck for VQAs. The reason is that
(i) many iterations steps are done in the classical
optimization,
(ii) several partial derivatives are needed for each
gradient update step,
(iii) multiple measurement settings might be needed
for the estimation of observables such as local
Hamiltonians,
(iv) quantum measurements are probabilistic, re-
1
arXiv:2210.06484v1 [quant-ph] 12 Oct 2022
quiring O(1/2)measurement rounds for ac-
curacy
and this can add up to a large number of total num-
ber of measurements rounds. Since quantum mea-
surements are destructive, one also needs to prepare
the entire variational state from scratch for each mea-
surement round.
In this work, we develop a new gradient estima-
tion algorithm that balances statistical and system-
atic errors which achieve a better gradient estimate
with fewer measurement rounds than conventional es-
timators. Specifically, we first characterize both the
statistical and systematic error that arise in the esti-
mation procedure, where for the systematic error, we
introduce a Bayesian framework using prior informa-
tion and assumptions about the system to estimate it.
Then, we develop allocator methods, which for a given
measurement budget determine an optimal strategy,
namely what and how often we want to measure each
circuit configuration, in order to minimize the total
error. Finally the estimator returns a gradient esti-
mate based on the measurement outcomes. A sketch
of the procedure is shown in figure 1.
The Bayesian approach takes advantage, but also
requires prior knowledge about the system. We de-
velop strategies to obtain this prior information de-
pending on the circuit depth:
(i) For short circuit, we use experimental or numer-
ical/analytical observations.
(ii) For higher depths, we use unitary 2-design prop-
erties of random circuits.
(iii) In the intermediate regime, we use an interpola-
tion of the two.
For the analysis in this paper we neglect all error
sources arising from imperfect quantum hardware and
only focus on noise due to finite measurements (i.e.
shot noise). Additionally, we assume time periodic
unitaries, meaning that, without loss of generality, the
eigenenergies of the generators can be assumed to be
integers.
Finally, we demonstrate numerically, that the
Bayesian approach outperforms previous parameter
shift rule (PSR) approaches in terms of gradient esti-
mation and VQA optimization accuracy.
1.1 Related work
There are several approaches to estimate gradients
in VQAs, notably the PSR [10,11] is able to ob-
tain unbiased gradient estimates for generators with
only two distinct eingenvalues. Further generaliza-
tions were made in Ref. [12] for a wider class of
Hamiltonians. This approach often requires ancillary
qubits or unitaries generated by commuting genera-
tors with two eigenvalues. There are also general-
izations proposed for non-commuting generators [13]
in a stochastic framework. These approaches gener-
ally require measurement settings not contained in the
VQA-ansatz, but which can be assumed to be feasible
for real hardware. There also exists unbiased estima-
tors for arbitrary periodic unitaries [14,15], where all
measurements are contained in the ansatz class.
There are also strategies [1618] that replace the
actual observable underlying the gradient estimation
with some surrogate observables. Another research
direction has been to find efficient estimation schemes
for the whole gradient [19,20] instead of its individ-
ual partial derivatives and thus reducing the required
measurement resources.
We approach the gradient estimation differently.
Namely, we use that VQAs due to their setup expe-
rience some typical behavior, which can be analyzed
in advance and during the VQA optimization. This
allows us to also evaluate the performance of biased
estimation strategies, which under very reasonable as-
sumptions on measurement budget, can significantly
outperform their unbiased counterparts. We use the
general framework of periodic parametrized quantum
gates [14] but believe that a similar Bayesian reason-
ing can also benefit other VQA ansatz classes and gra-
dient strategies. It should therefore not be regarded
as a competitor to existing methods, but as a comple-
mentary strategy to further reduce the measurement
effort of gradient estimation strategies.
1.2 Notation
We use the notation [n]:={1, . . . , n}. The Pauli ma-
trices are denoted by X,Yand Z. An operator O
acting on subsystem jof a larger quantum system is
denoted by Oj, e.g. X1is the Pauli-X-matrix acting
on subsystem 1.`p-norms including p= 0 are denoted
by k · kp. We use several symbols that are summarized
in section 8.
2 Variational quantum algorithms
In a VQA the goal is to find parameters θsuch that
a cost function is minimized. In general, this cost
function is given by
G(θ):=hΨ0|"L
Y
α=1
Uα(θα)#
O"L
Y
α=1
Uα(θα)#|Ψ0i(1)
where |Ψ0iis the initial state, Uα(θα) = eiθαHαare
the unitary gates generated by Hαand Ois the ob-
servable encoding the optimization problem. In this
work, we are considering unitaries that are T-periodic
in θl(w.l.o.g. T= 2π), which implies that all eigen-
values of Hlare integers.
Estimating the gradient of G(θ)w.r.t. θis an im-
portant task, as most optimization algorithms are gra-
dient descent based and thus require an efficient ap-
proximation of the gradient. Our strategy estimates
the gradient by the functions partial derivatives. For
2
this it is convenient to define the cost function at a
point shifted by a value of xin the parameter θl
Fl(x):=G(θ+xel)
=hΨ0|U
l(x)O0Ul(x)|Ψ0i,(2)
where elis the l-th canonical basis vector and the
other layers are absorbed into the observable as O0
and the initial state as |Ψ0i. Furthermore, we used
that Ul(θl+x) = Ul(θl)Ul(x). For later reference, the
modified state and observable are
|Ψ0i="l1
Y
α=1
Uα(θα)#|Ψ0i, and
O0="L
Y
α=l
Uα(θα)#
O"L
Y
α=l
Uα(θα)#.
(3)
The evaluation at point θ, is therefore just G(θ) =
Fl(0) and dG(θ)
dθl=F0
l(0). We will henceforth focus
only on the estimation of a single partial derivative
w.r.t. a parameter θl. In the interest of readability we
write U, and Finstead of Ul, and Fl, as well as |Ψi
and Oinstead of |Ψ0iand O0.
In this restricted view, we are now going to examine
the structure of the function F(x)more closely. The
parametrized unitary that defines this function has
the form
U(θ)=eiθH =
nλ
X
i=1
Pieiλiθ,(4)
where His a Hermitian generator and the Piare the
projectors onto the eigenspaces corresponding to the
eigenvalues {λ1, . . . , λnλ}of Hin ascending order.
Using this notation, we obtain
F(x) = hΨ|U(x)OU(x)|Ψi
=
nλ
X
i,j=1
ei(λiλj)xhΨ|PiOPj|Ψi,(5)
where each cij :=hΨ|PiOPj|Ψiis just a scalar. This
lets us rewrite the function as
F(x) =
nλ
X
i,j=1
ei(λiλj)xcij =
nµ
X
k=1
ckeiµkx+c
keiµkx
=
nµ
X
k=1aksin(µkx) + bkcos(µkx),(6)
where µk∈ {|λiλj|} are all possible eigenvalue dif-
ferences of the generator H,
ck=X
i,j:λiλj=µk
cij (7)
and c=b+ia
2with a,bRnµare Fourier coefficients.
For the total number of frequencies, it follows nµ=
|{µk}| ≤ nλ
2. The derivative at x= 0 is
δ:=F0(0) =
nµ
X
k=1
µkak.(8)
For generators with two eigenvalues, where we can
set w.l.o.g. µk∈ {0, ν}, it has been shown that an
unbiased estimate for the partial derivative can be
obtained via
δ=νF(π
2ν)F(π
2ν)
2,(9)
which is known as the PSR [12].
In essence, we are going to generalize this method.
A helpful tool for this task is the antisymmetric pro-
jection
f(x) = F(x)F(x)
2=
nµ
X
k=1
aksin(µkx).(10)
We are only considering symmetric measurement
schemes as symmetrizing an estimation method will
not make the prediction worse [14]. As such, we refer
only to the positive measurement positions xof f(x),
knowing that estimates of F(+x)and F(x)are re-
quired to determine it. We will also omit the µ= 0
frequency, since it does not affect the derivative. Ad-
ditionally, ν:=kµkrefers to the spectral width of
the generator, meaning µ[ν], since the periodic-
ity of U(x)implies that the entries of µare positive
integers.
3 Gradient estimation approach
The allocator decides the measurement resource al-
location and how to generate the estimate. In par-
ticular, we use a symmetric linear estimator of the
derivative which for a set of measurement positions
x[0, π)nxand number of measurements for each
position mNnxreturns a gradient estimate
ˆ
δ=
nx
X
i=1
wiyi,(11)
where yiare the empirical estimates of f(xi)using mi
measurement rounds. Since each xirequires 2mea-
surement settings, the total number of settings is 2nx.
In the following we develop strategies to find optimal
x,mand w.
The error
tot :=ˆ
δδ(12)
between our estimator guess ˆ
δand the true derivative
δis an important metric that we are going to use as
a figure of merit for choosing our estimation param-
eters. In practice, imperfections of current quantum
hardware and the lack of quantum error correction
3
will cause a significant noise level when evaluating the
cost function on the quantum device. However, even
on a perfect device, the measurement process intro-
duces a shot noise error as the estimates are deter-
mined by sampling from the underlying multinomial
distribution. We denote the expectation value over
the shot noise by h · is.
The expected error under shot noise can then be
written as
htotis=
nx
X
i=1
wihyiisδ(13)
=
nx
X
i=1
wi nµ
X
k=1
aksin(µkxi)!
nµ
X
k=1
µkak(14)
=:(Sxwµ)Ta,(15)
where we have used the definition (Eq. (8)) of δ, the
unbiased nature of the estimate hyiis=f(xi)and
Eq. (10) for f. We also defined the matrix Sxwith
entries Sx
ki := sin(µkxi).
For the mean squared error this means
h2
totis=*nx
X
i=1
wiyiδ2+s
=(Sxwµ)Ta2+X
i
w2
ihy2
iis− hyii2
s
=:2
sys +h2
statis,(16)
where the first term describes the systematic error
resulting from the method not accurately determining
the derivative even for exact measurements and the
second term describes the statistical error arising from
measurement shot noise.
3.1 Estimating the statistical error
For the statistical error, each term hy2
iis− hyii2
sis
the variance for the measurement position xiresulting
from shot-noise errors. If the single shot variance at
position xiis σ2
i, we find the expression
hy2
iis− hyii2
s=σ2
i
mi
(17)
where miis the number of measurement rounds per-
formed for xi. For a fixed measurement budget
m=Pnx
i=1 mi, the optimal measurement allocation
is given by
h2
statis= min
m:kmk1=m
nx
X
i=1
w2
i
σ2
i
mi
=(Pnx
i=1 |wi|σi)2
m,
with mi=m|wi|σi
Pnx
j=1 |wj|σj. If one assumes constant
shot noise σiσregardless of the measurement po-
sition which we will hence force do, this simplifies to
h2
statis=σ2
mkwk2
1with mi=|wi|
kwk1
m . (18)
While σior σare a priori unknown, it is generally
possible to give a rough estimate of σbeforehand and
to determine an estimate of σiafter only a few mea-
surement rounds are performed. For this reason we
assume that σis a known quantity in the following
sections.
3.2 Estimating the systematic error through a
Bayesian approach
Determining the systematic error is more challenging
because it depends explicitly on the Fourier coeffi-
cients awhich are not known. For our analysis we
assume that the estimator will be used for an ensem-
ble of multiple different positions θ, as one expects
to occur during a full gradient descent optimization
routine. So instead of a particular instance we want
to find a strategy where the average total error over
the entire ensemble is minimized. The benefit of this
approach is that the average only requires knowledge
of the general behavior of the Fourier coefficients, not
specific values of the particular realization.
Formally, we assume that a distribution Dθover
the relevant positions θinduces a distribution of the
Fourier coefficients a. One natural distribution Dθis
the uniform distribution over all parameter points θ
[π, π)L, which can be motivated as a model for the
case of random initialization θ0. For an expectation
value over the distribution Dθ, we write h · iθ. In this
framework, taking the expectation value for Dθyields
h2
sysiθ=D(Sxwµ)Ta2Eθ
= (Sxwµ)ThaaTiθ(Sxwµ)
=:(Sxwµ)TCa(Sxwµ),
(19)
meaning that the estimation of the expected squared
systematic error requires knowledge of the second mo-
ment matrix
Ca:=haaTiθRnµ×nµ.(20)
In the following, we derive properties of the of Ca
assuming the uniform distribution over θ.
First, we note that for a shift θθ+elzin the
layer l, the complex Fourier coefficients transform as
ckckeiµkz, see Eq. (6). As Dθis invariant under
such a shift, we have
hckiθ=1
2πZπ
π
hckiθdz=1
2πZπ
π
hckeiµkziθdz= 0
implying that ckis centered around ck= 0 in expec-
tation over θ. Similarly for the second moment, we
compute
hckcpiθ=1
2πZπ
π
hckcpiθei(µk+µp)zdz= 0 ,
hc
kcpiθ=1
2πZπ
π
hc
kcpiθei(µkµp)zdz=h|ck|2iθδkp ,
4
where δkp is the Kronecker delta. From the Fourier
expansion Eq. (6) and ak= 2 Im(ck), it follows that
hakiθ= 0
hakapiθ= 2 δkph|ck|2iθ,(21)
meaning that Cais a diagonal matrix with entries
ha2
kiθ= 2h|ck|2iθgiven by the expected squares of
the Fourier coefficients.
What remains is determining ha2
kiθ. It is worth
pointing out that while underestimating ha2
kiθcan
lead to suboptimal results, even significantly overesti-
mating the amplitudes will still outperform methods,
where no prior assumptions are made, meaning rough
estimates of ha2
kiθare already sufficient for good per-
formance. One way of estimating them is to use al-
ready existing empirical measurement data from pre-
vious optimization rounds or initial calibration. The
coefficients can be estimated using a Fourier fit. An-
other strategy involves numerically simulating smaller
system sizes and extrapolating to the actual size used
in the VQA.
If the applied unitaries in the VQA are known,
ha2
kiθcan sometimes be derived theoretically. For in-
stance in appendix D.2, we derive analytically exact
results for a VQA with a single layer (L= 1). For a
small constant circuit depth, ha2
kiθcan be computed
efficiently, using Monte-Carlo sampling algorithms,
even for large system sizes. This is convenient, as
the case for deep circuits, under certain assumptions,
ha2
kiθcan be approximated again using only the spec-
tral composition of the generator. This is shown in
the following.
Ergodic limit – barren plateaus
VQA optimization routines have to overcome a gen-
eral phenomenon known as barren plateaus. This term
describes the tendency of a gradient in VQAs to be
exponentially suppressed in the system size with in-
creasing circuit depth and for almost all parameters
θ. This phenomenon has been extensively studied
and while mitigation techniques are proposed [68],
it appears to be unavoidable, at least in the general
setup.
For the rigorous analysis of barren plateaus, it is
beneficial to use the language of unitary t-designs.
Effectively, with increasing circuit depth, the overall
applied gate will appear more and more like a Haar-
random unitary, with respect to which the derivative
is suppressed by the Hilbert space dimension. This
assumes that the underlying generators describe a
universal gate set. For this effect to occur, we do
not need convergence to the Haar measure but con-
vergence to a unitary 2-design, which is significantly
quicker [21,22]. For our purposes, such approximate
unitary 2-designs are sufficient since ha2
kiθcan be ex-
pressed as a polynomial in U, Uof degree (2,2). If
this condition is met, we can replace the expectation
value over all angles by the expectation value over all
unitaries. Hence, this condition can be summarized
by the ergodic assumption
hΓ(U(θ))iθ=ZΓ(U)dU , (22)
which holds for any polynomial Γ(U)of degree at most
(2,2) and where the integral is taken w.r.t. the Haar
probability measure on the unitary group.
Under this assumption we derive in appendix Cthat
ha2
kiθ=ξd
σ2
O
dX
ij:µk=λiλj
Tr[Pi/d] Tr[Pj/d](23)
with σ2
O:= Tr[O2/d]Tr[O/d]2, where dis the Hilbert
space dimension and ξdis a constant close to 1that
depends only on d.Tr[Pi/d]is the relative multi-
plicity of the eigenvalue λi. Notably, the factor of 1
d
shows the exponential suppression of the derivative in
the system size, meaning that gate sets drawn from
a 2-design experience barren plateaus. For L→ ∞
this result confirms the assumption that the relative
frequency of an eigenvalue difference µkin the spec-
trum of the generator determines the expected size
of its respective Fourier coefficient. We will analyze
the strengths and limitations of this approach with an
example in section 5.1.
4 Allocation methods
In this section, we derive several allocation methods
for a given measurement budget. A Python imple-
mentation of these methods is available on GitHub
[23].
The estimation algorithm requires the values w,m
and x. We have already seen that making assump-
tions about the ensemble of configurations allows us
to estimate the error using the second-moment matrix
Cafrom Eq. (20) and an a priori shot noise estimate
σ2. In the following, we devise explicit measurement
procedures by making use of the knowledge of these
quantities. In section 4.1, we show that using convex
optimization procedures, one can derive an optimal
measurement strategy, which we call Bayesian linear
gradient estimator (BLGE).
In section 4.2, we then consider the case, when
the number of total measurements goes to infinity.
We call this method unbiased linear gradient estima-
tor (ULGE), as it does not require access to the es-
timates of Caand σ2and yields an estimate without
systematic error. ULGE is an equivalent formulation
of a known method in literature [14]. In section 4.2.1,
we show strong similarity between ULGE and another
popular generalized PSR found in the literature [12].
Finally, in section 4.3, we restrict the number of mea-
surement positions nxto 1 and derive a strategy that
is optimal under this constraint. We call this method
single Bayesian linear gradient estimator (SLGE). We
5
摘要:

FastgradientestimationforvariationalquantumalgorithmsLennartBittel,JensWatty,andMartinKlieschInstituteforTheoreticalPhysics,HeinrichHeineUniversityDüsseldorf,GermanyManyoptimizationmethodsfortrainingvariationalquantumalgorithmsarebasedonestimatinggradientsofthecostfunction.Duetothestatisticalnatureo...

展开>> 收起<<
Fast gradient estimation for variational quantum algorithms.pdf

共26页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:26 页 大小:1.28MB 格式:PDF 时间:2025-04-26

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 26
客服
关注