Quantum state tomography via non-convex Riemannian gradient
descent
Ming-Chien Hsu1, En-Jui Kuo1,2, Wei-Hsuan Yu3, Jian-Feng Cai4, and Min-Hsiu Hsieh1
1Hon Hai Quantum Computing Research Center, Taipei, Taiwan
2Joint Center for Quantum Information and Computer Science, NIST and University of
Maryland, College Park, Maryland, USA
3Department of Mathematics, National Central University, Taiwan
4Department of Mathematics, Hong Kong University of Science and Technology, Hong
Kong
Abstract
The recovery of an unknown density matrix of large size requires huge computational resources.
The recent Factored Gradient Descent (FGD) algorithm and its variants achieved state-of-the-art
performance since they could mitigate the dimensionality barrier by utilizing some of the
underlying structures of the density matrix. Despite their theoretical guarantee of a linear
convergence rate, the convergence in practical scenarios is still slow because the contracting
factor of the FGD algorithms depends on the condition number κ of the ground-truth state.
Consequently, the total number of iterations can be as large as O(κ ln(1/ε)) to achieve the
estimation error ε. In this work, we derive a quantum state tomography scheme that improves
the dependence on κ to the logarithmic scale; namely, our algorithm achieves the approximation
error ε in O(ln(1/(κε))) steps. The improvement comes from the application of non-convex
Riemannian gradient descent (RGD). The contracting factor in our approach is thus a universal
constant that is independent of the given state. Our theoretical results of extremely fast
convergence and nearly optimal error bounds are corroborated by numerical results.
1 Introduction
The density matrix is crucial in describing the quantum state in quantum systems. Knowing the
exact form of a density matrix ρ plays an important role in inferring further properties of the
system. In some cases, depending on the purpose, only the expectation values of some observables
are of concern. In such cases, shadow tomography is used with the focus only on predicting some
aspects or properties of the density matrices, rather than the whole [1,2,3]. However, arguably,
it is always desirable to be able to reconstruct the whole density matrix, whether for the sake of
comparison or for more general purposes.
Quantum state tomography involves recovering the density matrix from a given collection of
measurements [4,5,6]. This can be cast as an optimization problem: finding the solution that
best fits the given input data under certain constraints. The tomography problem can be
formulated and solved in different ways, depending on the different aspects, by using the maximum
likelihood estimator (MLE) [7,8,9,10,11], maximal entropy method [12,13], and so on. It has
also been shown that the MLE can be converted into a least square minimizer [14].
arXiv:2210.04717v1 [quant-ph] 10 Oct 2022
Most density matrices of interest, of size d×d, have some underlying structure. Such structures,
for example the low-rank structure (rank r) [15,16,17,18,19,20] or the permutation property [21,22],
can be utilized for efficient matrix reconstruction. Specifically, rank-r matrices are suitable
for compressed sensing frameworks, since a number of Pauli observable measurements
m = O(rd) (ignoring some log d dependence) is sufficient to recover the density matrix ρ, instead
of having to determine the full set of d^2 parameters [16,17,23,24,25,26]. Guaranteed reconstruction
relies on the restricted isometry property (RIP), which is proven to hold for ordinary Pauli
observable measurements [27].
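As a sketch of this Pauli measurement model (a minimal implementation of my own; function names are mine, and any normalization of the sampling map is omitted), the following builds m random k-qubit Pauli observables and evaluates their expectation values on a density matrix:

```python
import numpy as np

# Single-qubit Paulis; k-qubit observables are tensor products of these.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_observable(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZY'."""
    op = np.array([[1.0 + 0j]])
    for ch in label:
        op = np.kron(op, PAULIS[ch])
    return op

def sample_pauli_map(k, m, rng):
    """Sample m random k-qubit Pauli strings (the sensing observables S_i)."""
    labels = ["".join(rng.choice(list("IXYZ"), size=k)) for _ in range(m)]
    return [pauli_observable(l) for l in labels]

def apply_A(observables, rho):
    """A(rho): vector of expectation values Tr(S_i rho)."""
    return np.array([np.trace(S @ rho).real for S in observables])
```

Each expectation value Tr(S_i ρ) lies in [−1, 1], and only m = O(rd) of the d^2 Pauli coefficients need to be sampled for a rank-r state.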
Since the dimension of the matrix d = 2^k grows exponentially with the qubit number k, the
complexity of the reconstruction increases very quickly. The fact that, up to now, experimental
demonstrations of tomography have only been performed for small qubit numbers [28,29] shows
the difficulties. When the matrix dimension d is large, two aspects are particularly relevant in
deciding the quality of the tomography. One is the sample complexity and the other is the time
complexity. In addition, the algorithm used to recover the density matrix must also guarantee the
accuracy.
Sample complexity relates to the fundamental question of how many copies of ρ are necessary
and sufficient to determine a state [30]. Theoretically, using positive operator-valued
measurement (POVM) schemes, Ref. [30] showed that O(dr log(d/ε)/ε) copies of ρ are sufficient for
tomography to obtain a description with fidelity 1 − ε, and that the necessary lower bound is
Ω(rd/log(d/r)). In another study, [31] improved the lower bound by changing the number of copies
to Ω(rd/ε). For Pauli measurements, [32] showed that O(10^k/δ^2) copies are sufficient to
accomplish the tomography of a k-qubit system down to a trace-distance error δ.
Time complexity determines the efficiency of an algorithm, which is crucial for matrix recovery
in practical applications. The computational time can be slow for calculations involving the entire
matrix, especially when the system size is large. Many standard and state-of-the-art algorithms
require solving the eigen systems or doing the singular value decomposition (SVD), especially
when they involve projection related to the eigenspectrum [10,33,34], singular value contracting
operator [35,36,37], or a unitary transformation of the eigenbasis [38,14]. Both SVD and eigenvalue
decomposition have time complexity O(d^3) and thus can be slow. Other time-consuming operations
involving the full d×d matrix include Hessian calculation [39] and matrix inversion [40,41,42].
Although [39] improves efficiency by switching from an initially costly rapid descent to a less
costly proxy for the Hessian, the approach is still heuristic and provides no theoretical
guarantee of performance or convergence. An extension of the matrix-inversion approach [40]
can also improve efficiency [43]; however, it relies on graphics-processing units (GPUs), and
linear inversion of the full matrix remains inefficient from an algorithmic point of view.
Without utilizing the structure behind the matrix, these algorithms tend to be slow in
recovering matrices when the system is large.
A good algorithm should guarantee both the accuracy and efficiency in finding the answer.
The difference between the final constructed matrix ˆρ and the underlying true density matrix ρ
ultimately contains both the error due to the algorithm itself and the error intrinsic to the input
measurement data. The analysis and control of the error bound of this estimated difference is
important for the correctness of the algorithm. The metric of the error can vary from algorithm to
algorithm. The error bound is shown in nuclear norm in the projected least squares error approach
[44]. In the convex optimization approach within the compressed sensing framework, the error
bounds are shown in both the nuclear norm [45,46] and the Frobenius norm [45].
Since the difficulty in tomography is largely caused by the limitations of the algorithms, it is
important to find more efficient algorithms. Time complexity can be reduced if the underlying
structure of the matrices can be utilized. Since the density matrices of interest are mostly of
low rank, non-convex approaches having the rank structure inherent to the algorithm can perform
much better [47,48,49,50,51,52]. In particular, [53] adopted the non-convex projected Factored
Gradient Descent (FGD) to do the tomography. The Momentum-Inspired version (MiFGD) [54]
and the stochastic version [55] are the further improved variants of the FGD. Their results indeed
confirm that the FGD method outperforms other approaches, especially when there is an increase
in system size. This process, however, ignores the eigenvalue dependence during factorization;
therefore, each update is heavily dependent on the condition number of the underlying matrix.
Moreover, the minimization of errors in each step is related to the eigenvalues and the contracting
factor is close to 1. Therefore, it still takes numerous iterations to obtain the final estimation.
In this paper, we use a much more efficient non-convex Riemannian Gradient Descent (RGD)
algorithm that can overcome these difficulties, while still maintaining high guaranteed accuracy.
The RGD algorithm has proven to be useful and efficient in both matrix recovery problems [56]
and matrix completion problems [57]. Its success comes from suitably taking care of the eigenvalues
(or singular values in general) in each iteration, so that much more efficient convergence can be
expected, while maintaining high accuracy. The results show that it takes logarithmic steps to
achieve the desired accuracy and that nearly optimal error bounds under noise are guaranteed.
The rest of the paper is structured as follows. In Sec. 1.1 we give an overview of the main
results and in Sec. 1.2 we discuss the technical contribution. The related work is reviewed in Sec.
1.3. A preliminary background is presented in Sec. 2. The RGD algorithm and the main results
are illustrated in Sec. 3, while the numerical results are shown in Sec. 4. Finally, some conclusions
are offered in Sec. 5.
1.1 Overview of the main results
The aim of quantum tomography is to recover an unknown density matrix ρ of size d×d from the
measurement outcome y ∈ R^m, where the i-th component y_i of y corresponds to the expectation
value Tr(S_i ρ) of one sampled Pauli observable S_i. Since most, if not all, density matrices of
interest are of low rank, we assume ρ to be of rank r. Let A denote the Pauli sampling map, which
acts on ρ to produce the m expectation values of the sampled Pauli observables. Since the
measurements inevitably carry noise z, the measurement result is written as y = A(ρ) + z ∈ R^m.
With y as the input, the quantum state tomography problem can be reformulated and relaxed as
an optimization problem; namely, minimizing the function f(X) := (1/2)‖y − A(X)‖_2^2 over all
matrices X such that rank(X) ≤ r.
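A minimal NumPy rendering of this objective and its Euclidean gradient (function names are mine; the sampling map is taken un-normalized, whereas the paper's definition may carry an extra scaling factor):

```python
import numpy as np

def A_of(X, observables):
    """Sampling map A: X -> (Tr(S_1 X), ..., Tr(S_m X))."""
    return np.array([np.trace(S @ X).real for S in observables])

def f(X, y, observables):
    """Objective f(X) = 0.5 * ||y - A(X)||_2^2."""
    res = y - A_of(X, observables)
    return 0.5 * res @ res

def grad_f(X, y, observables):
    """Euclidean gradient of f: -A†(y - A(X)) = -sum_i r_i S_i,
    where r is the residual vector."""
    res = y - A_of(X, observables)
    return -sum(ri * S for ri, S in zip(res, observables))
```

In the noiseless case (z = 0), the ground truth ρ is a global minimizer with f(ρ) = 0 and vanishing gradient; the rank constraint rank(X) ≤ r is what makes the problem non-convex.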
This is a non-convex problem that can be efficiently solved with the RGD algorithm. The
initial guess X_0 is chosen to be the rank-r approximation of A†(y) computed from the measurement
vector y. Suppose that the noise z ∈ R^m obeys the condition ‖A†(z)‖ ≤ λ. The power of the RGD
algorithm is shown by the error analysis and time complexity analysis in the following theorem
and corollary.
Theorem 1. (Simplified) When provided with a small enough λ and a large enough number of
samples m, the iterate X_k after k steps of the RGD algorithm is guaranteed to be close to ρ in
the Frobenius norm:
‖X_k − ρ‖_F ≤ ‖X_0 − ρ‖_F · γ̄^k + C√r·λ,
with a universal contracting factor γ̄ < 1, which is independent of the rank r, the condition
number κ, and so on, where C = O(1) is a constant.
Here, σ_1 and σ_r denote the largest and smallest singular values, respectively. More precisely,
the sufficient conditions for the above convergence guarantee γ̄ < 1 are that the number of
sampled Pauli observables satisfies m ≳ O(κ^2 r^2 d log^6 d) and that each Pauli expectation
value is obtained as a statistical average over l ≳ d/(r σ_1^2 log^5 d) measurements. With the
universal contracting factor γ̄ < 1, the time complexity analysis for the superfast convergence
is as follows.
Corollary 1. (Simplified) The RGD algorithm outputs the estimated matrix ˆρ with error bound
‖ˆρ − ρ‖_F ≤ C_1√r·λ after
(C_2 / ln(1/γ̄)) · ln( ‖ρ‖_F / (√r·κ·λ) )
iteration steps, for some positive constants C_1, C_2, both being O(1).
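Under this reading of the simplified corollary, the step count (C_2/ln(1/γ̄)) · ln(‖ρ‖_F/(√r·κ·λ)) can be evaluated numerically. Note that γ̄ and C_2 are unspecified O(1) constants, so the default values below are illustrative placeholders of my own, not values from the paper:

```python
import math

def rgd_iterations(rho_fro, r, kappa, lam, gamma_bar=0.5, C2=1.0):
    """Iteration count from the simplified corollary:
        (C2 / ln(1/gamma_bar)) * ln( ||rho||_F / (sqrt(r) * kappa * lam) ).
    gamma_bar and C2 are unspecified O(1) constants; the defaults are
    placeholders chosen only for illustration."""
    return math.ceil(C2 / math.log(1.0 / gamma_bar)
                     * math.log(rho_fro / (math.sqrt(r) * kappa * lam)))
```

The logarithmic dependence means that shrinking the noise level λ by several orders of magnitude adds only a handful of extra iterations.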
1.2 Technical Contribution
Under the conditions of small noise λ ≤ C_1 σ_r/√r and a large enough number of samples
m ≥ C_2 κ^2 r^2 d log^6 d, the RGD estimate will be close to the underlying density matrix, with
a nearly optimal error distance. The convergence is extremely fast, since the required number of
steps is logarithmic with respect to the final error. Further explanations, as well as some
advantages over the non-convex FGD-type algorithms, follow.
From the convergence-time aspect, the error is reduced by a multiplicative contracting factor
in each update, leading to a favorable linear convergence rate. Specifically, the contracting
factor γ̄ in our RGD algorithm is a universal constant, independent of all parameters,
including the RIP constant, the condition number κ of the underlying density matrix, and so
on. In other words, the error decays exponentially with a constant factor. Together with the
fact that we can ensure the initial approximation error to be inversely proportional to the
condition number κ, the required total number of iteration steps to achieve the final error ε
is of the order O(ln(1/(κε))).
This logarithmic dependence on κ in the number of convergence steps is an exponential improvement
over the FGD algorithm and its variants. Although the FGD-type algorithms can also achieve a
linear convergence rate, their iterative contracting factor is not universal. Its form can be
written as 1 − 1/κ^α, where α can be improved to 0.5 in some variants. This gives a total number
of iteration steps O(κ^α ln(1/ε)) to achieve the final error ε.
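As a toy arithmetic illustration of the two asymptotic step counts (all hidden constants set to 1; the function names are mine):

```python
import math

def rgd_steps(kappa, eps):
    """O(ln(1/(kappa*eps))): logarithmic in both the condition number
    and the target accuracy."""
    return math.log(1.0 / (kappa * eps))

def fgd_steps(kappa, eps, alpha=1.0):
    """O(kappa**alpha * ln(1/eps)): polynomial in the condition number."""
    return kappa ** alpha * math.log(1.0 / eps)
```

For κ = 100 and ε = 10^-4 this gives roughly 5 versus 900 steps: for ill-conditioned states it is the κ-dependence, not the target accuracy, that dominates the cost of FGD-type iterations.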
In each iteration, the step size is determined from an exact line search, since the RGD
directly minimizes the objective function, which is quadratic over the set of matrices.
The RGD algorithm is therefore easy to execute as well as to implement. In contrast, in the
FGD-type algorithms each matrix X is factored in the form AA†, so that the objective
function is quartic in the factor matrix A. This makes an exact line search impossible,
and therefore some prior knowledge or extra parameters are required to decide the step size
in FGD.
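The exact line search admits a closed form because the objective is quadratic along any fixed direction. The sketch below (function names mine) uses the formula α = ‖G‖_F^2 / ‖A(G)‖_2^2, which is the exact minimizer of f(X + αG) when G is the negative gradient or its orthogonal projection onto the tangent space:

```python
import numpy as np

def A_of(X, observables):
    """Sampling map A: X -> (Tr(S_1 X), ..., Tr(S_m X))."""
    return np.array([np.trace(S @ X).real for S in observables])

def exact_step_size(G, observables):
    """Exact line search for the quadratic f(X) = 0.5*||y - A(X)||^2 along a
    direction G equal to the (possibly tangent-projected) negative gradient:
        argmin_a f(X + a*G) = ||G||_F^2 / ||A(G)||_2^2 ."""
    AG = A_of(G, observables)
    return np.linalg.norm(G) ** 2 / (AG @ AG)
```

The derivation is one line: f(X + αG) = 0.5‖r − αA(G)‖^2 with residual r = y − A(X) is a scalar quadratic in α, and ⟨r, A(G)⟩ = ⟨A†r, G⟩ = ‖G‖_F^2 for such G.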
In terms of the estimation error, the recovered matrix is nearly optimal in its distance to the
underlying ground-truth density matrix. The error bound ε is provided in the Frobenius norm,
which is tighter than the commonly seen nuclear norm. The final achievable error bound depends
on the initial input noise z ∈ R^m. In the noiseless case, where z = 0, the error can be reduced
to nearly zero with arbitrary precision. In the noisy case, the final error bound is of the same
order as the best known theoretical results from convex optimization approaches [45,46], and
hence is nearly optimal.
1.3 Related work
Since the work of [16,17], convex optimization has been shown to be useful for recovering
density matrices, particularly in compressed sensing frameworks. For the noiseless case,
m = c·d·r·log^2 d randomly chosen Pauli expectations can uniquely reconstruct the density matrix
with high probability [16]. For the noisy case, both the Dantzig selector and the Lasso have
been shown to produce similar error bound results [46]. Suppose that the true underlying matrix
ρ is of rank r, and denote the matrix estimated by these algorithms by ˆρ. Then, provided
m ≥ (C_1/δ^2) r d log^6 d and ‖A†(z)‖ ≤ λ, with high probability the error bound in the nuclear
norm is ‖ˆρ − ρ‖_* ≤ C·r·λ. In comparison, our results show the error bound in both the Frobenius
norm, ‖ˆρ − ρ‖_F ≤ C√r·λ, and the nuclear norm, ‖ˆρ − ρ‖_* ≤ C·r·λ.
In terms of the total sample size needed to achieve the nuclear-norm error bound ε, the convex
optimization methods (Dantzig and Lasso) require O((rd/ε)^2 log d) copies [46]. Our scheme
requires the same total sample size m·l, but it applies to both the nuclear-norm error bounded
by ε and the Frobenius-norm error bounded by ε/√r. The projected least squares (PLS) approach
also requires a similar sample size O((rd/ε)^2 log d) for Pauli measurements to reach an accuracy
ε in the nuclear norm [44]. The demonstrated PLS is based on using all Pauli observables, while
our compressed sensing method allows a more delicate separate treatment of the number of sampled
Pauli matrices m and the number of measurements l required for each Pauli observable.
Note that the convex optimization method searches for the solution over d×d matrices, while
the RGD algorithm searches for the candidate over a tangent space whose size is d×r. In addition,
the PLS requires a full-matrix SVD whose complexity is O(d^3), while the RGD has a complexity of
O(dr^2 + r^3) for the QR decompositions and an SVD. This means that each iterative step of the
RGD algorithm is much less costly than in either the convex optimization or the PLS approach.
Moreover, the logarithmic number of steps makes the overall computational demand of the RGD much
smaller than that of the other approaches, while obtaining the same order of optimal error bounds.
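A schematic single RGD iteration is sketched below. This is my own simplified rendering, not the paper's exact algorithm: for readability it retracts with a dense truncated SVD, whereas the efficient variant discussed above organizes the computation through QR factors so that only O(r)-sized dense SVDs are needed, giving the O(dr^2 + r^3) per-step cost.

```python
import numpy as np

def A_of(X, observables):
    """Sampling map A: X -> (Tr(S_1 X), ..., Tr(S_m X))."""
    return np.array([np.trace(S @ X).real for S in observables])

def rgd_step(X, y, observables, r):
    """One schematic RGD iteration for min 0.5*||y - A(X)||^2, rank(X) <= r."""
    U, _, Vh = np.linalg.svd(X, full_matrices=False)
    U, Vh = U[:, :r], Vh[:r, :]
    resid = y - A_of(X, observables)
    G = sum(ri * S for ri, S in zip(resid, observables))   # -grad f = A†(resid)
    # Project -grad f onto the tangent space at X: UU†G + GVV† - UU†GVV†.
    UUG = U @ (U.conj().T @ G)
    GVV = (G @ Vh.conj().T) @ Vh
    PG = UUG + GVV - U @ (U.conj().T @ GVV)
    # Exact line search (the objective is quadratic along PG).
    APG = A_of(PG, observables)
    alpha = np.linalg.norm(PG) ** 2 / (APG @ APG)
    # Retraction: truncate X + alpha*PG back to the rank-r manifold.
    Uw, sw, Vwh = np.linalg.svd(X + alpha * PG, full_matrices=False)
    return (Uw[:, :r] * sw[:r]) @ Vwh[:r, :]
```

Iterating this map from a spectral initialization (the rank-r truncation of A†(y)) drives the residual down geometrically on well-conditioned toy instances.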
Non-convex approaches utilizing low-rank structures have also been adopted for tomography in
past studies [53,54,55]. Unlike its convex counterpart, the non-convex approach is usually more
efficient in terms of computational resources, owing to the exploited low-rank structure. In
particular, the projected FGD approach and its variants decompose each rank-r density matrix as
ρ = AA† with A ∈ C^{d×r} to maintain the low-rank structure. Indeed, faster estimation of quantum
states is achieved by the variant MiFGD [54], compared to state-of-the-art convex [58,59,60] and
non-convex [61] algorithms, including recent deep learning approaches [62,63,64,65]. However, the
FGD-type algorithms still have some shortcomings due to the factorization, as discussed in the
previous section alongside our technical contributions.
2 Preliminary
In this section, we first introduce some necessary notations for our problem setting. Then we
describe the compressed sensing and the concept of restricted isometry property (RIP). The RIP
condition is crucial for the matrix recovery problem. Finally we describe the noise and how to
obtain its bound by matrix concentration.
2.1 Notations
For a matrix M ∈ C^{d×d}, its adjoint is denoted by M†. The identity matrix is written as I. In
quantum systems, we mostly discuss matrices, and mappings between them, in the Hermitian space
H_d(C) = {M | M ∈ C^{d×d}, M† = M}. We equip the matrix space with the Hilbert–Schmidt inner
product ⟨M, N⟩ := Tr(M†N).
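A minimal NumPy rendering of the Hilbert–Schmidt inner product ⟨M, N⟩ = Tr(M†N) (the function name is my own):

```python
import numpy as np

def hs_inner(M, N):
    """Hilbert-Schmidt inner product <M, N> = Tr(M† N)."""
    return np.trace(M.conj().T @ N)
```

On Hermitian matrices this inner product is real-valued, and ⟨M, M⟩ recovers the squared Frobenius norm ‖M‖_F^2.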