Quantum state tomography via non-convex Riemannian gradient
descent
Ming-Chien Hsu1, En-Jui Kuo1,2, Wei-Hsuan Yu3, Jian-Feng Cai4, and Min-Hsiu Hsieh1
1Hon Hai Quantum Computing Research Center, Taipei, Taiwan
2Joint Center for Quantum Information and Computer Science, NIST and University of
Maryland, College Park, Maryland, USA
3Department of Mathematics, National Central University, Taiwan
4Department of Mathematics, Hong Kong University of Science and Technology, Hong
Kong
Abstract
The recovery of an unknown density matrix of large size requires huge computational resources.
The recent Factored Gradient Descent (FGD) algorithm and its variants achieved state-of-the-art
performance since they could mitigate the dimensionality barrier by utilizing some of the
underlying structures of the density matrix. Despite their theoretical guarantee of a linear
convergence rate, the convergence in practical scenarios is still slow because the contracting
factor of the FGD algorithms depends on the condition number κ of the ground-truth state.
Consequently, the total number of iterations can be as large as O(κ ln(1/ε)) to achieve the
estimation error ε. In this work, we derive a quantum state tomography scheme that improves
the dependence on κ to the logarithmic scale; namely, our algorithm achieves the approximation
error ε in O(ln(1/(κε))) steps. The improvement comes from the application of non-convex
Riemannian gradient descent (RGD). The contracting factor in our approach is thus a universal
constant that is independent of the given state. Our theoretical results of extremely fast
convergence and nearly optimal error bounds are corroborated by numerical results.
1 Introduction
The density matrix is crucial in describing the quantum state in quantum systems. Knowing the
exact form of a density matrix ρ plays an important role in inferring further properties of the
system. In some cases, depending on the purpose, only the expectation values of some observables
are of concern. In such cases, shadow tomography is used with the focus only on predicting some
aspects or properties of the density matrices, rather than the whole [1,2,3]. However, arguably,
it is always desirable to be able to reconstruct the whole density matrix, whether for the sake of
comparison or for more general purposes.
Quantum state tomography involves recovering the density matrix from a given collection of
measurements [4,5,6]. This can be cast as an optimization problem: finding the solution that
best fits the given input data under certain constraints. The tomography problem can be
formulated and solved in different ways, depending on the different aspects, by using the maximum
likelihood estimator (MLE) [7,8,9,10,11], maximal entropy method [12,13], and so on. It has
also been shown that the MLE can be converted into a least square minimizer [14].
arXiv:2210.04717v1 [quant-ph] 10 Oct 2022
Most density matrices of interest, of size d×d, have some underlying structure. Such structures,
for example the low-rank structure (rank r) [15,16,17,18,19,20] or the permutation property [21,22],
can be utilized for efficient matrix reconstruction. Specifically, rank-r matrices are suitable
for compressed sensing frameworks, since a number of Pauli observable measurements
m = O(rd) (ignoring some log d dependence) is sufficient to recover the density matrix ρ, instead
of having to determine the full set of d^2 parameters [16,17,23,24,25,26]. Guaranteed reconstruction
relies on the restricted isometry property (RIP), which is proven to hold for ordinary Pauli
observable measurements [27].
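As a sketch of this Pauli measurement model (a minimal implementation of my own; function names are mine, and any normalization of the sampling map is omitted), the following builds m random k-qubit Pauli observables and evaluates their expectation values on a density matrix:

```python
import numpy as np

# Single-qubit Paulis; k-qubit observables are tensor products of these.
PAULIS = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_observable(label):
    """Tensor product of single-qubit Paulis, e.g. 'XZY'."""
    op = np.array([[1.0 + 0j]])
    for ch in label:
        op = np.kron(op, PAULIS[ch])
    return op

def sample_pauli_map(k, m, rng):
    """Sample m random k-qubit Pauli strings (the sensing observables S_i)."""
    labels = ["".join(rng.choice(list("IXYZ"), size=k)) for _ in range(m)]
    return [pauli_observable(l) for l in labels]

def apply_A(observables, rho):
    """A(rho): vector of expectation values Tr(S_i rho)."""
    return np.array([np.trace(S @ rho).real for S in observables])
```

Each expectation value Tr(S_i ρ) lies in [−1, 1], and only m = O(rd) of the d^2 Pauli coefficients need to be sampled for a rank-r state.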
Since the dimension of the matrix d = 2^k grows exponentially with the qubit number k, the
complexity of the reconstruction increases very quickly. The fact that, up to now, experimental
demonstrations of tomography have only been performed for small qubit numbers [28,29] shows
the difficulties. When the matrix dimension d is large, two aspects are particularly relevant in
deciding the quality of the tomography. One is the sample complexity and the other is the time
complexity. In addition, the algorithm used to recover the density matrix must also guarantee the
accuracy.
Sample complexity relates to the fundamental question of how many copies of ρ are necessary
and sufficient to determine a state [30]. Theoretically, using positive operator-valued
measurement (POVM) schemes, Ref. [30] showed that O(dr log(d/ε)/ε) copies of ρ are sufficient for
tomography to obtain a description with fidelity 1 − ε, and that the necessary lower bound is
Ω(rd/log(d/r)). In another study, [31] improved the lower bound by changing the number of copies
to Ω(rd/ε). For Pauli measurements, [32] showed that O(10^k/δ^2) copies are sufficient to
accomplish the tomography of a k-qubit system down to a trace-distance error δ.
Time complexity determines the efficiency of an algorithm, which is crucial for matrix recovery
in practical applications. The computational time can be slow for calculations involving the entire
matrix, especially when the system size is large. Many standard and state-of-the-art algorithms
require solving the eigen systems or doing the singular value decomposition (SVD), especially
when they involve projection related to the eigenspectrum [10,33,34], singular value contracting
operator [35,36,37], or a unitary transformation of the eigenbasis [38,14]. Both SVD and eigenvalue
decomposition have time complexity O(d^3) and thus can be slow. Other time-consuming operations
involving the full d×d matrix include Hessian calculation [39] and matrix inversion [40,41,42].
Although [39] improves efficiency by switching from an initially costly rapid descent to a less
costly proxy for the Hessian, the approach is still heuristic and provides no theoretical
guarantee of performance or convergence. An extension of the matrix-inversion approach [40]
can also improve efficiency [43]; however, it relies on graphics-processing units (GPUs), and
linear inversion of the full matrix remains inefficient from an algorithmic point of view.
Without utilizing the structure behind the matrix, these algorithms tend to be slow in
recovering matrices when the system is large.
A good algorithm should guarantee both the accuracy and efficiency in finding the answer.
The difference between the final constructed matrix ˆρ and the underlying true density matrix ρ
ultimately contains both the error due to the algorithm itself and the error intrinsic to the input
measurement data. The analysis and control of the error bound of this estimated difference is
important for the correctness of the algorithm. The metric of the error can vary from algorithm to
algorithm. The error bound is shown in nuclear norm in the projected least squares error approach
[44]. In the convex optimization approach within the compressed sensing framework, the error
bounds are shown in both the nuclear norm [45,46] and the Frobenius norm [45].
Since the difficulty in tomography is largely caused by the limitations of the algorithms, it is
important to find more efficient algorithms. Time complexity can be reduced if the underlying
structure of the matrices can be utilized. Since the density matrices of interest are mostly of
low rank, non-convex approaches having the rank structure inherent to the algorithm can perform
much better [47,48,49,50,51,52]. In particular, [53] adopted the non-convex projected Factored
Gradient Descent (FGD) to do the tomography. The Momentum-Inspired version (MiFGD) [54]
and the stochastic version [55] are the further improved variants of the FGD. Their results indeed
confirm that the FGD method outperforms other approaches, especially when there is an increase
in system size. This process, however, ignores the eigenvalue dependence during factorization;
therefore, each update is heavily dependent on the condition number of the underlying matrix.
Moreover, the minimization of errors in each step is related to the eigenvalues and the contracting
factor is close to 1. Therefore, it still takes numerous iterations to obtain the final estimation.
In this paper, we use a much more efficient non-convex Riemannian Gradient Descent (RGD)
algorithm that can overcome these difficulties, while still maintaining high guaranteed accuracy.
The RGD algorithm has proven to be useful and efficient in both matrix recovery problems [56]
and matrix completion problems [57]. Its success comes from suitably taking care of the eigenvalues
(or singular values in general) in each iteration, so that much more efficient convergence can be
expected, while maintaining high accuracy. The results show that it takes logarithmic steps to
achieve the desired accuracy and that nearly optimal error bounds under noise are guaranteed.
The rest of the paper is structured as follows. In Sec. 1.1 we give an overview of the main
results and in Sec. 1.2 we discuss the technical contribution. The related work is reviewed in Sec.
1.3. A preliminary background is presented in Sec. 2. The RGD algorithm and the main results
are illustrated in Sec. 3, while the numerical results are shown in Sec. 4. Finally, some conclusions
are offered in Sec. 5.
1.1 Overview of the main results
The aim of quantum tomography is to recover an unknown density matrix ρ of size d×d from the
measurement outcome y ∈ R^m, where the i-th component y_i of y corresponds to the expectation
value Tr(S_i ρ) of one sampled Pauli observable S_i. Since most, if not all, density matrices of
interest are of low rank, we assume ρ to be of rank r. Let A denote the Pauli sampling map, which
acts on ρ to produce the m expectation values of the sampled Pauli observables. Since the
measurements inevitably carry noise z, the measurement result is written as y = A(ρ) + z ∈ R^m.
With y as the input, the quantum state tomography problem can be reformulated and relaxed as
an optimization problem; namely, minimizing the function f(X) := (1/2)‖y − A(X)‖_2^2 over all
matrices X such that rank(X) ≤ r.
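A minimal NumPy rendering of this objective and its Euclidean gradient (function names are mine; the sampling map is taken un-normalized, whereas the paper's definition may carry an extra scaling factor):

```python
import numpy as np

def A_of(X, observables):
    """Sampling map A: X -> (Tr(S_1 X), ..., Tr(S_m X))."""
    return np.array([np.trace(S @ X).real for S in observables])

def f(X, y, observables):
    """Objective f(X) = 0.5 * ||y - A(X)||_2^2."""
    res = y - A_of(X, observables)
    return 0.5 * res @ res

def grad_f(X, y, observables):
    """Euclidean gradient of f: -A†(y - A(X)) = -sum_i r_i S_i,
    where r is the residual vector."""
    res = y - A_of(X, observables)
    return -sum(ri * S for ri, S in zip(res, observables))
```

In the noiseless case (z = 0), the ground truth ρ is a global minimizer with f(ρ) = 0 and vanishing gradient; the rank constraint rank(X) ≤ r is what makes the problem non-convex.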
This is a non-convex problem that can be efficiently solved with the RGD algorithm. The
initial guess X_0 is chosen to be the rank-r approximation of A†(y) computed from the measurement
vector y. Suppose that the noise z ∈ R^m obeys the condition ‖A†(z)‖ ≤ λ. The power of the RGD
algorithm is shown by the error analysis and time complexity analysis in the following theorem
and corollary.
Theorem 1. (Simplified) When provided with a small enough λ and a large enough number of
samples m, the iterate X_k after k steps of the RGD algorithm is guaranteed to be close to ρ in
the Frobenius norm:
‖X_k − ρ‖_F ≤ ‖X_0 − ρ‖_F · γ̄^k + C√r·λ,
with a universal contracting factor γ̄ < 1, which is independent of the rank r, the condition
number κ, and so on, where C = O(1) is a constant.
Here, σ_1 and σ_r denote the largest and smallest singular values, respectively. More precisely,
the sufficient conditions for the above convergence guarantee γ̄ < 1 are that the number of
sampled Pauli observables satisfies m ≳ O(κ^2 r^2 d log^6 d) and that each Pauli expectation
value is obtained as a statistical average over l ≳ d/(r σ_1^2 log^5 d) measurements. With the
universal contracting factor γ̄ < 1, the time complexity analysis for the superfast convergence
is as follows.
Corollary 1. (Simplified) The RGD algorithm outputs the estimated matrix ˆρ with error bound
‖ˆρ − ρ‖_F ≤ C_1√r·λ after
(C_2 / ln(1/γ̄)) · ln( ‖ρ‖_F / (√r·κ·λ) )
iteration steps, for some positive constants C_1, C_2, both being O(1).
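Under this reading of the simplified corollary, the step count (C_2/ln(1/γ̄)) · ln(‖ρ‖_F/(√r·κ·λ)) can be evaluated numerically. Note that γ̄ and C_2 are unspecified O(1) constants, so the default values below are illustrative placeholders of my own, not values from the paper:

```python
import math

def rgd_iterations(rho_fro, r, kappa, lam, gamma_bar=0.5, C2=1.0):
    """Iteration count from the simplified corollary:
        (C2 / ln(1/gamma_bar)) * ln( ||rho||_F / (sqrt(r) * kappa * lam) ).
    gamma_bar and C2 are unspecified O(1) constants; the defaults are
    placeholders chosen only for illustration."""
    return math.ceil(C2 / math.log(1.0 / gamma_bar)
                     * math.log(rho_fro / (math.sqrt(r) * kappa * lam)))
```

The logarithmic dependence means that shrinking the noise level λ by several orders of magnitude adds only a handful of extra iterations.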
1.2 Technical Contribution
Under the conditions of small noise λ ≤ C_1 σ_r/√r and a large enough number of samples
m ≥ C_2 κ^2 r^2 d log^6 d, the RGD estimate will be close to the underlying density matrix, with
a nearly optimal error distance. The convergence is extremely fast, since the required number of
steps is logarithmic with respect to the final error. Further explanations, as well as some
advantages over the non-convex FGD-type algorithms, follow.
From the convergence-time aspect, the error is reduced by a multiplicative contracting factor
in each update, leading to a favorable linear convergence rate. Specifically, the contracting
factor γ̄ in our RGD algorithm is a universal constant, independent of all parameters,
including the RIP constant, the condition number κ of the underlying density matrix, and so
on. In other words, the error decays exponentially with a constant factor. Together with the
fact that we can ensure the initial approximation error to be inversely proportional to the
condition number κ, the required total number of iteration steps to achieve the final error ε
is of the order O(ln(1/(κε))).
This logarithmic dependence on κ in the number of convergence steps is an exponential improvement
over the FGD algorithm and its variants. Although the FGD-type algorithms can also achieve a
linear convergence rate, their iterative contracting factor is not universal. Its form can be
written as 1 − 1/κ^α, where α can be improved to 0.5 in some variants. This gives a total number
of iteration steps O(κ^α ln(1/ε)) to achieve the final error ε.
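As a toy arithmetic illustration of the two asymptotic step counts (all hidden constants set to 1; the function names are mine):

```python
import math

def rgd_steps(kappa, eps):
    """O(ln(1/(kappa*eps))): logarithmic in both the condition number
    and the target accuracy."""
    return math.log(1.0 / (kappa * eps))

def fgd_steps(kappa, eps, alpha=1.0):
    """O(kappa**alpha * ln(1/eps)): polynomial in the condition number."""
    return kappa ** alpha * math.log(1.0 / eps)
```

For κ = 100 and ε = 10^-4 this gives roughly 5 versus 900 steps: for ill-conditioned states it is the κ-dependence, not the target accuracy, that dominates the cost of FGD-type iterations.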
In each iteration, the step size is determined from an exact line search, since the RGD
directly minimizes the objective function, which is quadratic over the set of matrices.
The RGD algorithm is therefore easy to execute as well as to implement. In contrast, in the
FGD-type algorithms each matrix X is factored in the form AA†, so that the objective
function is quartic in the factor matrix A. This makes an exact line search impossible,
and therefore some prior knowledge or extra parameters are required to decide the step size
in FGD.
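The exact line search admits a closed form because the objective is quadratic along any fixed direction. The sketch below (function names mine) uses the formula α = ‖G‖_F^2 / ‖A(G)‖_2^2, which is the exact minimizer of f(X + αG) when G is the negative gradient or its orthogonal projection onto the tangent space:

```python
import numpy as np

def A_of(X, observables):
    """Sampling map A: X -> (Tr(S_1 X), ..., Tr(S_m X))."""
    return np.array([np.trace(S @ X).real for S in observables])

def exact_step_size(G, observables):
    """Exact line search for the quadratic f(X) = 0.5*||y - A(X)||^2 along a
    direction G equal to the (possibly tangent-projected) negative gradient:
        argmin_a f(X + a*G) = ||G||_F^2 / ||A(G)||_2^2 ."""
    AG = A_of(G, observables)
    return np.linalg.norm(G) ** 2 / (AG @ AG)
```

The derivation is one line: f(X + αG) = 0.5‖r − αA(G)‖^2 with residual r = y − A(X) is a scalar quadratic in α, and ⟨r, A(G)⟩ = ⟨A†r, G⟩ = ‖G‖_F^2 for such G.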
In terms of the estimation error, the recovered matrix is nearly optimal in its distance to the
underlying ground-truth density matrix. The error bound ε is provided in the Frobenius norm,
which is tighter than the commonly seen nuclear norm. The final achievable error bound depends
on the initial input noise z ∈ R^m. In the noiseless case, where z = 0, the error can be reduced
to nearly zero with arbitrary precision. In the noisy case, the final error bound is of the same
order as the best known theoretical results from convex optimization approaches [45,46], and
hence is nearly optimal.
1.3 Related work
Since the work of [16,17], convex optimization has been shown to be useful for recovering
density matrices, particularly in compressed sensing frameworks. For the noiseless case,
m = c·d·r·log^2 d randomly chosen Pauli expectations can uniquely reconstruct the density matrix
with high probability [16]. For the noisy case, both the Dantzig selector and the Lasso have
been shown to produce similar error bound results [46]. Suppose that the true underlying matrix
ρ is of rank r, and denote the matrix estimated by these algorithms by ˆρ. Then, provided
m ≥ (C_1/δ^2) r d log^6 d and ‖A†(z)‖ ≤ λ, with high probability the error bound in the nuclear
norm is ‖ˆρ − ρ‖_* ≤ C·r·λ. In comparison, our results show the error bound in both the Frobenius
norm, ‖ˆρ − ρ‖_F ≤ C√r·λ, and the nuclear norm, ‖ˆρ − ρ‖_* ≤ C·r·λ.
In terms of the total sample size needed to achieve the nuclear-norm error bound ε, the convex
optimization methods (Dantzig and Lasso) require O((rd/ε)^2 log d) copies [46]. Our scheme
requires the same total sample size m·l, but it applies to both the nuclear-norm error bounded
by ε and the Frobenius-norm error bounded by ε/√r. The projected least squares (PLS) approach
also requires a similar sample size O((rd/ε)^2 log d) for Pauli measurements to reach an accuracy
ε in the nuclear norm [44]. The demonstrated PLS is based on using all Pauli observables, while
our compressed sensing method allows a more delicate separate treatment of the number of sampled
Pauli matrices m and the number of measurements l required for each Pauli observable.
Note that the convex optimization method searches for the solution over d×d matrices, while
the RGD algorithm searches for the candidate over a tangent space whose size is d×r. In addition,
the PLS requires a full-matrix SVD whose complexity is O(d^3), while the RGD has a complexity of
O(dr^2 + r^3) for the QR decompositions and an SVD. This means that each iterative step of the
RGD algorithm is much less costly than in either the convex optimization or the PLS approach.
Moreover, the logarithmic number of steps makes the overall computational demand of the RGD much
smaller than that of the other approaches, while obtaining the same order of optimal error bounds.
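A schematic single RGD iteration is sketched below. This is my own simplified rendering, not the paper's exact algorithm: for readability it retracts with a dense truncated SVD, whereas the efficient variant discussed above organizes the computation through QR factors so that only O(r)-sized dense SVDs are needed, giving the O(dr^2 + r^3) per-step cost.

```python
import numpy as np

def A_of(X, observables):
    """Sampling map A: X -> (Tr(S_1 X), ..., Tr(S_m X))."""
    return np.array([np.trace(S @ X).real for S in observables])

def rgd_step(X, y, observables, r):
    """One schematic RGD iteration for min 0.5*||y - A(X)||^2, rank(X) <= r."""
    U, _, Vh = np.linalg.svd(X, full_matrices=False)
    U, Vh = U[:, :r], Vh[:r, :]
    resid = y - A_of(X, observables)
    G = sum(ri * S for ri, S in zip(resid, observables))   # -grad f = A†(resid)
    # Project -grad f onto the tangent space at X: UU†G + GVV† - UU†GVV†.
    UUG = U @ (U.conj().T @ G)
    GVV = (G @ Vh.conj().T) @ Vh
    PG = UUG + GVV - U @ (U.conj().T @ GVV)
    # Exact line search (the objective is quadratic along PG).
    APG = A_of(PG, observables)
    alpha = np.linalg.norm(PG) ** 2 / (APG @ APG)
    # Retraction: truncate X + alpha*PG back to the rank-r manifold.
    Uw, sw, Vwh = np.linalg.svd(X + alpha * PG, full_matrices=False)
    return (Uw[:, :r] * sw[:r]) @ Vwh[:r, :]
```

Iterating this map from a spectral initialization (the rank-r truncation of A†(y)) drives the residual down geometrically on well-conditioned toy instances.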
Non-convex approaches utilizing low-rank structures have also been adopted for tomography in
past studies [53,54,55]. Unlike its convex counterpart, the non-convex approach is usually more
efficient in terms of computational resources, owing to the exploited low-rank structure. In
particular, the projected FGD approach and its variants decompose each rank-r density matrix as
ρ = AA† with A ∈ C^{d×r} to maintain the low-rank structure. Indeed, faster estimation of quantum
states is achieved by the variant MiFGD [54], compared to state-of-the-art convex [58,59,60] and
non-convex [61] algorithms, including recent deep learning approaches [62,63,64,65]. However, the
FGD-type algorithms still have some shortcomings due to the factorization, as discussed in the
previous section alongside our technical contributions.
2 Preliminary
In this section, we first introduce some necessary notations for our problem setting. Then we
describe the compressed sensing and the concept of restricted isometry property (RIP). The RIP
condition is crucial for the matrix recovery problem. Finally we describe the noise and how to
obtain its bound by matrix concentration.
2.1 Notations
For a matrix M ∈ C^{d×d}, its adjoint is denoted by M†. The identity matrix is written as I. In
quantum systems, we mostly discuss matrices, and mappings between them, in the Hermitian space
H_d(C) = {M | M ∈ C^{d×d}, M† = M}. We equip the matrix space with the Hilbert–Schmidt inner
product ⟨M, N⟩ := Tr(M†N).
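A minimal NumPy rendering of the Hilbert–Schmidt inner product ⟨M, N⟩ = Tr(M†N) (the function name is my own):

```python
import numpy as np

def hs_inner(M, N):
    """Hilbert-Schmidt inner product <M, N> = Tr(M† N)."""
    return np.trace(M.conj().T @ N)
```

On Hermitian matrices this inner product is real-valued, and ⟨M, M⟩ recovers the squared Frobenius norm ‖M‖_F^2.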