Classically Approximating Variational Quantum Machine Learning with Random Fourier Features Jonas Landman1 2Slimane Thabet3 4Constantin Dalyac3 4Hela Mhiri3 5and Elham Kashe1 3

2025-05-01 0 0 1.82MB 17 页 10玖币
侵权投诉
Classically Approximating Variational Quantum Machine Learning with Random
Fourier Features
Jonas Landman,1, 2 Slimane Thabet,3, 4 Constantin Dalyac,3, 4 Hela Mhiri,3, 5 and Elham Kashefi1, 3
1School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, United Kingdom
2QC Ware, Palo Alto, USA and Paris, France
3Laboratoire d’Informatique de Paris 6, CNRS, Sorbonne Universit´e, 4 Place Jussieu, 75005 Paris, France
4PASQAL SAS, 7 avenue L´eonard de Vinci, 91300 Massy, France
5ENSTA Paris, Institut polytechnique de Paris
(Dated: October 25, 2022)
Many applications of quantum computing in the near term rely on variational quantum circuits
(VQCs). They have been showcased as a promising model for reaching a quantum advantage
in machine learning with current noisy intermediate scale quantum computers (NISQ). It is often
believed that the power of VQCs relies on their exponentially large feature space, and extensive works
have explored the expressiveness and trainability of VQCs in that regard. In our work, we propose
a classical sampling method that may closely approximate a VQC with Hamiltonian encoding,
given only the description of its architecture. It uses the seminal proposal of Random Fourier
Features (RFF) and the fact that VQCs can be seen as large Fourier series. We provide general
theoretical bounds for classically approximating models built from exponentially large quantum
feature space by sampling a few frequencies to build an equivalent low dimensional kernel, and we
show experimentally that this approximation is efficient for several encoding strategies. Precisely, we
show that the number of required samples grows favorably with the size of the quantum spectrum.
This tool therefore questions the hope for quantum advantage from VQCs in many cases, but
conversely helps to narrow the conditions for their potential success. We expect VQCs with various
and complex encoding Hamiltonians, or with large input dimension, to become more robust to
classical approximations.
INTRODUCTION
Many applications of quantum computing in the near
term rely on variational quantum circuits (VQCs) [1,2],
and in particular for solving machine learning (ML)
tasks. VQCs are trained using classical optimization of
their gates’ parameters, a method borrowed from clas-
sical neural networks. Many early works have shown
promising results, both empirically and in theory [35].
However, whether these variational methods can provide
a quantum advantage in the general case with a scaling
number of qubits has not been proven.
What would make VQCs advantageous compared to
classical algorithms for machine learning on classical
data? The most common and intuitive answer is the
formation of a large feature space, due to the projection
of data points in an exponentially large Hilbert space.
Understanding what is happening in this Hilbert space
[6], and most importantly, knowing how we can exploit its
size is an important question for this field. When it comes
to trainability, it has been shown that these exponential
spaces are in fact drawbacks [7]. Regarding expressivity,
it is crucial to understand what kind of functions VQCs
can learn better than a classical algorithm.
One of the biggest challenges we face while trying to
answer these questions is to find the fair comparison be-
tween VQCs and classical ML algorithms. Recent works
[6,8,9] have shown equivalence between VQCs and both
Fourier series and kernel methods. In fact, VQCs are ex-
tremely powerful Fourier series: the functions they can
learn are predetermined by a set of frequencies which can
become numerous with the dimension of the input, and
the number and complexity of the quantum gates used.
In this work, we adapt Random Fourier Features
(RFF) [10], a successful classical sampling algorithm
aimed at efficiently approximating some large classical
kernel methods. We designed three different RFF strate-
gies to approximate VQCs. For each one, we analyze
its efficiency in reproducing results obtained by a VQC.
To do so, we have studied in details the expressivity of
VQCs, understood where their power could come from,
and compared it each time with RFF.
Our method consists in analyzing the encoding gates
of the VQC to extract the final frequencies of its model
and sample from them. Notably, if the VQCs possesses
simple encoding gates such as Pauli gates, we show that
the large quantum feature space is not fully exploited,
making RFF even more efficient. If the number of fre-
quencies in the VQC grows exponentially, the number of
samples required by RFF grows only linearly. Finally, we
have empirically compared VQCs and RFF on real and
artificial use cases. On these, RFF were able to match
the VQC’s answer, and sometimes outperform it. Some
hope resides for VQCs with non-diagonalizable encoding
Hamiltonians, preventing the use of usual RFF. However,
even in this case, we provide an alternative RFF strategy,
along with theoretical approximation bounds.
Our conclusion is that, despite the potentially expo-
nential number of frequencies in the functions that a
VQC can create, Random Fourier Features can be used
to approximate the same resulting function on a given
dataset, in many cases. This restrains the regime in
arXiv:2210.13200v1 [quant-ph] 24 Oct 2022
2
FIG. 1: Random Fourier Features as a classical approximator of quantum models. Instead of training a
Variational Quantum Circuit by using a quantum computer, we propose to train a classical kernel built by sampling
a few frequencies of the quantum model. These frequencies can be derived from the quantum circuit architecture, in
particular from the encoding gates. Using Random Fourier Features, one can build a classical model which performs
as good as the quantum model with a bounded error and a number of samples that grows nicely.
which VQCs can offer a true advantage over classical in
learning tasks over classical data, or invites us to change
the way VQCs are defined.
The manuscript is structured as follows. In Section I,
we recall the properties of variational quantum circuits,
and the fact that they can be expressed as Fourier series
by detailing the structure of their spectrum. In Section
II, we introduce the classical method of Random Fourier
Features. In Section III, we finally present three ways
of applying RFF to approximate VQC’s models. These
theoretical proposals are tested over real and artificial
datasets in Section IV.
I. SPECTRAL PROPERTIES OF VARIATIONAL
QUANTUM CIRCUITS
A. Definitions
We consider a standard ML task where a function f,
named model, must be optimized to map data points
to their target values. The data used for the train-
ing is made of Mpoints x= (x1, . . . , xd) in X=Rd
along with their target values yin Y=R. We define a
quantum model as the family of parametrized functions
f: (X,Θ) → Y, such that
f(x;θ) = h0|U(x;θ)OU(x;θ)|0i(1)
where U(x;θ) is a unitary that represents the
parametrized quantum circuits, θrepresents the train-
able parameters from a space Θ, and Ois an observable.
We can always describe the parametrized quantum cir-
cuit as a series of two types of gates. The first are called
encoding gates as they only depend on input data val-
ues, whereas the trainable gates depend on internal pa-
rameters that are optimized during training. A typical
instance of a Variational Quantum Circuit (VQC) is illus-
trated in Fig.3. In the recent literature, these gates are
often grouped as layers, which is not mandatory, since
any circuit can be sliced into alternating sequences of en-
coding and training blocks (even if containing a single
gate).
Any quantum unitary implements the evolution of a
quantum system under a Hamiltonian. Thus, we choose
to write the `th encoding gates as exp(ixiH`), where xi
is one of the dcomponents of x, and H`is a Hamiltonian
matrix of size 2pif pis the number of qubits this gate
acts on. We will note Lthe number of encoding gates
for each dimension of x(the same for each dimension, for
notation simplicity).
In this framework, the aim is to find the optimal map-
ping between data points and their target values. This
is done by optimizing the parameters θto find the best
guess fsuch that
f= argmin
θ
1
M
M
X
i=1
l(f(xi;θ), yi) (2)
where lis a cost function adapted to the task. For a
standard regression tasks, we can choose l(z, y) = |zy|2
3
B. Quantum Models are Large Fourier Series
We know since [6] that the family of quantum models
defined in Eq.(1) can be rewritten as a Fourier series:
f(x;θ) = X
ω
cωex (3)
where the spectrum Ω of frequencies is determined by
the ensemble of eigenvalues of the encoding Hamiltoni-
ans and the coefficients cωdepend on the parametrized
ansatz, as pictured in Fig.3.
In order to familiarize the reader with the structure of
the spectrum, we explicitly build Ω in the case of a one
dimensional data input (X=R) and with a variational
circuit containing only Lencoding gates. The accessible
frequency spectrum Ω is the ensemble of all the differ-
ences between all possible sums of the eigenvalues of the
encoding gates as shown in Fig.2.
We note λk
`the kth eigenvalue of the `th encoding
Hamiltonian H`having d`eigenvalues. We use the multi-
index i= (i1, . . . , iL) indicating which eigenvalue is taken
from each encoding Hamiltonian. We define Λias
Λi=λi1
1+· · · +λiL
L(4)
Finally, we can express Ω, the set of frequencies as in
Eq.5:
Ω = ΛiΛj,i,j
L
Y
`=1
[|1, d`|],(5)
The simplest case is called Pauli encoding, when all
encoding Hamiltonians are in fact Pauli matrices (e.g.
encoding gates RZ(x) = eix
2σZ) as in [6,11]. In this
case, all the eigenvalues are λ=±1/2, and therefore, the
Λiare all the integers (or half-integers, if Lis odd) in
[L/2, L/2]. It follows that the set of distinct values in
Ω is simply the set of integers in JL, LK. In this case,
there are many redundant frequencies, due to the fact
that all Pauli eigenvalues are the same. Namely, only
2L+1 distinct frequencies among the 22Lpossible values
of ΛiΛj. As shown in Fig.2, more various eigenvalues
would create more distinct frequencies in the end. In
the rest of the paper, Ω will denote the set of distinct
frequencies, without redundancy.
In our experiments (Section IV), we observe an im-
portant phenomenon: redundant frequencies are likely
to have high coefficients. Unique frequencies might of-
ten have in contrast small coefficients, reducing the po-
tential expressivity of the VQC. We see that the redun-
dancy might therefore play an important role in the ex-
pressivity of VQCs, and leave theoretical proof for future
work. In fact, the impact of the frequencies redundancy
on the quantum model has been recently highlighted in
[12] where an analytical correspondence between a fre-
quency redundancy and its Fourier coefficient has been
proven for a simplified class of parameterized quantum
models.
These arguments give some intuition on why one
should use encoding gates from Hamiltonians with rich
and various eigenvalues, by taking complex interactions
over many qubits. A global Hamiltonian over nqubits,
hard to implement, could potentially have 2ndistinct
eigenvalues, thus enlarging Ω and avoiding redundancy.
Another approach from [13] consists in adding scaling
factors in the Pauli encoding gate to modify their eigen-
values and avoid redundancy. It results in an exponential
number of integer frequencies, with respect to L, with
many very high frequencies.
FIG. 2: From encoding Hamiltonians to
Frequencies. The frequencies composing the VQC
model (on one dimensional input) come from all the
combinations of eigenvalues from each encoding
Hamiltonians. This can be seen as a tree, with L= 3
Hamiltonians in this figure. We also see potential
redundancy in the leaves.
We can now generalize, if we now suppose that X=
Rd, such that we encode a vector x= (x1, . . . , xd)
in our quantum model, then Ω becomes the following
ddimensional Cartesian product
Ω=Ω1×2× · · · × d(6)
where each Ωκis defined as in Eq.5on its own set of
Hamiltonians.
In this context, note that the frequencies ωare now
vectors in Rdand there are ddifferent trees to build Ω (see
Fig. 2). Note that for notation simplicity, we assumed
that Lgates were applied on each input’s component,
but it can be generalized to any number of gates per
dimension.
We therefore see that the size of the spectrum ||can
potentially grow exponentially with the number of en-
coding gates and the dimension of the input data. For
4
instance, if we consider a d-dimensional vector xand L
Pauli-encoding gates for each dimension in such a way
that there are Ld encoding gates in the VQC. According
to equation (6), the size of the spectrum Ω would scale as
O(Ld), which becomes quickly intractable as dincreases.
As an example, the spectrum associated to a VQC with
L= 20 encoding gates and d= 16 would require more
than one hundred times the world’s storage data capacity
available in 2007 to be stored [14].
However, can such large Fourier series be approximated
with classical methods? It would consist in building a
classical approximator ˜
fas
˜
f(x) = X
ω˜
˜cωex (7)
such that ˜
Ω is of tractable size and the two solution are
close, written as
f(x)˜
f(x)
ε, using a given error
measure (infinite or `2norms, for instance).
In the cases where the construction of such a model
is possible, it would imply that although classically sim-
ulating the encoding VQC might not be possible, the
quantum model that emerges from it can be efficiently
approximated in a classical way.
C. Quantum models are shift-invariant kernel
methods
As the quantum model is a real-valued function, it fol-
lows that ωΩ implies ωΩ and cω=c
ω. We
express the Fourier series of the quantum model as a sum
of trigonometric functions by defining for every ωΩ:
aω:= cω+cωR(8)
bω:= 1
i(cωcω)R(9)
such that
f(x;θ) = X
ω+
cωex +cωeiωx
=X
ω+
aωcos(ωx) + bωsin(ωx)
(10)
where Ω+contains only half of the frequencies from
Ω. Considering only Pauli gates, if d= 1, we simply
have Ω = JL, LKand Ω+=J0, LK. In dimension d, we
have Ω = JL, LKdand Ω+is built by keeping half of the
frequencies (after removing those of opposite sign), plus
the null vector. In the end, we have
|+|=(2L+ 1)d1
2+ 1 (11)
With a more general encoding scheme, if there is a differ-
ent number of distinct positive frequencies per dimension,
the formula is different but is built similarly.
In the following parts, we will focus solely on Ω+and
conveniently drop the + subscript.
FIG. 3: Variational Quantum Circuits give rise to
Fourier Series. In a quantum machine learning task,
classical data is encoded in a subset of variational gates
of a quantum circuit (green), while blue gates are
trainable.
We can also define the feature map of the quantum
model [8] as
f(x;θ) = hψ(x;θ)|O|ψ(x;θ)i=w(θ)Tφ(x) (12)
where φ(x) is the feature vector, the mapping of the
initial input into a larger feature space, where the new
distribution of the data is supposed to make the classifi-
cation (or regression) solvable with only a linear model.
This linear model is in fact the inner product between
φ(x) and a trainable weight vector w. In the case of
VQCs, we can explicitly express them as:
φ(x) = 1
p||
cos(ωTx)
sin(ωTx)
.
.
.
ω
,w(θ) =
aω
bω
.
.
.
ω
(13)
If the spectrum Ω is known and accessible, one can fit
the quantum model by retrieving the coefficients aω, bω
associated to each frequency ω. This can be done by
using general linear ridge regression techniques (see Ap-
pendix A). Interestingly, there exists a dual formulation
of the linear ridge regression that depends entirely on the
kernel function associated to the model [15]. In our case,
the related kernel function is defined by:
k(x, x0) = hφ(x), φ(x0)i
=1
||X
ω
cos(ωx) cos(ωx0) + sin(ωx) sin(ωx0)
=1
||X
ω
cos(ω(xx0))
(14)
摘要:

ClassicallyApproximatingVariationalQuantumMachineLearningwithRandomFourierFeaturesJonasLandman,1,2SlimaneThabet,3,4ConstantinDalyac,3,4HelaMhiri,3,5andElhamKashe 1,31SchoolofInformatics,UniversityofEdinburgh,10CrichtonStreet,Edinburgh,UnitedKingdom2QCWare,PaloAlto,USAandParis,France3Laboratoired'Inf...

展开>> 收起<<
Classically Approximating Variational Quantum Machine Learning with Random Fourier Features Jonas Landman1 2Slimane Thabet3 4Constantin Dalyac3 4Hela Mhiri3 5and Elham Kashe1 3.pdf

共17页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:17 页 大小:1.82MB 格式:PDF 时间:2025-05-01

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 17
客服
关注