Theoretical Guarantees for Permutation-Equivariant Quantum Neural Networks
Louis Schatzki,1,2 Martín Larocca,3,4 Quynh T. Nguyen,3,5 Frédéric Sauvage,3 and M. Cerezo1
1Information Sciences, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
2Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, 61801, USA
3Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
4Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
5Harvard Quantum Initiative, Harvard University, Cambridge, Massachusetts 02138, USA
Despite the great promise of quantum machine learning models, there are several challenges one must overcome before unlocking their full potential. For instance, models based on quantum neural networks (QNNs) can suffer from excessive local minima and barren plateaus in their training landscapes. Recently, the nascent field of geometric quantum machine learning (GQML) has emerged as a potential solution to some of those issues. The key insight of GQML is that one should design architectures, such as equivariant QNNs, encoding the symmetries of the problem at hand. Here, we focus on problems with permutation symmetry (i.e., symmetry group $S_n$), and show how to build $S_n$-equivariant QNNs. We provide an analytical study of their performance, proving that they do not suffer from barren plateaus, quickly reach overparametrization, and generalize well from small amounts of data. To verify our results, we perform numerical simulations for a graph state classification task. Our work provides theoretical guarantees for equivariant QNNs, thus indicating the power and potential of GQML.
INTRODUCTION
Symmetry studies and formalizes the invariance of objects under some set of operations. A wealth of theory has gone into describing symmetries as mathematical entities through the concept of groups and representations. While the analysis of symmetries in nature has greatly improved our understanding of the laws of physics, the study of symmetries in data has just recently gained momentum within the framework of learning theory. In the past few years, classical machine learning practitioners realized that models tend to perform better when constrained to respect the underlying symmetries of the data. This has led to the blossoming field of geometric deep learning [1–5], where symmetries are incorporated as geometric priors into the learning architectures, improving trainability and generalization performance [6–13].
The tremendous success of geometric deep learning has recently inspired researchers to import these ideas to the realm of quantum machine learning (QML) [14–16]. QML is a new and exciting field at the intersection of classical machine learning and quantum computing. By running routines on quantum hardware, and thus exploiting the exponentially large dimension of the Hilbert space, the hope is that QML algorithms can outperform their classical counterparts when learning from data [17].
louisms2@illinois.edu
The first two authors contributed equally.
cerezo@lanl.gov
The infusion of ideas from geometric deep learning into QML has been termed "geometric quantum machine learning" (GQML) [18–24]. GQML leverages the machinery of group and representation theory [25] to build quantum architectures that encode symmetry information about the problem at hand. For instance, when the model is parametrized through a quantum neural network (QNN) [16,26–28], GQML indicates that the layers of the QNN should be equivariant under the action of the symmetry group associated with the dataset. That is, applying a symmetry transformation to the input of the QNN layers should be equivalent to applying it to their output.
One of the main goals of GQML is to create architectures that solve, or at least significantly mitigate, some of the known issues of standard symmetry-non-preserving QML models [16]. For instance, it has been shown that the optimization landscapes of generic QNNs can exhibit a large number of local minima [29–32], or be prone to the barren plateau phenomenon [33–45], whereby the loss function gradients vanish exponentially with the problem size. Crucially, it is known that barren plateaus and excessive local minima are connected to the expressibility [30,32,37,43,46] of the QNN, so that problem-agnostic architectures are more likely to exhibit trainability issues. In this sense, it is expected that following the GQML program of baking symmetry directly into the algorithm will lead to models with sharp inductive biases that suitably limit their expressibility and search space.
In this work we leverage the GQML toolbox to create models that are permutation invariant, i.e., models whose outputs remain invariant under the action of the symmetric group $S_n$ (see Fig. 1). We focus on this particular symmetry because learning problems with permutation symmetries abound. Examples include learning over sets of elements [47,48], modeling relations between pairs (graphs) [49–54] or multiplets (hypergraphs) of entities [55–57], problems defined on grids (such as condensed matter systems) [58–61], molecular systems [62–64], evaluating genuine multipartite entanglement [65–68], or working with distributed quantum sensors [69–71].
Our first contribution is to provide guidelines to build unitary $S_n$-equivariant QNNs. We then derive rigorous theoretical guarantees for these architectures in terms of their trainability and generalization capabilities. Specifically, we prove that $S_n$-equivariant QNNs do not lead to barren plateaus, can be overparametrized with polynomially deep circuits, and generalize well with only a polynomial number of training points. We also identify problems (i.e., datasets) for which the model is trainable, as well as datasets that lead to untrainability. All these appealing properties are also demonstrated in numerical simulations of a graph classification task. Our empirical results verify our theoretical ones, and even show that the performance of $S_n$-equivariant QNNs can, in practice, be better than that guaranteed by our theorems.
RESULTS
Preliminaries
While the formalism of GQML can be readily applied to a wide range of tasks with $S_n$ symmetry, here we will focus on supervised learning problems. We note, however, that our results can be readily extended to more general scenarios such as unsupervised learning [72,73], reinforcement learning [74,75], generative modeling [76–79], or to the more task-oriented computational paradigm of variational quantum algorithms [63,80].

Generally, a supervised quantum machine learning task can be phrased in terms of a data space $\mathcal{R}$ (a set of quantum states on some Hilbert space $\mathcal{H}$) and a real-valued label space $\mathcal{Y}$. We will assume $\mathcal{H}$ to be a tensor product of $n$ two-dimensional subsystems (qubits), and thus of dimension $d = 2^n$. We are given repeated access to a training dataset $S = \{(\rho_i, y_i)\}_{i=1}^{M}$, where $\rho_i$ is sampled from $\mathcal{R}$ according to some probability distribution $P$, and where $y_i \in \mathcal{Y}$. We further assume that the labels are assigned by some underlying (but unknown) function $f: \mathcal{R} \to \mathcal{Y}$, that is, $y_i = f(\rho_i)$. We make no assumptions regarding the origins of the $\rho_i$, meaning that these can correspond to classical data embedded in quantum states [81,82], or to quantum data obtained from some quantum mechanical process [60,61,83].
Figure 1. GQML embeds geometric priors into a QML model. Incorporating prior knowledge through $S_n$-equivariance heavily restricts the search space of the model. We show that such inductive biases lead to models that do not exhibit barren plateaus, can be efficiently overparametrized, and require small amounts of data to generalize well.
The goal is to produce a parametrized function $h_{\boldsymbol{\theta}}: \mathcal{R} \to \mathcal{Y}$ closely modeling the outputs of the unknown target $f$, where $\boldsymbol{\theta}$ are trainable parameters. That is, we want $h_{\boldsymbol{\theta}}$ to accurately predict labels for the data in the training set $S$ (low training error), as well as to predict the labels for new and previously unseen states (small generalization error). We will focus on QML models that are parametrized through a QNN, a unitary channel $\mathcal{U}_{\boldsymbol{\theta}}: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ such that $\mathcal{U}_{\boldsymbol{\theta}}(\rho) = U(\boldsymbol{\theta})\,\rho\, U^{\dagger}(\boldsymbol{\theta})$. Here, $\mathcal{B}(\mathcal{H})$ denotes the space of bounded linear operators on $\mathcal{H}$. Throughout this work we will restrict to $L$-layered QNNs

$$\mathcal{U}_{\boldsymbol{\theta}} = \mathcal{U}^{L}_{\theta_L} \circ \cdots \circ \mathcal{U}^{1}_{\theta_1}\,, \quad \text{where} \quad \mathcal{U}^{l}_{\theta_l}(\rho) = e^{-i \theta_l H_l}\, \rho\, e^{i \theta_l H_l}\,, \tag{1}$$

for some Hermitian generators $\{H_l\}$, so that $U(\boldsymbol{\theta}) = \prod_{l=1}^{L} e^{-i \theta_l H_l}$. Moreover, we consider models that depend on a loss function of the form

$$\ell_{\boldsymbol{\theta}}(\rho_i) = \operatorname{Tr}[\mathcal{U}_{\boldsymbol{\theta}}(\rho_i)\, O]\,, \tag{2}$$

where $O$ is a Hermitian observable. We quantify the training error via the so-called empirical loss, or training error, which is defined as

$$\widehat{\mathcal{L}}(\boldsymbol{\theta}) = \sum_{i=1}^{M} c_i\, \ell_{\boldsymbol{\theta}}(\rho_i)\,. \tag{3}$$
The model is trained by solving the optimization task $\arg\min_{\boldsymbol{\theta}} \widehat{\mathcal{L}}(\boldsymbol{\theta})$ [63]. Once a desired convergence in the optimization is achieved, the optimal parameters, along with the loss function $\ell_{\boldsymbol{\theta}}$, are used to predict labels. For the case of binary classification, where $\mathcal{Y} = \{+1, -1\}$, one can choose $c_i := -\frac{y_i}{M}$. Then, if the measurement operator is normalized such that $\ell_{\boldsymbol{\theta}}(\rho_i) \in [-1, 1]$, this corresponds to the hinge loss, a standard (but not the only relevant [84]) loss function in machine learning.
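To make Eqs. (1)–(3) concrete, here is a minimal NumPy/SciPy sketch of ours (not the authors' code; all function names are illustrative) that builds $U(\boldsymbol{\theta})$ from a list of Hermitian generators, evaluates $\ell_{\boldsymbol{\theta}}(\rho) = \operatorname{Tr}[U(\boldsymbol{\theta})\rho U^{\dagger}(\boldsymbol{\theta}) O]$, and assembles the empirical loss with the hinge-type weights $c_i = -y_i/M$:

```python
# Illustrative sketch only; assumes one Hermitian generator per layer.
import numpy as np
from scipy.linalg import expm

def qnn_unitary(thetas, generators):
    """U(theta) = prod_l exp(-i theta_l H_l), as in Eq. (1)."""
    dim = generators[0].shape[0]
    U = np.eye(dim, dtype=complex)
    for theta, H in zip(thetas, generators):
        U = expm(-1j * theta * H) @ U  # layer l acts after layers 1..l-1
    return U

def loss_value(thetas, rho, O, generators):
    """ell_theta(rho) = Tr[U rho U^dagger O], as in Eq. (2)."""
    U = qnn_unitary(thetas, generators)
    return np.real(np.trace(U @ rho @ U.conj().T @ O))

def empirical_loss(thetas, data, O, generators):
    """hat L(theta) = sum_i c_i ell_theta(rho_i), Eq. (3), with c_i = -y_i/M."""
    M = len(data)  # data is a list of (rho_i, y_i) pairs
    return -sum(y * loss_value(thetas, rho, O, generators)
                for rho, y in data) / M
```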
We further remark that while Eq. (3) approximates the error of the learned model, the true loss is defined as

$$\mathcal{L}(\boldsymbol{\theta}) = \mathbb{E}_{\rho \sim P}\!\left[ c(y)\, \ell_{\boldsymbol{\theta}}(\rho) \right]. \tag{4}$$

Here we have denoted the weights as $c(y)$ to make their dependency on the labels $y$ explicit. The difference between the true loss and the empirical one, known as the generalization error, is given by

$$\operatorname{gen}(\boldsymbol{\theta}) = |\mathcal{L}(\boldsymbol{\theta}) - \widehat{\mathcal{L}}(\boldsymbol{\theta})|\,. \tag{5}$$
We now turn to GQML, where the first step is identifying the underlying symmetries of the dataset, as this allows us to create suitable inductive biases for $h_{\boldsymbol{\theta}}$. In particular, many problems of interest exhibit so-called label symmetry, i.e., the function $f$ produces labels that remain invariant under a set of operations on the inputs. Concretely, one can verify that such a set of operations forms a group [18], which leads to the following definition.

Definition 1 (Label symmetries and $G$-invariance). Given a compact group $G$ and some unitary representation $R$ acting on quantum states $\rho$, we say $f$ has a label symmetry if it is $G$-invariant, i.e., if

$$f(R(g)\,\rho\, R^{\dagger}(g)) = f(\rho)\,, \quad \forall g \in G\,. \tag{6}$$

Here, we recall that a representation is a mapping of a group into the space of invertible linear operators on some vector space (in this case the space of quantum states) that preserves the structure of the group [25]. Also, we note that some problems may have functions $f$ whose outputs change (rather than being invariant) in a way entirely determined by the action of $G$ on their inputs. While still captured by general GQML theory, these do not pertain to Definition 1 and are not discussed further. Label invariance captures the scenario where the relevant information in $\rho$ is unchanged under the action of $G$.
Evidently, when searching for models $h_{\boldsymbol{\theta}}$ that accurately predict outputs of $f$, it is natural to restrict our search to the space of models that respect the label symmetries of $f$. In this context, the theory of GQML provides a constructive approach to create $G$-invariant models, resting on the concept of equivariance [23].
Figure 2. Quantum circuit for an $S_n$-equivariant QNN. Each layer of the QNN is obtained by exponentiation of a generator from the set $\mathcal{G}$ in Eq. (9). Here we show a circuit with $L = 3$ layers acting on $n = 4$ qubits. Single-qubit blocks indicate a rotation about the $x$ or $y$ axis, while two-qubit blocks denote entangling gates generated by a $ZZ$ interaction. All colored gates between dashed horizontal lines share the same trainable parameter $\theta_l$.
Definition 2 (Equivariance). We say that an observable $O$ is $G$-equivariant iff for all elements $g \in G$, $[O, R(g)] = 0$. We say that a layer $\mathcal{U}^{l}_{\theta_l}$ of a QNN is $G$-equivariant iff it is generated by a $G$-equivariant Hermitian operator.

By the previous definition, $G$-equivariant layers are maps that commute with the action of the group:

$$\mathcal{U}^{l}_{\theta_l}(R(g)\,\rho\, R^{\dagger}(g)) = R(g)\, \mathcal{U}^{l}_{\theta_l}(\rho)\, R^{\dagger}(g)\,. \tag{7}$$

Definition 2 can be naturally extended to QNNs.

Definition 3 (Equivariant QNN). We say that an $L$-layered QNN is $G$-equivariant iff each of its layers is $G$-equivariant.
Altogether, equivariant QNNs and measurement operators provide a recipe to design invariant models, i.e., models that respect the label symmetries. Akin to their classical machine learning counterparts [1–5], such GQML models consist of a composition of many equivariant operations (realized by the $L$ layers of the equivariant QNN) and an invariant one (realized by the measurement of the equivariant observable) [23]. Furthermore, model invariance extends to the loss function itself, as captured by the following Lemma.

Lemma 1 (Invariance from equivariance). A loss function of the form in Eq. (2) is $G$-invariant if it is composed of a $G$-equivariant QNN and measurement.
A proof of this Lemma, along with those of the following Lemmas and Theorems, is presented in Supplementary Methods 2 and 3.
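As a quick numerical illustration of Lemma 1 (a toy sketch of ours, not from the paper), take $n = 2$ and $G = S_2$, where $R$ of the nontrivial element is the SWAP gate: an $S_2$-equivariant layer and an $S_2$-equivariant observable yield a loss value that is unchanged when the input qubits are exchanged.

```python
# Toy check of Lemma 1 for n = 2, G = S_2 (R = SWAP). Illustrative only.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)
SWAP = np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]], dtype=complex)

H = (np.kron(X, I) + np.kron(I, X)) / 2   # S_2-equivariant generator
O = np.kron(Z, Z)                         # S_2-equivariant observable
assert np.allclose(H @ SWAP, SWAP @ H) and np.allclose(O @ SWAP, SWAP @ O)

rng = np.random.default_rng(0)            # a random (non-symmetric) pure state
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
rho = np.outer(psi, psi.conj()); rho /= np.trace(rho)

U = expm(-1j * 0.7 * H)                   # one equivariant layer, theta = 0.7
loss = lambda r: np.real(np.trace(U @ r @ U.conj().T @ O))
assert np.isclose(loss(rho), loss(SWAP @ rho @ SWAP.conj().T))  # G-invariance
```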
$S_n$-Equivariant QNNs and measurements
In the previous section we have described how to build generic $G$-invariant models. We now specialize to the case where $G$ is the symmetric group $S_n$, and where $R$ is the qubit-defining representation of $S_n$, i.e., the one permuting qubits, which for any $\pi \in S_n$ acts as

$$R(\pi) \bigotimes_{i=1}^{n} |\psi_i\rangle = \bigotimes_{i=1}^{n} |\psi_{\pi^{-1}(i)}\rangle\,. \tag{8}$$
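For concreteness, here is a short sketch of ours (illustrative names, not the authors' code) of how $R(\pi)$ in Eq. (8) can be built as a $2^n \times 2^n$ permutation matrix on computational basis states:

```python
# Illustrative construction of the qubit-permuting representation R(pi).
import numpy as np
from itertools import product

def qubit_permutation(pi):
    """R(pi) on n qubits, per Eq. (8). `pi` is a list where pi[j] is the
    position that qubit j is sent to (0-indexed)."""
    n = len(pi)
    R = np.zeros((2**n, 2**n))
    for bits in product((0, 1), repeat=n):
        out = [0] * n
        for j, b in enumerate(bits):
            out[pi[j]] = b                      # qubit j moves to slot pi[j]
        col = int("".join(map(str, bits)), 2)   # input basis index
        row = int("".join(map(str, out)), 2)    # output basis index
        R[row, col] = 1.0
    return R

# Example: the transposition (0 1) on two qubits is just the SWAP gate.
assert np.allclose(qubit_permutation([1, 0]),
                   np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]]))
```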
Following Definitions 2 and 3, the first step towards building $S_n$-equivariant QNNs is defining $S_n$-equivariant generators for each layer. In the Methods section we describe how such operators can be obtained, but here we will restrict our attention to the following set of generators:

$$\mathcal{G} = \left\{ \frac{1}{n} \sum_{j=1}^{n} X_j\,,\;\; \frac{1}{n} \sum_{j=1}^{n} Y_j\,,\;\; \frac{2}{n(n-1)} \sum_{k<j} Z_j Z_k \right\}. \tag{9}$$

Note that there is some freedom in the choice of generators. Any two sums over two distinct single-qubit Pauli operators (the first two generators), plus a sum over pairs of the remaining Pauli operator (the third generator), suffice, and we choose the above set without loss of generality. In Fig. 2 we show an example of an $S_n$-equivariant QNN with $L = 3$ layers acting on $n = 4$ qubits. While the single-qubit rotations generated by $\mathcal{G}$ are readily achievable in most quantum computing platforms, the collective $ZZ$ interactions are best suited to architectures allowing for reconfigurable connectivity [85–87] or platforms that implement mediated all-to-all interactions [88,89]. In fact, such interactions are referred to as one-axis twisting [90] in the context of spin squeezing [91], and form the basis of many quantum sensing protocols.
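A minimal sketch of ours (NumPy only, illustrative names) that constructs the generator set $\mathcal{G}$ of Eq. (9) and verifies its $S_n$-equivariance; since adjacent transpositions generate all of $S_n$, it suffices to check commutation with every adjacent SWAP:

```python
# Illustrative sketch: build the generators of Eq. (9) and check equivariance.
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(pauli, sites, n):
    """Tensor `pauli` on every qubit in `sites`, identity elsewhere."""
    return reduce(np.kron, [pauli if q in sites else I2 for q in range(n)])

def sn_generators(n):
    """The set G of Eq. (9)."""
    gx = sum(op_on(X, {j}, n) for j in range(n)) / n
    gy = sum(op_on(Y, {j}, n) for j in range(n)) / n
    gzz = (2 / (n * (n - 1))) * sum(op_on(Z, {j, k}, n)
                                    for j in range(n) for k in range(j))
    return [gx, gy, gzz]

# Equivariance check: each generator commutes with all adjacent SWAPs,
# and adjacent transpositions generate the whole of S_n.
n = 4
SWAP = np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]], dtype=complex)
for H in sn_generators(n):
    for j in range(n - 1):
        Rj = reduce(np.kron, [np.eye(2**j), SWAP, np.eye(2**(n - j - 2))])
        assert np.allclose(H @ Rj, Rj @ H)
```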
In addition, we will consider observables of the following form:

$$\mathcal{M} = \left\{ \frac{1}{n} \sum_{j=1}^{n} \chi_j\,,\;\; \frac{2}{n(n-1)} \sum_{k<j} \chi_j \chi_k\,,\;\; \prod_{j=1}^{n} \chi_j \right\}, \tag{10}$$

where $\chi$ is a (fixed) Pauli matrix. It is straightforward to see that any $H_l \in \mathcal{G}$ and $O \in \mathcal{M}$ will commute with $R(\pi)$ for any $\pi \in S_n$. We note that one could certainly consider other observables as well.
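Continuing in the namespace of the sketch above (reusing `op_on`, `reduce`, `Z`, `SWAP`, and `n`), the measurement set $\mathcal{M}$ of Eq. (10) can be built and checked the same way; again an illustrative sketch, not the authors' code:

```python
# Continues the previous snippet; reuses op_on, reduce, Z, SWAP, n from above.
def sn_observables(chi, n):
    """The set M of Eq. (10) for a fixed single-qubit Pauli `chi`."""
    o1 = sum(op_on(chi, {j}, n) for j in range(n)) / n
    o2 = (2 / (n * (n - 1))) * sum(op_on(chi, {j, k}, n)
                                   for j in range(n) for k in range(j))
    o3 = op_on(chi, set(range(n)), n)   # the n-body product chi^(tensor n)
    return [o1, o2, o3]

# Each observable commutes with all adjacent SWAPs, hence with every R(pi).
for O in sn_observables(Z, n):
    for j in range(n - 1):
        Rj = reduce(np.kron, [np.eye(2**j), SWAP, np.eye(2**(n - j - 2))])
        assert np.allclose(O @ Rj, Rj @ O)
```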
We now leverage tools from representation theory to understand and unravel the underlying structure of $S_n$-equivariant QNNs and measurement operators. This will allow us to derive, in the next section, theoretical guarantees for these GQML models.
One of the most notable results from representation theory is that a given finite-dimensional representation of a group decomposes into an orthogonal direct sum of fundamental building blocks known as irreducible representations (irreps). As further explained in the Methods, the qubit-defining representation takes, under some appropriate global change of basis (which we denote with $\cong$), the block-diagonal form

$$R(\pi \in S_n) \cong \bigoplus_{\lambda} \bigoplus_{\mu=1}^{d_\lambda} r_\lambda(\pi) = \bigoplus_{\lambda} r_\lambda(\pi) \otimes \mathbb{1}_{d_\lambda}\,. \tag{11}$$

Here $\lambda$ labels the irreps of $S_n$ and $r_\lambda$ is the corresponding irrep itself, which appears $d_\lambda$ times. The collection of these repeated irreps is called an isotypic component. Crucially, the only irreps appearing in $R$ correspond to two-row Young diagrams (see Methods) and can be parametrized by a single non-negative integer $m$, as $\lambda \equiv \lambda(m) = (n - m, m)$, where $m = 0, 1, \ldots, \lfloor n/2 \rfloor$. It can be shown that

$$d_\lambda = n - 2m + 1\,, \quad \text{and} \quad m_\lambda = \frac{n!\,(n - 2m + 1)!}{(n - m + 1)!\; m!\;(n - 2m)!}\,, \tag{12}$$

where again $d_\lambda$ is the number of times the irrep appears and $m_\lambda$ is the dimension of the irrep itself. Note that every $d_\lambda$ is in $\mathcal{O}(n)$, whereas some $m_\lambda$ can grow exponentially with the number of qubits. For instance, if $n$ is even and $m = n/2$, one finds that $m_\lambda \in \Omega(2^n/n^2)$. We finally note that Eq. (11) implies $\sum_\lambda m_\lambda d_\lambda = 2^n$.
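The multiplicities and dimensions of Eq. (12) are easy to tabulate; a small sketch of ours (illustrative names), including the sanity check $\sum_\lambda m_\lambda d_\lambda = 2^n$ implied by Eq. (11):

```python
# Illustrative tabulation of d_lambda and m_lambda from Eq. (12).
from math import factorial

def irrep_data(n):
    """(d_lambda, m_lambda) for lambda = (n - m, m), m = 0..floor(n/2)."""
    out = []
    for m in range(n // 2 + 1):
        d = n - 2 * m + 1                      # multiplicity of the irrep
        dim = (factorial(n) * factorial(n - 2*m + 1)
               // (factorial(n - m + 1) * factorial(m) * factorial(n - 2*m)))
        out.append((d, dim))
    return out

# The isotypic blocks tile the full Hilbert space: sum_lambda m_lambda d_lambda = 2^n.
for n in range(1, 15):
    assert sum(d * dim for d, dim in irrep_data(n)) == 2 ** n
```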
Given the block-diagonal structure of $R$, $S_n$-equivariant unitaries and measurements must necessarily take the form

$$U(\boldsymbol{\theta}) \cong \bigoplus_{\lambda} \mathbb{1}_{m_\lambda} \otimes U_\lambda(\boldsymbol{\theta})\,, \quad \text{and} \quad O \cong \bigoplus_{\lambda} \mathbb{1}_{m_\lambda} \otimes O_\lambda\,. \tag{13}$$

That is, both $U(\boldsymbol{\theta})$ and $O$ decompose into a direct sum of $d_\lambda$-dimensional blocks repeated $m_\lambda$ times (with $m_\lambda$ called the multiplicity) on each isotypic component $\lambda$. This decomposition is illustrated in Fig. 3.
Let us highlight several crucial implications of the block-diagonal structure arising from $S_n$-equivariance. First and foremost, we note that, under the action of an $S_n$-equivariant QNN, the Hilbert space decomposes as

$$\mathcal{H} \cong \bigoplus_{\lambda} \bigoplus_{\nu=1}^{m_\lambda} \mathcal{H}^{\nu}_{\lambda}\,, \tag{14}$$

where each $\mathcal{H}^{\nu}_{\lambda}$ denotes a $d_\lambda$-dimensional invariant subspace. Moreover, one can also see that when the QNN acts on an input quantum state as $\mathcal{U}_{\boldsymbol{\theta}}(\rho) = U(\boldsymbol{\theta})\,\rho\, U^{\dagger}(\boldsymbol{\theta})$, it can only access the information in $\rho$ which is contained
Figure 3. Representation theory and $S_n$-equivariance. Using tools from representation theory we find that the $S_n$-equivariant QNN $U(\boldsymbol{\theta})$ and the representation of the group elements $R(\pi)$ (for any $\pi \in S_n$) admit an irrep block decomposition as in Eq. (13) and Eq. (11), respectively. The irreps can be labeled with a single parameter $\lambda = (n - m, m)$, where $m = 0, 1, \ldots, \lfloor n/2 \rfloor$. For a system of $n = 5$ qubits, we show in a) the block-diagonal decomposition for $U(\boldsymbol{\theta})$ and in b) the decomposition of $R(\pi)$ as a representation of $S_5$. The dashed boxes denote the isotypic components labeled by $\lambda$. c) As $n$ increases, $U(\boldsymbol{\theta})$ has a block-diagonal decomposition which contains polynomially large blocks repeated a (potentially) exponential number of times. In contrast, the block decomposition of $R(\pi)$ (for any $\pi \in S_n$) contains blocks that can be exponentially large but that are only repeated a polynomial number of times.
Figure 4. Tetrahedral numbers. a) The tetrahedral numbers $Te_n$ are obtained by counting how many spheres can be stacked in the configuration of a tetrahedron (triangular-base pyramid) of height $n$. b) One can also compute $Te_n$ as the sum of consecutive triangular numbers, which count how many objects (e.g., spheres) can be arranged in an equilateral triangle.
in the invariant subspaces $\mathcal{H}^{\nu}_{\lambda}$ (see also [23]). This means that to solve the learning task, we require two ingredients: i) the data must encode the relevant information required for classification into these subspaces [23,25], and ii) the QNN must be able to accurately process the information within each $\mathcal{H}^{\nu}_{\lambda}$. As discussed in the Methods, we can guarantee that the second condition will not be an issue, as the set of generators in Eq. (9) is universal within each invariant subspace, i.e., the QNN can map any state in $\mathcal{H}^{\nu}_{\lambda}$ to any other state in $\mathcal{H}^{\nu}_{\lambda}$ (see also Ref. [92]).
A second fundamental implication of Eq. (13) is that the manifold of equivariant unitaries is low-dimensional. We make this explicit in the following lemma.

Lemma 2 (Dimension of $S_n$-equivariant unitaries). The submanifold of $S_n$-equivariant unitaries is of dimension equal to the tetrahedral number $Te_{n+1} = \binom{n+3}{3}$ (see Fig. 4), and therefore on the order of $\Theta(n^3)$.

Crucially, Lemma 2 shows that the equivariance constraint limits the degrees of freedom in the QNN (and concomitantly in any observable) from $4^n$ to only polynomially many.
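Lemma 2 can be checked numerically from Eq. (13): each block $U_\lambda$ is a $d_\lambda \times d_\lambda$ unitary contributing $d_\lambda^2$ real parameters, so the manifold dimension is $\sum_\lambda d_\lambda^2$, which indeed matches $\binom{n+3}{3}$. A quick sketch of ours:

```python
# Illustrative check that sum_lambda d_lambda^2 equals Te_{n+1} = binom(n+3, 3).
from math import comb

def equivariant_unitary_dim(n):
    """Sum over lambda of d_lambda^2, with d_lambda = n - 2m + 1, per Eq. (12)."""
    return sum((n - 2 * m + 1) ** 2 for m in range(n // 2 + 1))

for n in range(1, 30):
    assert equivariant_unitary_dim(n) == comb(n + 3, 3)  # Theta(n^3) growth
```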
Absence of barren plateaus in $S_n$-equivariant QNNs

Barren plateaus have been recognized as one of the main challenges to overcome in order to guarantee the success of QML models using QNNs [16]. When a model exhibits a barren plateau, the loss landscape becomes, on average, exponentially flat and featureless as the problem size increases [33–45]. This severely impedes trainability, as one needs to spend an exponentially large amount of resources to correctly estimate a loss-minimizing direction. Issues of barren plateaus arise primarily due to the structure of the models (including the choice of QNN, the input state, and the observables) employed [33–43,45], but can also be caused solely by the effects of noise [44]. In the rest of this section, we will only be concerned with the former type of barren plateaus, which is the most studied.

Recently, a great deal of effort has been put toward creating strategies capable of mitigating the effect of barren plateaus [78,93–105]. While these are promising and have shown moderate success, the 'holy grail' is identifying architectures which are immune to barren plateaus altogether, and thus enjoy trainability guarantees. Examples of such architectures are shallow hardware