Theoretical Guarantees for Permutation-Equivariant Quantum Neural Networks
Louis Schatzki,1,2 Martín Larocca,3,4 Quynh T. Nguyen,3,5 Frédéric Sauvage,3 and M. Cerezo1
1Information Sciences, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
2Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, 61801, USA
3Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
4Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
5Harvard Quantum Initiative, Harvard University, Cambridge, Massachusetts 02138, USA
Despite the great promise of quantum machine learning models, there are several challenges one must overcome before unlocking their full potential. For instance, models based on quantum neural networks (QNNs) can suffer from excessive local minima and barren plateaus in their training landscapes. Recently, the nascent field of geometric quantum machine learning (GQML) has emerged as a potential solution to some of those issues. The key insight of GQML is that one should design architectures, such as equivariant QNNs, encoding the symmetries of the problem at hand. Here, we focus on problems with permutation symmetry (i.e., symmetry group $S_n$), and show how to build $S_n$-equivariant QNNs. We provide an analytical study of their performance, proving that they do not suffer from barren plateaus, quickly reach overparametrization, and generalize well from small amounts of data. To verify our results, we perform numerical simulations for a graph state classification task. Our work provides theoretical guarantees for equivariant QNNs, thus indicating the power and potential of GQML.
INTRODUCTION
Symmetry studies and formalizes the invariance of objects under some set of operations. A wealth of theory has gone into describing symmetries as mathematical entities through the concept of groups and representations. While the analysis of symmetries in nature has greatly improved our understanding of the laws of physics, the study of symmetries in data has just recently gained momentum within the framework of learning theory. In the past few years, classical machine learning practitioners realized that models tend to perform better when constrained to respect the underlying symmetries of the data. This has led to the blossoming field of geometric deep learning [1–5], where symmetries are incorporated as geometric priors into the learning architectures, improving trainability and generalization performance [6–13].
The tremendous success of geometric deep learning has recently inspired researchers to import these ideas to the realm of quantum machine learning (QML) [14–16]. QML is a new and exciting field at the intersection of classical machine learning and quantum computing. By running routines on quantum hardware, and thus exploiting the exponentially large dimension of the Hilbert space, the hope is that QML algorithms can outperform their classical counterparts when learning from data [17].
louisms2@illinois.edu
The first two authors contributed equally.
cerezo@lanl.gov
The infusion of ideas from geometric deep learning into QML has been termed "geometric quantum machine learning" (GQML) [18–24]. GQML leverages the machinery of group and representation theory [25] to build quantum architectures that encode symmetry information about the problem at hand. For instance, when the model is parametrized through a quantum neural network (QNN) [16,26–28], GQML indicates that the layers of the QNN should be equivariant under the action of the symmetry group associated with the dataset. That is, applying a symmetry transformation to the input of the QNN layers should be equivalent to applying it to their output.
One of the main goals of GQML is to create architectures that solve, or at least significantly mitigate, some of the known issues of standard symmetry-non-preserving QML models [16]. For instance, it has been shown that the optimization landscapes of generic QNNs can exhibit a large number of local minima [29–32], or be prone to the barren plateau phenomenon [33–45], whereby the loss function gradients vanish exponentially with the problem size. Crucially, it is known that barren plateaus and excessive local minima are connected to the expressibility [30,32,37,43,46] of the QNN, so that problem-agnostic architectures are more likely to exhibit trainability issues. In this sense, it is expected that following the GQML program of baking symmetry directly into the algorithm will lead to models with sharp inductive biases that suitably limit their expressibility and search space.
In this work we leverage the GQML toolbox to create models that are permutation invariant, i.e., models whose outputs remain invariant under the action of the symmetric group $S_n$ (see Fig. 1). We focus on this particular symmetry because learning problems with permutation symmetries abound. Examples include learning over sets of elements [47,48], modeling relations between pairs (graphs) [49–54] or multiplets (hypergraphs) of entities [55–57], problems defined on grids (such as condensed matter systems) [58–61], molecular systems [62–64], evaluating genuine multipartite entanglement [65–68], or working with distributed quantum sensors [69–71].
Our first contribution is to provide guidelines to build unitary $S_n$-equivariant QNNs. We then derive rigorous theoretical guarantees for these architectures in terms of their trainability and generalization capabilities. Specifically, we prove that $S_n$-equivariant QNNs do not lead to barren plateaus, can be overparametrized with polynomially deep circuits, and generalize well with only a polynomial number of training points. We also identify problems (i.e., datasets) for which the model is trainable, as well as datasets that lead to untrainability. All these appealing properties are also demonstrated in numerical simulations of a graph classification task. Our empirical results verify our theoretical ones, and even show that the performance of $S_n$-equivariant QNNs can, in practice, be better than that guaranteed by our theorems.
RESULTS
Preliminaries
While the formalism of GQML can be readily applied to a wide range of tasks with $S_n$ symmetry, here we will focus on supervised learning problems. We note, however, that our results can be readily extended to more general scenarios such as unsupervised learning [72,73], reinforcement learning [74,75], generative modeling [76–79], or to the more task-oriented computational paradigm of variational quantum algorithms [63,80].

Generally, a supervised quantum machine learning task can be phrased in terms of a data space $\mathcal{R}$ (a set of quantum states on some Hilbert space $\mathcal{H}$) and a real-valued label space $\mathcal{Y}$. We will assume $\mathcal{H}$ to be a tensor product of $n$ two-dimensional subsystems (qubits), and thus of dimension $d = 2^n$. We are given repeated access to a training dataset $S = \{(\rho_i, y_i)\}_{i=1}^{M}$, where $\rho_i$ is sampled from $\mathcal{R}$ according to some probability distribution $P$, and where $y_i \in \mathcal{Y}$. We further assume that the labels are assigned by some underlying (but unknown) function $f: \mathcal{R} \to \mathcal{Y}$, that is, $y_i = f(\rho_i)$. We make no assumptions regarding the origins of the $\rho_i$, meaning that these can correspond to classical data embedded in quantum states [81,82], or to quantum data obtained from some quantum mechanical process [60,61,83].
Figure 1. GQML embeds geometric priors into a QML model. Incorporating prior knowledge through $S_n$-equivariance heavily restricts the search space of the model. We show that such inductive biases lead to models that do not exhibit barren plateaus, can be efficiently overparametrized, and require small amounts of data to generalize well.
The goal is to produce a parametrized function $h_{\boldsymbol{\theta}}: \mathcal{R} \to \mathcal{Y}$ closely modeling the outputs of the unknown target $f$, where $\boldsymbol{\theta}$ are trainable parameters. That is, we want $h_{\boldsymbol{\theta}}$ to accurately predict labels for the data in the training set $S$ (low training error), as well as to predict the labels for new and previously unseen states (small generalization error). We will focus on QML models that are parametrized through a QNN, a unitary channel $\mathcal{U}_{\boldsymbol{\theta}}: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ such that $\mathcal{U}_{\boldsymbol{\theta}}(\rho) = U(\boldsymbol{\theta})\,\rho\, U^{\dagger}(\boldsymbol{\theta})$. Here, $\mathcal{B}(\mathcal{H})$ denotes the space of bounded linear operators on $\mathcal{H}$. Throughout this work we will restrict to $L$-layered QNNs

$$\mathcal{U}_{\boldsymbol{\theta}} = \mathcal{U}^{L}_{\theta_L} \circ \cdots \circ \mathcal{U}^{1}_{\theta_1}\,, \quad \text{where} \quad \mathcal{U}^{l}_{\theta_l}(\rho) = e^{-i \theta_l H_l}\, \rho\, e^{i \theta_l H_l}\,, \tag{1}$$

for some Hermitian generators $\{H_l\}$, so that $U(\boldsymbol{\theta}) = \prod_{l=1}^{L} e^{-i \theta_l H_l}$. Moreover, we consider models that depend on a loss function of the form

$$\ell_{\boldsymbol{\theta}}(\rho_i) = \operatorname{Tr}[\mathcal{U}_{\boldsymbol{\theta}}(\rho_i)\, O]\,, \tag{2}$$

where $O$ is a Hermitian observable. We quantify the training error via the so-called empirical loss, or training error, which is defined as

$$\widehat{\mathcal{L}}(\boldsymbol{\theta}) = \sum_{i=1}^{M} c_i\, \ell_{\boldsymbol{\theta}}(\rho_i)\,. \tag{3}$$
The model is trained by solving the optimization task $\arg\min_{\boldsymbol{\theta}} \widehat{\mathcal{L}}(\boldsymbol{\theta})$ [63]. Once a desired convergence in the optimization is achieved, the optimal parameters, along with the loss function $\ell_{\boldsymbol{\theta}}$, are used to predict labels. For the case of binary classification, where $\mathcal{Y} = \{+1, -1\}$, one can choose $c_i := -\frac{y_i}{M}$. Then, if the measurement operator is normalized such that $\ell_{\boldsymbol{\theta}}(\rho_i) \in [-1, 1]$, this corresponds to the hinge loss, a standard (but not the only relevant [84]) loss function in machine learning.
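To make Eqs. (1)–(3) concrete, here is a minimal NumPy/SciPy sketch of ours (not the authors' code; all function names are illustrative) that builds $U(\boldsymbol{\theta})$ from a list of Hermitian generators, evaluates $\ell_{\boldsymbol{\theta}}(\rho) = \operatorname{Tr}[U(\boldsymbol{\theta})\rho U^{\dagger}(\boldsymbol{\theta}) O]$, and assembles the empirical loss with the hinge-type weights $c_i = -y_i/M$:

```python
# Illustrative sketch only; assumes one Hermitian generator per layer.
import numpy as np
from scipy.linalg import expm

def qnn_unitary(thetas, generators):
    """U(theta) = prod_l exp(-i theta_l H_l), as in Eq. (1)."""
    dim = generators[0].shape[0]
    U = np.eye(dim, dtype=complex)
    for theta, H in zip(thetas, generators):
        U = expm(-1j * theta * H) @ U  # layer l acts after layers 1..l-1
    return U

def loss_value(thetas, rho, O, generators):
    """ell_theta(rho) = Tr[U rho U^dagger O], as in Eq. (2)."""
    U = qnn_unitary(thetas, generators)
    return np.real(np.trace(U @ rho @ U.conj().T @ O))

def empirical_loss(thetas, data, O, generators):
    """hat L(theta) = sum_i c_i ell_theta(rho_i), Eq. (3), with c_i = -y_i/M."""
    M = len(data)  # data is a list of (rho_i, y_i) pairs
    return -sum(y * loss_value(thetas, rho, O, generators)
                for rho, y in data) / M
```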
We further remark that while Eq. (3) approximates the error of the learned model, the true loss is defined as

$$\mathcal{L}(\boldsymbol{\theta}) = \mathbb{E}_{\rho \sim P}\!\left[ c(y)\, \ell_{\boldsymbol{\theta}}(\rho) \right]. \tag{4}$$

Here we have denoted the weights as $c(y)$ to make their dependency on the labels $y$ explicit. The difference between the true loss and the empirical one, known as the generalization error, is given by

$$\operatorname{gen}(\boldsymbol{\theta}) = |\mathcal{L}(\boldsymbol{\theta}) - \widehat{\mathcal{L}}(\boldsymbol{\theta})|\,. \tag{5}$$
We now turn to GQML, where the first step is identifying the underlying symmetries of the dataset, as this allows us to create suitable inductive biases for $h_{\boldsymbol{\theta}}$. In particular, many problems of interest exhibit so-called label symmetry, i.e., the function $f$ produces labels that remain invariant under a set of operations on the inputs. Concretely, one can verify that such a set of operations forms a group [18], which leads to the following definition.

Definition 1 (Label symmetries and $G$-invariance). Given a compact group $G$ and some unitary representation $R$ acting on quantum states $\rho$, we say $f$ has a label symmetry if it is $G$-invariant, i.e., if

$$f(R(g)\,\rho\, R^{\dagger}(g)) = f(\rho)\,, \quad \forall g \in G\,. \tag{6}$$

Here, we recall that a representation is a mapping of a group into the space of invertible linear operators on some vector space (in this case the space of quantum states) that preserves the structure of the group [25]. Also, we note that some problems may have functions $f$ whose outputs change (rather than being invariant) in a way entirely determined by the action of $G$ on their inputs. While still captured by general GQML theory, these do not pertain to Definition 1 and are not discussed further. Label invariance captures the scenario where the relevant information in $\rho$ is unchanged under the action of $G$.
Evidently, when searching for models $h_{\boldsymbol{\theta}}$ that accurately predict outputs of $f$, it is natural to restrict our search to the space of models that respect the label symmetries of $f$. In this context, the theory of GQML provides a constructive approach to create $G$-invariant models, resting on the concept of equivariance [23].
Figure 2. Quantum circuit for an $S_n$-equivariant QNN. Each layer of the QNN is obtained by exponentiation of a generator from the set $\mathcal{G}$ in Eq. (9). Here we show a circuit with $L = 3$ layers acting on $n = 4$ qubits. Single-qubit blocks indicate a rotation about the $x$ or $y$ axis, while two-qubit blocks denote entangling gates generated by a $ZZ$ interaction. All colored gates between dashed horizontal lines share the same trainable parameter $\theta_l$.
Definition 2 (Equivariance). We say that an observable $O$ is $G$-equivariant iff for all elements $g \in G$, $[O, R(g)] = 0$. We say that a layer $\mathcal{U}^{l}_{\theta_l}$ of a QNN is $G$-equivariant iff it is generated by a $G$-equivariant Hermitian operator.

By the previous definition, $G$-equivariant layers are maps that commute with the action of the group:

$$\mathcal{U}^{l}_{\theta_l}(R(g)\,\rho\, R^{\dagger}(g)) = R(g)\, \mathcal{U}^{l}_{\theta_l}(\rho)\, R^{\dagger}(g)\,. \tag{7}$$

Definition 2 can be naturally extended to QNNs.

Definition 3 (Equivariant QNN). We say that an $L$-layered QNN is $G$-equivariant iff each of its layers is $G$-equivariant.
Altogether, equivariant QNNs and measurement operators provide a recipe to design invariant models, i.e., models that respect the label symmetries. Akin to their classical machine learning counterparts [1–5], such GQML models consist of a composition of many equivariant operations (realized by the $L$ layers of the equivariant QNN) and an invariant one (realized by the measurement of the equivariant observable) [23]. Furthermore, model invariance extends to the loss function itself, as captured by the following Lemma.

Lemma 1 (Invariance from equivariance). A loss function of the form in Eq. (2) is $G$-invariant if it is composed of a $G$-equivariant QNN and measurement.
A proof of this Lemma, along with those of the following Lemmas and Theorems, is presented in Supplementary Methods 2 and 3.
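As a quick numerical illustration of Lemma 1 (a toy sketch of ours, not from the paper), take $n = 2$ and $G = S_2$, where $R$ of the nontrivial element is the SWAP gate: an $S_2$-equivariant layer and an $S_2$-equivariant observable yield a loss value that is unchanged when the input qubits are exchanged.

```python
# Toy check of Lemma 1 for n = 2, G = S_2 (R = SWAP). Illustrative only.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)
SWAP = np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]], dtype=complex)

H = (np.kron(X, I) + np.kron(I, X)) / 2   # S_2-equivariant generator
O = np.kron(Z, Z)                         # S_2-equivariant observable
assert np.allclose(H @ SWAP, SWAP @ H) and np.allclose(O @ SWAP, SWAP @ O)

rng = np.random.default_rng(0)            # a random (non-symmetric) pure state
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
rho = np.outer(psi, psi.conj()); rho /= np.trace(rho)

U = expm(-1j * 0.7 * H)                   # one equivariant layer, theta = 0.7
loss = lambda r: np.real(np.trace(U @ r @ U.conj().T @ O))
assert np.isclose(loss(rho), loss(SWAP @ rho @ SWAP.conj().T))  # G-invariance
```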
$S_n$-Equivariant QNNs and measurements
In the previous section we have described how to build generic $G$-invariant models. We now specialize to the case where $G$ is the symmetric group $S_n$, and where $R$ is the qubit-defining representation of $S_n$, i.e., the one permuting qubits, which for any $\pi \in S_n$ acts as

$$R(\pi) \bigotimes_{i=1}^{n} |\psi_i\rangle = \bigotimes_{i=1}^{n} |\psi_{\pi^{-1}(i)}\rangle\,. \tag{8}$$
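For concreteness, here is a short sketch of ours (illustrative names, not the authors' code) of how $R(\pi)$ in Eq. (8) can be built as a $2^n \times 2^n$ permutation matrix on computational basis states:

```python
# Illustrative construction of the qubit-permuting representation R(pi).
import numpy as np
from itertools import product

def qubit_permutation(pi):
    """R(pi) on n qubits, per Eq. (8). `pi` is a list where pi[j] is the
    position that qubit j is sent to (0-indexed)."""
    n = len(pi)
    R = np.zeros((2**n, 2**n))
    for bits in product((0, 1), repeat=n):
        out = [0] * n
        for j, b in enumerate(bits):
            out[pi[j]] = b                      # qubit j moves to slot pi[j]
        col = int("".join(map(str, bits)), 2)   # input basis index
        row = int("".join(map(str, out)), 2)    # output basis index
        R[row, col] = 1.0
    return R

# Example: the transposition (0 1) on two qubits is just the SWAP gate.
assert np.allclose(qubit_permutation([1, 0]),
                   np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]]))
```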
Following Definitions 2 and 3, the first step towards building $S_n$-equivariant QNNs is defining $S_n$-equivariant generators for each layer. In the Methods section we describe how such operators can be obtained, but here we will restrict our attention to the following set of generators:

$$\mathcal{G} = \left\{ \frac{1}{n} \sum_{j=1}^{n} X_j\,,\;\; \frac{1}{n} \sum_{j=1}^{n} Y_j\,,\;\; \frac{2}{n(n-1)} \sum_{k<j} Z_j Z_k \right\}. \tag{9}$$

Note that there is some freedom in the choice of generators. Any two sums over two distinct single-qubit Pauli operators (the first two generators), plus a sum over pairs of the remaining Pauli operator (the third generator), suffice, and we choose the above set without loss of generality. In Fig. 2 we show an example of an $S_n$-equivariant QNN with $L = 3$ layers acting on $n = 4$ qubits. While the single-qubit rotations generated by $\mathcal{G}$ are readily achievable in most quantum computing platforms, the collective $ZZ$ interactions are best suited to architectures allowing for reconfigurable connectivity [85–87] or platforms that implement mediated all-to-all interactions [88,89]. In fact, such interactions are referred to as one-axis twisting [90] in the context of spin squeezing [91], and form the basis of many quantum sensing protocols.
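A minimal sketch of ours (NumPy only, illustrative names) that constructs the generator set $\mathcal{G}$ of Eq. (9) and verifies its $S_n$-equivariance; since adjacent transpositions generate all of $S_n$, it suffices to check commutation with every adjacent SWAP:

```python
# Illustrative sketch: build the generators of Eq. (9) and check equivariance.
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(pauli, sites, n):
    """Tensor `pauli` on every qubit in `sites`, identity elsewhere."""
    return reduce(np.kron, [pauli if q in sites else I2 for q in range(n)])

def sn_generators(n):
    """The set G of Eq. (9)."""
    gx = sum(op_on(X, {j}, n) for j in range(n)) / n
    gy = sum(op_on(Y, {j}, n) for j in range(n)) / n
    gzz = (2 / (n * (n - 1))) * sum(op_on(Z, {j, k}, n)
                                    for j in range(n) for k in range(j))
    return [gx, gy, gzz]

# Equivariance check: each generator commutes with all adjacent SWAPs,
# and adjacent transpositions generate the whole of S_n.
n = 4
SWAP = np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]], dtype=complex)
for H in sn_generators(n):
    for j in range(n - 1):
        Rj = reduce(np.kron, [np.eye(2**j), SWAP, np.eye(2**(n - j - 2))])
        assert np.allclose(H @ Rj, Rj @ H)
```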
In addition, we will consider observables of the following form:

$$\mathcal{M} = \left\{ \frac{1}{n} \sum_{j=1}^{n} \chi_j\,,\;\; \frac{2}{n(n-1)} \sum_{k<j} \chi_j \chi_k\,,\;\; \prod_{j=1}^{n} \chi_j \right\}, \tag{10}$$

where $\chi$ is a (fixed) Pauli matrix. It is straightforward to see that any $H_l \in \mathcal{G}$ and $O \in \mathcal{M}$ will commute with $R(\pi)$ for any $\pi \in S_n$. We note that one could certainly consider other observables as well.
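Continuing in the namespace of the sketch above (reusing `op_on`, `reduce`, `Z`, `SWAP`, and `n`), the measurement set $\mathcal{M}$ of Eq. (10) can be built and checked the same way; again an illustrative sketch, not the authors' code:

```python
# Continues the previous snippet; reuses op_on, reduce, Z, SWAP, n from above.
def sn_observables(chi, n):
    """The set M of Eq. (10) for a fixed single-qubit Pauli `chi`."""
    o1 = sum(op_on(chi, {j}, n) for j in range(n)) / n
    o2 = (2 / (n * (n - 1))) * sum(op_on(chi, {j, k}, n)
                                   for j in range(n) for k in range(j))
    o3 = op_on(chi, set(range(n)), n)   # the n-body product chi^(tensor n)
    return [o1, o2, o3]

# Each observable commutes with all adjacent SWAPs, hence with every R(pi).
for O in sn_observables(Z, n):
    for j in range(n - 1):
        Rj = reduce(np.kron, [np.eye(2**j), SWAP, np.eye(2**(n - j - 2))])
        assert np.allclose(O @ Rj, Rj @ O)
```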
We now leverage tools from representation theory to understand and unravel the underlying structure of $S_n$-equivariant QNNs and measurement operators. This will allow us to derive, in the next section, theoretical guarantees for these GQML models.
One of the most notable results from representation theory is that a given finite-dimensional representation of a group decomposes into an orthogonal direct sum of fundamental building blocks known as irreducible representations (irreps). As further explained in the Methods, the qubit-defining representation takes, under some appropriate global change of basis (which we denote with $\cong$), the block-diagonal form

$$R(\pi \in S_n) \cong \bigoplus_{\lambda} \bigoplus_{\mu=1}^{d_\lambda} r_\lambda(\pi) = \bigoplus_{\lambda} r_\lambda(\pi) \otimes \mathbb{1}_{d_\lambda}\,. \tag{11}$$

Here $\lambda$ labels the irreps of $S_n$ and $r_\lambda$ is the corresponding irrep itself, which appears $d_\lambda$ times. The collection of these repeated irreps is called an isotypic component. Crucially, the only irreps appearing in $R$ correspond to two-row Young diagrams (see Methods) and can be parametrized by a single non-negative integer $m$, as $\lambda \equiv \lambda(m) = (n - m, m)$, where $m = 0, 1, \ldots, \lfloor n/2 \rfloor$. It can be shown that

$$d_\lambda = n - 2m + 1\,, \quad \text{and} \quad m_\lambda = \frac{n!\,(n - 2m + 1)!}{(n - m + 1)!\; m!\;(n - 2m)!}\,, \tag{12}$$

where again $d_\lambda$ is the number of times the irrep appears and $m_\lambda$ is the dimension of the irrep itself. Note that every $d_\lambda$ is in $\mathcal{O}(n)$, whereas some $m_\lambda$ can grow exponentially with the number of qubits. For instance, if $n$ is even and $m = n/2$, one finds that $m_\lambda \in \Omega(2^n/n^2)$. We finally note that Eq. (11) implies $\sum_\lambda m_\lambda d_\lambda = 2^n$.
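The multiplicities and dimensions of Eq. (12) are easy to tabulate; a small sketch of ours (illustrative names), including the sanity check $\sum_\lambda m_\lambda d_\lambda = 2^n$ implied by Eq. (11):

```python
# Illustrative tabulation of d_lambda and m_lambda from Eq. (12).
from math import factorial

def irrep_data(n):
    """(d_lambda, m_lambda) for lambda = (n - m, m), m = 0..floor(n/2)."""
    out = []
    for m in range(n // 2 + 1):
        d = n - 2 * m + 1                      # multiplicity of the irrep
        dim = (factorial(n) * factorial(n - 2*m + 1)
               // (factorial(n - m + 1) * factorial(m) * factorial(n - 2*m)))
        out.append((d, dim))
    return out

# The isotypic blocks tile the full Hilbert space: sum_lambda m_lambda d_lambda = 2^n.
for n in range(1, 15):
    assert sum(d * dim for d, dim in irrep_data(n)) == 2 ** n
```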
Given the block-diagonal structure of $R$, $S_n$-equivariant unitaries and measurements must necessarily take the form

$$U(\boldsymbol{\theta}) \cong \bigoplus_{\lambda} \mathbb{1}_{m_\lambda} \otimes U_\lambda(\boldsymbol{\theta})\,, \quad \text{and} \quad O \cong \bigoplus_{\lambda} \mathbb{1}_{m_\lambda} \otimes O_\lambda\,. \tag{13}$$

That is, both $U(\boldsymbol{\theta})$ and $O$ decompose into a direct sum of $d_\lambda$-dimensional blocks repeated $m_\lambda$ times (with $m_\lambda$ called the multiplicity) on each isotypic component $\lambda$. This decomposition is illustrated in Fig. 3.
Let us highlight several crucial implications of the block-diagonal structure arising from $S_n$-equivariance. First and foremost, we note that, under the action of an $S_n$-equivariant QNN, the Hilbert space decomposes as

$$\mathcal{H} \cong \bigoplus_{\lambda} \bigoplus_{\nu=1}^{m_\lambda} \mathcal{H}^{\nu}_{\lambda}\,, \tag{14}$$

where each $\mathcal{H}^{\nu}_{\lambda}$ denotes a $d_\lambda$-dimensional invariant subspace. Moreover, one can also see that when the QNN acts on an input quantum state as $\mathcal{U}_{\boldsymbol{\theta}}(\rho) = U(\boldsymbol{\theta})\,\rho\, U^{\dagger}(\boldsymbol{\theta})$, it can only access the information in $\rho$ which is contained
Figure 3. Representation theory and $S_n$-equivariance. Using tools from representation theory we find that the $S_n$-equivariant QNN $U(\boldsymbol{\theta})$ and the representation of the group elements $R(\pi)$ (for any $\pi \in S_n$) admit an irrep block decomposition as in Eq. (13) and Eq. (11), respectively. The irreps can be labeled with a single parameter $\lambda = (n - m, m)$, where $m = 0, 1, \ldots, \lfloor n/2 \rfloor$. For a system of $n = 5$ qubits, we show in a) the block-diagonal decomposition for $U(\boldsymbol{\theta})$ and in b) the decomposition of $R(\pi)$ as a representation of $S_5$. The dashed boxes denote the isotypic components labeled by $\lambda$. c) As $n$ increases, $U(\boldsymbol{\theta})$ has a block-diagonal decomposition which contains polynomially large blocks repeated a (potentially) exponential number of times. In contrast, the block decomposition of $R(\pi)$ (for any $\pi \in S_n$) contains blocks that can be exponentially large but that are only repeated a polynomial number of times.
Figure 4. Tetrahedral numbers. a) The tetrahedral numbers $Te_n$ are obtained by counting how many spheres can be stacked in the configuration of a tetrahedron (triangular-base pyramid) of height $n$. b) One can also compute $Te_n$ as the sum of consecutive triangular numbers, which count how many objects (e.g., spheres) can be arranged in an equilateral triangle.
in the invariant subspaces $\mathcal{H}^{\nu}_{\lambda}$ (see also [23]). This means that to solve the learning task, we require two ingredients: i) the data must encode the relevant information required for classification into these subspaces [23,25], and ii) the QNN must be able to accurately process the information within each $\mathcal{H}^{\nu}_{\lambda}$. As discussed in the Methods, we can guarantee that the second condition will not be an issue, as the set of generators in Eq. (9) is universal within each invariant subspace, i.e., the QNN can map any state in $\mathcal{H}^{\nu}_{\lambda}$ to any other state in $\mathcal{H}^{\nu}_{\lambda}$ (see also Ref. [92]).
A second fundamental implication of Eq. (13) is that the manifold of equivariant unitaries is low-dimensional. We make this explicit in the following lemma.

Lemma 2 (Dimension of $S_n$-equivariant unitaries). The submanifold of $S_n$-equivariant unitaries is of dimension equal to the tetrahedral number $Te_{n+1} = \binom{n+3}{3}$ (see Fig. 4), and therefore on the order of $\Theta(n^3)$.

Crucially, Lemma 2 shows that the equivariance constraint limits the degrees of freedom in the QNN (and concomitantly in any observable) from $4^n$ to only polynomially many.
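Lemma 2 can be checked numerically from Eq. (13): each block $U_\lambda$ is a $d_\lambda \times d_\lambda$ unitary contributing $d_\lambda^2$ real parameters, so the manifold dimension is $\sum_\lambda d_\lambda^2$, which indeed matches $\binom{n+3}{3}$. A quick sketch of ours:

```python
# Illustrative check that sum_lambda d_lambda^2 equals Te_{n+1} = binom(n+3, 3).
from math import comb

def equivariant_unitary_dim(n):
    """Sum over lambda of d_lambda^2, with d_lambda = n - 2m + 1, per Eq. (12)."""
    return sum((n - 2 * m + 1) ** 2 for m in range(n // 2 + 1))

for n in range(1, 30):
    assert equivariant_unitary_dim(n) == comb(n + 3, 3)  # Theta(n^3) growth
```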
Absence of barren plateaus in $S_n$-equivariant QNNs

Barren plateaus have been recognized as one of the main challenges to overcome in order to guarantee the success of QML models using QNNs [16]. When a model exhibits a barren plateau, the loss landscape becomes, on average, exponentially flat and featureless as the problem size increases [33–45]. This severely impedes trainability, as one needs to spend an exponentially large amount of resources to correctly estimate a loss-minimizing direction. Issues of barren plateaus arise primarily due to the structure of the models (including the choice of QNN, the input state, and the observables) employed [33–43,45], but can also be caused solely by the effects of noise [44]. In the rest of this section, we will only be concerned with the former type of barren plateaus, which is the most studied.

Recently, a great deal of effort has been put toward creating strategies capable of mitigating the effect of barren plateaus [78,93–105]. While these are promising and have shown moderate success, the 'holy grail' is identifying architectures which are immune to barren plateaus altogether, and thus enjoy trainability guarantees. Examples of such architectures are shallow hardware