Interpreting convolutional neural networks’ low dimensional
approximation to quantum spin systems
Yilong Ju,1,† Shah Saad Alam,2,† Jonathan Minoff,2
Fabio Anselmi,3,4 Han Pu,2 Ankit Patel1,5
1Department of Computer Science, Rice University, 6100 Main St., Houston, TX 77005, USA
2Department of Physics and Astronomy, Rice University, 6100 Main St., Houston, TX 77005, USA
3Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston
4Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA
†Equal contribution.
To whom correspondence should be addressed; E-mail: hpu@rice.edu, abp4@rice.edu
Convolutional neural networks (CNNs) have been employed along with Variational Monte Carlo methods for finding the ground state of quantum many-body spin systems with great success. In order to do so, however, a CNN with only linearly many variational parameters has to circumvent the “curse of dimensionality” and successfully approximate a wavefunction on an exponentially large Hilbert space. In our work, we provide a theoretical and experimental analysis of how the CNN optimizes learning for spin systems, and investigate the CNN’s low dimensional approximation. We first quantify the role played by physical symmetries of the underlying spin system during training. We incorporate our insights into a new training algorithm and demonstrate its improved efficiency, accuracy and robustness. We then further investigate the CNN’s ability to approximate wavefunctions by looking at the entanglement spectrum captured by the size of the convolutional filter. Our insights reveal the CNN to be an ansatz fundamentally centred around the occurrence statistics of K-motifs of the input strings. We use this motivation to provide the shallow CNN ansatz with a unifying theoretical interpretation in terms of other well-known statistical and physical ansatzes such as the maximum entropy (MaxEnt) and entangled plaquette correlator product states (EP-CPS). Using regression analysis, we find further relationships between the CNN’s approximations of the different motifs’ expectation values. Our results allow us to gain a comprehensive, improved understanding of how CNNs successfully approximate quantum spin Hamiltonians and to use that understanding to improve CNN performance.
Contents

1 Introduction
2 The CNN Ansatz
  2.1 CNN Architecture and Training
  2.2 Symmetries Reduce the Complexity of Ground State Wavefunction
  2.3 Representing and Approximating Symmetries with CNNs
  2.4 Improving CNN Performance by Imposing Symmetry Constraints
arXiv:2210.00692v1 [quant-ph] 3 Oct 2022
3 When does the Ground State Admit Parsimonious Approximations?
4 Statistical & Physical Interpretations of the CNN Ansatz
  4.1 Motivation
  4.2 CNN as a Maximum Entropy (MaxEnt) Ansatz
  4.3 CNN as an Entangled Plaquette Correlator Product State (EP-CPS) Ansatz
  4.4 Numerical Evidence: CNNs Behave like Restricted CPS Approximations
5 Physical Insights from Learned CNNs
6 Discussion
A Representation, Dynamics and Inductive Bias of CNN
  A.1 Derivation of Motif Count Vectors
  A.2 The Motif Count Matrix and Critical Kernel Size
  A.3 The Need for Nonlinear Activation Function in CNN
  A.4 The Grand Sum Condition
  A.5 Algorithms For Improving Training using the Grand Sum Condition
  A.6 Learning Dynamics of CNN
B Equivalence Classes and How to Count Them
C Derivation of MaxEnt Ansatz with Symmetries
D Entanglement Calculation Derivation and Errors
E Regression Analysis
F Hyperparameters and Tuning
  F.1 For Results Shown in Fig. 3
  F.2 For Results Shown in Fig. 5, Table 2 and Table 4
1 Introduction
The central concern of quantum many-body physics is to understand how macroscopic properties emerge from microscopic inter-particle interactions. This is, however, in general an extremely difficult question to answer, largely because the dimension of the quantum Hilbert space grows exponentially as the number of constituent particles increases. Ingenious numerical techniques have been developed to study certain classes of many-body systems. In recent years, techniques inspired by machine learning, specifically neural networks (NNs), have attracted much attention. In particular, Convolutional Neural Networks (CNNs), augmented with quantum Monte Carlo methods, have recently arisen as a potent class of variational ansatzes for numerically solving quantum spin systems with many particles (1–4). CNNs have often provided rapid and quite accurate numerical approximations, comparable to the traditional algorithms of quantum physics. As a result, there has been a flurry of research to improve the performance of these models and to apply them to broader classes of quantum spin systems with different physical constraints. However, the exact approximations and methods used by the CNNs remain a mystery: the CNNs remain, effectively, black boxes. Indeed, this is a general problem for applications involving NNs, one that has prevented us from interpreting the NN’s solution and from extracting useful physical insights about the quantum systems under study. As a result, there is no clear understanding of the full potential of machine learning for quantum research.
In this work, we take a crucial step in filling this gap. Specifically, we aim to give new insights into how even
a simple, one-hidden-layer CNN provides a solution to a quantum spin problem. We show how physical features,
such as symmetries of the quantum spin system, naturally manifest themselves in the final trained network and
during the optimization dynamics. We show the constraints these symmetries place on the variational parameters,
and we use these insights to construct a more efficient, accurate and robust training algorithm for CNNs. To further
understand why the CNN is so adept at sufficiently approximating the system using linearly many parameters, we
interpret the convolutional operation in terms of the degree of quantum entanglement captured by the CNN ansatz.
Next, to interpret the advantages conferred by the mathematical form of the CNN, we provide a mapping of the
CNN to other statistical and physical ansatzes such as Maximum Entropy (MaxEnt) and Correlator Product States
(CPS). We also conduct a novel multivariate regression analysis to uncover which physical features are the most
relevant to the low-dimensional learned solution and which ones the CNN captures correctly. Finally, we discuss
how our approach and new insights can be used to design efficient approximations of complicated quantum spin
systems.
Figure 1: A: The CNN architecture for a system of N sites with M = 2 internal states. It has 1 filter and 1 convolutional layer with kernel size K = 4, a ReLU activation function σ(·), and uses cyclic padding. One-hot encoding is used for each input state s from the net S_z = 0 manifold. B: Learned ln ψ(s) at different iterations. C: Motif count matrix in the case N = 8, M = 2, and K = 4. All M^K = 2^4 = 16 motifs are labeled for each row and all $\binom{N}{N/M} = \binom{8}{4} = 70$ states for each column. The counts are labeled with colors, with larger motif counts having brighter colors. States are color-coded by the equivalence classes they belong to.
2 The CNN Ansatz
2.1 CNN Architecture and Training
For our choice of a physical toy system, we pick the 1-dimensional Sutherland model with periodic boundary conditions and Hamiltonian

$$H = \sum_{i=1}^{N} P_{i,i+1} \qquad (1)$$

where P_{i,i+1} is the operator exchanging the particles at positions i and i+1, and the N particles are evenly distributed among M different species. For M = 2, this system reduces to the antiferromagnetic spin-1/2 Heisenberg model. The reason we choose this Hamiltonian is twofold. First, it is simple enough that we can benchmark the CNN’s solution by comparing its energy to the exact value given by the Bethe ansatz (5). Second, it is complex enough that the exact solution consists of O(M^N) unique numbers, whereas the CNN only has O(N) variational parameters to work with. In order to succeed, the CNN must find a way to efficiently represent an approximation to the exact solution, and we seek to understand the nature of this approximation.
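For M = 2, the exchange operator P_{i,i+1} acts on the computational basis simply as a SWAP of sites i and i+1, so Eq. (1) can be diagonalized exactly for small N. The following minimal NumPy sketch (our own illustrative code, not the paper's implementation; all function names are ours) builds H for an N = 4 ring and recovers the ground energy implied by the Heisenberg correspondence P_{i,i+1} = 2 S_i · S_{i+1} + 1/2.

```python
import numpy as np

def swap_on_sites(n_sites, i, j):
    """Exchange operator P_{ij} on n_sites spin-1/2 particles,
    built by permuting the computational basis states."""
    dim = 2 ** n_sites
    P = np.zeros((dim, dim))
    for state in range(dim):
        bits = [(state >> k) & 1 for k in range(n_sites)]
        bits[i], bits[j] = bits[j], bits[i]          # exchange the two spins
        swapped = sum(b << k for k, b in enumerate(bits))
        P[swapped, state] = 1.0
    return P

def sutherland_hamiltonian(n_sites):
    """H = sum_i P_{i,i+1} with periodic boundary conditions (M = 2 case)."""
    return sum(swap_on_sites(n_sites, i, (i + 1) % n_sites)
               for i in range(n_sites))

H = sutherland_hamiltonian(4)
e0 = np.linalg.eigvalsh(H).min()
# P_{i,i+1} = 2 S_i.S_{i+1} + 1/2, and the N = 4 Heisenberg ring has
# ground energy -2, so e0 = 2*(-2) + 4*(1/2) = -2 here.
```

Dense diagonalization scales as O(2^{3N}); for larger N one would switch to sparse matrices and a Lanczos eigensolver, or to the variational Monte Carlo approach the paper studies.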
To investigate the physics as simply as possible, we start with a basic CNN with a single convolutional layer followed by a fully connected layer (see Fig. 1A). The inputs to this CNN are the spin configurations s = {s_1, s_2, ..., s_N} and the output is ln ψ(s), where ψ(s) is the wavefunction at s, parametrized as:

$$\ln \psi_{\mathrm{CNN}}(s) = v \sum_{i=1}^{N} \sigma\left(w \cdot s_{i:i+K-1} + b\right), \quad s \in \mathcal{S}_{N,M}, \qquad (2)$$

where σ is the ReLU non-linearity, w ∈ R^K is a convolutional filter of size K, b ∈ R is a scalar bias, v ∈ R is a scalar weight, and s_{i:i+K-1} is the substring of s of length K starting at index i. Since the Sutherland model does not allow for changes in total magnetization, we have restricted our input spin configurations s to have zero net magnetization, i.e. s ∈ S_{N,M}. We note in passing that, for this particular problem, a nonlinear activation function is required to prevent the CNN from producing constant outputs (see Sec. A.3 for proof).
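A forward pass of Eq. (2) is straightforward to sketch. The code below is a hedged illustration (our own; spins are encoded as ±1 for simplicity rather than the paper's one-hot encoding, and the parameters are arbitrary rather than trained). It also verifies that cyclic padding plus the sum over all N positions makes the ansatz exactly translation invariant, which is relevant to the symmetry discussion that follows.

```python
import numpy as np

def log_psi_cnn(s, w, b, v):
    """ln psi_CNN(s) = v * sum_i sigma(w . s_{i:i+K-1} + b), Eq. (2),
    with cyclic padding and ReLU sigma; s is a length-N array of +-1 spins."""
    K = len(w)
    padded = np.concatenate([s, s[:K - 1]])              # cyclic padding
    conv = np.array([padded[i:i + K] @ w + b for i in range(len(s))])
    return v * np.maximum(conv, 0.0).sum()               # ReLU, then sum

rng = np.random.default_rng(0)
N, K = 8, 4
w, b, v = rng.normal(size=K), 0.1, 0.5
s = np.array([1, -1, 1, -1, -1, 1, -1, 1])               # zero net magnetization
value = log_psi_cnn(s, w, b, v)
# Translation invariance: cyclically shifting s leaves ln psi unchanged.
shifted = log_psi_cnn(np.roll(s, 3), w, b, v)
```

Note that translation invariance is built into the architecture even at random initialization; reflection and label-permutation symmetry, by contrast, must be learned.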
Interestingly, if we combine the training results reported in Fig. 1B with the strings shown in Fig. 1C that have the same color as the bars in Fig. 1B, a pattern emerges: certain strings s have very similar ln ψ(s) to each other. On further inspection, we see that the states with similar ln ψ(s) are the ones connected to each other by a combination of symmetry operations of the Hamiltonian: translations, reflections around any point, and permutations of the spin labels. Essentially, the CNN efficiently captures the symmetry constraints of the target function after training. Our goal is to see how these symmetries in the target function manifest within the CNN’s variational parameters themselves.

As mentioned earlier, the CNN cannot directly ‘see’ the full input string s of size N; instead, it gleans information about s indirectly through substrings s′ of size K that it can ‘see’ directly via the convolution operation. We call these substrings K-motifs. In order to learn about the global symmetries of the Hamiltonian, the CNN must somehow glean this information using only the frequencies and occurrences of the K-motifs, which we can visualize via a motif count matrix, shown in Fig. 1C (see Sec. A.2 for a mathematical definition). As we will see later, motifs are the key to understanding why a low-dimensional approximation to the ground state exists, and why the CNN is particularly suited to this task. Before giving a detailed explanation, we first turn our attention to how the symmetries of the problem appear within the CNN.
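The motif count matrix itself is easy to reproduce. The sketch below (our own construction following the definition referenced above; variable names are ours) enumerates all net-zero-magnetization states for N = 8, M = 2 and counts cyclic K = 4 motifs, recovering the 16 × 70 matrix of Fig. 1C.

```python
from itertools import combinations
import numpy as np

N, M, K = 8, 2, 4

# All zero-magnetization binary strings of length N: choose N/2 up-spins.
states = []
for ups in combinations(range(N), N // 2):
    s = [0] * N
    for i in ups:
        s[i] = 1
    states.append(tuple(s))

# Motif count matrix: rows = all M^K possible K-motifs, columns = states;
# entry (m, j) counts cyclic occurrences of motif m in state j.
motifs = [tuple((m >> k) & 1 for k in range(K)) for m in range(M ** K)]
C = np.zeros((M ** K, len(states)), dtype=int)
for j, s in enumerate(states):
    for i in range(N):                                  # N cyclic windows
        window = tuple(s[(i + k) % N] for k in range(K))
        C[motifs.index(window), j] += 1
```

Each column sums to N, since every state contributes exactly one motif per cyclic window; the interesting structure lies in how the columns cluster by equivalence class.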
2.2 Symmetries Reduce the Complexity of Ground State Wavefunction

In our quest to understand the CNN’s approximation, we start by looking into the role of symmetries in decreasing the complexity of the target ground-state wavefunction. The Sutherland Hamiltonian is invariant under three symmetries that are commonly found in physics: translation, reflection, and SU(M) rotations among the M types of particles. Let G denote the symmetry group generated by all of these symmetries. It follows that the unique, nondegenerate ground state must also obey these symmetries in G. In the uncoupled spin basis, the positive definiteness of the wavefunction allows the SU(M) symmetry to be reduced to an S_M symmetry defined by simply permuting the M different particle labels. Due to these symmetries, the target function the CNN must learn has unique values only in a quotient space, a subspace of the ambient Hilbert space, since ψ_GS(gs) = ψ_GS(s) for all g ∈ G. The symmetries partition the Hilbert space H into equivalence classes of symmetric states E ≡ H mod G. The number of equivalence classes |E| can be computed exactly for small N, and for large N a lower bound is given by $|\mathcal{E}| \ge \frac{N!}{2N\,M!\,((N/M)!)^M} \sim O(M^N)$ (see Supplement B). We thus find that the symmetries reduce the complexity of the target function the CNN is required to learn, but its complexity is still exponential in N, assuming a constant number of parameters is needed per equivalence class.

Figure 2: Left: Number of equivalence classes required to form a 99% accurate approximation of the exact ground state wavefunction. The exponential scaling of this quantity demonstrates that finding the right approximation is not a trivial task. Right: Critical kernel size K* vs. N in the case of M = 2. We empirically observe a growth rate slightly faster than O(N).
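The equivalence classes can be enumerated by brute force for small N. The following sketch (our own illustrative code, feasible only for small systems) partitions the zero-magnetization sector into orbits of the group generated by translations, reflections, and S_M label permutations, here for M = 2.

```python
from itertools import combinations, permutations

def equivalence_classes(N, M=2):
    """Partition the zero-magnetization sector into orbits of the group
    generated by translations, reflections, and label permutations."""
    states = set()
    for ups in combinations(range(N), N // M):
        s = [0] * N
        for i in ups:
            s[i] = 1
        states.add(tuple(s))

    def orbit(s):
        out = set()
        for perm in permutations(range(M)):       # relabel the M species
            t = tuple(perm[x] for x in s)
            for shift in range(N):                # all translations...
                u = t[shift:] + t[:shift]
                out.add(u)
                out.add(u[::-1])                  # ...and reflections
        return out

    classes = []
    remaining = set(states)
    while remaining:
        s = remaining.pop()
        o = orbit(s) & states                     # orbit stays in the sector
        remaining -= o
        classes.append(o)
    return classes

counts = {N: len(equivalence_classes(N)) for N in (4, 6, 8)}
```

Because orbit sizes vary, the class count grows much more slowly than the sector dimension at these small sizes, yet its scaling is still exponential in N, consistent with the lower bound quoted in the text.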
2.3 Representing and Approximating Symmetries with CNNs

And yet the success of CNNs is proof that a more parsimonious approximation does indeed exist. Might the CNN be learning by selectively approximating only some more important equivalence classes, or perhaps by taking advantage of strong dependencies between equivalence classes? While the exact ground state wavefunction consists of exponentially many amplitudes, a small subset of equivalence classes might account for the majority of the probability mass. To test this, we computed the minimum number of equivalence classes required to capture 99% of the cumulative ground state probability as a function of system size N. Fig. 2 (Left) shows that this still scales exponentially with N, implying that complexity reduction due to symmetry constraints cannot fully explain how a CNN can achieve high accuracy with only O(K) variational parameters. Furthermore, it is not immediately clear why this dimensionality reduction is even physically possible. We will revisit this issue later on.

We next turn to the question of how the CNN learns and represents the symmetries of the Sutherland Hamiltonian. The equivalence classes are the result of global symmetries of the Hamiltonian, which are only manifest in the full spin configuration, a string of length N. However, as we saw before, the CNN cannot ‘see’ the full length-N string; it can only ‘see’ substrings of length K via a convolutional filter of size K < N. How large must K be in order to learn an accurate approximation? Supp. Sec. A.2 shows that in order to distinguish all equivalence classes we need filters of size at least K > N/3.
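How many basis states the length-K motif counts can distinguish is, concretely, a rank computation, which can be probed numerically. The sketch below (our own code; it assumes M = 2 and the cyclic motif counting defined earlier) builds the motif count matrix for N = 10 and computes its rank as K grows.

```python
from itertools import combinations
import numpy as np

def motif_count_matrix(N, K):
    """Cyclic K-motif counts for all zero-magnetization binary strings (M = 2)."""
    states = []
    for ups in combinations(range(N), N // 2):
        s = [0] * N
        for i in ups:
            s[i] = 1
        states.append(s)
    C = np.zeros((2 ** K, len(states)), dtype=int)
    for j, s in enumerate(states):
        for i in range(N):                      # one motif per cyclic window
            motif = 0
            for k in range(K):
                motif = (motif << 1) | s[(i + k) % N]
            C[motif, j] += 1
    return C

# Rank of the motif count matrix for N = 10 and K = 1..5.
ranks = {K: np.linalg.matrix_rank(motif_count_matrix(10, K)) for K in range(1, 6)}
```

For K < N/2 the computed ranks follow 2^(K-1): K = 1 motif counts are identical across the sector (rank 1), K = 2 counts add a single degree of freedom (rank 2), and so on, doubling with each increment of K.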
However, the key insight lies in the motif count matrix. The rank of this matrix exactly equals the number of basis states that the CNN is able to distinguish, so in order to differentiate all equivalence classes, the rank of the motif count matrix must be at least as large as the number of equivalence classes. We define K* as the minimum value of K that satisfies this condition; we show its growth vs. N in Fig. 2 (Right). For M = 2 and K < N/2, the rank of the motif count matrix is equal to 2^(K-1), which establishes a connection between the equivalence classes and the convolutional operation. However, for sufficiently large K, the CNN fails to recognize reflection