Interpreting convolutional neural networks’ low dimensional
approximation to quantum spin systems
Yilong Ju,1,† Shah Saad Alam,2,† Jonathan Minoff,2
Fabio Anselmi,3,4 Han Pu,2 Ankit Patel1,5
1Department of Computer Science, Rice University, 6100 Main St., Houston, TX 77005, USA
2Department of Physics and Astronomy, Rice University, 6100 Main St., Houston, TX 77005, USA
3Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston
4Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA
†Equal contribution.
To whom correspondence should be addressed; E-mail: hpu@rice.edu, abp4@rice.edu
Convolutional neural networks (CNNs) have been employed along with Variational Monte Carlo methods for finding the ground state of quantum many-body spin systems with great success. In order to do so, however, a CNN with only linearly many variational parameters has to circumvent the “curse of dimensionality” and successfully approximate a wavefunction on an exponentially large Hilbert space. In our work, we provide a theoretical and experimental analysis of how the CNN optimizes learning for spin systems, and investigate the CNN’s low dimensional approximation. We first quantify the role played by physical symmetries of the underlying spin system during training. We incorporate our insights into a new training algorithm and demonstrate its improved efficiency, accuracy and robustness. We then further investigate the CNN’s ability to approximate wavefunctions by looking at the entanglement spectrum captured by the size of the convolutional filter. Our insights reveal the CNN to be an ansatz fundamentally centred around the occurrence statistics of K-motifs of the input strings. We use this motivation to provide the shallow CNN ansatz with a unifying theoretical interpretation in terms of other well-known statistical and physical ansatzes such as the maximum entropy (MaxEnt) and entangled plaquette correlator product states (EP-CPS). Using regression analysis, we find further relationships between the CNN’s approximations of the different motifs’ expectation values. Our results allow us to gain a comprehensive, improved understanding of how CNNs successfully approximate quantum spin Hamiltonians and to use that understanding to improve CNN performance.
Contents

1 Introduction
2 The CNN Ansatz
  2.1 CNN Architecture and Training
  2.2 Symmetries Reduce the Complexity of Ground State Wavefunction
  2.3 Representing and Approximating Symmetries with CNNs
  2.4 Improving CNN Performance by Imposing Symmetry Constraints
arXiv:2210.00692v1 [quant-ph] 3 Oct 2022
3 When does the Ground State Admit Parsimonious Approximations?
4 Statistical & Physical Interpretations of the CNN Ansatz
  4.1 Motivation
  4.2 CNN as a Maximum Entropy (MaxEnt) Ansatz
  4.3 CNN as an Entangled Plaquette Correlator Product State (EP-CPS) Ansatz
  4.4 Numerical Evidence: CNNs Behave like Restricted CPS Approximations
5 Physical Insights from Learned CNNs
6 Discussion
A Representation, Dynamics and Inductive Bias of CNN
  A.1 Derivation of Motif Count Vectors
  A.2 The Motif Count Matrix and Critical Kernel Size
  A.3 The Need for Nonlinear Activation Function in CNN
  A.4 The Grand Sum Condition
  A.5 Algorithms For Improving Training using the Grand Sum Condition
  A.6 Learning Dynamics of CNN
B Equivalence Classes and How to Count Them
C Derivation of MaxEnt Ansatz with Symmetries
D Entanglement Calculation Derivation and Errors
E Regression Analysis
F Hyperparameters and Tuning
  F.1 For Results Shown in Fig. 3
  F.2 For Results Shown in Fig. 5, Table 2 and Table 4
1 Introduction
The central concern of quantum many-body physics is to understand how macroscopic properties emerge from microscopic inter-particle interactions. This is, however, in general an extremely difficult question to answer, largely because the dimension of the quantum Hilbert space grows exponentially as the number of constituent particles increases. Ingenious numerical techniques have been developed to study certain classes of many-body systems. In recent years, techniques inspired by machine learning, specifically neural networks (NNs), have attracted much attention. In particular, Convolutional Neural Networks (CNNs), augmented with quantum Monte Carlo methods, have recently arisen as a potent class of variational ansatzes for numerically solving quantum spin systems with many particles (1–4). CNNs have often provided rapid and quite accurate numerical approximations, comparable to the traditional algorithms of quantum physics. As a result, there has been a flurry of research to improve the performance of these models and to apply them to broader classes of quantum spin systems with different physical constraints. However, the exact approximations and methods used by the CNNs remain a mystery: the CNNs remain, effectively, black boxes. Indeed, this is a general problem for applications involving NNs, one that has prevented us from interpreting the NN’s solution and from extracting useful physical insights about the quantum systems under study. As a result, there is no clear understanding of the full potential of machine learning for quantum research.
In this work, we take a crucial step in filling this gap. Specifically, we aim to give new insights into how even
a simple, one-hidden-layer CNN provides a solution to a quantum spin problem. We show how physical features,
such as symmetries of the quantum spin system, naturally manifest themselves in the final trained network and
during the optimization dynamics. We show the constraints these symmetries place on the variational parameters,
and we use these insights to construct a more efficient, accurate and robust training algorithm for CNNs. To further
understand why the CNN is so adept at sufficiently approximating the system using linearly many parameters, we
interpret the convolutional operation in terms of the degree of quantum entanglement captured by the CNN ansatz.
Next, to interpret the advantages conferred by the mathematical form of the CNN, we provide a mapping of the
CNN to other statistical and physical ansatzes such as Maximum Entropy (MaxEnt) and Correlator Product States
(CPS). We also conduct a novel multivariate regression analysis to uncover which physical features are the most
relevant to the low-dimensional learned solution and which ones the CNN captures correctly. Finally, we discuss
how our approach and new insights can be used to design efficient approximations of complicated quantum spin
systems.
Figure 1: A: The CNN architecture for a system of N sites with M = 2 internal states. It has 1 filter and 1 convolutional layer with kernel size K = 4, a ReLU activation function σ(·), and uses cyclic padding. One-hot encoding is used for each input state s from the net S_z = 0 manifold. B: Learned ln ψ(s) at different iterations. C: Motif count matrix in the case N = 8, M = 2, and K = 4. All M^K = 2^4 = 16 motifs are labeled for each row and all $\binom{N}{N/M} = \binom{8}{4} = 70$ states for each column. The counts are labeled with colors, with larger motif counts having brighter colors. States are color-coded by the equivalence classes they belong to.
2 The CNN Ansatz
2.1 CNN Architecture and Training
For our choice of a physical toy system, we pick the 1-dimensional Sutherland model with periodic boundary conditions and Hamiltonian

$$H = \sum_{i=1}^{N} P_{i,i+1} \qquad (1)$$

where P_{i,i+1} is the operator exchanging the particles at positions i and i+1, and the N particles are evenly distributed among M different species. For M = 2, this system reduces to the antiferromagnetic spin-1/2 Heisenberg model. The reason we choose this Hamiltonian is twofold. First, it is simple enough that we can benchmark the CNN’s solution by comparing its energy to the exact value given by the Bethe ansatz (5). Second, it is complex enough that the exact solution consists of O(M^N) unique numbers, whereas the CNN only has O(N) variational parameters to work with. In order to succeed, the CNN must find a way to efficiently represent an approximation to the exact solution, and we seek to understand the nature of this approximation.
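For M = 2, the exchange operator P_{i,i+1} acts on the computational basis simply as a SWAP of sites i and i+1, so Eq. (1) can be diagonalized exactly for small N. The following minimal NumPy sketch (our own illustrative code, not the paper's implementation; all function names are ours) builds H for an N = 4 ring and recovers the ground energy implied by the Heisenberg correspondence P_{i,i+1} = 2 S_i · S_{i+1} + 1/2.

```python
import numpy as np

def swap_on_sites(n_sites, i, j):
    """Exchange operator P_{ij} on n_sites spin-1/2 particles,
    built by permuting the computational basis states."""
    dim = 2 ** n_sites
    P = np.zeros((dim, dim))
    for state in range(dim):
        bits = [(state >> k) & 1 for k in range(n_sites)]
        bits[i], bits[j] = bits[j], bits[i]          # exchange the two spins
        swapped = sum(b << k for k, b in enumerate(bits))
        P[swapped, state] = 1.0
    return P

def sutherland_hamiltonian(n_sites):
    """H = sum_i P_{i,i+1} with periodic boundary conditions (M = 2 case)."""
    return sum(swap_on_sites(n_sites, i, (i + 1) % n_sites)
               for i in range(n_sites))

H = sutherland_hamiltonian(4)
e0 = np.linalg.eigvalsh(H).min()
# P_{i,i+1} = 2 S_i.S_{i+1} + 1/2, and the N = 4 Heisenberg ring has
# ground energy -2, so e0 = 2*(-2) + 4*(1/2) = -2 here.
```

Dense diagonalization scales as O(2^{3N}); for larger N one would switch to sparse matrices and a Lanczos eigensolver, or to the variational Monte Carlo approach the paper studies.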
To investigate the physics as simply as possible, we start with a basic CNN with a single convolutional layer followed by a fully connected layer (see Fig. 1A). The inputs to this CNN are the spin configurations s = {s_1, s_2, ..., s_N} and the output is ln ψ(s), where ψ(s) is the wavefunction at s, parametrized as:

$$\ln \psi_{\mathrm{CNN}}(s) = v \sum_{i=1}^{N} \sigma\left(w \cdot s_{i:i+K-1} + b\right), \quad s \in \mathcal{S}_{N,M}, \qquad (2)$$

where σ is the ReLU non-linearity, w ∈ R^K is a convolutional filter of size K, b ∈ R is a scalar bias, v ∈ R is a scalar weight, and s_{i:i+K-1} is the substring of s of length K starting at index i. Since the Sutherland model does not allow for changes in total magnetization, we have restricted our input spin configurations s to have zero net magnetization, i.e. s ∈ S_{N,M}. We note in passing that, for this particular problem, a nonlinear activation function is required to prevent the CNN from producing constant outputs (see Sec. A.3 for proof).
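A forward pass of Eq. (2) is straightforward to sketch. The code below is a hedged illustration (our own; spins are encoded as ±1 for simplicity rather than the paper's one-hot encoding, and the parameters are arbitrary rather than trained). It also verifies that cyclic padding plus the sum over all N positions makes the ansatz exactly translation invariant, which is relevant to the symmetry discussion that follows.

```python
import numpy as np

def log_psi_cnn(s, w, b, v):
    """ln psi_CNN(s) = v * sum_i sigma(w . s_{i:i+K-1} + b), Eq. (2),
    with cyclic padding and ReLU sigma; s is a length-N array of +-1 spins."""
    K = len(w)
    padded = np.concatenate([s, s[:K - 1]])              # cyclic padding
    conv = np.array([padded[i:i + K] @ w + b for i in range(len(s))])
    return v * np.maximum(conv, 0.0).sum()               # ReLU, then sum

rng = np.random.default_rng(0)
N, K = 8, 4
w, b, v = rng.normal(size=K), 0.1, 0.5
s = np.array([1, -1, 1, -1, -1, 1, -1, 1])               # zero net magnetization
value = log_psi_cnn(s, w, b, v)
# Translation invariance: cyclically shifting s leaves ln psi unchanged.
shifted = log_psi_cnn(np.roll(s, 3), w, b, v)
```

Note that translation invariance is built into the architecture even at random initialization; reflection and label-permutation symmetry, by contrast, must be learned.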
Interestingly, if we combine the training results reported in Fig. 1B with the strings shown in Fig. 1C that have the same color as the bars in Fig. 1B, a pattern emerges: certain strings s have very similar ln ψ(s) to each other. On further inspection, we see that the states with similar ln ψ(s) are the ones connected to each other by a combination of symmetry operations of the Hamiltonian: translations, reflections around any point, and permutations of the spin labels. Essentially, the CNN efficiently captures the symmetry constraints of the target function after training. Our goal is to see how these symmetries in the target function manifest within the CNN’s variational parameters themselves.

As mentioned earlier, the CNN cannot directly ‘see’ the full input string s of size N; instead, it gleans information about s indirectly through substrings s′ of size K that it can ‘see’ directly via the convolution operation. We call these substrings K-motifs. In order to learn about the global symmetries of the Hamiltonian, the CNN must somehow glean this information using only the frequencies and occurrences of the K-motifs, which we can visualize via a motif count matrix, shown in Fig. 1C (see Sec. A.2 for a mathematical definition). As we will see later, motifs are the key to understanding why a low-dimensional approximation to the ground state exists, and why the CNN is particularly suited to this task. Before giving a detailed explanation, we first turn our attention to how the symmetries of the problem appear within the CNN.
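The motif count matrix itself is easy to reproduce. The sketch below (our own construction following the definition referenced above; variable names are ours) enumerates all net-zero-magnetization states for N = 8, M = 2 and counts cyclic K = 4 motifs, recovering the 16 × 70 matrix of Fig. 1C.

```python
from itertools import combinations
import numpy as np

N, M, K = 8, 2, 4

# All zero-magnetization binary strings of length N: choose N/2 up-spins.
states = []
for ups in combinations(range(N), N // 2):
    s = [0] * N
    for i in ups:
        s[i] = 1
    states.append(tuple(s))

# Motif count matrix: rows = all M^K possible K-motifs, columns = states;
# entry (m, j) counts cyclic occurrences of motif m in state j.
motifs = [tuple((m >> k) & 1 for k in range(K)) for m in range(M ** K)]
C = np.zeros((M ** K, len(states)), dtype=int)
for j, s in enumerate(states):
    for i in range(N):                                  # N cyclic windows
        window = tuple(s[(i + k) % N] for k in range(K))
        C[motifs.index(window), j] += 1
```

Each column sums to N, since every state contributes exactly one motif per cyclic window; the interesting structure lies in how the columns cluster by equivalence class.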
2.2 Symmetries Reduce the Complexity of Ground State Wavefunction

In our quest to understand the CNN’s approximation, we start by looking into the role of symmetries in decreasing the complexity of the target ground-state wavefunction. The Sutherland Hamiltonian is invariant under three symmetries that are commonly found in physics: translation, reflection, and SU(M) rotations among the M types of particles. Let G denote the symmetry group generated by all of these symmetries. It follows that the unique, nondegenerate ground state must also obey these symmetries in G. In the uncoupled spin basis, the positive definiteness of the wavefunction allows the SU(M) symmetry to be reduced to an S_M symmetry defined by simply permuting the M different particle labels. Due to these symmetries, the target function the CNN must learn has unique values only in a quotient space, a subspace of the ambient Hilbert space, since ψ_GS(gs) = ψ_GS(s) for all g ∈ G. The symmetries partition the Hilbert space H into equivalence classes of symmetric states E ≡ H mod G. The number of equivalence classes |E| can be computed exactly for small N, and for large N a lower bound is given by $|\mathcal{E}| \ge \frac{N!}{2N\,M!\,((N/M)!)^M} \sim O(M^N)$ (see Supplement B). We thus find that the symmetries reduce the complexity of the target function the CNN is required to learn, but its complexity is still exponential in N, assuming a constant number of parameters is needed per equivalence class.

Figure 2: Left: Number of equivalence classes required to form a 99% accurate approximation of the exact ground state wavefunction. The exponential scaling of this quantity demonstrates that finding the right approximation is not a trivial task. Right: Critical kernel size K* vs. N in the case of M = 2. We empirically observe a growth rate slightly faster than O(N).
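The equivalence classes can be enumerated by brute force for small N. The following sketch (our own illustrative code, feasible only for small systems) partitions the zero-magnetization sector into orbits of the group generated by translations, reflections, and S_M label permutations, here for M = 2.

```python
from itertools import combinations, permutations

def equivalence_classes(N, M=2):
    """Partition the zero-magnetization sector into orbits of the group
    generated by translations, reflections, and label permutations."""
    states = set()
    for ups in combinations(range(N), N // M):
        s = [0] * N
        for i in ups:
            s[i] = 1
        states.add(tuple(s))

    def orbit(s):
        out = set()
        for perm in permutations(range(M)):       # relabel the M species
            t = tuple(perm[x] for x in s)
            for shift in range(N):                # all translations...
                u = t[shift:] + t[:shift]
                out.add(u)
                out.add(u[::-1])                  # ...and reflections
        return out

    classes = []
    remaining = set(states)
    while remaining:
        s = remaining.pop()
        o = orbit(s) & states                     # orbit stays in the sector
        remaining -= o
        classes.append(o)
    return classes

counts = {N: len(equivalence_classes(N)) for N in (4, 6, 8)}
```

Because orbit sizes vary, the class count grows much more slowly than the sector dimension at these small sizes, yet its scaling is still exponential in N, consistent with the lower bound quoted in the text.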
2.3 Representing and Approximating Symmetries with CNNs

And yet the success of CNNs is proof that a more parsimonious approximation does indeed exist. Might the CNN be learning by selectively approximating only some more important equivalence classes, or perhaps by taking advantage of strong dependencies between equivalence classes? While the exact ground state wavefunction consists of exponentially many amplitudes, a small subset of equivalence classes might account for the majority of the probability mass. To test this, we computed the minimum number of equivalence classes required to capture 99% of the cumulative ground state probability as a function of system size N. Fig. 2 (Left) shows that this still scales exponentially with N, implying that complexity reduction due to symmetry constraints cannot fully explain how a CNN can achieve high accuracy with only O(K) variational parameters. Furthermore, it is not immediately clear why this dimensionality reduction is even physically possible. We will revisit this issue later on.

We next turn to the question of how the CNN learns and represents the symmetries of the Sutherland Hamiltonian. The equivalence classes are the result of global symmetries of the Hamiltonian, which are only manifest in the full spin configuration, a string of length N. However, as we saw before, the CNN cannot ‘see’ the full length-N string; it can only ‘see’ substrings of length K via a convolutional filter of size K < N. How large must K be in order to learn an accurate approximation? Supp. Sec. A.2 shows that in order to distinguish all equivalence classes we need filters of size at least K > N/3.
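How many basis states the length-K motif counts can distinguish is, concretely, a rank computation, which can be probed numerically. The sketch below (our own code; it assumes M = 2 and the cyclic motif counting defined earlier) builds the motif count matrix for N = 10 and computes its rank as K grows.

```python
from itertools import combinations
import numpy as np

def motif_count_matrix(N, K):
    """Cyclic K-motif counts for all zero-magnetization binary strings (M = 2)."""
    states = []
    for ups in combinations(range(N), N // 2):
        s = [0] * N
        for i in ups:
            s[i] = 1
        states.append(s)
    C = np.zeros((2 ** K, len(states)), dtype=int)
    for j, s in enumerate(states):
        for i in range(N):                      # one motif per cyclic window
            motif = 0
            for k in range(K):
                motif = (motif << 1) | s[(i + k) % N]
            C[motif, j] += 1
    return C

# Rank of the motif count matrix for N = 10 and K = 1..5.
ranks = {K: np.linalg.matrix_rank(motif_count_matrix(10, K)) for K in range(1, 6)}
```

For K < N/2 the computed ranks follow 2^(K-1): K = 1 motif counts are identical across the sector (rank 1), K = 2 counts add a single degree of freedom (rank 2), and so on, doubling with each increment of K.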
However, the key insight lies in the motif count matrix. The rank of this matrix exactly equals the number of basis states that the CNN is able to distinguish, so in order to differentiate all equivalence classes, the rank of the motif count matrix must be at least as large as the number of equivalence classes. We define K* as the minimum value of K that satisfies this condition; we show its growth vs. N in Fig. 2 (Right). For M = 2 and K < N/2, the rank of the motif count matrix is equal to 2^(K-1), which establishes a connection between the equivalence classes and the convolutional operation. However, for sufficiently large K, the CNN fails to recognize reflection