2 The CNN Ansatz
2.1 CNN Architecture and Training
As our physical toy system, we pick the 1-dimensional Sutherland model with periodic boundary
conditions and Hamiltonian
H = \sum_{i=1}^{N} P_{i,i+1}    (1)
where P_{i,i+1} is the operator exchanging the particles at positions i and i+1, and the N particles are evenly distributed among M different species. For M = 2, this system reduces to the antiferromagnetic spin-1/2 Heisenberg model. The reason we choose this Hamiltonian is twofold. First, it is simple enough that we can benchmark the CNN's solution by comparing its energy to the exact value given by the Bethe ansatz (5). Second, it is complex enough that the exact solution consists of O(M^N) unique numbers, whereas the CNN only has O(N) variational parameters to work with. In order to succeed, the CNN must find a way to efficiently represent an approximation to the exact solution, and we seek to understand the nature of this approximation.
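To make the benchmark concrete, the following is a minimal sketch (not the paper's code) that builds the Hamiltonian of Eq. (1) as a dense matrix on the full M^N-dimensional Hilbert space and extracts the ground-state energy by exact diagonalization; for chains this small, exact diagonalization reproduces the Bethe-ansatz energy and serves the same benchmarking role. The sizes N and M are illustrative placeholders.

```python
import numpy as np
from itertools import product

N, M = 6, 2  # illustrative sizes only; M = 2 recovers the Heisenberg case

basis = list(product(range(M), repeat=N))    # all M^N spin configurations
index = {s: k for k, s in enumerate(basis)}  # configuration -> matrix index

# H = sum_i P_{i,i+1}: each term swaps the particles on neighboring sites
H = np.zeros((M**N, M**N))
for k, s in enumerate(basis):
    for i in range(N):
        j = (i + 1) % N                      # periodic boundary: site N+1 wraps to 1
        t = list(s)
        t[i], t[j] = t[j], t[i]              # apply the exchange operator P_{i,i+1}
        H[index[tuple(t)], k] += 1.0

print(np.linalg.eigvalsh(H)[0])              # ground-state energy for the benchmark
```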
To investigate the physics as simply as possible, we start with a basic CNN with a single convolutional
layer followed by a fully connected layer (see Fig. 1A). The inputs to this CNN are the spin configurations
s = {s_1, s_2, ..., s_N} and the output is ln ψ(s), where ψ(s) is the wavefunction at s, parametrized as:
ln ψ_CNN(s) = v ∑_{i=1}^{N} σ(w · s_{i:i+K-1} + b),  ∀ s ∈ S_{N,M},    (2)
where σ is the ReLU non-linearity, w ∈ R^K is a convolutional filter of size K, b ∈ R is a scalar bias, v ∈ R is a scalar weight, and s_{i:i+K-1} is the substring of s of length K starting at index i. Since the Sutherland model does not allow for changes in total magnetization, we have restricted our input spin configurations s to have zero net magnetization, i.e. s ∈ S_{N,M}. We note in passing that, for this particular problem, a nonlinear activation function is required to prevent the CNN from producing constant outputs (see Sec. A.3 for proof).
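To fix notation, here is a minimal NumPy sketch of the ansatz in Eq. (2), assuming spins are encoded as real numbers and substrings wrap around cyclically in keeping with the periodic boundary conditions; the parameters w, b, and v below are random placeholders rather than trained values.

```python
import numpy as np

def log_psi_cnn(s, w, b, v):
    """ln psi_CNN(s) = v * sum_i sigma(w . s_{i:i+K-1} + b), as in Eq. (2)."""
    K = len(w)
    s_ext = np.concatenate([s, s[:K - 1]])               # cyclic wrap-around
    windows = np.lib.stride_tricks.sliding_window_view(s_ext, K)  # the N K-motifs of s
    return v * np.sum(np.maximum(windows @ w + b, 0.0))  # ReLU, sum, scale

# toy usage: N = 8 spins, M = 2 species, zero net magnetization
s = np.array([0., 1., 1., 0., 1., 0., 0., 1.])
rng = np.random.default_rng(0)
w, b, v = rng.normal(size=3), 0.1, 0.5                   # K = 3 filter, placeholders
print(log_psi_cnn(s, w, b, v))
```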
Interestingly, if we combine the training results reported in Fig. 1B with the strings shown in Fig. 1C, which have the same color as the bars in Fig. 1B, we can see that a pattern emerges: certain strings s have very similar ln ψ(s) to each other. On further inspection, we see that the states that have similar ln ψ(s) are the ones that are connected to each other by a combination of symmetry operations of the Hamiltonian: translations, reflections around any point, and permutations of the spin labels. Essentially, the CNN efficiently captures the symmetry constraints of the target function after training. Our goal is to see how these symmetries of the target function manifest within the CNN's variational parameters themselves.
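As an illustrative check (a sketch under our own conventions, not the paper's code), one can enumerate the orbit of a configuration under these operations; every string in the orbit should receive a nearly identical ln ψ(s) from the trained CNN. Note that reflections around any point are generated by a single reflection composed with all translations.

```python
import numpy as np
from itertools import permutations

def symmetry_orbit(s, M):
    """Strings related to s by translations, reflections, and spin-label permutations."""
    s = np.asarray(s)
    orbit = set()
    for relabel in permutations(range(M)):      # permute the M spin labels
        r = np.array(relabel)[s]
        for flip in (r, r[::-1]):               # identity and one reflection
            for t in range(len(s)):             # all translations
                orbit.add(tuple(np.roll(flip, t)))
    return orbit

orbit = symmetry_orbit([0, 1, 1, 0, 1, 0, 0, 1], M=2)
print(len(orbit))  # all members should share (approximately) the same ln psi
```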
As mentioned earlier, the CNN cannot directly 'see' the full input string s of size N; instead it gleans information about s indirectly through substrings s′ of size K that it can 'see' directly via the convolution operation. We call these substrings K-motifs. In order to learn about the global symmetries of the Hamiltonian, the CNN must somehow glean this information using only the frequency and occurrences of the K-motifs, which we can visualize via a motif count matrix shown in Fig. 1C (see Sec. A.2 for a mathematical definition). As we will see later, motifs are the key to understanding why a low-dimensional approximation to the ground state exists, and why the CNN is particularly suited for this task. Before giving a detailed explanation, we first turn our attention to how the symmetries of the problem appear within the CNN.
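For concreteness, the following sketch computes the motif counts of a single configuration, i.e. one row of a motif count matrix; the precise convention used in the paper is the one defined in Sec. A.2, so treat this only as an illustration.

```python
import numpy as np
from itertools import product

def motif_counts(s, K, M):
    """Count occurrences of each of the M^K possible K-motifs in s (cyclically)."""
    motifs = {m: j for j, m in enumerate(product(range(M), repeat=K))}
    counts = np.zeros(M**K, dtype=int)
    s_ext = list(s) + list(s[:K - 1])           # periodic wrap-around, as in Eq. (2)
    for i in range(len(s)):
        counts[motifs[tuple(s_ext[i:i + K])]] += 1
    return counts

# toy usage: stacking one such row per configuration yields the motif count matrix
print(motif_counts([0, 1, 1, 0, 1, 0, 0, 1], K=2, M=2))
```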
2.2 Symmetries Reduce the Complexity of the Ground-State Wavefunction
In our quest to understand the CNN's approximation, we start by looking into the role of symmetries in decreasing the complexity of the target ground-state wavefunction. The Sutherland Hamiltonian is invariant under three symmetries that are commonly found in physics: translation, reflection, and SU(M) rotations among the M types of particles. Let G denote the symmetry group generated by all of these symmetries. It follows that the unique,