Thermodynamics of the Ising model encoded in restricted Boltzmann machines
Jing Gu1 and Kai Zhang1, 2, ∗
1Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan, Jiangsu, 215300, China
2Data Science Research Center (DSRC), Duke Kunshan University, Kunshan, Jiangsu, 215300, China
arXiv:2210.06203v1 [cond-mat.stat-mech] 12 Oct 2022
∗ kai.zhang@dukekunshan.edu.cn
The restricted Boltzmann machine (RBM) is a two-layer energy-based model that uses its hidden-
visible connections to learn the underlying distribution of visible units, whose interactions are often
complicated by high-order correlations. Previous studies on the Ising model of small system sizes
have shown that RBMs are able to accurately learn the Boltzmann distribution and reconstruct
thermal quantities at temperatures away from the critical point Tc. How the RBM encodes the
Boltzmann distribution and captures the phase transition are, however, not well explained. In
this work, we perform RBM learning of the 2d and 3d Ising
model and carefully examine how
several indicators derived from the weight matrix that could characterize the Ising phase transition.
We verify that the hidden encoding of a visible state tends to have an equal number of positive
and negative units, whose sequence is randomly assigned during training and can be inferred by
analyzing the weight matrix. We also explore the physical meaning of visible energy and loss
function (pseudo-likelihood) of the RBM and show that they could be harnessed to predict the
critical point or estimate physical quantities such as entropy.
I. INTRODUCTION
The tremendous success of deep learning in multiple
areas over the last decade has revived the interplay
between physics and machine learning, in particular
neural networks [1]. On one hand, (statistical) physics
ideas [2], such as renormalization group (RG) [3], en-
ergy landscape [4], free energy [5], glassy dynamics [6],
jamming [7], Langevin dynamics [8], and field theory [9],
shed some light on the interpretation of deep learning
and statistical inference in general [10]. On the other
hand, machine learning and deep learning tools are
harnessed to solve a wide range of physics problems, such as
interaction potential construction [11], phase transition
detection [12], structure encoding [13], physical concepts
discovery [14], and many others [15, 16]. At the very
intersection of these two fields lies the restricted Boltz-
mann machine (RBM) [17], which serves as a classical
paradigm to investigate how an overarching perspective
could benefit both sides.
The RBM uses hidden-visible connections to encode
(high-order) correlations between visible units [18]. Its
precursor, the (unrestricted) Boltzmann machine, was
inspired by spin glasses [19, 20] and is often used in the
inverse Ising problem to infer physical parameters [21–
23]. The restriction of hidden-hidden and visible-visible
connections in RBMs allows for more efficient training
algorithms, and therefore leads to recent applications in
Monte Carlo simulation acceleration [24], quantum wave-
function representation [25, 26], and polymer configura-
tion generation [27]. Deep neural networks formed by
stacks of RBMs have been mapped onto the variational
RG due to their conceptual similarity [28]. RBMs are
also shown to be equivalent to tensor network states from
quantum many-body physics [29]. As simple as it seems,
energy-based models like the RBM could eventually be-
come the building blocks of autonomous machine intelli-
gence [30].
Besides the above-mentioned efforts, the RBM has also
been applied extensively in the study of the minimal
model for second-order phase transitions, the Ising model.
For the small systems under investigation, it was found
that RBMs with a sufficient number of hidden units can
encode the Boltzmann distribution, reconstruct thermal
quantities, and generate new Ising configurations fairly
well [31–33]. The visible → hidden → visible → ···
generating sequence of the RBM can be mapped onto an RG
flow in physical temperature (often towards the critical
point) [34–36]. But the mechanism and power of the
RBM to capture physics concepts and principles have not
been fully explored. First, in what way is the Boltzmann
distribution of the Ising model learned by the RBM? Sec-
ond, can the RBM learn and even quantitatively predict
the phase transition without extra human knowledge?
An affirmative answer to the second question is partic-
ularly appealing, because simple unsupervised learning
methods such as principal component analysis (PCA) us-
ing configuration information alone do not provide quan-
titative prediction for the transition temperature [37, 38]
and supervised learning with neural networks requires
human labeling of the phase type or temperature of a
given configuration [39, 40].
In this article, we report a detailed numerical study
on RBM learning of the Ising model with a system size
much larger than those used previously. The purpose
is to thoroughly dissect the various parts of the RBM
and reveal how each part contributes to the learning of
the Boltzmann distribution of the input Ising configu-
rations. Such understanding allows us to extract sev-
eral useful machine-learning estimators or predictors for
physical quantities, such as entropy and phase transi-
tion temperature. Conversely, the analysis of a physical
model helps us to obtain important insights about
the meaning of RBM parameters and functions, such
as weight matrix, visible energy and pseudo-likelihood.
Below, we first introduce our Ising datasets, the RBM
and its training protocols in Sec. II. We then report and
discuss the results about model parameters, hidden lay-
ers, visible energy and pseudo-likelihood in Sec. III. Af-
ter the conclusion, more details about the Ising model
and the RBM are provided in the Appendices. Sample
code for the RBM is shared on GitHub at
https://github.com/Jing-DS/isingrbm.
II. MODELS AND METHODS
A. Dataset of Ising configurations generated by
Monte Carlo simulations
The Hamiltonian of the Ising model with N = L^d spins
in a configuration s = [s_1, s_2, ..., s_N]^T on a
d-dimensional hypercubic lattice of linear dimension L in
the absence of a magnetic field is

H(\mathbf{s}) = -J \sum_{\langle i,j \rangle} s_i s_j \qquad (1)

where the spin variable s_i = ±1 (i = 1, 2, ..., N), the
coupling parameter J > 0 (set to unity) favors
ferromagnetic configurations (parallel spins), and the notation
⟨i, j⟩ means summing over nearest-neighbor pairs [41]. At a
given temperature T, a configuration s drawn from the
sample space of 2^N states follows the Boltzmann distribution

p_T(\mathbf{s}) = \frac{e^{-H(\mathbf{s})/k_B T}}{Z_T} \qquad (2)

where Z_T = \sum_{\mathbf{s}} e^{-H(\mathbf{s})/k_B T} is the partition function.
The Boltzmann constant k_B is set to unity.
Using single-flip Monte Carlo simulations under
periodic boundary conditions [42], we generate Ising
configurations for two-dimensional (2d) systems (d = 2) of
L = 64 (N = 4096) at n_T = 16 temperatures
T = 0.25, 0.5, 0.75, 1.0, ..., 4.0 (in units of J/k_B) and for
three-dimensional (3d) systems (d = 3) of L = 16 (N = 4096)
at n_T = 20 temperatures T = 2.5, 2.75, 3.0, 3.25, 3.5, 3.75,
4.0, 4.25, 4.3, 4.4, 4.5, 4.6, 4.7, 4.75, 5.0, 5.25, 5.5, 5.75,
6.0, 6.25. After full equilibration, M = 50000 configurations
at each T are collected into a dataset D_T for that T. For
2d systems, we also use a dataset D_∀T consisting of
50000 configurations per temperature from all T's.
Analytical results for thermal quantities of the 2d
Ising model, such as the internal energy ⟨E⟩, (physical)
entropy S, heat capacity C_V and magnetization ⟨m⟩, are
well known [43–46]. Numerical simulation methods and
results for the 3d Ising model have also been
reported [47]. Thermodynamic definitions and relations
used in this work are summarized in Appendix A.
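As an illustration of this data-generation step, a minimal single-flip Metropolis sweep for the 2d model under periodic boundary conditions might look as follows. This is a sketch, not the authors' simulation code; the temperature and sweep counts here are illustrative, and production runs would use far longer equilibration.

```python
import numpy as np

def metropolis_sweep(s, T, rng):
    """One Monte Carlo sweep (L*L attempted single-spin flips) of the
    2d Ising model with J = 1, k_B = 1, and periodic boundaries."""
    L = s.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum of the four nearest neighbors (periodic wrap via modulo).
        nn = (s[(i + 1) % L, j] + s[(i - 1) % L, j]
              + s[i, (j + 1) % L] + s[i, (j - 1) % L])
        dE = 2.0 * s[i, j] * nn  # energy cost of flipping spin (i, j)
        # Metropolis acceptance: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / T).
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i, j] = -s[i, j]
    return s

rng = np.random.default_rng(0)
s = rng.choice([-1, 1], size=(64, 64))  # random initial configuration
for _ in range(20):  # illustrative; equilibration needs many more sweeps
    s = metropolis_sweep(s, T=2.0, rng=rng)
```

After equilibration, configurations sampled at regular intervals would be collected into the dataset D_T.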
FIG. 1. A restricted Boltzmann machine (RBM) with n_h = 6
hidden units and n_v = 9 visible units. Model parameters
θ = {W, b, c} are represented by connections. A filter w_1^T
from the visible units to the first hidden unit is highlighted
by red (light color) connections.
B. Restricted Boltzmann Machine (RBM)
The restricted Boltzmann machine (RBM) is a two-layer
energy-based model with n_h hidden units (or neurons)
h_i = ±1 (i = 1, 2, ..., n_h) in the hidden layer, whose
state vector is h = [h_1, h_2, ..., h_{n_h}]^T, and n_v
visible units v_j = ±1 (j = 1, 2, ..., n_v) in the visible
layer, whose state vector is v = [v_1, v_2, ..., v_{n_v}]^T
(Fig. 1) [48]. In this work, the visible layer is just the
Ising configuration vector, i.e. v = s, with n_v = N. We
choose binary units {−1, +1} (instead of {0, 1}) to better
align with the definition of the Ising spin variable s_i.
The total energy E_θ(v, h) of the RBM is defined as

E_\theta(\mathbf{v},\mathbf{h}) = -\mathbf{b}^T\mathbf{v} - \mathbf{c}^T\mathbf{h} - \mathbf{h}^T W \mathbf{v}
 = -\sum_{j=1}^{n_v} b_j v_j - \sum_{i=1}^{n_h} c_i h_i - \sum_{i=1}^{n_h}\sum_{j=1}^{n_v} W_{ij} h_i v_j \qquad (3)

where b = [b_1, b_2, ..., b_{n_v}]^T is the visible bias,
c = [c_1, c_2, ..., c_{n_h}]^T is the hidden bias, and

W_{n_h \times n_v} = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_{n_h}^T \end{bmatrix}
 = \begin{bmatrix} | & | & & | \\ \mathbf{w}_{:,1} & \mathbf{w}_{:,2} & \cdots & \mathbf{w}_{:,n_v} \\ | & | & & | \end{bmatrix} \qquad (4)

is the interaction weight matrix between visible and hidden
units. Under this notation, each row vector w_i^T (of
dimension n_v) is a filter mapping from the visible state
v to hidden unit i, and each column vector w_{:,j} (of
dimension n_h) is an inverse filter mapping from the hidden
state h to visible unit j. All parameters are collectively
written as θ = {W, b, c}. "Restricted" refers to the lack
of interactions between hidden units or between visible
units.
The joint distribution for an overall state (v, h) is

p_\theta(\mathbf{v},\mathbf{h}) = \frac{e^{-E_\theta(\mathbf{v},\mathbf{h})}}{Z_\theta} \qquad (5)

where the partition function of the RBM is

Z_\theta = \sum_{\mathbf{v}}\sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v},\mathbf{h})}. \qquad (6)
The learned model distribution for the visible state v
follows from marginalization of p_θ(v, h),

p_\theta(\mathbf{v}) = \sum_{\mathbf{h}} p_\theta(\mathbf{v},\mathbf{h}) = \frac{1}{Z_\theta} e^{-E_\theta(\mathbf{v})}, \qquad (7)

where the visible energy, an effective energy for the visible
state v (often termed "free energy" in the machine-learning
literature),

E_\theta(\mathbf{v}) = -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln\left( e^{-(\mathbf{w}_i^T\mathbf{v} + c_i)} + e^{\mathbf{w}_i^T\mathbf{v} + c_i} \right) \qquad (8)

is defined according to e^{-E_\theta(\mathbf{v})} = \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v},\mathbf{h})}, such that
Z_\theta = \sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})}. See Appendix B for a detailed derivation.
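Equation (8) is straightforward to evaluate numerically. A small sketch (the function name is ours) that rewrites each term as a log-sum-exp to avoid overflow for large weights:

```python
import numpy as np

def visible_energy(v, W, b, c):
    """Visible (effective) energy of Eq. (8) for ±1 hidden units:
    E(v) = -b^T v - sum_i ln( e^{-(w_i^T v + c_i)} + e^{+(w_i^T v + c_i)} ).
    Since e^a + e^{-a} = 2 cosh(a), this equals -b^T v - sum_i [ln 2 + ln cosh(a_i)]."""
    a = W @ v + c
    # np.logaddexp(a, -a) = ln(e^a + e^{-a}), computed without overflow
    return -(b @ v) - np.sum(np.logaddexp(a, -a))
```

As a sanity check, with W = 0, b = 0, c = 0 every hidden term contributes ln 2, so E_θ(v) = −n_h ln 2 for any v.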
The conditional distributions to generate h from v,
p_θ(h|v), and to generate v from h, p_θ(v|h), which satisfy
p_θ(v, h) = p_θ(h|v) p_θ(v) = p_θ(v|h) p_θ(h), can be written
as products

p_\theta(\mathbf{h}|\mathbf{v}) = \prod_{i=1}^{n_h} p_\theta(h_i|\mathbf{v}), \qquad p_\theta(\mathbf{v}|\mathbf{h}) = \prod_{j=1}^{n_v} p_\theta(v_j|\mathbf{h}) \qquad (9)
because the h_i are independent of each other (at fixed v)
and the v_j are independent of each other (at fixed h). It
can be shown that

p_\theta(h_i = 1|\mathbf{v}) = \sigma\big(2(c_i + \mathbf{w}_i^T\mathbf{v})\big), \qquad p_\theta(h_i = -1|\mathbf{v}) = 1 - \sigma\big(2(c_i + \mathbf{w}_i^T\mathbf{v})\big)
p_\theta(v_j = 1|\mathbf{h}) = \sigma\big(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\big), \qquad p_\theta(v_j = -1|\mathbf{h}) = 1 - \sigma\big(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\big) \qquad (10)

where σ(z) = 1/(1 + e^{-z}) is the sigmoid function (Appendix B).
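The factorized conditionals of Eqs. (9) and (10) give a simple block Gibbs sampler for ±1 units. A sketch (helper names are ours; the toy sizes n_v = 9 and n_h = 6 follow Fig. 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v, W, c, rng):
    """Sample h_i in {-1,+1} with p(h_i = 1 | v) = sigma(2(c_i + w_i^T v))."""
    p = sigmoid(2.0 * (c + W @ v))
    return np.where(rng.random(p.shape) < p, 1, -1)

def sample_v_given_h(h, W, b, rng):
    """Sample v_j in {-1,+1} with p(v_j = 1 | h) = sigma(2(b_j + h^T w_{:,j}))."""
    p = sigmoid(2.0 * (b + W.T @ h))
    return np.where(rng.random(p.shape) < p, 1, -1)

# One v -> h -> v' Gibbs step on a toy model.
rng = np.random.default_rng(1)
nv, nh = 9, 6
W = 0.1 * rng.standard_normal((nh, nv))
b, c = np.zeros(nv), np.zeros(nh)
v = rng.choice([-1, 1], size=nv)
h = sample_h_given_v(v, W, c, rng)
v_new = sample_v_given_h(h, W, b, rng)
```

Note the factor of 2 inside the sigmoid, which appears only because the units take values ±1 rather than {0, 1}.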
C. Loss function and training of RBMs
Given the dataset D = [v_1, v_2, ..., v_M]^T of M samples
drawn independently from the identical data distribution
p_D(v) (v ~ p_D(v), i.i.d.), the goal of RBM learning
is to find a model distribution p_θ(v) that approximates
p_D(v). In the context of this work, the data samples v
are Ising configurations and the data distribution p_D(v)
is, or is related to, the Ising Boltzmann distribution p_T(s).
Based on maximum likelihood estimation, the optimal
parameters θ* = arg min_θ L(θ) can be found by minimizing
the negative log-likelihood

\mathcal{L}(\theta) = \langle -\ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} = \langle E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \ln Z_\theta \qquad (11)
which serves as the loss function of RBM learning. Note
that the partition function Z_θ depends only on the model,
not on the data. Since the calculation of Z_θ involves a
summation over all possible (v, h) states, which is not
feasible, L(θ) cannot be evaluated exactly, except for very
small systems [49]. Approximations have to be made, for
example, by mean-field calculations [50]. An interesting
feature of the RBM is that, although the actual loss
function L(θ) is not accessible, its gradient
\nabla_\theta \mathcal{L}(\theta) = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} - \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta} \qquad (12)

can be sampled, which enables a gradient-descent learning
algorithm. From step t to step t + 1, the model parameters
are updated with learning rate η as

\theta_{t+1} = \theta_t - \eta \nabla_\theta \mathcal{L}(\theta_t). \qquad (13)
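In practice, the model expectation in Eq. (12) is often approximated by contrastive divergence (CD-k), i.e. k steps of block Gibbs sampling started from the data; this excerpt does not pin down the authors' choice, so the sketch below is one common possibility (function name and hyperparameters are illustrative). For ±1 hidden units, E[h_i | v] = tanh(c_i + w_i^T v), which is used for the data-dependent term.

```python
import numpy as np

def cd_k_update(v_data, W, b, c, eta=0.01, k=1, rng=None):
    """One CD-k gradient-descent update (Eqs. 12-13) for an RBM with ±1 units."""
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: expectations under the data distribution.
    h_data = np.tanh(c + W @ v_data)
    # Negative phase: k steps of block Gibbs sampling, started from the data.
    v = np.array(v_data, dtype=float)
    for _ in range(k):
        p_h = 1.0 / (1.0 + np.exp(-2.0 * (c + W @ v)))
        h = np.where(rng.random(p_h.shape) < p_h, 1.0, -1.0)
        p_v = 1.0 / (1.0 + np.exp(-2.0 * (b + W.T @ h)))
        v = np.where(rng.random(p_v.shape) < p_v, 1.0, -1.0)
    h_model = np.tanh(c + W @ v)
    # Since dE/dW_ij = -h_i v_j, dE/db_j = -v_j, dE/dc_i = -h_i, the descent
    # step theta <- theta - eta * grad L becomes "data term minus model term":
    W += eta * (np.outer(h_data, v_data) - np.outer(h_model, v))
    b += eta * (np.asarray(v_data, dtype=float) - v)
    c += eta * (h_data - h_model)
    return W, b, c

rng = np.random.default_rng(2)
nv, nh = 9, 6
W = 0.01 * rng.standard_normal((nh, nv))
b, c = np.zeros(nv), np.zeros(nh)
v_sample = rng.choice([-1.0, 1.0], size=nv)
W, b, c = cd_k_update(v_sample, W, b, c, eta=0.05, k=1, rng=rng)
```

In real training the update would average the two phases over a minibatch rather than a single configuration.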
To evaluate the loss function, we use its approximation,
the pseudo-(negative log)likelihood [51],

\tilde{\mathcal{L}}(\theta) = \Big\langle -\sum_{i=1}^{n_v} \ln p_\theta(v_i \,|\, v_{j\neq i}) \Big\rangle_{\mathbf{v}\sim p_D} \approx \mathcal{L}(\theta) \qquad (14)
where the notation

p_\theta(v_i \,|\, v_{j\neq i}) = p_\theta(v_i \,|\, v_j \text{ for } j\neq i)
 = \frac{e^{-E_\theta(\mathbf{v})}}{e^{-E_\theta(\mathbf{v})} + e^{-E_\theta([v_1,\cdots,-v_i,\cdots,v_{n_v}])}} \qquad (15)
is the conditional probability for component v_i given that
all the other components v_j (j ≠ i) are fixed. Practically,
to avoid the time-consuming sum over all visible units
\sum_{i=1}^{n_v}, it is suggested to randomly sample one
index i_0 ∈ {1, 2, ..., n_v} and estimate

\tilde{\mathcal{L}}(\theta) \approx \langle -n_v \ln p_\theta(v_{i_0} \,|\, v_{j\neq i_0}) \rangle_{\mathbf{v}\sim p_D}, \qquad (16)
if all the visible units are on average translation-invariant [52].
To monitor the reconstruction error, we also calculate
the cross entropy CE between the initial configuration v
and the conditional probability p_θ(v′|h) for the
reconstruction v → p_θ(h|v) → h → p_θ(v′|h) → v′ (see
Appendix C for the definition).
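The estimate of Eqs. (14)-(16) reduces to two visible-energy evaluations per sample, one with a randomly chosen unit flipped, since p_θ(v_i | v_{j≠i}) = σ(E_θ(v_flip) − E_θ(v)). A sketch under the same conventions (function names are ours):

```python
import numpy as np

def visible_energy(v, W, b, c):
    """Visible energy of Eq. (8) for ±1 hidden units, in stable log-sum-exp form."""
    a = W @ v + c
    return -(b @ v) - np.sum(np.logaddexp(a, -a))

def pseudo_loglik_estimate(batch, W, b, c, rng):
    """Eq. (16): for each sample, flip one random visible unit i0 and
    accumulate -n_v * ln p(v_i0 | v_{j != i0})."""
    n_v = batch.shape[1]
    total = 0.0
    for v in batch:
        i0 = rng.integers(n_v)
        v_flip = v.copy()
        v_flip[i0] = -v_flip[i0]
        E = visible_energy(v, W, b, c)
        E_flip = visible_energy(v_flip, W, b, c)
        # p(v_i0 | rest) = e^{-E} / (e^{-E} + e^{-E_flip}) = sigma(E_flip - E),
        # so ln p = -ln(1 + e^{E - E_flip}), computed stably with logaddexp.
        log_p = -np.logaddexp(0.0, E - E_flip)
        total += -n_v * log_p
    return total / len(batch)

rng = np.random.default_rng(3)
nv, nh = 9, 6
W = 0.1 * rng.standard_normal((nh, nv))
b, c = np.zeros(nv), np.zeros(nh)
batch = rng.choice([-1.0, 1.0], size=(5, nv))
pl = pseudo_loglik_estimate(batch, W, b, c, rng)
```

Because each term is −n_v ln p with p ≤ 1, the estimate is non-negative and decreases as the model assigns higher conditional probability to the observed units.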
For both 2d and 3d Ising systems, we first train
single-temperature RBMs (T-RBMs). The M = 50000 Ising
configurations at each T, forming a dataset D_T, are used
to train one model, so that there are n_T T-RBMs in
total. While n_v = N, we try various numbers of hidden
units, with n_h = 400, 900, 1600, 2500 in 2d and
n_h = 400, 900, 1600 in 3d. For 2d systems, we also train
an all-temperature RBM (∀T-RBM), for which 50000
Ising configurations per temperature are drawn to compose
a dataset D_∀T of M = 50000 n_T = 8 × 10^5 samples.
The number of hidden units for this ∀T-RBM is
n_h = 400, 900, or 1600. The weight matrix W is initialized
with Glorot normal initialization [53] (b and c are
initialized to zero). Parameters are optimized with the