Thermodynamics of the Ising model encoded in restricted Boltzmann machines
Jing Gu1 and Kai Zhang1, 2, ∗
1Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan, Jiangsu, 215300, China
2Data Science Research Center (DSRC), Duke Kunshan University, Kunshan, Jiangsu, 215300, China
arXiv:2210.06203v1 [cond-mat.stat-mech] 12 Oct 2022
∗ kai.zhang@dukekunshan.edu.cn
The restricted Boltzmann machine (RBM) is a two-layer energy-based model that uses its hidden-
visible connections to learn the underlying distribution of visible units, whose interactions are often
complicated by high-order correlations. Previous studies on the Ising model of small system sizes
have shown that RBMs are able to accurately learn the Boltzmann distribution and reconstruct
thermal quantities at temperatures away from the critical point Tc. How the RBM encodes the
Boltzmann distribution and captures the phase transition are, however, not well explained. In
this work, we perform RBM learning of the 2d and 3d Ising
model and carefully examine how
several indicators derived from the weight matrix that could characterize the Ising phase transition.
We verify that the hidden encoding of a visible state tends to have an equal number of positive
and negative units, whose sequence is randomly assigned during training and can be inferred by
analyzing the weight matrix. We also explore the physical meaning of visible energy and loss
function (pseudo-likelihood) of the RBM and show that they could be harnessed to predict the
critical point or estimate physical quantities such as entropy.
I. INTRODUCTION
The tremendous success of deep learning in multiple
areas over the last decade has revived the interplay
between physics and machine learning, in particular
neural networks [1]. On one hand, (statistical) physics
ideas [2], such as renormalization group (RG) [3], en-
ergy landscape [4], free energy [5], glassy dynamics [6],
jamming [7], Langevin dynamics [8], and field theory [9],
shed some light on the interpretation of deep learning
and statistical inference in general [10]. On the other
hand, machine learning and deep learning tools are
harnessed to solve a wide range of physics problems, such as
interaction potential construction [11], phase transition
detection [12], structure encoding [13], physical concepts
discovery [14], and many others [15, 16]. At the very
intersection of these two fields lies the restricted Boltz-
mann machine (RBM) [17], which serves as a classical
paradigm to investigate how an overarching perspective
could benefit both sides.
The RBM uses hidden-visible connections to encode
(high-order) correlations between visible units [18]. Its
precursor, the (unrestricted) Boltzmann machine, was
inspired by spin glasses [19, 20] and is often used in the
inverse Ising problem to infer physical parameters [21–
23]. The restriction of hidden-hidden and visible-visible
connections in RBMs allows for more efficient training
algorithms, and therefore leads to recent applications in
Monte Carlo simulation acceleration [24], quantum wave-
function representation [25, 26], and polymer configura-
tion generation [27]. Deep neural networks formed by
stacks of RBMs have been mapped onto the variational
RG due to their conceptual similarity [28]. RBMs are
also shown to be equivalent to tensor network states from
quantum many-body physics [29]. As simple as it seems,
energy-based models like the RBM could eventually be-
come the building blocks of autonomous machine intelli-
gence [30].
Besides the above-mentioned efforts, the RBM has also
been applied extensively in the study of the minimal
model for second-order phase transitions, the Ising model.
For the small systems under investigation, it was found
that RBMs with a sufficient number of hidden units can
encode the Boltzmann distribution, reconstruct thermal
quantities, and generate new Ising configurations fairly
well [31–33]. The visible → hidden → visible → ···
generating sequence of the RBM can be mapped onto an RG
flow in physical temperature (often towards the critical
point) [34–36]. But the mechanism and power of the
RBM to capture physics concepts and principles have not
been fully explored. First, in what way is the Boltzmann
distribution of the Ising model learned by the RBM? Sec-
ond, can the RBM learn and even quantitatively predict
the phase transition without extra human knowledge?
An affirmative answer to the second question is partic-
ularly appealing, because simple unsupervised learning
methods such as principal component analysis (PCA) us-
ing configuration information alone do not provide quan-
titative prediction for the transition temperature [37, 38]
and supervised learning with neural networks requires
human labeling of the phase type or temperature of a
given configuration [39, 40].
In this article, we report a detailed numerical study
on RBM learning of the Ising model with a system size
much larger than those used previously. The purpose
is to thoroughly dissect the various parts of the RBM
and reveal how each part contributes to the learning of
the Boltzmann distribution of the input Ising configu-
rations. Such understanding allows us to extract sev-
eral useful machine-learning estimators or predictors for
physical quantities, such as entropy and phase transi-
tion temperature. Conversely, the analysis of a physical
model helps us to obtain important insights about
the meaning of RBM parameters and functions, such
as weight matrix, visible energy and pseudo-likelihood.
Below, we first introduce our Ising datasets, the RBM
and its training protocols in Sec. II. We then report and
discuss the results about model parameters, hidden lay-
ers, visible energy and pseudo-likelihood in Sec. III. Af-
ter the conclusion, more details about the Ising model
and the RBM are provided in the Appendices. Sample
code for the RBM is shared on GitHub at
https://github.com/Jing-DS/isingrbm.
II. MODELS AND METHODS
A. Dataset of Ising configurations generated by
Monte Carlo simulations
The Hamiltonian of the Ising model with N = L^d spins
in a configuration s = [s_1, s_2, ..., s_N]^T on a
d-dimensional hypercubic lattice of linear dimension L in
the absence of a magnetic field is

H(\mathbf{s}) = -J \sum_{\langle i,j \rangle} s_i s_j \qquad (1)

where the spin variable s_i = ±1 (i = 1, 2, ..., N), the
coupling parameter J > 0 (set to unity) favors
ferromagnetic configurations (parallel spins), and the notation
⟨i, j⟩ means summing over nearest-neighbor pairs [41]. At a
given temperature T, a configuration s drawn from the
sample space of 2^N states follows the Boltzmann distribution

p_T(\mathbf{s}) = \frac{e^{-H(\mathbf{s})/k_B T}}{Z_T} \qquad (2)

where Z_T = \sum_{\mathbf{s}} e^{-H(\mathbf{s})/k_B T} is the partition function.
The Boltzmann constant k_B is set to unity.
Using single-flip Monte Carlo simulations under
periodic boundary conditions [42], we generate Ising
configurations for two-dimensional (2d) systems (d = 2) of
L = 64 (N = 4096) at n_T = 16 temperatures
T = 0.25, 0.5, 0.75, 1.0, ..., 4.0 (in units of J/k_B) and for
three-dimensional (3d) systems (d = 3) of L = 16 (N = 4096)
at n_T = 20 temperatures T = 2.5, 2.75, 3.0, 3.25, 3.5, 3.75,
4.0, 4.25, 4.3, 4.4, 4.5, 4.6, 4.7, 4.75, 5.0, 5.25, 5.5, 5.75,
6.0, 6.25. After full equilibration, M = 50000 configurations
at each T are collected into a dataset D_T for that T. For
2d systems, we also use a dataset D_∀T consisting of
50000 configurations per temperature from all T's.
Analytical results for thermal quantities of the 2d
Ising model, such as the internal energy ⟨E⟩, (physical)
entropy S, heat capacity C_V and magnetization ⟨m⟩, are
well known [43–46]. Numerical simulation methods and
results for the 3d Ising model have also been
reported [47]. Thermodynamic definitions and relations
used in this work are summarized in Appendix A.
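As an illustration of this data-generation step, a minimal single-flip Metropolis sweep for the 2d model under periodic boundary conditions might look as follows. This is a sketch, not the authors' simulation code; the temperature and sweep counts here are illustrative, and production runs would use far longer equilibration.

```python
import numpy as np

def metropolis_sweep(s, T, rng):
    """One Monte Carlo sweep (L*L attempted single-spin flips) of the
    2d Ising model with J = 1, k_B = 1, and periodic boundaries."""
    L = s.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum of the four nearest neighbors (periodic wrap via modulo).
        nn = (s[(i + 1) % L, j] + s[(i - 1) % L, j]
              + s[i, (j + 1) % L] + s[i, (j - 1) % L])
        dE = 2.0 * s[i, j] * nn  # energy cost of flipping spin (i, j)
        # Metropolis acceptance: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / T).
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i, j] = -s[i, j]
    return s

rng = np.random.default_rng(0)
s = rng.choice([-1, 1], size=(64, 64))  # random initial configuration
for _ in range(20):  # illustrative; equilibration needs many more sweeps
    s = metropolis_sweep(s, T=2.0, rng=rng)
```

After equilibration, configurations sampled at regular intervals would be collected into the dataset D_T.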
FIG. 1. A restricted Boltzmann machine (RBM) with n_h = 6
hidden units and n_v = 9 visible units. Model parameters
θ = {W, b, c} are represented by connections. A filter w_1^T
from the visible units to the first hidden unit is highlighted
by red (light color) connections.
B. Restricted Boltzmann Machine (RBM)
The restricted Boltzmann machine (RBM) is a two-layer
energy-based model with n_h hidden units (or neurons)
h_i = ±1 (i = 1, 2, ..., n_h) in the hidden layer, whose
state vector is h = [h_1, h_2, ..., h_{n_h}]^T, and n_v
visible units v_j = ±1 (j = 1, 2, ..., n_v) in the visible
layer, whose state vector is v = [v_1, v_2, ..., v_{n_v}]^T
(Fig. 1) [48]. In this work, the visible layer is just the
Ising configuration vector, i.e. v = s, with n_v = N. We
choose binary units {−1, +1} (instead of {0, 1}) to better
align with the definition of the Ising spin variable s_i.
The total energy E_θ(v, h) of the RBM is defined as

E_\theta(\mathbf{v},\mathbf{h}) = -\mathbf{b}^T\mathbf{v} - \mathbf{c}^T\mathbf{h} - \mathbf{h}^T W \mathbf{v}
 = -\sum_{j=1}^{n_v} b_j v_j - \sum_{i=1}^{n_h} c_i h_i - \sum_{i=1}^{n_h}\sum_{j=1}^{n_v} W_{ij} h_i v_j \qquad (3)

where b = [b_1, b_2, ..., b_{n_v}]^T is the visible bias,
c = [c_1, c_2, ..., c_{n_h}]^T is the hidden bias, and

W_{n_h \times n_v} = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_{n_h}^T \end{bmatrix}
 = \begin{bmatrix} | & | & & | \\ \mathbf{w}_{:,1} & \mathbf{w}_{:,2} & \cdots & \mathbf{w}_{:,n_v} \\ | & | & & | \end{bmatrix} \qquad (4)

is the interaction weight matrix between visible and hidden
units. Under this notation, each row vector w_i^T (of
dimension n_v) is a filter mapping from the visible state
v to hidden unit i, and each column vector w_{:,j} (of
dimension n_h) is an inverse filter mapping from the hidden
state h to visible unit j. All parameters are collectively
written as θ = {W, b, c}. "Restricted" refers to the lack
of interactions between hidden units or between visible
units.
The joint distribution for an overall state (v, h) is

p_\theta(\mathbf{v},\mathbf{h}) = \frac{e^{-E_\theta(\mathbf{v},\mathbf{h})}}{Z_\theta} \qquad (5)

where the partition function of the RBM is

Z_\theta = \sum_{\mathbf{v}}\sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v},\mathbf{h})}. \qquad (6)
The learned model distribution for the visible state v
follows from marginalization of p_θ(v, h),

p_\theta(\mathbf{v}) = \sum_{\mathbf{h}} p_\theta(\mathbf{v},\mathbf{h}) = \frac{1}{Z_\theta} e^{-E_\theta(\mathbf{v})}, \qquad (7)

where the visible energy, an effective energy for the visible
state v (often termed "free energy" in the machine-learning
literature),

E_\theta(\mathbf{v}) = -\mathbf{b}^T\mathbf{v} - \sum_{i=1}^{n_h} \ln\left( e^{-(\mathbf{w}_i^T\mathbf{v} + c_i)} + e^{\mathbf{w}_i^T\mathbf{v} + c_i} \right) \qquad (8)

is defined according to e^{-E_\theta(\mathbf{v})} = \sum_{\mathbf{h}} e^{-E_\theta(\mathbf{v},\mathbf{h})}, such that
Z_\theta = \sum_{\mathbf{v}} e^{-E_\theta(\mathbf{v})}. See Appendix B for a detailed derivation.
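Equation (8) is straightforward to evaluate numerically. A small sketch (the function name is ours) that rewrites each term as a log-sum-exp to avoid overflow for large weights:

```python
import numpy as np

def visible_energy(v, W, b, c):
    """Visible (effective) energy of Eq. (8) for ±1 hidden units:
    E(v) = -b^T v - sum_i ln( e^{-(w_i^T v + c_i)} + e^{+(w_i^T v + c_i)} ).
    Since e^a + e^{-a} = 2 cosh(a), this equals -b^T v - sum_i [ln 2 + ln cosh(a_i)]."""
    a = W @ v + c
    # np.logaddexp(a, -a) = ln(e^a + e^{-a}), computed without overflow
    return -(b @ v) - np.sum(np.logaddexp(a, -a))
```

As a sanity check, with W = 0, b = 0, c = 0 every hidden term contributes ln 2, so E_θ(v) = −n_h ln 2 for any v.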
The conditional distributions to generate h from v,
p_θ(h|v), and to generate v from h, p_θ(v|h), which satisfy
p_θ(v, h) = p_θ(h|v) p_θ(v) = p_θ(v|h) p_θ(h), can be written
as products

p_\theta(\mathbf{h}|\mathbf{v}) = \prod_{i=1}^{n_h} p_\theta(h_i|\mathbf{v}), \qquad p_\theta(\mathbf{v}|\mathbf{h}) = \prod_{j=1}^{n_v} p_\theta(v_j|\mathbf{h}) \qquad (9)
because the h_i are independent of each other (at fixed v)
and the v_j are independent of each other (at fixed h). It
can be shown that

p_\theta(h_i = 1|\mathbf{v}) = \sigma\big(2(c_i + \mathbf{w}_i^T\mathbf{v})\big), \qquad p_\theta(h_i = -1|\mathbf{v}) = 1 - \sigma\big(2(c_i + \mathbf{w}_i^T\mathbf{v})\big)
p_\theta(v_j = 1|\mathbf{h}) = \sigma\big(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\big), \qquad p_\theta(v_j = -1|\mathbf{h}) = 1 - \sigma\big(2(b_j + \mathbf{h}^T\mathbf{w}_{:,j})\big) \qquad (10)

where σ(z) = 1/(1 + e^{-z}) is the sigmoid function (Appendix B).
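The factorized conditionals of Eqs. (9) and (10) give a simple block Gibbs sampler for ±1 units. A sketch (helper names are ours; the toy sizes n_v = 9 and n_h = 6 follow Fig. 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v, W, c, rng):
    """Sample h_i in {-1,+1} with p(h_i = 1 | v) = sigma(2(c_i + w_i^T v))."""
    p = sigmoid(2.0 * (c + W @ v))
    return np.where(rng.random(p.shape) < p, 1, -1)

def sample_v_given_h(h, W, b, rng):
    """Sample v_j in {-1,+1} with p(v_j = 1 | h) = sigma(2(b_j + h^T w_{:,j}))."""
    p = sigmoid(2.0 * (b + W.T @ h))
    return np.where(rng.random(p.shape) < p, 1, -1)

# One v -> h -> v' Gibbs step on a toy model.
rng = np.random.default_rng(1)
nv, nh = 9, 6
W = 0.1 * rng.standard_normal((nh, nv))
b, c = np.zeros(nv), np.zeros(nh)
v = rng.choice([-1, 1], size=nv)
h = sample_h_given_v(v, W, c, rng)
v_new = sample_v_given_h(h, W, b, rng)
```

Note the factor of 2 inside the sigmoid, which appears only because the units take values ±1 rather than {0, 1}.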
C. Loss function and training of RBMs
Given the dataset D = [v_1, v_2, ..., v_M]^T of M samples
drawn independently from the identical data distribution
p_D(v) (v ~ p_D(v), i.i.d.), the goal of RBM learning
is to find a model distribution p_θ(v) that approximates
p_D(v). In the context of this work, the data samples v
are Ising configurations and the data distribution p_D(v)
is, or is related to, the Ising Boltzmann distribution p_T(s).
Based on maximum likelihood estimation, the optimal
parameters θ* = arg min_θ L(θ) can be found by minimizing
the negative log-likelihood

\mathcal{L}(\theta) = \langle -\ln p_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} = \langle E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} + \ln Z_\theta \qquad (11)
which serves as the loss function of RBM learning. Note
that the partition function Z_θ depends only on the model,
not on the data. Since the calculation of Z_θ involves a
summation over all possible (v, h) states, which is not
feasible, L(θ) cannot be evaluated exactly, except for very
small systems [49]. Approximations have to be made, for
example, by mean-field calculations [50]. An interesting
feature of the RBM is that, although the actual loss
function L(θ) is not accessible, its gradient
\nabla_\theta \mathcal{L}(\theta) = \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_D} - \langle \nabla_\theta E_\theta(\mathbf{v}) \rangle_{\mathbf{v}\sim p_\theta} \qquad (12)

can be sampled, which enables a gradient-descent learning
algorithm. From step t to step t + 1, the model parameters
are updated with learning rate η as

\theta_{t+1} = \theta_t - \eta \nabla_\theta \mathcal{L}(\theta_t). \qquad (13)
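In practice, the model expectation in Eq. (12) is often approximated by contrastive divergence (CD-k), i.e. k steps of block Gibbs sampling started from the data; this excerpt does not pin down the authors' choice, so the sketch below is one common possibility (function name and hyperparameters are illustrative). For ±1 hidden units, E[h_i | v] = tanh(c_i + w_i^T v), which is used for the data-dependent term.

```python
import numpy as np

def cd_k_update(v_data, W, b, c, eta=0.01, k=1, rng=None):
    """One CD-k gradient-descent update (Eqs. 12-13) for an RBM with ±1 units."""
    if rng is None:
        rng = np.random.default_rng()
    # Positive phase: expectations under the data distribution.
    h_data = np.tanh(c + W @ v_data)
    # Negative phase: k steps of block Gibbs sampling, started from the data.
    v = np.array(v_data, dtype=float)
    for _ in range(k):
        p_h = 1.0 / (1.0 + np.exp(-2.0 * (c + W @ v)))
        h = np.where(rng.random(p_h.shape) < p_h, 1.0, -1.0)
        p_v = 1.0 / (1.0 + np.exp(-2.0 * (b + W.T @ h)))
        v = np.where(rng.random(p_v.shape) < p_v, 1.0, -1.0)
    h_model = np.tanh(c + W @ v)
    # Since dE/dW_ij = -h_i v_j, dE/db_j = -v_j, dE/dc_i = -h_i, the descent
    # step theta <- theta - eta * grad L becomes "data term minus model term":
    W += eta * (np.outer(h_data, v_data) - np.outer(h_model, v))
    b += eta * (np.asarray(v_data, dtype=float) - v)
    c += eta * (h_data - h_model)
    return W, b, c

rng = np.random.default_rng(2)
nv, nh = 9, 6
W = 0.01 * rng.standard_normal((nh, nv))
b, c = np.zeros(nv), np.zeros(nh)
v_sample = rng.choice([-1.0, 1.0], size=nv)
W, b, c = cd_k_update(v_sample, W, b, c, eta=0.05, k=1, rng=rng)
```

In real training the update would average the two phases over a minibatch rather than a single configuration.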
To evaluate the loss function, we use its approximation,
the pseudo-(negative log)likelihood [51],

\tilde{\mathcal{L}}(\theta) = \Big\langle -\sum_{i=1}^{n_v} \ln p_\theta(v_i \,|\, v_{j\neq i}) \Big\rangle_{\mathbf{v}\sim p_D} \approx \mathcal{L}(\theta) \qquad (14)
where the notation

p_\theta(v_i \,|\, v_{j\neq i}) = p_\theta(v_i \,|\, v_j \text{ for } j\neq i)
 = \frac{e^{-E_\theta(\mathbf{v})}}{e^{-E_\theta(\mathbf{v})} + e^{-E_\theta([v_1,\cdots,-v_i,\cdots,v_{n_v}])}} \qquad (15)
is the conditional probability for component v_i given that
all the other components v_j (j ≠ i) are fixed. Practically,
to avoid the time-consuming sum over all visible units
\sum_{i=1}^{n_v}, it is suggested to randomly sample one
index i_0 ∈ {1, 2, ..., n_v} and estimate

\tilde{\mathcal{L}}(\theta) \approx \langle -n_v \ln p_\theta(v_{i_0} \,|\, v_{j\neq i_0}) \rangle_{\mathbf{v}\sim p_D}, \qquad (16)
if all the visible units are on average translation-invariant [52].
To monitor the reconstruction error, we also calculate
the cross entropy CE between the initial configuration v
and the conditional probability p_θ(v′|h) for the
reconstruction v → p_θ(h|v) → h → p_θ(v′|h) → v′ (see
Appendix C for the definition).
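The estimate of Eqs. (14)-(16) reduces to two visible-energy evaluations per sample, one with a randomly chosen unit flipped, since p_θ(v_i | v_{j≠i}) = σ(E_θ(v_flip) − E_θ(v)). A sketch under the same conventions (function names are ours):

```python
import numpy as np

def visible_energy(v, W, b, c):
    """Visible energy of Eq. (8) for ±1 hidden units, in stable log-sum-exp form."""
    a = W @ v + c
    return -(b @ v) - np.sum(np.logaddexp(a, -a))

def pseudo_loglik_estimate(batch, W, b, c, rng):
    """Eq. (16): for each sample, flip one random visible unit i0 and
    accumulate -n_v * ln p(v_i0 | v_{j != i0})."""
    n_v = batch.shape[1]
    total = 0.0
    for v in batch:
        i0 = rng.integers(n_v)
        v_flip = v.copy()
        v_flip[i0] = -v_flip[i0]
        E = visible_energy(v, W, b, c)
        E_flip = visible_energy(v_flip, W, b, c)
        # p(v_i0 | rest) = e^{-E} / (e^{-E} + e^{-E_flip}) = sigma(E_flip - E),
        # so ln p = -ln(1 + e^{E - E_flip}), computed stably with logaddexp.
        log_p = -np.logaddexp(0.0, E - E_flip)
        total += -n_v * log_p
    return total / len(batch)

rng = np.random.default_rng(3)
nv, nh = 9, 6
W = 0.1 * rng.standard_normal((nh, nv))
b, c = np.zeros(nv), np.zeros(nh)
batch = rng.choice([-1.0, 1.0], size=(5, nv))
pl = pseudo_loglik_estimate(batch, W, b, c, rng)
```

Because each term is −n_v ln p with p ≤ 1, the estimate is non-negative and decreases as the model assigns higher conditional probability to the observed units.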
For both 2d and 3d Ising systems, we first train
single-temperature RBMs (T-RBMs). The M = 50000 Ising
configurations at each T, forming a dataset D_T, are used
to train one model, so that there are n_T T-RBMs in
total. While n_v = N, we try various numbers of hidden
units, with n_h = 400, 900, 1600, 2500 in 2d and
n_h = 400, 900, 1600 in 3d. For 2d systems, we also train
an all-temperature RBM (∀T-RBM), for which 50000
Ising configurations per temperature are drawn to compose
a dataset D_∀T of M = 50000 n_T = 8 × 10^5 samples.
The number of hidden units for this ∀T-RBM is
n_h = 400, 900, or 1600. The weight matrix W is initialized
with Glorot normal initialization [53] (b and c are
initialized to zero). Parameters are optimized with the