Ab initio Canonical Sampling based on Variational Inference
Aloïs Castellano,1 François Bottin,1 Johann Bouchet,2 Antoine Levitt,3 and Gabriel Stoltz3
1CEA, DAM, DIF, F-91297 Arpajon, France, and Université Paris-Saclay, CEA,
Laboratoires des Matériaux en Conditions Extrêmes, 91680 Bruyères-le-Châtel, France.
2CEA, DES, IRESNE, DEC, Cadarache, F-13018 St Paul Les Durance, France.
3CERMICS, École des Ponts, Marne-la-Vallée, France
MATHERIALS team-project, Inria Paris, France
Finite temperature calculations, based on ab initio molecular dynamics (AIMD) simulations, are
a powerful tool able to predict material properties that cannot be deduced from ground state calcu-
lations. However, the high computational cost of AIMD limits its applicability for large or complex
systems. To circumvent this limitation we introduce a new method named Machine Learning As-
sisted Canonical Sampling (MLACS), which accelerates the sampling of the Born–Oppenheimer
potential surface in the canonical ensemble. Based on a self-consistent variational procedure, the
method iteratively trains a Machine Learning Interatomic Potential to generate configurations that
approximate the canonical distribution of positions associated with the ab initio potential energy.
By demonstrating the reliability of the method on anharmonic systems, we show that
it reproduces the results of AIMD with ab initio accuracy at a fraction of the computational cost.
Molecular dynamics (MD) simulations, alongside Monte Carlo calculations [1], are nowadays a popular way to obtain finite-temperature properties and to explore the phase diagram of materials. While the seminal works of Alder [2–4], Rahman [5,6] and coworkers only treated classical potentials, all the powerful tools introduced in these papers were later reused in ab initio molecular dynamics (AIMD) codes. In particular, by using density functional theory (DFT) [7,8] and the Born–Oppenheimer (BO) [9] approximation, and by assuming that the ground-state electronic density is obtained at each MD time step [10,11], all the ideas proposed previously carry over.
However, the high computational cost, due to both the
evaluation of the Hellmann–Feynman forces using DFT
and the high number of MD time steps required to sample the BO potential surface in the canonical ensemble,
limits the range of systems that can be studied. To accel-
erate the computation of finite temperature properties,
two main strategies have been proposed. In the first one,
AIMD simulations are replaced by MD with ab initio-
based numerical potentials. In the second one, the sam-
pling of the canonical distribution is performed through
a direct generation of atomic configurations, bypassing
AIMD simulations.
Recent progress in the field of Machine Learning In-
teratomic Potentials (MLIP) [12–16] promises an accel-
eration of finite-temperature studies with a near-DFT
accuracy. This accuracy is ensured by the flexibility of MLIPs, which enables them to reproduce a large variety of BO surfaces. However, the construction of an MLIP is a tedious task, as MLIPs show poor extrapolative capabilities. Consequently, the set of configurations used to train an MLIP must be carefully constructed, which has led to the development of learn-on-the-fly molecular dynamics [17–19] and of other active-learning-based dataset selection methods [20–22]. Moreover, typical finite-temperature studies using an MLIP shift the studied system from the DFT description to the MLIP one, with atomic positions (and computed properties) distributed according to the Boltzmann weights associated with the MLIP potential. Consequently, one may need to ensure that MLIP simulations do not operate in the extrapolative regime. This verification can be done by computing corrections from free-energy perturbation methods [23], which, however, requires additional DFT single-point calculations.
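In its simplest form, such a free-energy perturbation correction estimates the shift between the MLIP and DFT free energies as ΔF = −kBT ln⟨exp(−β ΔE)⟩, averaged over MLIP-sampled configurations. The following is a minimal sketch of this generic reweighting estimate; the function name and inputs are illustrative and do not reproduce the specific workflow of Ref. [23].

```python
import numpy as np

def fep_correction(e_dft, e_mlip, temperature_K):
    """Free-energy perturbation estimate of the MLIP -> DFT correction,
    Delta F = -kB*T * ln < exp(-beta * (E_DFT - E_MLIP)) >_MLIP,
    from single-point energies (in eV) on MLIP-sampled configurations."""
    kB = 8.617333262e-5  # Boltzmann constant in eV/K
    beta = 1.0 / (kB * temperature_K)
    de = np.asarray(e_dft) - np.asarray(e_mlip)
    # Shift by the minimum for numerical stability of the exponential average
    de_min = de.min()
    return de_min - np.log(np.mean(np.exp(-beta * (de - de_min)))) / beta

# Illustrative numbers: small energy differences (eV) at 300 K
delta_f = fep_correction([0.010, 0.020, 0.015], [0.0, 0.0, 0.0], 300.0)
```

By Jensen's inequality the estimate lies below the arithmetic mean of the energy differences, and the exponential average is dominated by the configurations where the MLIP overestimates the DFT energy the least.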
Several groups have recently proposed methods to by-
pass AIMD and generate configurations by adapting the
Self-Consistent Harmonic Approximation (SCHA) [24–
27] to modern ab initio methods, using a variational inference strategy. Among those, we can cite the stochastic Temperature Dependent Effective Potential [28],
the Stochastic SCHA [29–31] and the Quantum Self-
Consistent Ab Initio Lattice Dynamics [32]. Within
those methods (named EHCS in the following, for Effec-
tive Harmonic Canonical Sampling), configurations are
generated with displacements around equilibrium posi-
tions according to a distribution corresponding to an ef-
fective harmonic Hamiltonian. The self-consistent (SC)
construction of this Hamiltonian, based on a variational
procedure, makes it possible to include explicit temperature effects.
However, the harmonic form of the effective potential
means that the atomic displacements follow a Gaussian
distribution. This can induce deviations from the actual distributions of displacements in highly anharmonic solids or close to the melting temperature, and it makes this approach completely inapplicable to liquids.
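The Gaussian sampling underlying EHCS-type methods can be sketched, in its classical limit, as drawing normal-mode amplitudes from the effective harmonic Hamiltonian and mapping them back to Cartesian displacements. This is a generic illustration only: the function and variable names are ours, units assume eV, Å and atomic mass units, and the cited methods actually use the quantum (coth-weighted) mode occupations rather than classical equipartition.

```python
import numpy as np

def harmonic_sample(eq_positions, force_constants, masses, temperature_K,
                    n_conf, rng):
    """Draw n_conf configurations from the classical canonical distribution
    of an effective harmonic Hamiltonian: Gaussian displacements whose
    covariance follows from the eigenmodes of the mass-weighted
    force-constant matrix (positions in Angstrom, force constants in
    eV/Angstrom^2, masses in atomic mass units)."""
    kB = 8.617333262e-5  # Boltzmann constant in eV/K
    n_atoms = len(masses)
    m = np.repeat(masses, 3)  # one mass per Cartesian degree of freedom
    # Mass-weighted dynamical matrix and its eigenmodes
    dyn = force_constants / np.sqrt(np.outer(m, m))
    omega2, modes = np.linalg.eigh(dyn)
    # Keep only stable modes (drops the three translations, omega^2 ~ 0)
    stable = omega2 > 1e-8
    # Classical equipartition: mode amplitude variance is kB*T / omega^2
    sigma = np.sqrt(kB * temperature_K / omega2[stable])
    configs = []
    for _ in range(n_conf):
        q = rng.normal(0.0, sigma)                # normal-mode amplitudes
        u = (modes[:, stable] @ q) / np.sqrt(m)   # Cartesian displacements
        configs.append(eq_positions + u.reshape(n_atoms, 3))
    return np.array(configs)
```

For a single atom in an isotropic well of stiffness k, the displacement variance reduces to kBT/k, which provides a quick sanity check of the sampling.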
In this Letter, we propose a new method named Ma-
chine Learning Assisted Canonical Sampling (MLACS),
which can be thought of as a generalization of the vari-
ational inference strategy used in the EHCS methods to
linear MLIPs. MLACS relies on an SC variational procedure to generate configurations that best approximate the DFT canonical distribution and obtain
a near-DFT accuracy in the properties computed at a
arXiv:2210.11531v1 [cond-mat.mtrl-sci] 20 Oct 2022