Hyperactive Learning for Data-Driven Interatomic Potentials
Cas van der Oord,1,* Matthias Sachs,2 Dávid Péter Kovács,1 Christoph Ortner,3 and Gábor Csányi1
1University of Cambridge, Cambridge, CB2 1PZ, U.K.
2University of Birmingham, Birmingham, B15 2TT, U.K.
3University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
* casv2@cam.ac.uk
(Dated: November 9, 2022)
Data-driven interatomic potentials have emerged as a powerful class of surrogate models for ab
initio potential energy surfaces that are able to reliably predict macroscopic properties with exper-
imental accuracy. In generating accurate and transferable potentials the most time-consuming and
arguably most important task is generating the training set, which still requires significant expert
user input. To accelerate this process, this work presents hyperactive learning (HAL), a framework
for formulating an accelerated sampling algorithm specifically for the task of training database gen-
eration. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics)
and add a biasing term that drives the system towards high uncertainty and thus to unseen training
configurations. Building on this framework, general protocols for building training databases for
alloys and polymers are presented. For alloys, ACE potentials for AlSi10 are created by fitting
to a minimal HAL-generated database containing 88 configurations (32 atoms each), with fast
evaluation times of <100 µs/atom/CPU core. These potentials are demonstrated to predict the
melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE
that is able to determine the density of a long polyethylene glycol (PEG) polymer formed of 200
monomer units with experimental accuracy, while fitting only to small isolated PEG polymers
ranging from 2 to 32 monomer units.
I. INTRODUCTION
Over the last decade there has been rapid
progress in the development of data-driven in-
teratomic potentials, see the review papers
[1–6]. Many systems are too complex
to be modelled by an empirical description,
yet inaccessible to electronic structure methods
due to prohibitive computational cost. Richly
parametrised data-driven interatomic poten-
tials bridge this gap and are able to successfully
describe the underlying chemistry and physics
by approximating the potential energy surface
(PES) with quantum mechanical accuracy [7–
9]. This approximation is done by regressing
a high-dimensional model to training data col-
lected from electronic structure calculations.
Over the years many approaches have been
explored using a range of different model architectures. These include Artificial Neural Net-
works (ANN) based on atom centered symme-
try functions [10] and have been used in models
such as ANI [11, 12] and DeepMD [13]. An-
other widely used approach is Gaussian Pro-
cess Regression (GPR) implemented in mod-
els such as SOAP/GAP [14, 15], FCHL [16]
and sGDML [17]. Linear approximations of the
PES have also been introduced initially by us-
ing permutation invariant polynomials (PIPs)
[18] and the more recent atomic PIPs vari-
ant [19, 20]. Other linear models include spec-
tral neighbour analysis potentials [21] based on
the bispectrum [22], moment tensor potentials
[23] and the atomic cluster expansion (ACE)
[24–26]. More recently, message passing neural
network (MPNN) architectures have been in-
troduced [27–34], the most recent of which have
been able to outperform any of the previously
mentioned models regarding accuracy on bench-
marks such as MD17 [35] and ISO17 [36]. Cen-
tral to all of these models is that they are fitted
to a training database comprised of configura-
tions $R$ labelled with observations comprising the total energy $E_R$, forces $F_R$ and perhaps virial stresses $V_R$, obtained from electronic structure simulations. By performing a regression on the training data, model predictions $E$ of the total energy, and estimates of the respective forces $F_i = -\nabla_i E$, can be determined. Here, the $\nabla_i$ operator denotes the gradient with respect to the position of atom $i$.
Building suitable training databases remains
a challenge and the most time-consuming task
in developing general data-driven interatomic
potentials [37–39]. Databases such as MD17
and ISO17 are typically created by performing
Molecular Dynamics (MD) simulations on the
structures of interest and selecting decorrelated
configurations along the trajectory. This ap-
proach samples the potential energy surface ac-
cording to its Boltzmann distribution. Once the
training database contains a sufficient number of
configurations, a high dimensional model may
be regressed in order to accurately interpolate
its potential energy surface. The interpolation
accuracy can be improved by further sampling,
albeit with diminishing returns. However, it is
by no means clear that the Boltzmann distribu-
tion is the optimal measure, or even a “good”
measure, from which to draw samples for an
ML training database. Indeed, it likely results
in severe undersampling of configurations corre-
sponding to defects and transition states, par-
ticularly for material systems with high barri-
ers, which nevertheless have a profound effect
on material properties and are often the sub-
ject of intense study.
A lack of training data in a sub-region can
lead to deep unphysical energy minima in
trained models, sometimes called “holes”, which
are well known to cause catastrophic problems
for MD simulations: the trajectory can get
trapped in these unphysical minima or even be-
come unstable numerically for normal step sizes.
A natural strategy to prevent such problems
is active learning (AL): the simulation is aug-
mented with a stopping criterion aimed at de-
tecting when the model encounters a configura-
tion for which the prediction is unreliable. In-
tuitively, one can think of such configurations
as being “far” from the training set. When this
situation occurs, a ground-truth evaluation is
triggered, the training database extended, and
the model refitted to the enlarged database. In
the context of data-driven interatomic poten-
tials, this approach was successfully employed
by the linear moment tensor potentials [40, 41]
and the Gaussian process (GP) based methods
FLARE [42, 43] and GAP [44] which both use
site energy uncertainty arising from the GP to
formulate a stopping criterion in order to detect
unreliable predictions during simulations.
The key contribution of this work is the intro-
duction of the hyperactive learning framework.
Rather than relying on normal MD to sample
the potential energy and wait until an unreli-
able prediction appears (which may take a very
long time once the model is decent), we contin-
ually bias the MD simulation towards regions
of high uncertainty. By balancing the physi-
cal MD driving force with such a bias we ac-
celerate the discovery of unreliably predicted
configurations but retain the overall focus on
low energy regions that are important for mod-
elling. This exploration-exploitation trade-off
originates from Bayesian Optimisation (BO), a
technique used to efficiently optimise a compu-
tationally expensive “black box” function [45].
BO has been shown to yield state-of-the-art re-
sults for optimisation problems while simultane-
ously minimising incurred computational costs
by requiring fewer evaluations [46]. In atomistic
systems BO has been applied in global struc-
ture search [47–50] where the PES is optimised
to find stable structures. Other previous work
balancing exploration and exploitation in data-
driven interatomic potentials is also closely re-
lated, where configurations were generated by
balancing high uncertainty and high-likelihood
(or rather low-energy) [51]. Here the PES was
explored by perturbing geometries while mon-
itoring uncertainty rather than explicitly run-
ning MD. Note that upon the completion of this
work, we discovered a closely related work that
also uses uncertainty-biased MD [52]. The two
studies were performed independently, and ap-
peared on preprint servers near-simultaneously.
In BO an acquisition function balances exploration and exploitation, controlled by a biasing parameter:

$$E_{\rm HAL} := E - \tau\sigma. \qquad (1)$$
The biasing strength, represented by biasing
parameter τ, controls the exploration of un-
seen parts of the PES and needs to be carefully
tuned in order for the HAL-MD trajectory to re-
main energetically sensible. An on-the-fly auto-
tuning of $\tau$ is presented in the Methods section.
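To make the biased dynamics concrete, the sketch below performs one velocity-Verlet step on the HAL energy of Eq. (1), i.e. with the force $-\nabla(E - \tau\sigma) = -\nabla E + \tau\nabla\sigma$. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `force` and `sigma_grad` are hypothetical callables standing in for the fitted model force and the gradient of its uncertainty.

```python
import numpy as np

def hal_md_step(x, v, masses, force, sigma_grad, tau, dt):
    """One velocity-Verlet step on the HAL energy E - tau*sigma (Eq. (1)).

    x, v : (N, 3) positions and velocities; masses : (N,);
    force(x) returns the model force -grad(E); sigma_grad(x) returns grad(sigma).
    Both callables are placeholders for the fitted model and its uncertainty.
    """
    # HAL force: -grad(E - tau*sigma) = -grad(E) + tau*grad(sigma)
    a = (force(x) + tau * sigma_grad(x)) / masses[:, None]
    v_half = v + 0.5 * dt * a
    x_new = x + dt * v_half
    a_new = (force(x_new) + tau * sigma_grad(x_new)) / masses[:, None]
    v_new = v_half + 0.5 * dt * a_new
    return x_new, v_new
```

The sign of the bias matters: subtracting $\tau\sigma$ from the energy adds $+\tau\nabla\sigma$ to the force, pushing the trajectory towards regions of higher uncertainty.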
The addition of a biasing potential, accelerating
the exploration of relevant configurations, has
a long history in the study of rare events and
free energy computations, using adaptive bias-
ing strategies such as meta-dynamics [53, 54],
umbrella sampling [55, 56], and similar methods
(e.g., [57, 58]). While the biasing force in these
methods is implicitly specified by the choice of
a collective variable, the direction of the biasing
force in HAL is the result of the choice of the
uncertainty measure σ.
We make the general HAL concept concrete
in the context of the ACE “machine learning po-
tential” framework [24, 25], however, the meth-
ods we propose are immediately applicable to lin-
ear models and to Gaussian process type mod-
els, and are in principle also extendable to any
other ML potential that comes with an uncer-
tainty measure, including deep neural network
models. In the context of linear ACE models,
described in detail in the methods section, the
site energy is defined as a linear combination of
basis functions,
$$E_i = c \cdot B_i, \qquad (2)$$

and the total energy $E = \sum_i E_i = c \cdot B$, where $B = \sum_i B_i$.
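Since the model is linear, evaluating Eq. (2) reduces to dot products once the basis has been computed. The sketch below assumes the per-site basis vectors are available as a NumPy array; names and shapes are illustrative, not from the ACE codebase.

```python
import numpy as np

def ace_energies(c, B_sites):
    """Linear ACE evaluation of Eq. (2): E_i = c . B_i, E = c . B.

    c : (N_basis,) fitted coefficients;
    B_sites : (N_atoms, N_basis) per-site basis evaluations (illustrative).
    """
    E_sites = B_sites @ c            # site energies E_i = c . B_i
    E_total = E_sites.sum()          # E = sum_i E_i = c . B, with B = sum_i B_i
    return E_total, E_sites
```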
The prediction of the uncertainty σcan, for
example, be obtained through the use of an en-
semble. Different methods of setting up such
ensembles for linear, GP or NN frameworks can
be used, such as dropout [59], or bootstrapping
[60]. In this work, we leverage the linearity
of the ACE model and adopt a Bayesian view
of the regression problem so that we are able
to use unbiased uncertainty estimation. The
drawback of analytical estimates of uncertainty is that they are often expensive to compute,
which would preclude their evaluation at every
MD time step, as needed by HAL. We circum-
vent this problem by setting up a committee
based estimator for the unbiased Bayesian un-
certainty measure, which yields an efficient al-
gorithm with negligible overhead on top of ordi-
nary MD. Assuming an isotropic Gaussian prior
on the model parameters and Gaussian indepen-
dent and identically distributed (i.i.d) noise on
observations, yields an explicit posterior distri-
bution π(c) of the parameters from which one
can deduce the variance $\sigma_E^2$ of the posterior-predictive distribution of total energies,

$$\sigma_E^2 = \frac{1}{\lambda} + B^T \Sigma B, \qquad (3)$$

where the covariance matrix $\Sigma$ is defined as

$$\Sigma^{-1} = \alpha I + \lambda \Psi^T \Psi. \qquad (4)$$
Here, α, λ are hyperparameters whose treat-
ment is detailed in the methods section, and Ψ
is the corresponding design matrix of the linear
regression problem and depends on the obser-
vations to which the ACE model is fitted.
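A minimal dense-linear-algebra sketch of Eqs. (3) and (4) follows; variable names are illustrative, and a production implementation would factorise $\Sigma^{-1}$ once after fitting rather than solving from scratch at every query.

```python
import numpy as np

def exact_energy_variance(B, Psi, alpha, lam):
    """Exact posterior-predictive variance, Eqs. (3)-(4).

    B : (N_basis,) summed basis of the query configuration;
    Psi : (N_obs, N_basis) design matrix of the linear fit;
    alpha, lam : prior and noise precision hyperparameters.
    """
    Sigma_inv = alpha * np.eye(Psi.shape[1]) + lam * Psi.T @ Psi  # Eq. (4)
    x = np.linalg.solve(Sigma_inv, B)   # x = Sigma @ B, without forming the inverse
    return 1.0 / lam + B @ x            # Eq. (3)
```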
The evaluation of the uncertainty or variance
$\sigma_E^2$ in equation (3) is computationally expensive for a large basis $B$, scaling as $O(N_{\rm basis}^2)$. To improve computational efficiency, $\sigma_E^2$ can be approximated by using an ensemble $\{c_k\}_{k=1}^{K}$ obtained by sampling from the posterior $\pi(c)$ (see Methods for further details), resulting in

$$\tilde\sigma_E^2 = \frac{1}{\lambda} + \frac{1}{K} \sum_{k=1}^{K} (E_k - \bar E)^2, \qquad (5)$$

where $\bar E = \bar c \cdot B$, with $\bar c$ being the mean
of the posterior distribution whose closed form
is provided in (22) of the methods section. This
is computationally efficient to evaluate, requir-
ing a single basis evaluation $B$ followed by $K$
dot-products with the ensemble parameters.
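A hypothetical sketch of this committee estimator is given below. The committee $\{c_k\}$ would be drawn once, immediately after fitting, and reused for every subsequent query; it is sampled inline here only to keep the example self-contained.

```python
import numpy as np

def committee_variance(B, c_bar, Sigma, lam, K=32, rng=None):
    """Committee approximation (Eq. (5)) to the exact variance of Eq. (3).

    B : (N_basis,) summed basis; c_bar, Sigma : posterior mean and covariance;
    each query costs one basis evaluation plus K dot products.
    """
    rng = np.random.default_rng() if rng is None else rng
    C = rng.multivariate_normal(c_bar, Sigma, size=K)  # (K, N_basis) committee
    E_k = C @ B                                        # committee energies
    E_bar = c_bar @ B                                  # posterior-mean energy
    return 1.0 / lam + np.mean((E_k - E_bar) ** 2)     # Eq. (5)
```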
Throughout the remainder of this article we
will fix the choice of uncertainty measure in the
definition of the HAL energy to be the stan-
dard deviation of the posterior-predictive distri-
bution of energy as outlined above, i.e., $\sigma = \sigma_E$, which we approximate as $\tilde\sigma = \tilde\sigma_E$. From both a
theoretical and modelling perspective, it would
be of interest to consider other measures of un-
certainty as biasing terms. Further discussion of
this aspect is provided in the methods section.
Having introduced HAL-MD it remains to
specify a stopping criterion that can be used to
terminate the dynamics and extract new train-
ing configurations. To that end we introduce a
relative force uncertainty, $f_i$, which is attractive
from a modelling perspective, as for instance
liquid and phonon properties require vastly dif-
ferent absolute force accuracy but similar rel-
ative force accuracy, typically on the order of
3-10%. Given the model committee we intro-
duced to define $\tilde\sigma$, we define

$$f_i = \frac{\frac{1}{K}\sum_{k=1}^{K} \|F_i^k - \bar F_i\|}{\|\bar F_i\| + \varepsilon}, \qquad (6)$$

where $\bar F_i$ is the mean force prediction. Further, $\varepsilon$ is a regularising constant to prevent divergence of the fraction, and to be specified by the user, often set to around 0.2 eV/Å. During HAL simulations, $f_i$ provides a computationally
efficient means to detect emerging local (force)
uncertainties and trigger new ab initio calcula-
tions once it exceeds a predefined tolerance,
$$\max_i f_i > f_{\rm tol}. \qquad (7)$$
The specification of $f_{\rm tol}$ is both training data and model specific, and often requires careful tuning to achieve good performance: too low an $f_{\rm tol}$ keeps triggering unnecessary ab initio calculations, whereas too high a value leads to the generation of unphysical high-energy configurations. To avoid manual tuning and aid generality, we normalise $f_i$ onto $[0,1]$ through the application of the softmax function $s(f_i)$, resulting in the new stopping criterion

$$\max_i \frac{\exp f_i}{\sum_i \exp f_i} > s_{\rm tol}, \qquad (8)$$

where we use the default tolerance $s_{\rm tol} = 0.5$.
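Taken together, Eqs. (6)-(8) amount to a few array operations per HAL step. The sketch below is an illustration under the defaults quoted in the text ($\varepsilon = 0.2$ eV/Å, $s_{\rm tol} = 0.5$); array names and shapes are assumptions.

```python
import numpy as np

def should_stop(F_k, eps=0.2, s_tol=0.5):
    """Softmax stopping criterion of Eqs. (6)-(8).

    F_k : (K, N, 3) committee force predictions; eps (eV/A) regularises
    the denominator of Eq. (6).  Returns True once the softmax-normalised
    relative force uncertainty of any atom exceeds s_tol.
    """
    F_bar = F_k.mean(axis=0)                                 # (N, 3) mean forces
    dev = np.linalg.norm(F_k - F_bar, axis=2).mean(axis=0)   # (1/K) sum_k ||F_i^k - F_bar_i||
    f = dev / (np.linalg.norm(F_bar, axis=1) + eps)          # Eq. (6)
    s = np.exp(f - f.max())                                  # numerically stable softmax
    s /= s.sum()
    return bool(s.max() > s_tol)                             # Eq. (8)
```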
The paper is structured as follows. First, the ability of the relative force error measure $f_i$ to predict the true error is investigated, and its performance is benchmarked by assembling a reduced diamond-structure silicon database. Next, the HAL framework is used to build training databases for an alloy (AlSi10) and a polymer (polyethylene glycol, PEG) from scratch, and the ability of the resulting ACE models to accurately predict the AlSi10 melting temperature and the PEG density is demonstrated.
II. RESULTS AND DISCUSSION
A. Filtering an existing training set
Before illustrating the HAL algorithm itself,
we first demonstrate the ability of the relative
force error estimate $f_i$ in Eq. (6) to detect true
relative force errors. To that end, we use this estimator to significantly reduce a large training set while maintaining accurate model properties relative to the DFT reference. The database we use for this demonstration was originally developed for a Si GAP model [38] and covers a wide range of structures, from bulk crystals in various phases to amorphous, liquid and vacancy config-
urations. The filtering process builds a reduced
database by starting from a single configura-
tion and selecting configurations containing the
maximum $f_i$ from the remaining test configu-
rations. Iterating this process accelerates the
learning rate and rapidly converges model prop-
erties with respect to the DFT reference. The
models we train are linear ACE models with basis functions up to correlation order $\nu = 3$, polynomial degree 20, an outer cutoff of 5.5 Å, and an inner cutoff set to the closest interatomic distance in the training database. An auxiliary pair potential basis was used, with polynomial degree 3, an outer cutoff of 7.0 Å and no inner cutoff. The weights for the energy $w_E$, forces $w_F$ and virials $w_V$, which are described in detail in the Methods section, were set to 5.0/1.0/1.0. The size of the committees used to determine $f_i$ was $K = 32$.
FIG. 1: a) Correlation plots of the maximum relative force error estimate $\max_i f_i$ versus the true maximum relative force error for silicon diamond, for training databases containing 4 and 10 configurations. b) Learning rate comparison between filtering and random selection for silicon diamond.
1. Si diamond: error correlation and convergence
Prior to training database reduction, the ability of the relative force error estimate $f_i$ to predict the true relative force error is investigated. Fig. 1a
compares the maximum relative force error in
a configuration against the maximum of $f_i$ for
two different training databases, containing 4
and 10 silicon diamond configurations respec-
tively. The test configurations are the remaining ones among the 489 silicon diamond configurations of the entire silicon database (totalling 16,708 local environments). The regularising constant $\varepsilon$ was set to
the mean force magnitude as predicted by the
mean parameterisation. Both figures show good
correlation between maximum relative force er-
ror and $\max_i f_i$, therefore making it a suitable
criterion to be monitored during (H)AL strate-
gies.
By leveraging the correlation of $f_i$ with the true relative force error, the existing silicon diamond database can be reduced by iteratively selecting configurations containing the largest relative force uncertainty as part of a greedy selection strategy. To demonstrate this, a model was first fitted to a single configuration from the 489 silicon diamond configurations of the silicon database. Next, $f_i$ was determined over the remaining configurations and the configuration containing the largest $\max_i f_i$ added to the training database. This process was repeated; the train and test error of this filtering procedure for silicon diamond is shown in Fig. 1b. It is benchmarked
against performing random selection whereby,
starting from the same initial configuration, test
configurations were chosen at random from the
pool of remaining configurations. The result indicates that
$f_i$ accurately detects configurations with large
errors and manages to accelerate the learning
rate significantly relative to random selection.
Good generalisation between training and test
errors is achieved by using around 5% of the total environments contained in the original silicon
diamond database.
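The greedy filtering loop described above can be sketched as follows; `fit_committee` and `max_f` are hypothetical placeholders for the actual fitting code and the evaluation of $\max_i f_i$ (Eq. (6)), and the pool is copied so the caller's list is not mutated.

```python
import numpy as np

def greedy_filter(pool, fit_committee, max_f, n_select):
    """Greedy database reduction by maximum relative force uncertainty.

    pool : list of candidate configurations; fit_committee(train) returns
    a committee model; max_f(model, cfg) evaluates max_i f_i of Eq. (6).
    """
    pool = list(pool)                        # work on a copy
    train = [pool.pop(0)]                    # seed with a single configuration
    while len(train) < n_select and pool:
        model = fit_committee(train)         # refit to the current subset
        scores = [max_f(model, cfg) for cfg in pool]
        j = int(np.argmax(scores))           # most uncertain remaining config
        train.append(pool.pop(j))
    return train
```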
2. Si diamond: property convergence
The significant acceleration of the learning rate shown in Fig. 1b demonstrates that generalisation between train and test error is rapidly achieved, in turn suggesting that property convergence is accelerated too. This is investigated for elastic constants, energy-volume curves, the phonon spectrum and thermal properties of bulk silicon diamond.
Fig. 2 demonstrates that property convergence for the energy-volume curves, phonon spectrum and thermal properties is rapidly achieved by fitting to a fraction of the original database. Fitting to 5% of the original database is sufficient to describe all properties with good accuracy with respect to the DFT reference. This is again confirmed by the elastic constants as predicted by the respective models, shown in Table I. The convergence