Hyperactive Learning for Data-Driven Interatomic Potentials
Cas van der Oord,1,* Matthias Sachs,2 Dávid Péter Kovács,1 Christoph Ortner,3 and Gábor Csányi1
1University of Cambridge, Cambridge, CB2 1PZ, U.K.
2University of Birmingham, Birmingham, B15 2TT, U.K.
3University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
* casv2@cam.ac.uk
(Dated: November 9, 2022)
Data-driven interatomic potentials have emerged as a powerful class of surrogate models for ab
initio potential energy surfaces that are able to reliably predict macroscopic properties with exper-
imental accuracy. In generating accurate and transferable potentials the most time-consuming and
arguably most important task is generating the training set, which still requires significant expert
user input. To accelerate this process, this work presents hyperactive learning (HAL), a framework
for formulating an accelerated sampling algorithm specifically for the task of training database gen-
eration. The key idea is to start from a physically motivated sampler (e.g., molecular dynamics)
and add a biasing term that drives the system towards high uncertainty and thus to unseen training
configurations. Building on this framework, general protocols for building training databases for
alloys and polymers are presented. For alloys, ACE potentials for AlSi10 are created by fitting
to a minimal HAL-generated database containing 88 configurations (32 atoms each), with fast
evaluation times of <100 µs/atom/CPU core. These potentials are demonstrated to predict the
melting temperature with excellent accuracy. For polymers, a HAL database is built using ACE
that is able to determine the density of a long polyethylene glycol (PEG) polymer formed of 200
monomer units with experimental accuracy, while fitting only to small isolated PEG polymers
ranging from 2 to 32 monomer units.
I. INTRODUCTION
Over the last decade there has been rapid
progress in the development of data-driven in-
teratomic potentials, see the review papers
[1–6]. Many systems are too complex
to be modelled by an empirical description,
yet inaccessible to electronic structure methods
due to prohibitive computational cost. Richly
parametrised data-driven interatomic poten-
tials bridge this gap and are able to successfully
describe the underlying chemistry and physics
by approximating the potential energy surface
(PES) with quantum mechanical accuracy [7–
9]. This approximation is done by regressing
a high-dimensional model to training data col-
lected from electronic structure calculations.
Over the years many approaches have been
explored using a range of different model architectures. These include Artificial Neural Net-
works (ANN) based on atom centered symme-
try functions [10] and have been used in models
such as ANI [11, 12] and DeepMD [13]. An-
other widely used approach is Gaussian Pro-
cess Regression (GPR) implemented in mod-
els such as SOAP/GAP [14, 15], FCHL [16]
and sGDML [17]. Linear approximations of the
PES have also been introduced initially by us-
ing permutation invariant polynomials (PIPs)
[18] and the more recent atomic PIPs vari-
ant [19, 20]. Other linear models include spec-
tral neighbour analysis potentials [21] based on
the bispectrum [22], moment tensor potentials
[23] and the atomic cluster expansion (ACE)
[24–26]. More recently, message passing neural
network (MPNN) architectures have been in-
troduced [27–34], the most recent of which have
been able to outperform any of the previously
mentioned models regarding accuracy on bench-
marks such as MD17 [35] and ISO17 [36]. Cen-
tral to all of these models is that they are fitted
to a training database comprised of configura-
tions $R$ labelled with observations comprising the total energy $E_R$, forces $F_R$ and perhaps virial stresses $V_R$, obtained from electronic structure simulations. By performing a regression on the training data, model predictions $E$ of the total energy, and estimates of the respective forces $F_i = -\nabla_i E$, can be determined. Here, the $\nabla_i$ operator denotes the gradient with respect to the position of atom $i$.
Building suitable training databases remains
a challenge and the most time-consuming task
in developing general data-driven interatomic
potentials [37–39]. Databases such as MD17
and ISO17 are typically created by performing
Molecular Dynamics (MD) simulations on the
structures of interest and selecting decorrelated
configurations along the trajectory. This ap-
proach samples the potential energy surface ac-
cording to its Boltzmann distribution. Once the
training database contains a sufficient number of
configurations, a high dimensional model may
be regressed in order to accurately interpolate
its potential energy surface. The interpolation
accuracy can be improved by further sampling,
albeit with diminishing returns. However, it is
by no means clear that the Boltzmann distribu-
tion is the optimal measure, or even a “good”
measure, from which to draw samples for an
ML training database. Indeed, it likely results
in severe undersampling of configurations corre-
sponding to defects and transition states, par-
ticularly for material systems with high barri-
ers, which nevertheless have a profound effect
on material properties and are often the sub-
ject of intense study.
A lack of training data in a sub-region can
lead to deep unphysical energy minima in
trained models, sometimes called “holes”, which
are well known to cause catastrophic problems
for MD simulations: the trajectory can get
trapped in these unphysical minima or even be-
come unstable numerically for normal step sizes.
A natural strategy to prevent such problems
is active learning (AL): the simulation is aug-
mented with a stopping criterion aimed at de-
tecting when the model encounters a configura-
tion for which the prediction is unreliable. In-
tuitively, one can think of such configurations
as being “far” from the training set. When this
situation occurs, a ground-truth evaluation is
triggered, the training database extended, and
the model refitted to the enlarged database. In
the context of data-driven interatomic poten-
tials, this approach was successfully employed
by the linear moment tensor potentials [40, 41]
and the Gaussian process (GP) based methods
FLARE [42, 43] and GAP [44] which both use
site energy uncertainty arising from the GP to
formulate a stopping criterion in order to detect
unreliable predictions during simulations.
The key contribution of this work is the intro-
duction of the hyperactive learning framework.
Rather than relying on normal MD to sample
the potential energy and wait until an unreli-
able prediction appears (which may take a very
long time once the model is decent), we contin-
ually bias the MD simulation towards regions
of high uncertainty. By balancing the physi-
cal MD driving force with such a bias we ac-
celerate the discovery of unreliably predicted
configurations but retain the overall focus on
low energy regions that are important for mod-
elling. This exploration-exploitation trade-off
originates from Bayesian Optimisation (BO), a
technique used to efficiently optimise a compu-
tationally expensive “black box” function [45].
BO has been shown to yield state-of-the-art re-
sults for optimisation problems while simultane-
ously minimising incurred computational costs
by requiring fewer evaluations [46]. In atomistic
systems BO has been applied in global struc-
ture search [47–50] where the PES is optimised
to find stable structures. Other previous work
balancing exploration and exploitation in data-
driven interatomic potentials is also closely re-
lated, where configurations were generated by
balancing high uncertainty and high-likelihood
(or rather low-energy) [51]. Here the PES was
explored by perturbing geometries while mon-
itoring uncertainty rather than explicitly run-
ning MD. Note that upon the completion of this
work, we discovered a closely related work that
also uses uncertainty-biased MD [52]. The two
studies were performed independently, and ap-
peared on preprint servers near-simultaneously.
In BO an acquisition function balances exploration and exploitation, controlled by a biasing parameter:

$$E_{\rm HAL} := E - \tau\sigma. \qquad (1)$$
The biasing strength, represented by biasing
parameter τ, controls the exploration of un-
seen parts of the PES and needs to be carefully
tuned in order for the HAL-MD trajectory to re-
main energetically sensible. An on-the-fly auto-
tuning of $\tau$ is presented in the Methods section.
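To make the biased dynamics concrete, the sketch below performs one velocity-Verlet step on the HAL energy of Eq. (1), i.e. with the force $-\nabla(E - \tau\sigma) = -\nabla E + \tau\nabla\sigma$. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `force` and `sigma_grad` are hypothetical callables standing in for the fitted model force and the gradient of its uncertainty.

```python
import numpy as np

def hal_md_step(x, v, masses, force, sigma_grad, tau, dt):
    """One velocity-Verlet step on the HAL energy E - tau*sigma (Eq. (1)).

    x, v : (N, 3) positions and velocities; masses : (N,);
    force(x) returns the model force -grad(E); sigma_grad(x) returns grad(sigma).
    Both callables are placeholders for the fitted model and its uncertainty.
    """
    # HAL force: -grad(E - tau*sigma) = -grad(E) + tau*grad(sigma)
    a = (force(x) + tau * sigma_grad(x)) / masses[:, None]
    v_half = v + 0.5 * dt * a
    x_new = x + dt * v_half
    a_new = (force(x_new) + tau * sigma_grad(x_new)) / masses[:, None]
    v_new = v_half + 0.5 * dt * a_new
    return x_new, v_new
```

The sign of the bias matters: subtracting $\tau\sigma$ from the energy adds $+\tau\nabla\sigma$ to the force, pushing the trajectory towards regions of higher uncertainty.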
The addition of a biasing potential, accelerating
the exploration of relevant configurations, has
a long history in the study of rare events and
free energy computations, using adaptive bias-
ing strategies such as meta-dynamics [53, 54],
umbrella sampling [55, 56], and similar methods
(e.g., [57, 58]). While the biasing force in these
methods is implicitly specified by the choice of
a collective variable, the direction of the biasing
force in HAL is the result of the choice of the
uncertainty measure σ.
We make the general HAL concept concrete
in the context of the ACE “machine learning po-
tential” framework [24, 25], however, the meth-
ods we propose are immediately applicable to lin-
ear models and to Gaussian process type mod-
els, and are in principle also extendable to any
other ML potential that comes with an uncer-
tainty measure, including deep neural network
models. In the context of linear ACE models,
described in detail in the methods section, the
site energy is defined as a linear combination of
basis functions,
$$E_i = c \cdot B_i, \qquad (2)$$

and the total energy $E = \sum_i E_i = c \cdot B$, where $B = \sum_i B_i$.
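Since the model is linear, evaluating Eq. (2) reduces to dot products once the basis has been computed. The sketch below assumes the per-site basis vectors are available as a NumPy array; names and shapes are illustrative, not from the ACE codebase.

```python
import numpy as np

def ace_energies(c, B_sites):
    """Linear ACE evaluation of Eq. (2): E_i = c . B_i, E = c . B.

    c : (N_basis,) fitted coefficients;
    B_sites : (N_atoms, N_basis) per-site basis evaluations (illustrative).
    """
    E_sites = B_sites @ c            # site energies E_i = c . B_i
    E_total = E_sites.sum()          # E = sum_i E_i = c . B, with B = sum_i B_i
    return E_total, E_sites
```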
The prediction of the uncertainty σcan, for
example, be obtained through the use of an en-
semble. Different methods of setting up such
ensembles for linear, GP or NN frameworks can
be used, such as dropout [59], or bootstrapping
[60]. In this work, we leverage the linearity
of the ACE model and adopt a Bayesian view
of the regression problem so that we are able
to use unbiased uncertainty estimation. The
drawback of analytical estimates of uncertainty is that they are often expensive to compute,
which would preclude their evaluation at every
MD time step, as needed by HAL. We circum-
vent this problem by setting up a committee
based estimator for the unbiased Bayesian un-
certainty measure, which yields an efficient al-
gorithm with negligible overhead on top of ordi-
nary MD. Assuming an isotropic Gaussian prior
on the model parameters and Gaussian indepen-
dent and identically distributed (i.i.d) noise on
observations, yields an explicit posterior distri-
bution π(c) of the parameters from which one
can deduce the variance $\sigma_E^2$ of the posterior-predictive distribution of total energies,

$$\sigma_E^2 = \frac{1}{\lambda} + B^T \Sigma B, \qquad (3)$$

where the covariance matrix $\Sigma$ is defined as

$$\Sigma^{-1} = \alpha I + \lambda \Psi^T \Psi. \qquad (4)$$
Here, α, λ are hyperparameters whose treat-
ment is detailed in the methods section, and Ψ
is the corresponding design matrix of the linear
regression problem and depends on the obser-
vations to which the ACE model is fitted.
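A minimal dense-linear-algebra sketch of Eqs. (3) and (4) follows; variable names are illustrative, and a production implementation would factorise $\Sigma^{-1}$ once after fitting rather than solving from scratch at every query.

```python
import numpy as np

def exact_energy_variance(B, Psi, alpha, lam):
    """Exact posterior-predictive variance, Eqs. (3)-(4).

    B : (N_basis,) summed basis of the query configuration;
    Psi : (N_obs, N_basis) design matrix of the linear fit;
    alpha, lam : prior and noise precision hyperparameters.
    """
    Sigma_inv = alpha * np.eye(Psi.shape[1]) + lam * Psi.T @ Psi  # Eq. (4)
    x = np.linalg.solve(Sigma_inv, B)   # x = Sigma @ B, without forming the inverse
    return 1.0 / lam + B @ x            # Eq. (3)
```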
The evaluation of the uncertainty or variance
$\sigma_E^2$ in equation (3) is computationally expensive for a large basis $B$, scaling as $O(N_{\rm basis}^2)$. To improve computational efficiency, $\sigma_E^2$ can be approximated by using an ensemble $\{c_k\}_{k=1}^{K}$ obtained by sampling from the posterior $\pi(c)$ (see Methods for further details), resulting in

$$\tilde\sigma_E^2 = \frac{1}{\lambda} + \frac{1}{K} \sum_{k=1}^{K} (E_k - \bar E)^2, \qquad (5)$$

where $\bar E = \bar c \cdot B$, with $\bar c$ being the mean
of the posterior distribution whose closed form
is provided in (22) of the methods section. This
is computationally efficient to evaluate, requir-
ing a single basis evaluation $B$ followed by $K$
dot-products with the ensemble parameters.
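A hypothetical sketch of this committee estimator is given below. The committee $\{c_k\}$ would be drawn once, immediately after fitting, and reused for every subsequent query; it is sampled inline here only to keep the example self-contained.

```python
import numpy as np

def committee_variance(B, c_bar, Sigma, lam, K=32, rng=None):
    """Committee approximation (Eq. (5)) to the exact variance of Eq. (3).

    B : (N_basis,) summed basis; c_bar, Sigma : posterior mean and covariance;
    each query costs one basis evaluation plus K dot products.
    """
    rng = np.random.default_rng() if rng is None else rng
    C = rng.multivariate_normal(c_bar, Sigma, size=K)  # (K, N_basis) committee
    E_k = C @ B                                        # committee energies
    E_bar = c_bar @ B                                  # posterior-mean energy
    return 1.0 / lam + np.mean((E_k - E_bar) ** 2)     # Eq. (5)
```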
Throughout the remainder of this article we
will fix the choice of uncertainty measure in the
definition of the HAL energy to be the stan-
dard deviation of the posterior-predictive distri-
bution of energy as outlined above, i.e., $\sigma = \sigma_E$, which we approximate as $\tilde\sigma = \tilde\sigma_E$. From both a
theoretical and modelling perspective, it would
be of interest to consider other measures of un-
certainty as biasing terms. Further discussion of
this aspect is provided in the methods section.
Having introduced HAL-MD it remains to
specify a stopping criterion that can be used to
terminate the dynamics and extract new train-
ing configurations. To that end we introduce a
relative force uncertainty, $f_i$, which is attractive
from a modelling perspective, as for instance
liquid and phonon properties require vastly dif-
ferent absolute force accuracy but similar rel-
ative force accuracy, typically on the order of
3-10%. Given the model committee we intro-
duced to define $\tilde\sigma$, we define

$$f_i = \frac{\frac{1}{K}\sum_{k=1}^{K} \|F_i^k - \bar F_i\|}{\|\bar F_i\| + \varepsilon}, \qquad (6)$$

where $\bar F_i$ is the mean force prediction. Further, $\varepsilon$ is a regularising constant to prevent divergence of the fraction, and to be specified by the user, often set to around 0.2 eV/Å. During HAL simulations, $f_i$ provides a computationally
efficient means to detect emerging local (force)
uncertainties and trigger new ab initio calcula-
tions once it exceeds a predefined tolerance,
$$\max_i f_i > f_{\rm tol}. \qquad (7)$$
The specification of $f_{\rm tol}$ is both training data and model specific, and often requires careful tuning to achieve good performance: too low an $f_{\rm tol}$ keeps triggering unnecessary ab initio calculations, whereas too high a value leads to the generation of unphysical high-energy configurations. To avoid manual tuning and aid generality, we normalise $f_i$ onto $[0,1]$ through the application of the softmax function $s(f_i)$, resulting in the new stopping criterion

$$\max_i \frac{\exp f_i}{\sum_i \exp f_i} > s_{\rm tol}, \qquad (8)$$

where we use the default tolerance $s_{\rm tol} = 0.5$.
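Taken together, Eqs. (6)-(8) amount to a few array operations per HAL step. The sketch below is an illustration under the defaults quoted in the text ($\varepsilon = 0.2$ eV/Å, $s_{\rm tol} = 0.5$); array names and shapes are assumptions.

```python
import numpy as np

def should_stop(F_k, eps=0.2, s_tol=0.5):
    """Softmax stopping criterion of Eqs. (6)-(8).

    F_k : (K, N, 3) committee force predictions; eps (eV/A) regularises
    the denominator of Eq. (6).  Returns True once the softmax-normalised
    relative force uncertainty of any atom exceeds s_tol.
    """
    F_bar = F_k.mean(axis=0)                                 # (N, 3) mean forces
    dev = np.linalg.norm(F_k - F_bar, axis=2).mean(axis=0)   # (1/K) sum_k ||F_i^k - F_bar_i||
    f = dev / (np.linalg.norm(F_bar, axis=1) + eps)          # Eq. (6)
    s = np.exp(f - f.max())                                  # numerically stable softmax
    s /= s.sum()
    return bool(s.max() > s_tol)                             # Eq. (8)
```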
The paper is structured as follows. First, the ability of the relative force error measure $f_i$ to predict the true error is investigated, and its performance is benchmarked by assembling a reduced diamond-structure silicon database. Next, the HAL framework is used to build training databases for an alloy (AlSi10) and a polymer (polyethylene glycol, PEG) from scratch, and the ability of the resulting ACE models to accurately predict the AlSi10 melting temperature and the PEG density is demonstrated.
II. RESULTS AND DISCUSSION
A. Filtering an existing training set
Before illustrating the HAL algorithm itself,
we first demonstrate the ability of the relative
force error estimate $f_i$ in Eq. (6) to detect true
relative force errors. To that end, we use this estimator to significantly reduce a large training set while maintaining accurate model properties relative to the DFT reference. The database we use for this demonstration was originally developed for a Si GAP model [38] and covers a wide range of structures, from bulk crystals in various phases to amorphous, liquid and vacancy config-
urations. The filtering process builds a reduced
database by starting from a single configura-
tion and selecting configurations containing the
maximum $f_i$ from the remaining test configu-
rations. Iterating this process accelerates the
learning rate and rapidly converges model prop-
erties with respect to the DFT reference. The
models we train are linear ACE models with basis functions up to correlation order $\nu = 3$, polynomial degree 20, an outer cutoff of 5.5 Å, and an inner cutoff set to the closest interatomic distance in the training database. An auxiliary pair potential basis was used, with polynomial degree 3, an outer cutoff of 7.0 Å and no inner cutoff. The weights for the energy $w_E$, forces $w_F$ and virials $w_V$, which are described in detail in the Methods section, were set to 5.0/1.0/1.0. The size of the committees used to determine $f_i$ was $K = 32$.
FIG. 1: a) Correlation plots of the maximum relative force error estimate $\max_i f_i$ versus the true maximum relative force error for silicon diamond, for training databases containing 4 and 10 configurations. b) Learning rate comparison between filtering and random selection for silicon diamond.
1. Si diamond: error correlation and convergence
Prior to training database reduction, the ability of the relative force error estimate $f_i$ to predict the true relative force error is investigated. Fig. 1a
compares the maximum relative force error in
a configuration against the maximum of $f_i$ for
two different training databases, containing 4
and 10 silicon diamond configurations respec-
tively. The test configurations are the remaining ones among the 489 silicon diamond configurations of the entire silicon database (totalling 16,708 local environments). The regularising constant $\varepsilon$ was set to
the mean force magnitude as predicted by the
mean parameterisation. Both figures show good
correlation between maximum relative force er-
ror and $\max_i f_i$, therefore making it a suitable
criterion to be monitored during (H)AL strate-
gies.
By leveraging the correlation of $f_i$ with the true relative force error, the existing silicon diamond database can be reduced by iteratively selecting configurations containing the largest relative force uncertainty as part of a greedy selection strategy. To demonstrate this, a model was first fitted to a single configuration from the 489 silicon diamond configurations of the silicon database. Next, $f_i$ was determined over the remaining configurations and the configuration containing the largest $\max_i f_i$ added to the training database. This process was repeated; the train and test error of this filtering procedure for silicon diamond is shown in Fig. 1b. It is benchmarked
against performing random selection whereby,
starting from the same initial configuration, test
configurations were chosen at random from the
pool of remaining configurations. The result indicates that
$f_i$ accurately detects configurations with large
errors and manages to accelerate the learning
rate significantly relative to random selection.
Good generalisation between training and test
errors is achieved by using around 5% of the total environments contained in the original silicon
diamond database.
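The greedy filtering loop described above can be sketched as follows; `fit_committee` and `max_f` are hypothetical placeholders for the actual fitting code and the evaluation of $\max_i f_i$ (Eq. (6)), and the pool is copied so the caller's list is not mutated.

```python
import numpy as np

def greedy_filter(pool, fit_committee, max_f, n_select):
    """Greedy database reduction by maximum relative force uncertainty.

    pool : list of candidate configurations; fit_committee(train) returns
    a committee model; max_f(model, cfg) evaluates max_i f_i of Eq. (6).
    """
    pool = list(pool)                        # work on a copy
    train = [pool.pop(0)]                    # seed with a single configuration
    while len(train) < n_select and pool:
        model = fit_committee(train)         # refit to the current subset
        scores = [max_f(model, cfg) for cfg in pool]
        j = int(np.argmax(scores))           # most uncertain remaining config
        train.append(pool.pop(j))
    return train
```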
2. Si diamond: property convergence
The significant acceleration of the learning rate shown in Fig. 1b demonstrates that generalisation between train and test error is rapidly achieved, in turn suggesting that property convergence is accelerated too. This is investigated for elastic constants, energy-volume curves, the phonon spectrum and thermal properties of bulk silicon diamond.
Fig. 2 demonstrates that property convergence for the energy-volume curves, phonon spectrum and thermal properties is rapidly achieved by fitting to a fraction of the original database. Fitting to 5% of the original database is sufficient to describe all properties with good accuracy with respect to the DFT reference. This is again confirmed by the elastic constants as predicted by the respective models, shown in Table I. The convergence