tions $R$ labelled with observations comprising the total energy $E_R$, forces $F_R$, and perhaps virial stresses $V_R$, obtained from electronic structure
simulations. By performing a regression on the training data, model predictions $E$ of the total energy and estimates of the respective forces $F_i = -\nabla_i E$ can be determined. Here, the $\nabla_i$ operator denotes the gradient with respect to the position of atom $i$.
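As a minimal illustration of this relation (not the fitting procedure used in this work), the sketch below evaluates $F_i = -\nabla_i E$ by automatic differentiation of a toy pair energy; the Lennard-Jones-like functional form merely stands in for a fitted ML model.

```python
import jax
import jax.numpy as jnp

def model_energy(positions):
    """Toy pair energy standing in for a fitted ML prediction E(R)."""
    n = positions.shape[0]
    diff = positions[:, None, :] - positions[None, :, :]   # (n, n, 3) displacement vectors
    r2 = jnp.sum(diff**2, axis=-1) + jnp.eye(n)            # pad the diagonal to avoid r = 0
    mask = 1.0 - jnp.eye(n)                                 # exclude self-interaction
    return 0.5 * jnp.sum(mask * (r2**-6 - r2**-3))          # Lennard-Jones-like pair terms

positions = jnp.array([[0.0, 0.0, 0.0],
                       [1.1, 0.0, 0.0],
                       [0.0, 1.2, 0.0]])

# F_i = -grad_i E: the force on each atom is minus the gradient of the
# predicted total energy with respect to that atom's position.
forces = -jax.grad(model_energy)(positions)
print(forces.shape)   # (3, 3): one force vector per atom
```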
Building suitable training databases remains
a challenge and the most time-consuming task
in developing general data-driven interatomic
potentials [37–39]. Databases such as MD17
and ISO17 are typically created by performing
Molecular Dynamics (MD) simulations on the
structures of interest and selecting decorrelated
configurations along the trajectory. This ap-
proach samples the potential energy surface ac-
cording to its Boltzmann distribution. Once the
training database contains a sufficient number of configurations, a high-dimensional model may be regressed to accurately interpolate the potential energy surface. The interpolation
accuracy can be improved by further sampling,
albeit with diminishing returns. However, it is
by no means clear that the Boltzmann distribu-
tion is the optimal measure, or even a “good”
measure, from which to draw samples for an
ML training database. Indeed, it likely results
in severe undersampling of configurations corre-
sponding to defects and transition states, par-
ticularly for material systems with high barri-
ers, which nevertheless have a profound effect
on material properties and are often the sub-
ject of intense study.
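To make the standard database-construction procedure described above concrete, the following is a hedged sketch using ASE: Langevin MD is run and a configuration is stored only every few hundred steps, so that the retained snapshots are approximately decorrelated. The EMT calculator, temperature, and sampling interval are placeholder choices (a production database would use an electronic-structure calculator), not the settings behind MD17 or ISO17.

```python
from ase.build import bulk
from ase.calculators.emt import EMT          # placeholder for a DFT calculator
from ase.md.langevin import Langevin
from ase import units

atoms = bulk("Cu", cubic=True) * (2, 2, 2)
atoms.calc = EMT()

dyn = Langevin(atoms, timestep=1.0 * units.fs,
               temperature_K=500, friction=0.002)

database = []

def sample():
    # Keep a snapshot (with its energy and forces) every `interval` steps,
    # i.e. only weakly correlated configurations along the trajectory.
    database.append((atoms.copy(),
                     atoms.get_potential_energy(),
                     atoms.get_forces()))

dyn.attach(sample, interval=500)   # decorrelation interval is a placeholder
dyn.run(50_000)
```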
A lack of training data in a sub-region of configuration space can
lead to deep unphysical energy minima in
trained models, sometimes called “holes”, which
are well known to cause catastrophic problems
for MD simulations: the trajectory can get
trapped in these unphysical minima or even be-
come unstable numerically for normal step sizes.
A natural strategy to prevent such problems
is active learning (AL): the simulation is aug-
mented with a stopping criterion aimed at de-
tecting when the model encounters a configura-
tion for which the prediction is unreliable. In-
tuitively, one can think of such configurations
as being “far” from the training set. When this
situation occurs, a ground-truth evaluation is
triggered, the training database extended, and
the model refitted to the enlarged database. In
the context of data-driven interatomic poten-
tials, this approach was successfully employed
by the linear moment tensor potentials [40, 41]
and the Gaussian process (GP) based methods
FLARE [42, 43] and GAP [44], both of which use the site-energy uncertainty arising from the GP to formulate a stopping criterion for detecting unreliable predictions during simulations.
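Schematically, such an active-learning loop could look as follows; `md_step`, `model.uncertainty`, `ground_truth`, and `fit` are hypothetical placeholders for a single MD step under the current model, a GP-style predictive uncertainty, an electronic-structure evaluation, and the regression step, respectively.

```python
def active_learning_md(model, atoms, database, n_steps, threshold):
    """Run model-driven MD; stop and retrain whenever the prediction
    looks unreliable (uncertainty above a chosen threshold)."""
    for step in range(n_steps):
        atoms = md_step(atoms, model)                 # one MD step under the current model
        if model.uncertainty(atoms) > threshold:      # stopping criterion: configuration is
            E, F = ground_truth(atoms)                #   "far" from the training set
            database.append((atoms.copy(), E, F))     # extend the training database
            model = fit(database)                     # refit to the enlarged database
    return model, database
```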
The key contribution of this work is the intro-
duction of the hyperactive learning framework.
Rather than relying on normal MD to sample the potential energy surface and waiting until an unreliable prediction appears (which may take a very long time once the model is reasonably accurate), we continually bias the MD simulation towards regions of high uncertainty. By balancing the physical MD driving force with such a bias, we accelerate the discovery of unreliably predicted configurations while retaining the overall focus on the low-energy regions that are important for modelling. This exploration-exploitation trade-off
originates from Bayesian Optimisation (BO), a
technique used to efficiently optimise a compu-
tationally expensive “black box” function [45].
BO has been shown to yield state-of-the-art re-
sults for optimisation problems while simultane-
ously minimising incurred computational costs
by requiring fewer evaluations [46]. In atomistic systems, BO has been applied to global structure search [47–50], where the PES is optimised to find stable structures. Other previous work on balancing exploration and exploitation in data-driven interatomic potentials is also closely related: there, configurations were generated by trading off high uncertainty against high likelihood (or rather low energy) [51], and the PES was explored by perturbing geometries while monitoring uncertainty, rather than by explicitly running MD. Note that upon the completion of this
work, we discovered a closely related work that
also uses uncertainty-biased MD [52]. The two
studies were performed independently, and ap-
peared on preprint servers near-simultaneously.
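In schematic terms, the bias described above amounts to running MD on a modified potential of the form $E_b(R) = E(R) - \tau\,\sigma(R)$, where $\sigma(R)$ is the model's predictive uncertainty and $\tau$ sets the exploration-exploitation balance; the sketch below is illustrative only (the symbols and the `model` interface are assumptions, not the exact expressions used in this work).

```python
def biased_forces(model, atoms, tau):
    """Forces on the biased potential E_b(R) = E(R) - tau * sigma(R):
    the usual physical driving force plus a term pushing the trajectory
    towards configurations where the model is uncertain. tau = 0 recovers
    plain MD; larger tau emphasises exploration."""
    f_physical = model.forces(atoms)                # -grad E: exploitation of low-energy regions
    grad_sigma = model.uncertainty_gradient(atoms)  # grad sigma: exploration of uncertain regions
    return f_physical + tau * grad_sigma
```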
In BO an acquisition function balances explo-