Tensor-reduced atomic density representations
James P. Darby,1,2 Dávid P. Kovács,2 Ilyes Batatia,2,3 Miguel A. Caro,4
Gus L. W. Hart,5 Christoph Ortner,6 and Gábor Csányi2
1Warwick Centre for Predictive Modelling, School of Engineering,
University of Warwick, Coventry, CV4 7AL, UK
2Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ UK
3ENS Paris-Saclay, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
4Department of Electrical Engineering and Automation, Aalto University, FIN-02150 Espoo, Finland
5Department of Physics and Astronomy, Brigham Young University, Provo, Utah, 84602, USA
6Department of Mathematics, University of British Columbia,
1984 Mathematics Road, Vancouver, BC, Canada V6T 1Z2
(Dated: December 7, 2022)
*These authors contributed equally.
Density based representations of atomic environments that are invariant under Euclidean symmetries have become a widely used tool in the machine learning of interatomic potentials, broader data-driven atomistic modelling and the visualisation and analysis of materials datasets. The standard mechanism used to incorporate chemical element information is to create separate densities for each element and form tensor products between them. This leads to a steep scaling in the size of the representation as the number of elements increases. Graph neural networks, which do not explicitly use density representations, escape this scaling by mapping the chemical element information into a fixed dimensional space in a learnable way. By exploiting symmetry, we recast this approach as tensor factorisation of the standard neighbour density based descriptors and, using a new notation, identify connections to existing compression algorithms. In doing so, we form a compact tensor-reduced representation of the local atomic environment whose size does not depend on the number of chemical elements, is systematically convergeable and therefore remains applicable to a wide range of data analysis and regression tasks.
Over the past decade, machine learning methods
for studying atomistic systems have become widely
adopted [1–3]. Most of these methods utilise representa-
tions of local atomic environments that are invariant un-
der relevant symmetries; typically rotations, reflections,
translations and permutations of equivalent atoms [4].
Enforcing these symmetries allows for greater data effi-
ciency during model training and ensures that predictions
are made in a physically consistent manner. There are
many different ways of constructing such representations
which are broadly split into two categories: (i) descrip-
tors based on internal coordinates, such as the Behler-
Parrinello Atom-Centered Symmetry Functions [5], and
(ii) density-based descriptors such as Smooth Overlap of
Atomic Positions (SOAP) [6] or the bispectrum [7, 8],
which employ a symmetrised expansion of ν-correlations
of the atomic neighbourhood density (ν = 2 for SOAP and ν = 3 for the bispectrum). A major drawback of all
these representations is that their size increases dramati-
cally with the number of chemical elements S in the system. For instance, the number of features in the linearly complete Atomic Cluster Expansion (ACE) [9, 10] descriptor, which unifies, extends and generalises the aforementioned representations, scales as S^ν for terms with correlation order ν (i.e. a body order of ν + 1). This
poor scaling severely restricts the use of these represen-
tations in many applications. For example, in the case of machine learned interatomic potentials for systems with
many (e.g. more than 5) different chemical elements,
the large size of the models results in memory limita-
tions being reached during parameter estimation as well
as significantly reducing evaluation speed.
Multiple strategies to tackle this scaling problem have
been proposed including element weighting [11, 12] or
embedding the elements into a fixed small dimensional
space [13, 14], directly reducing the element-sensitive cor-
relation order [15], low-rank tensor-train approximations
for lattice models [16] and data-driven approaches for se-
lecting the most relevant subset or combination of the
original features for a given dataset [17–19]. A rather
different class of machine learning methods are Message
Passing Neural Networks (MPNNs) [20, 21]. Instead of
constructing full tensor products, these models also em-
bed chemical element information in a fixed size latent
space using a learnable transformation ℝ^S → ℝ^K, where K is the dimension of the latent space, and thus avoid the poor scaling with the number of chemical elements.
Recently these methods have achieved very high accu-
racy [22–24], strongly suggesting that the true complex-
ity of the relevant chemical element space does not grow
as S^ν.
In this paper we introduce a general approach for sig-
nificantly reducing the scaling of density-based represen-
tations like SOAP and ACE. We show that by exploiting
the tensor structures of the descriptors and applying low-
rank approximations we can derive new tensor-reduced
descriptors which are systematically convergeable to the
original full descriptor limit. We verify this with numerical experiments on real data. We also show that
there is a natural generalisation to compress not only the
chemical element information but also the radial degrees
of freedom, yielding an even more compact representa-
tion. When fitting interatomic potentials for organic
molecules and high entropy alloys, we achieve a ten-fold
reduction in the number of features required when using
linear (ACE) and nonlinear kernel models (SOAP-GAP).
We also fit a linear model to a dataset with 37 chemical
elements which would be infeasible without the tensor-
reduced features.
All many-body density based descriptors can be under-
stood in terms of the Atomic Cluster Expansion [9]. In
ACE, the first step in describing the local neighbourhood N(i) = {j : r_ij < r_cut} around atom i is forming the one-particle basis φ_znlm(r_ij, Z_j) as a product of radial basis functions R_n, spherical harmonics Y_l^m and an additional element index, shown in Eq. (1), where r_ij and Z_j denote the relative position and atomic number of neighbour j. Permutation invariance is introduced by summing over neighbour atoms in Eq. (2), after which (ν + 1)-body features are formed in Eq. (3) by taking tensor products of the atomic basis A_{i,znlm} with itself ν times. Finally, Eq. (4) shows how the product basis \mathbf{A}_{i,znlm} is rotationally symmetrised using the generalised Clebsch-Gordan coefficients C^{lη}_m, where η enumerates all possible symmetric couplings; cf. [9, 10, 17] for the details.
\phi_{znlm}(\mathbf{r}_{ij}, Z_j) = R_n(r_{ij}) \, Y_l^m(\hat{\mathbf{r}}_{ij}) \, \delta_{z Z_j},    (1)

A_{i,znlm} = \sum_{j \in N(i)} \phi_{znlm}(\mathbf{r}_{ij}, Z_j),    (2)

\mathbf{A}_{i,znlm} = \prod_{t=1}^{\nu} A_{i, z_t n_t l_t m_t},    (3)

B_{i,znl\eta} = \sum_{m} C^{l\eta}_{m} \mathbf{A}_{i,znlm}.    (4)
A linear ACE model can then be fit to an invariant atomic property φ_i as

\varphi_i = \sum_{znl\eta} c_{znl\eta} B_{i,znl\eta},    (5)

where c_{znlη} are the model parameters and, for practical reasons, the expansion is truncated using ν_max, l_max and n_max = N. Note that as B_{i,znlη} is invariant under (z_a, n_a, l_a) ↔ (z_b, n_b, l_b), symmetrically equivalent terms are usually omitted from Eq. (5); again see [9, 10, 17] for the details.
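To make the tensor structure of Eqs. (1)-(3) concrete, the following NumPy sketch builds a toy atomic basis for a single environment and forms its ν-fold tensor product. The one-particle basis values are random placeholders standing in for the radial-angular-element products of Eq. (1), the symmetrisation of Eq. (4) is omitted, and all array names and sizes are illustrative rather than taken from any ACE implementation.

```python
import numpy as np

# Toy sizes: S elements, N radial functions, a combined (l, m) angular index of
# size L, and correlation order nu.
S, N, L, nu = 4, 5, 4, 3
n_neighbours = 20

rng = np.random.default_rng(0)
phi = np.zeros((n_neighbours, S, N, L))            # phi_{znlm}(r_ij, Z_j), Eq. (1)
Z_j = rng.integers(0, S, size=n_neighbours)        # element of each neighbour
# The delta_{z Z_j} factor means each neighbour only populates its own element channel.
phi[np.arange(n_neighbours), Z_j] = rng.normal(size=(n_neighbours, N, L))

# Eq. (2): summing over neighbours gives the permutation-invariant atomic basis.
A = phi.sum(axis=0)                                # A_{i,znlm}, shape (S, N, L)

# Eq. (3): nu-fold tensor product of the atomic basis with itself.
A_flat = A.reshape(-1)
prod = A_flat
for _ in range(nu - 1):
    prod = np.multiply.outer(prod, A_flat)

print(prod.size)   # (S*N*L)**nu entries, i.e. the O(N^nu S^nu) growth discussed below
```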
Crucially, the tensor product in Eq. (3) causes the number of features (and therefore the number of model parameters) to grow rapidly as O(N^ν S^ν). Previous work [13, 18] has reduced this to O(K^ν) by first embedding the chemical and radial information into K channels (Eq. (6)), then taking a full tensor product across the \bar{A}_{i,klm} (Eq. (7)).

\bar{A}_{i,klm} = \sum_{zn} W^{k}_{zn} A_{i,znlm}, \quad k = 1 \ldots K    (6)

\bar{\mathbf{A}}_{i,klm} = \prod_{t=1}^{\nu} \bar{A}_{i, k_t l_t m_t}    (7)
This approach is also used in Moment Tensor Potentials
[14, 25] and in Gaussian Moment descriptors [26]. The
embedding can be identified in Eq. 3 of ref. [25], where
µ indexes the embedded channels and ν is similar to l in ACE. Then taking tensor products across the embedded channels corresponds to forming products of the moments M_{µν}. In general, the embedding weights are optimised
either before or during fitting [13, 25] with the latter
causing the models to be non-linear.
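A minimal sketch of the embedding route of Eqs. (6)-(7), assuming a placeholder atomic basis and random embedding weights; it shows how the channel dimension then contributes K^ν rather than (SN)^ν features, since a full tensor product is still taken. Sizes and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N, L, nu, K = 4, 5, 4, 3, 16
A = rng.normal(size=(S, N, L))                 # stand-in for A_{i,znlm}

# Eq. (6): embed the (z, n) indices into K channels with weights W^k_{zn}.
W = rng.normal(size=(K, S, N))
A_bar = np.einsum('kzn,znl->kl', W, A)         # \bar{A}_{i,klm}, shape (K, L)

# Eq. (7): full tensor product across the embedded channels; the channel
# dimension now contributes K^nu features instead of (S*N)^nu.
flat = A_bar.reshape(-1)
prod = flat
for _ in range(nu - 1):
    prod = np.multiply.outer(prod, flat)
print(prod.size)                               # (K*L)**nu
```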
We propose a principled approach to further reduce the size of the basis to O(K), which can be understood from two different angles. First, we identify the model parameters c_η ≡ c_{znlη} in Eq. (5) as a symmetric tensor, invariant under (z_a, n_a, l_a) ↔ (z_b, n_b, l_b), which can be expanded as a sum of products of rank-1 tensors as

c_\eta = \sum_{k=1}^{K} \lambda \, \underbrace{w_k \otimes w_k \otimes \cdots \otimes w_k}_{\nu \text{ times}},    (8)

or in component form

c_{znl\eta} = \sum_{k=1}^{K} \lambda \prod_{t=1}^{\nu} W^{k}_{z_t n_t l_t},    (9)

where W^k_{z_t n_t l_t} are the components of w_k. This expansion is exact for finite K, as c is finite due to basis truncation, and is equivalent to eigenvalue decomposition of a symmetric matrix when ν = 2. Note that we choose to use the same weights W^k_{z_t n_t l_t} for all ν and η, which significantly reduces the number of weights that need to be specified. In practice, we can choose to expand over the zn or z indices only (see SI for details), and then substitute the expansion into Eq. (5) as
\varphi_i \approx \sum_{kl\eta} \lambda_{kl\eta} \Big[ \sum_{m} C^{l\eta}_{m} \sum_{zn} \prod_{t=1}^{\nu} W^{k l_t}_{z_t n_t} A_{i, z_t n_t l_t m_t} \Big]    (10)

= \sum_{kl\eta} \lambda_{kl\eta} \Big[ \sum_{m} C^{l\eta}_{m} \prod_{t=1}^{\nu} \tilde{A}_{i, k l_t m_t} \Big]    (11)

= \sum_{kl\eta} \lambda_{kl\eta} \tilde{B}_{i,kl\eta}    (12)
where \tilde{B}_{i,klη} are the new tensor-reduced features and the approximation arises because in practice we truncate the tensor decomposition early. The key novelty is that only element-wise products are taken across the k index of the embedded channels \tilde{A}_{i,k l_t m_t} when forming the many-body basis, rather than a full tensor product, i.e. k does not have a t subscript in Eq. (10) (see Table I for the full definitions). For completeness, we note that applying this tensor reduction to the elements only and using K = 2 is equivalent to the element-weighting strategies used in [11, 12, 27].
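The difference to the full tensor product above can be sketched as follows: the embedding weights are drawn per l and shared across all ν factors, and only an element-wise product is taken over k. A random tensor stands in for the generalised Clebsch-Gordan coefficients, so this is an illustration of the index structure only, with placeholder arrays throughout.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N, K, nu, l_max = 4, 5, 16, 3, 2
A = {l: rng.normal(size=(S, N, 2 * l + 1)) for l in range(l_max + 1)}   # toy A_{i,znlm}

# Eq. (10): per-l embedding weights W^{kl}_{zn} (here random), shared by all nu factors.
W = {l: rng.normal(size=(K, S, N)) for l in range(l_max + 1)}
A_tilde = {l: np.einsum('kzn,znm->km', W[l], A[l]) for l in range(l_max + 1)}

# Eq. (11), one term with (l_1, l_2, l_3) = (1, 1, 2): tensor product over the
# m indices but only an ELEMENT-WISE product over the shared channel index k.
l1, l2, l3 = 1, 1, 2
prod = np.einsum('ka,kb,kc->kabc', A_tilde[l1], A_tilde[l2], A_tilde[l3])

# Eq. (12): contract the m indices; a random tensor replaces C^{l eta}_m here.
C = rng.normal(size=prod.shape[1:])
B_tilde = np.einsum('kabc,abc->k', prod, C)    # K features per (l, eta)
print(B_tilde.shape)                           # (K,): linear in K, independent of S
```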
There are multiple natural strategies for specifying the embedding weights W^{kl}_{zn}, including approximating a precomputed c_{znlη} or treating the weights as model parameters to be estimated during the training process, as is done in MACE [24]. Here we investigate using random weights as a simpler alternative. This ensures that Eq. (12) remains a linear model and allows the \tilde{B}_{i,klη} to be used directly in other tasks such as data visualisation.
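For instance, with the weights held fixed, the parameters λ_{klη} can be estimated by a standard regularised least-squares solve; the sketch below uses placeholder feature and target arrays purely to show that no non-linear optimisation is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_structures, n_features = 200, 50                      # placeholder sizes
# Rows: tensor-reduced features summed over the atoms of each training structure
# (random placeholders here); targets: total energies (also placeholders).
design = rng.normal(size=(n_structures, n_features))
energies = rng.normal(size=n_structures)

# With the random embedding weights fixed, the model is linear in lambda,
# so a single ridge-regularised normal-equations solve suffices.
reg = 1e-8
coeffs = np.linalg.solve(design.T @ design + reg * np.eye(n_features),
                         design.T @ energies)
predictions = design @ coeffs
```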
We now show that the resulting tensor-reduced features can also be understood from the perspective of directly compressing the original B_{i,znlη} features. Random Projection (RP) [29, 30] is an established technique where high dimensional feature vectors {x_1, . . . , x_N} ⊂ ℝ^d are compressed as \tilde{x}_i = W x_i ∈ ℝ^K, with the entries of the matrix W being normally distributed. This approach is simple, offers a tuneable level of compression and is underpinned by the Johnson-Lindenstrauss Lemma [31], which bounds the fractional error made in approximating x_i^T x_j by \tilde{x}_i^T \tilde{x}_j. RP can also be used to reduce the cost of linear models, with a closely related approach recently used in [32]. In Compressed Least-Squares Regression (CLSR) [33–35] features are replaced by their projections, thus reducing the number of model parameters. Loosely speaking, the approximation errors incurred in CLSR (and RP in general) are expected to decay as 1/√K, and we refer to refs. [33, 35, 36] for more details. The drawback of RP is that it requires the full feature vector to be constructed, so that applying RP to ACE would not avoid the unfavourable O(N^ν S^ν) scaling. We
propose using tensor sketching [37] instead of RP. For vectors with tensor structure x = y ⊗ z, where x ∈ ℝ^{d_1 d_2}, y ∈ ℝ^{d_1} and z ∈ ℝ^{d_2}, the Random Projection W x can be efficiently computed directly from y and z as

W x = (W' y) ⊙ (W'' z),    (13)

where ⊙ denotes the element-wise (Hadamard) product.
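This identity is easy to verify numerically: if row k of W is the Kronecker product of row k of W' and row k of W'', projecting the full tensor product y ⊗ z with W gives exactly the Hadamard product of the two small projections. The explicit construction of W below is only for the check; in practice only W' and W'' are ever formed.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, K = 6, 7, 4
y, z = rng.normal(size=d1), rng.normal(size=d2)

W1 = rng.normal(size=(K, d1))
W2 = rng.normal(size=(K, d2))
# Row k of the big projection W is the Kronecker product of row k of W1 and W2.
W = np.stack([np.kron(W1[k], W2[k]) for k in range(K)])    # shape (K, d1*d2)

lhs = W @ np.kron(y, z)             # projecting the full d1*d2-dimensional vector
rhs = (W1 @ y) * (W2 @ z)           # Eq. (13): only the two small projections
assert np.allclose(lhs, rhs)
```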
Similarly, the ACE product basis can be tensor sketched, across the zn indices, as

\hat{\mathbf{A}}_i = (W^{1} A_i) \otimes^{lm}_{k} (W^{2} A_i) \otimes^{lm}_{k} \cdots \otimes^{lm}_{k} (W^{\nu} A_i),    (14)

where \otimes^{lm}_{k} denotes taking the tensor product over the upper indices lm and the element-wise product over the lower index k, and W^1, W^2 etc. are i.i.d. random matrices, see SI for details. The \hat{A}_{i,klm} can then be symmetrised as in Eq. (4), yielding

\hat{B}_{i,kl\eta} = \sum_{m} C^{l\eta}_{m} \prod_{t=1}^{\nu} \hat{A}_{i t, k l_t m_t},    (15)
where \hat{A}_{it, k l_t m_t} is defined more precisely in Table I. Finally, we note that because the embedded channels are independent, the error in approximating inner products using the average across K channels is expected to decrease as 1/√K, just as with standard RP. Based on this, we conjecture that similar bounds derived for the errors made in CLSR may also apply here. A summary comparing standard ACE, element-embedding and tensor-reduced ACE is given in Table I, where it is clear that the features derived using tensor decomposition are equivalent to the tensor-sketched features with the choice of using equal weights in each factor.
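A sketch of Eqs. (14)-(15) in the same toy setting as the earlier snippets, now with an independent random matrix W^t for each factor as in tensor sketching; again a random coupling tensor replaces the Clebsch-Gordan coefficients and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N, K, nu, l_max = 4, 5, 16, 3, 2
A = {l: rng.normal(size=(S, N, 2 * l + 1)) for l in range(l_max + 1)}   # toy A_{i,znlm}

# Eq. (14): one i.i.d. random matrix W^t per factor, each projecting the zn
# indices of a separate copy of A before the element-wise product over k.
Ws = [{l: rng.normal(size=(K, S, N)) for l in range(l_max + 1)} for _ in range(nu)]
A_hat = [{l: np.einsum('kzn,znm->km', Ws[t][l], A[l]) for l in range(l_max + 1)}
         for t in range(nu)]

# Eq. (15): tensor product over (l_t, m_t), element-wise product over k, then
# contraction with a random stand-in for C^{l eta}_m.
l1, l2, l3 = 1, 1, 2
prod = np.einsum('ka,kb,kc->kabc', A_hat[0][l1], A_hat[1][l2], A_hat[2][l3])
C = rng.normal(size=prod.shape[1:])
B_hat = np.einsum('kabc,abc->k', prod, C)      # K independent sketched features
```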
We now turn to numerical results and first demonstrate
that the tensor-reduced features are able to efficiently
and completely describe a many-element training set. We
consider a dataset comprised of all symmetry inequiva-
lent fcc structures made up of 5 elements with up to 6
atoms per unit cell [38]. A set of features is complete on
this dataset if the design matrix for a linear model fit to
total energies has full (numerical) row rank, where each
row corresponds to a different training configuration.
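As an illustration of this test (with placeholder data standing in for the fcc dataset and the real features), the row rank of the design matrix can be checked as follows:

```python
import numpy as np

def design_matrix_rank(features_per_structure):
    # Each entry is an (n_atoms, n_features) array; a linear model for the
    # total energy sums the per-atom features over each structure, so every
    # training configuration contributes one row to the design matrix.
    rows = [atom_feats.sum(axis=0) for atom_feats in features_per_structure]
    design = np.stack(rows)                       # (n_structures, n_features)
    return np.linalg.matrix_rank(design)          # numerical rank

# Placeholder data: 100 structures with 3-6 atoms each and 40 features per atom.
rng = np.random.default_rng(0)
data = [rng.normal(size=(rng.integers(3, 7), 40)) for _ in range(100)]
print(design_matrix_rank(data))                   # at most min(n_structures, n_features)
```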
[Figure 1 plot: rank of the design matrix for fcc lattices (S = 5) against the number of basis functions; curves are distinguished by rattling amplitude (on lattice, 0.025 Å, 0.25 Å), body order (2-body to 6-body) and representation (full ACE basis, tensor-reduced, element-radial).]

FIG. 1. The row rank of the design matrix as a function of basis set on a dataset of all symmetry inequivalent fcc lattices of 5 chemical elements and unit cell sizes of up to 6 atoms. The inset zooms in on the x = y region.
Figure 1 shows the numerical rank of the design ma-
trix as a function of the basis set. At a given correlation
order the standard ACE basis set is grown by increasing
the polynomial degree, and the tensor-reduced basis set
is enlarged by increasing K, the number of independent
channels. In both cases once the rank stops increasing at
the given correlation order we increment ν. The colors
in Fig. 1 correspond to three different geometrical varia-
tions: blue contains on-lattice configurations only whilst
in magenta and red the atomic positions have been per-
turbed by a random Gaussian displacement with mean 0
and standard deviation of 0.025 and 0.25 Å, respectively.
The dotted lines correspond to the standard ACE basis, whereas the solid lines correspond to the tensor-reduced
version from Eq. (12). Although the standard ACE basis
can always achieve full row rank since it is a complete lin-
ear basis, it does this very inefficiently. In contrast, the
row rank using the tensor-reduced basis grows almost lin-