Tensor-reduced atomic density representations
James P. Darby,1,2 Dávid P. Kovács,2 Ilyes Batatia,2,3 Miguel A. Caro,4
Gus L. W. Hart,5 Christoph Ortner,6 and Gábor Csányi2
1Warwick Centre for Predictive Modelling, School of Engineering,
University of Warwick, Coventry, CV4 7AL, UK
2Engineering Laboratory, University of Cambridge, Cambridge, CB2 1PZ UK
3ENS Paris-Saclay, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
4Department of Electrical Engineering and Automation, Aalto University, FIN-02150 Espoo, Finland
5Department of Physics and Astronomy, Brigham Young University, Provo, Utah, 84602, USA
6Department of Mathematics, University of British Columbia,
1984 Mathematics Road, Vancouver, BC, Canada V6T 1Z2
(Dated: December 7, 2022)
*These authors contributed equally.
Density based representations of atomic environments that are invariant under Euclidean symmetries have become a widely used tool in the machine learning of interatomic potentials, broader data-driven atomistic modelling and the visualisation and analysis of materials datasets. The standard mechanism used to incorporate chemical element information is to create separate densities for each element and form tensor products between them. This leads to a steep scaling in the size of the representation as the number of elements increases. Graph neural networks, which do not explicitly use density representations, escape this scaling by mapping the chemical element information into a fixed dimensional space in a learnable way. By exploiting symmetry, we recast this approach as tensor factorisation of the standard neighbour density based descriptors and, using a new notation, identify connections to existing compression algorithms. In doing so, we form a compact tensor-reduced representation of the local atomic environment whose size does not depend on the number of chemical elements, is systematically convergeable and therefore remains applicable to a wide range of data analysis and regression tasks.
Over the past decade, machine learning methods
for studying atomistic systems have become widely
adopted [1–3]. Most of these methods utilise representa-
tions of local atomic environments that are invariant un-
der relevant symmetries; typically rotations, reflections,
translations and permutations of equivalent atoms [4].
Enforcing these symmetries allows for greater data effi-
ciency during model training and ensures that predictions
are made in a physically consistent manner. There are
many different ways of constructing such representations
which are broadly split into two categories: (i) descrip-
tors based on internal coordinates, such as the Behler-
Parrinello Atom-Centered Symmetry Functions [5], and
(ii) density-based descriptors such as Smooth Overlap of
Atomic Positions (SOAP) [6] or the bispectrum [7, 8],
which employ a symmetrised expansion of ν-correlations
of the atomic neighbourhood density (ν = 2 for SOAP and ν = 3 for the bispectrum). A major drawback of all
these representations is that their size increases dramati-
cally with the number of chemical elements S in the system. For instance, the number of features in the linearly complete Atomic Cluster Expansion (ACE) [9, 10] descriptor, which unifies, extends and generalises the aforementioned representations, scales as S^ν for terms with correlation order ν (i.e. a body order of ν + 1). This
poor scaling severely restricts the use of these represen-
tations in many applications. For example, in the case of machine learned interatomic potentials for systems with
many (e.g. more than 5) different chemical elements,
the large size of the models results in memory limita-
tions being reached during parameter estimation as well
as significantly reducing evaluation speed.
Multiple strategies to tackle this scaling problem have
been proposed including element weighting [11, 12] or
embedding the elements into a fixed small dimensional
space [13, 14], directly reducing the element-sensitive cor-
relation order [15], low-rank tensor-train approximations
for lattice models [16] and data-driven approaches for se-
lecting the most relevant subset or combination of the
original features for a given dataset [17–19]. A rather
different class of machine learning methods are Message
Passing Neural Networks (MPNNs) [20, 21]. Instead of
constructing full tensor products, these models also em-
bed chemical element information in a fixed size latent
space using a learnable transformation ℝ^S → ℝ^K, where K is the dimension of the latent space, and thus avoid the poor scaling with the number of chemical elements.
Recently these methods have achieved very high accu-
racy [22–24], strongly suggesting that the true complex-
ity of the relevant chemical element space does not grow
as S^ν.
In this paper we introduce a general approach for sig-
nificantly reducing the scaling of density-based represen-
tations like SOAP and ACE. We show that by exploiting
the tensor structures of the descriptors and applying low-
rank approximations we can derive new tensor-reduced
descriptors which are systematically convergeable to the
original full descriptor limit. We verify this with numerical experiments on real data. We also show that
there is a natural generalisation to compress not only the
chemical element information but also the radial degrees
of freedom, yielding an even more compact representa-
tion. When fitting interatomic potentials for organic
molecules and high entropy alloys, we achieve a ten-fold
reduction in the number of features required when using
linear (ACE) and nonlinear kernel models (SOAP-GAP).
We also fit a linear model to a dataset with 37 chemical
elements which would be infeasible without the tensor-
reduced features.
All many-body density based descriptors can be under-
stood in terms of the Atomic Cluster Expansion [9]. In
ACE, the first step in describing the local neighbourhood N(i) = {j : r_ij < r_cut} around atom i is forming the one-particle basis φ_znlm(r_ij, Z_j) as a product of radial basis functions R_n, spherical harmonics Y_l^m and an additional element index, shown in Eq. (1), where r_ij and Z_j denote the relative position and atomic number of neighbour j. Permutation invariance is introduced by summing over neighbour atoms in Eq. (2), after which (ν + 1)-body features are formed in Eq. (3) by taking tensor products of the atomic basis A_{i,znlm} with itself ν times. Finally, Eq. (4) shows how the product basis \mathbf{A}_{i,znlm} is rotationally symmetrised using the generalised Clebsch-Gordan coefficients C^{lη}_m, where η enumerates all possible symmetric couplings; cf. [9, 10, 17] for the details.
\phi_{znlm}(\mathbf{r}_{ij}, Z_j) = R_n(r_{ij}) \, Y_l^m(\hat{\mathbf{r}}_{ij}) \, \delta_{z Z_j},    (1)

A_{i,znlm} = \sum_{j \in N(i)} \phi_{znlm}(\mathbf{r}_{ij}, Z_j),    (2)

\mathbf{A}_{i,znlm} = \prod_{t=1}^{\nu} A_{i, z_t n_t l_t m_t},    (3)

B_{i,znl\eta} = \sum_{m} C^{l\eta}_{m} \mathbf{A}_{i,znlm}.    (4)
A linear ACE model can then be fit to an invariant atomic property φ_i as

\varphi_i = \sum_{znl\eta} c_{znl\eta} B_{i,znl\eta},    (5)

where c_{znlη} are the model parameters and, for practical reasons, the expansion is truncated using ν_max, l_max and n_max = N. Note that as B_{i,znlη} is invariant under (z_a, n_a, l_a) ↔ (z_b, n_b, l_b), symmetrically equivalent terms are usually omitted from Eq. (5); again see [9, 10, 17] for the details.
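To make the tensor structure of Eqs. (1)-(3) concrete, the following NumPy sketch builds a toy atomic basis for a single environment and forms its ν-fold tensor product. The one-particle basis values are random placeholders standing in for the radial-angular-element products of Eq. (1), the symmetrisation of Eq. (4) is omitted, and all array names and sizes are illustrative rather than taken from any ACE implementation.

```python
import numpy as np

# Toy sizes: S elements, N radial functions, a combined (l, m) angular index of
# size L, and correlation order nu.
S, N, L, nu = 4, 5, 4, 3
n_neighbours = 20

rng = np.random.default_rng(0)
phi = np.zeros((n_neighbours, S, N, L))            # phi_{znlm}(r_ij, Z_j), Eq. (1)
Z_j = rng.integers(0, S, size=n_neighbours)        # element of each neighbour
# The delta_{z Z_j} factor means each neighbour only populates its own element channel.
phi[np.arange(n_neighbours), Z_j] = rng.normal(size=(n_neighbours, N, L))

# Eq. (2): summing over neighbours gives the permutation-invariant atomic basis.
A = phi.sum(axis=0)                                # A_{i,znlm}, shape (S, N, L)

# Eq. (3): nu-fold tensor product of the atomic basis with itself.
A_flat = A.reshape(-1)
prod = A_flat
for _ in range(nu - 1):
    prod = np.multiply.outer(prod, A_flat)

print(prod.size)   # (S*N*L)**nu entries, i.e. the O(N^nu S^nu) growth discussed below
```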
Crucially, the tensor product in Eq. (3) causes the number of features (and therefore the number of model parameters) to grow rapidly as O(N^ν S^ν). Previous work [13, 18] has reduced this to O(K^ν) by first embedding the chemical and radial information into K channels (Eq. (6)), then taking a full tensor product across the \bar{A}_{i,klm} (Eq. (7)).

\bar{A}_{i,klm} = \sum_{zn} W^{k}_{zn} A_{i,znlm}, \quad k = 1 \ldots K    (6)

\bar{\mathbf{A}}_{i,klm} = \prod_{t=1}^{\nu} \bar{A}_{i, k_t l_t m_t}    (7)
This approach is also used in Moment Tensor Potentials
[14, 25] and in Gaussian Moment descriptors [26]. The
embedding can be identified in Eq. 3 of ref. [25], where
µ indexes the embedded channels and ν is similar to l in ACE. Then taking tensor products across the embedded channels corresponds to forming products of the moments M_{µν}. In general, the embedding weights are optimised
either before or during fitting [13, 25] with the latter
causing the models to be non-linear.
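A minimal sketch of the embedding route of Eqs. (6)-(7), assuming a placeholder atomic basis and random embedding weights; it shows how the channel dimension then contributes K^ν rather than (SN)^ν features, since a full tensor product is still taken. Sizes and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N, L, nu, K = 4, 5, 4, 3, 16
A = rng.normal(size=(S, N, L))                 # stand-in for A_{i,znlm}

# Eq. (6): embed the (z, n) indices into K channels with weights W^k_{zn}.
W = rng.normal(size=(K, S, N))
A_bar = np.einsum('kzn,znl->kl', W, A)         # \bar{A}_{i,klm}, shape (K, L)

# Eq. (7): full tensor product across the embedded channels; the channel
# dimension now contributes K^nu features instead of (S*N)^nu.
flat = A_bar.reshape(-1)
prod = flat
for _ in range(nu - 1):
    prod = np.multiply.outer(prod, flat)
print(prod.size)                               # (K*L)**nu
```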
We propose a principled approach to further reduce the size of the basis to O(K), which can be understood from two different angles. First, we identify the model parameters c_η ≡ c_{znlη} in Eq. (5) as a symmetric tensor, invariant under (z_a, n_a, l_a) ↔ (z_b, n_b, l_b), which can be expanded as a sum of products of rank-1 tensors as

c_\eta = \sum_{k=1}^{K} \lambda \, \underbrace{w_k \otimes w_k \otimes \cdots \otimes w_k}_{\nu \text{ times}},    (8)

or in component form

c_{znl\eta} = \sum_{k=1}^{K} \lambda \prod_{t=1}^{\nu} W^{k}_{z_t n_t l_t},    (9)

where W^k_{z_t n_t l_t} are the components of w_k. This expansion is exact for finite K, as c is finite due to basis truncation, and is equivalent to eigenvalue decomposition of a symmetric matrix when ν = 2. Note that we choose to use the same weights W^k_{z_t n_t l_t} for all ν and η, which significantly reduces the number of weights that need to be specified. In practice, we can choose to expand over the zn or z indices only (see SI for details), and then substitute the expansion into Eq. (5) as
\varphi_i \approx \sum_{kl\eta} \lambda_{kl\eta} \Big[ \sum_{m} C^{l\eta}_{m} \sum_{zn} \prod_{t=1}^{\nu} W^{k l_t}_{z_t n_t} A_{i, z_t n_t l_t m_t} \Big]    (10)

= \sum_{kl\eta} \lambda_{kl\eta} \Big[ \sum_{m} C^{l\eta}_{m} \prod_{t=1}^{\nu} \tilde{A}_{i, k l_t m_t} \Big]    (11)

= \sum_{kl\eta} \lambda_{kl\eta} \tilde{B}_{i,kl\eta}    (12)
where \tilde{B}_{i,klη} are the new tensor-reduced features and the approximation arises because in practice we truncate the tensor decomposition early. The key novelty is that only element-wise products are taken across the k index of the embedded channels \tilde{A}_{i,k l_t m_t} when forming the many-body basis, rather than a full tensor product, i.e. k does not have a t subscript in Eq. (10) (see Table I for the full definitions). For completeness, we note that applying this tensor reduction to the elements only and using K = 2 is equivalent to the element-weighting strategies used in [11, 12, 27].
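The difference to the full tensor product above can be sketched as follows: the embedding weights are drawn per l and shared across all ν factors, and only an element-wise product is taken over k. A random tensor stands in for the generalised Clebsch-Gordan coefficients, so this is an illustration of the index structure only, with placeholder arrays throughout.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N, K, nu, l_max = 4, 5, 16, 3, 2
A = {l: rng.normal(size=(S, N, 2 * l + 1)) for l in range(l_max + 1)}   # toy A_{i,znlm}

# Eq. (10): per-l embedding weights W^{kl}_{zn} (here random), shared by all nu factors.
W = {l: rng.normal(size=(K, S, N)) for l in range(l_max + 1)}
A_tilde = {l: np.einsum('kzn,znm->km', W[l], A[l]) for l in range(l_max + 1)}

# Eq. (11), one term with (l_1, l_2, l_3) = (1, 1, 2): tensor product over the
# m indices but only an ELEMENT-WISE product over the shared channel index k.
l1, l2, l3 = 1, 1, 2
prod = np.einsum('ka,kb,kc->kabc', A_tilde[l1], A_tilde[l2], A_tilde[l3])

# Eq. (12): contract the m indices; a random tensor replaces C^{l eta}_m here.
C = rng.normal(size=prod.shape[1:])
B_tilde = np.einsum('kabc,abc->k', prod, C)    # K features per (l, eta)
print(B_tilde.shape)                           # (K,): linear in K, independent of S
```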
There are multiple natural strategies for specifying the embedding weights W^{kl}_{zn}, including approximating a precomputed c_{znlη} or treating the weights as model parameters to be estimated during the training process, as is done in MACE [24]. Here we investigate using random weights as a simpler alternative. This ensures that Eq. (12) remains a linear model and allows the \tilde{B}_{i,klη} to be used directly in other tasks such as data visualisation.
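For instance, with the weights held fixed, the parameters λ_{klη} can be estimated by a standard regularised least-squares solve; the sketch below uses placeholder feature and target arrays purely to show that no non-linear optimisation is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_structures, n_features = 200, 50                      # placeholder sizes
# Rows: tensor-reduced features summed over the atoms of each training structure
# (random placeholders here); targets: total energies (also placeholders).
design = rng.normal(size=(n_structures, n_features))
energies = rng.normal(size=n_structures)

# With the random embedding weights fixed, the model is linear in lambda,
# so a single ridge-regularised normal-equations solve suffices.
reg = 1e-8
coeffs = np.linalg.solve(design.T @ design + reg * np.eye(n_features),
                         design.T @ energies)
predictions = design @ coeffs
```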
We now show that the resulting tensor-reduced features can also be understood from the perspective of directly compressing the original B_{i,znlη} features. Random Projection (RP) [29, 30] is an established technique where high dimensional feature vectors {x_1, . . . , x_N} ⊂ ℝ^d are compressed as \tilde{x}_i = W x_i ∈ ℝ^K, with the entries of the matrix W being normally distributed. This approach is simple, offers a tuneable level of compression and is underpinned by the Johnson-Lindenstrauss Lemma [31], which bounds the fractional error made in approximating x_i^T x_j by \tilde{x}_i^T \tilde{x}_j. RP can also be used to reduce the cost of linear models, with a closely related approach recently used in [32]. In Compressed Least-Squares Regression (CLSR) [33–35] features are replaced by their projections, thus reducing the number of model parameters. Loosely speaking, the approximation errors incurred in CLSR (and RP in general) are expected to decay as 1/√K, and we refer to refs. [33, 35, 36] for more details. The drawback of RP is that it requires the full feature vector to be constructed, so that applying RP to ACE would not avoid the unfavourable O(N^ν S^ν) scaling. We
propose using tensor sketching [37] instead of RP. For vectors with tensor structure x = y ⊗ z, where x ∈ ℝ^{d_1 d_2}, y ∈ ℝ^{d_1} and z ∈ ℝ^{d_2}, the Random Projection W x can be efficiently computed directly from y and z as

W x = (W' y) ⊙ (W'' z),    (13)

where ⊙ denotes the element-wise (Hadamard) product.
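This identity is easy to verify numerically: if row k of W is the Kronecker product of row k of W' and row k of W'', projecting the full tensor product y ⊗ z with W gives exactly the Hadamard product of the two small projections. The explicit construction of W below is only for the check; in practice only W' and W'' are ever formed.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, K = 6, 7, 4
y, z = rng.normal(size=d1), rng.normal(size=d2)

W1 = rng.normal(size=(K, d1))
W2 = rng.normal(size=(K, d2))
# Row k of the big projection W is the Kronecker product of row k of W1 and W2.
W = np.stack([np.kron(W1[k], W2[k]) for k in range(K)])    # shape (K, d1*d2)

lhs = W @ np.kron(y, z)             # projecting the full d1*d2-dimensional vector
rhs = (W1 @ y) * (W2 @ z)           # Eq. (13): only the two small projections
assert np.allclose(lhs, rhs)
```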
Similarly, the ACE product basis can be tensor sketched, across the zn indices, as

\hat{\mathbf{A}}_i = (W^{1} A_i) \otimes^{lm}_{k} (W^{2} A_i) \otimes^{lm}_{k} \cdots \otimes^{lm}_{k} (W^{\nu} A_i),    (14)

where \otimes^{lm}_{k} denotes taking the tensor product over the upper indices lm and the element-wise product over the lower index k, and W^1, W^2 etc. are i.i.d. random matrices, see SI for details. The \hat{A}_{i,klm} can then be symmetrised as in Eq. (4), yielding

\hat{B}_{i,kl\eta} = \sum_{m} C^{l\eta}_{m} \prod_{t=1}^{\nu} \hat{A}_{i t, k l_t m_t},    (15)
where \hat{A}_{it, k l_t m_t} is defined more precisely in Table I. Finally, we note that because the embedded channels are independent, the error in approximating inner products using the average across K channels is expected to decrease as 1/√K, just as with standard RP. Based on this, we conjecture that similar bounds derived for the errors made in CLSR may also apply here. A summary comparing standard ACE, element-embedding and tensor-reduced ACE is given in Table I, where it is clear that the features derived using tensor decomposition are equivalent to the tensor-sketched features with the choice of using equal weights in each factor.
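A sketch of Eqs. (14)-(15) in the same toy setting as the earlier snippets, now with an independent random matrix W^t for each factor as in tensor sketching; again a random coupling tensor replaces the Clebsch-Gordan coefficients and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
S, N, K, nu, l_max = 4, 5, 16, 3, 2
A = {l: rng.normal(size=(S, N, 2 * l + 1)) for l in range(l_max + 1)}   # toy A_{i,znlm}

# Eq. (14): one i.i.d. random matrix W^t per factor, each projecting the zn
# indices of a separate copy of A before the element-wise product over k.
Ws = [{l: rng.normal(size=(K, S, N)) for l in range(l_max + 1)} for _ in range(nu)]
A_hat = [{l: np.einsum('kzn,znm->km', Ws[t][l], A[l]) for l in range(l_max + 1)}
         for t in range(nu)]

# Eq. (15): tensor product over (l_t, m_t), element-wise product over k, then
# contraction with a random stand-in for C^{l eta}_m.
l1, l2, l3 = 1, 1, 2
prod = np.einsum('ka,kb,kc->kabc', A_hat[0][l1], A_hat[1][l2], A_hat[2][l3])
C = rng.normal(size=prod.shape[1:])
B_hat = np.einsum('kabc,abc->k', prod, C)      # K independent sketched features
```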
We now turn to numerical results and first demonstrate
that the tensor-reduced features are able to efficiently
and completely describe a many-element training set. We
consider a dataset comprised of all symmetry inequiva-
lent fcc structures made up of 5 elements with up to 6
atoms per unit cell [38]. A set of features is complete on
this dataset if the design matrix for a linear model fit to
total energies has full (numerical) row rank, where each
row corresponds to a different training configuration.
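As an illustration of this test (with placeholder data standing in for the fcc dataset and the real features), the row rank of the design matrix can be checked as follows:

```python
import numpy as np

def design_matrix_rank(features_per_structure):
    # Each entry is an (n_atoms, n_features) array; a linear model for the
    # total energy sums the per-atom features over each structure, so every
    # training configuration contributes one row to the design matrix.
    rows = [atom_feats.sum(axis=0) for atom_feats in features_per_structure]
    design = np.stack(rows)                       # (n_structures, n_features)
    return np.linalg.matrix_rank(design)          # numerical rank

# Placeholder data: 100 structures with 3-6 atoms each and 40 features per atom.
rng = np.random.default_rng(0)
data = [rng.normal(size=(rng.integers(3, 7), 40)) for _ in range(100)]
print(design_matrix_rank(data))                   # at most min(n_structures, n_features)
```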
[Figure 1 plot: rank of the design matrix for fcc lattices (S = 5) against the number of basis functions; curves are distinguished by rattling amplitude (on lattice, 0.025 Å, 0.25 Å), body order (2-body to 6-body) and representation (full ACE basis, tensor-reduced, element-radial).]

FIG. 1. The row rank of the design matrix as a function of basis set on a dataset of all symmetry inequivalent fcc lattices of 5 chemical elements and unit cell sizes of up to 6 atoms. The inset zooms in on the x = y region.
Figure 1 shows the numerical rank of the design ma-
trix as a function of the basis set. At a given correlation
order the standard ACE basis set is grown by increasing
the polynomial degree, and the tensor-reduced basis set
is enlarged by increasing K, the number of independent
channels. In both cases once the rank stops increasing at
the given correlation order we increment ν. The colors
in Fig. 1 correspond to three different geometrical varia-
tions: blue contains on-lattice configurations only whilst
in magenta and red the atomic positions have been per-
turbed by a random Gaussian displacement with mean 0
and standard deviation of 0.025 and 0.25 Å, respectively.
The dotted lines correspond to the standard ACE basis, whereas the solid lines correspond to the tensor-reduced
version from Eq. (12). Although the standard ACE basis
can always achieve full row rank since it is a complete lin-
ear basis, it does this very inefficiently. In contrast, the
row rank using the tensor-reduced basis grows almost lin-