
problems in the physical and biological sciences, from crystalline materials[5] to molecules[3] (and more), where systems of study are sensitive to rotations and translations of real space.
In practice, the advantage offered by e3nn is more dramatic than expected, going beyond the efficiency gain from avoiding augmentation[8]. Seen across multiple problem domains[3, 15], this advantage offers great promise for modeling environments where generating training data presents a scaling problem[7]. Still, the nature of these observed effects remains elusive; a recent overview of the e3nn framework bluntly noted: "Unfortunately we have no theoretical explanation for this change [of slope in the learning curve]." In this work we aim to provide observations on Euclidean Neural Networks that will help illuminate the cause of this unexpected increase in data efficiency. In particular, we seek to understand further the role of non-scalar features in e3nn.
Initial progress has been made by systematically establishing the advantage of non-scalar features over invariant, scalar-only models. Previous works by Miller et al. [14] and Brandstetter et al. [4] both establish the advantage of $l = 1$ features over invariant models through ablation studies.
Further, it was posited that equivariant models, with non-scalar hidden features, are particularly suited to learning non-scalar outputs such as vectors[3, 4, 14]. There is a solid intuition for the first observation, namely that the equivariant graph convolution[4],
\[
F'_i \sim \sum_{j \in \mathcal{N}(i)} \sum_{l} \sum_{m=-l}^{l} F_j \otimes R(\|x_j - x_i\|)\, Y_{lm}\!\left(\frac{x_j - x_i}{\|x_j - x_i\|}\right), \tag{2}
\]
is able to utilize both distances between neighboring nodes $\|x_j - x_i\|$ as well as relative directional information through the spherical harmonics (the convolution is taken over the neighbors $\mathcal{N}(i)$ of node $i$, and $R(x)$ is a multi-layer perceptron). For example, in the $l = 1$ case, Eq. 2 has access to angles between nodes as well as distances. On the other hand, invariant message-passing graph networks in the literature[12, 16] have been restricted to learning only on distances between nodes.
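As a purely illustrative sketch of the message in Eq. 2, the snippet below builds a single neighbor message with the open-source e3nn library: a radial MLP $R(\|x_j - x_i\|)$ weights a tensor product of the neighbor features $F_j$ with spherical harmonics of the relative direction. The specific irreps, layer widths, and activation are assumptions made for this example, not the configurations used in the cited works.

```python
import torch
from e3nn import o3
from e3nn.nn import FullyConnectedNet

# Illustrative choices (assumptions): scalar input features, hidden features up to l = 1.
irreps_in  = o3.Irreps("8x0e")                        # node features F_j
irreps_sh  = o3.Irreps.spherical_harmonics(lmax=1)    # Y_lm for l = 0, 1
irreps_out = o3.Irreps("8x0e + 8x1o")                 # message / updated features F'_i

# F_j (x) Y_lm, with path weights produced by a radial MLP R(||x_j - x_i||)
tp = o3.FullyConnectedTensorProduct(irreps_in, irreps_sh, irreps_out, shared_weights=False)
radial = FullyConnectedNet([1, 16, tp.weight_numel], torch.relu)

x_i, x_j = torch.randn(3), torch.randn(3)             # node positions
F_j = torch.randn(1, irreps_in.dim)                   # features on neighbor j

rel  = (x_j - x_i).unsqueeze(0)                       # x_j - x_i
dist = rel.norm(dim=-1, keepdim=True)                 # ||x_j - x_i||
Y    = o3.spherical_harmonics(irreps_sh, rel, normalize=True)  # Y_lm of the unit direction

message = tp(F_j, Y, radial(dist))                    # one term of the sum in Eq. 2
```

With only $l = 0$ harmonics the directional factor is a constant, so the message collapses to a function of distance alone, matching the invariant message-passing limit discussed above.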
However, while these works establish the benefit of $l > 0$ models, they leave open the question of which particular $l$ is necessary for a given application, and the motivation for that specific choice. In practice, Batzner et al. [3] as well as Rackers et al. [15] observe, for independent tasks, that increasing the hidden angular momentum up to $l = 2$ (but no higher) provides increasing benefits in a model's learning curve. Here, we focus on tackling the following questions:
1. Do equivariant models have an advantage over invariant models specifically for learning non-scalar outputs?
2. For a given task, is there an $l^{\max}_h$ for the hidden layers beyond which efficiency gains saturate? Does this change with the nature of the task?
3. Is there any internal structure to features in equivariant models?
In this work, we will address these questions in the context of electron density prediction for water clusters. The electron density prediction task is instructive because the data efficiency advantages of non-scalar features in e3nn have already been established for this task[15], and the representation of the electron density contains higher-order spherical harmonic outputs.
We propose three sets of experiments that address the above questions. First, we examine the effect
of non-scalar features in the network on non-scalar outputs. Second, we study how the maximum
angular momentum of the output, $l^{\max}_o$, affects the optimal angular momentum channel in the hidden layers, $l^{\max}_h$. Finally, we look directly at how the learned features of an
e3nn
electron density model
evolve over training. These experiments will help answer the questions we have laid out and shed
light on the bigger question of the unexplained advantage of equivariance.
2 Methods
For the electron density learning task we seek to predict the coefficients of the density represented in a density fitting basis[6, 15]:
\[
\rho(r) = \sum_{\mu=0}^{N_{\text{atoms}}} \sum_{\nu=0}^{N_{\text{basis}}} \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} C^{\mu\nu}_{lm}\, Y_{lm}\, e^{-\alpha_{\mu\nu l}\,(r - r_\mu)^2}. \tag{3}
\]
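To make the structure of this target concrete, the following sketch (not the code used in this work) evaluates Eq. 3 on a set of points using e3nn's real spherical harmonics; the array layout and the indexing of the Gaussian exponents $\alpha$ are assumptions made for illustration.

```python
import torch
from e3nn import o3

def fitted_density(points, centers, coeffs, alphas, lmax):
    """Illustrative evaluation of Eq. 3.

    points:  [P, 3] evaluation points r
    centers: [N_atoms, 3] atom positions r_mu
    coeffs:  [N_atoms, N_basis, (lmax + 1)**2] coefficients C^{mu nu}_{lm}
    alphas:  [N_atoms, N_basis, lmax + 1] Gaussian exponents (indexing is an assumption)
    """
    irreps_sh = o3.Irreps.spherical_harmonics(lmax)
    rho = torch.zeros(points.shape[0])
    for mu in range(centers.shape[0]):
        d = points - centers[mu]                                   # r - r_mu
        Y = o3.spherical_harmonics(irreps_sh, d, normalize=True)   # [P, (lmax+1)^2]
        r2 = d.pow(2).sum(-1, keepdim=True)                        # |r - r_mu|^2
        for nu in range(coeffs.shape[1]):
            # repeat each exponent over the 2l + 1 values of m for its degree l
            a = torch.cat([alphas[mu, nu, l].repeat(2 * l + 1) for l in range(lmax + 1)])
            rho += (coeffs[mu, nu] * Y * torch.exp(-a * r2)).sum(-1)
    return rho
```

Each coefficient $C^{\mu\nu}_{lm}$ multiplies a spherical harmonic of the corresponding degree, which is why the prediction target itself is a collection of non-scalar ($l > 0$) quantities.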