Hierarchical Learning in Euclidean Neural Networks
Joshua A. Rackers
Center for Computing Research
Sandia National Laboratories
Albuquerque, NM 87110
jracker@sandia.gov
Pranav Rao
Center for Computing Research
Sandia National Laboratories
Albuquerque, NM 87110
Institute for Condensed Matter Theory
University of Illinois, Urbana-Champaign
Urbana, IL 61801
pvrao2@illinois.edu
Abstract
Equivariant machine learning methods have shown wide success at 3D learning applications in recent years. These models explicitly build in the reflection, translation, and rotation symmetries of Euclidean space and have facilitated large advances in accuracy and data efficiency for a range of applications in the physical sciences. An outstanding question for equivariant models is why they achieve such larger-than-expected advances in these applications. To probe this question, we examine the role of higher-order (non-scalar) features in Euclidean Neural Networks (e3nn). We focus on the previously studied application of e3nn to the problem of electron density prediction, which allows for a variety of non-scalar outputs, and examine whether the nature of the output (scalar $l = 0$, vector $l = 1$, or higher order $l > 1$) is relevant to the effectiveness of non-scalar hidden features in the network. Further, we examine the behavior of non-scalar features throughout training, finding a natural hierarchy of features by $l$, reminiscent of a multipole expansion. We aim for our work to ultimately inform design principles and choices of domain applications for e3nn networks.
1 Introduction
Euclidean Neural Networks are graph-based neural network models that explicitly build in the symmetries of the Euclidean group $E(3)$ in three dimensions [8, 17]. These models are equivariant to rotations and translations: the features of the network explicitly transform under the action of the Euclidean group. There can be scalar ($F_{l_h=0}$), vector ($F_{l_h=1}$), and higher order ($F_{l_h>1}$) hidden features at each node $i$ of the graph network, where $l_h$ represents the $(2l+1)$-dimensional irreducible representation of the rotation group SO(3)¹:
$$F_i = \begin{pmatrix} F_{l_h=0} \\ F_{l_h=1} \\ F_{l_h=2} \\ \vdots \end{pmatrix} \qquad (1)$$
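To make the block structure of Eq. 1 concrete, the following minimal numpy sketch builds one node's hidden feature vector from one copy each of the $l = 0$, $1$, and $2$ channels and splits it back into its irreducible-representation blocks. The single-copy channel layout is an illustrative assumption, not the configuration used in this work.

```python
import numpy as np

# Illustrative channel layout: one copy each of l = 0, 1, 2 (assumption).
l_channels = [0, 1, 2]
dims = [2 * l + 1 for l in l_channels]   # [1, 3, 5]

# One node's hidden feature vector F_i, stored as a flat concatenation (Eq. 1).
F_i = np.random.randn(sum(dims))

# Recover the per-l blocks: a 1-vector (scalar), a 3-vector, and a 5-vector.
blocks, start = {}, 0
for l, d in zip(l_channels, dims):
    blocks[l] = F_i[start:start + d]
    start += d

print({l: block.shape for l, block in blocks.items()})  # {0: (1,), 1: (3,), 2: (5,)}
```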
As a result, there is a natural data efficiency provided by equivariance; symmetry is built into the model explicitly, so one does not have to resort to data augmentation (expanding training data to include symmetry-transformed samples). This makes Euclidean networks a natural choice for modeling problems in the physical and biological sciences, from crystalline materials [5] to molecules [3] (and more), where systems of study are sensitive to rotations and translations of real space.

¹We use the schematic notation of Miller et al. [14]. In practice the features are represented by spherical harmonics [10].
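As a small, self-contained illustration of the equivariance property described above (and of why rotation augmentation is unnecessary), the sketch below builds a toy rotation-equivariant $l = 1$ feature by hand and checks that rotating the input point cloud rotates the output vector by the same rotation. The toy feature function is a hypothetical stand-in, not an e3nn layer.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def toy_vector_feature(points):
    # A hand-built rotation-equivariant map (l = 1 output): the radially
    # weighted mean displacement of the points from their centroid.
    disp = points - points.mean(axis=0)
    weights = np.exp(-np.linalg.norm(disp, axis=1, keepdims=True))
    return (weights * disp).mean(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))                       # a toy point cloud
R = Rotation.random(random_state=1).as_matrix()   # a random rotation matrix

# Equivariance: rotating the inputs rotates the output by the same rotation.
assert np.allclose(R @ toy_vector_feature(x), toy_vector_feature(x @ R.T))
```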
In practice, the advantage offered by e3nn is more dramatic than expected, going beyond the efficiency gain from avoiding augmentation [8]. Seen across multiple problem domains [3, 15], this advantage offers great promise for modeling environments where generating training data presents a scaling problem [7]. Still, the nature of these observed effects remains elusive; a recent overview of the e3nn framework bluntly noted: "Unfortunately we have no theoretical explanation for this change [of slope in the learning curve]." In this work we aim to provide observations on Euclidean Neural Networks that will help illuminate the cause of this unexpected increase in data efficiency. In particular, we seek to understand further the role of non-scalar features in e3nn.
Initial progress has been made by systematically establishing the advantage of non-scalar features over invariant models and scalar-only models. Previous works by Miller et al. [14] and Brandstetter et al. [4] both establish the advantage of $l = 1$ features over invariant models through ablation studies. Further, it was posited that equivariant models, with non-scalar hidden features, are particularly suited to learning non-scalar outputs such as vectors [3, 4, 14]. There is a solid intuition for the first
observation, namely that the equivariant graph convolution [4],
$$F'_i = \sum_{j \in \mathcal{N}(i)} \sum_{l} \sum_{m=-l}^{l} F_j \, R(\lVert x_j - x_i \rVert) \, Y_{lm}\!\left(\frac{x_j - x_i}{\lVert x_j - x_i \rVert}\right), \qquad (2)$$
is able to utilize both distances between neighboring nodes $\lVert x_j - x_i \rVert$ as well as relative directional information through the spherical harmonics (the convolution is taken over neighbors $\mathcal{N}(i)$ of node $i$, and $R(x)$ is a multi-layer perceptron). For example, in the $l = 1$ case, Eq. 2 has access to angles between nodes as well as distances. On the other hand, invariant message-passing graph networks in the literature [12, 16] have been restricted to learning only on distances between nodes.
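To make Eq. 2 concrete, the sketch below computes one neighbor's contribution to the convolution using real spherical harmonics from scipy. It is not the implementation used in this work: the Gaussian radial function is a placeholder for the learned multi-layer perceptron $R$, and the scalar $F_j$ and the choice $l_{\max} = 1$ are illustrative assumptions.

```python
import numpy as np
from scipy.special import sph_harm

def real_Y(l, m, unit_vec):
    # Real spherical harmonic Y_lm evaluated on a unit vector.
    x, y, z = unit_vec
    theta = np.arctan2(y, x)                 # azimuthal angle
    phi = np.arccos(np.clip(z, -1.0, 1.0))   # polar angle
    Y = sph_harm(abs(m), l, theta, phi)
    if m > 0:
        return np.sqrt(2.0) * Y.real
    if m < 0:
        return np.sqrt(2.0) * Y.imag
    return Y.real

def neighbor_message(x_i, x_j, F_j, radial_fn, l_max=1):
    # One term of Eq. 2: F_j * R(||x_j - x_i||) * Y_lm(direction),
    # collected over l = 0..l_max and m = -l..l.
    diff = x_j - x_i
    dist = np.linalg.norm(diff)
    direction = diff / dist
    R = radial_fn(dist)
    return np.array([F_j * R * real_Y(l, m, direction)
                     for l in range(l_max + 1)
                     for m in range(-l, l + 1)])

# Gaussian radial weight as a stand-in for the learned MLP R(x).
msg = neighbor_message(np.zeros(3), np.array([1.0, 1.0, 0.0]),
                       F_j=1.0, radial_fn=lambda d: np.exp(-d**2))
print(msg.shape)  # (4,): one l = 0 component and three l = 1 components
```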
However, while establishing the benefit of $l > 0$ models, this leaves open the nuance of what particular $l$ is necessary for a desired application, and the motivation for that specific choice. In practice, Batzner et al. [3] as well as Rackers et al. [15] observe for independent tasks that models up to $l = 2$ (but no higher) provide increasing benefits in a model's learning curve. Here, we focus on tackling the following questions:
1. Do equivariant models have an advantage over invariant models specifically for learning non-scalar outputs?
2. For a given task, is there an $l^{\max}_h$ for the hidden layers beyond which efficiency gains saturate? Does this change with the nature of the task?
3. Is there any internal structure to features in equivariant models?
In this work, we will address these questions in the context of electron density prediction for water clusters. The electron density prediction task is instructive because the data efficiency advantages of non-scalar features in e3nn have already been established for this task [15] and the representation of the electron density contains higher-order spherical harmonic outputs.
We propose three sets of experiments that address the above questions. First, we examine the effect of non-scalar features in the network on non-scalar outputs. Second, we study how the maximum angular momentum of the output, $l^{\max}_o$, affects the optimal angular momentum channel in the hidden layers, $l^{\max}_h$. Finally, we look directly at how the learned features of an e3nn electron density model evolve over training. These experiments will help answer the questions we have laid out and shed light on the bigger question of the unexplained advantage of equivariance.
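As one concrete (hypothetical) way to carry out the third kind of measurement, the sketch below computes the average norm of each angular-momentum block of a layer's hidden features; logging these values once per epoch gives a simple trace of how strongly each $l$ channel is used over training. The flat block layout follows Eq. 1 and is an assumption about how the features are stored, not the paper's actual instrumentation.

```python
import numpy as np

def per_l_norms(node_features, l_channels):
    # node_features: array of shape (n_nodes, sum(2l+1)), with the l blocks
    # concatenated in order as in Eq. 1. Returns the mean L2 norm per block.
    norms, start = {}, 0
    for l in l_channels:
        d = 2 * l + 1
        block = node_features[:, start:start + d]
        norms[l] = float(np.linalg.norm(block, axis=1).mean())
        start += d
    return norms

# Example: random hidden features for 4 nodes with l = 0, 1, 2 channels.
feats = np.random.randn(4, 1 + 3 + 5)
print(per_l_norms(feats, l_channels=[0, 1, 2]))
```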
2 Methods
For the electron density learning task we seek to predict the coefficients of the density represented in a density fitting basis [6, 15]:
$$\rho(\mathbf{r}) = \sum_{\mu=0}^{N_{\mathrm{atoms}}} \sum_{\nu=0}^{N_{\mathrm{basis}}} \sum_{l=0}^{l^{\max}} \sum_{m=-l}^{l} C^{\mu\nu}_{lm} \, Y_{lm} \, e^{-\alpha_{\mu\nu l}(\mathbf{r} - \mathbf{r}_\mu)^2}. \qquad (3)$$
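For illustration, a minimal sketch of how Eq. 3 could be evaluated at a single point given fitted coefficients follows. The nested-list data layout, the per-(atom, basis, $l$) exponents `alphas`, and the `real_Y` helper are all illustrative assumptions rather than the data format or basis conventions used in the paper.

```python
import numpy as np
from scipy.special import sph_harm

def real_Y(l, m, vec):
    # Real spherical harmonic evaluated on the direction of `vec`.
    x, y, z = vec / np.linalg.norm(vec)
    theta = np.arctan2(y, x)
    phi = np.arccos(np.clip(z, -1.0, 1.0))
    Y = sph_harm(abs(m), l, theta, phi)
    if m > 0:
        return np.sqrt(2.0) * Y.real
    if m < 0:
        return np.sqrt(2.0) * Y.imag
    return Y.real

def density_at(r, atom_positions, coeffs, alphas, l_max=2):
    # Evaluate Eq. 3 at point r. coeffs[mu][nu] holds the (l, m) coefficients
    # for basis function nu on atom mu, flattened in (l, m) order;
    # alphas[mu][nu][l] holds the matching Gaussian exponents (assumed layout).
    rho = 0.0
    for mu, r_mu in enumerate(atom_positions):
        diff = r - r_mu
        d2 = float(diff @ diff)
        for nu in range(len(coeffs[mu])):
            idx = 0
            for l in range(l_max + 1):
                gauss = np.exp(-alphas[mu][nu][l] * d2)
                for m in range(-l, l + 1):
                    rho += coeffs[mu][nu][idx] * real_Y(l, m, diff) * gauss
                    idx += 1
    return rho

# Example: one atom at the origin with a single s-type (l = 0 only) function.
atoms = [np.zeros(3)]
coeffs = [[np.array([1.0] + [0.0] * 8)]]   # (l, m) blocks up to l = 2
alphas = [[[0.5, 0.5, 0.5]]]
print(density_at(np.array([0.3, 0.2, 0.1]), atoms, coeffs, alphas))
```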