
problems in the physical and biological sciences, from crystalline materials[5] to molecules[3] (and more), where systems of study are sensitive to rotations and translations of real space.
In practice, the advantage offered by e3nn is more dramatic than expected, going beyond the efficiency gain from avoiding augmentation[8]. Seen across multiple problem domains[3, 15], this advantage offers great promise for modeling environments where generating training data presents a scaling problem[7]. Still, the nature of these observed effects remains elusive; a recent overview of the e3nn framework bluntly noted: "Unfortunately we have no theoretical explanation for this change [of slope in the learning curve]." In this work we aim to provide observations on Euclidean Neural Networks that will help illuminate the cause of this unexpected increase in data efficiency. In particular, we seek to understand further the role of non-scalar features in e3nn.
Initial progress has been made by systematically establishing the advantage of non-scalar features over invariant, scalar-only models. Previous works by Miller et al. [14] and Brandstetter et al. [4] both establish the advantage of $l = 1$ features over invariant models through ablation studies.
Further, it was posited that equivariant models, with non-scalar hidden features, are particularly suited to learning non-scalar outputs such as vectors[3, 4, 14]. There is a solid intuition for the first observation, namely that the equivariant graph convolution[4],
\[
F'_i \sim \sum_{j \in \mathcal{N}(i)} \sum_{l} \sum_{m=-l}^{l} F_j \otimes R(\|x_j - x_i\|)\, Y_{lm}\!\left(\frac{x_j - x_i}{\|x_j - x_i\|}\right), \tag{2}
\]
is able to utilize both distances between neighboring nodes $\|x_j - x_i\|$ as well as relative directional information through the spherical harmonics (the convolution is taken over the neighbors $\mathcal{N}(i)$ of node $i$, and $R(x)$ is a multi-layer perceptron). For example, in the $l = 1$ case, Eq. 2 has access to angles between nodes as well as distances. On the other hand, invariant message-passing graph networks in the literature[12, 16] have been restricted to learning only on distances between nodes.
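As a purely illustrative sketch of the message in Eq. 2, the snippet below builds a single neighbor message with the open-source e3nn library: a radial MLP $R(\|x_j - x_i\|)$ weights a tensor product of the neighbor features $F_j$ with spherical harmonics of the relative direction. The specific irreps, layer widths, and activation are assumptions made for this example, not the configurations used in the cited works.

```python
import torch
from e3nn import o3
from e3nn.nn import FullyConnectedNet

# Illustrative choices (assumptions): scalar input features, hidden features up to l = 1.
irreps_in  = o3.Irreps("8x0e")                        # node features F_j
irreps_sh  = o3.Irreps.spherical_harmonics(lmax=1)    # Y_lm for l = 0, 1
irreps_out = o3.Irreps("8x0e + 8x1o")                 # message / updated features F'_i

# F_j (x) Y_lm, with path weights produced by a radial MLP R(||x_j - x_i||)
tp = o3.FullyConnectedTensorProduct(irreps_in, irreps_sh, irreps_out, shared_weights=False)
radial = FullyConnectedNet([1, 16, tp.weight_numel], torch.relu)

x_i, x_j = torch.randn(3), torch.randn(3)             # node positions
F_j = torch.randn(1, irreps_in.dim)                   # features on neighbor j

rel  = (x_j - x_i).unsqueeze(0)                       # x_j - x_i
dist = rel.norm(dim=-1, keepdim=True)                 # ||x_j - x_i||
Y    = o3.spherical_harmonics(irreps_sh, rel, normalize=True)  # Y_lm of the unit direction

message = tp(F_j, Y, radial(dist))                    # one term of the sum in Eq. 2
```

With only $l = 0$ harmonics the directional factor is a constant, so the message collapses to a function of distance alone, matching the invariant message-passing limit discussed above.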
However, while these works establish the benefit of $l > 0$ models, they leave open the question of which particular $l$ is necessary for a given application, and the motivation for that specific choice. In practice, Batzner et al. [3] as well as Rackers et al. [15] observe, for independent tasks, that increasing the hidden angular momentum up to $l = 2$ (but no higher) provides increasing benefits in a model's learning curve. Here, we focus on tackling the following questions:
1. Do equivariant models have an advantage over invariant models specifically for learning non-scalar outputs?
2. For a given task, is there an $l^{\max}_h$ for the hidden layers beyond which efficiency gains saturate? Does this change with the nature of the task?
3. Is there any internal structure to features in equivariant models?
In this work, we will address these questions in the context of electron density prediction for water clusters. The electron density prediction task is instructive because the data efficiency advantages of non-scalar features in e3nn have already been established for this task[15], and the representation of the electron density contains higher-order spherical harmonic outputs.
We propose three sets of experiments that address the above questions. First, we examine the effect
of non-scalar features in the network on non-scalar outputs. Second, we study how the maximum
angular momentum of the output, $l^{\max}_o$, affects the optimal angular momentum channel in the hidden layers, $l^{\max}_h$. Finally, we look directly at how the learned features of an
e3nn
electron density model
evolve over training. These experiments will help answer the questions we have laid out and shed
light on the bigger question of the unexplained advantage of equivariance.
2 Methods
For the electron density learning task we seek to predict the coefficients of the density represented in a density fitting basis[6, 15]:
\[
\rho(r) = \sum_{\mu=0}^{N_{\text{atoms}}} \sum_{\nu=0}^{N_{\text{basis}}} \sum_{l=0}^{l_{\max}} \sum_{m=-l}^{l} C^{\mu\nu}_{lm}\, Y_{lm}\, e^{-\alpha_{\mu\nu l}\,(r - r_\mu)^2}. \tag{3}
\]
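To make the structure of this target concrete, the following sketch (not the code used in this work) evaluates Eq. 3 on a set of points using e3nn's real spherical harmonics; the array layout and the indexing of the Gaussian exponents $\alpha$ are assumptions made for illustration.

```python
import torch
from e3nn import o3

def fitted_density(points, centers, coeffs, alphas, lmax):
    """Illustrative evaluation of Eq. 3.

    points:  [P, 3] evaluation points r
    centers: [N_atoms, 3] atom positions r_mu
    coeffs:  [N_atoms, N_basis, (lmax + 1)**2] coefficients C^{mu nu}_{lm}
    alphas:  [N_atoms, N_basis, lmax + 1] Gaussian exponents (indexing is an assumption)
    """
    irreps_sh = o3.Irreps.spherical_harmonics(lmax)
    rho = torch.zeros(points.shape[0])
    for mu in range(centers.shape[0]):
        d = points - centers[mu]                                   # r - r_mu
        Y = o3.spherical_harmonics(irreps_sh, d, normalize=True)   # [P, (lmax+1)^2]
        r2 = d.pow(2).sum(-1, keepdim=True)                        # |r - r_mu|^2
        for nu in range(coeffs.shape[1]):
            # repeat each exponent over the 2l + 1 values of m for its degree l
            a = torch.cat([alphas[mu, nu, l].repeat(2 * l + 1) for l in range(lmax + 1)])
            rho += (coeffs[mu, nu] * Y * torch.exp(-a * r2)).sum(-1)
    return rho
```

Each coefficient $C^{\mu\nu}_{lm}$ multiplies a spherical harmonic of the corresponding degree, which is why the prediction target itself is a collection of non-scalar ($l > 0$) quantities.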