The idea that G-invariance can be hardcoded into a model by combining multiple G-equivariant layers with data reduction layers (e.g., pooling) has a long history in deep learning. The most famous example of this idea is the conventional convolutional neural network (CNN) [32]. Since then a multitude of other group equivariant layers have been designed, including two-dimensional rotation equivariant layers [47; 50; 36; 38; 39], three-dimensional rotation equivariant layers [46; 11; 16], layers equivariant to the Euclidean group and its subgroups [45] (which we test against in this paper), and layers that are equivariant with respect to the symmetric group [33].
Our work is not the first to analyze various aspects of invariance in neural network models. Lyle et al. [34] analyzed invariance with respect to the benefits and limitations of data augmentation and feature averaging, presenting both theoretical and empirical arguments for using feature averaging over data augmentation. More recently, Chen et al. [6] presented a useful group-theoretic framework with which to understand data augmentation. Relevant to the present work, Chen et al. [5] introduced a notion of approximate invariance. Unlike that work, however, which focused on theoretical results related to data augmentation, this paper aims to introduce metrics that can be applied to modern deep learning architectures and to answer questions about invariance from an empirical perspective. A number of existing works have proposed metrics aimed at measuring the extent to which a model is not equivariant (e.g., [8; 22; 18; 43; 49]). Our work differs from these in two ways: (1) we build general metrics based on basic group theory that are designed to work across different groups and datatypes, and (2) unlike other works that use their metric to evaluate the equivariance of a specific model, we use our metrics to explore how models learn (or do not learn) to be equivariant generally.
Finally, a range of recent works have shown that, even beyond the standard evaluation statistics (e.g., accuracy), invariance is an important concept to consider when studying deep learning models. For example, Kaur et al. [25] showed that lack of invariance can be used to identify out-of-distribution inputs. A further series of works investigated whether excessive invariance can reduce adversarial robustness [23; 24; 40]. All of this work reinforces one of the primary messages of this paper: that it is important to be able to measure invariance and equivariance directly in a model.
3 Quantifying Invariance and Equivariance
We begin this section by recalling the mathematical definitions of equivariance and invariance. We
present these definitions in terms of the mathematical concept of a group, which formally captures
the notion of symmetry [15].
Assume that G is a group. We say that G acts on sets X and Y if there are maps φ_X : G × X → X and φ_Y : G × Y → Y that respect the composition operation of G. That is, for g_1, g_2 ∈ G and x ∈ X,

φ_X(g_2, φ_X(g_1, x)) = φ_X(g_2 g_1, x),

with an analogous condition for φ_Y. Whenever the meaning is clear, we simplify notation by writing φ_X(g, x) = gx (with an analogous convention for φ_Y). A map f : X → Y is said to be G-equivariant if for all x ∈ X and g ∈ G,

f(gx) = g f(x).    (1)

In the case where the map φ_Y is trivial, so that gy = y for all g ∈ G and y ∈ Y, we say that f is G-invariant. Thus, invariance is a special case of equivariance.
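As a concrete illustration of these definitions (a toy example, not from the paper), the sketch below checks equivariance and invariance numerically for the cyclic group Z_n acting on length-n vectors by cyclic shifts; the function names and the choice of group action are our own for illustration.

```python
import numpy as np

# Toy illustration of definition (1): the cyclic group Z_n acts on
# length-n vectors by cyclic shifts, phi_X(g, x) = np.roll(x, g).
def act(g, x):
    return np.roll(x, g)

def is_equivariant(f, x, n):
    # Check f(gx) == g f(x) for every shift g in Z_n.
    return all(np.allclose(f(act(g, x)), act(g, f(x))) for g in range(n))

def is_invariant(f, x, n):
    # Trivial action on Y: check f(gx) == f(x) for every g in Z_n.
    return all(np.allclose(f(act(g, x)), f(x)) for g in range(n))

x = np.array([1.0, 2.0, 3.0, 4.0])
square = lambda v: v ** 2   # elementwise, so it commutes with shifts
total = lambda v: v.sum()   # collapses to a scalar that ignores ordering

print(is_equivariant(square, x, len(x)))  # True: squaring is shift-equivariant
print(is_invariant(total, x, len(x)))     # True: the sum is shift-invariant
print(is_invariant(square, x, len(x)))    # False: squaring is not invariant
```

Note that the sum also passes the equivariance check under the trivial action on its (scalar) output, consistent with invariance being a special case of equivariance.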
Assume that f : X → Y is a neural network, where X is the ambient space of input data and Y is the target space. In many cases there is a natural way to factorize f into a composition f = f_2 ∘ f_1, where f_1 : X → Z is known as the feature extractor, Z is the latent space of f, and f_2 : Z → Y is the classifier. For example, if f is a ResNet50 CNN [20], then f_1 may consist of all residual blocks while f_2 would consist of the final affine classification and softmax layers. We say that a machine learning model f extracts G-equivariant features if f_1 is a G-equivariant function. This is an especially meaningful distinction in the context of transfer learning, where invariance or equivariance can be transferred to a new task via the invariance or equivariance of f_1. Note that the definition of G-equivariant feature extraction requires a well-defined action of G on Z, which may not be obvious in many cases. Because the trivial action is defined for any G and Z, we can always ask whether f extracts G-invariant features.
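The following toy sketch (hypothetical functions, not a real network) illustrates the factorization f = f_2 ∘ f_1 and how invariance of the feature extractor transfers to the whole model: here f_1 sorts its input, which is invariant to any permutation and in particular to cyclic shifts, so f is shift-invariant regardless of the classifier f_2.

```python
import numpy as np

# Toy factorization f = f2 ∘ f1. The feature extractor f1 is invariant
# to cyclic shifts of the input, so f inherits that invariance no matter
# what classifier f2 is composed on top of it.
def f1(x):
    # Sorting discards ordering, hence is invariant to any permutation
    # of the entries, including cyclic shifts.
    return np.sort(x)

def f2(z):
    # A fixed affine "classifier" on the latent vector z.
    w = np.array([0.5, -1.0, 2.0, 0.25])
    return float(w @ z + 1.0)

def f(x):
    return f2(f1(x))

x = np.array([3.0, 1.0, 4.0, 2.0])
print(f(x) == f(np.roll(x, 1)))  # True: invariance of f1 transfers to f
```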
The following proposition provides some insight into how the invariance (or lack of invariance) of f_1 relates to the invariance (or lack of invariance) of f.