
differences between representations that models with various architectures learn. However, they do not analyze how the symmetry properties of a model depend on its width and depth.
Other works focus on analyzing the connection between the number of parameters and the generalization capability of networks [1, 47, 41, 49]. It is commonly argued that overparametrized architectures achieve better generalization bounds, i.e. provide a smaller discrepancy between train and test performance, and are thus preferred in many applications [49]. However, it is unclear whether models with more parameters, and hence a finer generalization capability, also learn symmetries better. In our paper, we investigate this question for a family of feed-forward networks.
Robust Learning
Various methods have been proposed to train more robust neural networks. While naive training may lead to overfitting on the training subset, a well-organized pre-training helps to mitigate this issue [12]. Features extracted by a model pretrained on a bigger dataset often demonstrate more robust results [18]. While there are many effective techniques for training more robust feature extractors [5, 51, 17], the analysis of such methods from the invariance point of view has not been done before. We demonstrate that our method allows for better understanding and explaining which training regimes lead to more invariant, and thus more robust, representations.
3 Background
The focus of this paper is symmetry groups. A symmetry is a transformation of an object that leaves the object unchanged. The set of all such transformations of an object, with composition as a binary operation, forms a group. In this paper, the object of interest is a dataset and its symmetries. We study what kinds of transformations map one data sample to another and how neural networks learn information about these symmetries.
The theory of Lie groups is the area of mathematics that formalizes symmetries and provides practical methods for studying them. Formally, a Lie group is a group that has the structure of a differentiable manifold. The tangent space at the identity element of a Lie group forms a vector space called the Lie algebra. A Lie algebra determines a Lie group up to isomorphism for simply connected Lie groups and, being a vector space, is a more convenient object to study than a Lie group. Thus, in order to recover a simply connected symmetry group, it is sufficient to understand its Lie algebra.
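As a standard textbook illustration (not taken from this paper), consider the planar rotation group $SO(2)$: differentiating a curve of rotations at the identity yields an element of its Lie algebra.

```latex
% SO(2) as a one-parameter curve of rotations through the identity R(0) = I:
R(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix},
\qquad
\left.\frac{d}{dt}\right|_{t=0} R(t)
  = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \in \mathfrak{so}(2).
% The tangent vector at the identity spans the one-dimensional Lie algebra so(2);
% the whole group is recovered from it via the exponential map.
```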
In this paper, we assume that the Lie group $G$ is a matrix Lie group, i.e. a closed subgroup of the general linear group of degree $n$, denoted $GL(n)$ and defined as the set of $n \times n$ invertible matrices. The correspondence between the Lie group $G$ and its Lie algebra $\mathfrak{g}$ is in this case given by the exponential map:
$$\mathfrak{g} = \left\{\, h \in M(n) \;:\; e^{t \cdot h} \in G \;\; \forall t \in \mathbb{R} \,\right\}, \qquad (1)$$
where $M(n)$ is the set of all $n \times n$ matrices. Each such element $h$ from equation (1) is called an infinitesimal generator of the Lie group $G$.
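To make equation (1) concrete, the following minimal numerical check (illustrative only, not part of the LieGG procedure) verifies that the rotation generator from the example above exponentiates into $SO(2) \subset GL(2)$ for a range of values of $t$; scipy.linalg.expm computes the matrix exponential.

```python
# Numerical illustration of equation (1): the generator of planar rotations
# exponentiates into the group SO(2) in GL(2) for every t in R.
import numpy as np
from scipy.linalg import expm

h = np.array([[0.0, -1.0],
              [1.0,  0.0]])            # infinitesimal generator of rotations

for t in np.linspace(-2.0, 2.0, 9):
    g = expm(t * h)                    # e^{t*h}
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    assert np.allclose(g, rot)                   # e^{t*h} is the rotation by angle t
    assert np.allclose(g @ g.T, np.eye(2))       # orthogonality
    assert np.isclose(np.linalg.det(g), 1.0)     # determinant 1, hence g lies in SO(2)
```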
We introduce LieGG, a method to compute Lie Group Generators. LieGG extracts the infinitesimal generators of the Lie algebra for the symmetries that a neural network has learned from the data. To find the Lie algebra basis, we use the discriminator of the dataset, i.e. a function $F : \mathbb{R}^n \to \mathbb{R}$ such that $F(x) = 0$ if and only if $x$ is an element of the dataset $D$. The discriminator function is naturally modeled by a neural network fitting the data subject to a downstream task.
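For instance, in a regression setting with inputs $x$ and targets $y$, one natural way to obtain such a discriminator is $F([x, y]) = \mathrm{net}(x) - y$, which vanishes exactly on samples that the fitted network reproduces. The sketch below is a hypothetical PyTorch illustration under this assumption; the exact architecture and construction used in the paper may differ.

```python
# Hypothetical sketch of a dataset discriminator F : R^n -> R.
# Assumption for illustration: samples are pairs [x, y] and F([x, y]) = net(x) - y,
# so F vanishes on the data once `net` fits the downstream regression task.
import torch
import torch.nn as nn

net = nn.Sequential(                 # network fitted on the downstream task
    nn.Linear(3, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

def F(sample: torch.Tensor) -> torch.Tensor:
    """Discriminator on samples [x, y] in R^4: zero iff the network predicts y from x."""
    x, y = sample[..., :3], sample[..., 3]
    return net(x).squeeze(-1) - y

samples = torch.randn(8, 4)          # placeholder batch of data points [x, y]
print(F(samples))                    # approximately zero on training data after fitting
```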
With this, we use the following criterion for a symmetry group:
Theorem 3.1 (Theorem 2.8 [30]). Let $G$ be a connected Lie group of linear transformations acting on an $n$-dimensional manifold $X$. Let $F : X \to \mathbb{R}^l$, $l \leq n$, define a system of algebraic equations:
$$F_\nu(x) = 0, \qquad \nu = 1, \dots, l,$$
and assume that the system is of maximal rank, meaning that the Jacobian matrix $\left( \frac{\partial F_\nu}{\partial x_k} \right)$ is of rank $l$ for every solution $x$ of the system. Then $G$ is a symmetry group of the system if and only if