LieGG: Studying Learned Lie Group Generators
Artem Moskalev
UvA-Bosch Delta Lab
University of Amsterdam
a.moskalev@uva.nl
Anna Sepliarskaia
Machine Learning Research Unit
TU Wien
seplanna@gmail.com
Ivan Sosnovik
UvA-Bosch Delta Lab
University of Amsterdam
i.sosnovik@uva.nl
Arnold Smeulders
UvA-Bosch Delta Lab
University of Amsterdam
a.w.m.smeulders@uva.nl
Abstract
Symmetries built into a neural network have appeared to be very beneficial for a
wide range of tasks, as they save the data needed to learn them. We depart from the position
that when symmetries are not built into a model a priori, it is advantageous for
robust networks to learn symmetries directly from the data to fit a task function. In
this paper, we present a method to extract symmetries learned by a neural network
and to evaluate the degree to which a network is invariant to them. With our method,
we are able to explicitly retrieve learned invariances in the form of the generators
of the corresponding Lie-groups without prior knowledge of symmetries in the data.
We use the proposed method to study how symmetrical properties depend on a
neural network’s parameterization and configuration. We found that the ability of a
network to learn symmetries generalizes over a range of architectures. However, the
quality of learned symmetries depends on the depth and the number of parameters.
1 Introduction
Convolutional Neural Networks (CNNs) are efficient, in part because they convert the translation
symmetry in the data into a built-in translation-equivariance property of the network without exhausting
the data to learn the equivariance. Group-equivariant networks generalize this property to rotation
[6, 46, 43, 20], scale [39, 36, 4], and other symmetries defined by matrix groups [14]. Equipping a
neural network with prior known symmetries has proved to be data-efficient.
Recent works [13, 21, 50] have demonstrated that hard coding the symmetry into a neural network
does not always lead to better generalization. Often, soft or learned equivariance is more
advantageous, mostly in terms of data efficiency and accuracy [42]. Moreover, architectures with no
equivariance built-in, such as transformers [11] and multi-layer perceptron mixers [40], achieve
remarkable performance on a wide range of problems. Effectively, they learn their own geometrical
priors without explicit symmetry constraints [8]. This raises the following questions to study in this
paper. To what degree do neural networks learn symmetries? How accurately do learned symmetries
reflect the true symmetries in the data? Can we support the capability of neural networks for learning
symmetries? We present a method to study symmetries learned by neural networks to address the
questions.
In the paper, we depart from Lie group theory and develop a method that can retrieve symmetries
learned from the data by any neural network (Figure 1). Our method only makes the assumption that
Source code: https://github.com/amoskalev/liegg
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.04345v3 [cs.LG] 31 Jan 2023
Figure 1: We solve the matrix nullspace equation to derive an infinitesimal generator from a neural
network and the training data. Due to the Lie algebra-Lie group correspondence, we can calculate the
class of transformations the model is invariant to by exponentiating the generator.
a model is differentiable, which is a condition commonly met in neural networks. While previous works on analyzing
symmetries in neural networks rely on empirical analysis of network representations [29] or on
examination of a given set of transformations [15, 36, 26], our method outputs a generator of the
corresponding Lie-group and allows us to quantitatively evaluate how sensitive the network is in the
direction of the learned symmetry. We make the following contributions:
• From the perspective of Lie-groups, we propose a theory to study the symmetrical properties of neural networks.
• We derive an efficient implementation of the method that allows improving the interpretability of a model by revealing the symmetries it has learned.
• With our method, we conclude that models with more parameters and gradual fine-tuning learn more precise symmetry groups with a higher degree of invariance to them.
2 Related work
Equivariant and invariant networks
The goal of equivariant and invariant networks is to build
the symmetries into a neural network architecture as an inductive bias. Starting from the convolutional
networks [25] that contain a translation symmetry, the concept of equivariance was generalized to
rotations [6, 46, 43, 20], permutations [48], scaling [39, 36, 37, 4, 45, 38] and arbitrary matrix groups
in [14]. Various methods have been proposed for learning symmetries directly from the training data
[50, 3], among which a popular approach is to learn transformations by estimating the infinitesimal
generators of symmetry groups [35, 10, 7, 34, 9]. These papers focus on building the symmetries
into the models by modifying the architecture. In our work, we focus on the reverse question, i.e.
given a network with a fixed architecture, what symmetries does it learn from the data?
Symmetries in neural networks
Another line of work is focused on interpreting symmetries in
neural networks. In [29], Olah et al. take inspiration from biological circuit motifs [2] and study
equivariant patterns learned by an unconstrained neural network on the image classification task.
The authors demonstrate that the network learns rotation, hue, and scale symmetries when trained
on ImageNet. In [15], Goodfellow et al. propose a number of empirical tests to study invariances
to known transformations in a network, and demonstrate that auto-encoding architectures learn
increasingly invariant features in the deeper layers. In [26], Lenc & Vedaldi quantify invariance and
equivariance in the layers of convolutional networks to a pre-defined set of transformations. In the
concurrent work of Gruver et al. [16], the authors propose using the Lie derivative to measure the
local equivariance error to known symmetry groups. In contrast to these works, our method does not
require knowing the set of transformations beforehand; it also provides theoretical guarantees of
the invariance from the perspective of Lie group theory.
Network configuration: width, depth, number of parameters
Neural networks of various
widths and depths have been studied through the lens of the universal approximation theorem [27, 22],
functional expressiveness [32], and by empirical analysis of learned representations [28]. These
works focus on either characterizing the learning capabilities of neural networks or on interpreting the
differences between representations that models with various architectures learn. They, however, do
not analyze how the symmetrical properties of a model depend on its width and depth.
Other works focus on analyzing the connection between the number of parameters and the generalization
capability of networks [1, 47, 41, 49]. It is commonly argued that overparametrized architectures
achieve better generalization bounds, i.e. provide a smaller discrepancy between train and test
performance, and are thus preferred in many applications [49]. However, it is unclear if models with
more parameters, and hence with a finer generalization capability, also learn symmetries better. In our
paper, we investigate this question for a family of feed-forward networks.
Robust Learning
Various methods have been proposed to train more robust neural networks.
While naive training may lead to overfitting on the training subset, a well-organized pre-training
helps to mitigate this issue [12]. Features extracted by a model pretrained on a bigger dataset often
demonstrate more robust results [18]. While there are many effective techniques for training more
robust feature extractors [5, 51, 17], the analysis of such methods from the invariance point of view
has not been done before. We demonstrate that our method allows for better understanding and
explaining what training regimes lead to more invariant, and thus more robust, representations.
3 Background
The focus of this paper is a symmetry group. A symmetry is a transformation of an object that leaves
the object unchanged. The set of all such transformations of an object with composition as a binary
operation forms a group. In this paper, the object of interest is a dataset and its symmetries. We
study what kind of transformations map one data sample to another and how neural networks learn
information about the symmetries.
The theory of Lie groups is a sweet spot of mathematics that helps to formalize symmetries and
provides practical methods for studying them. Formally, a Lie group is a group that has the structure
of a differentiable manifold. The tangent space at the identity element of a Lie group forms a vector
space called the Lie algebra. A Lie algebra determines a Lie group up to isomorphism for simply
connected Lie groups and, being a vector space, is a more convenient object to study than a Lie group.
Thus, in order to recover a simply connected symmetry group, it is sufficient to understand its Lie
algebra.
In this paper, we assume that the Lie group G is a matrix Lie group, i.e. is a closed subgroup of the
general linear group of degree n, denoted as GL(n). It is defined as the set of n×n invertible
matrices. The correspondence between the Lie group G and its Lie algebra g in this case is given by
the exponential map:

\mathfrak{g} = \{ h \in M(n) : e^{t \cdot h} \in G \;\; \forall t \in \mathbb{R} \}, \qquad (1)

where M(n) is the set of all n×n matrices. Each such element h from Equation (1) is called an
infinitesimal generator of the Lie group G.
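As a quick numerical illustration of the exponential map in Equation (1) (a minimal sketch assuming NumPy and SciPy, not part of the paper's method), exponentiating the standard antisymmetric generator of planar rotations produces elements of the rotation group:

import numpy as np
from scipy.linalg import expm

# Infinitesimal generator of planar rotations, an element of the Lie algebra so(2).
h = np.array([[0.0, -1.0],
              [1.0,  0.0]])

t = 0.3                # group parameter, here the rotation angle in radians
g = expm(t * h)        # e^{t·h} is an element of the Lie group

# e^{t·h} equals the rotation matrix by angle t ...
expected = np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])
assert np.allclose(g, expected)

# ... and is orthogonal, i.e. it indeed belongs to the rotation group.
assert np.allclose(g @ g.T, np.eye(2))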
We introduce LieGG, a method to compute Lie Group Generators. LieGG extracts the infinitesimal
generators of the Lie algebra for symmetries that a neural network learned from the data. To find
the Lie algebra basis, we use the discriminator of the dataset, i.e. a function F : R^n → R such that
F(x) = 0 if and only if x is an element of the dataset D. The discriminator function is naturally
modeled by a neural network fitting the data subject to a downstream task. With this, we use the
following criterion for a symmetry group:
Theorem 3.1 (Theorem 2.8 in [30]). Let G be a connected Lie group of linear transformations acting
on an n-dimensional manifold X. Let F : X → R^l, l ≤ n, define a system of algebraic equations:

F_\nu(x) = 0, \quad \nu = 1, \dots, l,

and assume that the system is of maximal rank, meaning that the Jacobian matrix \left( \frac{\partial F_\nu}{\partial x_k} \right)
is of rank l for every solution x of the system. Then G is a symmetry group of the system if and only if

\sum_{i=1,\, j=1}^{i=n,\, j=n} \frac{\partial F_\nu}{\partial x_i} \cdot h_{ij} \cdot x_j = 0, \quad \text{whenever } F_\nu(x) = 0,

for ν = 1, ..., l and every infinitesimal generator h of G, where h_{ij} is an element of the matrix h in
the i-th row and j-th column.
Example
We illustrate the practical significance of the theorem with an example. Suppose the
dataset D lies on a sphere in R^n:

D = \{ x \in \mathbb{R}^n : x_1^2 + \dots + x_n^2 = 1 \}.

The discriminator for this example is the function F(x) = x_1^2 + ... + x_n^2 - 1. According to the
theorem, to find the Lie algebra of the symmetry group, we can find a basis of the solution of the
following linear equations, where a_{ij} are the variables of interest:

\sum_{i,j} x_i \cdot a_{ij} \cdot x_j = 0,

when x_1^2 + ... + x_n^2 = 1. We assert that the solutions are the family of matrices A such that
A + A^T = 0. Indeed, for x : x_i = 1, x_j = 0, j ≠ i, the condition means that a_{ii} = 0. For
x : x_i = 1/√2, x_j = 1/√2, x_k = 0, k ≠ i, j, it follows that a_{ij} + a_{ji} = 0. On the other hand,
A = -A^T implies \sum_{i \neq j} x_i \cdot a_{ij} \cdot x_j + x_j \cdot a_{ji} \cdot x_i = \sum_{i \neq j} x_i \cdot a_{ij} \cdot x_j - x_i \cdot a_{ij} \cdot x_j = 0.

Note that the Lie algebra of the matrices {A : A + A^T = 0} corresponds to the Lie group of matrices
{B : B · B^T = E}, which is the rotation group. Thus, the symmetry of a sphere is the rotation group.
4 Method
To compute a Lie algebra given a discriminator function F : R^n → R, we use Theorem 3.1 and solve
the system of linear equations, where each equation corresponds to one element in the dataset:

\sum_{i=1,\, j=1}^{i=n,\, j=n} \frac{\partial F}{\partial x_i} \cdot h_{ij} \cdot x_j = 0, \quad \text{for each point in the dataset.} \qquad (2)

This is a system of linear equations with n^2 variables h_{ij} and the number of equations equal to the
number of points in the dataset. We can cast solving such a system with respect to h as a problem of
finding a nullspace of the matrix E. The matrix E has the number of rows equal to the number of
points in the dataset and n^2 columns. Multiplying E with the vectorized representation of h yields the
system of equations in (2). We call the matrix E the network polarization matrix.
The problem of finding the nullspace basis of a matrix is a standard problem in numerical analysis.
It can be efficiently solved using the Singular Value Decomposition (SVD). Recall that we can write
the matrix E using the SVD as E = UΣV^T, where U and V are orthogonal matrices and Σ
is a diagonal matrix with decreasing singular values. Thus, the columns of V corresponding to
nearly-zero singular values encode the nullspace of the system of equations in (2) and hence form a
Lie algebra basis.
Thereby, in practice, the calculation of LieGG consists of three steps: (i) training a neural network, (ii)
computing the polarization matrix E, and (iii) computing the singular vectors corresponding to
nearly-zero singular values of the polarization matrix E.
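A minimal end-to-end sketch of these three steps could look as follows. It assumes PyTorch; net stands for any trained differentiable model with a scalar output playing the role of the discriminator F, and the helper names polarization_matrix and lie_generators, as well as the tolerance used to decide which singular values count as nearly zero, are illustrative choices rather than the API of the official repository:

import torch

def polarization_matrix(net, data):
    # Step (ii): one row per data point, with entries dF/dx_i(x) * x_j.
    data = data.clone().requires_grad_(True)
    out = net(data).sum()                      # summing keeps per-sample gradients separate
    grads = torch.autograd.grad(out, data)[0]  # dF/dx, shape (num_points, n)
    E = torch.einsum('pi,pj->pij', grads, data).flatten(1)
    return E.detach()

def lie_generators(E, n, tol=1e-4):
    # Step (iii): the nullspace of E, found via SVD, gives a basis of the learned Lie algebra.
    _, s, vh = torch.linalg.svd(E, full_matrices=False)
    null_rows = vh[s < tol * s[0]]             # rows with nearly-zero singular values
    return null_rows.reshape(-1, n, n)

# Step (i): train net on the downstream task, then probe it on the data.
n = 3
net = torch.nn.Sequential(torch.nn.Linear(n, 64), torch.nn.Softplus(),
                          torch.nn.Linear(64, 1))
# ... training loop omitted ...
data = torch.nn.functional.normalize(torch.randn(512, n), dim=1)
generators = lie_generators(polarization_matrix(net, data), n)
print(generators.shape)                        # (k, n, n): k learned symmetry generators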
4.1 Computing a Lie algebra of a group acting on R2
Symmetries play an important role in computer vision problems, and we will describe LieGG for
this case in more detail. In various computer vision problems, it is assumed that the symmetry
group G changes images from the dataset by acting on R^2. For example, convolutional networks [25]
and recently proposed equivariant networks for rotations [46, 43] assume that the group acts as a