LieGG: Studying Learned Lie Group Generators
Artem Moskalev
UvA-Bosch Delta Lab
University of Amsterdam
a.moskalev@uva.nl
Anna Sepliarskaia
Machine Learning Research Unit
TU Wien
seplanna@gmail.com
Ivan Sosnovik
UvA-Bosch Delta Lab
University of Amsterdam
i.sosnovik@uva.nl
Arnold Smeulders
UvA-Bosch Delta Lab
University of Amsterdam
a.w.m.smeulders@uva.nl
Abstract
Symmetries built into a neural network have appeared to be very beneficial for a
wide range of tasks, as they save the data needed to learn them. We depart from the position
that when symmetries are not built into a model a priori, it is advantageous for
robust networks to learn symmetries directly from the data to fit a task function. In
this paper, we present a method to extract symmetries learned by a neural network
and to evaluate the degree to which a network is invariant to them. With our method,
we are able to explicitly retrieve learned invariances in the form of the generators
of the corresponding Lie-groups without prior knowledge of symmetries in the data.
We use the proposed method to study how symmetrical properties depend on a
neural network’s parameterization and configuration. We found that the ability of a
network to learn symmetries generalizes over a range of architectures. However, the
quality of learned symmetries depends on the depth and the number of parameters.
1 Introduction
Convolutional Neural Networks (CNNs) are efficient, in part because they convert the translation
symmetry in the data into a built-in translation-equivariance property of the network without exhausting
the data to learn the equivariance. Group-equivariant networks generalize this property to rotation
[6, 46, 43, 20], scale [39, 36, 4], and other symmetries defined by matrix groups [14]. Equipping a
neural network with prior known symmetries has proved to be data-efficient.
Recent works [13, 21, 50] have demonstrated that hard coding the symmetry into a neural network
does not always lead to better generalization. Often, soft or learned equivariance is more
advantageous, mostly in terms of data efficiency and accuracy [42]. Moreover, architectures with no
equivariance built-in, such as transformers [11] and multi-layer perceptron mixers [40], achieve
remarkable performance on a wide range of problems. Effectively, they learn their own geometrical
priors without explicit symmetry constraints [8]. This raises the following questions to study in this
paper. To what degree do neural networks learn symmetries? How accurately do learned symmetries
reflect the true symmetries in the data? Can we support the capability of neural networks for learning
symmetries? We present a method to study symmetries learned by neural networks to address the
questions.
In the paper, we depart from Lie group theory and develop a method that can retrieve symmetries
learned from the data by any neural network (Figure 1). Our method only makes the assumption that
Source code: https://github.com/amoskalev/liegg
36th Conference on Neural Information Processing Systems (NeurIPS 2022).
arXiv:2210.04345v3 [cs.LG] 31 Jan 2023
Figure 1: We solve the matrix nullspace equation to derive an infinitesimal generator from a neural
network and the training data. Due to the Lie algebra-Lie group correspondence, we can calculate the
class of transformations the model is invariant to by exponentiating the generator.
a model is differentiable, which is a condition commonly met in neural networks. While previous works on analyzing
symmetries in neural networks rely on empirical analysis of network representations [29] or on
examination of a given set of transformations [15, 36, 26], our method outputs a generator of the
corresponding Lie-group and allows us to quantitatively evaluate how sensitive the network is in the
direction of the learned symmetry. We make the following contributions:
• From the perspective of Lie-groups, we propose a theory to study the symmetrical properties of neural networks.
• We derive an efficient implementation of the method that allows improving the interpretability of a model by revealing the symmetries it has learned.
• With our method, we conclude that models with more parameters and gradual fine-tuning learn more precise symmetry groups with a higher degree of invariance to them.
2 Related work
Equivariant and invariant networks
The goal of equivariant and invariant networks is to build
the symmetries into a neural network architecture as an inductive bias. Starting from the convolutional
networks [25] that contain a translation symmetry, the concept of equivariance was generalized to
rotations [6, 46, 43, 20], permutations [48], scaling [39, 36, 37, 4, 45, 38] and arbitrary matrix groups
in [14]. Various methods have been proposed for learning symmetries directly from the training data
[50, 3], among which a popular approach is to learn transformations by estimating the infinitesimal
generators of symmetry groups [35, 10, 7, 34, 9]. These papers focus on building the symmetries
into the models by modifying the architecture. In our work, we focus on the reverse question, i.e.
given a network with a fixed architecture, what symmetries does it learn from the data?
Symmetries in neural networks
Another line of work is focused on interpreting symmetries in
neural networks. In [29], Olah et al. take inspiration from biological circuit motifs [2] and study
equivariant patterns learned by an unconstrained neural network on the image classification task.
The authors demonstrate that the network learns rotation, hue, and scale symmetries when trained
on ImageNet. In [15], Goodfellow et al. propose a number of empirical tests to study invariances
to known transformations in a network, and demonstrate that auto-encoding architectures learn
increasingly invariant features in the deeper layers. In [26], Lenc & Vedaldi quantify invariance and
equivariance in the layers of convolutional networks to a pre-defined set of transformations. In the
concurrent work of Gruver et al. [16], the authors propose using the Lie derivative to measure the
local equivariance error to known symmetry groups. In contrast to these works, our method does not
require knowing the set of transformations beforehand; it also provides theoretical guarantees of
the invariance from the perspective of Lie group theory.
Network configuration: width, depth, number of parameters
Neural networks of various
widths and depths have been studied through the lens of the universal approximation theorem [27, 22],
functional expressiveness [32], and by empirical analysis of learned representations [28]. These
works focus on either characterizing the learning capabilities of neural networks or on interpreting the
differences between representations that models with various architectures learn. They, however, do
not analyze how the symmetrical properties of a model depend on its width and depth.
Other works focus on analyzing the connection between the number of parameters and the generalization
capability of networks [1, 47, 41, 49]. It is commonly argued that overparametrized architectures
achieve better generalization bounds, i.e. provide a smaller discrepancy between train and test
performance, and are thus preferred in many applications [49]. However, it is unclear if models with
more parameters, and hence with a finer generalization capability, also learn symmetries better. In our
paper, we investigate this question for a family of feed-forward networks.
Robust Learning
Various methods have been proposed to train more robust neural networks.
While naive training may lead to overfitting on the training subset, a well-organized pre-training
helps to mitigate this issue [12]. Features extracted by a model pretrained on a bigger dataset often
demonstrate more robust results [18]. While there are many effective techniques for training more
robust feature extractors [5, 51, 17], the analysis of such methods from the invariance point of view
has not been done before. We demonstrate that our method allows for better understanding and
explaining what training regimes lead to more invariant, and thus more robust, representations.
3 Background
The focus of this paper is a symmetry group. A symmetry is a transformation of an object that leaves
the object unchanged. The set of all such transformations of an object with composition as a binary
operation forms a group. In this paper, the object of interest is a dataset and its symmetries. We
study what kind of transformations map one data sample to another and how neural networks learn
information about the symmetries.
The theory of Lie groups is a sweet spot of mathematics that helps to formalize symmetries and
provides practical methods for studying them. Formally, a Lie group is a group that has the structure
of a differentiable manifold. The tangent space at the identity element of a Lie group forms a vector
space called the Lie algebra. A Lie algebra determines a Lie group up to isomorphism for simply
connected Lie groups and, being a vector space, is a more convenient object to study than a Lie group.
Thus, in order to recover a simply connected symmetry group, it is sufficient to understand its Lie
algebra.
In this paper, we assume that the Lie group G is a matrix Lie group, i.e. is a closed subgroup of the
general linear group of degree n, denoted as GL(n). It is defined as the set of n×n invertible
matrices. The correspondence between the Lie group G and its Lie algebra g in this case is given by
the exponential map:

\mathfrak{g} = \{ h \in M(n) : e^{t \cdot h} \in G \;\; \forall t \in \mathbb{R} \}, \qquad (1)

where M(n) is the set of all n×n matrices. Each such element h from Equation (1) is called an
infinitesimal generator of the Lie group G.
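As a quick numerical illustration of the exponential map in Equation (1) (a minimal sketch assuming NumPy and SciPy, not part of the paper's method), exponentiating the standard antisymmetric generator of planar rotations produces elements of the rotation group:

import numpy as np
from scipy.linalg import expm

# Infinitesimal generator of planar rotations, an element of the Lie algebra so(2).
h = np.array([[0.0, -1.0],
              [1.0,  0.0]])

t = 0.3                # group parameter, here the rotation angle in radians
g = expm(t * h)        # e^{t·h} is an element of the Lie group

# e^{t·h} equals the rotation matrix by angle t ...
expected = np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])
assert np.allclose(g, expected)

# ... and is orthogonal, i.e. it indeed belongs to the rotation group.
assert np.allclose(g @ g.T, np.eye(2))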
We introduce LieGG, a method to compute Lie Group Generators. LieGG extracts the infinitesimal
generators of the Lie algebra for symmetries that a neural network learned from the data. To find
the Lie algebra basis, we use the discriminator of the dataset, i.e. a function F : R^n → R such that
F(x) = 0 if and only if x is an element of the dataset D. The discriminator function is naturally
modeled by a neural network fitting the data subject to a downstream task. With this, we use the
following criterion for a symmetry group:
Theorem 3.1 (Theorem 2.8 in [30]). Let G be a connected Lie group of linear transformations acting
on an n-dimensional manifold X. Let F : X → R^l, l ≤ n, define a system of algebraic equations:

F_\nu(x) = 0, \quad \nu = 1, \dots, l,

and assume that the system is of maximal rank, meaning that the Jacobian matrix \left( \frac{\partial F_\nu}{\partial x_k} \right)
is of rank l for every solution x of the system. Then G is a symmetry group of the system if and only if

\sum_{i=1,\, j=1}^{i=n,\, j=n} \frac{\partial F_\nu}{\partial x_i} \cdot h_{ij} \cdot x_j = 0, \quad \text{whenever } F_\nu(x) = 0,

for ν = 1, ..., l and every infinitesimal generator h of G, where h_{ij} is an element of the matrix h in
the i-th row and j-th column.
Example
We illustrate the practical significance of the theorem with an example. Suppose the
dataset D lies on a sphere in R^n:

D = \{ x \in \mathbb{R}^n : x_1^2 + \dots + x_n^2 = 1 \}.

The discriminator for this example is the function F(x) = x_1^2 + ... + x_n^2 - 1. According to the
theorem, to find the Lie algebra of the symmetry group, we can find a basis of the solution of the
following linear equations, where a_{ij} are the variables of interest:

\sum_{i,j} x_i \cdot a_{ij} \cdot x_j = 0,

when x_1^2 + ... + x_n^2 = 1. We assert that the solutions are the family of matrices A such that
A + A^T = 0. Indeed, for x : x_i = 1, x_j = 0, j ≠ i, the condition means that a_{ii} = 0. For
x : x_i = 1/√2, x_j = 1/√2, x_k = 0, k ≠ i, j, it follows that a_{ij} + a_{ji} = 0. On the other hand,
A = -A^T implies \sum_{i \neq j} x_i \cdot a_{ij} \cdot x_j + x_j \cdot a_{ji} \cdot x_i = \sum_{i \neq j} x_i \cdot a_{ij} \cdot x_j - x_i \cdot a_{ij} \cdot x_j = 0.

Note that the Lie algebra of the matrices {A : A + A^T = 0} corresponds to the Lie group of matrices
{B : B · B^T = E}, which is the rotation group. Thus, the symmetry of a sphere is the rotation group.
4 Method
To compute a Lie algebra given a discriminator function F : R^n → R, we use Theorem 3.1 and solve
the system of linear equations, where each equation corresponds to one element in the dataset:

\sum_{i=1,\, j=1}^{i=n,\, j=n} \frac{\partial F}{\partial x_i} \cdot h_{ij} \cdot x_j = 0, \quad \text{for each point in the dataset.} \qquad (2)

This is a system of linear equations with n^2 variables h_{ij} and the number of equations equal to the
number of points in the dataset. We can cast solving such a system with respect to h as a problem of
finding a nullspace of the matrix E. The matrix E has the number of rows equal to the number of
points in the dataset and n^2 columns. Multiplying E with the vectorized representation of h yields the
system of equations in (2). We call the matrix E the network polarization matrix.
The problem of finding the nullspace basis of a matrix is a standard problem in numerical analysis.
It can be efficiently solved using the Singular Value Decomposition (SVD). Recall that we can write
the matrix E using the SVD as E = UΣV^T, where U and V are orthogonal matrices and Σ
is a diagonal matrix with decreasing singular values. Thus, the columns of V corresponding to
nearly-zero singular values encode the nullspace of the system of equations in (2) and hence form a
Lie algebra basis.
Thereby, in practice, the calculation of LieGG consists of three steps: (i) training a neural network, (ii)
computing the polarization matrix E, and (iii) computing the singular vectors corresponding to
nearly-zero singular values of the polarization matrix E.
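A minimal end-to-end sketch of these three steps could look as follows. It assumes PyTorch; net stands for any trained differentiable model with a scalar output playing the role of the discriminator F, and the helper names polarization_matrix and lie_generators, as well as the tolerance used to decide which singular values count as nearly zero, are illustrative choices rather than the API of the official repository:

import torch

def polarization_matrix(net, data):
    # Step (ii): one row per data point, with entries dF/dx_i(x) * x_j.
    data = data.clone().requires_grad_(True)
    out = net(data).sum()                      # summing keeps per-sample gradients separate
    grads = torch.autograd.grad(out, data)[0]  # dF/dx, shape (num_points, n)
    E = torch.einsum('pi,pj->pij', grads, data).flatten(1)
    return E.detach()

def lie_generators(E, n, tol=1e-4):
    # Step (iii): the nullspace of E, found via SVD, gives a basis of the learned Lie algebra.
    _, s, vh = torch.linalg.svd(E, full_matrices=False)
    null_rows = vh[s < tol * s[0]]             # rows with nearly-zero singular values
    return null_rows.reshape(-1, n, n)

# Step (i): train net on the downstream task, then probe it on the data.
n = 3
net = torch.nn.Sequential(torch.nn.Linear(n, 64), torch.nn.Softplus(),
                          torch.nn.Linear(64, 1))
# ... training loop omitted ...
data = torch.nn.functional.normalize(torch.randn(512, n), dim=1)
generators = lie_generators(polarization_matrix(net, data), n)
print(generators.shape)                        # (k, n, n): k learned symmetry generators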
4.1 Computing a Lie algebra of a group acting on R2
Symmetries play an important role in computer vision problems, and we will describe LieGG for
this case in more detail. In various computer vision problems, it is assumed that the symmetry
group G changes images from the dataset by acting on R^2. For example, convolutional networks [25]
and recently proposed equivariant networks for rotations [46, 43] assume that the group acts as a