
Contributions First, we present two mechanisms towards equivariance-aware architectural optimization. The equivariance relaxation morphism for group convolutional layers partially expands the representation and parameters of the layer to enable less constrained learning with a prior on symmetry. The [G]-mixed equivariant layer parameterizes a layer as a weighted sum of layers equivariant to different groups, permitting the learning of architectural weighting parameters.
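As an illustration of the second mechanism, below is a minimal sketch of a [G]-mixed equivariant layer as a weighted sum of candidate layers. It assumes the candidate modules (hypothetical group-convolutional layers, each equivariant to a different group) produce outputs of identical shape, and the softmax normalization of the architectural weights is an assumption of this sketch, not taken from the text.

```python
import torch
import torch.nn as nn


class GMixedEquivariantLayer(nn.Module):
    """Sketch of a [G]-mixed equivariant layer: a weighted sum of layers
    equivariant to different groups, with learnable architectural weights."""

    def __init__(self, candidate_layers):
        super().__init__()
        # Each candidate is assumed to be equivariant to a different group
        # and to map inputs to outputs of the same shape.
        self.candidates = nn.ModuleList(candidate_layers)
        # One architectural weighting parameter per candidate group.
        self.arch_logits = nn.Parameter(torch.zeros(len(candidate_layers)))

    def forward(self, x):
        # Softmax normalization of the mixture weights is an assumption.
        weights = torch.softmax(self.arch_logits, dim=0)
        return sum(w * layer(x) for w, layer in zip(weights, self.candidates))
```

Because the architectural weights are ordinary parameters, they can be optimized by gradient descent alongside the candidate layers' own weights.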
Second, we implement these concepts within two algorithms for architectural optimization of partially-equivariant networks. Evolutionary Equivariance-Aware NAS (EquiNAS$_E$) utilizes the equivariance relaxation morphism in a greedy evolutionary algorithm, dynamically relaxing constraints throughout the training process. Differentiable Equivariance-Aware NAS (EquiNAS$_D$) implements [G]-mixed equivariant layers throughout a network to learn the appropriate approximate equivariance of each layer, in addition to their optimized weights, during training.
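The following is a hedged sketch of how such joint training could look, assuming a model built from layers like the `GMixedEquivariantLayer` above and assuming both parameter sets are updated from the same task loss; the actual EquiNAS$_D$ update scheme (e.g., its choice of optimizers or learning rates) is not specified in this excerpt.

```python
import torch


def make_optimizers(model, weight_lr=1e-3, arch_lr=1e-2):
    """Separate the architectural weighting parameters (assumed to be named
    `arch_logits`, as in the sketch above) from the ordinary layer weights
    so each set can receive its own learning rate. The split learning rates
    are an assumption, not taken from the text."""
    arch_params, weight_params = [], []
    for name, param in model.named_parameters():
        (arch_params if "arch_logits" in name else weight_params).append(param)
    weight_opt = torch.optim.Adam(weight_params, lr=weight_lr)
    arch_opt = torch.optim.Adam(arch_params, lr=arch_lr)
    return weight_opt, arch_opt


def training_step(model, weight_opt, arch_opt, loss_fn, x, y):
    """One joint gradient step on a batch: both parameter sets are updated
    from the same task loss, so each layer's approximate equivariance is
    learned alongside its weights."""
    weight_opt.zero_grad()
    arch_opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    weight_opt.step()
    arch_opt.step()
```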
Finally, we analyze the proposed mechanisms via their respective NAS approaches in multiple image classification tasks, investigating how the dynamically learned approximate equivariance affects training and performance relative to baseline models and other approaches.
1.1 Related works
Approximate equivariance Although no other works on approximate equivariance explicitly study architectural
optimization, some approaches are architectural in nature. We compare our contributions against the works that are, to our knowledge, most conceptually similar.
The main contributions of Basu et al. (2021) and Agrawal & Ostrowski (2022) are similar to our proposed equivariance
relaxation morphism. Basu et al. (2021) also utilizes subgroup decomposition but instead algorithmically builds up
equivariances from smaller groups, while our work focuses on relaxing existing constraints. Agrawal & Ostrowski
(2022) presents theoretical contributions towards network morphisms for group-invariant shallow neural networks:
in comparison, our work focuses on deep group convolutional architectures and implements the morphism in a NAS
algorithm.
The main contributions of Wang et al. (2022) and Finzi et al. (2021) are similar to our proposed [G]-mixed equivariant
layer. Wang et al. (2022) also uses a weighted sum of kernels, but uses the same group for each kernel and defines the
weights over the domain of group elements. Finzi et al. (2021) uses an equivariant layer in parallel to a linear layer
with weighted regularization, thus only using two layers in parallel and weighting them through regularization rather
than parameterization.
In more diverse approaches, Zhou et al. (2020) and Yeh et al. (2022) represent symmetry-inducing weight sharing
through learnable matrices. Romero & Lohit (2022) and van der Ouderaa et al. (2022) learn partial or soft equivariances for each layer.
Neural architecture search Neural architecture search (NAS) aims to optimize both the architecture and its parameters for a given task. Liu et al. (2018) approaches this difficult bi-level optimization by creating a large super-network containing all possible elements and continuously relaxing the discrete architectural parameters to enable search by gradient descent. Other NAS approaches include evolutionary algorithms (Real et al., 2017; Lu et al., 2019; Elsken et al., 2017) and reinforcement learning (Zoph & Le, 2017), which search over discretely represented architectures.
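For concreteness, here is a minimal sketch of the alternating bi-level optimization in the spirit of Liu et al. (2018): architectural parameters are updated on validation data and ordinary weights on training data. This is a first-order approximation; the exact DARTS update differs in detail.

```python
import torch


def bilevel_search_step(model, weight_opt, arch_opt, loss_fn,
                        train_batch, val_batch):
    """One alternating step of gradient-based NAS: the continuously relaxed
    architectural parameters are updated on a validation batch, then the
    ordinary model weights are updated on a training batch."""
    # Architecture step on a validation batch.
    x_val, y_val = val_batch
    arch_opt.zero_grad()
    loss_fn(model(x_val), y_val).backward()
    arch_opt.step()

    # Weight step on a training batch.
    x_train, y_train = train_batch
    weight_opt.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    weight_opt.step()
```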
2 Background
We assume familiarity with group theory (see Appendix A.1). Let $G$ be a discrete group. The $l$th $G$-equivariant group convolutional layer (Cohen & Welling, 2016) of a group convolutional neural network (G-CNN) convolves the feature map $f: G \to \mathbb{R}^{C_{l-1}}$ output from the previous layer with a filter with kernel size $k$, represented as learnable parameters $\psi: G \to \mathbb{R}^{C_l \times C_{l-1}}$. For each output channel $d \in [C_l]$, where $[C] := \{1, \dots, C\}$, and group element $g \in G$, the layer's output is defined via the convolution operator$^1$:
$$[f \star_G \psi]_d(g) = \sum_{h \in G} \sum_{c=1}^{C_{l-1}} f_c(h)\, \psi_{d,c}(g^{-1}h). \tag{1}$$
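The following is a minimal sketch of Eq. (1) for a finite group whose elements are indexed by integers, assuming a precomputed lookup table `cayley_inv[g, h]` that returns the index of $g^{-1}h$; spatial structure and the kernel size $k$ are omitted, so the feature map is treated purely as a function on the group.

```python
import torch


def group_conv(f, psi, cayley_inv):
    """Direct evaluation of Eq. (1) for a finite group of size |G|.

    f          -- feature map, shape (|G|, C_in), with f[h, c] = f_c(h)
    psi        -- filter, shape (|G|, C_out, C_in), with psi[g, d, c] = psi_{d,c}(g)
    cayley_inv -- integer tensor of shape (|G|, |G|); cayley_inv[g, h] is the
                  index of the group element g^{-1} h
    Returns the output feature map of shape (|G|, C_out).
    """
    num_g, c_out = f.shape[0], psi.shape[1]
    out = torch.zeros(num_g, c_out)
    for g in range(num_g):
        for h in range(num_g):
            # psi[cayley_inv[g, h]] has shape (C_out, C_in); f[h] has shape (C_in,)
            out[g] += psi[cayley_inv[g, h]] @ f[h]
    return out
```

For the cyclic group of order four, for example, the lookup table would be `cayley_inv = torch.tensor([[(h - g) % 4 for h in range(4)] for g in range(4)])`.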
The first layer is a special case: the input to the network needs to be lifted via this operation such that the output feature map of this layer has a domain of $G$. In the case of image data, an image $x$ with $C$ channels may be interpreted
$^1$We identify the correlation and convolution operators, as they only differ in where the inverse group element is placed, and refer to both as "convolution" throughout this work.