EQUIVARIANCE-AWARE ARCHITECTURAL OPTIMIZATION OF
NEURAL NETWORKS
Kaitlin Maile
IRIT, University of Toulouse
kaitlin.maile@irit.fr
Dennis G. Wilson
ISAE-SUPAERO, University of Toulouse
dennis.wilson@isae-supaero.fr
Patrick Forré
University of Amsterdam
p.d.forre@uva.nl
ABSTRACT
Incorporating equivariance to symmetry groups as a constraint during neural network training can improve performance and generalization for tasks exhibiting those symmetries, but such symmetries are often not perfectly nor explicitly present. This motivates algorithmically optimizing the architectural constraints imposed by equivariance. We propose the equivariance relaxation morphism, which preserves functionality while reparameterizing a group equivariant layer to operate with equivariance constraints on a subgroup, as well as the [G]-mixed equivariant layer, which mixes layers constrained to different groups to enable within-layer equivariance optimization. We further present evolutionary and differentiable neural architecture search (NAS) algorithms that utilize these mechanisms respectively for equivariance-aware architectural optimization. Experiments across a variety of datasets show the benefit of dynamically constrained equivariance to find effective architectures with approximate equivariance.
1 Introduction
Constraining neural networks to be equivariant to symmetry groups present in the data can improve their task performance, efficiency, and generalization capabilities (Bronstein et al., 2021), as shown by translation-equivariant convolutional neural networks (Fukushima & Miyake, 1982; LeCun et al., 1989) for image-based tasks (LeCun et al., 1998). Seminal works have developed general theories and architectures for equivariance in neural networks, providing a blueprint for equivariant operations on complex structured data (Cohen & Welling, 2016; Ravanbakhsh et al., 2017; Kondor & Trivedi, 2018; Weiler et al., 2021). However, these works design model constraints based on an explicit equivariance property. Furthermore, their architectural assumption of full equivariance in every layer may be overly constraining; e.g., in handwritten digit recognition, full equivariance to 180° rotation may lead to misclassifying samples of "6" and "9". Weiler & Cesa (2019) found that local equivariance from a final subgroup convolutional layer improves performance over full equivariance. If appropriate equivariance constraints are instead learned, the benefits of equivariance could extend to applications where the data may have unknown or imperfect symmetries.
Learning approximate equivariance has been recently approached through novel layer operations (Wang et al., 2022;
Finzi et al., 2021; Zhou et al., 2020; Yeh et al., 2022; Basu et al., 2021). Separately, the field of neural architecture
search (NAS) aims to optimize full neural network architectures (Zoph & Le, 2017; Real et al., 2017; Elsken et al.,
2017; Liu et al., 2018; Lu et al., 2019). Existing NAS methods have not yet been developed for explicitly optimizing
equivariance, although partial or soft equivariant approaches like Romero & Lohit (2022) and van der Ouderaa et al.
(2022) do allow for custom equivariant architectures. An important aspect of NAS is network morphisms: function-
preserving architectural changes (Wei et al., 2016) which can be used during training to change the loss landscape and
gradient descent trajectory while immediately maintaining the current functionality and loss value (Maile et al., 2022).
Developing tools for searching over a space of architectural representations of equivariance would allow for existing
NAS algorithms to be applied towards architectural optimization of equivariance.
Contributions First, we present two mechanisms towards equivariance-aware architectural optimization. The equivariance relaxation morphism for group convolutional layers partially expands the representation and parameters of the layer to enable less constrained learning with a prior on symmetry. The [G]-mixed equivariant layer parameterizes a layer as a weighted sum of layers equivariant to different groups, permitting the learning of architectural weighting parameters.
Second, we implement these concepts within two algorithms for architectural optimization of partially-equivariant networks. Evolutionary Equivariance-Aware NAS ($\text{EquiNAS}_E$) utilizes the equivariance relaxation morphism in a greedy evolutionary algorithm, dynamically relaxing constraints throughout the training process. Differentiable Equivariance-Aware NAS ($\text{EquiNAS}_D$) implements [G]-mixed equivariant layers throughout a network to learn the appropriate approximate equivariance of each layer, in addition to their optimized weights, during training.
Finally, we analyze the proposed mechanisms via their respective NAS approaches in multiple image classification
tasks, investigating how the dynamically learned approximate equivariance affects training and performance over
baseline models and other approaches.
1.1 Related works
Approximate equivariance Although no other works on approximate equivariance explicitly study architectural
optimization, some approaches are architectural in nature. We compare our contributions with the most conceptually
similar works to our knowledge.
The main contributions of Basu et al. (2021) and Agrawal & Ostrowski (2022) are similar to our proposed equivariance relaxation morphism. Basu et al. (2021) also utilizes subgroup decomposition but instead algorithmically builds up equivariances from smaller groups, while our work focuses on relaxing existing constraints. Agrawal & Ostrowski (2022) presents theoretical contributions towards network morphisms for group-invariant shallow neural networks: in comparison, our work focuses on deep group convolutional architectures and implements the morphism in a NAS algorithm.
The main contributions of Wang et al. (2022) and Finzi et al. (2021) are similar to our proposed [G]-mixed equivariant
layer. Wang et al. (2022) also uses a weighted sum of kernels, but uses the same group for each kernel and defines the
weights over the domain of group elements. Finzi et al. (2021) uses an equivariant layer in parallel to a linear layer
with weighted regularization, thus only using two layers in parallel and weighting them through regularization rather
than parameterization.
In more diverse approaches, Zhou et al. (2020) and Yeh et al. (2022) represent symmetry-inducing weight sharing through learnable matrices. Romero & Lohit (2022) and van der Ouderaa et al. (2022) learn partial or soft equivariances for each layer.
Neural architecture search Neural architecture search (NAS) aims to optimize both the architecture and its parameters for a given task. Liu et al. (2018) approaches this difficult bi-level optimization by creating a large super-network
containing all possible elements and continuously relaxing the discrete architectural parameters to enable search by
gradient descent. Other NAS approaches include evolutionary algorithms (Real et al., 2017; Lu et al., 2019; Elsken
et al., 2017) and reinforcement learning (Zoph & Le, 2017), which search over discretely represented architectures.
2 Background
We assume familiarity with group theory (see Appendix A.1). Let $G$ be a discrete group. The $l$th $G$-equivariant group convolutional layer (Cohen & Welling, 2016) of a group convolutional neural network ($G$-CNN) convolves the feature map $f: G \to \mathbb{R}^{C_{l-1}}$ output from the previous layer with a filter with kernel size $k$ represented as learnable parameters $\psi: G \to \mathbb{R}^{C_l \times C_{l-1}}$. For each output channel $d \in [C_l]$, where $[C] := \{1, \ldots, C\}$, and group element $g \in G$, the layer's output is defined via the convolution operator¹:
$$[f \star_G \psi]_d(g) = \sum_{h \in G} \sum_{c=1}^{C_{l-1}} f_c(h)\, \psi_{d,c}(g^{-1}h). \tag{1}$$
The first layer is a special case: the input to the network needs to be lifted via this operation such that the output feature map of this layer has a domain of $G$.

¹We identify the correlation and convolution operators as they only differ in where the inverse group element is placed and refer to both as "convolution" throughout this work.
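To make the convolution in Eq. (1) concrete, the following is a minimal NumPy sketch for the cyclic group $C_4$, treating the group as purely discrete (no spatial component) and writing its elements additively as {0, 1, 2, 3}. The shapes, names, and indexing are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

G_SIZE, C_IN, C_OUT = 4, 3, 5                      # |C4| and channel counts C_{l-1}, C_l
rng = np.random.default_rng(0)
f = rng.standard_normal((G_SIZE, C_IN))            # feature map f: G -> R^{C_{l-1}}
psi = rng.standard_normal((G_SIZE, C_OUT, C_IN))   # kernel psi: G -> R^{C_l x C_{l-1}}

def group_conv_c4(f, psi):
    """Eq. (1) for G = C4: out[g, d] = sum_h sum_c f[h, c] * psi[g^{-1} h, d, c]."""
    n = f.shape[0]
    out = np.zeros((n, psi.shape[1]))
    for g in range(n):
        for h in range(n):
            g_inv_h = (h - g) % n          # g^{-1} h in the cyclic group C4
            out[g] += psi[g_inv_h] @ f[h]  # matrix-vector product sums over channels c
    return out

print(group_conv_c4(f, psi).shape)  # (4, 5): one C_out vector per group element
```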
Figure 1: Visualizations of (A) the equivariance relaxation morphism and (B) the [G]-mixed equivariant layer, using the $C_4$ group. In (A), the learnable parameters of a $C_4$-equivariant convolutional layer are expanded using the group element actions, such that the expanded filter can be used in a standard convolutional layer. Applying the equivariance relaxation morphism reparameterizes the layer to only be architecturally constrained to $C_2$ equivariance, initialized to be functionally $C_4$ equivariant. In (B), convolutional operations equivariant to subgroups of $C_4$ are summed with learnable architectural weighting parameters.
In the case of image data, an image $x$ with $C$ channels may be interpreted as a function $x: \mathbb{Z}^2 \to \mathbb{R}^C$ mapping each pixel in coordinate space to a real number for each channel, where the $c$th channel of $x$ is referred to as $x_c$. The input is $x: \mathbb{Z}^2 \to \mathbb{R}^{C_0}$, so the layer is instead a lifting convolution:
$$[x \star_G \psi]_d(g) = \sum_{y \in \mathbb{Z}^2} \sum_{c=1}^{C_0} x_c(y)\, \psi_{d,c}(g^{-1}y). \tag{2}$$
We present our contributions in the group convolutional layer case, although similar claims apply for the lifting convolutional layer case.
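As a concrete illustration of the lifting operation in Eq. (2), the sketch below lifts a single-channel image to a feature map over the four planar rotations of $C_4$ by running one standard 2D correlation per rotated copy of the base kernel. The use of SciPy's correlate2d and this particular rotation convention are assumptions made for illustration, not the paper's code.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.standard_normal((9, 9))      # input image x: Z^2 -> R, with C_0 = 1
psi = rng.standard_normal((3, 3))    # base kernel for one output channel

# One standard correlation per 90-degree rotation of the kernel, stacked along
# a new group axis: the output is a feature map over C4 x Z^2 (the group p4).
lifted = np.stack(
    [correlate2d(x, np.rot90(psi, k), mode="same") for k in range(4)],
    axis=0,
)
print(lifted.shape)  # (4, 9, 9)
```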
3 Towards Architectural Optimization over Subgroups
We propose two mechanisms to enable search over subgroups: the equivariance relaxation morphism and a [G]-mixed equivariant layer, shown in Figure 1. The proposed morphism, described in Section 3.1, changes the equivariance constraint from one group to one of its subgroups while preserving the learned weights of the initial group convolutional operator. The [G]-mixed equivariant layer, presented in Section 3.2, allows a single layer to represent equivariance to multiple subgroups through a weighted sum.
3.1 Equivariance Relaxation Morphism
The equivariance relaxation morphism reparameterizes a $G$-equivariant group (or lifting) convolutional layer to operate over a subgroup of $G$, partially removing weight-sharing constraints from the parameter space while maintaining the functionality of the layer.
Let $G' \leq G$ be a subgroup of $G$. Let $R$ be a system of representatives of the left quotient (including the neutral element), so that $G'\backslash G = \{G'r \mid r \in R\}$, where $G'r := \{g'r \mid g' \in G'\}$. Given a $G$-equivariant group convolutional layer with feature map $f$ and kernel $\psi$, we define the relaxed feature map $\tilde{f}: G' \to \mathbb{R}^{C_{l-1} \times |R|}$ and relaxed kernel $\tilde{\psi}: G' \to \mathbb{R}^{(C_l \times |R|) \times (C_{l-1} \times |R|)}$ as follows. For $c \in [C_{l-1}]$, $s, t \in R$, $d \in [C_l]$:
$$\tilde{f}_{(c,s)}(g') := f_c(g's), \tag{3}$$
$$\tilde{\psi}_{(d,t),(c,s)}(g') := \psi_{d,c}(t^{-1}g's). \tag{4}$$
We define the equivariance relaxation morphism from $G$ to $G'$ as the reparameterization of $\psi$ as $\tilde{\psi}$ (Eq. 4) and reshaping of $f$ as $\tilde{f}$ (Eq. 3). We will show that the new output layer, $[\tilde{f} \star_{G'} \tilde{\psi}]_{(d,t)}(g')$, is equivalent to $[f \star_G \psi]_d(g't)$ up to reshaping. Since the mapping $G' \times R \to G$, $(g', t) \mapsto g't$, is bijective, every $g$ can uniquely be written as $g = g't$ with $g' \in G'$ and $t \in R$. For $g \in G$, $G'g \in G'\backslash G$ has a unique representative $t \in R$ with $G'g = G't$, and $g' := gt^{-1} \in G'$. By the same argument, $h \in G$ may be written as $h = h's$ with unique $h' \in G'$ and $s \in R$.
With these preliminaries, we get:
$$
\begin{aligned}
[f \star_G \psi]_d(g't) &= [f \star_G \psi]_d(g) && (5)\\
&= \sum_{h \in G} \sum_{c=1}^{C_{l-1}} f_c(h)\, \psi_{d,c}(g^{-1}h), && (6)\\
&= \sum_{h' \in G'} \sum_{s \in R} \sum_{c=1}^{C_{l-1}} f_c(h's)\, \psi_{d,c}(t^{-1}g'^{-1}h's), && (7)\\
&= \sum_{h' \in G'} \sum_{c=1}^{C_{l-1}} \sum_{s \in R} \tilde{f}_{(c,s)}(h')\, \tilde{\psi}_{(d,t),(c,s)}(g'^{-1}h'), && (8)\\
&= \big[\tilde{f} \star_{G'} \tilde{\psi}\big]_{(d,t)}(g'), && (9)
\end{aligned}
$$
which shows the claim. Thus, the convolution of $\tilde{f}$ with $\tilde{\psi}$ is equivariant to $G$ but parametrized as a $G'$-equivariant group convolutional layer, where the representatives are expanded into independent channels. This morphism can be viewed as initializing a $G'$-equivariant layer with a pre-trained prior of equivariance to $G$, maintaining any previous training.
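As a concrete (hypothetical) instance of Eqs. (3)-(4), the sketch below relaxes a $C_4$-equivariant layer to its subgroup $C_2$, with cyclic-group elements written additively; the array layouts and the choice of representatives are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

# G = C4 = {0,1,2,3} under addition mod 4; subgroup G' = C2 = {0, 2};
# representatives R = {0, 1} of the left quotient (neutral element included).
C_IN, C_OUT = 3, 5
rng = np.random.default_rng(0)
f = rng.standard_normal((4, C_IN))            # f: C4 -> R^{C_in}
psi = rng.standard_normal((4, C_OUT, C_IN))   # psi: C4 -> R^{C_out x C_in}

G_prime, R = [0, 2], [0, 1]

# Eq. (3): f~_{(c,s)}(g') = f_c(g' s)
f_rel = np.zeros((len(G_prime), len(R), C_IN))
# Eq. (4): psi~_{(d,t),(c,s)}(g') = psi_{d,c}(t^{-1} g' s)
psi_rel = np.zeros((len(G_prime), len(R), C_OUT, len(R), C_IN))
for gi, g in enumerate(G_prime):
    for si, s in enumerate(R):
        f_rel[gi, si] = f[(g + s) % 4]
        for ti, t in enumerate(R):
            psi_rel[gi, ti, :, si, :] = psi[(g + s - t) % 4]
# f_rel and psi_rel now parametrize a C2-equivariant layer whose output matches
# the original C4 layer up to reshaping, per Eqs. (5)-(9).
```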
Standard convolutional layers are a special case of group-equivariant layers, where the group is translational symmetry over pixel space. Regular group convolutions are often implemented by relaxation to the translational symmetry group by expanding the kernel via the appropriate group actions, allowing a standard convolution implementation from a deep learning library to be used. The equivariance relaxation morphism generalizes this concept to any subgroup. With the given preliminaries and the case of $G' = T(2)$, $\tilde{f}$ and $\tilde{\psi}$ are computed such that $\tilde{f}_{(c,s)}(g') := f_c(g's)$ and $\tilde{\psi}_{(d,t),(c,s)}(g') := \psi_{d,c}(t^{-1}g's)$ for each $g' \in T(2)$, $c \in [C_{l-1}]$, $s, t \in R$, and $d \in [C_l]$.
Let $S_G := |R|$. The learnable parameters of the $G_l$-equivariant $l$th layer with $C_l$ output channels, corresponding to $\psi$, are stored as a tensor of size $C_l \times C_{l-1} \times S_{G_l} \times K_l \times K_l$. The kernel transformation expands this kernel tensor by performing the action of each $r \in R$ on a copy of the tensor along a new dimension, resulting in a tensor of size $C_l \times S_{G_l} \times C_{l-1} \times S_{G_l} \times K_l \times K_l$, which is reshaped to $C_l S_{G_l} \times C_{l-1} S_{G_l} \times K_l \times K_l$. The input tensor to the $l$th layer, corresponding to $f$, has shape $B \times C_{l-1} \times S_{G_l} \times H_{l-1} \times W_{l-1}$, which is reshaped to $B \times C_{l-1} S_{G_l} \times H_{l-1} \times W_{l-1}$ and convolved with the expanded kernel. The output of shape $B \times C_l S_{G_l} \times H_l \times W_l$ is reshaped to $B \times C_l \times S_{G_l} \times H_l \times W_l$.

To implement the equivariance relaxation morphism, the new kernel tensor is initialized by applying Equation 4 such that the result of applying the preceding kernel transformation is equivalent. Our implementation of group actions relies on group channel indexing to represent the order of group elements: to ensure this is consistent before and after the morphism, the appropriate reordering of the output and input channels of the expanded filter is applied upon expansion. The new kernel tensor has a shape of $C_l|R| \times C_{l-1}|R| \times S_{G_l}/|R| \times K_l \times K_l$.
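The sketch below illustrates the kernel transformation described above for a $C_4$ group convolution relaxed all the way to the translation group. The specific group action used (spatial rotation plus a cyclic shift of the group axis, as in the regular representation of $C_4$) and the channel ordering are assumptions; the paper's own indexing conventions are not reproduced here.

```python
import numpy as np

C_out, C_in, S, K = 5, 3, 4, 3                       # S = S_{G_l} = |C4|
rng = np.random.default_rng(0)
psi = rng.standard_normal((C_out, C_in, S, K, K))    # stored layer parameters

def act_c4(kernel, r):
    """Action of rotation r in C4 on the kernel tensor (assumed convention)."""
    rotated = np.rot90(kernel, k=r, axes=(-2, -1))   # rotate the spatial K x K axes
    return np.roll(rotated, shift=r, axis=2)         # cyclically permute group channels

# Expand along a new representative axis, then flatten into a standard conv kernel:
# (C_out, S, C_in, S, K, K) -> (C_out*S, C_in*S, K, K).
expanded = np.stack([act_c4(psi, r) for r in range(S)], axis=1)
expanded = expanded.reshape(C_out * S, C_in * S, K, K)
print(expanded.shape)   # usable by a standard 2D convolution layer
```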
3.2 [G]-Mixed Equivariant Layer
Towards learning equivariance, we additionally propose partial equivariance through a mixture of layers, each constrained to equivariance to a different group, applied in parallel to the same input, and combined via a weighted sum. The equivariance relaxation morphism provides a mapping of group elements between pairs of groups where one is a subgroup of the other. For a set of groups $[G]$ where each group is a subgroup or supergroup of all other groups within the set, we define a $[G]$-mixed equivariant layer as:
$$
\begin{aligned}
\big[f \,\hat{\star}_{[G]}\, [\psi]\big]_{(d,t)}(g) &= \sum_{G \in [G]} z_G \big[f \star_{G'} \tilde{\psi}_G\big]_{(d,t)}(g) && (10)\\
&= \Big[f \star_{G'} \sum_{G \in [G]} z_G \tilde{\psi}_G\Big]_{(d,t)}(g), && (11)
\end{aligned}
$$
where each element $z_G$ of $[z] := \{z_G \mid G \in [G]\}$ is an architectural weighting parameter such that $\sum_{G \in [G]} z_G = 1$, $G'$ is a subgroup of all groups in $[G]$, each element $\psi_G$ of $[\psi]$ is a kernel with a domain of $G$, and $\tilde{\psi}_G$ is the transformation of $\psi_G$ from a domain of $G$ to $G'$ as defined in Equation 4. Thus, the layer is parametrized by $[\psi]$ and $[z]$, computing
a weighted sum of operations that are equivariant to different groups of $[G]$. The layer may be equivalently computed as a weighted sum of separate $G'$-convolutions with each transformed kernel (Eq. 10) or as a single $G'$-convolution with their weighted sum (Eq. 11).
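To illustrate the structure of Eqs. (10)-(11), the sketch below forms the $z$-weighted sum of kernels that have already been transformed to the common subgroup $G'$ via Eq. (4). The names and the softmax-style normalization of $[z]$ are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups = 3                                                # e.g. [G] = {C4, C2, C1}
psi_tilde = rng.standard_normal((n_groups, 20, 12, 3, 3))   # one expanded kernel per group

z_logits = np.zeros(n_groups)                     # learnable architectural parameters
z = np.exp(z_logits) / np.exp(z_logits).sum()     # normalize so that sum_G z_G = 1

# Eq. (11): a single G'-equivariant convolution with the z-weighted sum of kernels,
# equivalent to z-weighting the outputs of per-group convolutions as in Eq. (10),
# but requiring only one convolution per forward pass.
mixed_kernel = np.tensordot(z, psi_tilde, axes=1)  # shape (20, 12, 3, 3)
print(mixed_kernel.shape)
```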