
Contributions
In this paper, we utilize the PAC-Bayes framework to derive generalization bounds for equivariant networks. Our focus is on equivariant networks built following [WGW+18, CW16b]. We combine the representation-theoretic framework of [WC19] with the PAC-Bayes framework of [NBS18] to obtain generalization bounds for equivariant networks as a function of the irreducible representations (irreps) and their multiplicities. Unlike previous PAC-Bayes analyses, we carry out the perturbation analysis in the Fourier domain; without this approach, as we will show, a naive application of [NBS18] leads to degenerate bounds. As part of the proof, we derive a tail bound on the spectral norm of random matrices characterized as direct sums of irreps. Note that for building equivariant networks, we need to work with real representations rather than the complex ones conventionally studied in representation theory. The obtained generalization bound provides new insights, verified via numerical results, about the effect of design choices on the generalization error. To the best of our knowledge, this is the first generalization bound for equivariant networks with an explicit equivariant structure. We conduct extensive experiments to verify our theoretical insights, as well as some of the previous observations about neural networks.
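To make the direct-sum structure referred to above concrete, the following minimal sketch (with hypothetical irrep dimensions and multiplicities; it is an illustration, not the construction used in our proofs) builds a block-diagonal matrix whose blocks correspond to irrep copies and checks the elementary fact that the spectral norm of a direct sum equals the maximum spectral norm of its blocks.

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical irrep dimensions and their multiplicities (illustrative only).
irrep_dims = [1, 2, 2, 4]
multiplicities = [3, 2, 2, 1]

rng = np.random.default_rng(0)

# Draw one random block per irrep copy and form their direct sum
# (a block-diagonal matrix).
blocks = []
for d, m in zip(irrep_dims, multiplicities):
    for _ in range(m):
        blocks.append(rng.normal(size=(d, d)))
W = block_diag(*blocks)

# Spectral norm of the direct sum vs. maximum spectral norm over blocks.
spec_full = np.linalg.norm(W, ord=2)
spec_max_block = max(np.linalg.norm(B, ord=2) for B in blocks)
print(spec_full, spec_max_block)  # the two values agree up to floating-point error
assert np.isclose(spec_full, spec_max_block)
```

Because the norm of such a matrix is governed by its largest block, controlling each irrep block separately is enough to control the whole matrix, which is the kind of reduction a block-wise tail bound can exploit.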
2 Related Works
Equivariant networks.
In machine learning, there have been many works on incorporating symmetries and invariances into neural network design. This class of models has been extensively studied from different perspectives [ST93, ST89, KT18, CGW18, Mal12, DDFK16, CW16a, CW16b, WGTB17, WHS18, WGW+18, TSK+18, BLV+18, WC19, Bek20, DMGP19, FSIW20, WFVW21, CLW22], to mention only some. Indeed, the most recent developments in the field of equivariant networks suggest a design that enforces equivariance not only at a global level [LSBP16] but also at each layer of the model. An interesting question is how a particular choice for designing equivariant networks affects their performance and, in particular, their generalization. A related question is whether including this inductive bias helps generalization. The authors in [SGSR17a] consider general invariant classifiers and obtain robustness-based generalization bounds, as in [SGSR17b], assuming that data transformations change the inputs drastically.
For finite groups, they reported a scaling of the generalization error with $1/\sqrt{|H|}$, where $|H|$ is the cardinality of the underlying group. In [Ele22], the gain of invariant or equivariant hypotheses was studied in the PAC learning framework, where the gain in generalization is attributed to shrinking the hypothesis space to the space of orbit representatives. A similar argument can be found in [SIK21], where it is shown that invariant and equivariant models effectively operate on a shrunk space called the Quotient Feature Space (QFS), and that the generalization error is proportional to its volume.
They generalize the result of [SGSR17a] and relax the robustness assumption, although their bound shows a suboptimal exponent for the sample size. Similar to [SGSR17a], they focus on the scaling improvement of the generalization error with invariance and do not consider computable architecture-dependent bounds. The authors in [LvdWK+20] use the PAC-Bayesian framework to study the effect of invariance on generalization, although they do not obtain any explicit bound. [BM19] studies the stability of models equivariant to compact groups from an RKHS point of view, relating this stability to the Rademacher complexity of the model and a generalization bound. The authors in [EZ21] characterized the generalization gain of equivariant models more concretely and reported a VC-dimension analysis. Subsequent works also considered the connection between invariance and generalization [ZAH21]. In contrast with these works, we consider equivariant networks with a representation-theoretic construction. Beyond generalization, we are interested in obtaining design insights from the bound.
Generalization error for neural networks.
A more general study of generalization error in neural networks, however, has been ongoing for some time, with a huge body of work on the topic. We refer to some highlights here. A challenge for this problem was brought up in [ZBH+17], where it was shown, using the ImageNet dataset with random labels, that the generalization error