
Scale Equivariant U-Net
scale-equivariance was not isolated, as the performance of the models was not measured for inputs on scales unseen
in the training set. Later on, this approach was revisited by [11], where the Gaussian scale-space originally used was
generalized to other scale-spaces and the models were tested in experiments where the networks are trained in one
fixed scale and tested on unseen scales, albeit on synthetic or simple datasets. In all these approaches, the authors
either avoided pooling and upsampling in their architectures, or used them but did not discuss their impact on scale
equivariance.
While scale-equivariance has been a topic in the literature for some time, as far as we know a scale-equivariant U-Net
has not yet been proposed, contrary to the rotation-equivariance case [1]. Moreover, the current benchmarks for
scale-equivariance were either based on simple datasets like MNIST or did not explicitly measure equivariance
in their segmentation or classification experiments, which would require training the networks on one fixed scale and
testing on unseen scales. Here we propose semantic segmentation experiments based on natural data which measure the equivariance of
the predictions.
3 Semigroup Equivariant Convolutional Networks
In this work and following [19], image scalings are restricted to image downscalings, which can be viewed as actions
of a semigroup on images. As illustrated by Figure 1, we seek equivariance with respect to both downscalings and
translations. Hence, the network layers are designed to be equivariant with respect to a semigroup combining both
transformations.
3.1 Semigroup Equivariance
A semigroup, contrary to a group, can model non-invertible transformations, e.g. the downsampling operation in a
discrete domain. In the following, $(G, \cdot)$ denotes a discrete semigroup.
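Let $X$ be a set. A family of mappings $(\varphi_g)_{g \in G}$ from $X$ to itself is a semigroup action on $X$ if it is
homomorphic to the semigroup, that is, if either $\forall g, h \in G$, $\varphi_g \circ \varphi_h = \varphi_{g \cdot h}$ (left action), or
$\forall g, h \in G$, $\varphi_g \circ \varphi_h = \varphi_{h \cdot g}$ (right action). In this paper we will consider in particular the
following right action, acting on the set $\mathcal{F}$ of functions from $G$ to $\mathbb{R}^n$:
\[ \forall u, g \in G,\ \forall f \in \mathcal{F}, \quad R_u(f)(g) = f(u \cdot g). \tag{1} \]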
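As a short check (a sketch added for clarity, using only the definitions above and the associativity of the semigroup product), one can verify that the family $(R_u)_{u \in G}$ defined in (1) satisfies the right-action identity:

```latex
\begin{align*}
(R_u \circ R_v)(f)(g)
  &= R_u\bigl(R_v(f)\bigr)(g) \\
  &= R_v(f)(u \cdot g) \\
  &= f\bigl(v \cdot (u \cdot g)\bigr) \\
  &= f\bigl((v \cdot u) \cdot g\bigr) \\
  &= R_{v \cdot u}(f)(g),
  \qquad \forall u, v, g \in G,\ \forall f \in \mathcal{F}.
\end{align*}
```

Hence $R_u \circ R_v = R_{v \cdot u}$, which is the right-action condition $\varphi_g \circ \varphi_h = \varphi_{h \cdot g}$ with $\varphi = R$.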
Given two sets $X$ and $Y$, a mapping $H : X \to Y$ is said to be equivariant with respect to $G$ if there are semigroup
actions $(\varphi_g)_{g \in G}$ and $(\psi_g)_{g \in G}$ on $X$ and $Y$ respectively, such that $\forall g \in G$, $H \circ \varphi_g = \psi_g \circ H$.
This definition gets more intuitive when $X = Y$ is a set of images and $(\varphi_g)_{g \in G} = (\psi_g)_{g \in G}$ are scalings or
translations, as illustrated in Figure 1.
3.2 Scale-cross-correlation
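For $\gamma > 1$ an integer, let $S_\gamma = \{\gamma^n \mid n \in \mathbb{N}\}$, endowed with multiplication, be the semigroup representing
discrete scalings of base $\gamma$. Then we consider the semigroup $G = S_\gamma \times \mathbb{Z}^2$ of discrete scalings and translations,
endowed with the internal operator "$\cdot$", defined by
\[ \forall k, l \in \mathbb{N},\ \forall z, y \in \mathbb{Z}^2, \quad (\gamma^k, z) \cdot (\gamma^l, y) = (\gamma^{k+l}, \gamma^k y + z). \tag{2} \]
Following (1), the action of this semigroup on functions mapping $S_\gamma \times \mathbb{Z}^2$ to $\mathbb{R}$ is
$R_{\gamma^k, z}[f](\gamma^l, y) = f(\gamma^{k+l}, \gamma^k y + z)$.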
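The product (2) and the induced action can be checked numerically. The following is a minimal sketch, not the authors' code: an element $(\gamma^k, z)$ is encoded as the pair `(k, z)`, and the base `GAMMA = 2`, the helper names and the test function `toy_signal` are all assumptions made for illustration.

```python
from itertools import product

GAMMA = 2  # base of the discrete scalings (assumed value; any integer > 1 works)


def compose(u, g):
    """Semigroup product of Eq. (2); an element (gamma^k, z) is stored as (k, z)."""
    (k, z), (l, y) = u, g
    s = GAMMA ** k
    return (k + l, (s * y[0] + z[0], s * y[1] + z[1]))


def act(u, f):
    """Right action of Eq. (1): R_u(f)(g) = f(u . g)."""
    return lambda g: f(compose(u, g))


def sample_elements():
    """A small, arbitrary set of semigroup elements used for the checks."""
    return [(k, (a, b)) for k, a, b in product(range(3), range(-1, 2), range(-1, 2))]


def toy_signal(g):
    """Arbitrary integer-valued function on G (purely illustrative)."""
    k, (a, b) = g
    return 5 * k + a * a - 3 * b + a * b


def check_associative():
    """(u . v) . w == u . (v . w) on the sampled elements."""
    els = sample_elements()
    return all(
        compose(compose(u, v), w) == compose(u, compose(v, w))
        for u, v, w in product(els, repeat=3)
    )


def check_right_action():
    """R_u(R_v(f)) == R_{v . u}(f) on the sampled elements."""
    els = sample_elements()
    return all(
        act(u, act(v, toy_signal))(g) == act(compose(v, u), toy_signal)(g)
        for u, v, g in product(els, repeat=3)
    )
```

Both checks only sample finitely many elements, so they illustrate the identities rather than prove them. Note that the product is not commutative: translating and then downscaling differs from downscaling and then translating.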
In analogy to convolutions, which are linear and equivariant to translations, a key step in equivariant CNNs is defining
linear operators which are equivariant to some class of operators. The semigroup cross-correlation, defined for an image
$f : G \to \mathbb{R}$ and a filter $h : G \to \mathbb{R}$, is a generalization of the convolution which is linear and equivariant to the action $R_g$
of a semigroup. When applied to the semigroup of scales and translations, we obtain the scale-cross-correlation.
Both were introduced in [19]. The scale-cross-correlation is written²
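\[ (f \star_G h)(\gamma^k, z) = \sum_{(\gamma^l, y) \in G} R_{\gamma^k, z}[f](\gamma^l, y)\, h(\gamma^l, y) = \sum_{l \geq 0} \sum_{y \in \mathbb{Z}^2} f(\gamma^{k+l}, \gamma^k y + z)\, h(\gamma^l, y). \tag{3} \]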
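To make the double sum in (3) concrete, here is a minimal sketch under stated assumptions: the image is a Python callable on pairs `(k, z)` standing for $(\gamma^k, z)$, and the filter has finite support and is stored as a dictionary, so the sum is finite and exact. The base `GAMMA = 2`, the names `toy_image` and `TOY_FILTER`, and their values are illustrative and do not come from the paper.

```python
GAMMA = 2  # base of the discrete scalings (assumed value)


def scale_cross_correlation(f, h):
    """Return the function (k, z) -> (f *_G h)(gamma^k, z) of Eq. (3).

    f: callable (k, (a, b)) -> number, the image on G.
    h: dict {(l, (a, b)): weight}, a filter with finite support.
    """
    def out(k, z):
        s = GAMMA ** k  # spatial offsets of the filter are dilated by gamma^k
        total = 0
        for (l, y), weight in h.items():
            total += f(k + l, (s * y[0] + z[0], s * y[1] + z[1])) * weight
        return total
    return out


def act(j, t, f):
    """R_{gamma^j, t}[f](gamma^l, y) = f(gamma^(j + l), gamma^j * y + t)."""
    s = GAMMA ** j
    return lambda l, y: f(j + l, (s * y[0] + t[0], s * y[1] + t[1]))


def toy_image(k, z):
    """Arbitrary integer-valued image on G (purely illustrative)."""
    return (k + 1) * (z[0] ** 2 - 2 * z[1]) + z[0] * z[1]


# Arbitrary filter supported on two scales and four spatial offsets.
TOY_FILTER = {
    (0, (0, 0)): 2,
    (0, (1, 0)): -1,
    (1, (0, 1)): 3,
    (1, (-1, -1)): 1,
}


def check_equivariance(j=1, t=(2, -3)):
    """Compare (R_g f) *_G h with R_g (f *_G h) on a few sample points."""
    lhs = scale_cross_correlation(act(j, t, toy_image), TOY_FILTER)
    rhs = act(j, t, scale_cross_correlation(toy_image, TOY_FILTER))
    points = [(k, (a, b)) for k in range(3) for a in range(-2, 3) for b in range(-2, 3)]
    return all(lhs(k, z) == rhs(k, z) for k, z in points)
```

Because the image lives on all of $G$ here, no boundary handling is needed. A practical implementation has to truncate both the scale axis and the spatial grid.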
This operator is suited for single channel images on $G$, but it can be easily extended to multichannel images. Let the
input $f = (f_1, \dots, f_n) \in (\mathbb{R}^n)^G$ be a signal with $n$ channels. Assuming the output has $m$ channels, the filter is of the
form $h : G \to \mathbb{R}^{n \times m}$. We compute the operator $f \star_G h$ at channel $o \in \{1, \dots, m\}$ as
$(f \star_G h)_o := \sum_{c=1}^{n} (f_c \star_G h_{c,o})$.
The resulting map is equivariant to scalings and translations: $(R_g f \star_G h)_o = R_g((f \star_G h)_o)$. Note that the composition
of operators which commute with $R_g$ still commutes with $R_g$, so that concatenating scale-cross-correlation layers
followed by pointwise activation functions and batch normalization yields equivariant architectures.
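The multichannel operator and the equivariance of a small stack of layers can be illustrated with arrays. This is only a sketch under assumptions that are not in the paper: the scale axis is truncated to finitely many levels, the spatial grid is $N \times N$ with periodic boundaries (so that the identity holds exactly), the base is `GAMMA = 2`, and the array layout and function names are hypothetical. Batch normalization is left out.

```python
import numpy as np

GAMMA = 2  # base of the discrete scalings (assumed value)


def scale_xcorr(f, h):
    """Multichannel scale-cross-correlation on a periodic N x N grid.

    f: array (n, S, N, N); f[c, l] is input channel c at scale gamma^l.
    h: array (n, m, L, K, K) with K odd; spatial offsets are centred on 0.
    Returns an array (m, S - L + 1, N, N).
    """
    n, S, N, _ = f.shape
    _, m, L, K, _ = h.shape
    r = K // 2
    out = np.zeros((m, S - L + 1, N, N))
    for k in range(S - L + 1):
        s = GAMMA ** k  # filter offsets are dilated by gamma^k at output scale k
        for l in range(L):
            for a in range(-r, r + 1):
                for b in range(-r, r + 1):
                    # shifted[c, z] = f[c, k + l, (z + s * (a, b)) mod N]
                    shifted = np.roll(f[:, k + l], shift=(-s * a, -s * b), axis=(1, 2))
                    # sum over input channels c for every output channel o
                    out[:, k] += np.einsum("cxy,co->oxy", shifted, h[:, :, l, a + r, b + r])
    return out


def act(j, t, f):
    """R_{gamma^j, t}: (R f)[c, l, y] = f[c, j + l, (gamma^j * y + t) mod N]."""
    N = f.shape[-1]
    s = GAMMA ** j
    ix = (s * np.arange(N) + t[0]) % N
    iy = (s * np.arange(N) + t[1]) % N
    return f[:, j:][:, :, ix][:, :, :, iy]


def relu(x):
    """Pointwise activation; it commutes with any re-indexing such as act."""
    return np.maximum(x, 0.0)


def two_layer_net(f, h1, h2):
    """Two scale-cross-correlation layers with a ReLU in between."""
    return scale_xcorr(relu(scale_xcorr(f, h1)), h2)
```

For example, with `f` of shape `(2, 5, 8, 8)`, `h1` of shape `(2, 3, 2, 3, 3)` and `h2` of shape `(3, 4, 2, 3, 3)`, the outputs `two_layer_net(act(1, (3, 5), f), h1, h2)` and `act(1, (3, 5), two_layer_net(f, h1, h2))` should agree up to floating-point error. Each layer consumes `L - 1` scale levels, which is one reason the number of scales available in the input limits the depth of such a stack in this truncated setting.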
² The equations in the case of a general semigroup can be found in Appendix A.