networks in such a way that they respect the symmetries of the problem, i.e., make them invariant or equivariant. Think, for example, of a neural network that detects cancer cells. It would be disastrous if, by slightly translating an image for instance, the neural network gave a totally different diagnosis, even though the input is essentially the same.
One way to make networks equivariant or invariant is to simply train them on more data: one could take the training dataset and augment it with translated, rotated, and reflected versions of the original images. This approach, however, is undesirable: invariance or equivariance is still not guaranteed, and training takes longer. It would be better if the networks were inherently invariant or equivariant by design. This avoids wasting network capacity, guarantees invariance or equivariance, and increases performance, see for example [1].
More specifically, many computer vision and image processing problems are tackled with convolutional neural networks (CNNs) [2–4]. Convolutional neural networks have the property that they inherently respect, to some degree, translation symmetries. CNNs do not, however, take into account rotational or reflection symmetries. Cohen and Welling introduced group equivariant convolutional neural networks (G-CNNs) in [5] and designed a classification network that is inherently invariant under 90-degree rotations, integer translations, and vertical/horizontal reflections.
Much work is being done on invariant/equivariant networks that exploit inherent symmetries; a non-exhaustive list is [1, 6–26]. The idea of including geometric priors, such as symmetries, into the design of neural networks is called 'Geometric Deep Learning' in [27].
In [28], partial differential equation (PDE) based G-CNNs are presented, aptly called PDE-G-CNNs. In fact, G-CNNs are shown to be a special case of PDE-G-CNNs (if one restricts the PDE-G-CNNs to convection only, using many transport vectors [28, Sec. 6]). With PDE-G-CNNs, the usual nonlinearities present in current networks, such as the ReLU activation function and max pooling, are replaced by solvers for specifically chosen nonlinear evolution PDEs. Figure 1 illustrates the difference between a traditional CNN layer and a PDE-G-CNN layer.
The PDEs used in PDE-G-CNNs are not chosen arbitrarily: they come directly from the world of geometric image analysis, and thus their effects are geometrically interpretable. This makes PDE-G-CNNs more geometrically meaningful and interpretable than traditional CNNs. Specifically, the PDEs considered are diffusion, convection, dilation, and erosion. These four PDEs correspond to the common notions of smoothing, shifting, max pooling, and min pooling. They are solved by linear convolutions, resamplings, and so-called morphological convolutions. Figure 2 illustrates the basic building block of a PDE-G-CNN.
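To give a rough feeling for these three solver types, the following is a minimal toy sketch of our own on a one-dimensional periodic grid over $\mathbb{R}$, not on $\mathbb{M}_d$ and not the actual PDE-G-CNN implementation: diffusion as a linear convolution, convection as a resampling (shift), and erosion/dilation as morphological inf-/sup-convolutions.

```python
import numpy as np

def diffusion(f, k):
    """Linear (circular) convolution with a smoothing kernel k: diffusion-like smoothing."""
    n = len(f)
    return np.array([sum(k[j] * f[(x - j) % n] for j in range(n)) for x in range(n)])

def convection(f, v):
    """Resampling, here an integer circular shift by v: convection-like transport."""
    return np.roll(f, v)

def erosion(f, k):
    """Morphological inf-convolution: (k [-] f)(x) = min_y [ f(y) + k(x - y) ] (min pooling)."""
    n = len(f)
    return np.array([min(f[y] + k[(x - y) % n] for y in range(n)) for x in range(n)])

def dilation(f, k):
    """Morphological sup-convolution: (k [+] f)(x) = max_y [ f(y) - k(x - y) ] (max pooling)."""
    n = len(f)
    return np.array([max(f[y] - k[(x - y) % n] for y in range(n)) for x in range(n)])

# Example usage with a quadratic morphological kernel and a Gaussian-like linear kernel.
n = 32
d = np.minimum(np.arange(n), n - np.arange(n)).astype(float)  # periodic distance to 0
k_lin = np.exp(-d**2 / 4.0); k_lin /= k_lin.sum()
k_morph = d**2 / 4.0
f = np.random.rand(n)
out = dilation(erosion(convection(diffusion(f, k_lin), 3), k_morph), k_morph)
```

In an actual PDE-G-CNN layer these operations act on functions on $\mathbb{M}_d$ and the kernels are determined by the (trainable) metric, but the algebraic structure of the solvers is the same.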
One shared property of G-CNNs and PDE-G-CNNs is that the input data usually needs to be lifted to a higher-dimensional space. Take, for example, the case of image segmentation with a convolutional neural network, where we model/idealize the images as real-valued functions on $\mathbb{R}^2$. If we keep the data as functions on $\mathbb{R}^2$ and want the convolutions within the network to be equivariant, then the only ones that are allowed are those with isotropic kernels [29, p. 258]. This type of shortcoming generalizes to other symmetry groups as well [12, Thm. 1]. One can imagine that this constraint is too restrictive to work with, and that is why we lift the image data.
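To make this restriction concrete, consider a single convolution layer on $\mathbb{R}^2$. The following short computation, our own rendering of the standard argument rather than a quotation of [29], shows that rotation equivariance forces the kernel to be isotropic:
\begin{align*}
\big(k * (f \circ R^{-1})\big)(\mathbf{x}) &= \int_{\mathbb{R}^2} k(\mathbf{x}-\mathbf{y})\, f(R^{-1}\mathbf{y})\, \mathrm{d}\mathbf{y} = \int_{\mathbb{R}^2} k(\mathbf{x}-R\mathbf{z})\, f(\mathbf{z})\, \mathrm{d}\mathbf{z},\\
\big((k * f) \circ R^{-1}\big)(\mathbf{x}) &= \int_{\mathbb{R}^2} k(R^{-1}\mathbf{x}-\mathbf{z})\, f(\mathbf{z})\, \mathrm{d}\mathbf{z} = \int_{\mathbb{R}^2} k\big(R^{-1}(\mathbf{x}-R\mathbf{z})\big)\, f(\mathbf{z})\, \mathrm{d}\mathbf{z}.
\end{align*}
Requiring equality for all images $f$ and all rotations $R \in \mathrm{SO}(2)$ yields $k(\mathbf{u}) = k(R^{-1}\mathbf{u})$ for all $\mathbf{u} \in \mathbb{R}^2$, i.e., the kernel $k$ must be isotropic.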
Within the PDE-G-CNN framework, the input images are considered real-valued functions on $\mathbb{R}^d$, the desired symmetries are represented by the Lie group of roto-translations $\mathrm{SE}(d)$, and the data is lifted to the homogeneous space $\mathbb{M}_d$ of $d$-dimensional positions and orientations. It is on this higher-dimensional space that the evolution PDEs are defined, and the effects of diffusion, dilation, and erosion are completely determined by the Riemannian metric tensor field $\mathcal{G}$ that is chosen on $\mathbb{M}_d$. If this Riemannian metric tensor field $\mathcal{G}$ is left-invariant, the overall processing is equivariant; this follows by combining techniques in [30, Thm. 21, Chpt. 4] and [31, Lem. 3, Thm. 4].
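For concreteness, and anticipating the notation made precise in Section 2, the homogeneous space of positions and orientations can be written as
\begin{equation*}
\mathbb{M}_d := \mathbb{R}^d \times S^{d-1},
\end{equation*}
and a metric tensor field $\mathcal{G}$ on $\mathbb{M}_d$ is called left-invariant if
\begin{equation*}
\mathcal{G}_{g \cdot p}\big((L_g)_* \dot{p}, (L_g)_* \dot{p}\big) = \mathcal{G}_p(\dot{p}, \dot{p}) \quad \text{for all } g \in \mathrm{SE}(d),\ p \in \mathbb{M}_d,\ \dot{p} \in T_p\mathbb{M}_d,
\end{equation*}
where $L_g(p) = g \cdot p$ denotes the action of the roto-translation $g$ on $\mathbb{M}_d$ and $(L_g)_*$ its pushforward.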
The Riemannian metric tensor field $\mathcal{G}$ we use in this article is left-invariant and determined by three nonnegative parameters: $w_1$, $w_2$, and $w_3$. The definition can be found in the preliminaries, Section 2, Equation (4). It is exactly these three parameters that are optimized during the training of a PDE-G-CNN. Intuitively, the parameters respectively regulate the cost of main spatial, lateral spatial, and angular motion. An important