Analysis of (sub-)Riemannian PDE-G-CNNs
Gijs Bellaard, Daan L. J. Bon, Gautam Pai, Bart M. N. Smets and Remco Duits
Department of Mathematics and Computer Science, CASA, Eindhoven University of
Technology, Eindhoven, The Netherlands.
*Corresponding author(s). E-mail(s): g.bellaard@tue.nl;
Contributing authors: d.l.j.bon@tue.nl; g.pai@tue.nl; b.m.n.smets@tue.nl; r.duits@tue.nl
Abstract
Group equivariant convolutional neural networks (G-CNNs) have been successfully applied in geomet-
ric deep learning. Typically, G-CNNs have the advantage over CNNs that they do not waste network
capacity on training symmetries that should have been hard-coded in the network. The recently intro-
duced framework of PDE-based G-CNNs (PDE-G-CNNs) generalises G-CNNs. PDE-G-CNNs have
the core advantages that they simultaneously 1) reduce network complexity, 2) increase classification
performance, and 3) provide geometric interpretability. Their implementations primarily consist of
linear and morphological convolutions with kernels.
In this paper we show that the previously suggested approximative morphological kernels do not
always accurately approximate the exact kernels. More specifically, depending on the spatial
anisotropy of the Riemannian metric, we argue that one must resort to sub-Riemannian approxima-
tions. We solve this problem by providing a new approximative kernel that works regardless of the
anisotropy. We provide new theorems with better error estimates of the approximative kernels, and
prove that they all carry the same reflectional symmetries as the exact ones.
We test the effectiveness of multiple approximative kernels within the PDE-G-CNN frame-
work on two datasets, and observe an improvement with the new approximative kernels. We
report that the PDE-G-CNNs again allow for a considerable reduction of network complex-
ity while having comparable or better performance than G-CNNs and CNNs on the two
datasets. Moreover, PDE-G-CNNs have the advantage of better geometric interpretability over
G-CNNs, as the morphological kernels are related to association fields from neurogeometry.
Keywords: Convolutional neural networks, Scale space theory, Geometric deep learning, Morphological
convolutions, PDEs, Riemannian Geometry, sub-Riemannian Geometry
1 Introduction
Many classification, segmentation, and tracking
tasks in computer vision and digital image pro-
cessing require some form of “symmetry”. Think,
for example, of image classification. If one rotates,
reflects, or translates an image the classification
stays the same. We say that an ideal image clas-
sification is invariant under these symmetries.
A slightly different situation is image segmenta-
tion. In this case, if the input image is in some
way changed the output should change accord-
ingly. Therefore, an ideal image segmentation is
equivariant with respect to these symmetries.
Many computer vision and image processing
problems are currently being tackled with neural
networks (NNs). It is desirable to design neural
networks in such a way that they respect the sym-
metries of the problem, i.e. make them invariant or
equivariant. Think for example of a neural network
that detects cancer cells. It would be disastrous
if, for example, slightly translating an image caused the neural network to give a totally different diagnosis, even though the input is essentially the same.
One way to make the networks equivariant or
invariant is to simply train them on more data.
One could take the training dataset and augment
it with translated, rotated and reflected versions
of the original images. This approach however
is undesirable: invariance or equivariance is still
not guaranteed and the training takes longer. It would be better if the networks were inherently invariant or equivariant by design. This avoids wasting network capacity, guarantees invariance or equivariance, and increases performance; see for example [1].
More specifically, many computer vision and
image processing problems are tackled with con-
volutional neural networks (CNNs) [2–4]. Convolutional neural networks have the property that they inherently respect, to some degree, translation symmetries. CNNs do not, however, take into account rotational or reflection symmetries.
Cohen and Welling introduced group equivariant
convolutional neural networks (G-CNNs) in [5]
and designed a classification network that is inher-
ently invariant under 90 degree rotations, integer
translations and vertical/horizontal reflections.
Much work is being done on invariant/equivariant networks that exploit inherent symmetries; a non-exhaustive list is [1, 6–26]. The idea of includ-
ing geometric priors, such as symmetries, into the
design of neural networks is called ‘Geometric
Deep Learning’ in [27].
In [28] partial differential equation (PDE)
based G-CNNs are presented, aptly called PDE-G-
CNNs. In fact, G-CNNs are shown to be a special
case of PDE-G-CNNs (if one restricts the PDE-
G-CNNs only to convection, using many transport
vectors [28, Sec.6]). With PDE-G-CNNs the usual
non-linearities that are present in current net-
works, such as the ReLU activation function and
max-pooling, are replaced by solvers for specifi-
cally chosen non-linear evolution PDEs. Figure 1
illustrates the difference between a traditional
CNN layer and a PDE-G-CNN layer.
The PDEs that are used in PDE-G-CNNs are
not chosen arbitrarily: they come directly from the
world of geometric image analysis, and thus their
effects are geometrically interpretable. This makes
PDE-G-CNNs more geometrically meaningful and
interpretable than traditional CNNs. Specifically,
the PDEs considered are diffusion, convection,
dilation and erosion. These four PDEs correspond to the common notions of smoothing, shifting, max pooling, and min pooling. They are solved by linear convolutions, resamplings, and so-called morphological convolutions. Figure 2 illustrates the basic building block of a PDE-G-CNN.
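To make this concrete: writing $W(p, t)$ for an evolving feature map on the lifted space, a single PDE evolution block solves an evolution that, schematically, combines these effects (this is a sketch, with a normalisation of the Hamilton–Jacobi term chosen so that it is consistent with the morphological kernel (1) below; the exact formulation, with a separate trainable metric tensor field per term, is in [28]):
\[
\frac{\partial W}{\partial t} = \underbrace{-\,\mathbf{c}\,W}_{\text{convection}} \;+\; \underbrace{\Delta_{\mathcal{G}}\, W}_{\text{diffusion}} \;\pm\; \underbrace{\tfrac{1}{\alpha}\,\big\|\nabla_{\mathcal{G}}\, W\big\|_{\mathcal{G}}^{\alpha}}_{\text{dilation }(+)\text{ / erosion }(-)}, \qquad W(\cdot, 0) = \text{input},
\]
where $\mathbf{c}$ is a trainable left-invariant vector field acting as a directional derivative, and $\Delta_{\mathcal{G}}$ and $\nabla_{\mathcal{G}}$ are the Laplacian and gradient induced by the metric tensor field $\mathcal{G}$ introduced below.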
One shared property of G-CNNs and PDE-
G-CNNs is that the input data usually needs to
be lifted to a higher dimensional space. Take, for
example, the case of image segmentation with a convolutional neural network where we model/idealize the images as real-valued functions on $\mathbb{R}^2$. If we keep the data as functions on $\mathbb{R}^2$ and want the convolutions within the network to be equivariant, then the only ones that are allowed are those with isotropic kernels [29, p.258]. This type of shortcoming generalizes to other symmetry groups as well [12, Thm.1]. One can imagine that this constraint is too restrictive to work with, and that is why we lift the image data.
Within the PDE-G-CNN framework the input images are considered real-valued functions on $\mathbb{R}^d$, the desired symmetries are represented by the Lie group of roto-translations $SE(d)$, and the data is lifted to the homogeneous space of $d$-dimensional positions and orientations $\mathbb{M}_d$. It is on this higher-dimensional space that the evolution PDEs are defined, and the effects of diffusion, dilation, and erosion are completely determined by the Riemannian metric tensor field $\mathcal{G}$ that is chosen on $\mathbb{M}_d$. If this Riemannian metric tensor field $\mathcal{G}$ is left-invariant, the overall processing is equivariant; this follows by combining techniques in [30, Thm. 21, Chpt. 4], [31, Lem. 3, Thm. 4].
The Riemannian metric tensor field $\mathcal{G}$ we will use in this article is left-invariant and determined by three nonnegative parameters: $w_1$, $w_2$, and $w_3$. The definition can be found in the preliminaries, Section 2, Equation (4). It is exactly these three parameters that are optimized during the training of a PDE-G-CNN. Intuitively, the parameters respectively regulate the cost of main spatial, lateral spatial, and angular motion.
[Figure 1 diagram: a CNN layer, consisting of an affine transform followed by parallel Conv → Pool → ReLU branches, versus a PDE layer, consisting of an affine transform followed by parallel PDE evolution blocks.]
Fig. 1: The difference between a traditional CNN layer and a PDE-G-CNN layer. In contrast to traditional CNNs, the layers in a PDE-G-CNN do not depend on ad-hoc non-linearities like ReLUs, and are instead implemented as solvers of (non)linear PDEs. What the PDE evolution block consists of can be seen in Figure 2.
[Figure 2 diagram: a PDE evolution block with Convection, Diffusion, Dilation, and Erosion sub-blocks.]
Fig. 2: Overview of a PDE evolution block. Con-
vection is solved by resampling, diffusion is solved
by a linear group convolution with a certain kernel
[28, Sec.5.2], and dilation and erosion are solved
by morphological group convolutions (3) with a
morphological kernel (1).
An important quantity in the analysis of this paper is the spatial anisotropy $\zeta := w_2/w_1$, as will become clear later.
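For intuition (the authoritative definition is Equation (4) in Section 2; the form below is a sketch of the standard left-invariant construction): the squared length such a metric tensor field assigns to a velocity $\dot{p} = (\dot{x}, \dot{y}, \dot{\theta})$ at $p = (x, y, \theta)$ is
\[
\mathcal{G}_p(\dot{p}, \dot{p}) = w_1^2\,\big|\dot{x}\cos\theta + \dot{y}\sin\theta\big|^2 + w_2^2\,\big|{-\dot{x}}\sin\theta + \dot{y}\cos\theta\big|^2 + w_3^2\,\big|\dot{\theta}\big|^2,
\]
so $w_1$ penalizes motion along the current orientation, $w_2$ penalizes lateral motion, and $w_3$ penalizes turning. In the limit $w_2 \to \infty$ (equivalently $\zeta \to \infty$) lateral motion is prohibited and one obtains the sub-Riemannian setting.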
In this article we only consider the 2-dimensional case, i.e. $d = 2$. In this case, the elements of both $\mathbb{M}_2$ and $SE(2)$ can be represented by three real numbers: $(x, y, \theta) \in \mathbb{R}^2 \times [0, 2\pi)$. In the case of $\mathbb{M}_2$, the $x$ and $y$ represent a position and $\theta$ represents an orientation. Throughout the article we take $p_0 := (0, 0, 0) \in \mathbb{M}_2$ as our reference point in $\mathbb{M}_2$. In the case of $SE(2)$ we have that $x$ and $y$ represent a translation and $\theta$ a rotation.
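For later reference it may help to recall the group product that underlies all left-invariant constructions; with the parametrization above it reads
\[
(x_1, y_1, \theta_1) \cdot (x_2, y_2, \theta_2) = \big(x_1 + x_2\cos\theta_1 - y_2\sin\theta_1,\;\; y_1 + x_2\sin\theta_1 + y_2\cos\theta_1,\;\; \theta_1 + \theta_2 \bmod 2\pi\big),
\]
and $SE(2)$ acts on $\mathbb{M}_2$ by the same formula.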
As already stated, within the PDE-G-CNN framework images are lifted to the higher-dimensional space of positions and orientations $\mathbb{M}_d$. There are multiple ways of achieving this, but there is one very natural way to do it: the orientation score transform [30, 32–34]. In this transform we pick a point $(x, y) \in \mathbb{R}^2$ in an image and determine how well a certain orientation $\theta \in [0, 2\pi)$ fits the image at the chosen point. In Figure 3 an example of an orientation score is given. We refer to [34, Sec.2.1] for a summary of how an orientation score transform works.
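As a rough illustration of the idea (not the authors' implementation, which uses cake wavelets [34]): one can lift an image by correlating it with rotated copies of an elongated, orientation-sensitive filter. The filter shape and all parameter values below are our own illustrative choices.

```python
# A minimal sketch of an orientation score transform. The paper lifts images
# with cake wavelets [34]; here we merely approximate the idea by correlating
# the image with rotated anisotropic ridge filters. Note these filters are
# pi-periodic, a simplification compared to direction-sensitive wavelets.
import numpy as np
from scipy import ndimage

def orientation_score(image: np.ndarray, n_orientations: int = 8,
                      sigma_long: float = 6.0, sigma_short: float = 1.5) -> np.ndarray:
    """Lift a 2D image to a function on M_2, sampled on a (theta, y, x) grid."""
    image = image.astype(float)
    size = int(4 * sigma_long) | 1                    # odd filter size
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    scores = np.empty((n_orientations, *image.shape))
    for k in range(n_orientations):
        theta = k * 2 * np.pi / n_orientations        # sample theta in [0, 2*pi)
        # coordinates aligned with / perpendicular to the orientation
        u = xx * np.cos(theta) + yy * np.sin(theta)
        v = -xx * np.sin(theta) + yy * np.cos(theta)
        # elongated filter: smooth along the line, second derivative across it
        g = np.exp(-u**2 / (2 * sigma_long**2) - v**2 / (2 * sigma_short**2))
        filt = (v**2 / sigma_short**4 - 1 / sigma_short**2) * g   # ridge detector
        filt -= filt.mean()                           # remove the DC component
        scores[k] = ndimage.correlate(image, filt, mode="nearest")
    return scores
```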
Inspiration for using orientation scores comes
from biology. The Nobel laureates Hubel and
Wiesel found that many cells in the visual cor-
tex of cats have a preferred orientation [35,36].
[Figure 3 image: left panel, an image on $\mathbb{R}^2$ with axes $x, y$; right panel, its orientation score on $\mathbb{M}_2$ with axes $x, y, \theta$.]
Fig. 3: An example of an image together with its orientation score. We can see that the image, a real-valued function on $\mathbb{R}^2$, is lifted to an orientation score, a real-valued function on $\mathbb{M}_2$. Notice that the lines that are crossing in the left image are disentangled in the orientation score.
Moreover, a neuron that fires for a specific ori-
entation excites neighboring neurons that have
an “aligned” orientation. Petitot [37] and Citti and Sarti [38] proposed models for the distribution of orientation preference and this excitation of neighbors based on sub-Riemannian geometry on $\mathbb{M}_2$. They relate the phenomenon of preference of aligned orientations to the concept of association fields [39], which model how a specific local orientation places expectations on surrounding orientations in human vision. Figure 4 provides an impression of such an association field.
As shown in [42, Fig.17], association fields are closely approximated by (projected) sub-Riemannian geodesics in $\mathbb{M}_2$, for which optimal synthesis has been obtained by Sachkov and Moiseev [43, 44]. Furthermore, in [45] it is shown that the Riemannian geodesics in $\mathbb{M}_2$ converge to the sub-Riemannian geodesics as the spatial anisotropy $\zeta$ of the metric increases. This shows that in practice one can approximate the sub-Riemannian model by Riemannian models. Figure 5 shows the relation between association fields and sub-Riemannian geometry in $\mathbb{M}_2$.
Fig. 4: Association field lines from neurogeometry [37, Fig.43], [39, Fig.16]. Such association field lines can be well approximated by spatially projected sub-Riemannian geodesics in $\mathbb{M}_2$ [37, 38, 40, 41], [42, Fig.17].
Fig. 5: A visualization of the exact Riemannian distance $d$, and its relation with association fields. In 5a we see isocontours of $d(p_0, \cdot)$ in $\mathbb{M}_2$, and on the bottom we see the min-projection over $\theta$ of these contours (thus we selected the minimal ending angle, in contrast to Figure 4). The domain of the plot is $[-3, 3]^2 \times [-\pi, \pi) \subset \mathbb{M}_2$. The chosen contours are $d = 0.5, 1, 1.5, 2$, and $2.5$. The metric parameters are $(w_1, w_2, w_3) = (1, 64, 1)$. Due to the very high spatial anisotropy we approach the sub-Riemannian setting. In 5b we see the same min-projection together with some corresponding spatially projected geodesics.
The relation between association fields and Riemannian geometry on $\mathbb{M}_2$ directly extends to a relation between dilation/erosion and association fields. Namely, performing dilation on an orientation score in $\mathbb{M}_2$ is similar to extending a line segment along its association field lines. Similarly, performing erosion is similar to sharpening a line segment perpendicular to its association field lines. This makes dilation/erosion the perfect candidate for a task such as line completion.
Fig. 6: One sample of the Lines dataset. In 6a
we see the input, in 6b the perceived curve that
we consider as ground-truth (as the input is con-
structed by interrupting the ground-truth line and
adding random local orientations).
In the line completion problem, the input
is an image containing multiple line segments,
and the desired output is an image of the line
that is “hidden” in the input image. Figure 6
shows such an input and desired output. This is
also what David Field et al. studied in [39]. We
anticipate that PDE-G-CNNs outperform classical CNNs on the line completion problem because PDE-G-CNNs are able to dilate and erode.
To investigate this we made a synthetic dataset
called “Lines” consisting of grayscale 64 ×64 pixel
images, together with their ground-truth line completion. In Figure 7 a complete abstract overview of the architecture of a PDE-G-CNN performing line completion is visualized. Figure 8 illustrates how a PDE-G-CNN and a CNN incrementally complete a line throughout their layers.
In Proposition 1 we show that solving the dilation and erosion PDEs can be done by performing a morphological convolution with a morphological kernel $k_t^\alpha : \mathbb{M}_2 \to \mathbb{R}_{\geq 0}$, which is easily expressed in the Riemannian distance $d = d_{\mathcal{G}}$ on the manifold:
\[
k_t^\alpha(p) = \frac{t}{\beta} \left( \frac{d_{\mathcal{G}}(p_0, p)}{t} \right)^{\beta}. \tag{1}
\]
Here $p_0 = (0, 0, 0)$ is our reference point in $\mathbb{M}_2$, and time $t > 0$ controls the amount of erosion and dilation. Furthermore, $\alpha > 1$ controls the “softness” of the max- and min-pooling, with $\frac{1}{\alpha} + \frac{1}{\beta} = 1$. Erosion is done through a direct morphological convolution (3) with this specific kernel. Dilation is solved in a slightly different way but again with the same kernel (Proposition 1 in Section 3 will explain the details).
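To make the mechanics tangible, recall that the morphological (group) convolution (3) of [28] is the inf-convolution $(k \,\square\, f)(p) = \inf_{q \in \mathbb{M}_2}\, k(q^{-1} p) + f(q)$. The sketch below evaluates kernel (1) on precomputed distances and performs a brute-force inf-convolution. It is didactic only: for brevity it illustrates the translation case on $\mathbb{R}^2$, where $q^{-1}p = p - q$, and all names and parameter values are our own; real implementations replace the shift by the $SE(2)$ product, plug in an approximative distance $\rho$, and are considerably more efficient.

```python
# Didactic sketch of erosion via morphological convolution with kernel (1).
import numpy as np

def morphological_kernel(dist: np.ndarray, t: float, alpha: float) -> np.ndarray:
    """Kernel (1), evaluated on precomputed distances d(p0, .) from p0."""
    beta = alpha / (alpha - 1.0)          # conjugate exponent: 1/alpha + 1/beta = 1
    return (t / beta) * (dist / t) ** beta

def erosion(f: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Brute-force inf-convolution (k [] f)(p) = inf_q k(p - q) + f(q)."""
    h, w = f.shape
    kh, kw = k.shape
    py, px = kh // 2, kw // 2
    # pad with +inf so values outside the image never attain the infimum
    fp = np.pad(f.astype(float), ((py, py), (px, px)), constant_values=np.inf)
    k_flip = k[::-1, ::-1]                # k(p - q) viewed as a window over q
    out = np.empty_like(f, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.min(fp[i:i + kh, j:j + kw] + k_flip)
    return out

# example: kernel (1) on a 9x9 stencil with Euclidean distances
r = np.arange(-4, 5)
xx, yy = np.meshgrid(r, r)
k = morphological_kernel(np.hypot(xx, yy), t=1.0, alpha=1.6)
```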
[Figure 7 diagram: Image on $\mathbb{R}^2$ → Lifting Layer → Orientation Scores on $\mathbb{M}_2$ → PDE Layers → Processed Orientation Scores on $\mathbb{M}_2$ → Projection Layer → Processed Image on $\mathbb{R}^2$.]
Fig. 7: The overall architecture for a PDE-G-CNN performing line completion on the Lines dataset. Note how the input image is lifted to an orientation score that lives in the higher-dimensional space $\mathbb{M}_2$, run through PDE-G-CNN layers (Figures 1 and 2), and afterwards projected back down to $\mathbb{R}^2$. Usually this projection is done by taking the maximum value of a feature map over the orientations $\theta$, for every position $(x, y) \in \mathbb{R}^2$.
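In code, and assuming the lifted feature maps are laid out with the orientation axis first as in the orientation score sketch above (an illustrative convention, not a prescribed API), this projection is a one-liner:

```python
# max-projection over the orientation axis theta: M_2 -> R^2
processed_image = processed_scores.max(axis=0)
```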
[Figure 8 image grid: rows for a PDE-G-CNN and a CNN, columns from Input to Output along increasing network depth.]
Fig. 8: Visualization of how a PDE-G-CNN and a CNN incrementally complete a line throughout their layers. The first two rows are of a PDE-G-CNN, the second two rows of a CNN. The first column is the input, the last column the output. The intermediate columns are a representative selection of feature maps from the output of the respective CNN or PDE layer (Figure 1). The feature maps of the PDE-G-CNN live in $\mathbb{M}_2$, but for clarity we only show the max-projection over $\theta$. Within the feature maps of the PDE-G-CNN, association fields from neurogeometry [37, 39, 46] become visible as network depth increases. Such merging of association fields is not visible in the feature maps of the CNN. This observation is consistent throughout different inputs.
And this is where a problem arises: calculating the exact distance $d$ on $\mathbb{M}_2$ required in (1) is computationally expensive [47]. To alleviate this issue, we resort to estimating the true distance $d$ with computationally efficient approximative distances, denoted throughout the article by $\rho$. We then use such a distance approximation within (1) to create a corresponding approximative morphological kernel, and in turn use this to efficiently calculate the effect of dilation and erosion.
In [28] one such distance approximation is used: the logarithmic distance estimate $\rho_c$, which uses the logarithmic coordinates $c^i$ (8). In short, $\rho_c(p)$ is equal to the Riemannian length of the exponential curve that connects $p_0$ to $p$. The formal definition will follow in Section 4. In Figure 9 an impression of $\rho_c$ is given.
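Since $\rho_c$ is central to what follows, here is a hedged sketch of how such an estimate can be computed. It assumes, as the text states, that $\rho_c(p)$ is the Riemannian length of the exponential curve from $p_0$ to $p$, which for a metric that is diagonal in the left-invariant frame is the weighted norm $\sqrt{w_1^2 (c^1)^2 + w_2^2 (c^2)^2 + w_3^2 (c^3)^2}$ of the logarithmic coordinates. The coordinate conventions below (which axis is "along the orientation", sign of the rotation) are our assumptions; the formal definition in Section 4 is authoritative.

```python
# Hedged sketch of the logarithmic distance estimate rho_c; conventions are
# illustrative, see Section 4 / Equation (8) for the authoritative definition.
import numpy as np

def log_coordinates(x: float, y: float, theta: float) -> np.ndarray:
    """Logarithmic coordinates c = (c1, c2, c3) of p = (x, y, theta) in SE(2):
    the Lie-algebra element whose exponential curve reaches p from p0.
    theta is assumed to lie in (-pi, pi]."""
    if abs(theta) < 1e-9:
        return np.array([x, y, 0.0])      # pure translation: log acts as identity
    cot = 1.0 / np.tan(theta / 2.0)
    c1 = 0.5 * theta * (x * cot + y)
    c2 = 0.5 * theta * (-x + y * cot)
    return np.array([c1, c2, theta])

def rho_c(x, y, theta, w1, w2, w3):
    """Riemannian length of the exponential curve connecting p0 to p."""
    c1, c2, c3 = log_coordinates(x, y, theta)
    return float(np.sqrt((w1 * c1) ** 2 + (w2 * c2) ** 2 + (w3 * c3) ** 2))
```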
Clearly, an error is made when the effect of
erosion and dilation is calculated with an approx-
imative morphological kernel. As a morphological