Analysis of (sub-)Riemannian PDE-G-CNNs
Gijs Bellaard, Daan L. J. Bon, Gautam Pai, Bart M. N. Smets and Remco Duits
Department of Mathematics and Computer Science, CASA, Eindhoven University of
Technology, Eindhoven, The Netherlands.
*Corresponding author(s). E-mail(s): g.bellaard@tue.nl;
Contributing authors: d.l.j.bon@tue.nl; g.pai@tue.nl; b.m.n.smets@tue.nl; r.duits@tue.nl
Abstract
Group equivariant convolutional neural networks (G-CNNs) have been successfully applied in geomet-
ric deep learning. Typically, G-CNNs have the advantage over CNNs that they do not waste network
capacity on training symmetries that should have been hard-coded in the network. The recently intro-
duced framework of PDE-based G-CNNs (PDE-G-CNNs) generalises G-CNNs. PDE-G-CNNs have
the core advantages that they simultaneously 1) reduce network complexity, 2) increase classification
performance, and 3) provide geometric interpretability. Their implementations primarily consist of
linear and morphological convolutions with kernels.
In this paper we show that the previously suggested approximative morphological kernels do not
always accurately approximate the exact kernels. More specifically, depending on the spatial
anisotropy of the Riemannian metric, we argue that one must resort to sub-Riemannian approxima-
tions. We solve this problem by providing a new approximative kernel that works regardless of the
anisotropy. We provide new theorems with better error estimates of the approximative kernels, and
prove that they all carry the same reflectional symmetries as the exact ones.
We test the effectiveness of multiple approximative kernels within the PDE-G-CNN frame-
work on two datasets, and observe an improvement with the new approximative kernels. We
report that the PDE-G-CNNs again allow for a considerable reduction of network complex-
ity while having comparable or better performance than G-CNNs and CNNs on the two
datasets. Moreover, PDE-G-CNNs have the advantage of better geometric interpretability over
G-CNNs, as the morphological kernels are related to association fields from neurogeometry.
Keywords: Convolutional neural networks, Scale space theory, Geometric deep learning, Morphological
convolutions, PDEs, Riemannian Geometry, sub-Riemannian Geometry
1 Introduction
Many classification, segmentation, and tracking
tasks in computer vision and digital image pro-
cessing require some form of “symmetry”. Think,
for example, of image classification. If one rotates,
reflects, or translates an image the classification
stays the same. We say that an ideal image clas-
sification is invariant under these symmetries.
A slightly different situation is image segmenta-
tion. In this case, if the input image is in some
way changed the output should change accord-
ingly. Therefore, an ideal image segmentation is
equivariant with respect to these symmetries.
Many computer vision and image processing
problems are currently being tackled with neural
networks (NNs). It is desirable to design neural
networks in such a way that they respect the sym-
metries of the problem, i.e. make them invariant or
equivariant. Think for example of a neural network
that detects cancer cells. It would be disastrous
if, for example, slightly translating an image caused the neural network to give a totally different diagnosis, even though the input is essentially the same.
One way to make the networks equivariant or
invariant is to simply train them on more data.
One could take the training dataset and augment
it with translated, rotated and reflected versions
of the original images. This approach however
is undesirable: invariance or equivariance is still
not guaranteed and the training takes longer. It would be better if the networks were inherently invariant or equivariant by design. This avoids wasting network capacity, guarantees invariance or equivariance, and increases performance; see for example [1].
More specifically, many computer vision and
image processing problems are tackled with con-
volutional neural networks (CNNs) [2–4]. Convolutional neural networks have the property that they inherently respect, to some degree, translation symmetries. CNNs do not, however, take into account rotational or reflection symmetries.
Cohen and Welling introduced group equivariant
convolutional neural networks (G-CNNs) in [5]
and designed a classification network that is inher-
ently invariant under 90 degree rotations, integer
translations and vertical/horizontal reflections.
Much work is being done on invariant/equivariant networks that exploit inherent symmetries; a non-exhaustive list is [1, 6–26]. The idea of includ-
ing geometric priors, such as symmetries, into the
design of neural networks is called ‘Geometric
Deep Learning’ in [27].
In [28] partial differential equation (PDE)
based G-CNNs are presented, aptly called PDE-G-
CNNs. In fact, G-CNNs are shown to be a special
case of PDE-G-CNNs (if one restricts the PDE-
G-CNNs only to convection, using many transport
vectors [28, Sec.6]). With PDE-G-CNNs the usual
non-linearities that are present in current net-
works, such as the ReLU activation function and
max-pooling, are replaced by solvers for specifi-
cally chosen non-linear evolution PDEs. Figure 1
illustrates the difference between a traditional
CNN layer and a PDE-G-CNN layer.
The PDEs that are used in PDE-G-CNNs are
not chosen arbitrarily: they come directly from the
world of geometric image analysis, and thus their
effects are geometrically interpretable. This makes
PDE-G-CNNs more geometrically meaningful and
interpretable than traditional CNNs. Specifically,
the PDEs considered are diffusion, convection,
dilation and erosion. These four PDEs correspond to the common notions of smoothing, shifting, max pooling, and min pooling. They are solved by linear convolutions, resamplings, and so-called morphological convolutions. Figure 2 illustrates the basic building block of a PDE-G-CNN.
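To make this concrete: writing $W(p, t)$ for an evolving feature map on the lifted space, a single PDE evolution block solves an evolution that, schematically, combines these effects (this is a sketch, with a normalisation of the Hamilton–Jacobi term chosen so that it is consistent with the morphological kernel (1) below; the exact formulation, with a separate trainable metric tensor field per term, is in [28]):
\[
\frac{\partial W}{\partial t} = \underbrace{-\,\mathbf{c}\,W}_{\text{convection}} \;+\; \underbrace{\Delta_{\mathcal{G}}\, W}_{\text{diffusion}} \;\pm\; \underbrace{\tfrac{1}{\alpha}\,\big\|\nabla_{\mathcal{G}}\, W\big\|_{\mathcal{G}}^{\alpha}}_{\text{dilation }(+)\text{ / erosion }(-)}, \qquad W(\cdot, 0) = \text{input},
\]
where $\mathbf{c}$ is a trainable left-invariant vector field acting as a directional derivative, and $\Delta_{\mathcal{G}}$ and $\nabla_{\mathcal{G}}$ are the Laplacian and gradient induced by the metric tensor field $\mathcal{G}$ introduced below.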
One shared property of G-CNNs and PDE-
G-CNNs is that the input data usually needs to
be lifted to a higher dimensional space. Take, for
example, the case of image segmentation with a convolutional neural network where we model/idealize the images as real-valued functions on $\mathbb{R}^2$. If we keep the data as functions on $\mathbb{R}^2$ and want the convolutions within the network to be equivariant, then the only ones that are allowed are those with isotropic kernels [29, p.258]. This type of shortcoming generalizes to other symmetry groups as well [12, Thm.1]. One can imagine that this constraint is too restrictive to work with, and that is why we lift the image data.
Within the PDE-G-CNN framework the input images are considered real-valued functions on $\mathbb{R}^d$, the desired symmetries are represented by the Lie group of roto-translations $SE(d)$, and the data is lifted to the homogeneous space of $d$-dimensional positions and orientations $\mathbb{M}_d$. It is on this higher-dimensional space that the evolution PDEs are defined, and the effects of diffusion, dilation, and erosion are completely determined by the Riemannian metric tensor field $\mathcal{G}$ that is chosen on $\mathbb{M}_d$. If this Riemannian metric tensor field $\mathcal{G}$ is left-invariant, the overall processing is equivariant; this follows by combining techniques in [30, Thm. 21, Chpt. 4], [31, Lem. 3, Thm. 4].
The Riemannian metric tensor field $\mathcal{G}$ we will use in this article is left-invariant and determined by three nonnegative parameters: $w_1$, $w_2$, and $w_3$. The definition can be found in the preliminaries, Section 2, Equation (4). It is exactly these three parameters that are optimized during the training of a PDE-G-CNN. Intuitively, the parameters respectively regulate the cost of main spatial, lateral spatial, and angular motion.
[Figure 1 diagram: a CNN layer, consisting of an affine transform followed by parallel Conv → Pool → ReLU branches, versus a PDE layer, consisting of an affine transform followed by parallel PDE evolution blocks.]
Fig. 1: The difference between a traditional CNN layer and a PDE-G-CNN layer. In contrast to traditional CNNs, the layers in a PDE-G-CNN do not depend on ad-hoc non-linearities like ReLUs, and are instead implemented as solvers of (non)linear PDEs. What the PDE evolution block consists of can be seen in Figure 2.
[Figure 2 diagram: a PDE evolution block with Convection, Diffusion, Dilation, and Erosion sub-blocks.]
Fig. 2: Overview of a PDE evolution block. Con-
vection is solved by resampling, diffusion is solved
by a linear group convolution with a certain kernel
[28, Sec.5.2], and dilation and erosion are solved
by morphological group convolutions (3) with a
morphological kernel (1).
An important quantity in the analysis of this paper is the spatial anisotropy $\zeta := w_2/w_1$, as will become clear later.
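For intuition (the authoritative definition is Equation (4) in Section 2; the form below is a sketch of the standard left-invariant construction): the squared length such a metric tensor field assigns to a velocity $\dot{p} = (\dot{x}, \dot{y}, \dot{\theta})$ at $p = (x, y, \theta)$ is
\[
\mathcal{G}_p(\dot{p}, \dot{p}) = w_1^2\,\big|\dot{x}\cos\theta + \dot{y}\sin\theta\big|^2 + w_2^2\,\big|{-\dot{x}}\sin\theta + \dot{y}\cos\theta\big|^2 + w_3^2\,\big|\dot{\theta}\big|^2,
\]
so $w_1$ penalizes motion along the current orientation, $w_2$ penalizes lateral motion, and $w_3$ penalizes turning. In the limit $w_2 \to \infty$ (equivalently $\zeta \to \infty$) lateral motion is prohibited and one obtains the sub-Riemannian setting.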
In this article we only consider the 2-dimensional case, i.e. $d = 2$. In this case, the elements of both $\mathbb{M}_2$ and $SE(2)$ can be represented by three real numbers: $(x, y, \theta) \in \mathbb{R}^2 \times [0, 2\pi)$. In the case of $\mathbb{M}_2$, the $x$ and $y$ represent a position and $\theta$ represents an orientation. Throughout the article we take $p_0 := (0, 0, 0) \in \mathbb{M}_2$ as our reference point in $\mathbb{M}_2$. In the case of $SE(2)$ we have that $x$ and $y$ represent a translation and $\theta$ a rotation.
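For later reference it may help to recall the group product that underlies all left-invariant constructions; with the parametrization above it reads
\[
(x_1, y_1, \theta_1) \cdot (x_2, y_2, \theta_2) = \big(x_1 + x_2\cos\theta_1 - y_2\sin\theta_1,\;\; y_1 + x_2\sin\theta_1 + y_2\cos\theta_1,\;\; \theta_1 + \theta_2 \bmod 2\pi\big),
\]
and $SE(2)$ acts on $\mathbb{M}_2$ by the same formula.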
As already stated, within the PDE-G-CNN framework images are lifted to the higher-dimensional space of positions and orientations $\mathbb{M}_d$. There are multiple ways of achieving this, but there is one very natural way to do it: the orientation score transform [30, 32–34]. In this transform we pick a point $(x, y) \in \mathbb{R}^2$ in an image and determine how well a certain orientation $\theta \in [0, 2\pi)$ fits the image at the chosen point. In Figure 3 an example of an orientation score is given. We refer to [34, Sec.2.1] for a summary of how an orientation score transform works.
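As a rough illustration of the idea (not the authors' implementation, which uses cake wavelets [34]): one can lift an image by correlating it with rotated copies of an elongated, orientation-sensitive filter. The filter shape and all parameter values below are our own illustrative choices.

```python
# A minimal sketch of an orientation score transform. The paper lifts images
# with cake wavelets [34]; here we merely approximate the idea by correlating
# the image with rotated anisotropic ridge filters. Note these filters are
# pi-periodic, a simplification compared to direction-sensitive wavelets.
import numpy as np
from scipy import ndimage

def orientation_score(image: np.ndarray, n_orientations: int = 8,
                      sigma_long: float = 6.0, sigma_short: float = 1.5) -> np.ndarray:
    """Lift a 2D image to a function on M_2, sampled on a (theta, y, x) grid."""
    image = image.astype(float)
    size = int(4 * sigma_long) | 1                    # odd filter size
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    scores = np.empty((n_orientations, *image.shape))
    for k in range(n_orientations):
        theta = k * 2 * np.pi / n_orientations        # sample theta in [0, 2*pi)
        # coordinates aligned with / perpendicular to the orientation
        u = xx * np.cos(theta) + yy * np.sin(theta)
        v = -xx * np.sin(theta) + yy * np.cos(theta)
        # elongated filter: smooth along the line, second derivative across it
        g = np.exp(-u**2 / (2 * sigma_long**2) - v**2 / (2 * sigma_short**2))
        filt = (v**2 / sigma_short**4 - 1 / sigma_short**2) * g   # ridge detector
        filt -= filt.mean()                           # remove the DC component
        scores[k] = ndimage.correlate(image, filt, mode="nearest")
    return scores
```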
Inspiration for using orientation scores comes
from biology. The Nobel laureates Hubel and
Wiesel found that many cells in the visual cor-
tex of cats have a preferred orientation [35,36].
[Figure 3 image: left panel, an image on $\mathbb{R}^2$ with axes $x, y$; right panel, its orientation score on $\mathbb{M}_2$ with axes $x, y, \theta$.]
Fig. 3: An example of an image together with its orientation score. We can see that the image, a real-valued function on $\mathbb{R}^2$, is lifted to an orientation score, a real-valued function on $\mathbb{M}_2$. Notice that the lines that are crossing in the left image are disentangled in the orientation score.
Moreover, a neuron that fires for a specific ori-
entation excites neighboring neurons that have
an “aligned” orientation. Petitot [37] and Citti and Sarti [38] proposed models for the distribution of orientation preference and this excitation of neighbors based on sub-Riemannian geometry on $\mathbb{M}_2$. They relate the phenomenon of preference of aligned orientations to the concept of association fields [39], which model how a specific local orientation places expectations on surrounding orientations in human vision. Figure 4 provides an impression of such an association field.
As shown in [42, Fig.17], association fields are closely approximated by (projected) sub-Riemannian geodesics in $\mathbb{M}_2$, for which optimal synthesis has been obtained by Sachkov and Moiseev [43, 44]. Furthermore, in [45] it is shown that the Riemannian geodesics in $\mathbb{M}_2$ converge to the sub-Riemannian geodesics as the spatial anisotropy $\zeta$ of the metric increases. This shows that in practice one can approximate the sub-Riemannian model by Riemannian models. Figure 5 shows the relation between association fields and sub-Riemannian geometry in $\mathbb{M}_2$.
Fig. 4: Association field lines from neurogeometry [37, Fig.43], [39, Fig.16]. Such association field lines can be well approximated by spatially projected sub-Riemannian geodesics in $\mathbb{M}_2$ [37, 38, 40, 41], [42, Fig.17].
Fig. 5: A visualization of the exact Riemannian distance $d$, and its relation with association fields. In 5a we see isocontours of $d(p_0, \cdot)$ in $\mathbb{M}_2$, and on the bottom we see the min-projection over $\theta$ of these contours (thus we selected the minimal ending angle, in contrast to Figure 4). The domain of the plot is $[-3, 3]^2 \times [-\pi, \pi) \subset \mathbb{M}_2$. The chosen contours are $d = 0.5, 1, 1.5, 2$, and $2.5$. The metric parameters are $(w_1, w_2, w_3) = (1, 64, 1)$. Due to the very high spatial anisotropy we approach the sub-Riemannian setting. In 5b we see the same min-projection together with some corresponding spatially projected geodesics.
The relation between association fields and Riemannian geometry on $\mathbb{M}_2$ directly extends to a relation between dilation/erosion and association fields. Namely, performing dilation on an orientation score in $\mathbb{M}_2$ is similar to extending a line segment along its association field lines. Similarly, performing erosion is similar to sharpening a line segment perpendicular to its association field lines. This makes dilation/erosion the perfect candidate for a task such as line completion.
Fig. 6: One sample of the Lines dataset. In 6a
we see the input, in 6b the perceived curve that
we consider as ground-truth (as the input is con-
structed by interrupting the ground-truth line and
adding random local orientations).
In the line completion problem, the input
is an image containing multiple line segments,
and the desired output is an image of the line
that is “hidden” in the input image. Figure 6
shows such an input and desired output. This is
also what David Field et al. studied in [39]. We
anticipate that PDE-G-CNNs outperform classical CNNs on the line completion problem because PDE-G-CNNs are able to dilate and erode.
To investigate this we made a synthetic dataset
called “Lines” consisting of grayscale 64 ×64 pixel
images, together with their ground-truth line completion. In Figure 7 a complete abstract overview of the architecture of a PDE-G-CNN performing line completion is visualized. Figure 8 illustrates how a PDE-G-CNN and a CNN incrementally complete a line throughout their layers.
In Proposition 1 we show that solving the dilation and erosion PDEs can be done by performing a morphological convolution with a morphological kernel $k_t^\alpha : \mathbb{M}_2 \to \mathbb{R}_{\geq 0}$, which is easily expressed in the Riemannian distance $d = d_{\mathcal{G}}$ on the manifold:
\[
k_t^\alpha(p) = \frac{t}{\beta} \left( \frac{d_{\mathcal{G}}(p_0, p)}{t} \right)^{\beta}. \tag{1}
\]
Here $p_0 = (0, 0, 0)$ is our reference point in $\mathbb{M}_2$, and time $t > 0$ controls the amount of erosion and dilation. Furthermore, $\alpha > 1$ controls the “softness” of the max- and min-pooling, with $\frac{1}{\alpha} + \frac{1}{\beta} = 1$. Erosion is done through a direct morphological convolution (3) with this specific kernel. Dilation is solved in a slightly different way but again with the same kernel (Proposition 1 in Section 3 will explain the details).
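To make the mechanics tangible, recall that the morphological (group) convolution (3) of [28] is the inf-convolution $(k \,\square\, f)(p) = \inf_{q \in \mathbb{M}_2}\, k(q^{-1} p) + f(q)$. The sketch below evaluates kernel (1) on precomputed distances and performs a brute-force inf-convolution. It is didactic only: for brevity it illustrates the translation case on $\mathbb{R}^2$, where $q^{-1}p = p - q$, and all names and parameter values are our own; real implementations replace the shift by the $SE(2)$ product, plug in an approximative distance $\rho$, and are considerably more efficient.

```python
# Didactic sketch of erosion via morphological convolution with kernel (1).
import numpy as np

def morphological_kernel(dist: np.ndarray, t: float, alpha: float) -> np.ndarray:
    """Kernel (1), evaluated on precomputed distances d(p0, .) from p0."""
    beta = alpha / (alpha - 1.0)          # conjugate exponent: 1/alpha + 1/beta = 1
    return (t / beta) * (dist / t) ** beta

def erosion(f: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Brute-force inf-convolution (k [] f)(p) = inf_q k(p - q) + f(q)."""
    h, w = f.shape
    kh, kw = k.shape
    py, px = kh // 2, kw // 2
    # pad with +inf so values outside the image never attain the infimum
    fp = np.pad(f.astype(float), ((py, py), (px, px)), constant_values=np.inf)
    k_flip = k[::-1, ::-1]                # k(p - q) viewed as a window over q
    out = np.empty_like(f, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.min(fp[i:i + kh, j:j + kw] + k_flip)
    return out

# example: kernel (1) on a 9x9 stencil with Euclidean distances
r = np.arange(-4, 5)
xx, yy = np.meshgrid(r, r)
k = morphological_kernel(np.hypot(xx, yy), t=1.0, alpha=1.6)
```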
[Figure 7 diagram: Image on $\mathbb{R}^2$ → Lifting Layer → Orientation Scores on $\mathbb{M}_2$ → PDE Layers → Processed Orientation Scores on $\mathbb{M}_2$ → Projection Layer → Processed Image on $\mathbb{R}^2$.]
Fig. 7: The overall architecture for a PDE-G-CNN performing line completion on the Lines dataset. Note how the input image is lifted to an orientation score that lives in the higher-dimensional space $\mathbb{M}_2$, run through PDE-G-CNN layers (Figures 1 and 2), and afterwards projected back down to $\mathbb{R}^2$. Usually this projection is done by taking the maximum value of a feature map over the orientations $\theta$, for every position $(x, y) \in \mathbb{R}^2$.
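In code, and assuming the lifted feature maps are laid out with the orientation axis first as in the orientation score sketch above (an illustrative convention, not a prescribed API), this projection is a one-liner:

```python
# max-projection over the orientation axis theta: M_2 -> R^2
processed_image = processed_scores.max(axis=0)
```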
[Figure 8 image grid: rows for a PDE-G-CNN and a CNN, columns from Input to Output along increasing network depth.]
Fig. 8: Visualization of how a PDE-G-CNN and a CNN incrementally complete a line throughout their layers. The first two rows are of a PDE-G-CNN, the second two rows of a CNN. The first column is the input, the last column the output. The intermediate columns are a representative selection of feature maps from the output of the respective CNN or PDE layer (Figure 1). The feature maps of the PDE-G-CNN live in $\mathbb{M}_2$, but for clarity we only show the max-projection over $\theta$. Within the feature maps of the PDE-G-CNN, association fields from neurogeometry [37, 39, 46] become visible as network depth increases. Such merging of association fields is not visible in the feature maps of the CNN. This observation is consistent throughout different inputs.
And this is where a problem arises: calculating the exact distance $d$ on $\mathbb{M}_2$ required in (1) is computationally expensive [47]. To alleviate this issue, we resort to estimating the true distance $d$ with computationally efficient approximative distances, denoted throughout the article by $\rho$. We then use such a distance approximation within (1) to create a corresponding approximative morphological kernel, and in turn use this to efficiently calculate the effect of dilation and erosion.
In [28] one such distance approximation is used: the logarithmic distance estimate $\rho_c$, which uses the logarithmic coordinates $c^i$ (8). In short, $\rho_c(p)$ is equal to the Riemannian length of the exponential curve that connects $p_0$ to $p$. The formal definition will follow in Section 4. In Figure 9 an impression of $\rho_c$ is given.
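Since $\rho_c$ is central to what follows, here is a hedged sketch of how such an estimate can be computed. It assumes, as the text states, that $\rho_c(p)$ is the Riemannian length of the exponential curve from $p_0$ to $p$, which for a metric that is diagonal in the left-invariant frame is the weighted norm $\sqrt{w_1^2 (c^1)^2 + w_2^2 (c^2)^2 + w_3^2 (c^3)^2}$ of the logarithmic coordinates. The coordinate conventions below (which axis is "along the orientation", sign of the rotation) are our assumptions; the formal definition in Section 4 is authoritative.

```python
# Hedged sketch of the logarithmic distance estimate rho_c; conventions are
# illustrative, see Section 4 / Equation (8) for the authoritative definition.
import numpy as np

def log_coordinates(x: float, y: float, theta: float) -> np.ndarray:
    """Logarithmic coordinates c = (c1, c2, c3) of p = (x, y, theta) in SE(2):
    the Lie-algebra element whose exponential curve reaches p from p0.
    theta is assumed to lie in (-pi, pi]."""
    if abs(theta) < 1e-9:
        return np.array([x, y, 0.0])      # pure translation: log acts as identity
    cot = 1.0 / np.tan(theta / 2.0)
    c1 = 0.5 * theta * (x * cot + y)
    c2 = 0.5 * theta * (-x + y * cot)
    return np.array([c1, c2, theta])

def rho_c(x, y, theta, w1, w2, w3):
    """Riemannian length of the exponential curve connecting p0 to p."""
    c1, c2, c3 = log_coordinates(x, y, theta)
    return float(np.sqrt((w1 * c1) ** 2 + (w2 * c2) ** 2 + (w3 * c3) ** 2))
```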
Clearly, an error is made when the effect of
erosion and dilation is calculated with an approx-
imative morphological kernel. As a morphological