
SCALE EQUIVARIANT U-NET

Mateus Sangalli, Samy Blusseau, Santiago Velasco-Forero, Jesús Angulo

Center for Mathematical Morphology

Mines ParisTech, PSL University

Fontainebleau, France

{mateus.sangalli, sammy.blusseau, santiago.velasco, jesus.angulo}@minesparis.psl.eu

ABSTRACT

In neural networks, the property of being equivariant to transformations improves generalization when the corresponding symmetry is present in the data. In particular, scale-equivariant networks are suited to computer vision tasks where the same classes of objects appear at different scales, as in most semantic segmentation tasks. Recently, convolutional layers equivariant to a semigroup of scalings and translations have been proposed. However, the equivariance of subsampling and upsampling has never been explicitly studied, even though they are necessary building blocks in some segmentation architectures. The U-Net is a representative example of such architectures, and it includes the basic elements used for state-of-the-art semantic segmentation. Therefore, this paper introduces the Scale Equivariant U-Net (SEU-Net), a U-Net that is made approximately equivariant to a semigroup of scales and translations through careful application of subsampling and upsampling layers and the use of the aforementioned scale-equivariant layers. Moreover, a scale-dropout is proposed in order to improve generalization to different scales in approximately scale-equivariant architectures. The proposed SEU-Net is trained for semantic segmentation on the Oxford-IIIT Pet dataset and for cell segmentation on the DIC-C2DH-HeLa dataset. The generalization metric on unseen scales is dramatically improved in comparison to the U-Net, even when the U-Net is trained with scale jittering, and in comparison to a scale-equivariant architecture that does not use upsampling operators inside the equivariant pipeline. The scale-dropout induces better generalization of the scale-equivariant models in the Pet experiment, but not in the cell segmentation experiment.

1 Introduction

Convolutional Neural Networks (CNNs) are based on convolutional layers and achieve state-of-the-art performance in many image analysis tasks. A translation applied to the input of a CNN is equivalent to a translation applied to its feature maps, a property illustrated by Figure 1(a). This property is a particular case of group equivariance [ ] and helps improve the generalization of the network to new data if the data has translation symmetry. An operator φ: X → Y is equivariant w.r.t. a group if applying a group action to the input and then applying φ amounts to applying a group action to the output of φ computed on the original input. This is illustrated in Figure 1. In addition to translations, group actions can model many interesting classes of spatial transformations, such as rotations, scalings, and affine transformations. Group equivariant CNNs [ ] are a generalization of CNNs that are equivariant to some transformation group. Many approaches focus on equivariance to rotations, in different kinds of data [2, 18, 17, 14], and to scalings [20, 3, 7].
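As a minimal illustration of the equivariance property just described, the Python sketch below checks numerically that a 1D cross-correlation commutes with translations, i.e. that φ(T_t x) = T_t φ(x). Periodic (circular) boundaries are an assumption made here so that the identity holds exactly on a finite signal; the helper names `shift` and `cross_correlate` are hypothetical.

```python
def shift(x, t):
    """Circular translation: (T_t x)[i] = x[(i - t) mod n]."""
    n = len(x)
    return [x[(i - t) % n] for i in range(n)]


def cross_correlate(x, w):
    """Circular cross-correlation: y[i] = sum_j x[(i + j) mod n] * w[j]."""
    n = len(x)
    return [sum(x[(i + j) % n] * wj for j, wj in enumerate(w)) for i in range(n)]


x = [0, 1, 4, 2, 0, 0, 3, 1]
w = [1, -2, 1]
# phi(T_t x) == T_t phi(x) for every translation t of the periodic domain.
for t in range(len(x)):
    assert cross_correlate(shift(x, t), w) == shift(cross_correlate(x, w), t)
print("translation equivariance holds for every shift")
```

The same check with a position-dependent operation (e.g. multiplying each sample by its index) would fail, which is what distinguishes equivariant layers.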

Deep scale-spaces [ ] introduce neural networks equivariant to the action of semigroups instead of groups. Semigroup actions are considered because they can model non-invertible transformations, and the authors focus on equivariance to downsampling in discrete domains as a way to address equivariance to scalings without creating spurious information through interpolation. This seminal work laid the basis for defining scale-equivariant CNNs, although it only focused on convolutional layers and did not address the equivariance of pooling and upsampling layers, which are key elements in many neural architectures, such as the U-Net.
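To see why pooling matters for equivariance, the following sketch (same circular-boundary assumption, hypothetical helper names) shows that a max pooling with window 2 and stride 2 commutes with even input shifts, which become shifts by half as many samples on the pooled grid, but with no translation of the output for a one-pixel input shift. Stacking d such layers therefore restricts exact equivariance to shifts by multiples of 2^d.

```python
def shift(x, t):
    """Circular translation: (T_t x)[i] = x[(i - t) mod n]."""
    n = len(x)
    return [x[(i - t) % n] for i in range(n)]


def max_pool(x):
    """Max pooling with window 2 and stride 2 (len(x) assumed even)."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


x = [0, 1, 2, 3, 4, 5, 6, 7]
half = len(x) // 2
# Input shift by 2t  <->  output shift by t: equivariance with different actions.
for t in range(half):
    assert max_pool(shift(x, 2 * t)) == shift(max_pool(x), t)
# A shift by one pixel matches no translation of the pooled output.
assert all(max_pool(shift(x, 1)) != shift(max_pool(x), t) for t in range(half))
```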

The U-Net [ ] has become famous for its strong performance in semantic segmentation. It is a fully convolutional neural network, i.e. a CNN without any dense layer, and is therefore equivariant to a certain subgroup of translations (in practice, translations by multiples of the total subsampling factor of its pooling layers, up to border effects). However, architectures like the U-Net are not scale-equivariant a priori, and experiments show that they are not scale-equivariant in practice either [ ], as illustrated by Figure 2. A scale-equivariant counterpart of such an architecture is desirable, as scale symmetry is frequently present in semantic segmentation data. For example, in urban scenes, objects of the same class appear at different scales depending on their distance to the camera.

arXiv:2210.04508v1 [stat.ML] 10 Oct 2022

(a) Translation Equivariance

(b) Scale Equivariance

Figure 1: Example of equivariance in the cases of translation and scaling. In this case, the equivariant operator is an ideal operator that computes the semantic segmentation of images, and the transformations applied in (a) and (b) are, respectively, a translation and a re-scaling.

(a) Training scale

(b) Unseen scale

Figure 2: Example of a U-Net trained at one scale and applied to predict an output at the training scale (a) and at an unseen scale (b). The image at the unseen scale represents the same object, but the U-Net no longer segments it correctly.

In this work we introduce the Scale-Equivariant U-Net (SEU-Net), based on semigroup cross-correlations [ ] and an adapted use of pooling and upsampling. The rest of the paper is organized as follows. In Section 2 we discuss some of the related work in the literature. In Section 3 we review semigroup equivariant neural networks. The main contribution of this paper, the SEU-Net, is introduced in Section 4 along with its fundamental building blocks. The whole architecture is tested empirically for its equivariance in Section 5. More precisely, we test the SEU-Net on segmentation tasks where the test images are at scales unseen during training, on the Oxford-IIIT Pet [ ] and the DIC-HeLa cell [ ] datasets. The SEU-Net is shown to outperform the U-Net even when the latter is trained with large amounts of scale jittering. The paper ends in Section 6 with some conclusions and perspectives for future work.

2 Related Work

Scale-equivariance and scale-invariance are topics already discussed in the deep learning literature [ ]. The experimental benchmarks found in those papers are interesting as a first way to measure equivariance, but tend to be based on very simple tasks, such as the classification of re-scaled digits from the MNIST dataset or of low-resolution images of clothes from the Fashion-MNIST dataset. In [ ], combinations of base filters are optimized to minimize the equivariance error of discrete scale convolutions. This is applied to classification, tracking and geometry estimation, but not to segmentation.

In [ ], instead of treating scaling as an invertible operation, as it would be in a continuous domain, it is considered as the action of downsampling the input image in a discrete domain. Therefore, a semigroup-equivariant generalization of the convolution is introduced. Specifically, the focus is put on a semigroup of scalings and translations. These operators can be applied efficiently even on large images, since applying them at larger scales has the same computational cost as at the original scale. In [ ], the semigroup equivariant models were applied to classification and semantic segmentation of datasets of large images, achieving better results than matched non-equivariant architectures. Yet, the role of scale-equivariance was not isolated, as the performance of the models was not measured for inputs at scales unseen in the training set. Later on, this approach was revisited by [ ], where the Gaussian scale-space originally used was generalized to other scale-spaces, and the models were tested in experiments where the networks are trained at one fixed scale and tested on unseen scales, albeit on synthetic or simple datasets. In all these approaches, the authors either avoided pooling and upsampling in their architectures, or used them but did not discuss their impact on scale equivariance.

¹ Code available at https://github.com/mateussangalli/ScaleEquivariantUNet

While scale-equivariance has been a topic in the literature for some time, as far as we know a scale-equivariant U-Net has not yet been proposed, contrary to the rotation-equivariant case [ ]. Moreover, existing benchmarks for scale-equivariance were either based on simple datasets like MNIST or did not explicitly measure equivariance in their segmentation or classification experiments, i.e. by training the networks at one fixed scale and testing on unseen scales. Here we propose semantic segmentation experiments on natural data that measure the equivariance of the predictions.

3 Semigroup Equivariant Convolutional Networks

In this work, following [ ], image scalings are restricted to image downscalings, which can be viewed as actions of a semigroup on images. As illustrated by Figure 1, we seek equivariance with respect to both downscalings and translations. Hence, the network layers are designed to be equivariant with respect to a semigroup combining both transformations.

3.1 Semigroup Equivariance

A semigroup, contrary to a group, can model non-invertible transformations, e.g. the downsampling operation in a discrete domain. In the following, $(G, \cdot)$ denotes a discrete semigroup.

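Let $\mathcal{X}$ be a set. A family of mappings $(\phi_g)_{g \in G}$ from $\mathcal{X}$ to itself is a semigroup action on $\mathcal{X}$ if it is homomorphic to the semigroup, that is, if either $\forall g, h \in G,\ \phi_g \circ \phi_h = \phi_{g \cdot h}$ (left action), or $\forall g, h \in G,\ \phi_g \circ \phi_h = \phi_{h \cdot g}$ (right action). In this paper we consider in particular the following right action, acting on $\mathcal{F}$, the set of functions from $G$ to $\mathbb{R}^n$:

$$\forall u, g \in G,\ \forall f \in \mathcal{F}, \quad R_u(f)(g) = f(u \cdot g). \tag{1}$$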
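To make the order reversal of a right action concrete, the sketch below uses a stand-in semigroup, strings under concatenation (associative, non-commutative, without inverses), chosen purely for illustration; it is not the semigroup used in this paper. It checks that the operators R_u of Eq. (1) compose as R_u ∘ R_v = R_{v·u}. The test function `f` is hypothetical.

```python
def op(g, h):
    """Product of the stand-in semigroup: string concatenation."""
    return g + h


def R(u, f):
    """Right action of Eq. (1): R_u(f)(g) = f(u . g)."""
    return lambda g: f(op(u, g))


def f(g):
    """Position-weighted character sum, so that the order of symbols matters."""
    return sum((i + 1) * ord(c) for i, c in enumerate(g))


u, v = "a", "b"
samples = ["", "a", "ab", "ba", "abc"]
# Composing the operators reverses the product: R_u o R_v = R_{v.u} ...
assert all(R(u, R(v, f))(g) == R(op(v, u), f)(g) for g in samples)
# ... which in general differs from R_{u.v}, since the semigroup is not commutative.
assert any(R(u, R(v, f))(g) != R(op(u, v), f)(g) for g in samples)
```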

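Given two sets $\mathcal{X}$ and $\mathcal{Y}$, a mapping $H: \mathcal{X} \to \mathcal{Y}$ is said to be equivariant with respect to $G$ if there are semigroup actions $(\phi_g)_{g \in G}$ and $(\psi_g)_{g \in G}$ on $\mathcal{X}$ and $\mathcal{Y}$, respectively, such that $\forall g \in G,\ H \circ \phi_g = \psi_g \circ H$. This definition becomes more intuitive when $\mathcal{X} = \mathcal{Y}$ is a set of images and $(\phi_g)_{g \in G} = (\psi_g)_{g \in G}$ are scalings or translations, as illustrated in Figure 1.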
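The definition allows the input and output actions to differ. The sketch below uses, as an illustrative choice, the translation group (Z, +) acting on functions Z → Z by (T_t f)(x) = f(x + t). It checks that a pointwise ReLU is equivariant with ϕ = ψ = T, and that keeping every other sample, D(f)(x) = f(2x), is equivariant when the input action is T_{2t} and the output action is T_t, but not when both are T_t. The helper `equivariant` (hypothetical) only tests the identity at finitely many points, so it can refute equivariance but not prove it.

```python
def T(t):
    """Translation action on functions Z -> Z: (T_t f)(x) = f(x + t)."""
    return lambda f: (lambda x: f(x + t))


def relu(f):
    """Pointwise ReLU: H(f)(x) = max(f(x), 0)."""
    return lambda x: max(f(x), 0)


def downsample(f):
    """Keep every other sample: D(f)(x) = f(2x)."""
    return lambda x: f(2 * x)


def equivariant(H, phi, psi, f, group_elems, points):
    """Test H o phi_g == psi_g o H at finitely many sample points."""
    return all(H(phi(g)(f))(x) == psi(g)(H(f))(x)
               for g in group_elems for x in points)


def f(x):
    """Arbitrary integer-valued test signal (illustrative only)."""
    return (x * x - 7 * x) % 13 - 6


gs, xs = range(-5, 6), range(-10, 11)
# A pointwise map commutes with translations: phi = psi = T.
assert equivariant(relu, T, T, f, gs, xs)
# Downsampling: equivariant with phi_t = T_{2t} on the input and psi_t = T_t on the output ...
assert equivariant(downsample, lambda t: T(2 * t), T, f, gs, xs)
# ... but not with phi = psi = T.
assert not equivariant(downsample, T, T, f, gs, xs)
```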

3.2 Scale-cross-correlation

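For $\gamma > 1$ an integer, let $S_\gamma = \{\gamma^n \mid n \in \mathbb{N}\}$, endowed with multiplication, be the semigroup representing discrete scalings of base $\gamma$. Then we consider the semigroup $G = S_\gamma \times \mathbb{Z}^2$ of discrete scalings and translations, endowed with the internal operation "$\cdot$" defined by

$$\forall k, l \in \mathbb{N},\ \forall z, y \in \mathbb{Z}^2, \quad (\gamma^k, z) \cdot (\gamma^l, y) = (\gamma^{k+l}, \gamma^k y + z). \tag{2}$$

Following (1), the action of this semigroup on functions mapping $S_\gamma \times \mathbb{Z}^2$ to $\mathbb{R}^n$ is

$$R_{\gamma^k, z}[f](\gamma^l, y) = f(\gamma^{k+l}, \gamma^k y + z).$$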
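The semigroup law (2) and the induced action can be checked directly. In this sketch, an element (γ^k, z) is stored by its exponent as `(k, (z0, z1))`, γ = 2 is an arbitrary choice, and `f` is a hypothetical integer-valued test function. The code verifies associativity on a sample of elements, shows that scaling and translating do not commute, and confirms R_u ∘ R_v = R_{v·u}.

```python
from itertools import product

GAMMA = 2  # base of the discrete scalings (any integer > 1; 2 is an arbitrary choice)


def op(a, b):
    """Eq. (2): (gamma^k, z) . (gamma^l, y) = (gamma^(k+l), gamma^k y + z).

    Elements of G = S_gamma x Z^2 are stored as (k, (z0, z1)), i.e. by the exponent k.
    """
    (k, z), (l, y) = a, b
    s = GAMMA ** k
    return (k + l, (s * y[0] + z[0], s * y[1] + z[1]))


def R(u, f):
    """R_u[f](g) = f(u . g), i.e. R_{gamma^k, z}[f](gamma^l, y) = f(gamma^(k+l), gamma^k y + z)."""
    return lambda g: f(op(u, g))


def f(g):
    """Arbitrary integer-valued test function on S_gamma x Z^2."""
    k, (y0, y1) = g
    return (3 * k + y0 * y0 - 5 * y1 + y0 * y1) % 11


elems = [(k, (a, b)) for k in range(3) for a in (-1, 0, 2) for b in (-2, 1)]
# Associativity on a sample of triples, as required of a semigroup.
assert all(op(op(a, b), c) == op(a, op(b, c))
           for a, b, c in product(elems[:6], repeat=3))
# Scaling then translating differs from translating then scaling.
assert op((1, (0, 0)), (0, (1, 0))) != op((0, (1, 0)), (1, (0, 0)))
# Right action: R_u o R_v = R_{v.u}.
u, v = (1, (1, -1)), (2, (0, 3))
assert all(R(u, R(v, f))(g) == R(op(v, u), f)(g) for g in elems)
```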

In analogy to convolutions, which are linear and equivariant to translations, a key step in equivariant CNNs is defining linear operators that are equivariant to some class of transformations. The semigroup cross-correlation, defined for an image $f: G \to \mathbb{R}$ and a filter $h: G \to \mathbb{R}$, is a generalization of the convolution that is linear and equivariant to the action of a semigroup. When applied to the semigroup of scales and translations, we obtain the scale-cross-correlation. Both were introduced in [19]. The scale-cross-correlation is written²

$$(f \star_G h)(\gamma^k, z) = \sum_{(\gamma^l, y) \in G} R_{\gamma^k, z}[f](\gamma^l, y)\, h(\gamma^l, y) = \sum_{l \geq 0} \sum_{y \in \mathbb{Z}^2} f(\gamma^{k+l}, \gamma^k y + z)\, h(\gamma^l, y). \tag{3}$$
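A direct, unoptimized transcription of Eq. (3) follows, offered as a sketch rather than the authors' implementation. Assumptions: γ = 2, the filter h has finite support (two scale levels and a 3×3 spatial neighborhood with made-up integer weights), and functions on G are Python callables on `(k, (z0, z1))`; the names `scale_xcorr` and `act` are hypothetical. Integer data make the equivariance check exact. The sketch also makes visible that, at scale γ^k, the spatial taps are γ^k pixels apart (a dilated filter), so each output value costs the same number of multiplications at every scale.

```python
GAMMA = 2  # scale base gamma (integer > 1); an illustrative choice


def scale_xcorr(f, h):
    """Scale-cross-correlation of Eq. (3) with a finitely supported filter.

    f: callable on G = S_gamma x Z^2, with (gamma^k, z) stored as (k, (z0, z1)).
    h: dict {(l, (y0, y1)): weight}; entries outside the dict are zero.
    """
    def out(g):
        k, (z0, z1) = g
        s = GAMMA ** k  # at scale gamma^k the spatial taps are s pixels apart
        return sum(w * f((k + l, (s * y0 + z0, s * y1 + z1)))
                   for (l, (y0, y1)), w in h.items())
    return out


def act(u, f):
    """R_{gamma^a, t}[f](gamma^k, z) = f(gamma^(a+k), gamma^a z + t)."""
    a, (t0, t1) = u
    s = GAMMA ** a
    return lambda g: f((a + g[0], (s * g[1][0] + t0, s * g[1][1] + t1)))


def f(g):
    """Arbitrary integer-valued test input on G (stands in for a lifted image)."""
    k, (z0, z1) = g
    return (7 * k + z0 * z0 + 3 * z1 - z0 * z1) % 17


# Hypothetical filter: two scale levels (l = 0, 1) and a 3x3 spatial support.
h = {(l, (y0, y1)): (l + 1) * (y0 - 2 * y1) + 1
     for l in range(2) for y0 in (-1, 0, 1) for y1 in (-1, 0, 1)}

points = [(k, (z0, z1)) for k in range(3) for z0 in (-2, 0, 3) for z1 in (-1, 4)]
for u in [(0, (1, 0)), (1, (0, 0)), (2, (-3, 5))]:
    lhs = scale_xcorr(act(u, f), h)  # transform the input, then correlate
    rhs = act(u, scale_xcorr(f, h))  # correlate, then transform the output
    assert all(lhs(g) == rhs(g) for g in points)
print("scale-cross-correlation commutes with R_u at all sampled points")
```

Truncating the sum over l to a finite range does not break equivariance: the argument only reindexes the input, so it holds term by term.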

This operator is suited for single-channel images on $G$, but it can easily be extended to multichannel images. Let the input $f = (f_1, \dots, f_n) \in (\mathbb{R}^n)^G$ be a signal with $n$ channels. Assuming the output has $m$ channels, the filter is of the form $h: G \to \mathbb{R}^{n \times m}$. We compute the operator $f \star_G h$ at channel $o \in \{1, \dots, m\}$ as

$$(f \star_G h)_o := \sum_{c=1}^{n} f_c \star_G h_{c,o}.$$

The resulting map is equivariant to scalings and translations: $(R_g f \star_G h)_o = R_g\big((f \star_G h)_o\big)$. Note that a composition of operators that commute with $R_g$ still commutes with $R_g$; hence, concatenating scale-cross-correlation layers followed by pointwise activation functions and batch normalization yields equivariant architectures.
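The composition argument can be checked on a small network. This self-contained sketch re-implements the single-channel operator of Eq. (3), builds the multichannel layer (f ⋆_G h)_o = Σ_c f_c ⋆_G h_{c,o} followed by a pointwise ReLU, stacks two such layers (2 → 3 → 1 channels), and verifies that the stack commutes with R_u at sample points. The filter weights, input channels, γ = 2 and all helper names are hypothetical test choices. Batch normalization is omitted: at inference it is a per-channel affine map, hence pointwise, and would not change the check.

```python
GAMMA = 2  # scale base, chosen for illustration


def act(u, f):
    """R_{gamma^a, t}[f](gamma^k, z) = f(gamma^(a+k), gamma^a z + t); elements stored as (k, (z0, z1))."""
    a, (t0, t1) = u
    s = GAMMA ** a
    return lambda g: f((a + g[0], (s * g[1][0] + t0, s * g[1][1] + t1)))


def scale_xcorr(f, h):
    """Single-channel scale-cross-correlation, Eq. (3), with a finitely supported filter dict."""
    def out(g):
        k, (z0, z1) = g
        s = GAMMA ** k
        return sum(w * f((k + l, (s * y0 + z0, s * y1 + z1)))
                   for (l, (y0, y1)), w in h.items())
    return out


def layer(fs, H, m):
    """Multichannel scale-cross-correlation followed by a pointwise ReLU.

    fs: list of n input channels; H: dict {(l, (y0, y1)): n x m nested list of weights}.
    Output channel o is ReLU(sum_c f_c *_G h_{c,o}).
    """
    outs = []
    for o in range(m):
        terms = [scale_xcorr(fc, {key: W[c][o] for key, W in H.items()})
                 for c, fc in enumerate(fs)]
        # Default argument binds this channel's terms (avoids late binding in the loop).
        outs.append(lambda g, terms=terms: max(sum(t(g) for t in terms), 0))
    return outs


def toy_filter(n, m):
    """Deterministic made-up weights on two scales and a 3x3 spatial support."""
    return {(l, (y0, y1)): [[(c + 2 * o + l + y0 - y1) % 5 - 2 for o in range(m)]
                            for c in range(n)]
            for l in range(2) for y0 in (-1, 0, 1) for y1 in (-1, 0, 1)}


H1, H2 = toy_filter(2, 3), toy_filter(3, 1)


def net(fs):
    """Two stacked layers: 2 -> 3 -> 1 channels."""
    return layer(layer(fs, H1, 3), H2, 1)


def f0(g):
    k, (z0, z1) = g
    return (z0 * z0 - 3 * z1 + 5 * k) % 7 - 3


def f1(g):
    k, (z0, z1) = g
    return (2 * z0 + z1 * z1 + k) % 5 - 2


points = [(k, (z0, z1)) for k in range(2) for z0 in (-1, 2) for z1 in (0, 3)]
for u in [(0, (1, -2)), (1, (0, 0)), (1, (3, 1))]:
    transformed_then_net = net([act(u, f0), act(u, f1)])[0]
    net_then_transformed = act(u, net([f0, f1])[0])
    assert all(transformed_then_net(g) == net_then_transformed(g) for g in points)
```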

² The equations in the case of a general semigroup can be found in Appendix A.
