SCALE EQUIVARIANT U-NET
Mateus Sangalli, Samy Blusseau, Santiago Velasco-Forero, Jesús Angulo
Center for Mathematical Morphology
Mines ParisTech, PSL University
Fontainebleau, France
{mateus.sangalli, sammy.blusseau, santiago.velasco, jesus.angulo}@minesparis.psl.eu
ABSTRACT
In neural networks, the property of being equivariant to transformations improves generalization
when the corresponding symmetry is present in the data. In particular, scale-equivariant networks
are suited to computer vision tasks where the same classes of objects appear at different scales, like
in most semantic segmentation tasks. Recently, convolutional layers equivariant to a semigroup
of scalings and translations have been proposed. However, the equivariance of subsampling and
upsampling has never been explicitly studied even though they are necessary building blocks in
some segmentation architectures. The U-Net is a representative example of such architectures, which
includes the basic elements used for state-of-the-art semantic segmentation. Therefore, this paper
introduces the Scale Equivariant U-Net (SEU-Net), a U-Net that is made approximately equivariant
to a semigroup of scales and translations through careful application of subsampling and upsampling
layers and the use of the aforementioned scale-equivariant layers. Moreover, a scale-dropout is proposed in order to improve generalization to different scales in approximately scale-equivariant architectures. The proposed SEU-Net is trained for semantic segmentation of the Oxford-IIIT Pet dataset and the DIC-C2DH-HeLa dataset for cell segmentation. The generalization metric to unseen scales is dramatically improved in comparison to the U-Net, even when the U-Net is trained with scale jittering, and to a scale-equivariant architecture that does not apply upsampling operators inside the equivariant pipeline. The scale-dropout induces better generalization of the scale-equivariant models in the Pet experiment, but not in the cell segmentation experiment.
1 Introduction
Convolutional Neural Networks (CNNs) are based on convolutional layers and achieve state-of-the-art performance in many image analysis tasks. A translation applied to the inputs of a CNN is equivalent to a translation applied to its feature maps, a property illustrated by Figure 1(a). This property is a particular case of group equivariance [2] and helps improve the generalization of the network to new data if the data has translation symmetry. An operator $\varphi : \mathcal{X} \to \mathcal{Y}$ is equivariant w.r.t. a group if applying a group action to the input and then $\varphi$ amounts to applying a group action to the output of $\varphi$ given the original inputs. This is illustrated in Figure 1. In addition to translations, group actions can model many interesting classes of spatial transformations such as rotations, scalings, and affine transformations. Group equivariant CNNs [2] are a generalization of CNNs that are equivariant to some transformation group. Many approaches focus on equivariance to rotations, in different kinds of data [2, 18, 17, 14], and to scalings [20, 3, 7].
Deep scale-spaces [19] introduce neural networks equivariant to the action of semigroups, instead of groups. Semigroup
actions are considered as they can model non-invertible transformations, and the authors focus on equivariance to
downsampling in discrete domains as a way to address equivariance to scalings without creating spurious information
through interpolation. This seminal work laid the basis to define scale-equivariant CNNs, although it only focused on
convolutional layers and did not address the equivariance of pooling and upsampling layers, which are key elements in
many neural architectures, such as U-Net.
The U-Net [10] has become famous for its great performance in semantic segmentation. It is a fully convolutional neural network, i.e. a CNN without any dense layer, and therefore it is equivariant to a certain subgroup of translations. However, architectures like the U-Net are not scale-equivariant a priori, and experiments show they are not in practice [11], as illustrated by Figure 2.
[Figure 1 diagrams: (a) Translation Equivariance, (b) Scale Equivariance.]
Figure 1: Example of equivariance in the cases of translation and scaling. In this case, $\varphi$ is an ideal operator that computes the semantic segmentation of images. The operators $T_v$ and $R_s$ are, respectively, a translation and a re-scaling.
[Figure 2 images: (a) Training scale, (b) Unseen scale.]
Figure 2: Example where a U-Net trained on one scale is applied to predict an output at the training (a) and an unseen (b) scale. The image at the unseen scale represents the same object, but the U-Net no longer segments it correctly.
A scale-equivariant counterpart of such an architecture is desirable as scale symmetry is
frequently present in semantic segmentation data. For example, in urban scenes, objects of the same class appear at
different scales depending on their distances to the camera.
In this work we introduce the Scale-Equivariant U-Net (SEU-Net) based on semigroup cross-correlations [19] and an adapted use of pooling and upsampling. The rest of the paper is organized as follows. In Section 2 we discuss some of the related work in the literature. In Section 3 we review semigroup equivariant neural networks. The main contribution of this paper, the SEU-Net, is introduced in Section 4 along with its fundamental building blocks. The whole architecture is tested empirically for its equivariance in Section 5. More precisely, we test the SEU-Net (code available at https://github.com/mateussangalli/ScaleEquivariantUNet) in segmentation tasks where the test images are at scales unseen during training, on the Oxford-IIIT Pet [9] and the DIC-HeLa cell [15] datasets. The SEU-Net is shown to outperform the U-Net even when the latter is trained with large values of scale jittering. The paper ends in Section 6 with some conclusions and perspectives for future work.
2 Related Work
Scale-equivariance and scale-invariance are topics already discussed in the deep learning literature [20, 3, 5, 7, 13]. The experimental benchmarks found in those papers are interesting as a first way to measure equivariance, but tend to be based on very simple tasks, such as the classification of re-scaled digits from the MNIST dataset or low-resolution images of clothes from the Fashion-MNIST dataset. In [12], combinations of base filters are optimized to minimize the equivariance error of discrete scale convolutions. This is applied to classification, tracking and geometry estimation, but not segmentation.
In [19], instead of treating scaling as an invertible operation, as it would be in a continuous domain, it is considered as the action of downsampling the input image in a discrete domain. Accordingly, a semigroup-equivariant generalization of the convolution is introduced, with a focus on a semigroup of scalings and translations. These operators can be efficiently applied even on large images, since applying them at larger scales has the same computational cost. In [19] the semigroup-equivariant models were applied to classification and semantic segmentation of datasets of large images, achieving better results compared to matched non-equivariant architectures. Yet, the role of
scale-equivariance was not isolated, as the performance of the models was not measured for inputs at scales unseen in the training set. Later on, this approach was revisited by [11], where the Gaussian scale-space originally used was generalized to other scale-spaces and the models were tested in experiments where the networks are trained at one fixed scale and tested on unseen scales, albeit on synthetic or simple datasets. In all these approaches, the authors either avoided pooling and upsampling in their architectures, or used them but did not discuss their impact on scale equivariance.
While scale-equivariance has been a topic in the literature for some time, as far as we know a scale-equivariant U-Net has not yet been proposed, contrary to the rotation-equivariance case [1]. Moreover, the current benchmarks for scale-equivariance are either based on simple datasets like MNIST or do not explicitly measure equivariance in their segmentation or classification experiments, i.e. by training the networks at one fixed scale and testing on unseen scales. Here we propose semantic segmentation experiments based on natural data which measure the equivariance of the predictions.
3 Semigroup Equivariant Convolutional Networks
In this work and following [19], image scalings are restricted to image downscalings, which can be viewed as actions of a semigroup on images. As illustrated by Figure 1, we seek equivariance with respect to both downscalings and translations. Hence, the network layers are designed to be equivariant with respect to a semigroup combining both transformations.
3.1 Semigroup Equivariance
A semigroup, contrary to a group, can model non-invertible transformations, e.g. the downsampling operation in a discrete domain. In the following, $(G, \cdot)$ denotes a discrete semigroup.

Let $\mathcal{X}$ be a set. A family of mappings $(\varphi_g)_{g \in G}$ from $\mathcal{X}$ to itself is a semigroup action on $\mathcal{X}$ if it is homomorphic to the semigroup, that is, if either $\forall g, h \in G$, $\varphi_g \circ \varphi_h = \varphi_{g \cdot h}$ (left action), or $\forall g, h \in G$, $\varphi_g \circ \varphi_h = \varphi_{h \cdot g}$ (right action). In this paper we will consider in particular the following right action, acting on $\mathcal{F}$, the set of functions from $G$ to $\mathbb{R}^n$:
$$\forall u, g \in G,\ \forall f \in \mathcal{F}, \qquad R_u(f)(g) = f(u \cdot g). \qquad (1)$$
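For concreteness, the following minimal Python sketch (our illustration, not code from [19] or this paper) instantiates the right action (1) on the additive semigroup of natural numbers, where left and right actions coincide since the operation is commutative. Functions on the semigroup are represented as plain callables.

# Minimal sketch (our illustration) of the right action (1) on (N, +),
# a commutative semigroup where left and right actions coincide.
def R(u, f):
    # R_u(f)(g) = f(u . g), with the semigroup operator . being addition here
    return lambda g: f(u + g)

f = lambda g: float(g ** 2)
u, v, g = 2, 3, 5
# Homomorphism property of a right action: R_u o R_v = R_{v.u}
assert R(u, R(v, f))(g) == R(v + u, f)(g)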
Given two sets $\mathcal{X}$ and $\mathcal{Y}$, a mapping $H : \mathcal{X} \to \mathcal{Y}$ is said to be equivariant with respect to $G$ if there are semigroup actions $(\varphi_g)_{g \in G}$ and $(\psi_g)_{g \in G}$ on $\mathcal{X}$ and $\mathcal{Y}$ respectively, such that $\forall g \in G,\ H \circ \varphi_g = \psi_g \circ H$. This definition gets more intuitive when $\mathcal{X} = \mathcal{Y}$ is a set of images and $(\varphi_g)_{g \in G} = (\psi_g)_{g \in G}$ are scalings or translations, as illustrated in Figure 1.
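As a toy check of the definition (our illustration, with hypothetical names, not an operator from the paper), a pointwise map $H$ commutes with any action that only moves function arguments around. Here functions on a finite cyclic domain are stored as arrays, the action is a circular shift, and $H$ is a pointwise ReLU.

import numpy as np

# Toy equivariance check: X = Y = functions on a cyclic domain stored as
# arrays, phi_g = psi_g = circular shift by g, H = pointwise ReLU.
H = lambda f: np.maximum(f, 0.0)
phi = lambda g, f: np.roll(f, g)

f = np.array([-1.0, 2.0, -3.0, 4.0])
g = 2
# Equivariance of H: H(phi_g(f)) == psi_g(H(f))
assert np.allclose(H(phi(g, f)), phi(g, H(f)))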
3.2 Scale-cross-correlation
For $\gamma > 1$ an integer, let $S_\gamma = \{\gamma^n \mid n \in \mathbb{N}\}$, endowed with the multiplication, be the semigroup representing discrete scalings of base $\gamma$. Then we consider the semigroup $G = S_\gamma \times \mathbb{Z}^2$ of discrete scalings and translations, endowed with the internal operator $\cdot$, defined by
$$\forall k, l \in \mathbb{N},\ \forall z, y \in \mathbb{Z}^2, \qquad (\gamma^k, z) \cdot (\gamma^l, y) = (\gamma^{k+l}, \gamma^k y + z). \qquad (2)$$
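The sketch below (our illustration; encoding the element $(\gamma^k, z)$ as the pair (k, z) is our convention, not the paper's) implements the operator (2) and checks its associativity on sample elements, as required of a semigroup.

GAMMA = 2

def compose(a, b, gamma=GAMMA):
    # Internal operator (2) on G = S_gamma x Z^2, elements encoded as (k, z).
    (k, z), (l, y) = a, b
    return (k + l, (gamma**k * y[0] + z[0], gamma**k * y[1] + z[1]))

# Sanity check: (a.b).c == a.(b.c)
a, b, c = (1, (3, -2)), (2, (0, 5)), (0, (7, 1))
assert compose(compose(a, b), c) == compose(a, compose(b, c))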
Following (1), the action of this semigroup on functions mapping $S_\gamma \times \mathbb{Z}^2$ to $\mathbb{R}$ is $R_{(\gamma^k, z)}[f](\gamma^l, y) = f(\gamma^{k+l}, \gamma^k y + z)$.
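Assuming $f$ is stored on finitely many scale levels and a finite spatial window, and is taken to be zero outside that support, a direct Python sketch of this action could read as follows (a sketch of ours with hypothetical names, not the paper's implementation).

import numpy as np

def semigroup_action(f, k, z, gamma=2):
    # R_{(gamma^k, z)}[f](gamma^l, y) = f(gamma^(k+l), gamma^k * y + z),
    # with f stored as an array f[l, y1, y2] and zero outside its support.
    L, H, W = f.shape
    out = np.zeros_like(f)
    s = gamma ** k
    for l in range(L - k):  # output level l reads input level k+l; missing levels stay zero
        for y1 in range(H):
            for y2 in range(W):
                p1, p2 = s * y1 + z[0], s * y2 + z[1]
                if 0 <= p1 < H and 0 <= p2 < W:
                    out[l, y1, y2] = f[k + l, p1, p2]
    return out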
In analogy to convolutions, which are linear and equivariant to translations, a key step in equivariant CNNs is defining linear operators which are equivariant to a given class of transformations. The semigroup cross-correlation, defined for an image $f : G \to \mathbb{R}$ and a filter $h : G \to \mathbb{R}$, is a generalization of the convolution which is linear and equivariant to the action $R_g$ of a semigroup. When applied to the semigroup of scales and translations, we obtain the scale-cross-correlation. Both were introduced in [19]. The scale-cross-correlation is written (the equations in the case of a general semigroup can be found in Appendix A)
$$(f \star_G h)(\gamma^k, z) = \sum_{(\gamma^l, y) \in G} R_{(\gamma^k, z)}[f](\gamma^l, y)\, h(\gamma^l, y) = \sum_{l \geq 0} \sum_{y \in \mathbb{Z}^2} f(\gamma^{k+l}, \gamma^k y + z)\, h(\gamma^l, y). \qquad (3)$$
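A direct, unoptimized Python sketch of (3) under the same finite-support convention as above might look as follows; restricting the filter support to a non-negative spatial window is a simplification of ours, not a choice made in the paper.

import numpy as np

def scale_cross_correlation(f, h, gamma=2):
    # Eq. (3) for single-channel signals f[l, y1, y2] and filters h[l, y1, y2]
    # with finite support; values outside the arrays are taken to be zero.
    L, H, W = f.shape
    Lh, Hh, Wh = h.shape
    out = np.zeros_like(f)
    for k in range(L):
        s = gamma ** k
        for z1 in range(H):
            for z2 in range(W):
                acc = 0.0
                for l in range(min(Lh, L - k)):  # truncate scale levels not stored in f
                    for y1 in range(Hh):
                        for y2 in range(Wh):
                            p1, p2 = s * y1 + z1, s * y2 + z2
                            if p1 < H and p2 < W:
                                acc += f[k + l, p1, p2] * h[l, y1, y2]
                out[k, z1, z2] = acc
    return out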
This operator is suited for single-channel images on $G$, but it can be easily extended to multichannel images. Let the input $f = (f_1, \ldots, f_n) \in (\mathbb{R}^n)^G$ be a signal with $n$ channels. Assuming the output has $m$ channels, the filter is of the form $h : G \to \mathbb{R}^{n \times m}$. We compute the operator $f \star_G h$ at channel $o \in \{1, \ldots, m\}$ as $(f \star_G h)_o := \sum_{c=1}^{n} (f_c \star_G h_{c,o})$.
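Continuing the sketch above (again our illustration, reusing the scale_cross_correlation function), the multichannel case simply sums single-channel scale-cross-correlations over the input channels:

import numpy as np

def scale_cross_correlation_mc(f, h, gamma=2):
    # f has shape (n, L, H, W); the filter h has shape (n, m, Lh, Hh, Wh).
    # Output channel o is the sum over input channels c of f_c *_G h_{c,o}.
    n, m = f.shape[0], h.shape[1]
    return np.stack([
        sum(scale_cross_correlation(f[c], h[c, o], gamma) for c in range(n))
        for o in range(m)
    ])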
The resulting map is equivariant to scalings and translations: $(R_g f \star_G h)_o = R_g((f \star_G h)_o)$. Note that the composition of operators which commute with $R_g$ still commutes with $R_g$, which is why stacking scale-cross-correlation layers followed by pointwise activation functions and batch normalization yields equivariant architectures.
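As a numerical sanity check of this equivariance on the sketches above (our illustration; with the zero-outside convention, both sides truncate missing scale levels and borders identically, so the equality holds exactly for non-negative translations):

import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((4, 16, 16))
h = rng.standard_normal((2, 3, 3))
k, z = 1, (2, 3)  # g = (gamma^1, (2, 3))
lhs = scale_cross_correlation(semigroup_action(f, k, z), h)
rhs = semigroup_action(scale_cross_correlation(f, h), k, z)
assert np.allclose(lhs, rhs)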