SphNet: A Spherical Network for Semantic Pointcloud Segmentation
Lukas Bernreiter, Lionel Ott, Roland Siegwart and Cesar Cadena
Abstract—Semantic segmentation for robotic systems can
enable a wide range of applications, from self-driving cars
and augmented reality systems to domestic robots. We argue
that a spherical representation is a natural one for egocentric
pointclouds. Thus, in this work, we present a novel framework
exploiting such a representation of LiDAR pointclouds for the
task of semantic segmentation. Our approach is based on a
spherical convolutional neural network that can seamlessly
handle observations from various sensor systems (e.g., different
LiDAR systems) and provides an accurate segmentation of the
environment. We operate in two distinct stages: First, we encode
the projected input pointclouds to spherical features. Second,
we decode and back-project the spherical features to achieve an
accurate semantic segmentation of the pointcloud. We evaluate
our method with respect to state-of-the-art projection-based
semantic segmentation approaches using well-known public
datasets. We demonstrate that the spherical representation
enables us to provide more accurate segmentation and to generalize better to sensors with a different field-of-view and number of beams than those seen during training.
I. INTRODUCTION
Over the past years, there has been a growing demand in
robotics and self-driving cars for reliable semantic segmen-
tation of the environment, i.e., associating a class or label
with each measurement sample for a given input modality.
A semantic understanding of the surroundings is a critical
aspect of robot autonomy. It has the potential to, e.g., enable
a comprehensive description of the navigational risks or
disambiguate challenging situations in planning or mapping.
For many of the currently employed robotic systems, the long-term stability of maps is a persistent issue due to the often limited metrical understanding of the environment, for which high-level semantic information is a possible solution.
With the advances in deep learning, vision-based semantic segmentation has become a very mature field. While there has also been significant progress on LiDAR-based semantic segmentation, it is still not as developed as its vision-based counterpart.
Nevertheless, LiDAR-based approaches have a crucial advantage over other modalities: they are unaffected by the illumination conditions of the environment. This is in contrast to cameras, which provide rich descriptive information but are heavily affected by poor lighting conditions. Consequently, LiDAR-based systems effectively provide more resilient segmentation for a variety of challenging scenarios, such as operating at night and under dynamically changing lighting conditions.

This work was supported as a part of NCCR Robotics, a National Centre of Competence in Research, funded by the Swiss National Science Foundation (grant number 51NF40 185543).

All authors are with the Autonomous Systems Lab, ETH Zurich, Zurich 8092, Switzerland, {berlukas, lioott, rsiegwart, cesarc}@ethz.ch.

Fig. 1. We propose a spherical semantic segmentation framework that can handle pointclouds from various LiDAR sensors with different vertical field-of-view and angular resolutions. [Figure: three example scans, labeled LiDAR 1, LiDAR 2, and LiDAR 3, projected onto the sphere.]
Many existing approaches operate using projection models, which typically transform the irregular pointcloud data into an ordered 2D domain, allowing them to utilize the extensive research available for images. The downside is that this requires a predefined configuration based on the number of beams, angular resolution, and vertical Field-of-View (FoV). LiDAR systems differ in these properties, which means that changing the sensor after training might yield geometrically and structurally sparse projections. Consequently, the resulting projection is often insufficient to accurately express the complexity of arbitrary environments. Accordingly, utilizing these approaches in generic environments with an arbitrary sensor system is often impossible without fine-tuning the initial network on data from the new sensor.
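To make this sensor dependence concrete, the following is a minimal sketch of a conventional 2D range-image projection in the spirit of [7]–[10]; the beam count, horizontal resolution, and FoV bounds are hypothetical values for a 64-beam sensor and are not taken from any specific method discussed here.

```python
import numpy as np

# Hypothetical sensor configuration: 64 beams, 2048 horizontal steps, and a
# vertical FoV of [-24.8 deg, +2.0 deg] (roughly a 64-beam spinning LiDAR).
# These constants are baked into the projection and must be changed whenever
# the sensor changes, which is exactly the dependence discussed above.
H, W = 64, 2048
FOV_UP, FOV_DOWN = np.deg2rad(2.0), np.deg2rad(-24.8)
FOV = FOV_UP - FOV_DOWN

def project_to_range_image(points: np.ndarray) -> np.ndarray:
    """Project an (N, 3) pointcloud into an (H, W) range image."""
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])   # azimuth in [-pi, pi]
    pitch = np.arcsin(points[:, 2] / depth)        # elevation angle
    u = 0.5 * (1.0 - yaw / np.pi) * W              # column from azimuth
    v = (1.0 - (pitch - FOV_DOWN) / FOV) * H       # row from elevation
    u = np.clip(np.floor(u), 0, W - 1).astype(int)
    v = np.clip(np.floor(v), 0, H - 1).astype(int)
    image = np.full((H, W), -1.0)                  # -1 marks empty pixels
    image[v, u] = depth                            # later points overwrite earlier ones
    return image
```

Note that points outside the assumed vertical FoV are clamped to the border rows of the image, illustrating why such a projection transfers poorly to a sensor with different properties.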
LiDAR sensors are known to yield accurate geometric and structural cues. Thus, modern LiDAR sensors often provide large-FoV measurements to precisely capture the robot's surroundings. However, projecting such large-FoV scans onto the 2D domain introduces distortions that deform the physical dimensions of the environment. Thus, dealing with different FoVs, sensor frequencies, and scales remains an open research problem for which the input representation constitutes a major factor. In recent years, LiDAR sensors have become more affordable and abundant in the context of robotics. However, many state-of-the-art segmentation methods are limited to a particular sensor and cannot benefit from multiple LiDARs due to their
representation of the input. For example, in systems employing multiple LiDARs pointing in different directions, the scans are typically processed sequentially or by multiple instances of the same network, one per sensor. However, jointly predicting the segmentation from all available sensors can exploit crucial insights and structural dependencies in overlapping areas.
In this work, we propose a framework that takes LiDAR scans as input (cf. Figure 1), projects them onto a sphere, and utilizes a spherical Convolutional Neural Network (CNN) for the task of semantic segmentation. The projection of the LiDAR scans onto the sphere does not introduce any distortions and is independent of the utilized LiDAR, thus yielding an agnostic representation for various LiDAR systems with different vertical FoV. We adapt the structure of
common 2D encoder and decoder networks and support
simultaneous training on different datasets obtained with
varying LiDAR sensors and parameters without having to
adapt our configuration. Moreover, since our approach is
invariant to rotations due to the spherical representation, we
support arbitrarily rotated input pointclouds. In summary, the
key contributions of this paper are as follows:
• A spherical end-to-end pipeline for semantic segmentation supporting various input configurations.
• A spherical encoder-decoder structure including a spectral pooling and unpooling operation for SO(3) signals (sketched below).
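To illustrate the second contribution, the following is a minimal sketch of spectral pooling and unpooling; the list-of-matrices layout for the SO(3) Fourier coefficients and the function names are our assumptions for illustration and do not reflect the paper's exact implementation or any particular library's API.

```python
import torch

# A minimal sketch of spectral pooling and unpooling for SO(3) signals. We
# assume (for illustration only) that the Fourier coefficients of a signal
# with bandwidth B are stored as a list of B complex matrices, where entry
# l has shape (2l + 1, 2l + 1); spherical CNN libraries use their own
# packed layouts.

def spectral_pool(coeffs: list[torch.Tensor], bandwidth: int) -> list[torch.Tensor]:
    """Low-pass pooling: keep only the coefficients of degree l < bandwidth."""
    return coeffs[:bandwidth]

def spectral_unpool(coeffs: list[torch.Tensor], bandwidth: int) -> list[torch.Tensor]:
    """Unpooling: zero-pad the spectrum up to the target bandwidth."""
    padded = list(coeffs)
    for l in range(len(coeffs), bandwidth):
        padded.append(torch.zeros(2 * l + 1, 2 * l + 1, dtype=torch.cfloat))
    return padded
```

Truncating the spectrum acts as an alias-free low-pass downsampling of the signal, while zero-padding in the decoder restores the original bandwidth without inventing high-frequency content.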
II. RELATED WORK
Methods using a LiDAR have to deal with the inherent sparsity and irregularity of the data, in contrast to vision-based approaches. Moreover, LiDAR-based methods have various choices for representing the input data [1], including directly using the pointcloud [2], [3], voxel-based [4]–[6], or projection-based [7]–[10] representations. Which input representation yields the best performance for a specific task, however, remains an open research question.
Direct methods such as PointNet [2], [3] operate on the raw unordered pointcloud and extract local contextual features using point convolutions [11]. Voxel-based approaches [4], [5], [12] retain the full geometric structure of the environment and can readily accumulate multiple scans, either chronologically or from different sensors. SpSequenceNet [13] explicitly uses 4D pointclouds and considers the temporal information between consecutive scans. However, the computational complexity of voxel-based approaches is high due to their high-dimensional convolutions, and their accuracy and performance are directly linked to the chosen voxel size, which has motivated works that organize pointclouds into octrees, k-d trees, etc. [14] for efficiency. Furthermore, instead of using a Cartesian grid, PolarNet [15] discretizes the space using a polar grid and shows superior quality.
A different direction of research is offered by graph-based approaches [16], which can seamlessly model the irregular structure of pointclouds, although open questions in graph construction and network design remain to be addressed.
Projection-based methods transform the pointcloud into a specific domain; the majority of them [7]–[10] rely on 2D images. Such projections are appealing as they enable leveraging all the research in image-based deep learning, but they generally need to rely on the limited amount of labeled pointcloud data. Hence, the work of Wu et al. [17] tackles the deficiency in labeled pointcloud data by using domain adaptation between synthetic and real-world data.
The downsides of the projection onto the 2D domain are: i) the lack of a detailed geometric understanding of the environment, and ii) the significant distortions introduced by the large FoV of LiDARs, which decrease the accuracy of these methods. Hence, recent approaches have explored combining several representations [6], [18] and convolutions [19]. Recent works [20], [21] additionally learn and extract features from a Bird's Eye View projection that would otherwise be difficult to retain with a 2D projection.
In contrast to 2D image projections, projecting onto the sphere is a more suitable representation for such large-FoV sensors. Recently, spherical CNNs [22]–[24] have shown great potential for, e.g., omnidirectional images [25], [26] and cortical surfaces [27], [28].
Moreover, Lohit et al. [25] propose an encoder-decoder spherical network design that is rotation-invariant by performing a global average pooling of the encoded feature map. However, their work discards the rotation information of the input signals and thus needs a special loss that includes a spherical correlation to find the unknown rotation w.r.t. the ground-truth labels.
Considering the findings above, we propose a composition of spherical CNNs, based on the work of Cohen et al. [23], that semantically segments pointclouds from various LiDAR sensor configurations.
III. SPHERICAL SEMANTIC SEGMENTATION
This section describes the core modules of our spherical
semantic segmentation framework, which mainly operates in
three stages: i) feature projection, ii) semantic segmentation,
and iii) back-projection (cf. Figure 2).
Initially, we discuss the projection of LiDAR pointclouds onto the unit sphere and the feature representation that serves as input to the spherical CNN. Next, we describe the details of our network design and architecture used to learn a semantic segmentation of LiDAR scans.
A. Sensor Projection and Feature Representation
Initially, the input to our spherical segmentation network is a signal defined on the sphere $S^2 = \{ p \in \mathbb{R}^3 \mid \lVert p \rVert_2 = 1 \}$, with the parametrization proposed by Healy et al. [29], i.e.,

$$\omega(\varphi, \theta) = \left[ \cos\varphi \sin\theta, \; \sin\varphi \sin\theta, \; \cos\theta \right]^\top, \tag{1}$$

where $\omega \in S^2$, and $\varphi \in [0, 2\pi]$ and $\theta \in [0, \pi]$ are the azimuthal and polar angles, respectively.
We then operate in an end-to-end fashion by transforming the input modality (i.e., the pointcloud scan) into a spherical signal.
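As an illustration of this first stage, the following is a minimal sketch that samples a pointcloud onto an equiangular spherical grid via Eq. (1); the bandwidth B and the use of range as the only stored feature are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def pointcloud_to_sphere(points: np.ndarray, B: int = 64) -> np.ndarray:
    """Sample an (N, 3) pointcloud onto a (2B, 2B) spherical range signal."""
    r = np.linalg.norm(points, axis=1)
    phi = np.mod(np.arctan2(points[:, 1], points[:, 0]), 2 * np.pi)  # azimuth in [0, 2pi)
    theta = np.arccos(np.clip(points[:, 2] / r, -1.0, 1.0))          # polar angle in [0, pi]
    # Equiangular 2B x 2B sampling of (phi, theta), following the grid of
    # Healy et al. [29].
    i = np.minimum((phi / (2 * np.pi) * 2 * B).astype(int), 2 * B - 1)
    j = np.minimum((theta / np.pi * 2 * B).astype(int), 2 * B - 1)
    signal = np.zeros((2 * B, 2 * B))
    signal[j, i] = r  # store the range of each return as the spherical feature
    return signal
```

In contrast to the 2D range-image projection sketched in Section I, this grid depends only on the chosen bandwidth B and not on any property of the sensor.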