representation of the input. For example, in systems employing multiple LiDARs pointing in different directions, the scans are typically processed sequentially or by separate instances of the same network, one per sensor. However, jointly predicting the segmentation from all available sensors allows exploiting structural dependencies in overlapping areas and thus richer contextual information.
In this work, we propose a framework that takes LiDAR scans as input (cf. Figure 1), projects them onto a sphere, and utilizes a spherical Convolutional Neural Network (CNN) for the task of semantic segmentation. The projection of the LiDAR scans onto the sphere does not introduce any distortions and is independent of the utilized LiDAR, thus yielding an agnostic representation for various LiDAR systems with different vertical FoV. We adapt the structure of
common 2D encoder and decoder networks and support
simultaneous training on different datasets obtained with
varying LiDAR sensors and parameters without having to
adapt our configuration. Moreover, since our approach is
invariant to rotations due to the spherical representation, we
support arbitrarily rotated input pointclouds. In summary, the
key contributions of this paper are as follows:
• A spherical end-to-end pipeline for semantic segmentation supporting various input configurations.
• A spherical encoder-decoder structure including a spectral pooling and unpooling operation for SO(3) signals (see the sketch below).
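As a rough illustration of the spectral pooling and unpooling named in the second contribution: for an SO(3) signal, the Fourier coefficient at degree l is a (2l+1)x(2l+1) matrix per channel, so pooling can be realized by truncating the spectrum to a lower bandwidth and unpooling by zero-padding it back. The minimal sketch below assumes this per-degree list layout and uses hypothetical function names; it is not the paper's actual implementation.

```python
import numpy as np

def so3_spectral_pool(coeffs, out_bandwidth):
    """Low-pass an SO(3) signal in the spectral domain.

    `coeffs` holds one Fourier block per degree l = 0..B-1, each of
    shape (channels, 2l + 1, 2l + 1).  Pooling keeps only the degrees
    below `out_bandwidth`, i.e. it discards high frequencies instead
    of subsampling the signal on the spatial grid.
    """
    return coeffs[:out_bandwidth]

def so3_spectral_unpool(coeffs, out_bandwidth):
    """Inverse operation: zero-pad the spectrum up to `out_bandwidth`."""
    channels = coeffs[0].shape[0]
    return list(coeffs) + [
        np.zeros((channels, 2 * l + 1, 2 * l + 1), dtype=coeffs[0].dtype)
        for l in range(len(coeffs), out_bandwidth)
    ]
```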
II. RELATED WORK
Methods using a LiDAR have to deal with the inherent sparsity and irregularity of the data, in contrast to vision-based approaches. Moreover, LiDAR-based methods can choose among various input representations [1], including directly using the pointcloud [2], [3], voxel-based [4]–[6], or projection-based [7]–[10] representations. Which input representation yields the best performance for a specific task, however, remains an open research question.
Direct methods such as PointNet [2], [3] operate on the raw
unordered pointcloud and extract local contextual features
using point convolutions [11]. Voxel-based approaches [4], [5], [12] retain the full geometric structure of the environment and can readily accumulate multiple scans, either chronologically or from different sensors. SpSequenceNet [13] explicitly uses 4D pointclouds and considers the temporal information between consecutive scans. However, the computational complexity of voxel-based approaches is high due to their high-dimensional convolutions, and their accuracy and performance are directly linked to the chosen voxel size, which has motivated works that organize the pointclouds into efficient structures such as octrees or k-d trees [14]. Furthermore, instead of a Cartesian grid, PolarNet [15] discretizes the space using a polar grid and shows superior segmentation quality.
A different direction of research is offered by graph-based approaches [16], which can seamlessly model the irregular structure of pointclouds, although open questions regarding graph construction and network design remain to be addressed.
Projection-based methods transform the pointcloud into a different domain, most commonly 2D images, on which the majority of these methods [7]–[10] rely. Such projections are appealing as they enable leveraging the large body of research on image-based deep learning, but they generally still depend on the limited amount of labeled pointcloud data. Hence, the work of Wu et al. [17] tackles this deficiency by using domain adaptation between synthetic and real-world data.
The downsides of the projection onto the 2D domain
are: i) the lack of a detailed geometric understanding of
the environment and ii) the large FoV of LiDARs, which
produces significant distortions, decreasing the accuracy of
these methods. Hence, recent approaches have explored
using a combination of several representations [6], [18] and
convolutions [19]. Recent works [20], [21] additionally learn
and extract features from a Bird’s Eye View projection that
would otherwise be difficult to retain with a 2D projection.
In contrast to 2D image projections, projecting onto the sphere is a more suitable representation for such large FoV sensors. Recently, spherical CNNs [22]–[24] have shown great potential for, e.g., omnidirectional images [25], [26] and cortical surfaces [27], [28].
Moreover, Lohit et al. [25] propose a spherical encoder-decoder network design that achieves rotation invariance by performing a global average pooling of the encoded feature map. However, their work discards the rotation information of the input signals and thus needs a special loss that includes a spherical correlation to find the unknown rotation w.r.t. the ground-truth labels.
Considering the findings above, we propose a composition of spherical CNNs, based on the work of Cohen et al. [23], that semantically segments pointclouds from various LiDAR sensor configurations.
III. SPHERICAL SEMANTIC SEGMENTATION
This section describes the core modules of our spherical
semantic segmentation framework, which mainly operates in
three stages: i) feature projection, ii) semantic segmentation,
and iii) back-projection (cf. Figure 2).
Initially, we discuss the projection of LiDAR pointclouds
onto the unit sphere and the feature representation that serves
as input to the spherical CNN. Next, we describe the details of
our network design and architecture used to learn a semantic
segmentation of LiDAR scans.
A. Sensor Projection and Feature Representation
Initially, the input to our spherical segmentation network is a signal defined on the sphere $S^2 = \{ p \in \mathbb{R}^3 \mid \lVert p \rVert_2 = 1 \}$, with the parametrization proposed by Healy et al. [29], i.e.,
\[
\omega(\varphi, \theta) = [\cos\varphi \sin\theta, \ \sin\varphi \sin\theta, \ \cos\theta]^\top, \quad (1)
\]
where $\omega \in S^2$, and $\varphi \in [0, 2\pi]$ and $\theta \in [0, \pi]$ are the azimuthal and polar angle, respectively.
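As a concrete illustration, the following minimal sketch inverts Eq. (1) to map Cartesian LiDAR returns onto an equiangular 2B x 2B spherical grid. The grid resolution, the per-cell range/occupancy features, and the function name are illustrative assumptions, not necessarily the exact configuration used in this work.

```python
import numpy as np

def project_to_sphere(points, bandwidth=64):
    """Rasterize an (N, 3) LiDAR scan onto an equiangular spherical grid.

    Returns a (2, 2B, 2B) feature map holding the range of the closest
    return per cell and a binary occupancy mask, plus the grid indices
    of every input point for a later back-projection of the labels.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)

    # Invert Eq. (1): polar angle from z, azimuth from x and y.
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # [0, pi]
    phi = np.mod(np.arctan2(y, x), 2.0 * np.pi)                     # [0, 2*pi)

    # Equiangular grid with 2B samples along each angular axis.
    n = 2 * bandwidth
    i = np.clip((theta / np.pi * n).astype(int), 0, n - 1)          # rows
    j = np.clip((phi / (2.0 * np.pi) * n).astype(int), 0, n - 1)    # cols

    features = np.zeros((2, n, n), dtype=np.float32)
    order = np.argsort(-r)               # write farthest first, closest wins
    features[0, i[order], j[order]] = r[order]
    features[1, i, j] = 1.0              # occupancy mask
    return features, (i, j)
```

Per-point labels can then be recovered by indexing the network's per-cell predictions with the returned (i, j) grid indices.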
We then operate in an end-to-end fashion by transforming
the input modality (i.e., the pointcloud scan) into a spherical