DeepRING: Learning Roto-translation Invariant Representation for
LiDAR based Place Recognition
Sha Lu1, Xuecheng Xu1, Li Tang2, Rong Xiong1and Yue Wang1†
Abstract— LiDAR based place recognition is popular for loop closure detection and re-localization. In recent years, deep learning has brought improvements to place recognition through learnable feature extraction. However, these methods degenerate when the robot revisits previous places with a large perspective difference. To address this challenge, we propose DeepRING to learn a roto-translation invariant representation from the LiDAR scan, so that scans of the same place taken from different perspectives have similar representations. There are two keys in DeepRING: the feature is extracted from the sinogram, and the feature is aggregated by the magnitude spectrum. These two steps endow the final representation with both discrimination and roto-translation invariance. Moreover, we formulate place recognition as a one-shot learning problem with each place being a class, leveraging relation learning to build representation similarity. Extensive experiments are carried out on public datasets, validating the effectiveness of each proposed component and showing that DeepRING outperforms the comparative methods, especially in dataset-level generalization.
I. INTRODUCTION
Place recognition plays a significant role in autonomous driving applications. It retrieves the closest place from the robot's past trajectory for loop closure detection in SLAM systems, reducing the accumulated error. Because of their robustness to ever-changing environmental conditions, LiDAR sensors have been widely used for place recognition in recent years. As the field of view of LiDAR is wide, LiDAR scans can have significant overlap when a robot revisits a previous place from a different perspective. However, place recognition remains challenging when a large perspective difference exists between the current scan and the scan taken at the previous place.
A popular way to deal with this challenge first appeared in handcrafted methods. By explicitly designing a rotation invariant representation, scans taken by a robot spinning in place remain the same. Therefore, the perspective difference when revisiting the same place can be suppressed, and the place can be determined by finding the previous scan most similar to the current one. Several methods have been proposed to achieve this property, including the histogram, polar gram (PG), and principal component analysis [1,2,3,4,5,6].
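For intuition, below is a minimal example of such a rotation invariant descriptor (our illustration, not code from the cited works): a histogram of point ranges, which is unchanged by any yaw rotation of the scan because rotation preserves every point's range.

```python
# A handcrafted rotation invariant descriptor: histogram of point ranges.
# Rotating the scan permutes the points but leaves every range unchanged,
# so the descriptor is identical for all yaw rotations of the same scan.
import numpy as np

def range_histogram(points: np.ndarray, bins: int = 40,
                    max_range: float = 80.0) -> np.ndarray:
    """points: (N, 3) LiDAR points in the sensor frame."""
    ranges = np.linalg.norm(points[:, :2], axis=1)  # planar range per point
    hist, _ = np.histogram(ranges, bins=bins, range=(0.0, max_range))
    return hist / max(len(points), 1)               # normalize by point count

# A yaw rotation leaves the descriptor exactly unchanged:
pts = np.random.rand(1000, 3) * 40.0
yaw = np.pi / 3
R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
              [np.sin(yaw),  np.cos(yaw), 0.0],
              [0.0,          0.0,         1.0]])
assert np.allclose(range_histogram(pts), range_histogram(pts @ R.T))
```

Such occupancy- or range-based statistics are simple, which is exactly the discrimination limitation discussed below.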
More recently, translation invariance has also been derived for the scan representation to address large perspective differences [7]. However, the handcrafted feature extraction limits the discrimination of this line of work, most of which employs simple features, e.g., occupancy.
*This work was not supported by any organization.
1State Key Laboratory of Industrial Control Technology, and the Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310058, China.
2Alibaba Group, Hangzhou, 310052, China.
†Corresponding author wangyue@iipc.zju.edu.cn.
To improve the features, deep networks are employed to learn features from data [8,9]. However, these networks do not inherently yield invariant representations, thus calling for data augmentation with artificially added rotations and translations to improve the robustness of the scan representation against perspective change. Motivated by the representation design in handcrafted methods, in [10,11,12,13], the histogram, PG, and range image are inserted into neural networks to explicitly build rotation invariant representations. Together with deep learning, the discrimination can also be improved. Nevertheless, the rotation invariance is lost when there is an obvious translation difference between the scans. It remains unclear how to keep deep features invariant when both rotation and translation differences are present.
To address this problem, we propose a neural network architecture, named DeepRING, to learn a roto-translation invariant representation for a LiDAR scan by endowing deep feature extraction with the RING architecture [7]. As shown in Fig. 1, by formulating the LiDAR scan as a sinogram (SG), taking it as input to a convolution network, and calculating the magnitude spectrum of the output, the resultant representation is roto-translation invariant and can be learned from data for better discrimination in an end-to-end manner, bridging the advantages of the two lines of work.
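To make these steps concrete, below is a minimal sketch of the pipeline (our illustration under stated assumptions, not the authors' released code). We assume a bird's-eye-view (BEV) occupancy grid as input; the convolutions act only along the offset axis of the sinogram, so the shifts induced by rotation and translation survive feature extraction, and the magnitude spectra then discard those shifts. All names and sizes (scan_to_sinogram, DeepRINGSketch, num_angles) are illustrative.

```python
# Sketch: BEV occupancy grid -> sinogram (Radon transform) -> shift-
# equivariant conv features -> magnitude spectra as a roto-translation
# invariant descriptor. Not the authors' code; sizes are illustrative.
import numpy as np
import torch
import torch.nn as nn
from skimage.transform import radon

def scan_to_sinogram(bev: np.ndarray, num_angles: int = 120) -> torch.Tensor:
    """Radon transform of a BEV occupancy grid.

    A rotation of the scan circularly shifts the sinogram along the
    angle axis; a translation shifts each angle column along the offset
    axis. Returns a (1, num_offsets, num_angles) tensor.
    """
    theta = np.linspace(0.0, 180.0, num_angles, endpoint=False)
    sg = radon(bev, theta=theta, circle=False)  # (num_offsets, num_angles)
    return torch.from_numpy(sg).float().unsqueeze(0)

class DeepRINGSketch(nn.Module):
    """Shift-equivariant features plus magnitude-spectrum aggregation."""

    def __init__(self, channels: int = 16):
        super().__init__()
        # Kernels of size (3, 1) act only along the offset axis, so every
        # angle column is processed identically: the circular shift from
        # rotation and the per-column shifts from translation pass through
        # feature extraction as shifts (boundary effects aside).
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)),
        )

    def forward(self, sinogram: torch.Tensor) -> torch.Tensor:
        feat = self.net(sinogram.unsqueeze(0))     # (1, C, R, A)
        # |FFT| along the offset axis removes the translation-induced
        # shifts; |FFT| along the angle axis then removes the circular
        # shift induced by rotation.
        spec = torch.fft.fft(feat, dim=2).abs()
        return torch.fft.fft(spec, dim=3).abs()    # (1, C, R, A), real

# descriptor = DeepRINGSketch()(scan_to_sinogram(bev))
```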
Furthermore, we formulate place recognition as a one-shot learning problem. Specifically, each place is regarded as a class, with the scan taken at that place as a shot, building a multi-class support set, while the current scan and its place form the query set. This formulation leverages relation learning to replace the popular Siamese network and Euclidean triplet/quadruplet losses frequently used in place recognition.
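As a sketch of this formulation (a relation-network style head of our own devising; the authors' exact module may differ), a small MLP can score each query-support descriptor pair, supervised by a one-hot target that marks the true revisited place:

```python
# Sketch of relation learning for one-shot place recognition. The
# descriptors may come from any extractor (e.g. the DeepRINGSketch
# above); the MLP sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationHead(nn.Module):
    """Learned similarity between a query and each support descriptor."""

    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
        # query: (D,); support: (N, D), one descriptor per place (class).
        pairs = torch.cat([query.expand_as(support), support], dim=-1)
        return self.mlp(pairs).squeeze(-1)  # (N,) relation scores

# Training: each place is a class with a single shot, so the scores are
# supervised with a one-hot label over the support set, e.g.:
# scores = head(query_desc, support_descs)              # (N,)
# loss = F.binary_cross_entropy_with_logits(scores, one_hot_target)
```

Compared with a Siamese encoder scored by a fixed Euclidean metric, the relation head learns the similarity measure itself, which is the replacement described above.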
The experimental results validate the effectiveness of the proposed modules, showing that DeepRING outperforms the comparative methods, especially in dataset-level generalization.
In summary, the main contributions of our method are:
• An end-to-end learning framework, DeepRING, that endows scan feature extraction with the roto-translation invariance property, tackling the problem of large perspective difference.
• A formulation of place recognition as one-shot learning, leveraging relation learning to build a better similarity measure; an efficient implementation saves computation.
• Validation of the proposed method on two large-scale benchmark datasets, showing the superior performance of DeepRING, especially for generalization, and verifying the effectiveness of the invariance design and one-shot learning.