DeepRING: Learning Roto-translation Invariant Representation for
LiDAR based Place Recognition
Sha Lu1, Xuecheng Xu1, Li Tang2, Rong Xiong1and Yue Wang1†
Abstract— LiDAR based place recognition is popular for loop closure detection and re-localization. In recent years, deep learning has brought improvements to place recognition through learnable feature extraction. However, these methods degenerate when the robot revisits previous places with a large perspective difference. To address this challenge, we propose DeepRING to learn a roto-translation invariant representation from the LiDAR scan, so that scans of the same place taken from different perspectives have similar representations. There are two keys in DeepRING: the feature is extracted from the sinogram, and the feature is aggregated by the magnitude spectrum. These two steps endow the final representation with both discrimination and roto-translation invariance. Moreover, we formulate place recognition as a one-shot learning problem with each place being a class, leveraging relation learning to build representation similarity. Extensive experiments are carried out on public datasets, validating the effectiveness of each proposed component and showing that DeepRING outperforms the comparative methods, especially in dataset-level generalization.
I. INTRODUCTION
Place recognition plays a significant role in autonomous driving applications. It retrieves the closest place from the robot's past trajectory for loop closure detection in SLAM systems, reducing the accumulated error. Because of their robustness to ever-changing environmental conditions, LiDAR sensors have been widely used for place recognition in recent years. As the field of view of LiDAR is wide, LiDAR scans can have significant overlap when a robot revisits a previous place from a different perspective. However, place recognition remains challenging when a large perspective difference exists between the current scan and the scan taken at the previous place.
A popular way to deal with this challenge first appeared in handcrafted methods. By explicitly designing a rotation invariant representation, scans taken by a robot spinning in place remain the same. Therefore, the perspective difference when revisiting the same place can be suppressed, and the place can be determined by finding the previous scan most similar to the current one. Several methods have been proposed to achieve this property, including the histogram, polar gram (PG), and principal component analysis [1,2,3,4,5,6].
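For intuition, below is a minimal example of such a rotation invariant descriptor (our illustration, not code from the cited works): a histogram of point ranges, which is unchanged by any yaw rotation of the scan because rotation preserves every point's range.

```python
# A handcrafted rotation invariant descriptor: histogram of point ranges.
# Rotating the scan permutes the points but leaves every range unchanged,
# so the descriptor is identical for all yaw rotations of the same scan.
import numpy as np

def range_histogram(points: np.ndarray, bins: int = 40,
                    max_range: float = 80.0) -> np.ndarray:
    """points: (N, 3) LiDAR points in the sensor frame."""
    ranges = np.linalg.norm(points[:, :2], axis=1)  # planar range per point
    hist, _ = np.histogram(ranges, bins=bins, range=(0.0, max_range))
    return hist / max(len(points), 1)               # normalize by point count

# A yaw rotation leaves the descriptor exactly unchanged:
pts = np.random.rand(1000, 3) * 40.0
yaw = np.pi / 3
R = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
              [np.sin(yaw),  np.cos(yaw), 0.0],
              [0.0,          0.0,         1.0]])
assert np.allclose(range_histogram(pts), range_histogram(pts @ R.T))
```

Such occupancy- or range-based statistics are simple, which is exactly the discrimination limitation discussed below.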
More recently, translation invariance has also been derived for the scan representation to address large perspective differences [7]. However, the handcrafted feature extraction limits the discrimination of this line of work, most of which employs simple features, e.g., occupancy.
*This work was not supported by any organization.
1State Key Laboratory of Industrial Control Technology, and the Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310058, China.
2Alibaba Group, Hangzhou, 310052, China.
†Corresponding author wangyue@iipc.zju.edu.cn.
To improve the features, deep networks are employed to learn features from data [8,9]. However, these networks do not inherently yield invariant representations, thus calling for data augmentation with artificially added rotations and translations to improve the robustness of the scan representation against perspective change. Motivated by the representation design in handcrafted methods, in [10,11,12,13], the histogram, PG, and range image are inserted into neural networks to explicitly build rotation invariant representations. Together with deep learning, the discrimination can also be improved. Nevertheless, the rotation invariance is lost when there is an obvious translation difference between the scans. It remains unclear how to keep deep features invariant when both rotation and translation differences are present.
To address this problem, we propose a neural network architecture, named DeepRING, to learn a roto-translation invariant representation for a LiDAR scan by endowing deep feature extraction with the RING architecture [7]. As shown in Fig. 1, by formulating the LiDAR scan as a sinogram (SG), taking it as input to a convolution network, and calculating the magnitude spectrum of the output, the resultant representation is roto-translation invariant and can be learned from data for better discrimination in an end-to-end manner, bridging the advantages of the two lines of work.
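To make these steps concrete, below is a minimal sketch of the pipeline (our illustration under stated assumptions, not the authors' released code). We assume a bird's-eye-view (BEV) occupancy grid as input; the convolutions act only along the offset axis of the sinogram, so the shifts induced by rotation and translation survive feature extraction, and the magnitude spectra then discard those shifts. All names and sizes (scan_to_sinogram, DeepRINGSketch, num_angles) are illustrative.

```python
# Sketch: BEV occupancy grid -> sinogram (Radon transform) -> shift-
# equivariant conv features -> magnitude spectra as a roto-translation
# invariant descriptor. Not the authors' code; sizes are illustrative.
import numpy as np
import torch
import torch.nn as nn
from skimage.transform import radon

def scan_to_sinogram(bev: np.ndarray, num_angles: int = 120) -> torch.Tensor:
    """Radon transform of a BEV occupancy grid.

    A rotation of the scan circularly shifts the sinogram along the
    angle axis; a translation shifts each angle column along the offset
    axis. Returns a (1, num_offsets, num_angles) tensor.
    """
    theta = np.linspace(0.0, 180.0, num_angles, endpoint=False)
    sg = radon(bev, theta=theta, circle=False)  # (num_offsets, num_angles)
    return torch.from_numpy(sg).float().unsqueeze(0)

class DeepRINGSketch(nn.Module):
    """Shift-equivariant features plus magnitude-spectrum aggregation."""

    def __init__(self, channels: int = 16):
        super().__init__()
        # Kernels of size (3, 1) act only along the offset axis, so every
        # angle column is processed identically: the circular shift from
        # rotation and the per-column shifts from translation pass through
        # feature extraction as shifts (boundary effects aside).
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)),
        )

    def forward(self, sinogram: torch.Tensor) -> torch.Tensor:
        feat = self.net(sinogram.unsqueeze(0))     # (1, C, R, A)
        # |FFT| along the offset axis removes the translation-induced
        # shifts; |FFT| along the angle axis then removes the circular
        # shift induced by rotation.
        spec = torch.fft.fft(feat, dim=2).abs()
        return torch.fft.fft(spec, dim=3).abs()    # (1, C, R, A), real

# descriptor = DeepRINGSketch()(scan_to_sinogram(bev))
```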
Furthermore, we formulate place recognition as a one-shot learning problem. Specifically, each place is regarded as a class, with the scan taken at that place as a shot, building a multi-class support set, while the current scan and its place form the query set. This formulation leverages relation learning to replace the popular Siamese network and Euclidean triplet/quadruplet losses frequently used in place recognition.
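As a sketch of this formulation (a relation-network style head of our own devising; the authors' exact module may differ), a small MLP can score each query-support descriptor pair, supervised by a one-hot target that marks the true revisited place:

```python
# Sketch of relation learning for one-shot place recognition. The
# descriptors may come from any extractor (e.g. the DeepRINGSketch
# above); the MLP sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationHead(nn.Module):
    """Learned similarity between a query and each support descriptor."""

    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, query: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
        # query: (D,); support: (N, D), one descriptor per place (class).
        pairs = torch.cat([query.expand_as(support), support], dim=-1)
        return self.mlp(pairs).squeeze(-1)  # (N,) relation scores

# Training: each place is a class with a single shot, so the scores are
# supervised with a one-hot label over the support set, e.g.:
# scores = head(query_desc, support_descs)              # (N,)
# loss = F.binary_cross_entropy_with_logits(scores, one_hot_target)
```

Compared with a Siamese encoder scored by a fixed Euclidean metric, the relation head learns the similarity measure itself, which is the replacement described above.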
The experimental results validate the effectiveness of the proposed modules, showing that DeepRING outperforms the comparative methods, especially in dataset-level generalization.
In summary, the main contributions of our method are:
• An end-to-end learning framework, DeepRING, that endows scan feature extraction with the roto-translation invariance property, tackling the problem of large perspective difference.
• A formulation of place recognition as one-shot learning, leveraging relation learning to build a better similarity measure; an efficient implementation saves computation.
• Validation of the proposed method on two large-scale benchmark datasets, showing the superior performance of DeepRING, especially for generalization, and verifying the effectiveness of the invariance design and one-shot learning.