DeepRING: Learning Roto-translation Invariant Representation for
LiDAR based Place Recognition
Sha Lu1, Xuecheng Xu1, Li Tang2, Rong Xiong1 and Yue Wang1
Abstract— LiDAR based place recognition is popular for loop
closure detection and re-localization. In recent years, deep
learning has brought improvements to place recognition via learnable
feature extraction. However, these methods degenerate when the
robot re-visits previous places with large perspective differences.
To address this challenge, we propose DeepRING to learn a
roto-translation invariant representation from a LiDAR scan, so
that scans of the same place taken from different perspectives
have similar representations. There are two keys in DeepRING:
the feature is extracted from the sinogram, and the feature is
aggregated by the magnitude spectrum. These two steps endow the final
representation with both discrimination and roto-translation
invariance. Moreover, we formulate place recognition as a one-
shot learning problem with each place being a class, leveraging
relation learning to build representation similarity. Extensive
experiments are carried out on public datasets, validating the
effectiveness of each proposed component and showing that
DeepRING outperforms the comparative methods, especially
in dataset-level generalization.
I. INTRODUCTION
Place recognition plays a significant role in autonomous
driving applications. It retrieves the closest place from the
robot's past trajectory for loop closure detection in SLAM
systems, reducing the accumulated error. Because of their robustness
to ever-changing environmental conditions, LiDAR
sensors have been widely used for place recognition in recent
years. As the field of view of a LiDAR is wide, LiDAR
scans can have significant overlap when a robot revisits a
previous place from a different perspective. However, place
recognition remains challenging when a large perspective
difference is present between the current scan and
the scan taken at the previous place.
A popular way to deal with this challenge first appeared
in handcrafted methods. By explicitly designing a rotation
invariant representation, scans taken by a robot spinning in
place remain the same. Therefore, the perspective difference
when visiting the same place can be suppressed, and the place can be
determined by finding the previous scan most similar to
the current one. Several methods have been proposed to
achieve this property, including histograms, polar grams
(PG), and principal component analysis [1,2,3,4,5,6].
More recently, translation invariance has also been derived for the
scan representation to address large perspective differences
[7]. However, the handcrafted feature extraction limits the
*This work was not supported by any organization
1State Key Laboratory of Industrial Control and Technology, and the
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou,
310058, China.
2Alibaba Group, Hangzhou, 310052, China.
Corresponding author wangyue@iipc.zju.edu.cn.
discrimination of this line of work, most of which employ
simple features, e.g., occupancy.
To improve the features, deep networks are employed for
feature learning from data [8,9]. However, these networks
do not inherently yield invariant representations, thus calling
for data augmentation with artificially added rotations and
translations to improve the robustness of the scan representation
against perspective changes. Motivated by the representation
design in handcrafted methods, in [10,11,12,13], the
histogram, PG, and range image are inserted into neural
networks to explicitly build rotation invariant representations.
Together with deep learning, the discrimination can also be
improved. Nevertheless, the rotation invariance is lost when
there is an obvious translation difference between the scans.
It remains unclear how to keep deep features invariant when
both rotation and translation differences are present.
To address this problem, we propose a neural network
architecture, named DeepRING, to learn a roto-translation
invariant representation for a LiDAR scan by endowing
deep feature extraction with the RING architecture [7]. As
shown in Fig. 1, by formulating the LiDAR
scan as a sinogram (SG), taking it as input to a convolution
network, and calculating the magnitude spectrum of the output,
the resultant representation is roto-translation invariant,
and can be learned from data in an end-to-end manner for better
discrimination, bridging the advantages of the two lines of
work. Furthermore, we formulate place recognition as a one-
shot learning problem. Specifically, each place is regarded as
a class, with the scan taken at that place as a shot, building a
multi-class support set, while the current scan and the place
form a query set. This formulation leverages relation learning
to replace the popular Siamese network and Euclidean
triplet/quadruplet loss frequently used in place recognition.
The experimental results validate the effectiveness of the
proposed modules, showing that DeepRING outperforms the
comparative methods, especially in dataset-level generalization.
In summary, the main contributions of our method comprise:
• An end-to-end learning framework, DeepRING, to endow scan feature extraction with the roto-translation invariant property, which tackles the problem of large perspective difference.
• A statement of place recognition as one-shot learning, to leverage relation learning for building better similarity. An efficient implementation saves computation.
• Validation of the proposed method on two large-scale benchmark datasets, showing the superior performance of DeepRING, especially for generalization, and verifying the effectiveness of the invariance design and one-shot learning.
arXiv:2210.11029v1 [cs.CV] 20 Oct 2022
Fig. 1: Overall framework of the proposed method DeepRING.
II. RELATED WORKS
In this section, we review related works in terms of
handcrafted methods and learning-based methods. In
addition, we briefly introduce one-shot learning.
A. Handcrafted Methods
Fast Histogram [2] leverages the range of 3D points and
encodes it into a histogram as the global descriptor. M2DP
[14] projects the LiDAR scan onto multiple 2D planes and
generates a global descriptor, robust to rotation changes,
via PCA. The Scan Context series [3,5] utilizes an egocentric
spatial descriptor encoded by the maximum height of points.
Likewise, LiDAR-Iris [4] extracts a binary LiDAR-Iris
image and transforms it into the frequency domain to achieve
rotation invariance. RING [7] proposes the sinogram to represent
point clouds for both place recognition and pose estimation.
In these methods, the similarity computation generally calls
for an exhaustive search. Moreover, the feature extraction in
these methods limits the discrimination of the representation.
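The frequency-domain trick used by methods such as LiDAR-Iris can be illustrated in isolation: a rotation of the sensor corresponds to a circular shift of an egocentric descriptor along the angle axis, and by the Fourier shift theorem the magnitude spectrum is unchanged by such a shift. A minimal sketch with synthetic descriptor contents (not the actual LiDAR-Iris encoding):

```python
import numpy as np

rng = np.random.default_rng(0)
descriptor = rng.random(360)       # 1D egocentric descriptor, one bin per degree
rotated = np.roll(descriptor, 37)  # a 37-degree sensor rotation = circular shift

mag = np.abs(np.fft.fft(descriptor))
mag_rot = np.abs(np.fft.fft(rotated))

# The magnitude spectra coincide exactly, so comparing them
# is invariant to the (unknown) rotation between the two scans.
assert np.allclose(mag, mag_rot)
```

This exactness holds for integer-bin shifts; a rotation between bins introduces only interpolation-level error.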
B. Learning-based Methods
PointNetVLAD [8] extracts features from the raw 3D point
cloud with PointNet [15] and aggregates them into a global
descriptor with NetVLAD [16]. LPD-Net [9] proposes a
graph-based neighborhood module to aggregate the extracted
adaptive local features, which enhances the place description
ability of the global descriptor, especially for large-scale environments.
Locus [17] aggregates the topological relationships
with temporal information of point clouds to improve the
place description ability. Apart from global descriptors, some
works [18,19,20] learn both global and local features of
LiDAR scans to address the 6DoF localization problem. Without
an explicitly invariant representation, these methods achieve
robustness against perspective differences by data augmentation.
LocNet [10] encodes the range histogram as a fingerprint
and learns a semi-handcrafted representation via a neural network to achieve rotation
invariance. OverlapNet [12] exploits multiple cues from
scans and predicts the overlap together with the relative yaw
angle between two scans using a Siamese network. DiSCO
[11] converts the point cloud into cylindrical coordinates,
extracts features using an encoder-decoder network, and transforms
these features into the frequency domain to reach rotation
invariance. RINet [13] further explores the stage at which
feature extraction is inserted to keep the rotation invariance. However,
the rotation invariance in these methods is sensitive to
translation differences; a larger translation may degrade
their performance.
C. One-shot Learning
With the progress of deep learning in the past decades,
learning based methods have presented excellent performance.
A typical scenario is that the classes in the test phase are all
seen in the training phase, and each class has many samples.
However, this is not true in tasks like face recognition
or place recognition, where each class only has a few samples,
and the classes in the test phase are all unseen. To deal with this
problem, one-shot learning has been proposed; a typical approach is
based on distance metric learning, i.e., "learning to compare with
distance metrics" [21]. For instance, Matching Network [22]
employs cosine similarity as the distance metric. Prototypical
Network [23] utilizes Euclidean distance in the embedding
space to compare different classes. Relation Network [24]
designs a CNN to learn a distance metric that compares the
relation of images. Inspired by the works in this direction,
we state the place recognition problem as one-shot learning
to leverage fruitful relation learning methods to build the
similarity between scans.
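The relation-learning idea above can be sketched with a toy numpy model: embed the query scan and one support scan per place (class), concatenate each pair, and score it with a small MLP whose output in (0, 1) plays the role of a learned similarity. All names, dimensions, and weights here are illustrative stand-ins; in particular, the random weights take the place of a trained relation network, so the scores are not meaningful, only the structure is:

```python
import numpy as np

rng = np.random.default_rng(0)

def relation_score(query_emb, support_emb, w1, w2):
    """Relation-module style score: concatenate the pair, pass it through
    a tiny two-layer MLP, squash to (0, 1). Weights are random stand-ins
    for a trained relation network."""
    pair = np.concatenate([query_emb, support_emb])
    hidden = np.maximum(0.0, w1 @ pair)            # ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid

dim, hidden_dim, n_places = 32, 16, 5
w1 = rng.normal(size=(hidden_dim, 2 * dim))
w2 = rng.normal(size=hidden_dim)

support = rng.normal(size=(n_places, dim))         # one embedded scan per place
query = support[2] + 0.05 * rng.normal(size=dim)   # noisy revisit of place 2

scores = np.array([relation_score(query, s, w1, w2) for s in support])
print(scores)  # with trained weights, the revisited place would score highest
```

The retrieved place is then the support class with the highest relation score; during training the scores are supervised so that matching query/support pairs approach 1 and non-matching pairs approach 0.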
III. METHODOLOGY
We propose a one-shot learning framework based on the sinogram
(SG) representation, named DeepRING, to construct
roto-translation invariance for robust place recognition.
A. Rotation Equivariant Representation
Sinogram: In this subsection, we convert a LiDAR scan
to a sinogram representation, which is visualized in Fig. 1.
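The invariance mechanism built on the sinogram can be sketched numerically, leaving out the learned convolution network: the sinogram of a BEV occupancy grid is its Radon transform; a translation of the grid shifts each projection column along the radial axis, so a per-column magnitude spectrum removes it, while a rotation becomes a circular shift along the angle axis, which a second magnitude spectrum removes. The grid, resolution, and angle count below are illustrative choices, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import rotate, shift

def sinogram(grid, n_angles=60):
    # Radon transform: column sums of the grid rotated over [0, 180) degrees.
    angles = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    return np.stack(
        [rotate(grid, a, reshape=False, order=1).sum(axis=0) for a in angles],
        axis=1,
    )  # shape: (n_radial_bins, n_angles)

def ring_descriptor(grid, n_angles=60):
    sg = sinogram(grid, n_angles)
    radial_mag = np.abs(np.fft.fft(sg, axis=0))    # removes per-column (translation) shifts
    return np.abs(np.fft.fft(radial_mag, axis=1))  # removes circular (rotation) shifts

# Synthetic BEV occupancy grid with an asymmetric pattern near the center.
grid = np.zeros((64, 64))
grid[20:28, 30:40] = 1.0
grid[40:44, 18:26] = 1.0

def sim(a, b):  # cosine similarity between flattened descriptors
    a, b = a.ravel(), b.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

d0 = ring_descriptor(grid)
d_rot = ring_descriptor(rotate(grid, 30.0, reshape=False, order=1))
d_shift = ring_descriptor(shift(grid, (3.0, -4.0), order=1))

# Both similarities stay close to 1 (up to interpolation error),
# while the raw grids differ substantially under the same motions.
print(sim(d0, d_rot), sim(d0, d_shift))
```

Two details make this work: a projection flipped by the 180-degree wrap of the Radon angle has the same radial magnitude spectrum as the original, and a compactly supported projection that shifts without leaving the window behaves like a circular shift. DeepRING inserts a learnable convolution between the sinogram and the magnitude spectra, which preserves these shift structures while improving discrimination.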