Long-Term Localization using Semantic Cues in Floor Plan Maps
Nicky Zimmerman, Tiziano Guadagnino, Xieyuanli Chen, Jens Behley, Cyrill Stachniss
Abstract: Lifelong localization in a given map is an essential
capability for autonomous service robots. In this paper, we
consider the task of long-term localization in a changing indoor
environment given sparse CAD floor plans. The commonly used
pre-built maps from the robot sensors may increase the cost and
time of deployment. Furthermore, their detailed nature requires
that they are updated when significant changes occur. We
address the difficulty of localization when the correspondence
between the map and the observations is low due to the sparsity
of the CAD map and the changing environment. To overcome
both challenges, we propose to exploit semantic cues that are
commonly present in human-oriented spaces. These semantic
cues can be detected using RGB cameras by utilizing object
detection, and are matched against an easy-to-update, abstract
semantic map. The semantic information is integrated into a
Monte Carlo localization framework using a particle filter that
operates on 2D LiDAR scans and camera data. We provide a
long-term localization solution and a semantic map format for
environments that undergo changes to their interior structure
and for which detailed geometric maps are not available. We evaluate
our localization framework on multiple challenging indoor
scenarios in an office environment, taken weeks apart. The
experiments suggest that our approach is robust to structural
changes and can run on an onboard computer. We released the
open source implementation¹ of our approach written in C++
together with a ROS wrapper.
I. INTRODUCTION
To operate autonomously in indoor environments, such
as factories or offices, mobile robots must be able to determine
their pose. For localization in a given map, there
are two challenges: the changing nature of human-occupied
environments and the quality of available maps. Precise,
highly-detailed maps are an accurate representation of the
environment only at the time they were captured, and they
become outdated in the presence of “quasi-static” changes
such as moved furniture, clutter, and doors being opened or closed.
We describe “quasi-static” changes as long-lasting alterations
(hours, days, weeks) that cause deviation between sensor
observations and the given map, in contrast to dynamics
such as humans and fast-moving objects. The availability of
feature-rich, dense maps is not guaranteed and construction
of such maps can be costly. Therefore, autonomous robots
benefit from localizing in sparse maps such as floor plans
or hand-crafted room layouts as they are seldom affected
by changes. Architectural drawings are familiar to inexpert
users and can be easily updated with CAD software. As they
capture persistent structures, they typically do not require
updates. However, using these sparse maps is challenging
due to the substantial discrepancies between the robot’s
observations of the environment and the information depicted
in the maps. Additionally, floor plans lack the geometric information
necessary to localize in a highly repetitive indoor
environment, as can be seen in Fig. 1.
Fig. 1: Floor plan maps exhibit a high degree of symmetry and low
similarity to actual LiDAR measurements. This leads to multiple
hypotheses that cannot be resolved correctly. We propose integrating
semantic cues from a high-level, abstract semantic map to assist
with global localization. The red cross indicates the ground truth
pose and the green dots are the particles. Left: 2D LiDAR MCL
with multiple hypotheses. Right: Convergence to a single hypothesis
when exploiting semantic cues, in an abstract semantic map
including various objects (colored rectangles).
All authors are with the University of Bonn, Germany. Cyrill Stachniss is
additionally with the Department of Engineering Science at the University
of Oxford, UK.
This work has partially been funded by the European Union’s Horizon
2020 research and innovation programme under grant agreement
No 101017008 (Harmony).
¹ https://github.com/PRBonn/hsmcl
Additional sources of information can be used to overcome
the challenges of global localization, and such cues have been
frequently used by researchers to improve robot localization.
For example, WiFi, an extremely prevalent utility, can aid
in pose estimation by considering the signal strength [14].
Textual information, constantly used by humans to navigate,
is readily available in human-occupied environments.
However, very few works consider textual cues for localization
[7][28][43].
Another avenue is exploiting semantic information. The
last decade was marked by significant advances in object
detection [2][41] and semantic segmentation [12][32], so that
semantic cues can be efficiently inferred from images (with
some fine-tuning). The most commonly used map representation
for robotics is an occupancy grid map [24]. However,
human environments tend to be object-centric, and humans
do not require precise metric information in order to navigate
them [21][39]. Rather, humans rely on a small number of
specific landmarks, and associate places with the objects
present there. For this reason, we consider localization in a
sparse, approximate map, that does not require an elaborate
map acquisition process. To our knowledge, no work exists on semantic
localization in sparse maps with abstract and hierarchical semantic
information.
arXiv:2210.01456v1 [cs.RO] 4 Oct 2022
The main contribution of this paper is a global localization
system in floor plan maps that integrates semantic cues.
We propose to leverage semantic cues to break the symmetry
and distinguish between locations that appear similar or
identical in the nondescript maps. Semantic information is
commonly available in the form of furniture, machinery and
textual cues and can be used to distinguish between spaces
with similar layout. To avoid the complexity of building a
3D map from scans and to enable easy updates to semantic
information, we present a 2D, high-level semantic map.
Thus, we present a format for abstract semantic maps with
an editing application and a sensor model for semantic information
that complements LiDAR-based observation models.
Additionally, we provide a way to incorporate hierarchical
semantic information. Unlike most modern semantic-based
SLAM approaches [6][20][31][37][38], our approach does
not require a GPU and can run online on an onboard
computer. In our experiments, we show that our approach
is able to: (i) localize in a sparse floor plan-like map with
high symmetry using semantic cues, (ii) localize long-term
without updating the map, (iii) localize in a previously unseen
environment, and (iv) localize the robot online using an onboard
computer. These claims are backed up by the paper and our
experimental evaluation.
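To make the idea of symmetry breaking concrete, the following is a minimal sketch and not the released implementation. It assumes an abstract semantic map stored as labeled rectangles (as drawn in Fig. 1) and a hypothetical likelihood function that rewards pose hypotheses with a matching object nearby. The names SemanticObject and semantic_likelihood, and the parameters max_range and miss_prob, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticObject:
    """One entry of a hypothetical abstract semantic map: a labeled 2D rectangle."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self):
        return ((self.x_min + self.x_max) / 2.0, (self.y_min + self.y_max) / 2.0)


def semantic_likelihood(pose_xy, detected_label, semantic_map,
                        max_range=5.0, miss_prob=0.1):
    """Assumed toy model: return 1.0 if an object with the detected label lies
    within max_range of the hypothesised position, otherwise miss_prob."""
    px, py = pose_xy
    for obj in semantic_map:
        if obj.label != detected_label:
            continue
        cx, cy = obj.center()
        if (cx - px) ** 2 + (cy - py) ** 2 <= max_range ** 2:
            return 1.0
    return miss_prob


# Two rooms with identical geometry that differ only in their content.
semantic_map = [
    SemanticObject("sink", 1.0, 1.0, 2.0, 1.5),       # first room
    SemanticObject("monitor", 21.0, 1.0, 22.0, 1.5),  # second room
]

# Two pose hypotheses that a LiDAR-only model could not tell apart.
hypotheses = [(2.0, 3.0), (22.0, 3.0)]

# The camera reports a sink; reweight and normalize the hypotheses.
raw = [semantic_likelihood(h, "sink", semantic_map) for h in hypotheses]
weights = [w / sum(raw) for w in raw]
```

Because the map only needs approximate rectangles and labels, relocating an object amounts to editing one entry, which is the property that makes such a map easy to update.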
II. RELATED WORK
Localization in 2D maps has been thoroughly researched
[5][35][36][40]. Among the most robust and
commonly used approaches are the probabilistic methods
for pose estimation, including Markov localization by Fox et
al. [11], the extended Kalman filter (EKF) [16] and particle
filters, also known as Monte Carlo localization (MCL) by
Dellaert et al. [8]. These works laid the foundation for
localization using range sensors and cameras.
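As a reminder of how a particle filter estimates the pose, the sketch below runs one predict, weight and resample cycle of a generic MCL. It is a textbook-style illustration under simplifying assumptions (additive Gaussian odometry noise, a user-supplied likelihood function, low-variance resampling) and does not reproduce any of the cited systems. The function mcl_step and the toy likelihood are hypothetical names.

```python
import math
import random


def mcl_step(particles, odom, likelihood, rng, noise=(0.05, 0.05, 0.02)):
    """One predict / weight / resample cycle.

    particles:  list of (x, y, theta) pose hypotheses
    odom:       (forward distance, rotation) since the last step
    likelihood: callable mapping a pose to a non-negative observation likelihood
    """
    d, dtheta = odom
    predicted = []
    for x, y, th in particles:
        # Predict: apply the odometry reading plus Gaussian noise.
        th_new = th + dtheta + rng.gauss(0.0, noise[2])
        x_new = x + d * math.cos(th_new) + rng.gauss(0.0, noise[0])
        y_new = y + d * math.sin(th_new) + rng.gauss(0.0, noise[1])
        predicted.append((x_new, y_new, th_new))

    # Weight: evaluate the observation model for every particle.
    weights = [likelihood(p) for p in predicted]
    total = sum(weights)
    if total <= 0.0:
        return predicted  # observation carries no information; skip resampling
    weights = [w / total for w in weights]

    # Resample: low-variance (systematic) resampling.
    n = len(predicted)
    step = 1.0 / n
    u = rng.uniform(0.0, step)
    resampled, cumulative, i = [], weights[0], 0
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(predicted[i])
        u += step
    return resampled


def toy_likelihood(pose):
    """Assumed observation model: the sensor suggests the robot is near x = 5, y = 0."""
    x, y, _ = pose
    return math.exp(-((x - 5.0) ** 2 + y ** 2) / (2 * 0.5 ** 2))


rng = random.Random(0)
# Global localization: start with particles spread uniformly along a corridor.
particles = [(rng.uniform(0.0, 10.0), 0.0, 0.0) for _ in range(500)]
for _ in range(10):
    particles = mcl_step(particles, (0.0, 0.0), toy_likelihood, rng)
mean_x = sum(p[0] for p in particles) / len(particles)
```

With a single-peaked likelihood the particles collapse to one cluster. In a symmetric floor plan the likelihood has several equal peaks, so several clusters survive, which is the failure mode discussed in this paper.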
Localization in detailed, feature-rich maps, usually constructed
by range sensors, is extensively studied [23], but
few works address the problem of localization in sparse,
floor plan-like maps, despite their benefits. Floor plans are
readily available in many facilities, and therefore do not
depend on prior mapping. As they only include information
on permanent structures, they do not require frequent updates
when objects, such as furniture, are relocated. Their main
drawback comes from their sparse nature: the lack of
detailed geometric information can result in global localization
failures when multiple rooms look alike. Another
concern is the possible mismatch between the floor plans
and the constructed building [3]. Li et al. [17] address the
scale difference between constructed structure and floor plans
by introducing a new state variable. Boniardi et al. [4] use
cameras to infer the room layout via edge extraction and
match it against the floor plan. In the evaluation, the authors
initialized the pose within 10 cm and 15° of the ground-truth
pose, and did not evaluate global localization. We speculate
that edge extraction of the walls is not sufficient in a
highly repetitive indoor environment where many rooms have
the same size. Both approaches provide tracking capabilities,
but not global localization.
Recent works in extracting semantic information with
deep learning models showed significant improvement in
performance for both text spotting [18][33] and object detection
[2][41]. The use of textual cues for localization is
surprisingly uncommon, with notable works by Cui et al. [7]
and Zimmerman et al. [43]. Both works considered using
textual information within an MCL framework, but used
different approaches to integrate it. In our approach, we
expand our previous work [43] to consider semantic cues
via object detection, not only textual ones.
The use of semantic information for localization and place
recognition is applied to a variety of sensors, including 2D
and 3D LiDARs, RGB and RGB-D cameras. Rottmann et
al. [30] use AdaBoost features from 2D LiDAR scans to infer
semantic labels such as office, corridor and kitchen. They
combine the semantic information with an occupancy grid map
in an MCL framework. Unlike our approach, their method
requires a detailed map and manually assigning a semantic
label to every grid cell. Hendrikx et al. [13] utilize an available
building information model (BIM) to extract both geometric and
semantic information, and localize by matching 2D LiDAR-
based features corresponding to walls, corners and columns.
While the automatic extraction of semantic and geometric
maps from a BIM is promising, the approach is not suitable
for global localization as it cannot overcome the challenges
of a repetitively-structured environment.
Atanasov et al. [1] treat semantic objects as landmarks
that include their 3D pose, semantic label and possible
shape priors. They detect objects using a deformable part
model [9], and use their semantic observation model in an
MCL framework. The results they report do not outperform
LiDAR-based localization. An alternative representation for
semantic information is a constellation model, as suggested
by Ranganathan et al. [29]. In their approach, they use
stereo cameras, exploiting depth information. They rely on
hand-crafted features including SIFT [19] to detect objects.
Places are associated with constellations of objects, where
every object has shape and appearance distribution and a
relative transformation to the base location. Unlike these two
approaches, our approach does not require exact poses for the
semantic objects. A more flexible representation is proposed
by Yi et al. [39], who use topological-semantic graphs to
represent the environment. They extract topological nodes
from an occupancy grid map, and characterize each node by
the semantic objects in its vicinity. Their method suffers when objects
are far from the camera and can easily diverge when objects
cannot be detected, while our approach is more robust as it
additionally relies on LiDAR observations and textual cues.
Similarly to the above-mentioned approaches, we also use a
sparse representation for semantic objects. However, by using
deep learning to detect objects, we are able to detect a larger
variety of objects with greater confidence, and localize in
previously unseen places.
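The robustness argument can be illustrated with a small sketch of how several cues might be fused into one particle weight. This is an assumption made for illustration, not the sensor model of the paper: it treats the cues as conditionally independent and multiplies their likelihoods, so an empty detection list leaves the LiDAR-based weight unchanged. The function fused_weight and all numeric values are hypothetical.

```python
def fused_weight(lidar_likelihood, semantic_likelihoods):
    """Multiply the LiDAR likelihood by each semantic likelihood.

    Assumes conditional independence of the cues given the pose. With no
    detections the product is empty and the LiDAR likelihood is returned as is.
    """
    weight = lidar_likelihood
    for s in semantic_likelihoods:
        weight *= s
    return weight


# Two hypotheses in look-alike rooms receive the same LiDAR likelihood.
lidar_a, lidar_b = 0.4, 0.4

# No object detected: the filter falls back to LiDAR and stays undecided.
no_detection_a = fused_weight(lidar_a, [])
no_detection_b = fused_weight(lidar_b, [])

# One detection that agrees with hypothesis A much better than with B.
with_detection_a = fused_weight(lidar_a, [0.9])
with_detection_b = fused_weight(lidar_b, [0.1])
```

Under this assumed scheme, missing detections cannot push the estimate away from what the range data supports; they only fail to add information.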
Sünderhauf et al. [34] construct semantic maps from
camera by assigning a place category to each occupancy