in sparse maps with abstract and hierarchical semantic information exists to our knowledge.
The main contribution of this paper is a global localization system for floor plan maps that integrates semantic cues.
We propose to leverage semantic cues to break the symmetry and distinguish between locations that appear similar or identical in such nondescript maps. Semantic information is commonly available in the form of furniture, machinery, and textual cues, and can be used to disambiguate spaces with similar layouts. To avoid the complexity of building a 3D map from scans and to enable easy updates to the semantic information, we present a 2D, high-level semantic map.
Thus, we present a format for abstract semantic maps, together with an editing application, and a sensor model for semantic information that complements LiDAR-based observation models.
Additionally, we provide a way to incorporate hierarchical semantic information. Unlike most modern semantic-based SLAM approaches [6][20][31][37][38], our approach does not require a GPU and can run online on an onboard computer. In our experiments, we show that our approach is able to: (i) localize in a sparse, floor plan-like map with high symmetry using semantic cues, (ii) localize long-term without updating the map, (iii) localize in previously unseen environments, and (iv) localize the robot online using an onboard computer. These claims are backed up by our experimental evaluation.
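As an illustration of what such an abstract, hierarchical 2D semantic map could look like, the sketch below encodes each object with a coarse position, a semantic label, and an optional parent category; the schema and field names are our illustrative assumptions, not the exact format proposed in this paper.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a 2D, high-level semantic map entry. The schema
# (label, position, parent) is a hypothetical assumption for exposition,
# not the exact map format introduced in this paper.
@dataclass
class SemanticObject:
    label: str                     # e.g., "extinguisher", "desk", or a room-number text
    position: tuple[float, float]  # coarse 2D position in the map frame [m]
    parent: str | None = None      # hierarchical parent category, e.g., "furniture"

@dataclass
class SemanticMap:
    objects: list[SemanticObject] = field(default_factory=list)

    def query(self, label: str) -> list[SemanticObject]:
        # Matching on the parent as well lets coarse detections
        # (e.g., "furniture") still be associated with map objects.
        return [o for o in self.objects if label in (o.label, o.parent)]
```

Editing such a map then amounts to adding or removing entries, without re-running any 3D mapping pipeline.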
II. RELATED WORK
Localization in 2D maps has been thoroughly researched [5][35][36][40]. Among the most robust and commonly used approaches are the probabilistic methods for pose estimation, including Markov localization by Fox et al. [11], the extended Kalman filter (EKF) [16], and particle filters, also known as Monte Carlo localization (MCL), introduced by Dellaert et al. [8]. These works laid the foundation for localization using range sensors and cameras.
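As a minimal sketch of one MCL update cycle, assuming hypothetical motion and observation models and treating the LiDAR and semantic likelihoods as independent factors in the particle weight:

```python
import random

# Minimal sketch of one Monte Carlo localization (MCL) update step in the
# spirit of Dellaert et al. [8]. sample_motion, lidar_likelihood, and
# semantic_likelihood are hypothetical placeholders for the motion model
# and the two observation models, not an implementation from any cited work.
def mcl_update(particles, odometry, scan, detections,
               sample_motion, lidar_likelihood, semantic_likelihood):
    # 1) Predict: propagate every particle through the motion model.
    particles = [sample_motion(p, odometry) for p in particles]

    # 2) Correct: weight each particle by the product of the LiDAR and
    #    semantic likelihoods (assumed conditionally independent).
    weights = [lidar_likelihood(scan, p) * semantic_likelihood(detections, p)
               for p in particles]
    total = sum(weights) or 1.0  # guard against all-zero weights
    weights = [w / total for w in weights]

    # 3) Resample: draw a new particle set proportionally to the weights.
    return random.choices(particles, weights=weights, k=len(particles))
```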
Localization in detailed, feature-rich maps, usually constructed from range sensor data, is extensively studied [23], but few works address the problem of localization in sparse, floor plan-like maps, despite their benefits. Floor plans are readily available in many facilities, and therefore do not depend on prior mapping. As they only include information about permanent structures, they do not require frequent updates when objects, such as furniture, are relocated. Their main drawback comes from their sparse nature, as the lack of detailed geometric information can result in global localization failures when multiple rooms look alike. Another concern is the possible mismatch between the floor plan and the constructed building [3]. Li et al. [17] address the scale difference between the constructed structure and the floor plan by introducing a new state variable. Boniardi et al. [4] use cameras to infer the room layout via edge extraction and match it against the floor plan. In their evaluation, the authors initialized the pose within 10 cm and 15° of the ground-truth pose, and did not evaluate global localization. We speculate that edge extraction of walls is not sufficient in highly repetitive indoor environments where many rooms have the same size. Both approaches provide tracking capabilities, but not global localization.
Recent work on extracting semantic information with deep learning models has shown significant performance improvements in both text spotting [18][33] and object detection [2][41]. The use of textual cues for localization is surprisingly uncommon, with notable works by Cui et al. [7] and Zimmerman et al. [43]. Both works consider textual information within an MCL framework, but use different approaches to integrate it. In this paper, we extend our previous work [43] to consider semantic cues from object detection, not only textual ones.
The use of semantic information for localization and place recognition has been explored with a variety of sensors, including 2D and 3D LiDARs and RGB and RGB-D cameras. Rottmann et al. [30] use AdaBoost features from 2D LiDAR scans to infer semantic labels such as office, corridor, and kitchen. They combine the semantic information with an occupancy grid map in an MCL framework. Unlike our approach, their method requires a detailed map and manual assignment of a semantic label to every grid cell. Hendrikx et al. [13] utilize an available building information model (BIM) to extract both geometric and semantic information, and localize by matching 2D LiDAR-based features corresponding to walls, corners, and columns. While the automatic extraction of semantic and geometric maps from a BIM is promising, the approach is not suitable for global localization as it cannot overcome the challenges of a repetitively-structured environment.
Atanasov et al. [1] treat semantic objects as landmarks that include their 3D pose, semantic label, and possible shape priors. They detect objects using a deformable part model [9], and use their semantic observation model in an MCL framework. The results they report do not outperform LiDAR-based localization. An alternative representation for semantic information is a constellation model, as suggested by Ranganathan et al. [29]. In their approach, they use stereo cameras, exploiting depth information, and rely on hand-crafted features, including SIFT [19], to detect objects. Places are associated with constellations of objects, where every object has a shape and appearance distribution and a relative transformation to the base location. Unlike these two approaches, ours does not require exact poses for the semantic objects. A more flexible representation is proposed by Yi et al. [39], who use topological-semantic graphs to represent the environment. They extract topological nodes from an occupancy grid map, and characterize each node by the semantic objects in its vicinity. Their method suffers when objects are far from the camera and can easily diverge when objects cannot be detected, while our approach is more robust as it additionally relies on LiDAR observations and textual cues. Similarly to the above-mentioned approaches, we also use a sparse representation for semantic objects. However, by using deep learning to detect objects, we are able to detect a larger variety of objects with greater confidence, and to localize in previously unseen places.
Sünderhauf et al. [34] construct semantic maps from camera images by assigning a place category to each occupancy