Floorplan-Aware Camera Poses Refinement

Anna Sokolova1, Filipp Nikitin1, Anna Vorontsova1, Anton Konushin1

Abstract— Processing large indoor scenes is a challenging

task, as scan registration and camera trajectory estimation

methods accumulate errors across time. As a result, the quality

of reconstructed scans is insufﬁcient for some applications, such

as visual-based localization and navigation, where the correct

position of walls is crucial.

For many indoor scenes, there exists an image of a technical

ﬂoorplan that contains information about the geometry and

main structural elements of the scene, such as walls, partitions,

and doors. We argue that such a ﬂoorplan is a useful source of

spatial information, which can guide a 3D model optimization.

The standard RGB-D 3D reconstruction pipeline consists of

a tracking module applied to an RGB-D sequence and a bundle

adjustment (BA) module that takes the posed RGB-D sequence

and corrects the camera poses to improve consistency. We

propose a novel optimization algorithm expanding conventional

BA that leverages the prior knowledge about the scene structure

in the form of a floorplan. Our experiments on the Redwood

dataset and our self-captured data demonstrate that utilizing

a floorplan improves the accuracy of 3D reconstructions.

I. INTRODUCTION

Restoring general scene structure formed with ﬂoor and

walls is complicated for multiple reasons. First, both ﬂoor

and walls are often textureless or covered with repetitive

patterns, so the keypoints cannot be detected or correctly

matched across different frames. Then, the ﬂoor and walls

might not superimpose after a loop closure in BA due to

the errors accumulated over time. Alternatively, the surfaces

might not match perfectly when aligning partial scans of

large-scale scenes. Either way, multiple duplicate layers

appear, corrupting the overall scan; we refer to this

unwanted effect as layering. In addition, each surface

might have hills and pits, worsening the visual impression;

we call it unevenness. Hence, the reconstructed scans come

out imperfect and require additional optimization.

Overall, no-reference approaches are limited by design, so

a signiﬁcant improvement cannot be achieved without addi-

tional information about the scene. We argue that a technical

ﬂoorplan of a scene is one of the most available, intuitive,

and easy-to-use sources of spatial data. Floorplans reﬂect

the general structure of the scene, so we can use them as

guidance during optimization, comparing the reconstructed

scan with a ﬂoorplan and penalizing their divergence.

Accordingly, we address the following problem: given a

posed RGB-D sequence and a ﬂoorplan, reﬁne camera poses

so that the scan reconstructed using these poses is consistent

with the floorplan.

1All authors are with Samsung AI Center, Moscow, Russia, {a.sokolova, f.nikitin, a.vorontsova, a.konushin}@samsung.com

Fig. 1: The reconstructed scan before (left) and after (right) camera poses refinement with floorplan guidance. Through refinement, the misplaced upper right room gets aligned with the floorplan, and multiple reconstruction artifacts (marked with red ellipses) decrease or disappear.

We assume that we have a floorplan

image that depicts vertical architectural surfaces comprising

the general scene structure (Fig. 4). The coordinate transformation

(scale, shift, and rotation) between a scan and its

floorplan might be unknown.
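The unknown scale, shift, and rotation between a scan and its floorplan form a 2D similarity transformation. The following Python sketch shows how such a transformation could map scan points, projected onto the ground plane, into floorplan pixel coordinates. The function name, the parameter names, and the choice of units (metres to pixels) are assumptions made for illustration only; the text above does not specify how the transformation is parameterized or estimated.

```python
import math


def scan_to_floorplan(points_xy, scale, angle_rad, shift):
    """Apply p' = scale * R(angle_rad) * p + shift to 2D points.

    Hypothetical helper: points_xy are scan points projected onto the
    ground plane; the result is expressed in floorplan coordinates.
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty = shift
    return [
        (scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
        for x, y in points_xy
    ]


# Two scan points rotated by 90 degrees, scaled by 50 px per metre, then shifted.
mapped = scan_to_floorplan(
    [(1.0, 0.0), (0.0, 2.0)], scale=50.0, angle_rad=math.pi / 2, shift=(100.0, 200.0)
)
```

With such a mapping available, wall points from the scan and walls drawn on the floorplan can be compared in a common coordinate frame.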

Typically, in scan reconstruction, camera poses are esti-

mated roughly and then reﬁned using a bundle adjustment

(BA). We propose a novel optimization algorithm that ex-

pands BA using prior knowledge about the scene structure.

We assume that the ﬂoor surface is planar, and a scene

is bounded with planar walls matching the walls on the

ﬂoorplan. To obtain a scan that satisﬁes these requirements,

we impose additional constraints in BA. Speciﬁcally, we

apply semantic segmentation to select points corresponding

to ﬂoor and walls and penalize ﬂoor unevenness and the

divergence between the walls and the ﬂoorplan.
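The two structural penalties can be illustrated with a minimal Python sketch. It is not the cost function of the paper, whose exact form is not given in this part of the text. The sketch assumes a world frame with a vertical z axis, measures floor unevenness as the mean squared deviation of floor heights from their mean, and represents the floorplan by hypothetical 2D wall segments that are already expressed in scan coordinates.

```python
import math


def floor_unevenness(floor_points):
    """Mean squared deviation of floor heights from their mean (z assumed vertical)."""
    zs = [p[2] for p in floor_points]
    mean_z = sum(zs) / len(zs)
    return sum((z - mean_z) ** 2 for z in zs) / len(zs)


def point_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the segment with endpoints a and b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Clamp the projection parameter so the closest point stays on the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def walls_divergence(wall_points, floorplan_segments):
    """Mean squared distance from wall points (projected to the ground plane)
    to the nearest floorplan wall segment."""
    total = 0.0
    for x, y, _z in wall_points:
        d = min(point_segment_distance((x, y), a, b) for a, b in floorplan_segments)
        total += d * d
    return total / len(wall_points)


floor = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.2), (0.0, 1.0, -0.2)]
walls = [(1.0, 0.1, 1.0), (5.0, 0.0, 2.0)]
segments = [((0.0, 0.0), (4.0, 0.0))]
floor_cost = floor_unevenness(floor)
walls_cost = walls_divergence(walls, segments)
```

Both quantities are zero for a perfectly flat floor and for walls lying exactly on the floorplan, so adding them to a BA objective penalizes the unevenness and divergence described above.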

II. RELATED WORK

We propose a ﬂoorplan-aware camera poses reﬁnement

method which extends BA. We aim to align the scan with the

ﬂoorplan and also improve geometric consistency. Besides,

we rely on semantic segmentation to detect a ﬂoor and walls

in the scan. Therefore, we review existing formulations of

geometric consistency, semantic-based pose reﬁnement, and

ﬂoorplan-aware 3D reconstruction.

A. Geometric Consistency

The reconstructed scan should be geometrically consistent, so

scan optimization (known as BA) minimizes the discrepancy

between different measurements. The BA term that reﬂects

geometric inconsistency can be formalized in various ways

depending on the input data, the model of a scene, and

possible applications. One of the most popular geometric

terms is based on reprojection error. However, reprojection-based

functions are not defined everywhere and exhibit singularities,

making the optimization process sensitive to initial conditions

and outliers.

arXiv:2210.04572v1 [cs.CV] 10 Oct 2022

Fig. 2: The scheme of the proposed camera poses refinement method. The novel modules and terms are colored turquoise. [Diagram blocks: posed RGB-D set (RGB images, depth maps, camera poses); segmentation; semantic maps; backprojection with depth maps and camera poses; floor point cloud; walls point cloud; floorplan; comparing walls with a floorplan; geometric term; floor term; walls term; bundle adjustment; refined camera poses.]

In alternative BA formulations [7],

[24], the cost function is based on the minimum distance

between the rays of cameras observing the same 3D point.

Other works incorporate depth into the BA cost function [16],

[22], [23], [29]. BA problems are by no means limited to

these formulations. Additional constraints might reﬂect the

scene structure for more complex scene models that include

semantics, planes, geometric primitives, or objects. For in-

stance, CPA-SLAM [15] models a scan with a set of planes

and penalizes the angle between normals of planes observed

from different frames. KDP-SLAM [9] extracts planes from

the fused depth maps, matches these planes iteratively, and

penalizes point-to-plane distances for points in the landmark

planes. In BAD SLAM, the scan is represented as surfels —

oriented 3D disks with visual descriptors; similar to CPA-

SLAM, the angle between surfel normals is minimized.
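Two of the structure-based quantities mentioned in this overview, the angle between normals and the point-to-plane distance, can be written down in a few lines. The Python sketch below only illustrates these generic geometric measures; the function names and the plane parameterization n·x + d = 0 are assumptions, and the code does not reproduce the implementation of CPA-SLAM, KDP-SLAM, or BAD SLAM.

```python
import math


def normal_angle(n1, n2):
    """Angle in radians between two non-zero normal vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norms = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    # Clamp to [-1, 1] to guard acos against rounding errors.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def point_to_plane_distance(point, normal, offset):
    """Signed distance from a 3D point to the plane normal . x + offset = 0."""
    norm = math.sqrt(sum(a * a for a in normal))
    return (sum(a * b for a, b in zip(normal, point)) + offset) / norm


angle = normal_angle((0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
distance = point_to_plane_distance((1.0, 2.0, 3.0), (0.0, 0.0, 2.0), -2.0)
```

A BA term built on either measure vanishes when observations of the same planar surface agree across frames.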

We do not build a special scene representation to enforce

geometric consistency in our approach. Instead, we penalize

the distance between the matched keypoints backprojected

to 3D space using depth maps. Such point-to-point error

calculated in 3D space increases the robustness of BA and

makes it possible to handle difficult configurations without incurring the

risks posed by a reprojection-based cost function.
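The point-to-point error described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the formulation used in the paper: it assumes a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy), camera-to-world poses given as a rotation matrix and a translation vector, and keypoints given as (u, v, depth) triples.

```python
def backproject(u, v, depth, intrinsics):
    """Backproject pixel (u, v) with known depth into the camera frame (pinhole model)."""
    fx, fy, cx, cy = intrinsics
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def to_world(pose, p):
    """Transform a camera-frame point to the world frame: R * p + t."""
    R, t = pose
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def point_to_point_error(kp_a, kp_b, pose_a, pose_b, intrinsics):
    """Squared 3D distance between two matched keypoints after backprojection."""
    pa = to_world(pose_a, backproject(*kp_a, intrinsics))
    pb = to_world(pose_b, backproject(*kp_b, intrinsics))
    return sum((a - b) ** 2 for a, b in zip(pa, pb))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (500.0, 500.0, 320.0, 240.0)
pose_a = (identity, (0.0, 0.0, 0.0))
pose_b = (identity, (1.0, 0.0, 0.0))  # camera B is shifted by 1 m along x

# Both keypoints observe the world point (0, 0, 2), so the error is zero
# when the poses are correct and grows when pose B is wrong.
consistent = point_to_point_error((320.0, 240.0, 2.0), (70.0, 240.0, 2.0), pose_a, pose_b, K)
drifted = point_to_point_error((320.0, 240.0, 2.0), (70.0, 240.0, 2.0), pose_a, pose_a, K)
```

Because the residual is computed directly in 3D, it stays well defined for any finite depth and pose, which is the property contrasted above with reprojection-based costs.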

B. Semantic-based Pose Reﬁnement

SLAM methods that estimate and reﬁne camera poses

might leverage semantic information in various ways: from

ignoring matched keypoints with different semantic labels [3]

to more inventive object-based approaches. For instance,

Frost et al. [6] add a BA term based on the size of

detected objects and show that it prevents scale drift over a

long trajectory. Other SLAM methods [1], [2], [13], [28]

exploit semantic segmentation to detect and remove potentially

moving objects. In our camera refinement approach, we are

interested in detecting structural elements rather than objects.

Speciﬁcally, we need the semantic labels to create ﬂoor and

walls point clouds used in reﬁnement.

C. Floorplan-Aware 3D Reconstruction

The ﬂoorplan can facilitate 3D reconstruction in various

applications. Howard et al. [8] use a floorplan-based 3D

model for indoor localization and estimate camera pose by

comparing image features and layout features calculated on

a grid. Wijmans et al. [27] align RGB-D panoramas of large

indoor scenes with a floorplan. Goran et al. [20] utilize a

floorplan in the grid-based Rao-Blackwellized particle filter

and show that initializing the internal grid with the floorplan

information allows obtaining a more precise 2D map of an

environment. Contrary to other works, Mielle et al. [17] do

not bind the SLAM map with the floorplan but match

the floorplan onto the SLAM map to complete missing

information and unexplored areas.

Rent3D [14] takes a ﬂoorplan and a set of RGB images

as inputs, estimates camera poses, and backprojects pixels

onto the generated coarse mesh. This approach provides a

non-realistic 3D model with objects projected onto surfaces;

moreover, it is limited to one-room scenes. Plan2Scene [26]

also constructs a 3D model, yet expands to multiple rooms

and generates more realistic surfaces via texture synthesis.

Either way, Rent3D scans lack furniture, and Plan2Scene

replaces scene objects with CAD models. In contrast, we

use a floorplan not to build a 3D model resembling the original

scene but to reconstruct the actual scene.

Overall, none of the existing methods address the problem

in the same formulation. Since we cannot compare with

competing approaches, we analyze each component of our

method: we expound the motivation, propose several design

choices for this component, and compare these choices

quantitatively and qualitatively in ablation studies.

III. METHOD

The pipeline of the proposed method is shown in Fig. 2.

Calculating our ﬂoorplan-aware BA cost function requires

additional steps: converting a ﬂoorplan image into a 3D
