Floorplan-Aware Camera Poses Refinement
Anna Sokolova1, Filipp Nikitin1, Anna Vorontsova1, Anton Konushin1
Abstract: Processing large indoor scenes is a challenging
task, as scan registration and camera trajectory estimation
methods accumulate errors across time. As a result, the quality
of reconstructed scans is insufficient for some applications, such
as visual-based localization and navigation, where the correct
position of walls is crucial.
For many indoor scenes, there exists an image of a technical
floorplan that contains information about the geometry and
main structural elements of the scene, such as walls, partitions,
and doors. We argue that such a floorplan is a useful source of
spatial information, which can guide a 3D model optimization.
The standard RGB-D 3D reconstruction pipeline consists of
a tracking module applied to an RGB-D sequence and a bundle
adjustment (BA) module that takes the posed RGB-D sequence
and corrects the camera poses to improve consistency. We
propose a novel optimization algorithm expanding conventional
BA that leverages the prior knowledge about the scene structure
in the form of a floorplan. Our experiments on the Redwood
dataset and our self-captured data demonstrate that utilizing
a floorplan improves the accuracy of 3D reconstructions.
I. INTRODUCTION
Restoring general scene structure formed with floor and
walls is complicated for multiple reasons. First, both floor
and walls are often textureless or covered with repetitive
patterns, so the keypoints cannot be detected or correctly
matched across different frames. Then, the floor and walls
might not superimpose after a loop closure in BA due to
the errors accumulated over time. Alternatively, the surfaces
might not match perfectly when aligning partial scans of
large-scale scenes. Either way, multiple duplicate layers
appear, corrupting the overall scan; we refer to this
unwanted effect as layering. In addition, each surface
might have hills and pits, worsening the visual impression;
we call this unevenness. Hence, the reconstructed scans come
out imperfect and should be additionally optimized.
Overall, no-reference approaches are limited by design, so
a significant improvement cannot be achieved without additional
information about the scene. We argue that a technical
floorplan of a scene is one of the most available, intuitive,
and easy-to-use sources of spatial data. Floorplans reflect
the general structure of the scene, so we can use them as
guidance during optimization, comparing the reconstructed
scan with a floorplan and penalizing their divergence.
Accordingly, we address the following problem: given a
posed RGB-D sequence and a floorplan, refine camera poses
so that the scan reconstructed using these poses is consistent
with the floorplan. We assume that we have a floorplan
image that depicts vertical architectural surfaces comprising
the general scene structure (Fig. 4). The coordinate transformation
(scale, shift, and rotation) between a scan and its
floorplan might be unknown.
1All authors are with Samsung AI Center, Moscow,
Russia, {a.sokolova, f.nikitin, a.vorontsova,
a.konushin}@samsung.com
Fig. 1: The reconstructed scan before (left) and after (right)
camera poses refinement with floorplan guidance. Through
refinement, the misplaced upper right room gets aligned with
the floorplan, and multiple reconstruction artifacts (marked
with red ellipses) decrease or disappear.
Typically, in scan reconstruction, camera poses are estimated
roughly and then refined using bundle adjustment
(BA). We propose a novel optimization algorithm that expands
BA using prior knowledge about the scene structure.
We assume that the floor surface is planar and that the scene
is bounded by planar walls matching the walls on the
floorplan. To obtain a scan that satisfies these requirements,
we impose additional constraints in BA. Specifically, we
apply semantic segmentation to select points corresponding
to floor and walls and penalize floor unevenness and the
divergence between the walls and the floorplan.
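To make the two additional constraints concrete, the sketch below computes a floor unevenness penalty and a walls-to-floorplan divergence for given point clouds. It is only a minimal illustration under assumptions that are not stated above: the floor plane is fitted by least squares (via SVD), the floorplan is given as a list of 2D wall segments already expressed in scan coordinates, the vertical axis is z, and both penalties are mean squared distances. The function names `floor_term`, `walls_term`, and `point_to_segments` are hypothetical.

```python
import numpy as np

def floor_term(floor_pts):
    """Mean squared distance of floor points (N, 3), N >= 3, to their best-fit plane."""
    centered = floor_pts - floor_pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return float(np.mean((centered @ normal) ** 2))

def point_to_segments(p, seg_a, seg_b):
    """Distances from a 2D point p to each segment (seg_a[i], seg_b[i])."""
    ab = seg_b - seg_a
    # Projection parameter of p onto each segment, clamped to the segment ends.
    t = np.einsum("ij,ij->i", p - seg_a, ab) / np.einsum("ij,ij->i", ab, ab)
    t = np.clip(t, 0.0, 1.0)
    closest = seg_a + t[:, None] * ab
    return np.linalg.norm(p - closest, axis=1)

def walls_term(wall_pts, seg_a, seg_b):
    """Mean squared distance from wall points (N, 3), projected onto the
    xy-plane, to the nearest floorplan wall segment (assumed representation)."""
    xy = wall_pts[:, :2]
    d = np.array([point_to_segments(p, seg_a, seg_b).min() for p in xy])
    return float(np.mean(d ** 2))
```

In such a setup, both values would be added to the geometric BA cost with some weights, so that the optimizer trades off keypoint consistency against floor flatness and agreement with the floorplan.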
II. RELATED WORK
We propose a floorplan-aware camera poses refinement
method which extends BA. We aim to align the scan with the
floorplan and also improve geometric consistency. Besides,
we rely on semantic segmentation to detect a floor and walls
in the scan. Therefore, we review existing formulations of
geometric consistency, semantic-based pose refinement, and
floorplan-aware 3D reconstruction.
A. Geometric Consistency
The reconstructed scan should be geometrically consistent, so
scan optimization (known as BA) minimizes the discrepancy
between different measurements. The BA term that reflects
geometric inconsistency can be formalized in various ways
depending on the input data, the model of a scene, and
possible applications. One of the most popular geometric
terms is based on reprojection error. However, reprojection-based
functions are not defined everywhere and exhibit singularities,
making the optimization process sensitive to initial
conditions and outliers. In alternative BA formulations [7],
[24], the cost function is based on the minimum distance
between the rays of cameras observing the same 3D point.
Other works incorporate depth into the BA cost function [16],
[22], [23], [29]. BA problems are by no means limited to
these formulations. Additional constraints might reflect the
scene structure for more complex scene models that include
semantics, planes, geometric primitives, or objects. For instance,
CPA-SLAM [15] models a scan with a set of planes
and penalizes the angle between normals of planes observed
from different frames. KDP-SLAM [9] extracts planes from
the fused depth maps, matches these planes iteratively, and
penalizes point-to-plane distances for points in the landmark
planes. In BAD SLAM, the scan is represented as surfels,
that is, oriented 3D disks with visual descriptors; similar to
CPA-SLAM, the angle between surfel normals is minimized.
arXiv:2210.04572v1 [cs.CV] 10 Oct 2022
Fig. 2: The scheme of the proposed camera poses refinement method. The novel modules and terms are colored turquoise.
[Diagram blocks: Posed RGB-D set (RGB images, depth maps, camera poses); Segmentation; Semantic maps; Backprojection w/ depth maps and camera poses; Floor point cloud; Walls point cloud; Floorplan; Comparing walls with a floorplan; Geometric term; Floor term; Walls term; Bundle adjustment; Refined camera poses.]
We do not build a special scene representation to enforce
geometric consistency in our approach. Instead, we penalize
the distance between the matched keypoints backprojected
to 3D space using depth maps. Such point-to-point error
calculated in 3D space increases the robustness of BA and
allows handling difficult configurations without incurring the
risks posed by a reprojection-based cost function.
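The point-to-point error described above can be sketched as follows. This is a minimal illustration rather than the exact cost used in our experiments: it assumes a pinhole camera with intrinsics matrix `K`, 4x4 camera-to-world poses, depth measured along the optical axis, and a plain sum of squared distances without robust weighting. The names `backproject` and `point_to_point_error` are hypothetical.

```python
import numpy as np

def backproject(uv, depth, K, T_wc):
    """Lift pixels uv (N, 2) with depths (N,) to world points (N, 3).

    K is a 3x3 pinhole intrinsics matrix (assumed camera model);
    T_wc is a 4x4 camera-to-world pose.
    """
    ones = np.ones((uv.shape[0], 1))
    # Rays in the camera frame, scaled so that their z-coordinate is 1.
    rays = (np.linalg.inv(K) @ np.hstack([uv, ones]).T).T
    pts_cam = rays * depth[:, None]
    return pts_cam @ T_wc[:3, :3].T + T_wc[:3, 3]

def point_to_point_error(uv_i, d_i, T_i, uv_j, d_j, T_j, K):
    """Sum of squared 3D distances between matched keypoints of frames i and j."""
    diff = backproject(uv_i, d_i, K, T_i) - backproject(uv_j, d_j, K, T_j)
    return float(np.sum(diff ** 2))
```

Unlike a reprojection error, this quantity involves no division by the depth of a point in the other camera's frame, so it stays finite for any pose estimate; this is the property that motivates the 3D formulation.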
B. Semantic-based Pose Refinement
SLAM methods that estimate and refine camera poses
might leverage semantic information in various ways: from
ignoring matched keypoints with different semantic labels [3]
to more inventive object-based approaches. For instance,
Frost et al. [6] add a BA term based on the size of
detected objects and show that it prevents scale drift over a
long trajectory. Other SLAM methods [1], [2], [13], [28]
exploit semantic segmentation to detect or remove potentially
moving objects. In our camera refinement approach, we are
interested in detecting structural elements rather than objects.
Specifically, we need the semantic labels to create floor and
walls point clouds used in refinement.
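As an illustration of how semantic labels can yield such point clouds, the sketch below selects the pixels of one class in a single frame and backprojects them into world coordinates. It is a simplified example under assumptions: the label ids `FLOOR` and `WALL` are hypothetical, the camera is a pinhole with intrinsics `K`, the pose `T_wc` is camera-to-world, and zero depth marks a missing measurement.

```python
import numpy as np

FLOOR, WALL = 1, 2  # hypothetical label ids of the segmentation model

def labeled_point_cloud(depth, labels, label_id, K, T_wc):
    """World-space points (N, 3) for pixels that carry the given semantic
    label and have a valid (positive) depth value."""
    v, u = np.nonzero((labels == label_id) & (depth > 0))
    z = depth[v, u]
    # Pinhole backprojection of pixel (u, v) at depth z.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)
    return pts_cam @ T_wc[:3, :3].T + T_wc[:3, 3]
```

Calling the function with `FLOOR` and `WALL` for every frame and concatenating the results would give the floor and walls point clouds; since the points depend on `T_wc`, any cost computed on them is a function of the camera poses being refined.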
C. Floorplan-Aware 3D Reconstruction
The floorplan can facilitate 3D reconstruction in various
applications. Howard et al. [8] use a floorplan-based 3D
model for indoor localization and estimate the camera pose by
comparing image features and layout features calculated on
a grid. Wijmans et al. [27] align RGB-D panoramas of large
indoor scenes with a floorplan. Goran et al. [20] utilize a
floorplan in the grid-based Rao-Blackwellized particle filter
and show that initializing the internal grid with the floorplan
information allows obtaining a more precise 2D map of an
environment. Contrary to other works, Mielle et al. [17] do
not bind the SLAM map to the floorplan but match
the floorplan onto the SLAM map to complete missing
information and unexplored areas.
Rent3D [14] takes a floorplan and a set of RGB images
as inputs, estimates camera poses, and backprojects pixels
onto the generated coarse mesh. This approach provides a
non-realistic 3D model with objects projected onto surfaces;
moreover, it is limited to one-room scenes. Plan2Scene [26]
also constructs a 3D model, yet expands to multiple rooms
and generates more realistic surfaces via texture synthesis.
Either way, Rent3D scans lack furniture, and Plan2Scene
replaces scene objects with CAD models. In contrast, we
use a floorplan not to build a 3D model resembling the original
scene but to reconstruct the actual scene.
Overall, none of the existing methods address the problem
in the same formulation. Since we cannot compare with
competing approaches, we analyze each component of our
method: we expound the motivation, propose several design
choices for this component, and compare these choices
quantitatively and qualitatively in ablation studies.
III. METHOD
The pipeline of the proposed method is shown in Fig. 2.
Calculating our floorplan-aware BA cost function requires
additional steps: converting a floorplan image into a 3D