
Floorplan-Aware Camera Poses Refinement
Anna Sokolova1, Filipp Nikitin1, Anna Vorontsova1, Anton Konushin1
Abstract— Processing large indoor scenes is a challenging
task, as scan registration and camera trajectory estimation
methods accumulate errors over time. As a result, the quality
of reconstructed scans is insufficient for some applications, such
as visual-based localization and navigation, where the correct
position of walls is crucial.
For many indoor scenes, there exists an image of a technical
floorplan that contains information about the geometry and
main structural elements of the scene, such as walls, partitions,
and doors. We argue that such a floorplan is a useful source of
spatial information, which can guide 3D model optimization.
The standard RGB-D 3D reconstruction pipeline consists of
a tracking module applied to an RGB-D sequence and a bundle
adjustment (BA) module that takes the posed RGB-D sequence
and corrects the camera poses to improve consistency. We
propose a novel optimization algorithm expanding conventional
BA that leverages the prior knowledge about the scene structure
in the form of a floorplan. Our experiments on the Redwood
dataset and our self-captured data demonstrate that utilizing
a floorplan improves the accuracy of 3D reconstructions.
I. INTRODUCTION
Restoring the general scene structure formed by the floor and
walls is complicated for multiple reasons. First, both floor
and walls are often textureless or covered with repetitive
patterns, so the keypoints cannot be detected or correctly
matched across different frames. Second, the floor and walls
might not superimpose after a loop closure in BA due to
the errors accumulated over time. Alternatively, the surfaces
might not match perfectly when aligning partial scans of
large-scale scenes. Either way, multiple duplicate layers
appear, corrupting the overall scan; we refer to this
unwanted effect as layering. In addition, each surface
might have hills and pits, worsening the visual impression;
we call this unevenness. Hence, the reconstructed scans come
out imperfect and should be additionally optimized.
Overall, no-reference approaches are limited by design, so
a significant improvement cannot be achieved without addi-
tional information about the scene. We argue that a technical
floorplan of a scene is one of the most available, intuitive,
and easy-to-use sources of spatial data. Floorplans reflect
the general structure of the scene, so we can use them as
guidance during optimization, comparing the reconstructed
scan with a floorplan and penalizing their divergence.
1All authors are with Samsung AI Center, Moscow,
Russia, {a.sokolova, f.nikitin, a.vorontsova,
a.konushin}@samsung.com
Fig. 1: The reconstructed scan before (left) and after (right)
camera poses refinement with floorplan guidance. Through
refinement, the misplaced upper right room gets aligned with
the floorplan, and multiple reconstruction artifacts (marked
with red ellipses) decrease or disappear.
Accordingly, we address the following problem: given a
posed RGB-D sequence and a floorplan, refine camera poses
so that the scan reconstructed using these poses is consistent
with the floorplan. We assume that we have a floorplan
image that depicts vertical architectural surfaces comprising
the general scene structure (Fig. 4). The coordinate
transformation (scale, shift, and rotation) between a scan and
its floorplan might be unknown.
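To make the scale, shift, and rotation between a scan and its floorplan
concrete, the following is a minimal sketch of a 2D similarity transform
and a least-squares estimate of its parameters. It assumes that a few
matched 2D points between the scan (projected onto the ground plane) and
the floorplan are available; this section does not state how the
alignment is actually obtained, so the correspondences and the function
names apply_similarity_2d and estimate_similarity_2d are hypothetical.

```python
import numpy as np


def apply_similarity_2d(points_xy, scale, theta, shift):
    """Map Nx2 scan points into floorplan coordinates: scale * R(theta) @ p + shift."""
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return scale * (points_xy @ rotation.T) + np.asarray(shift, dtype=float)


def estimate_similarity_2d(src_xy, dst_xy):
    """Least-squares scale, angle, and shift from matched Nx2 points (assumed given).

    A 2D similarity is w = a * z + b over complex numbers, where
    |a| is the scale and arg(a) is the rotation angle.
    """
    z = src_xy[:, 0] + 1j * src_xy[:, 1]
    w = dst_xy[:, 0] + 1j * dst_xy[:, 1]
    zc, wc = z - z.mean(), w - w.mean()
    # np.vdot conjugates its first argument, giving the complex least-squares slope.
    a = np.vdot(zc, wc) / np.vdot(zc, zc)
    b = w.mean() - a * z.mean()
    return float(np.abs(a)), float(np.angle(a)), np.array([b.real, b.imag])


# Usage: recover a known transform from synthetic correspondences.
scan_pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
plan_pts = apply_similarity_2d(scan_pts, scale=2.0, theta=0.3, shift=[5.0, -1.0])
est_scale, est_theta, est_shift = estimate_similarity_2d(scan_pts, plan_pts)
```

With noise-free correspondences the estimate reproduces the transform exactly up to floating-point error; with real data it would only serve as an initial guess.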
Typically, in scan reconstruction, camera poses are estimated
roughly and then refined using bundle adjustment (BA).
We propose a novel optimization algorithm that expands
BA using prior knowledge about the scene structure.
We assume that the floor surface is planar and that the scene
is bounded by planar walls matching the walls on the
floorplan. To obtain a scan that satisfies these requirements,
we impose additional constraints in BA. Specifically, we
apply semantic segmentation to select points corresponding
to floor and walls and penalize floor unevenness and the
divergence between the walls and the floorplan.
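The two penalties described above can be illustrated with a small
sketch. It is not the formulation used in this paper, which is not given
in this section: here floor unevenness is assumed to be the mean squared
distance of floor points to their best-fit plane, and wall divergence is
assumed to be the mean squared distance from ground-projected wall
points to the nearest floorplan wall segment. The function names and the
segment representation of the floorplan are hypothetical.

```python
import numpy as np


def floor_unevenness(floor_points):
    """Mean squared distance of Nx3 floor points to their least-squares plane."""
    centered = floor_points - floor_points.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return float(np.mean((centered @ normal) ** 2))


def point_to_segment_distance(points_xy, seg_start, seg_end):
    """Distance from each Nx2 point to a single 2D line segment."""
    direction = seg_end - seg_start
    # Projection parameter along the segment, clamped to its endpoints.
    t = np.clip((points_xy - seg_start) @ direction / (direction @ direction), 0.0, 1.0)
    closest = seg_start + t[:, None] * direction
    return np.linalg.norm(points_xy - closest, axis=1)


def wall_divergence(wall_points_xy, segments):
    """Mean squared distance from wall points to the nearest floorplan segment."""
    dists = np.stack([
        point_to_segment_distance(wall_points_xy, np.asarray(a, float), np.asarray(b, float))
        for a, b in segments
    ])
    return float(np.mean(dists.min(axis=0) ** 2))


# Usage: a perfectly flat floor and two wall points near one floorplan wall.
flat_floor = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
walls_xy = np.array([[1.0, 0.5], [5.0, 0.0]])
plan_segments = [((0.0, 0.0), (4.0, 0.0))]
floor_cost = floor_unevenness(flat_floor)
wall_cost = wall_divergence(walls_xy, plan_segments)
```

In a BA setting, terms of this kind would be evaluated on points selected by semantic segmentation and added to the usual consistency objective, so that camera poses are pushed toward a flat floor and walls that coincide with the floorplan.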
II. RELATED WORK
We propose a floorplan-aware camera poses refinement
method which extends BA. We aim to align the scan with the
floorplan and also improve geometric consistency. Besides,
we rely on semantic segmentation to detect a floor and walls
in the scan. Therefore, we review existing formulations of
geometric consistency, semantic-based pose refinement, and
floorplan-aware 3D reconstruction.
A. Geometric Consistency
The reconstructed scan should be geometrically consistent, so
scan optimization (known as BA) minimizes the discrepancy
between different measurements. The BA term that reflects
geometric inconsistency can be formalized in various ways
depending on the input data, the model of a scene, and
possible applications. One of the most popular geometric
terms is based on reprojection error. However, reprojection-
based functions are not defined everywhere and exhibit sin-
gularities, making the optimization process sensitive to initial
arXiv:2210.04572v1 [cs.CV] 10 Oct 2022