GPR-NET: MULTI-VIEW LAYOUT ESTIMATION VIA A GEOMETRY-AWARE PANORAMA REGISTRATION NETWORK
Jheng-Wei Su1, Chi-Han Peng2, Peter Wonka3, Hung-Kuo Chu4
1,4National Tsing Hua University
2National Yang Ming Chiao Tung University
3King Abdullah University of Science and Technology (KAUST)
1jhengweisu@gapp.nthu.edu.tw, 2pengchihan@nycu.edu.tw
3pwonka@gmail.com, 4hkchu@cs.nthu.edu.tw
ABSTRACT
Reconstructing 3D layouts from multiple 360° panoramas has received increasing attention recently, as estimating a complete layout of a large-scale and complex room from a single panorama is very difficult. The state-of-the-art method, called PSMNet (Wang et al., 2022), introduces the first learning-based framework that jointly estimates the room layout and registration given a pair of panoramas. However, PSMNet relies on an approximate (i.e., "noisy") registration as input. Obtaining this input requires a solution for wide-baseline registration, which is a challenging problem. In this work, we present a complete multi-view panoramic layout estimation framework that jointly learns panorama registration and layout estimation given a pair of panoramas without relying on a pose prior. The major improvement over PSMNet comes from a novel Geometry-aware Panorama Registration Network, or GPR-Net, that effectively tackles the wide-baseline registration problem by exploiting the layout geometry and computing fine-grained correspondences on the layout boundaries instead of in the global pixel space. Our architecture consists of two parts. First, given two panoramas, we adopt a vision transformer to learn a set of 1D horizon features sampled on the panorama. These 1D horizon features encode the depths of individual layout boundary samples and the correspondence and covisibility maps between layout boundaries. We then exploit a non-linear registration module to convert these 1D horizon features into a set of corresponding 2D boundary points on the layout. Finally, we estimate the final relative camera pose via RANSAC and obtain the complete layout simply by taking the union of the registered layouts. Experimental results indicate that our method achieves state-of-the-art performance in both panorama registration and layout estimation on the large-scale indoor panorama dataset ZInD (Cruz et al., 2021).
1 INTRODUCTION
In this paper, we tackle the problem of room layout estimation from multiple 360° panoramas. Many approaches that can estimate room layouts from a single panorama have been proposed (Zou et al., 2018; Yang et al., 2019; Sun et al., 2019; Pintore et al., 2020). However, these methods do not take advantage of "multi-view" data, in which multiple panoramas are taken to better capture a single room. Such data are actually common, as evidenced by several indoor datasets such as ZInD (Cruz et al., 2021), Matterport3D (Chang et al., 2017), Gibson (Xia et al., 2018), and Structure3D (Zheng et al., 2020), in which photographers often take multiple panoramas to better capture complex, non-convex rooms that would be partially occluded from a single location.
arXiv:2210.11419v2 [cs.CV] 21 Oct 2022

Our work mainly improves upon a recent paper, PSMNet (Wang et al., 2022), which tackled the problem of layout estimation from two panoramas captured in the same room. The idea of PSMNet is to build an architecture that first registers the two panoramas in their ceiling-view projections and then jointly estimates a 2D layout segmentation. An important aspect of their architecture is that the layout estimation and registration can be trained jointly. However, a major limitation of PSMNet (also mentioned in their paper) is that the architecture relies on an initial approximate registration. The authors argued that such an approximate registration could be given either manually or computed by external methods such as Structure-from-Motion (SfM) methods or Shabani et al. (2021). While a manual registration may work, the method would no longer be automatic. When experimenting with existing methods for approximate registration, we observed that they frequently make registration errors and even fail to provide a registration in a substantial number of cases. The main reason is that the required registration mainly falls into the category of wide-baseline registration with only two given images. For example, our results show that the state-of-the-art SfM method OpenMVG (Moulon et al., 2016) fails to register 76% of the panorama pairs in our test dataset. It is thus impractical to assume an independent algorithm can reliably provide an approximate solution to the challenging wide-baseline registration problem. In addition, relying on such an algorithm moves a critical part of the problem to a pre-process.
Therefore, we set out to develop a complete multi-view panorama registration and layout estimation framework that no longer relies on an approximate registration given as input, as shown in Figure 1. To achieve this, we propose a novel Geometry-aware Panorama Registration Network, or GPR-Net, based on the following design ideas. First, our experiments indicate that a global (pixel-space) registration that directly regresses pose parameters (i.e., translation and rotation) is too ambitious. Instead, we propose to compute more fine-grained correspondences in a different space. Specifically, GPR-Net conceptually samples the layout boundaries of the two input layouts and computes features for the sampled locations. For each boundary sample in each of the two panoramas, it estimates the distance from the camera (depth). In addition, it estimates the correspondence map from the samples in the first panorama to the second panorama, and a covisibility map describing whether a sample in the first panorama is visible in the second panorama. Each of these maps (depth, correspondence, and covisibility) is a 1D sequence of values.
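Concretely, these horizon features can be viewed as three parallel fixed-length 1D arrays indexed by equiangular column samples. The sketch below is illustrative only: the sample count matches the paper, but the random values merely stand in for the network's predictions.

```python
import numpy as np

N = 256  # boundary samples per panorama
rng = np.random.default_rng(0)

# Placeholder values standing in for the network's predicted maps.
depth = rng.uniform(1.0, 5.0, N)        # horizon-depth: camera-to-boundary distance per sample
corr = rng.uniform(0.0, 1.0, N)         # horizon-correspondence: normalized column in the other panorama
covis = rng.uniform(0.0, 1.0, N) > 0.5  # horizon-covisibility: is this sample visible in the other view?

# All three maps are 1D sequences over the same equiangular samples.
assert depth.shape == corr.shape == covis.shape == (N,)
```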
This representation has the advantage of providing more elements to register (e.g., 256 samples per panorama) and a denser supervision signal for fine-grained estimation, which leads to better learning performance. Second, we build a non-linear registration module to compute the final relative camera pose. The module combines the two horizon-depth maps with the horizon-correspondence and horizon-covisibility maps to obtain a set of covisible corresponding boundary samples in a 2D coordinate system aligned with the ceiling plane, followed by a RANSAC-based pose estimation. Note that this non-linear space is more expressive and can encode a richer range of maps between two panoramas. The final complete layout is obtained simply by taking the union of the two registered layouts.
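As a rough sketch of this registration step (not the paper's exact implementation): converting a horizon-depth map to ceiling-plane points is a polar-to-Cartesian transform, and the covisible correspondences then admit a standard two-point RANSAC over 2D rigid fits (Kabsch). The function names, iteration count, and inlier threshold below are illustrative assumptions.

```python
import numpy as np

def depth_to_xy(depth):
    """Polar-to-Cartesian: turn a 1D horizon-depth map into 2D boundary
    points on the ceiling plane, one point per equiangular sample."""
    n = len(depth)
    theta = 2.0 * np.pi * np.arange(n) / n
    return np.stack([depth * np.cos(theta), depth * np.sin(theta)], axis=1)

def rigid_fit(src, dst):
    """Least-squares 2D rigid transform (R, t) with dst ~ src @ R.T + t
    (Kabsch / orthogonal Procrustes)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    return R, cd - R @ cs

def ransac_pose(pts_a, pts_b, iters=200, thresh=0.1, seed=0):
    """Estimate the relative pose from corresponding boundary points via
    minimal two-point samples; thresh is an assumed inlier radius."""
    rng = np.random.default_rng(seed)
    best_R, best_t, best_inliers = np.eye(2), np.zeros(2), -1
    for _ in range(iters):
        idx = rng.choice(len(pts_a), size=2, replace=False)
        R, t = rigid_fit(pts_a[idx], pts_b[idx])
        inliers = int((np.linalg.norm(pts_a @ R.T + t - pts_b, axis=1) < thresh).sum())
        if inliers > best_inliers:
            best_R, best_t, best_inliers = R, t, inliers
    return best_R, best_t
```

A final least-squares refit on all inliers would typically follow the loop; it is omitted here for brevity.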
We extensively validate our model by comparing with the state-of-the-art panorama registration method and multi-view layout estimation method on the large-scale indoor panorama dataset ZInD (Cruz et al., 2021). The experimental results demonstrate that our model is superior to competing methods, achieving a significant performance boost in both panorama registration accuracy (mAA@5: +68.5% (rotation), +63.0% (translation); mAA@10: +74.1% (rotation), +72.3% (translation)) and layout reconstruction accuracy (2D IoU: +4.5%).
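The mAA (mean Average Accuracy) metric at a threshold is commonly computed by averaging, over error thresholds from 1 degree up to the maximum, the fraction of test pairs whose rotation (or translation) error falls below each threshold. The exact threshold sampling is an assumption here, but a typical definition looks like:

```python
import numpy as np

def mAA(errors, max_thresh):
    """Mean Average Accuracy: average, over integer thresholds
    t = 1..max_thresh, of the fraction of pairs with error < t."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean([(errors < t).mean() for t in range(1, max_thresh + 1)]))
```

For example, errors of [0.5, 1.5, 11.0] degrees give mAA@5 = 0.6: the first pair counts at every threshold, the second from t = 2 onward, and the third never.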
In summary, our contributions are as follows:
• We propose the first complete multi-view panoramic layout estimation framework. Our architecture jointly learns the layout and registration from data, is end-to-end trainable, and, most importantly, does not rely on a pose prior.
• We devise a novel panorama registration framework that effectively tackles the wide-baseline registration problem by exploiting the layout geometry and computing a fine-grained correspondence of samples on the layout boundaries.
• We achieve state-of-the-art performance on the ZInD (Cruz et al., 2021) dataset for both the stereo panorama registration and layout reconstruction tasks.
Figure 1: Proposed multi-view panorama registration and layout estimation framework. Given two panorama images, a neural network (GPR-Net) jointly predicts layout boundary correspondences and individual layouts for the two panoramas (Sections 3.1 and 3.2). The correspondences are fed to a registration module to compute the relative camera pose (T, R) (Section 3.3). A layout fusion module then computes a unified 3D layout given the camera pose and the individual layouts (Section 3.4).

2 RELATED WORK

2.1 SINGLE-VIEW ROOM LAYOUT ESTIMATION

There exist many methods to estimate the room layout from just a single image taken inside an indoor environment. Methods that take only one perspective image include earlier attempts that relied on image cues and optimization (Hedau et al., 2009; Hoiem et al., 2007; Ramalingam & Brand, 2013) and later neural networks (Lee et al., 2017; Yan et al., 2020). Capturing the increasing availability and popularity of full 360° panoramic images, the seminal work of Zhang et al. (2014) proposed to take panoramas as native inputs for scene understanding. Recently, several methods were proposed to predict room layouts from a single panorama using neural networks. A major difference between these methods is the assumption on the shape of the room layout, ranging from strictly cuboid (Zou et al., 2018) and Manhattan-world (Yang et al., 2019; Sun et al., 2019) to, more recently, general 2D layouts (Atlanta world) (Pintore et al., 2020). For our work, we adopt the Manhattan assumption because more corresponding data is available. See Zou et al. (2021) for a thorough survey on predicting Manhattan room layouts from a single panorama. More recent methods delivered state-of-the-art performance by transforming the problem into a depth-estimation one (Wang et al., 2022) or by leveraging powerful transformer-based network architectures (Jiang et al., 2022). Although these single-view methods perform well on cuboid and L-shaped rooms, they tend to fail in large-scale, complex, and non-convex rooms where a single panorama covers only part of the whole space due to occlusion.
2.2 PANORAMA REGISTRATION

Image registration, i.e., finding the transformations between the cameras of two or more images taken of the same scene, is a key component of Structure-from-Motion (SfM). See Özyeşil et al. (2017) for a recent survey and Hartley & Zisserman (2003) for an extensive study. Registration problems can be categorized by: 1) the assumptions about the camera model, e.g., perspective (pinhole camera), weak-perspective, or orthographic; 2) the assumptions about the transformation, e.g., rigid, affine, or general non-rigid; and 3) the types of image inputs, e.g., perspective images or full 360° panoramas, with or without depths. In addition, the difficulty differs greatly depending on whether the images are taken densely or sparsely. Modern takes on registration problems often leverage state-of-the-art programs/libraries such as COLMAP (Schönberger & Frahm, 2016) and OpenMVG (Moulon et al., 2016). Our problem falls into a lesser-studied category: registering rigid transforms between sparse panoramas. While there exist methods that tackle sparse perspective image inputs (Salaün et al., 2017; Fabbri et al., 2020) and methods that handle panoramas natively (Pagani & Stricker, 2011; Taneja et al., 2012; Ji et al., 2020), our results show that we can improve upon the state-of-the-art panorama registration methods in our sparse-view setting. A key bottleneck is that traditional SfM methods fail to handle the wide-baseline registration problem, where the views are far apart from each other. In Shabani et al. (2021), SfM of extremely sparse panoramas was tackled by matching room types and specific elements such as doors and windows. In contrast to previous methods that perform the registration in the global pixel space, we propose a novel learning-based panorama registration framework that directly computes the registration between two panoramas without taking any prior knowledge as input. Our method may have some similarities to the ECCV 2022 paper (Hutchcroft