GPR-NET: MULTI-VIEW LAYOUT ESTIMATION VIA A GEOMETRY-AWARE PANORAMA REGISTRATION NETWORK

Jheng-Wei Su¹, Chi-Han Peng², Peter Wonka³, Hung-Kuo Chu⁴

¹,⁴ National Tsing Hua University
² National Yang Ming Chiao Tung University
³ King Abdullah University of Science and Technology (KAUST)

¹ jhengweisu@gapp.nthu.edu.tw, ² pengchihan@nycu.edu.tw, ³ pwonka@gmail.com, ⁴ hkchu@cs.nthu.edu.tw

ABSTRACT

Reconstructing 3D layouts from multiple 360° panoramas has received increasing attention recently, as estimating a complete layout of a large-scale and complex room from a single panorama is very difficult. The state-of-the-art method, PSMNet (Wang et al., 2022), introduces the first learning-based framework that jointly estimates the room layout and registration given a pair of panoramas. However, PSMNet relies on an approximate (i.e., "noisy") registration as input. Obtaining this input requires a solution for wide-baseline registration, which is a challenging problem. In this work, we present a complete multi-view panoramic layout estimation framework that jointly learns panorama registration and layout estimation given a pair of panoramas without relying on a pose prior. The major improvement over PSMNet comes from a novel Geometry-aware Panorama Registration Network, or GPR-Net, that effectively tackles the wide-baseline registration problem by exploiting the layout geometry and computing fine-grained correspondences on the layout boundaries instead of in the global pixel space. Our architecture consists of two parts. First, given two panoramas, we adopt a vision transformer to learn a set of 1D horizon features sampled on the panorama. These 1D horizon features encode the depths of individual layout boundary samples and the correspondence and covisibility maps between layout boundaries. We then exploit a non-linear registration module to convert these 1D horizon features into a set of corresponding 2D boundary points on the layout. Finally, we estimate the final relative camera pose via RANSAC and obtain the complete layout simply by taking the union of the registered layouts. Experimental results indicate that our method achieves state-of-the-art performance in both panorama registration and layout estimation on a large-scale indoor panorama dataset, ZInD (Cruz et al., 2021).

1 INTRODUCTION

In this paper, we tackle the problem of room layout estimation from multiple 360° panoramas. Many approaches that can estimate room layouts from a single panorama have been proposed (Zou et al., 2018; Yang et al., 2019; Sun et al., 2019; Pintore et al., 2020). However, these methods do not take advantage of "multi-view" data, in which multiple panoramas are taken to better capture a single room. Such data is common, as evidenced by several indoor datasets such as ZInD (Cruz et al., 2021), Matterport3D (Chang et al., 2017), Gibson (Xia et al., 2018), and Structured3D (Zheng et al., 2020), in which photographers often take multiple panoramas to better capture complex, non-convex rooms that would be partially occluded from just a single location.

arXiv:2210.11419v2 [cs.CV] 21 Oct 2022

Our work mainly improves upon a recent paper, PSMNet (Wang et al., 2022), that tackled the problem of layout estimation from two panoramas captured in the same room. The idea of PSMNet is to build an architecture that first registers two panoramas in their ceiling-view projections and then jointly estimates a 2D layout segmentation. An important aspect of their architecture is that the layout estimation and registration can be trained jointly. However, a major limitation of PSMNet (also mentioned in their paper) is that the architecture relies on an initial approximate registration. The authors argued that such an approximate registration could either be given manually or be computed by external methods such as Structure from Motion (SfM) methods or the method of Shabani et al. (2021). While a manual registration may work, the method would no longer be automatic. When experimenting with existing methods for approximate registration, we observed that they frequently make registration errors and even fail to provide a registration in a substantial number of cases. The main reason is that the required registration mainly falls into the category of wide-baseline registration with only two given images. For example, our results show that the state-of-the-art SfM method OpenMVG (Moulon et al., 2016) fails to register 76% of the panorama pairs from our test dataset. It is thus impractical to assume an independent algorithm that can reliably give an approximate solution to the challenging wide-baseline registration problem. In addition, relying on such an algorithm moves a critical part of the problem to a pre-process.

Therefore, we set out to develop a complete multi-view panorama registration and layout estimation framework that no longer relies on an approximate registration given as input, as shown in Figure 1. To achieve this, we propose a novel Geometry-aware Panorama Registration Network, or GPR-Net, based on the following design ideas. First, our experiments indicate that a global (pixel-space) registration that directly regresses pose parameters (i.e., translation and rotation) is too ambitious. Instead, we propose to compute more fine-grained correspondences in a different space. Specifically, GPR-Net conceptually samples the layout boundaries of the two input layouts and computes features for the sampled locations. For each boundary sample in each of the two panoramas, it estimates the distance from the camera (depth). In addition, it estimates a correspondence map from the samples in the first panorama to the second panorama and a covisibility map describing whether a sample in the first panorama is visible in the second panorama. Each of these maps (depth, correspondence, and covisibility) is a 1D sequence of values.
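To make the 1D representation more concrete, the following is a minimal sketch of how such maps could be lifted to matched 2D points. It is not the authors' implementation. It assumes that each column of a panorama maps linearly to a horizontal viewing angle, that the depth value is the horizontal distance to the boundary, that the correspondence map stores a (possibly fractional) column index in the second panorama, and that covisibility is a score thresholded at 0.5. The function names and the axis convention are hypothetical.

```python
import math


def column_to_angle(col, num_cols):
    """Map a (possibly fractional) column index to a longitude in [-pi, pi)."""
    return (col + 0.5) / num_cols * 2.0 * math.pi - math.pi


def horizon_to_points(depth):
    """Lift a 1D horizon-depth sequence to 2D top-down points, one per column."""
    n = len(depth)
    points = []
    for i, d in enumerate(depth):
        a = column_to_angle(i, n)
        points.append((d * math.sin(a), d * math.cos(a)))
    return points


def sample_depth(depth, col):
    """Linearly interpolate depth at a fractional column, wrapping around 360 degrees."""
    n = len(depth)
    c = col % n
    i0 = int(math.floor(c)) % n
    i1 = (i0 + 1) % n
    t = c - math.floor(c)
    return (1.0 - t) * depth[i0] + t * depth[i1]


def matched_pairs(depth1, depth2, corr, covis, thresh=0.5):
    """Build (point in panorama 1, point in panorama 2) pairs for covisible samples.

    Assumed encoding: corr[i] is the column in panorama 2 that matches column i
    of panorama 1, and covis[i] is a visibility score in [0, 1].
    """
    n1, n2 = len(depth1), len(depth2)
    pairs = []
    for i in range(n1):
        if covis[i] < thresh:
            continue  # sample i is judged not visible in panorama 2
        a1 = column_to_angle(i, n1)
        p1 = (depth1[i] * math.sin(a1), depth1[i] * math.cos(a1))
        d2 = sample_depth(depth2, corr[i])
        a2 = column_to_angle(corr[i], n2)
        p2 = (d2 * math.sin(a2), d2 * math.cos(a2))
        pairs.append((p1, p2))
    return pairs
```

Each point is expressed in the top-down frame of its own camera, so the resulting pairs still differ by the unknown relative pose between the two panoramas.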

This representation has the advantages of having more elements to register (e.g., 256 samples per panorama) and a stronger supervision signal for fine-grained estimation, which leads to better learning performance. Second, we build a non-linear registration module to compute the final relative camera pose. The module combines the two horizon-depth maps with the horizon-correspondence and horizon-covisibility maps to obtain a set of covisible corresponding boundary samples in a 2D coordinate system aligned with the ceiling plane, followed by a RANSAC-based pose estimation. Note that this non-linear space is more expressive and can encode a richer range of maps between two panoramas. The final complete layout is obtained simply by taking the union of the two registered layouts.
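The RANSAC-based pose estimation step can be illustrated with a short sketch. This is a generic 2D rigid-transform RANSAC under stated assumptions, not the authors' registration module: it takes a list of matched 2D boundary point pairs in the ceiling-aligned plane, fits a rotation angle and translation to random minimal samples of two pairs using the closed-form least-squares solution, and refits on the largest inlier set. The function names, the iteration count, and the inlier tolerance are hypothetical choices.

```python
import math
import random


def fit_rigid_2d(pairs):
    """Least-squares rotation angle and translation mapping each p onto its q."""
    n = len(pairs)
    cpx = sum(p[0] for p, _ in pairs) / n
    cpy = sum(p[1] for p, _ in pairs) / n
    cqx = sum(q[0] for _, q in pairs) / n
    cqy = sum(q[1] for _, q in pairs) / n
    a = b = 0.0
    for (px, py), (qx, qy) in pairs:
        px, py, qx, qy = px - cpx, py - cpy, qx - cqx, qy - cqy
        a += px * qx + py * qy  # sum of dot products of centered points
        b += px * qy - py * qx  # sum of 2D cross products of centered points
    theta = math.atan2(b, a)
    c, s = math.cos(theta), math.sin(theta)
    t = (cqx - (c * cpx - s * cpy), cqy - (s * cpx + c * cpy))
    return theta, t


def apply_rigid(theta, t, p):
    """Rotate point p by theta and translate it by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


def ransac_rigid_2d(pairs, iters=200, tol=0.05, seed=0):
    """Estimate a 2D rigid transform that is robust to wrong correspondences."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        sample = rng.sample(pairs, 2)  # two pairs are the minimal sample in 2D
        theta, t = fit_rigid_2d(sample)
        inliers = [(p, q) for p, q in pairs
                   if math.dist(apply_rigid(theta, t, p), q) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("not enough inliers to estimate a pose")
    theta, t = fit_rigid_2d(best_inliers)  # refit on all inliers
    return theta, t, best_inliers
```

With the estimated transform, the boundary of one layout can be mapped into the frame of the other before the two layouts are merged. The union itself is omitted here because it needs a polygon library.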

We extensively validate our model by comparing it with the state-of-the-art panorama registration method and multi-view layout estimation method on a large-scale indoor panorama dataset, ZInD (Cruz et al., 2021). The experimental results demonstrate that our model is superior to competing methods, achieving a significant performance boost in both panorama registration accuracy (mAA@5°: +68.5% (rotation), +63.0% (translation); mAA@10°: +74.1% (rotation), +72.3% (translation)) and layout reconstruction accuracy (2D IoU: +4.5%).
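For readers unfamiliar with the mAA metric, the sketch below shows one common way to compute it. The precise definition used in the paper is not given in this section, so the code assumes the convention from the image-matching literature: accuracy is the fraction of pairs whose angular error is within a threshold, and mAA@t averages that accuracy over integer thresholds from 1° to t°. The function names are hypothetical.

```python
def angular_error_deg(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def mean_average_accuracy(errors_deg, max_thresh_deg):
    """Average the accuracy over integer thresholds 1..max_thresh_deg (assumed definition)."""
    thresholds = range(1, max_thresh_deg + 1)
    accuracies = [
        sum(e <= t for e in errors_deg) / len(errors_deg) for t in thresholds
    ]
    return sum(accuracies) / len(accuracies)
```

Under this convention, a method scores higher when more pairs fall under the threshold and when their errors are smaller, because small errors count toward every threshold.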

In summary, our contributions are as follows:

• We propose the first complete multi-view panoramic layout estimation framework. Our architecture jointly learns the layout and registration from data, is end-to-end trainable, and most importantly, does not rely on a pose prior.

• We devise a novel panorama registration framework to effectively tackle the wide-baseline registration problem by exploiting the layout geometry and computing a fine-grained correspondence of samples on the layout boundaries.

• We achieve state-of-the-art performance on the ZInD (Cruz et al., 2021) dataset for both the stereo panorama registration and layout reconstruction tasks.

2 RELATED WORK

2.1 SINGLE-VIEW ROOM LAYOUT ESTIMATION

There exist many methods to estimate room layouts from just a single image taken inside an indoor environment. Methods that take only one perspective image include earlier attempts that relied on image cues and optimization (Hedau et al., 2009; Hoiem et al., 2007; Ramalingam & Brand, 2013) and later neural networks (Lee et al., 2017; Yan et al., 2020). Capitalizing on the increasing availability and popularity of full 360° panoramic images, the seminal work by Zhang et al. (2014) proposed to take panoramas as native inputs for scene understanding. Recently, several methods were proposed to predict room layouts from a single panorama using neural networks. A major difference between these methods is the assumption on the shape of the room layouts, ranging from strictly a cuboid (Zou et al., 2018), to Manhattan world (Yang et al., 2019; Sun et al., 2019), to more recently general 2D layouts (Atlanta world) (Pintore et al., 2020). For our work, we choose to adopt the Manhattan assumption because more corresponding data is available. See Zou et al. (2021) for a thorough survey on predicting Manhattan room layouts from a single panorama. More recent methods delivered state-of-the-art performance by transforming the problem into a depth-estimation one (Wang et al., 2022) or by leveraging powerful transformer-based network architectures (Jiang et al., 2022). Although these single-view methods perform well in cuboid and L-shaped rooms, they tend to fail in large-scale, complex, and non-convex rooms where a single-view panorama covers only part of the whole space due to occlusion.

Figure 1: Proposed multi-view panorama registration and layout estimation framework. Given two panorama images, a neural network (GPR-Net) jointly predicts layout boundary correspondences and individual layouts for the two panoramas (Sections 3.1 and 3.2). The correspondences are fed to a registration module to compute the relative camera pose (T, R) (Section 3.3). A layout fusion module then computes a unified 3D layout given the camera pose and the individual layouts (Section 3.4).

2.2 PANORAMA REGISTRATION

Image registration, i.e., finding transformations between the cameras of two or multiple images taken of the same scene, is a key component of Structure-from-Motion (SfM). See Özyeşil et al. (2017) for a recent survey and Hartley & Zisserman (2003) for an extensive study. Registration problems can be categorized by: 1) the assumptions about the camera model, e.g., perspective (pinhole camera), weak-perspective, or orthographic, 2) the assumptions about the transformation, e.g., rigid, affine, or general non-rigid, and 3) the types of the image inputs, e.g., perspective images or full 360° panoramas, and with/without depths. In addition, the difficulty differs greatly depending on whether the images are taken densely or sparsely. Modern takes on registration problems often leverage state-of-the-art programs/libraries such as COLMAP (Schönberger & Frahm, 2016) and OpenMVG (Moulon et al., 2016). Our problem falls into a lesser-studied category: registering rigid transforms between sparse panoramas. While there exist methods that tackle sparse perspective image inputs (Salaün et al., 2017; Fabbri et al., 2020) and methods that handle panoramas natively (Pagani & Stricker, 2011; Taneja et al., 2012; Ji et al., 2020), our results show that we can improve upon the state-of-the-art panorama registration methods in our sparse-view setting. A key bottleneck is that traditional SfM methods fail to handle the wide-baseline registration problem where the views are far apart from each other. In Shabani et al. (2021), SfM of extremely sparse panoramas was tackled by matching room types and specific elements such as doors and windows. In contrast to previous methods that perform the registration in the global pixel space, we propose a novel learning-based panorama registration framework that directly computes the registration between two panoramas without taking any prior knowledge as input. Our method may have some similarities to the ECCV 2022 paper (Hutchcroft
