GraphCSPN: Geometry-Aware Depth
Completion via Dynamic GCNs
Xin Liu1, Xiaofei Shao2, Bo Wang2, Yali Li1, and Shengjin Wang1⋆
1Beijing National Research Center for Information Science and Technology (BNRist)
Department of Electronic Engineering, Tsinghua University
xinliu20@mails.tsinghua.edu.cn, {liyali13, wgsgj}@tsinghua.edu.cn
2Deptrum Ltd.
{xiaofei.shao, bo.wang}@deptrum.com
⋆ Corresponding author
Abstract. Image guided depth completion aims to recover per-pixel dense depth maps from sparse depth measurements with the help of aligned color images, a task with a wide range of applications from robotics to autonomous driving. However, the 3D nature of sparse-to-dense depth completion has not been fully explored by previous methods. In this work, we propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion. First, unlike previous methods, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning. In addition, the proposed network explicitly incorporates learnable geometric constraints to regularize the propagation process, which is performed in three-dimensional space rather than in the two-dimensional plane. Furthermore, we construct the graph from sequences of feature patches and update it dynamically with an edge attention module during propagation, so as to better capture both local neighboring features and global relationships over long distances. Extensive experiments on both the indoor NYU-Depth-v2 and outdoor KITTI datasets demonstrate that our method achieves state-of-the-art performance, especially in the case where only a few propagation steps are used. Code and models are available at https://github.com/xinliu20/GraphCSPN_ECCV2022.
Keywords: Depth completion, Graph neural network, Spatial propagation
1 Introduction
Depth perception plays an important role in various real-world applications of computer vision, such as navigation of robots [8,27] and autonomous vehicles [1,11], augmented reality [5,6], and 3D face recognition [16,35]. However, it is difficult to directly acquire dense depth maps using depth sensors, including LiDAR, time-of-flight, or structured-light based 3D cameras, either because of the inherent limitations of the hardware or because of interference from the surrounding environment. Since depth sensors can only provide sparse depth measurements of objects at a distance, there has been growing interest in both academia and industry in reconstructing depth at full resolution with the guidance of corresponding color images.

Fig. 1. Illustration of the depth completion task using our framework. A backbone model receives the sparse depth map and the corresponding RGB image as input and outputs an initial depth prediction. The initial depth is then iteratively refined by our geometry-aware GraphCSPN in 3D space to produce the final depth prediction. The sparse depth map (b) has less than 1% valid values and is dilated for visualization. Panels: (a) RGB, (b) sparse depth, (c) ground truth, (d) initial depth, (e) propagation, (f) final prediction.
To address this challenging problem of sparse-to-dense depth completion, a wide variety of methods have been proposed. Early approaches [46,41,36] mainly focus on handcrafted features, which often lead to inaccurate results and generalize poorly. Recent advances in deep convolutional neural networks (CNNs) have demonstrated promising performance on the task of depth completion [4,37,20]. Although CNN based methods have already achieved impressive results, the inherent local connectivity of CNNs makes it difficult for them to work on depth maps with sparse and irregular distributions, and hence they fail to capture 3D geometric features. Inspired by graph neural networks (GNNs), which can operate on irregular data represented by a graph, we propose a geometry-aware and dynamically constructed GNN that is combined with a CNN in a complementary way for geometric representation learning, in order to fully explore the 3D nature of depth prediction.
Among the state-of-the-art methods for depth completion, spatial propagation [32] based models achieve better results and are more efficient and interpretable than direct depth completion models [33]. The convolutional spatial propagation network (CSPN) [4] and the methods built on it [3,37] learn an initial depth prediction and an affinity matrix for neighboring pixels, and then iteratively refine the depth prediction through recurrent convolutional operations. Recently, Park et al. [37] proposed a non-local spatial propagation network (NLSPN) which alleviates the mixed-depth problem on object boundaries. Nevertheless, such approaches have several limitations. First, the neighbors and the affinity matrix are both fixed during the entire iterative propagation process, which may lead to incorrect predictions because errors are propagated through the refinement module. In addition, previous spatial propagation based methods perform propagation in the two-dimensional plane without geometric constraints, neglecting the 3D nature of depth estimation. Moreover, they require numerous iteration steps (e.g., 24) to obtain accurate results. The long iteration process indicates inefficient information propagation and may limit their real-world applications.
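To make the recurrent refinement concrete, the following is a minimal sketch of one CSPN-style propagation step; it is our own PyTorch illustration, not the authors' code, and the affinity tensor is assumed to be predicted by a separate network.

```python
# One CSPN-style propagation step (illustrative sketch, not the released code):
# every pixel's depth becomes an affinity-weighted average of its 3x3
# neighborhood, and the observed sparse measurements are re-imposed afterwards.
import torch
import torch.nn.functional as F

def cspn_step(depth, affinity, sparse_depth, valid_mask):
    """depth, sparse_depth: (B,1,H,W); affinity: (B,9,H,W); valid_mask: bool (B,1,H,W)."""
    B, _, H, W = depth.shape
    w = torch.softmax(affinity, dim=1)                # normalize: 9 weights sum to 1
    nbr = F.unfold(depth, kernel_size=3, padding=1)   # (B, 9, H*W) 3x3 neighborhoods
    nbr = nbr.view(B, 9, H, W)
    refined = (w * nbr).sum(dim=1, keepdim=True)      # affinity-weighted average
    return torch.where(valid_mask, sparse_depth, refined)
```

Iterating this step a fixed number of times is the recurrent refinement described above; the limitation is precisely that the 3x3 neighborhood and the affinities stay fixed across all iterations.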
To address the limitations stated above, we relax those restrictions and generalize all previous spatial propagation based methods into a unified framework built on graph neural networks. The motivation behind our model is not only that GNNs are capable of working on irregular data in 3D space, but also that the message passing principle [15] of GNNs is strongly in accord with the process of spatial propagation. We adopt an encoder-decoder architecture as a simple yet effective multi-modality fusion strategy to learn a joint representation of the RGB and depth images, which is used to construct the graph. Graph propagation is then performed in 3D space under learnable geometric constraints, with neighbors updated dynamically at every step. Furthermore, to facilitate the propagation process, we propose an edge attention module that aggregates information from corresponding positions of neighboring patches.
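As a concrete illustration of this pipeline, the sketch below unprojects points into 3D, rebuilds a k-nearest-neighbor graph at each step, and aggregates neighbor features with attention weights on the edges. It is a minimal sketch under our own assumptions: the pinhole unprojection, the k-NN rule, and the feature-difference attention are stand-ins for the learned modules described in Section 3, not the released implementation.

```python
# One geometry-aware propagation round (schematic sketch, assumptions ours).
import torch

def unproject(uv, depth, K):
    """uv: (N,2) pixel coordinates; depth: (N,); K: (3,3) intrinsics -> (N,3) points."""
    x = (uv[:, 0] - K[0, 2]) / K[0, 0] * depth
    y = (uv[:, 1] - K[1, 2]) / K[1, 1] * depth
    return torch.stack([x, y, depth], dim=-1)

def knn_graph(points, k):
    """Nearest neighbors in 3D; recomputed each round, so the graph is dynamic."""
    d = torch.cdist(points, points)                    # (N,N) pairwise distances
    d.fill_diagonal_(float("inf"))                     # exclude self-matches
    return d.topk(k, dim=1, largest=False).indices     # (N,k) neighbor indices

def propagate(feat, points, k=8):
    """One propagation step: aggregate neighbor features with edge attention."""
    idx = knn_graph(points, k)                         # (N,k)
    nbr = feat[idx]                                    # (N,k,C) neighbor features
    score = -(feat.unsqueeze(1) - nbr).pow(2).sum(-1)  # similar features score higher
    alpha = torch.softmax(score, dim=1)                # (N,k) edge attention weights
    return (alpha.unsqueeze(-1) * nbr).sum(dim=1)      # (N,C) aggregated message
```

Because the unprojection depends on the current depth estimate, refining the depth moves the 3D points and therefore changes the k-NN graph at the next step; this is what makes the propagation dynamic. In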
summary, the main contributions of the paper are as follows:
– We propose a graph convolution based spatial propagation network for sparse-to-dense depth completion. It is a generic and propagation-efficient framework that requires only three or fewer propagation steps, compared with the 18 or more used in previous methods.
– We develop a geometry-aware and dynamically constructed graph neural network with an edge attention module. The proposed model provides new insights into how GNNs can help handle 2D images in 3D perception related tasks.
– Extensive experiments on both the indoor NYU-Depth-v2 and outdoor KITTI datasets show that our method achieves better results than previous state-of-the-art approaches.
2 Related Work
Depth Completion. Image guided depth completion is an important subfield of depth estimation, which aims to predict dense depth maps from input information of various modalities. Depth estimation from only a single RGB image often leads to unreliable results due to the inherent ambiguity of predicting depth from images. To attain robust and accurate estimation, Ma and Karaman [33] proposed a deep regression model for depth completion, which boosts prediction accuracy by a large margin compared to using only RGB images. To address the problems of image guided depth completion, various deep learning based methods have been proposed, e.g., sparsity-invariant convolution [22,9,23], confidence propagation [10,18], multi-modality fusion [43,21], Bayesian networks [39] and unsupervised learning [48], and exploiting semantic segmentation [26,40] and surface normals [38,50] as auxiliary tasks.
Spatial Propagation Network. The spatial propagation network (SPN) proposed in [32] learns a semantically-aware affinity matrix for vision tasks including depth completion. The propagation in SPN is performed sequentially in a row-wise and column-wise manner with a three-way connection, which can only capture limited local features in an inefficient way. Cheng et al. [4] applied SPN to the task of depth completion and proposed the convolutional spatial propagation network (CSPN), which performs propagation as a recurrent convolutional operation and alleviates the inefficiency of SPN. Later, CSPN++ [3] introduced context aware and resource aware convolutional spatial propagation, improving both the accuracy and the efficiency of depth completion. Recently, Park et al. [37] proposed NLSPN, which learns deformable kernels for propagation and is robust to the mixed-depth problem on depth boundaries. Following this family of approaches based on spatial propagation, we propose a graph convolution based spatial propagation network (GraphCSPN) which provides a generic framework for depth completion. Unlike previous methods, GraphCSPN is constructed dynamically from learned patch-wise affinities and performs efficient propagation with geometrically relevant neighbors in three-dimensional space rather than in the two-dimensional plane.
Graph Neural Network. Graph neural networks (GNNs) receive a set of nodes as input and are invariant to permutations of the node sequence. GNNs work directly on graph-structured data and capture dependencies between objects via message passing between nodes [52,15,30]. They have been applied to various vision tasks, such as image classification [12,47], object detection [19,17], and visual question answering [44,34]. Unlike previous depth completion methods that use GNNs for multi-modality fusion [51] or for learning dynamic kernels [49], we leverage GNNs because their message passing principle is in accord with spatial propagation. In addition, we develop a geometry-aware and dynamically constructed GCN with edge attention to aggregate and update information from neighboring nodes.
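For readers unfamiliar with the pattern, the snippet below is a minimal, generic message-passing layer; it is our own illustration, not the paper's model, and simply shows the "aggregate then update" principle referred to above.

```python
# Generic message passing (illustrative sketch): sum incoming messages,
# then mix the mean aggregate with each node's own feature.
import torch

def message_pass(x, edges):
    """x: (N,C) node features; edges: (E,2) long tensor of (src, dst) pairs."""
    src, dst = edges[:, 0], edges[:, 1]
    agg = torch.zeros_like(x).index_add_(0, dst, x[src])   # sum of messages
    deg = torch.zeros(x.size(0), 1).index_add_(
        0, dst, torch.ones(edges.size(0), 1))              # in-degree per node
    return 0.5 * x + 0.5 * agg / deg.clamp(min=1)          # self + mean(neighbors)
```

Each node's output does not depend on the order in which nodes or edges are listed, which is the permutation invariance noted above.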
3 Method
In this section, we start by introducing the spatial propagation network (SPN) and previous methods built on it. To address the limitations of those methods, we present our graph convolution based spatial propagation network and show how it extends and generalizes earlier approaches into a unified framework. We then describe every component of the proposed framework in detail, including graph construction, neighborhood estimation, and graph propagation. Furthermore, a theoretical analysis of our method from the perspective of anisotropic diffusion is provided in the supplementary material.
3.1 Spatial Propagation Network
In the task of sparse-to-dense depth completion, the spatial propagation network [32] is designed as a refinement module that works on the initial depth prediction in a recursive manner. The initial depth prediction can be the output of an encoder-decoder network or of other networks that employ more sophisticated multi-modality fusion strategies. After several iteration steps, the final prediction is obtained with more detailed and accurate structure. We formulate the updating process as follows.
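A standard form of this recurrent update, written here in our own notation following the scheme of CSPN [4] rather than copied from the original formulation, is

\[
D^{t+1}_{i,j} = \kappa_{i,j}(0,0)\,D^{t}_{i,j} + \sum_{(a,b)\in\mathcal{N}(i,j)} \kappa_{i,j}(a,b)\,D^{t}_{i+a,\,j+b},
\qquad
\kappa_{i,j}(0,0) = 1 - \sum_{(a,b)\in\mathcal{N}(i,j)} \kappa_{i,j}(a,b),
\]

where \(D^{t}_{i,j}\) denotes the depth at pixel \((i,j)\) after \(t\) propagation steps, \(\mathcal{N}(i,j)\) is the neighborhood of \((i,j)\), and \(\kappa_{i,j}(a,b)\) are learned affinities, normalized so that each step forms a stable, convex-like combination of a pixel's own depth and its neighbors' depths.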