GraphCSPN: Geometry-Aware Depth
Completion via Dynamic GCNs
Xin Liu1, Xiaofei Shao2, Bo Wang2, Yali Li1, and Shengjin Wang1⋆
1Beijing National Research Center for Information Science and Technology (BNRist)
Department of Electronic Engineering, Tsinghua University
xinliu20@mails.tsinghua.edu.cn, {liyali13, wgsgj}@tsinghua.edu.cn
2Deptrum Ltd.
{xiaofei.shao, bo.wang}@deptrum.com
⋆ Corresponding author
Abstract. Image guided depth completion aims to recover per-pixel dense depth maps from sparse depth measurements with the help of aligned color images, a task with a wide range of applications from robotics to autonomous driving. However, the 3D nature of sparse-to-dense depth completion has not been fully explored by previous methods. In this work, we propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion. First, unlike previous methods, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning. In addition, the proposed network explicitly incorporates learnable geometric constraints to regularize the propagation process, which is performed in three-dimensional space rather than in the two-dimensional plane. Furthermore, we construct the graph from sequences of feature patches and update it dynamically with an edge attention module during propagation, so as to better capture both local neighboring features and global relationships over long distances. Extensive experiments on both the indoor NYU-Depth-v2 and outdoor KITTI datasets demonstrate that our method achieves state-of-the-art performance, especially in the case where only a few propagation steps are used. Code and models are available at https://github.com/xinliu20/GraphCSPN_ECCV2022.
Keywords: Depth completion, Graph neural network, Spatial propagation
1 Introduction
Depth perception plays an important role in various real-world applications of computer vision, such as navigation of robots [8,27] and autonomous vehicles [1,11], augmented reality [5,6], and 3D face recognition [16,35]. However, it is difficult to directly acquire dense depth maps using depth sensors, including LiDAR, time-of-flight, or structured-light based 3D cameras, either because of the inherent limitations of the hardware or because of interference from the surrounding environment. Since depth sensors can only provide sparse depth measurements of objects at a distance, there has been growing interest in both academia and industry in reconstructing depth at full resolution with the guidance of corresponding color images.

Fig. 1. Illustration of the depth completion task using our framework. A backbone model receives the sparse depth map and the corresponding RGB image as input and outputs an initial depth prediction. The initial depth is then iteratively refined by our geometry-aware GraphCSPN in 3D space to produce the final depth prediction. The sparse depth map (b) has less than 1% valid values and is dilated for visualization. Panels: (a) RGB, (b) sparse depth, (c) ground truth, (d) initial depth, (e) propagation, (f) final prediction.
To address this challenging problem of sparse-to-dense depth completion, a wide variety of methods have been proposed. Early approaches [46,41,36] mainly focus on handcrafted features, which often lead to inaccurate results and generalize poorly. Recent advances in deep convolutional neural networks (CNNs) have demonstrated promising performance on the task of depth completion [4,37,20]. Although CNN based methods have already achieved impressive results, the inherent local connectivity of CNNs makes it difficult for them to work on depth maps with sparse and irregular distributions, and hence they fail to capture 3D geometric features. Inspired by graph neural networks (GNNs), which can operate on irregular data represented by a graph, we propose a geometry-aware and dynamically constructed GNN that is combined with a CNN in a complementary way for geometric representation learning, in order to fully explore the 3D nature of depth prediction.
Among the state-of-the-art methods for depth completion, spatial propagation [32] based models achieve better results and are more efficient and interpretable than direct depth completion models [33]. The convolutional spatial propagation network (CSPN) [4] and the methods built on it [3,37] learn an initial depth prediction and an affinity matrix for neighboring pixels, and then iteratively refine the depth prediction through recurrent convolutional operations. Recently, Park et al. [37] proposed a non-local spatial propagation network (NLSPN) which alleviates the mixed-depth problem on object boundaries. Nevertheless, such approaches have several limitations. First, the neighbors and the affinity matrix are both fixed during the entire iterative propagation process, which may lead to incorrect predictions because errors are propagated through the refinement module. In addition, previous spatial propagation based methods perform propagation in the two-dimensional plane without geometric constraints, neglecting the 3D nature of depth estimation. Moreover, they require numerous iteration steps (e.g., 24) to obtain accurate results. The long iteration process indicates inefficient information propagation and may limit their real-world applications.
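To make the recurrent refinement concrete, the following is a minimal sketch of one CSPN-style propagation step; it is our own PyTorch illustration, not the authors' code, and the affinity tensor is assumed to be predicted by a separate network.

```python
# One CSPN-style propagation step (illustrative sketch, not the released code):
# every pixel's depth becomes an affinity-weighted average of its 3x3
# neighborhood, and the observed sparse measurements are re-imposed afterwards.
import torch
import torch.nn.functional as F

def cspn_step(depth, affinity, sparse_depth, valid_mask):
    """depth, sparse_depth: (B,1,H,W); affinity: (B,9,H,W); valid_mask: bool (B,1,H,W)."""
    B, _, H, W = depth.shape
    w = torch.softmax(affinity, dim=1)                # normalize: 9 weights sum to 1
    nbr = F.unfold(depth, kernel_size=3, padding=1)   # (B, 9, H*W) 3x3 neighborhoods
    nbr = nbr.view(B, 9, H, W)
    refined = (w * nbr).sum(dim=1, keepdim=True)      # affinity-weighted average
    return torch.where(valid_mask, sparse_depth, refined)
```

Iterating this step a fixed number of times is the recurrent refinement described above; the limitation is precisely that the 3x3 neighborhood and the affinities stay fixed across all iterations.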
To address the limitations stated above, we relax those restrictions and generalize all previous spatial propagation based methods into a unified framework built on graph neural networks. The motivation behind our model is not only that GNNs are capable of working on irregular data in 3D space, but also that the message passing principle [15] of GNNs is strongly in accord with the process of spatial propagation. We adopt an encoder-decoder architecture as a simple yet effective multi-modality fusion strategy to learn a joint representation of the RGB and depth images, which is used to construct the graph. Graph propagation is then performed in 3D space under learnable geometric constraints, with neighbors updated dynamically at every step. Furthermore, to facilitate the propagation process, we propose an edge attention module that aggregates information from corresponding positions of neighboring patches.
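As a concrete illustration of this pipeline, the sketch below unprojects points into 3D, rebuilds a k-nearest-neighbor graph at each step, and aggregates neighbor features with attention weights on the edges. It is a minimal sketch under our own assumptions: the pinhole unprojection, the k-NN rule, and the feature-difference attention are stand-ins for the learned modules described in Section 3, not the released implementation.

```python
# One geometry-aware propagation round (schematic sketch, assumptions ours).
import torch

def unproject(uv, depth, K):
    """uv: (N,2) pixel coordinates; depth: (N,); K: (3,3) intrinsics -> (N,3) points."""
    x = (uv[:, 0] - K[0, 2]) / K[0, 0] * depth
    y = (uv[:, 1] - K[1, 2]) / K[1, 1] * depth
    return torch.stack([x, y, depth], dim=-1)

def knn_graph(points, k):
    """Nearest neighbors in 3D; recomputed each round, so the graph is dynamic."""
    d = torch.cdist(points, points)                    # (N,N) pairwise distances
    d.fill_diagonal_(float("inf"))                     # exclude self-matches
    return d.topk(k, dim=1, largest=False).indices     # (N,k) neighbor indices

def propagate(feat, points, k=8):
    """One propagation step: aggregate neighbor features with edge attention."""
    idx = knn_graph(points, k)                         # (N,k)
    nbr = feat[idx]                                    # (N,k,C) neighbor features
    score = -(feat.unsqueeze(1) - nbr).pow(2).sum(-1)  # similar features score higher
    alpha = torch.softmax(score, dim=1)                # (N,k) edge attention weights
    return (alpha.unsqueeze(-1) * nbr).sum(dim=1)      # (N,C) aggregated message
```

Because the unprojection depends on the current depth estimate, refining the depth moves the 3D points and therefore changes the k-NN graph at the next step; this is what makes the propagation dynamic. In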
summary, the main contributions of the paper are as follows:
– We propose a graph convolution based spatial propagation network for sparse-to-dense depth completion. It is a generic and propagation-efficient framework that requires only three or fewer propagation steps, compared with the 18 or more used in previous methods.
– We develop a geometry-aware and dynamically constructed graph neural network with an edge attention module. The proposed model provides new insights into how GNNs can help handle 2D images in 3D perception related tasks.
– Extensive experiments on both the indoor NYU-Depth-v2 and outdoor KITTI datasets show that our method achieves better results than previous state-of-the-art approaches.
2 Related Work
Depth Completion. Image guided depth completion is an important subfield of depth estimation, which aims to predict dense depth maps from input information of various modalities. Depth estimation from only a single RGB image often leads to unreliable results due to the inherent ambiguity of predicting depth from images. To attain robust and accurate estimation, Ma and Karaman [33] proposed a deep regression model for depth completion, which boosts prediction accuracy by a large margin compared to using only RGB images. To address the problems of image guided depth completion, various deep learning based methods have been proposed, e.g., sparsity-invariant convolution [22,9,23], confidence propagation [10,18], multi-modality fusion [43,21], Bayesian networks [39] and unsupervised learning [48], and exploiting semantic segmentation [26,40] and surface normals [38,50] as auxiliary tasks.
Spatial Propagation Network. The spatial propagation network (SPN) proposed in [32] learns a semantically-aware affinity matrix for vision tasks including depth completion. The propagation in SPN is performed sequentially in a row-wise and column-wise manner with a three-way connection, which can only capture limited local features in an inefficient way. Cheng et al. [4] applied SPN to the task of depth completion and proposed the convolutional spatial propagation network (CSPN), which performs propagation as a recurrent convolutional operation and alleviates the inefficiency of SPN. Later, CSPN++ [3] introduced context aware and resource aware convolutional spatial propagation, improving both the accuracy and the efficiency of depth completion. Recently, Park et al. [37] proposed NLSPN, which learns deformable kernels for propagation and is robust to the mixed-depth problem on depth boundaries. Following this family of approaches based on spatial propagation, we propose a graph convolution based spatial propagation network (GraphCSPN) which provides a generic framework for depth completion. Unlike previous methods, GraphCSPN is constructed dynamically from learned patch-wise affinities and performs efficient propagation with geometrically relevant neighbors in three-dimensional space rather than in the two-dimensional plane.
Graph Neural Network. Graph neural networks (GNNs) receive a set of nodes as input and are invariant to permutations of the node sequence. GNNs work directly on graph-structured data and capture dependencies between objects via message passing between nodes [52,15,30]. They have been applied to various vision tasks, such as image classification [12,47], object detection [19,17], and visual question answering [44,34]. Unlike previous depth completion methods that use GNNs for multi-modality fusion [51] or for learning dynamic kernels [49], we leverage GNNs because their message passing principle is in accord with spatial propagation. In addition, we develop a geometry-aware and dynamically constructed GCN with edge attention to aggregate and update information from neighboring nodes.
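For readers unfamiliar with the pattern, the snippet below is a minimal, generic message-passing layer; it is our own illustration, not the paper's model, and simply shows the "aggregate then update" principle referred to above.

```python
# Generic message passing (illustrative sketch): sum incoming messages,
# then mix the mean aggregate with each node's own feature.
import torch

def message_pass(x, edges):
    """x: (N,C) node features; edges: (E,2) long tensor of (src, dst) pairs."""
    src, dst = edges[:, 0], edges[:, 1]
    agg = torch.zeros_like(x).index_add_(0, dst, x[src])   # sum of messages
    deg = torch.zeros(x.size(0), 1).index_add_(
        0, dst, torch.ones(edges.size(0), 1))              # in-degree per node
    return 0.5 * x + 0.5 * agg / deg.clamp(min=1)          # self + mean(neighbors)
```

Each node's output does not depend on the order in which nodes or edges are listed, which is the permutation invariance noted above.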
3 Method
In this section, we start by introducing the spatial propagation network (SPN) and previous methods built on it. To address the limitations of those methods, we present our graph convolution based spatial propagation network and show how it extends and generalizes earlier approaches into a unified framework. We then describe every component of the proposed framework in detail, including graph construction, neighborhood estimation, and graph propagation. Furthermore, a theoretical analysis of our method from the perspective of anisotropic diffusion is provided in the supplementary material.
3.1 Spatial Propagation Network
In the task of sparse-to-dense depth completion, the spatial propagation network [32] is designed as a refinement module that works on the initial depth prediction in a recursive manner. The initial depth prediction can be the output of an encoder-decoder network or of other networks that employ more sophisticated multi-modality fusion strategies. After several iteration steps, the final prediction is obtained with more detailed and accurate structure. We formulate the updating process as follows.
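A standard form of this recurrent update, written here in our own notation following the scheme of CSPN [4] rather than copied from the original formulation, is

\[
D^{t+1}_{i,j} = \kappa_{i,j}(0,0)\,D^{t}_{i,j} + \sum_{(a,b)\in\mathcal{N}(i,j)} \kappa_{i,j}(a,b)\,D^{t}_{i+a,\,j+b},
\qquad
\kappa_{i,j}(0,0) = 1 - \sum_{(a,b)\in\mathcal{N}(i,j)} \kappa_{i,j}(a,b),
\]

where \(D^{t}_{i,j}\) denotes the depth at pixel \((i,j)\) after \(t\) propagation steps, \(\mathcal{N}(i,j)\) is the neighborhood of \((i,j)\), and \(\kappa_{i,j}(a,b)\) are learned affinities, normalized so that each step forms a stable, convex-like combination of a pixel's own depth and its neighbors' depths.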