Segmentation-guided Domain Adaptation for
Efficient Depth Completion
Fabian Märkert1, Martin Sunkel2, Anselm Haselhoff1, and Stefan Rudolph2
1Ruhr West University of Applied Science, Bottrop, Germany
fabian.maerkert@stud.hs-ruhrwest.de
anselm.haselhoff@hs-ruhrwest.de
2e:fs TechHub GmbH, Gaimersheim, Germany
{martin.sunkel,stefan.rudolph}@efs-techhub.com
Abstract. Complete depth information and efficient estimators have
become vital ingredients in scene understanding for automated driving
tasks. A major problem for LiDAR-based depth completion is the inefficient utilization of convolutions, caused by the lack of coherent information in sparse, uncorrelated LiDAR point clouds, which often leads to complex and resource-demanding networks. The problem is reinforced by the expensive acquisition of depth data for supervised training.
In this paper, we propose an efficient depth completion model based on a VGG05-like CNN architecture together with a semi-supervised domain adaptation approach that transfers knowledge from synthetic to real-world data in order to improve data efficiency and reduce the need for a large database.
In order to boost spatial coherence, we guide the learning process using
segmentations as an additional source of information. The efficiency and accuracy of our approach are evaluated on the KITTI dataset. Our approach improves on previous efficient, low-parameter state-of-the-art approaches while having a noticeably lower computational footprint.
Keywords: Deep Learning · Depth Completion · Domain Adaptation ·
Autonomous Driving
1 Introduction
In automated driving, multiple sensors such as cameras, LiDAR, and radar are typically used to obtain a better understanding of a scene. Each of these sensors has its own advantages and drawbacks. A problem of LiDAR sensors is that their resolution is typically only a fraction of the resolution of a camera image; as a consequence, depth information appears sparse and anisotropic. Convolutional neural networks are used to solve this problem by estimating dense depth maps in a task called depth completion.
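To make the input representation concrete, the following sketch (hypothetical shapes and calibration, not taken from the paper) projects LiDAR points given in camera coordinates into the image plane; the resulting depth map is dense in format, but only a small fraction of its pixels carries a valid value.

```python
import numpy as np

def lidar_to_sparse_depth(points_xyz, K, height, width):
    """Project LiDAR points (N x 3, camera coordinates) through the
    pinhole intrinsics K (3 x 3) and rasterize them into a sparse depth
    map; pixels without a return stay 0."""
    depth = np.zeros((height, width), dtype=np.float32)
    pts = points_xyz[points_xyz[:, 2] > 0]  # keep points in front of the camera
    uvw = (K @ pts.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2].astype(np.float32)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # a full pipeline would keep the nearest return per pixel; here later
    # points simply overwrite earlier ones
    depth[v[valid], u[valid]] = z[valid]
    return depth
```

Projected LiDAR returns typically cover only a few percent of the image pixels, which is precisely the sparsity that makes plain convolutions inefficient.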
However, a major problem of using convolutional neural networks for this
type of task is the inefficiency of convolutions for sparse input data, which is
typically solved by using complex networks with a high number of parameters.
Due to the limited processing power available in autonomous vehicles, such networks are not suitable for real-time operation. Another problem with the task of depth completion is that ground truth data is expensive and hard to obtain and typically contains large unlabeled image regions (e.g., the upper 100 px of the image in the KITTI depth completion dataset). These unlabeled regions in the ground truth data lead to partly unrealistic depth estimates. In this paper, we call these unlabeled regions out-of-training-distribution regions.
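In practice, supervised losses are therefore evaluated only on pixels with valid ground truth, so that unlabeled regions contribute no gradient. A minimal sketch of such a masked loss (the specific loss function used in this work is not implied here):

```python
import torch

def masked_l1_loss(pred, gt):
    """L1 loss restricted to pixels with valid ground truth (gt > 0);
    unlabeled regions, e.g. the sky, contribute no gradient at all."""
    valid = gt > 0
    if not valid.any():
        return pred.sum() * 0.0  # keep the graph but add nothing
    return torch.abs(pred[valid] - gt[valid]).mean()
```

This is also what leaves the network's behaviour in those regions unconstrained: nothing in the objective penalizes unrealistic estimates there.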
In this paper, we build upon the VGG05-like ScaffNet [1] for the task of
depth completion. We first train the network on the synthetic Virtual KITTI [11]
dataset and analyze its performance on real-world data. We then apply a semi-supervised domain adaptation approach to transfer the synthetically learned topology to the real-world domain. We improve the overall accuracy and the out-of-training-distribution capabilities by adding an additional segmentation input.
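The concrete training objective is described later in the paper; as a generic illustration only, a semi-supervised adaptation step can combine a supervised loss on the labeled synthetic domain with a weak self-supervision term on the real domain, for example by reusing the sparse LiDAR input itself as a target. All names and the weighting below are hypothetical.

```python
import torch

def masked_l1(pred, gt):
    valid = gt > 0
    return torch.abs(pred[valid] - gt[valid]).mean() if valid.any() else pred.sum() * 0.0

def adaptation_step(model, syn_batch, real_batch, alpha=0.5):
    """One hypothetical semi-supervised step: dense supervision on the
    synthetic (Virtual KITTI style) domain plus sparse self-supervision
    on the real domain."""
    syn_depth, syn_seg, syn_gt = syn_batch    # dense synthetic labels available
    real_depth, real_seg = real_batch         # only sparse LiDAR available

    loss_syn = masked_l1(model(syn_depth, syn_seg), syn_gt)
    # the sparse real LiDAR points act as a weak target for the real domain
    loss_real = masked_l1(model(real_depth, real_seg), real_depth)
    return loss_syn + alpha * loss_real
```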
We analyze our approach in terms of accuracy for different configurations
using standard metrics. The evaluation is performed for in- and out-of-training-distribution samples, and finally we compare the results with a baseline network (ScaffNet) and the efficient state-of-the-art FusionNet [1]. In addition, information about the computational footprint of the different approaches is provided.
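The standard metrics on the KITTI depth completion benchmark are RMSE and MAE (in mm) and their inverse-depth counterparts iRMSE and iMAE (in 1/km), computed on valid ground-truth pixels. A straightforward sketch, assuming depth maps given in metres:

```python
import torch

def depth_metrics(pred, gt):
    """KITTI-style depth completion metrics on valid pixels.
    pred and gt are depth maps in metres; RMSE/MAE are returned in mm,
    iRMSE/iMAE in 1/km."""
    valid = gt > 0
    p = pred[valid].clamp(min=1e-6)  # avoid division by zero predictions
    g = gt[valid]
    rmse = torch.sqrt(torch.mean((p - g) ** 2)) * 1000.0
    mae = torch.mean(torch.abs(p - g)) * 1000.0
    irmse = torch.sqrt(torch.mean((1.0 / p - 1.0 / g) ** 2)) * 1000.0
    imae = torch.mean(torch.abs(1.0 / p - 1.0 / g)) * 1000.0
    return {"RMSE": rmse, "MAE": mae, "iRMSE": irmse, "iMAE": imae}
```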
2 Related Work
The task of depth completion can be learned in a supervised manner using ground truth training data or in an unsupervised manner using camera pairs or a series of images. In the past, many approaches were based on matching stereo images or on the parallax effect and optical flow. These approaches were effective at the time but are now outdated due to the advancements in machine-learning-based methods.
Today's high-performing self-supervised depth completion approaches are mostly based on photometric loss functions [5,1,2]. These approaches are useful and effective if no ground truth depth data is given, but they lack performance in comparison to supervised approaches.
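Photometric losses compare the target image with a reconstruction warped from a neighbouring view using the predicted depth and relative pose. The sketch below assumes the warped reconstruction is already given and mixes SSIM and L1 terms, a common but here merely illustrative formulation rather than the exact loss of the cited works.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, reconstruction, weight_ssim=0.85):
    """Weighted SSIM + L1 photometric loss between the target image and a
    reconstruction warped from a neighbouring view (both: B x 3 x H x W)."""
    l1 = torch.abs(target - reconstruction).mean(dim=1, keepdim=True)

    # simplified local SSIM using 3x3 average pooling as the local window
    mu_x = F.avg_pool2d(target, 3, 1, 1)
    mu_y = F.avg_pool2d(reconstruction, 3, 1, 1)
    sigma_x = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(reconstruction ** 2, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(target * reconstruction, 3, 1, 1) - mu_x * mu_y
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    dssim = torch.clamp((1 - ssim) / 2, 0, 1).mean(dim=1, keepdim=True)

    return (weight_ssim * dssim + (1 - weight_ssim) * l1).mean()
```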
The authors of [7] make use of synthetic datasets and unsupervised learning of depth priors, but the approach is affected by a domain gap when applied to real-world data. To overcome this problem, [4] applies advanced augmentation of synthetic dense depth maps and uses generative adversarial networks to close the domain gap between synthetic and real camera images. This way the domain gap is reduced, but the training in turn becomes considerably more complex and resource-demanding.
The approaches proposed by the authors of [6] and [8] deliver exceptional accuracy with supervised training, but they refine the estimated depth, which leads to complex, slow, and resource-demanding networks.
Sparse depth maps are densified by applying triangular interpolation in the approach proposed by [2]. This allows for a more efficient use of convolutions but has the drawback of a high runtime due to the CPU-bound Delaunay triangulation algorithm. More recently, sparse input depth maps have been densified using spatial pyramid pooling [3,1]. This pooling method also allows for a more efficient use of convolutions and therefore enables a smaller and more efficient network, as sketched below.
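A minimal sketch of this densification step, assuming a PyTorch tensor layout and illustrative kernel sizes (the exact pooling configuration of [3,1] is not reproduced here): max pooling with stride 1 spreads each sparse measurement over increasingly large neighbourhoods, and the pooled maps are concatenated as input channels for the subsequent encoder.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(sparse_depth, kernel_sizes=(3, 5, 7, 9)):
    """Densify a sparse depth map (B x 1 x H x W, zeros = missing) by
    max pooling at several scales and concatenating the results."""
    pooled = [sparse_depth]
    for k in kernel_sizes:
        # stride 1 and 'same' padding keep the original resolution
        pooled.append(F.max_pool2d(sparse_depth, kernel_size=k,
                                   stride=1, padding=k // 2))
    return torch.cat(pooled, dim=1)
```

Because every pooling scale produces a dense map, the convolutions of the following encoder operate on coherent inputs rather than on isolated LiDAR returns.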
The cited approaches can only marginally reduce the number of parameters and the complexity due to the use of a secondary, more resource-demanding image encoder.
Depth estimation and semantic segmentation are a popular combination for
multi-task learning networks [14,15] due to their similar network architectures, as
well as their feature-sharing and performance-improvement possibilities. These networks are typically vision-based, perform depth estimation, and sometimes rely on resource-demanding refinement, which makes this type of network inadequate for our use case.
3 Efficient Depth Completion
In this section, we explain the overall architecture and components of our net-
work, shown in Fig. 1, which is based on the depth prior estimating ScaffNet
proposed by the authors of [1].
Fig. 1. Network architecture, consisting of a filtering and pooling stage (Sec. 3.1), a segmentation encoding stage (Sec. 3.2), and an encoding and decoding stage (Sec. 3.3), illustrated with a sample from the KITTI dataset [10].
Our network consists of three stages. In the filter and pooling stage, which is
taken from the original ScaffNet, sparse depth maps are locally filtered to remove
outliers and pooled using spatial pyramid pooling (SPP). The segmentation en-
coding stage, which we added, is used to capture additional object and shape
information. For the final depth prediction, a VGG05-like CNN is used in the
encoding and decoding stage. In contrast to other depth completion approaches,
we do not make use of additional camera images and thus are able to drastically
decrease the number of parameters.
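To make the three-stage layout of Fig. 1 concrete, the following minimal PyTorch sketch mirrors its structure; the channel counts, layer depths, pooling sizes, and the segmentation encoding are illustrative placeholders and not the actual configuration of our network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class DepthCompletionNet(nn.Module):
    """Three-stage sketch: (1) spatial pyramid pooling of the sparse depth,
    (2) a small segmentation encoder, (3) a compact VGG-style
    encoder-decoder that predicts dense depth."""
    def __init__(self, num_classes=19, pool_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.seg_enc = conv_block(num_classes, 8)
        in_ch = 1 + len(pool_sizes) + 8      # pooled depth + seg features
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.dec1 = conv_block(64, 32)
        self.out = nn.Conv2d(32, 1, 1)

    def forward(self, sparse_depth, seg_onehot):
        # stage 1: multi-scale pooling of the sparse depth input
        pooled = [sparse_depth] + [
            F.max_pool2d(sparse_depth, k, stride=1, padding=k // 2)
            for k in self.pool_sizes]
        # stage 2: encode the segmentation and fuse it with the pooled depth
        x = torch.cat(pooled + [self.seg_enc(seg_onehot)], dim=1)
        # stage 3: small encoder-decoder producing the dense depth map
        x = self.enc1(x)
        x = F.max_pool2d(self.enc2(x), 2)                  # downsample
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)             # upsample
        return F.relu(self.out(self.dec1(x)))
```

In this sketch the segmentation is consumed as one-hot channels by a small dedicated encoder rather than a full image encoder, in line with the goal of keeping the parameter count low.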
3.1 Filtering and Pooling
A major problem of CNNs for the task of depth completion is the inefficiency
of convolutions for sparse input data. This inefficiency leads to complex and
resource-demanding networks with a high number of parameters. Recent pub-
lications have shown that the inefficiency of convolutions on sparse input data