Segmentation-guided Domain Adaptation for
Efficient Depth Completion
Fabian Märkert1, Martin Sunkel2, Anselm Haselhoff1, and Stefan Rudolph2
1Ruhr West University of Applied Science, Bottrop, Germany
fabian.maerkert@stud.hs-ruhrwest.de
anselm.haselhoff@hs-ruhrwest.de
2e:fs TechHub GmbH, Gaimersheim, Germany
{martin.sunkel,stefan.rudolph}@efs-techhub.com
Abstract. Complete depth information and efficient estimators have
become vital ingredients in scene understanding for automated driving
tasks. A major problem for LiDAR-based depth completion is the inefficient utilization of convolutions, caused by the lack of coherent information in sparse, uncorrelated LiDAR point clouds, which often leads to complex and resource-demanding networks. The problem is reinforced by the expensive acquisition of depth data for supervised training.
In this paper, we propose an efficient depth completion model based on a VGG05-like CNN architecture together with a semi-supervised domain adaptation approach that transfers knowledge from synthetic to real-world data in order to improve data efficiency and reduce the need for a large database.
In order to boost spatial coherence, we guide the learning process using
segmentations as an additional source of information. The efficiency and accuracy of our approach are evaluated on the KITTI dataset. Our approach improves on previous efficient, low-parameter state-of-the-art approaches while having a noticeably lower computational footprint.
Keywords: Deep Learning · Depth Completion · Domain Adaptation ·
Autonomous Driving
1 Introduction
In automated driving, multiple sensors such as cameras, LiDAR, and radar are typically used to obtain a better understanding of a scene. Each of these sensors has its own advantages and drawbacks. A problem of LiDAR sensors is that their resolution is typically only a fraction of the resolution of a camera image; as a consequence, depth information appears sparse and anisotropic. Convolutional neural networks are used to solve this problem by estimating dense depth maps in a task called depth completion.
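To make the input representation concrete, the following sketch (hypothetical shapes and calibration, not taken from the paper) projects LiDAR points given in camera coordinates into the image plane; the resulting depth map is dense in format, but only a small fraction of its pixels carries a valid value.

```python
import numpy as np

def lidar_to_sparse_depth(points_xyz, K, height, width):
    """Project LiDAR points (N x 3, camera coordinates) through the
    pinhole intrinsics K (3 x 3) and rasterize them into a sparse depth
    map; pixels without a return stay 0."""
    depth = np.zeros((height, width), dtype=np.float32)
    pts = points_xyz[points_xyz[:, 2] > 0]  # keep points in front of the camera
    uvw = (K @ pts.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2].astype(np.float32)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # a full pipeline would keep the nearest return per pixel; here later
    # points simply overwrite earlier ones
    depth[v[valid], u[valid]] = z[valid]
    return depth
```

Projected LiDAR returns typically cover only a few percent of the image pixels, which is precisely the sparsity that makes plain convolutions inefficient.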
However, a major problem of using convolutional neural networks for this
type of task is the inefficiency of convolutions for sparse input data, which is
typically solved by using complex networks with a high number of parameters.
Due to the limited processing power available in autonomous vehicles, such networks are not suitable for real-time operation. Another problem with the task of depth completion is that ground truth data is expensive and hard to obtain and typically contains large unlabeled image regions (e.g., the upper 100 px of the image in the KITTI depth completion dataset). These unlabeled regions in the ground truth data lead to partly unrealistic depth estimates. In this paper, we call these unlabeled regions out-of-training-distribution regions.
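In practice, supervised losses are therefore evaluated only on pixels with valid ground truth, so that unlabeled regions contribute no gradient. A minimal sketch of such a masked loss (the specific loss function used in this work is not implied here):

```python
import torch

def masked_l1_loss(pred, gt):
    """L1 loss restricted to pixels with valid ground truth (gt > 0);
    unlabeled regions, e.g. the sky, contribute no gradient at all."""
    valid = gt > 0
    if not valid.any():
        return pred.sum() * 0.0  # keep the graph but add nothing
    return torch.abs(pred[valid] - gt[valid]).mean()
```

This is also what leaves the network's behaviour in those regions unconstrained: nothing in the objective penalizes unrealistic estimates there.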
In this paper, we build upon the VGG05-like ScaffNet [1] for the task of
depth completion. We first train the network on the synthetic Virtual KITTI [11]
dataset and analyze its performance on real-world data. We then apply a semi-supervised domain adaptation approach to transfer the synthetically learned topology to the real-world domain. We improve the overall accuracy and the out-of-training-distribution capabilities by adding an additional segmentation input.
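The concrete training objective is described later in the paper; as a generic illustration only, a semi-supervised adaptation step can combine a supervised loss on the labeled synthetic domain with a weak self-supervision term on the real domain, for example by reusing the sparse LiDAR input itself as a target. All names and the weighting below are hypothetical.

```python
import torch

def masked_l1(pred, gt):
    valid = gt > 0
    return torch.abs(pred[valid] - gt[valid]).mean() if valid.any() else pred.sum() * 0.0

def adaptation_step(model, syn_batch, real_batch, alpha=0.5):
    """One hypothetical semi-supervised step: dense supervision on the
    synthetic (Virtual KITTI style) domain plus sparse self-supervision
    on the real domain."""
    syn_depth, syn_seg, syn_gt = syn_batch    # dense synthetic labels available
    real_depth, real_seg = real_batch         # only sparse LiDAR available

    loss_syn = masked_l1(model(syn_depth, syn_seg), syn_gt)
    # the sparse real LiDAR points act as a weak target for the real domain
    loss_real = masked_l1(model(real_depth, real_seg), real_depth)
    return loss_syn + alpha * loss_real
```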
We analyze our approach in terms of accuracy for different configurations
using standard metrics. The evaluation is performed for in- and out-of-training-distribution samples, and finally we compare the results with a baseline network (ScaffNet) and the efficient state-of-the-art FusionNet [1]. In addition, information about the computational footprint of the different approaches is provided.
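The standard metrics on the KITTI depth completion benchmark are RMSE and MAE (in mm) and their inverse-depth counterparts iRMSE and iMAE (in 1/km), computed on valid ground-truth pixels. A straightforward sketch, assuming depth maps given in metres:

```python
import torch

def depth_metrics(pred, gt):
    """KITTI-style depth completion metrics on valid pixels.
    pred and gt are depth maps in metres; RMSE/MAE are returned in mm,
    iRMSE/iMAE in 1/km."""
    valid = gt > 0
    p = pred[valid].clamp(min=1e-6)  # avoid division by zero predictions
    g = gt[valid]
    rmse = torch.sqrt(torch.mean((p - g) ** 2)) * 1000.0
    mae = torch.mean(torch.abs(p - g)) * 1000.0
    irmse = torch.sqrt(torch.mean((1.0 / p - 1.0 / g) ** 2)) * 1000.0
    imae = torch.mean(torch.abs(1.0 / p - 1.0 / g)) * 1000.0
    return {"RMSE": rmse, "MAE": mae, "iRMSE": irmse, "iMAE": imae}
```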
2 Related Work
The task of depth completion can be learned in a supervised manner using ground truth training data or in an unsupervised manner using camera pairs or a series of images. In the past, many approaches were based on matching stereo images or on the parallax effect and optical flow. These approaches were effective at the time but are now outdated due to the advancements in machine-learning-based methods.
Today's high-performing self-supervised depth completion approaches are mostly based on photometric loss functions [5,1,2]. These approaches are useful and effective if no ground truth depth data is given, but they lack performance in comparison to supervised approaches.
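Photometric losses compare the target image with a reconstruction warped from a neighbouring view using the predicted depth and relative pose. The sketch below assumes the warped reconstruction is already given and mixes SSIM and L1 terms, a common but here merely illustrative formulation rather than the exact loss of the cited works.

```python
import torch
import torch.nn.functional as F

def photometric_loss(target, reconstruction, weight_ssim=0.85):
    """Weighted SSIM + L1 photometric loss between the target image and a
    reconstruction warped from a neighbouring view (both: B x 3 x H x W)."""
    l1 = torch.abs(target - reconstruction).mean(dim=1, keepdim=True)

    # simplified local SSIM using 3x3 average pooling as the local window
    mu_x = F.avg_pool2d(target, 3, 1, 1)
    mu_y = F.avg_pool2d(reconstruction, 3, 1, 1)
    sigma_x = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(reconstruction ** 2, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(target * reconstruction, 3, 1, 1) - mu_x * mu_y
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    dssim = torch.clamp((1 - ssim) / 2, 0, 1).mean(dim=1, keepdim=True)

    return (weight_ssim * dssim + (1 - weight_ssim) * l1).mean()
```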
The authors of [7] make use of synthetic datasets and unsupervised learning of depth priors, but the approach is affected by a domain gap when applied to real-world data. To overcome this problem, [4] applies advanced augmentation of synthetic dense depth maps and uses generative adversarial networks to close the domain gap between synthetic and real camera images. This way the domain gap is reduced, but the training in turn becomes considerably more complex and resource-demanding.
The approaches proposed by the authors of [6] and [8] deliver exceptional accuracy with supervised training, but they refine the estimated depth, which leads to complex, slow, and resource-demanding networks.
Sparse depth maps are densified by applying triangular interpolation in the approach proposed by [2]. This allows for a more efficient use of convolutions but has the drawback of a high runtime due to the CPU-bound Delaunay triangulation algorithm. More recently, sparse input depth maps have been densified using spatial pyramid pooling [3,1]. This pooling method also allows for a more efficient use of convolutions and therefore enables a smaller and more efficient network, as sketched below.
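A minimal sketch of this densification step, assuming a PyTorch tensor layout and illustrative kernel sizes (the exact pooling configuration of [3,1] is not reproduced here): max pooling with stride 1 spreads each sparse measurement over increasingly large neighbourhoods, and the pooled maps are concatenated as input channels for the subsequent encoder.

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(sparse_depth, kernel_sizes=(3, 5, 7, 9)):
    """Densify a sparse depth map (B x 1 x H x W, zeros = missing) by
    max pooling at several scales and concatenating the results."""
    pooled = [sparse_depth]
    for k in kernel_sizes:
        # stride 1 and 'same' padding keep the original resolution
        pooled.append(F.max_pool2d(sparse_depth, kernel_size=k,
                                   stride=1, padding=k // 2))
    return torch.cat(pooled, dim=1)
```

Because every pooling scale produces a dense map, the convolutions of the following encoder operate on coherent inputs rather than on isolated LiDAR returns.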
The cited approaches can only marginally reduce the number of parameters and the complexity due to the use of a secondary, more resource-demanding image encoder.
Depth estimation and semantic segmentation are a popular combination for
multi-task learning networks [14,15] due to their similar network architectures, as
well as their feature-sharing and performance-improvement possibilities. These networks are typically vision-based, perform depth estimation, and sometimes rely on resource-demanding refinement, which makes this type of network inadequate for our use case.
3 Efficient Depth Completion
In this section, we explain the overall architecture and components of our net-
work, shown in Fig. 1, which is based on the depth prior estimating ScaffNet
proposed by the authors of [1].
Fig. 1. Network architecture, consisting of a filtering and pooling stage (Sec. 3.1), a segmentation encoding stage (Sec. 3.2), and an encoding and decoding stage (Sec. 3.3), illustrated with a sample from the KITTI dataset [10].
Our network consists of three stages. In the filter and pooling stage, which is
taken from the original ScaffNet, sparse depth maps are locally filtered to remove
outliers and pooled using spatial pyramid pooling (SPP). The segmentation en-
coding stage, which we added, is used to capture additional object and shape
information. For the final depth prediction, a VGG05-like CNN is used in the
encoding and decoding stage. In contrast to other depth completion approaches,
we do not make use of additional camera images and thus are able to drastically
decrease the number of parameters.
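To make the three-stage layout of Fig. 1 concrete, the following minimal PyTorch sketch mirrors its structure; the channel counts, layer depths, pooling sizes, and the segmentation encoding are illustrative placeholders and not the actual configuration of our network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class DepthCompletionNet(nn.Module):
    """Three-stage sketch: (1) spatial pyramid pooling of the sparse depth,
    (2) a small segmentation encoder, (3) a compact VGG-style
    encoder-decoder that predicts dense depth."""
    def __init__(self, num_classes=19, pool_sizes=(3, 5, 7, 9)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.seg_enc = conv_block(num_classes, 8)
        in_ch = 1 + len(pool_sizes) + 8      # pooled depth + seg features
        self.enc1 = conv_block(in_ch, 32)
        self.enc2 = conv_block(32, 64)
        self.dec1 = conv_block(64, 32)
        self.out = nn.Conv2d(32, 1, 1)

    def forward(self, sparse_depth, seg_onehot):
        # stage 1: multi-scale pooling of the sparse depth input
        pooled = [sparse_depth] + [
            F.max_pool2d(sparse_depth, k, stride=1, padding=k // 2)
            for k in self.pool_sizes]
        # stage 2: encode the segmentation and fuse it with the pooled depth
        x = torch.cat(pooled + [self.seg_enc(seg_onehot)], dim=1)
        # stage 3: small encoder-decoder producing the dense depth map
        x = self.enc1(x)
        x = F.max_pool2d(self.enc2(x), 2)                  # downsample
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)             # upsample
        return F.relu(self.out(self.dec1(x)))
```

In this sketch the segmentation is consumed as one-hot channels by a small dedicated encoder rather than a full image encoder, in line with the goal of keeping the parameter count low.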
3.1 Filtering and Pooling
A major problem of CNNs for the task of depth completion is the inefficiency
of convolutions for sparse input data. This inefficiency leads to complex and
resource-demanding networks with a high number of parameters. Recent pub-
lications have shown that the inefficiency of convolutions on sparse input data