paper, we focus on the object detection problem in foggy
weather. Huang et al. [22] propose DSNet (Dual-Subnet Network), which consists of a detection subnet and a restoration subnet. The network can be trained with multi-task learning that combines the visibility enhancement and object detection tasks, and thus outperforms pure object detectors. Hahner et al. [17] develop a fog simulation approach to augment existing real lidar datasets, and show that it can be leveraged to improve current object detection methods in foggy weather. Qian et al. [35] propose MVDNet (Multi-modal Vehicle Detection Network), which obtains proposals from lidar and radar signals and then fuses the region-wise features from the two sensors to produce the final detection results. Bijelic et al. [2] develop a network that takes data from four sensors as input: lidar, RGB camera, gated camera, and radar. This architecture uses entropy-steered adaptive deep fusion to obtain fused feature maps for prediction. These methods typically rely on input from sensors beyond the RGB camera itself, which many autonomous vehicles are not equipped with. In this work, we therefore aim to develop an object detection architecture that takes only RGB camera data as input.
2.3. Domain Adaptation for Object Detection
Domain adaptation reduces the discrepancy between different domains, thus allowing a model trained on a labeled source domain to be applied to an unlabeled target domain. Previous domain adaptation works mainly focus on image classification [46–48, 56], while more and more methods have been proposed in recent years to solve domain adaptation for object detection [5, 15, 24, 39, 49, 50, 55, 58, 60].
Domain adaptive detectors can be obtained by aligning the features from different domains [5, 15, 18, 39, 49, 52]. From this perspective, Chen et al. [5] introduce the Domain Adaptive Faster R-CNN framework, which reduces the domain gap at both the image level and the instance level, and further employs image-and-instance consistency to improve cross-domain robustness. He et al. [18] propose MAF (Multi-Adversarial Faster R-CNN), which minimizes the domain distribution disparity by hierarchically aligning domain features and proposal features. On the other hand, some works address domain adaptation through image style transfer [21, 24, 41]. Shan et al. [41] first convert images from the source domain to the target domain with an image translation module, and then train the object detector on the target domain with adversarial training. Hsu et al. [21] translate images progressively and add a weighted task loss during the adversarial training stage to tackle the problem of image quality difference. Many previous methods [4, 27, 38, 62] design complex architectures: [62] uses a multi-scale Feature Pyramid Network backbone and considers both pixel-level and category-level adaptation; [27] uses graph convolutional networks and graph matching algorithms; [38] uses similarity-based clustering and grouping; and [4] uses an uncertainty-guided self-training mechanism (Probabilistic Teacher with Focal Loss) that captures the uncertainty of unlabeled target data from a gradually evolving teacher to guide the learning of the student. In contrast, our method adds no extra learnable parameters to the original Faster R-CNN model, because our AdvGRL is based on adversarial training via gradient reversal (a minimal sketch of the underlying gradient reversal operation is given below) and our domain-level metric regularization is based on a triplet loss. Previous domain adaptation methods usually treat all training samples as equally challenging, whereas we employ AdvGRL for adversarial hard example mining to improve transfer learning. Moreover, we generate an auxiliary domain and apply the domain-level metric regularization to avoid overfitting.
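For concreteness, the following is a minimal PyTorch sketch of the standard gradient reversal operation that AdvGRL builds on; it serves only as background for Sec. 3, and the adversarial per-example scaling that distinguishes AdvGRL is detailed later, not shown here.

```python
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; scales gradients by -lambda in the
    backward pass, so the shared features are trained to fool the
    domain classifier."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed (negated and scaled) gradient w.r.t. x; None for lambd.
        return grad_output.neg() * ctx.lambd, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```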
3. Proposed Method
In this section, we first introduce the overall network architecture, then describe the image-level and object-level adaptation modules, and finally present the details of AdvGRL and the domain-level metric regularization.
3.1. Network Architecture
As illustrated in Fig. 2, our proposed model adopts the Faster R-CNN pipeline for object detection. The Convolutional Neural Network (CNN) backbone extracts image-level features from the RGB images and sends them to the Region Proposal Network (RPN) to generate object proposals. Afterwards, ROI pooling takes both the image-level features and the object proposals as input to retrieve the object-level features. Finally, a detection head is applied to the object-level features to produce the final predictions. On top of this Faster R-CNN framework, we integrate two additional components: an image-level domain adaptation module and an object-level domain adaptation module. For both modules, we deploy a new Adversarial Gradient Reversal Layer (AdvGRL) together with a domain classifier to extract domain-invariant features and perform adversarial hard example mining. Moreover, we introduce an auxiliary domain to impose a new domain-level metric regularization that constrains the feature metric distances between domains. All three domains, i.e., the source, target, and auxiliary domains, are employed simultaneously during training. A minimal sketch of this wiring is given below.
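To make the data flow concrete, here is an illustrative PyTorch-style sketch of how the two domain branches attach to the Faster R-CNN pipeline; all module names (backbone, rpn, roi_pool, det_head, and the two domain classifiers) are placeholders whose definitions are assumed, not taken from our released code.

```python
import torch.nn as nn

class DAFasterRCNN(nn.Module):
    """Illustrative wiring: Faster R-CNN with image-level and object-level
    domain classifiers attached through gradient reversal
    (grad_reverse, from the earlier sketch)."""

    def __init__(self, backbone, rpn, roi_pool, det_head,
                 img_domain_clf, obj_domain_clf):
        super().__init__()
        self.backbone = backbone              # CNN feature extractor
        self.rpn = rpn                        # Region Proposal Network
        self.roi_pool = roi_pool              # ROI pooling
        self.det_head = det_head              # classification + box regression
        self.img_domain_clf = img_domain_clf  # image-level domain classifier
        self.obj_domain_clf = obj_domain_clf  # object-level domain classifier

    def forward(self, images, lambd=1.0):
        feats = self.backbone(images)                # image-level features
        proposals = self.rpn(feats)                  # object proposals
        obj_feats = self.roi_pool(feats, proposals)  # object-level features
        detections = self.det_head(obj_feats)        # final predictions

        # Domain branches: reversed gradients push the shared features
        # toward domain invariance while the classifiers stay discriminative.
        img_dom = self.img_domain_clf(grad_reverse(feats, lambd))
        obj_dom = self.obj_domain_clf(grad_reverse(obj_feats, lambd))
        return detections, img_dom, obj_dom
```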
3.2. Image-level Adaptation
The image-level domain representation is obtained from the backbone feature extraction and contains rich global information such as style, scale, and illumination, which can have a significant impact on the detection task [5]. Therefore, a domain classifier is introduced to predict the domain of the incoming image-level features, enhancing the image-level global alignment. The domain classifier is a simple CNN with two convolutional layers that outputs a prediction identifying the feature domain.
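As a concrete illustration, a minimal sketch of such a two-layer classifier follows; the channel widths and 1×1 kernels are our assumptions, since only the two-convolution structure is specified above.

```python
import torch.nn as nn

class ImageDomainClassifier(nn.Module):
    """Sketch of the two-convolution image-level domain classifier.
    Channel widths and kernel sizes are illustrative assumptions."""

    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),  # per-location domain logit
        )

    def forward(self, feats):
        # feats: backbone features, already passed through AdvGRL
        return self.net(feats)
```

We use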