Domain Adaptive Object Detection for Autonomous Driving under Foggy Weather
Jinlong Li1, Runsheng Xu2, Jin Ma1, Qin Zou3, Jiaqi Ma2, Hongkai Yu1*
1Cleveland State University, 2University of California, Los Angeles, 3Wuhan University
j.li56@vikes.csuohio.edu, h.yu19@csuohio.edu
*Corresponding author
Abstract
Most object detection methods for autonomous driving assume a consistent feature distribution between training and testing data, which does not hold when weather conditions differ significantly. An object detection model trained under clear weather may not be effective in foggy weather because of the domain gap. This paper proposes a novel domain adaptive object detection framework for autonomous driving under foggy weather. Our method leverages both image-level and object-level adaptation to diminish the domain discrepancy in image style and object appearance. To further enhance the model's capabilities on challenging samples, we also propose a new adversarial gradient reversal layer that performs adversarial mining for hard examples together with domain adaptation. Moreover, we generate an auxiliary domain by data augmentation to enforce a new domain-level metric regularization. Experimental results on public benchmarks show the effectiveness and accuracy of the proposed method. The code is available at https://github.com/jinlong17/DA-Detect.
1. Introduction
Autonomous driving has wide applications in intelligent transportation systems, such as improving efficiency through automated 24/7 operation, reducing labor costs, and enhancing passenger comfort [23, 51]. Powered by computer vision and artificial intelligence techniques, object detection plays a critical role in autonomous driving for understanding the surrounding driving scenarios [54, 59]. In some cases, an autonomous vehicle might operate in complex residential and industrial areas, where diverse weather conditions can make object detection more difficult. For example, the use of heating, gas, coal, and vehicle emissions in residential and industrial areas may generate more frequent foggy or hazy weather, posing a significant challenge to the object detection system installed on the autonomous vehicle.
Many deep learning models, such as Faster R-CNN [37] and YOLO [36], have demonstrated great success in autonomous driving. However, most of these well-known methods assume that the feature distributions of training and testing data are homogeneous. Such an assumption may fail when the real-world diversity of weather conditions is taken into account [40]. For example, as shown in Fig. 1, a Faster R-CNN model trained on clear-weather data (source domain) is capable of detecting objects accurately under good weather, but its performance drops significantly in foggy weather (target domain). This degradation is caused by the feature domain gap between divergent weather conditions: the model is unfamiliar with the feature distribution of the target domain. Detection performance in foggy weather can be improved with domain adaptation.
Domain adaptation, a transfer learning technique, reduces the domain shift between different weather conditions. This paper proposes a novel domain adaptation framework to achieve robust object detection performance in autonomous driving under foggy weather. As manually annotating images under adverse weather is usually time-consuming, our design follows the same unsupervised setting as [5, 26, 43], where clear-weather images (source domain) are well labeled and foggy-weather images (target domain) have no annotations. Inspired by [5, 15], our method leverages both image-level and object-level adaptation to jointly diminish the domain discrepancy in image style and object appearance, which is realized by image-level and object-level domain classifiers that drive our convolutional neural networks to generate domain-invariant latent feature representations. Specifically, the domain classifiers aim to maximize the probability of distinguishing the features produced by different domains, whereas the detection model aims to generate domain-invariant features that confuse the classifiers.
Figure 1: Illustration of domain adaptive object detection for autonomous driving: (a) Faster R-CNN [37] detection under clear weather; (b) Faster R-CNN detection under foggy weather without domain adaptation; (c) Faster R-CNN detection under foggy weather with the proposed domain adaptation.

This paper also addresses two critical insights that are ignored by previous domain adaptation methods [5, 9, 15, 26, 61]: 1) different training samples pose different levels of difficulty during transfer learning, yet existing works usually ignore this diversity; 2) previous domain adaptation methods consider only the source and target domains for transfer learning, neglecting the domain-level feature metric distance to a third related domain. Embedding hard-example mining and involving an extra related domain can potentially further enhance the model's learning capabilities, which has not been carefully explored before.
To address these two insights, we propose a new Adversarial Gradient Reversal Layer (AdvGRL) and generate an auxiliary domain by data augmentation. The AdvGRL performs adversarial mining for hard examples to enhance model learning on challenging scenarios, and the auxiliary domain enforces a new domain-level metric regularization during transfer learning. Experimental results on the public benchmarks Cityscapes [7] and Foggy Cityscapes [40] show the effectiveness of each proposed component and superior object detection performance over the baseline and comparison methods. Overall, the contributions of this paper are summarized as follows:
• We propose a novel deep transfer learning based domain adaptive object detection framework for autonomous driving under foggy weather, including image-level and object-level adaptation, which is trained with labeled clear-weather data and unlabeled foggy-weather data to enhance the generalization ability of the deep learning based object detection model.
• We propose a new Adversarial Gradient Reversal Layer (AdvGRL) to perform adversarial mining for hard examples together with domain adaptation, further enhancing the model's transfer learning capabilities on challenging samples.
• We propose a new domain-level metric regularization for transfer learning. By generating an auxiliary domain with data augmentation, a domain-level metric constraint between the source, auxiliary, and target domains is enforced as regularization during transfer learning.
2. Related Work
2.1. Object detection for autonomous driving
Recent advances in deep learning have brought outstanding progress to autonomous driving [6, 25, 33, 53], and object detection has been one of the most active topics in this field [8, 41, 45, 59]. Regarding network architecture, current object detection algorithms can be roughly split into two categories: two-stage methods and single-stage methods. Two-stage object detection algorithms typically consist of two processes: 1) region proposal, and 2) object classification and localization refinement. R-CNN [14] is the first work of this kind; it applies selective search for region proposals and an independent CNN for each object prediction. Fast R-CNN [13] improves on R-CNN by obtaining object features from a shared feature map learned by one CNN. Faster R-CNN [37] further enhances the framework by proposing a Region Proposal Network (RPN) to replace the selective search stage. Single-stage object detection algorithms predict object bounding boxes and classes simultaneously in the same stage. These methods usually leverage pre-defined anchors to classify objects and regress bounding boxes; they are faster but less accurate than two-stage algorithms. Milestones in this category include the SSD series [29], the YOLO series [36], and RetinaNet [28]. Despite their success in clear-weather visual scenes, these object detection methods might not be directly applicable to autonomous driving due to complex real-world weather conditions.
2.2. Object detection for autonomous driving under different weather
To address the diverse weather conditions encountered in autonomous driving, many datasets have been generated [31, 32, 34, 40] and many methods have been proposed [2, 17, 18, 22, 35, 42, 44] in recent years. For example, Foggy Cityscapes [40] is a synthetic dataset that applies fog simulation to Cityscapes for scene understanding in foggy weather. TJU-DHD [32] is a diverse dataset for object detection in real-world scenarios that contains variance in illumination, scene, weather, and season. In this paper, we focus on the object detection problem in foggy weather. Huang et al. [22] propose DSNet (Dual-Subnet Network), which involves a detection subnet and a restoration subnet. This network can be trained with multi-task learning that combines a visibility enhancement task with an object detection task, thus outperforming pure object detectors. Hahner et al. [17] develop a fog simulation approach to augment existing real lidar datasets and show that it can be leveraged to improve current object detection methods in foggy weather. Qian et al. [35] propose MVDNet (Multimodal Vehicle Detection Network), which takes advantage of lidar and radar signals to obtain proposals; the region-wise features from the two sensors are then fused to produce the final detection results. Bijelic et al. [2] develop a network that takes data from four sensors as input: lidar, RGB camera, gated camera, and radar. This architecture uses entropy-steered adaptive deep fusion to obtain fused feature maps for prediction. These methods typically rely on input data from sensors beyond the RGB camera itself, which is not the general case for many autonomous driving cars. We therefore aim to develop an object detection architecture that takes only RGB camera data as input.
2.3. Domain adaptation for object detection
Domain adaptation reduces the discrepancy between different domains, thus allowing a model trained on a source domain to be applied to an unlabeled target domain. Previous domain adaptation works mainly focus on the task of image classification [46-48, 56], while more and more methods have been proposed in recent years to solve domain adaptation for object detection [5, 15, 24, 39, 49, 50, 55, 58, 60]. Domain adaptive detectors can be obtained by aligning the features from different domains [5, 15, 18, 39, 49, 52]. From this perspective, Chen et al. [5] introduce a Domain Adaptive Faster R-CNN framework that reduces the domain gap at the image level and instance level, with image-and-instance consistency subsequently employed to improve cross-domain robustness. He et al. [18] propose a MAF (Multi-Adversarial Faster R-CNN) framework to minimize the domain distribution disparity by aligning domain features and proposal features hierarchically. On the other hand, some works solve domain adaptation through image style transfer [21, 24, 41]. Shan et al. [41] first convert images from the source domain to the target domain with an image translation module, then train the object detector with adversarial training on the target domain. Hsu et al. [21] translate images progressively and add a weighted task loss during the adversarial training stage to tackle the problem of image quality difference. Many previous methods [4, 27, 38, 62] design complex architectures: [62] uses a multi-scale Feature Pyramid Network backbone and considers pixel-level and category-level adaptation; [27] uses a complex Graph Convolution Network and graph matching algorithms; [38] uses similarity-based clustering and grouping; [4] uses an uncertainty-guided self-training mechanism (Probabilistic Teacher with Focal Loss) that captures the uncertainty of unlabeled target data from a gradually evolving teacher and guides student learning. In contrast, our method adds no extra learnable parameters to the original Faster R-CNN model, because our AdvGRL is based on adversarial training (gradient reversal) and the domain-level metric regularization is based on a triplet loss; a minimal sketch of this regularization follows below. Previous domain adaptation methods usually treat all training samples as equally challenging, whereas we employ AdvGRL for adversarial hard example mining to improve transfer learning. Moreover, we generate an auxiliary domain and apply domain-level metric regularization to avoid overfitting.
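For illustration only, the following is a minimal PyTorch sketch of a triplet-style domain-level metric regularization over the three domains. The mean pooling of feature maps into per-domain embeddings, the anchor/positive/negative assignment, and the margin value are our assumptions here, not the paper's exact formulation.

```python
import torch.nn.functional as F

def domain_metric_regularization(src_feat, aux_feat, tgt_feat, margin=1.0):
    """Hypothetical triplet-style regularization between domains.

    Each *_feat is a backbone feature map of shape (N, C, H, W) from the
    source, auxiliary (augmented source), and target domains; equal batch
    sizes across domains are assumed.
    """
    # Collapse each feature map into one embedding vector per image.
    s = src_feat.mean(dim=(2, 3))
    a = aux_feat.mean(dim=(2, 3))
    t = tgt_feat.mean(dim=(2, 3))
    # Assumed triplet constraint: the auxiliary domain should stay closer
    # to the source than the target does, by at least `margin`.
    d_sa = F.pairwise_distance(s, a)
    d_st = F.pairwise_distance(s, t)
    return F.relu(d_sa - d_st + margin).mean()
```

In training, such a term would simply be added to the detection and adversarial losses with a weighting coefficient.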
3. Proposed Method
In this section, we first introduce the overall network architecture, then describe the image-level and object-level adaptation, and finally present the details of AdvGRL and the domain-level metric regularization.
3.1. Network Architecture
As illustrated in Fig. 2, our proposed model adopts the Faster R-CNN pipeline for object detection. The Convolutional Neural Network (CNN) backbone extracts image-level features from the RGB images and sends them to the Region Proposal Network (RPN) to generate object proposals. Afterwards, ROI pooling takes both the image-level features and the object proposals as input to retrieve object-level features. Finally, a detection head is applied to the object-level features to produce the final predictions. On top of the Faster R-CNN framework, we integrate two additional components: an image-level domain adaptation module and an object-level domain adaptation module. In both modules, we deploy a new Adversarial Gradient Reversal Layer (AdvGRL) together with a domain classifier to extract domain-invariant features and perform adversarial hard example mining. Moreover, we involve an auxiliary domain that imposes a new domain-level metric regularization on the feature distances between domains. All three domains, i.e., the source, target, and auxiliary domains, are employed simultaneously during training.
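For intuition, a standard gradient reversal layer can be written in a few lines of PyTorch, as sketched below. The proposed AdvGRL further adapts the reversal coefficient based on sample hardness, so the fixed coefficient `lam` here is a simplification of the full method.

```python
from torch.autograd import Function

class GradientReversal(Function):
    """Identity in the forward pass; negates (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradients train the backbone to confuse the domain
        # classifier, while the classifier itself is updated normally.
        return -ctx.lam * grad_output, None

def grl(x, lam=1.0):
    return GradientReversal.apply(x, lam)
```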
3.2. Image-level Adaptation
The image-level domain representation is obtained from the backbone feature extractor and contains rich global information such as style, scale, and illumination, which can pose a significant impact on the detection task [5]. Therefore, a domain classifier is introduced to classify the domains of the incoming image-level features and enhance image-level global alignment. The domain classifier is a simple CNN with two convolutional layers that outputs a prediction identifying the feature domain.
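As a rough sketch, and assuming a PyTorch implementation, such a two-layer image-level domain classifier could look like the following; the channel widths and kernel sizes are our assumptions, and `grl` is the gradient reversal helper sketched in Sec. 3.1.

```python
import torch.nn as nn

class ImageLevelDomainClassifier(nn.Module):
    """Two-conv classifier predicting a per-location domain logit."""

    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),  # one source-vs-target logit per location
        )

    def forward(self, feat, lam=1.0):
        # Gradient reversal sits between the backbone and the classifier,
        # implementing the min-max game described above.
        return self.net(grl(feat, lam))
```

Trained with a binary cross-entropy loss against the domain label (e.g., 0 for source, 1 for target), the reversed gradients push the backbone toward domain-invariant image-level features.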