Rethinking the Detection Head Configuration for
Traffic Object Detection
Yi Shi^a, Jiang Wu^a, Shixuan Zhao^a, Gangyao Gao^a, Tao Deng^b, Hongmei Yan^a,*
^a MOE Key Laboratory for Neuroinformation, School of Life Science and Technology,
University of Electronic Science and Technology of China, Chengdu, China
^b School of Information Science and Technology, Southwest Jiaotong University, Chengdu,
China
Abstract
Multi-scale detection plays an important role in object detection models.
However, researchers are often unsure how to configure detection heads that
combine multi-scale features at different input resolutions. We find that the
matching relationship between the object distribution and the detection heads
changes with the input resolution. Based on this finding, we propose a
lightweight traffic object detection network based on matching between
detection head and object distribution, termed MHD-Net. It consists of three
main parts. The first is the detection head and object distribution matching
strategy, which guides the configuration of detection heads so that multi-scale
features can be leveraged to detect objects at vastly different scales. The
second is the cross-scale detection head configuration guideline, which replaces
multiple detection heads with only two detection heads possessing rich feature
representations, achieving an excellent balance between detection accuracy,
model parameters, FLOPs and detection speed. The third is the receptive field
enlargement method, which combines the dilated convolution module with shallow
backbone features to further improve detection accuracy at the cost of a very
slight increase in model parameters. The proposed model achieves more
competitive performance than other models on the BDD100K dataset and our
proposed ETFOD-v2 dataset. The code will be made available.
Keywords: Traffic object detection, Detection head configuration, Deep
learning
* Corresponding author.
Email addresses: yishi701@gmail.com (Yi Shi), jiangjiangbabay@gmail.com (Jiang
Wu), zhaosx@std.uestc.edu.cn (Shixuan Zhao), gangyaogao@gmail.com (Gangyao Gao),
tdeng@swjtu.edu.cn (Tao Deng), hmyan@uestc.edu.cn (Hongmei Yan)
Preprint submitted to Elsevier October 11, 2022
arXiv:2210.03883v1 [cs.CV] 8 Oct 2022
1. Introduction
As a crucial part of intelligent driving, object detection (Chen et al., 2021a;
Liu et al., 2022a) is important for ensuring driving safety. In general,
scaling the input resolution is a common way to balance detection accuracy
against FLOPs (Dollár et al., 2021; Liu et al., 2022b). We review the details
of detection models on the BDD100K dataset (Yu et al., 2020) and, surprisingly,
find that different detection heads match objects of different scales at
different input resolutions. As shown in Figure 1, at low input resolution a
large number of objects are matched with the H1 and H2 detection heads. As the
input resolution increases, however, the number of objects matched with the H1
and H2 detection heads decreases significantly, while the number matched with
the H4 and H5 detection heads increases markedly.
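The trend described above can be illustrated with a small sketch. The matching rule here is an assumption made purely for illustration, not the assignment used in the detection models reviewed: each object is given to the head whose stride brings its longer side closest to a hypothetical reference of 4 feature-map cells. The object sizes in `EXAMPLE_SIZES` are made-up values, not BDD100K statistics; only the native image width of 1280 pixels and the head strides come from the dataset and from the Figure 1 caption.

```python
import math
from collections import Counter

# Down-sampling rates of heads H1..H5, as given in the Figure 1 caption.
STRIDES = {"H1": 2, "H2": 4, "H3": 8, "H4": 16, "H5": 32}

# Hypothetical reference: an object counts as well matched when its longer
# side spans about this many feature-map cells. This value is an assumption.
REF_CELLS = 4.0

# Made-up object sizes (longer side, in pixels) at the native 1280-pixel width.
EXAMPLE_SIZES = [20, 30, 45, 60, 90, 130, 200, 400]


def match_head(size_px: float) -> str:
    """Return the head whose stride puts size_px closest to REF_CELLS cells."""
    return min(
        STRIDES,
        key=lambda h: abs(math.log2(size_px / STRIDES[h] / REF_CELLS)),
    )


def head_histogram(sizes_native, native_res: int, input_res: int) -> Counter:
    """Rescale sizes from native_res to input_res and count matches per head."""
    scale = input_res / native_res
    return Counter(match_head(s * scale) for s in sizes_native)


if __name__ == "__main__":
    for res in (416, 1504):
        print(res, dict(head_histogram(EXAMPLE_SIZES, 1280, res)))
```

Under this toy rule, resizing the same objects to a larger input moves their matches from the shallow heads toward H4 and H5, which is the qualitative behaviour reported for Figure 1.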
This finding is exciting and motivates us to rethink whether the configuration
of detection heads in existing models is optimal. For example, at low input
resolution, can three detection heads, as configured in YOLOv5 (Jocher et al.,
2022), achieve the best detection performance, especially when there are many
small objects? At high input resolution, will five detection heads, as in
IDYOLO (Qin et al., 2022), cause detection head redundancy and make the model
harder to optimize? Furthermore, for different input resolutions, how should
detection heads be configured to achieve better detection performance for
traffic objects at widely varying scales?
Figure 1: The matching relationships between detection heads and object
distribution at different input resolutions on the BDD100K training set. H1,
H2, H3, H4 and H5 denote the detection heads corresponding to feature maps with
down-sampling rates of 2, 4, 8, 16 and 32, respectively.
On the other hand, employing fewer detection heads is a common way to reduce
model parameters and improve detection speed. YOLOv3-tiny (Redmon & Farhadi,
2018) and YOLOv4-tiny (Wang et al., 2021a) used the H4 and H5 detection heads
to detect objects. This reduced the model parameters and improved the detection
speed, but sacrificed detection accuracy. YOLOF (Chen et al., 2021b) utilized a
single H5 detection head, which decreased model parameters and increased
detection accuracy, but it is not well suited to detecting small objects.
CornerNet (Law & Deng, 2018) and CenterNet (Duan et al., 2019) achieved
competitive performance with a single H2 detection head, but their detection
performance for large objects may be inadequate. For traffic scenes with
limited computing resources and objects at different scales, how to configure
detection heads to achieve an excellent balance between detection accuracy,
model parameters, FLOPs and detection speed is a problem worthy of study.
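To make the cost side of this trade-off concrete, the sketch below counts feature-map locations per head for a given input resolution. It assumes a square input and ignores anchors per location and channel widths, so the counts are only a rough proxy for the work a head does, not a FLOPs estimate; the function names are hypothetical.

```python
# Down-sampling rates of heads H1..H5, as given in the Figure 1 caption.
STRIDES = {"H1": 2, "H2": 4, "H3": 8, "H4": 16, "H5": 32}


def grid_cells(input_res: int, heads) -> dict:
    """Feature-map locations per head for a square input_res x input_res image."""
    return {h: (input_res // STRIDES[h]) ** 2 for h in heads}


def total_cells(input_res: int, heads) -> int:
    """Total number of prediction locations over the chosen heads."""
    return sum(grid_cells(input_res, heads).values())


if __name__ == "__main__":
    configs = {
        "H4-5": ["H4", "H5"],
        "H3-5": ["H3", "H4", "H5"],
        "H2-5": ["H2", "H3", "H4", "H5"],
    }
    for res in (416, 1504):
        for name, heads in configs.items():
            print(res, name, total_cells(res, heads))
```

Each step toward a shallower head roughly quadruples the number of locations it covers, which is why dropping or adding H2 changes the budget far more than dropping or adding H5.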
To address these problems, we first conduct a preliminary study of how the
configuration of detection heads influences detection performance at different
input resolutions on the BDD100K dataset. The experimental results show that,
at a low resolution of 416, the detection accuracy obtained with three
detection heads (H3-5) is lower than that obtained with four detection heads
(H2-5). At a high resolution of 1504, the detection accuracy achieved with the
H3-5 detection heads is higher than that achieved with the H2-5 detection
heads. We believe these opposite results are mainly caused by the mismatch
between the detection heads and the object distribution. At low input
resolution, a large number of small objects match the detection heads
corresponding to high-resolution feature maps, so three detection heads alone
cannot detect them effectively. As expected, adding the H2 detection head
benefits small object detection. At high input resolution, objects become
larger overall, and three detection heads can match almost all objects. In this
case, using more detection heads may introduce detection head redundancy, which
hinders model optimization. To alleviate the detection performance degradation
caused by this mismatch, and based on the above findings, we make the following
three contributions:
1) We propose a practical strategy for matching detection heads to the object
distribution. The strategy guides the configuration of detection heads for
detecting objects at different scales.
2) Further, we present a simple and effective cross-scale detection head
configuration guideline. Based on this guideline, multiple detection heads can
be replaced by two detection heads, which significantly reduces model
parameters and FLOPs and improves detection speed while maintaining high
detection accuracy.
3) Combining the dilated convolution module with shallow backbone features, we
construct a lightweight traffic object detection network, termed MHD-Net.
Experimental results show that our proposed model achieves more competitive
performance than other models on the BDD100K dataset and our proposed ETFOD-v2
dataset.
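As background for the third contribution, the sketch below shows why dilated convolution enlarges the receptive field without adding weights. It uses the standard relations for dilated convolutions and is not the MHD-Net module itself, whose structure is not described in this section; the dilation rates and channel counts in the usage lines are hypothetical.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Spatial extent covered by a kernel with the given dilation rate."""
    return dilation * (kernel - 1) + 1


def conv_weights(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a 2-D convolution (bias excluded); dilation does not enter."""
    return kernel * kernel * c_in * c_out


def stacked_receptive_field(layers) -> int:
    """Receptive field of a stack of layers given as (kernel, dilation, stride)."""
    rf, jump = 1, 1
    for kernel, dilation, stride in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    plain = [(3, 1, 1), (3, 1, 1)]    # two ordinary 3x3 convolutions
    dilated = [(3, 2, 1), (3, 4, 1)]  # hypothetical dilation rates 2 and 4
    print(stacked_receptive_field(plain), stacked_receptive_field(dilated))
    print(conv_weights(3, 64, 64))    # same for both stacks, per layer
```

Because the weight count depends only on the kernel size and channel widths, swapping an ordinary 3x3 convolution for a dilated one leaves the parameter count unchanged; any extra parameters come from the layers added around it.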
2. Related works
2.1. Multi-scale detection
Multi-scale detection plays an important role in object detection models.
Models such as VIT-YOLO (Zhang et al., 2021c), FCOS (Tian et al., 2020b),
VFNet (Zhang et al., 2021a) and GFL (Li et al., 2020) utilized five different
feature levels to fuse contextual information, and thus achieved impressive
detection performance. Considering that the features of small objects are
easily lost during down-sampling, Libra-RCNN (Pang et al., 2019), OPANAS
(Liang et al., 2021), ABFPN (Zeng et al., 2022) and GCA RCNN (Zhang et al.,
2021b) integrated four different feature levels to detect objects at varied
scales. YOLOv5 (Jocher et al., 2022), YOLOv7 (Wang et al., 2022a), CSPNet (Wang
et al., 2020) and YOLOv4-P5 (Wang et al., 2021a) employed features at only
three scales to represent objects of different scales, and also achieved
excellent performance. Unlike the above multi-scale representations,
YOLOv3-Tiny (Redmon & Farhadi, 2018), YOLOv4-Tiny (Wang et al., 2021a) and YOEO
(Vahl et al., 2021) integrated features at two scales for object detection, so
as to balance detection accuracy, model parameters and detection speed.
Although these models have achieved impressive performance, they do not fully
consider the matching relationships between the object distribution and the
detection heads at different resolutions. In our proposed method, based on the
findings presented in the previous section, detection heads can be configured
simply and reasonably according to the matching relationship, so that more
appropriate and representative multi-scale information is used to detect
objects at different scales.
2.2. Traffic object detection
With the rise of intelligent driving, traffic object detection has received
increasing attention. Yang et al. (2020) proposed a Part-Aware Multi-Scale
Fully Convolutional Network with two detection heads for detecting