Rethinking the Detection Head Configuration for Traffic Object Detection

Yi Shi^a, Jiang Wu^a, Shixuan Zhao^a, Gangyao Gao^a, Tao Deng^b, Hongmei Yan^{a,∗}

^a MOE Key Laboratory for Neuroinformation, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China

^b School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China

Abstract

Multi-scale detection plays an important role in object detection models. However, it is often unclear how to reasonably configure detection heads that combine multi-scale features at different input resolutions. We find that the matching relationship between the object distribution and the detection heads changes with the input resolution. Based on these instructive findings, we propose a lightweight traffic object detection network based on matching between detection head and object distribution, termed MHD-Net. It consists of three main parts. The first is the detection head and object distribution matching strategy, which guides the rational configuration of detection heads so as to leverage multi-scale features to effectively detect objects at vastly different scales. The second is the cross-scale detection head configuration guideline, which replaces multiple detection heads with only two detection heads that possess rich feature representations, achieving an excellent balance between detection accuracy, model parameters, FLOPs and detection speed. The third is the receptive field enlargement method, which combines the dilated convolution module with shallow backbone features to further improve detection accuracy at the cost of only a very slight increase in model parameters. The proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset. The code will be made available.

Keywords: Traffic object detection, Detection head configuration, Deep learning

∗ Corresponding author.

Email addresses: yishi701@gmail.com (Yi Shi), jiangjiangbabay@gmail.com (Jiang Wu), zhaosx@std.uestc.edu.cn (Shixuan Zhao), gangyaogao@gmail.com (Gangyao Gao), tdeng@swjtu.edu.cn (Tao Deng), hmyan@uestc.edu.cn (Hongmei Yan)

Preprint submitted to Elsevier, October 11, 2022

arXiv:2210.03883v1 [cs.CV] 8 Oct 2022

1. Introduction

As a crucial part of intelligent driving, object detection (Chen et al., 2021a; Liu et al., 2022a) is important for ensuring driving safety. In general, scaling the input resolution is a common way to balance detection accuracy against FLOPs (Dollár et al., 2021; Liu et al., 2022b). We review the details of detection models on the BDD100K dataset (Yu et al., 2020) and surprisingly find that different detection heads match objects of different scales at different input resolutions. As shown in Figure 1, at low input resolution a large number of objects are matched with the H1 and H2 detection heads. As the input resolution increases, however, the number of objects matched with the H1 and H2 detection heads decreases significantly, while the number matched with the H4 and H5 detection heads increases markedly.
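The trend described above can be pictured with a small sketch. The text does not state the rule used to match an object to a detection head, so the rule below is an assumption made purely for illustration: an object is assigned to the head whose down-sampling rate is closest (on a log2 scale) to the object's side length divided by a hypothetical constant `CELLS_PER_OBJECT`. The function names, the source width of 1280 pixels and the example object sizes are likewise illustrative and are not taken from the paper.

```python
import math
from collections import Counter

# Head index -> down-sampling rate, as defined in the Figure 1 caption.
STRIDES = {1: 2, 2: 4, 3: 8, 4: 16, 5: 32}

# Hypothetical constant: how many feature-map cells an object should span.
# This value is an assumption, not taken from the paper.
CELLS_PER_OBJECT = 4


def match_head(size_px):
    """Assumed rule: pick the head whose stride is nearest (in log2) to
    size_px / CELLS_PER_OBJECT, clamped to H1..H5."""
    level = round(math.log2(size_px / CELLS_PER_OBJECT))  # stride = 2**level
    return min(max(level, 1), 5)


def head_histogram(sizes_px, source_width, input_resolution):
    """Count how many objects land on each head after resizing the image."""
    scale = input_resolution / source_width
    counts = Counter(match_head(s * scale) for s in sizes_px)
    return {head: counts.get(head, 0) for head in STRIDES}


# Illustrative object side lengths (pixels) in a 1280-pixel-wide source image.
sizes = [20, 40, 80, 160, 320]
low = head_histogram(sizes, source_width=1280, input_resolution=416)
high = head_histogram(sizes, source_width=1280, input_resolution=1504)
print("416 :", low)
print("1504:", high)
```

Under this assumed rule the same five objects spread over H1 to H5 at resolution 416 but concentrate on H3 to H5 at resolution 1504, which mirrors the qualitative shift reported for Figure 1.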

This finding is exciting and motivates us to rethink whether the configuration of detection heads in existing models is optimal. For example, at low input resolution, can three detection heads like those in YOLOv5 (Jocher et al., 2022) achieve the best detection performance, especially for a large number of small objects? At high input resolution, will five detection heads like those in IDYOLO (Qin et al., 2022) cause detection head redundancy and make the model harder to optimize? Furthermore, for different input resolutions, how should appropriate detection heads be configured to achieve better detection performance for traffic objects at widely varying scales?

On the other hand, employing fewer detection heads is a common way to reduce model parameters and improve detection speed. YOLOv3-Tiny (Redmon & Farhadi, 2018) and YOLOv4-Tiny (Wang et al., 2021a) used the H4 and H5 detection heads to detect objects. This reduced the model parameters and improved the detection speed, but sacrificed detection accuracy. YOLOF (Chen et al., 2021b) utilized a single H5 detection head, which decreased model parameters and increased detection accuracy, but it is not friendly to the detection of small objects. CornerNet (Law & Deng, 2018) and CenterNet (Duan et al., 2019) achieved competitive performance with a single H2 detection head, but their detection performance for large objects may be inadequate. For traffic scenes with limited computing resources and objects of different scales, how to configure reasonable detection heads to achieve an excellent balance between detection accuracy, model parameters, FLOPs and detection speed is a problem worthy of study.

Figure 1: The matching relationships between detection heads and object distribution at different input resolutions on the BDD100K training set. H1, H2, H3, H4 and H5 denote the detection heads corresponding to feature maps with down-sampling rates of 2, 4, 8, 16 and 32, respectively.
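To make the head definitions concrete, the following sketch computes the feature-map side length each head operates on for a given input resolution, using the down-sampling rates 2, 4, 8, 16 and 32 stated in the Figure 1 caption. It assumes a square input whose side is divisible by 32; the function name `grid_sizes` is illustrative and does not come from the paper.

```python
# Down-sampling rate of each detection head (from the Figure 1 caption).
STRIDES = {"H1": 2, "H2": 4, "H3": 8, "H4": 16, "H5": 32}


def grid_sizes(input_resolution):
    """Side length of the feature map seen by each head.

    Assumes a square input of side `input_resolution` pixels.
    """
    return {head: input_resolution // stride for head, stride in STRIDES.items()}


for resolution in (416, 1504):
    sizes = grid_sizes(resolution)
    cells = {head: side * side for head, side in sizes.items()}
    print(resolution, sizes, "cells:", cells)
```

Because the number of cells grows with the square of the side length, an H1 feature map has (32 / 2)^2 = 256 times as many cells as the H5 map at the same input resolution, which gives a rough sense of why the choice of heads affects FLOPs and speed.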

To address the problems mentioned above, we first conduct a preliminary study on how the configuration of the detection heads influences detection performance at different input resolutions on the BDD100K dataset. The experimental results show that, at a low resolution of 416, the detection accuracy obtained with three detection heads (H3-5) is lower than that obtained with four detection heads (H2-5). At a high resolution of 1504, the detection accuracy achieved with the H3-5 detection heads is higher than that achieved with the H2-5 detection heads. We believe that these opposite results are mainly caused by the mismatch between the detection heads and the object distribution. At low input resolution, a large number of small objects match with detection heads corresponding to high-resolution feature maps. In this case, three detection heads alone cannot effectively detect the many small objects. As expected, adding the H2 detection head benefits small object detection. At high input resolution, objects become larger overall, and three detection heads can match almost all objects. In this case, using more detection heads may cause detection head redundancy, which is not conducive to model optimization. To alleviate the performance degradation caused by this mismatch, and based on the above findings, we make the following three contributions:

1) We propose a practical matching strategy between detection heads and object distribution. The proposed matching strategy guides the reasonable configuration of detection heads for detecting objects of different scales.

2) Further, we present a simple and effective cross-scale detection head configuration guideline. Based on this guideline, multiple detection heads can be replaced by two detection heads, which significantly reduces model parameters and FLOPs and improves detection speed while maintaining high detection accuracy.

3) Combining the dilated convolution module with shallow backbone features, we construct a lightweight traffic object detection network, termed MHD-Net. Experimental results show that our proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset.

2. Related works

2.1. Multi-scale detection

Multi-scale detection plays an important role in object detection models. Models such as VIT-YOLO (Zhang et al., 2021c), FCOS (Tian et al., 2020b), VFNet (Zhang et al., 2021a) and GFL (Li et al., 2020) utilized five different feature levels to fuse contextual information, and thus achieved impressive detection performance. Considering that the features of small objects are easily lost in the down-sampling process, Libra-RCNN (Pang et al., 2019), OPANAS (Liang et al., 2021), ABFPN (Zeng et al., 2022) and GCA RCNN (Zhang et al., 2021b) integrated four different feature levels to detect objects at varied scales. YOLOv5 (Jocher et al., 2022), YOLOv7 (Wang et al., 2022a), CSPNet (Wang et al., 2020) and YOLOv4-P5 (Wang et al., 2021a) employed features at only three scales to represent objects of different scales, and also achieved excellent performance. Unlike the above multi-scale representations, YOLOv3-Tiny (Redmon & Farhadi, 2018), YOLOv4-Tiny (Wang et al., 2021a) and YOEO (Vahl et al., 2021) integrated features at two scales for object detection, so as to balance detection accuracy, model parameters and detection speed. Although these models have achieved impressive performance, they do not fully consider the matching relationships between the object distribution and the detection heads at different resolutions. In our proposed method, based on the instructive findings presented in the previous section, detection heads can be simply and reasonably configured according to the matching relationship, so that more appropriate and representative multi-scale information can be fully utilized to detect objects at different scales.

2.2. Traffic object detection

With the rise of intelligent driving, traffic object detection has received more and more attention. Yang et al. (2020) proposed a Part-Aware Multi-Scale Fully Convolutional Network with two detection heads for detecting
