Precise Single-stage Detector
Aisha Chandio†1, Gong Gui†1, Teerath Kumar†2, Irfan Ullah3, Ramin
Ranjbarzadeh4, Arunabha M Roy5, Akhtar Hussain6, and Yao Shen*1
1Department of Computer Science and Engineering, Shanghai Jiao Tong
University, Shanghai, 200240, China.
2Department of Software Engineering, School of Computing, National
University of Computer and Emerging Sciences, Islamabad 44000, Pakistan.
3School of Electrical Engineering and Computer Science (SEECS), National
University of Sciences and Technology (NUST), Islamabad, 44000, Pakistan.
4School of Computing, Faculty of Engineering and Computing, Dublin City
University, Dublin, Ireland.
5Aerospace Engineering Department, University of Michigan, Ann Arbor, MI
48109, USA.
6Department of Information Technology, Quaid-e-Awam University of
Engineering, Science, and Technology, Nawabshah, Pakistan.
†These authors contributed equally to this work.
*Corresponding author. Email address: yshen@cs.sjtu.edu.cn
arXiv:2210.04252v1 [cs.CV] 9 Oct 2022
Abstract
Background and objectives: Deep learning (DL) algorithms have shown impressive performance
in various tasks. Among them, single-stage object detectors mainly depend on a
classification network to extract features, multiple feature maps to make predictions, and
classification confidence to guide the filtering of overlapping prediction boxes. However, two
problems still cause inaccurate results: (1) during feature extraction, as semantic information
is acquired layer by layer, local information is gradually lost, resulting in less
representative feature maps; (2) in the Non-Maximum Suppression (NMS) algorithm, because the
classification and regression tasks are inconsistent, the classification confidence does not
accurately reflect how well the prediction boxes are localized. Methods: To address these
issues, we propose a new architecture, a modified version of the Single Shot MultiBox Detector
(SSD), named Precise Single Stage Detector (PSSD). First, we improve the features by adding
extra layers to SSD. Second, we construct a simple and effective feature enhancement module
that expands the receptive field step by step for each layer and enhances its local and
semantic information. Finally, we design a more efficient loss function to predict the IOU
between the prediction boxes and the ground truth boxes; the threshold IOU guides
classification training and attenuates the scores used by the NMS algorithm. Main Results:
Benefiting from these optimizations, PSSD achieves strong performance in real time.
Specifically, on a Titan Xp with an input size of 320 pixels, PSSD achieves 33.8 mAP at 45 FPS
on the MS COCO benchmark and 81.28 mAP at 66 FPS on Pascal VOC 2007, outperforming
state-of-the-art object detection models. The proposed model also performs well with a larger
input size: at 512 pixels, PSSD obtains 37.2 mAP at 27 FPS on MS COCO and 82.82 mAP at 40 FPS
on Pascal VOC 2007. The experimental results show that the proposed model offers a better
trade-off between speed and accuracy.
Keywords: Object Detection (OD); Precise Single Stage Detector (PSSD); Deep Convolutional
Neural Networks (D-CNNs); Deep Learning (DL); Machine Learning (ML).
1. Introduction:
In recent years, deep learning algorithms have become a powerful tool that can automatically
capture nonlinear and hierarchical features, and they have shown great success in various
applications, in particular in image domains such as classification, segmentation, detection,
and captioning [1, 2, 3, 4, 5, 6]. Furthermore, they have been extended to different
classification tasks, including audio classification [7, 8, 9, 10], text classification
[11, 12, 13, 14, 15, 16], signal classification [17, 18, 19, 20], multi-modality object
classification [21, 22, 23], event detection [24, 25], and various other applications
[26, 27, 28]. Among them, object detection [29, 30, 31, 32, 33, 34] has been of central
interest to many researchers. To this end, various algorithms such as YOLO, Fast R-CNN, and
Faster R-CNN have been successfully used for object detection over the years. Deep learning
algorithms for object detection have gained significant attention over the last decade
[1, 2, 3]. Object detection, which aims at locating object instances from a large number of
predefined categories in natural images, is one of the most basic and challenging problems in
computer vision [35, 36]. With the rapid development of CNNs, object detection has made
remarkable progress and has gradually converged on two main architectures: two-stage and
single-stage. In two-stage algorithms such as the Faster Region-based Convolutional Neural
Network (Faster R-CNN) [37, 38], the first stage only filters out a large number of background
regions and obtains rough object proposals without considering the specific class of the
object. It is followed by the second stage, which classifies each proposal and refines its
location according to the features extracted by the CNN [39, 40]. Because of the refinement
performed by the second stage, two-stage algorithms cannot achieve real-time performance
[40, 38]. Therefore, single-stage algorithms have been the priority for object detection
applications that require real-time detection [35, 36], and they are the particular interest
of the current work. Single-stage algorithms directly perform classification and location
optimization based on the default boxes [40, 38]. For example, You Only Look Once (YOLO) [41]
and SSD [42] have achieved fast real-time detection speed, but at the cost of detection
accuracy. In recent years, single-stage detectors have improved their accuracy, but they still
cannot achieve a better trade-off between speed and accuracy [38].
In this paper, on the premise of guaranteeing the real-time performance of the model, we
propose a novel architecture named Precise Single Stage Detector (PSSD), based on the original
SSD, that addresses two key questions: (1) How can the information in the features used by the
predictors be enriched without relying on a deep backbone such as ResNet-101 [43]? (2) Is it
reasonable to rely on classification confidence to decide which overlapping boxes are filtered
out in the NMS algorithm? In the following subsections, we discuss these questions in order to
establish the main challenges and the corresponding improvements that we have implemented in
the proposed PSSD model.
1.1 Feature richness:
Considering the significant overhead caused by the image pyramid, SSD [42] puts forward
a feature pyramid to solve the problem of multi-scale detection. The deep features in the
classification network contain more semantic information, which is suitable for identifying
large objects, while the shallow low-level features are more suitable for small objects.
However, the lack of semantic information in shallow features and the loss of local details in
deep features degrade the precision of SSD. Feature Pyramid Networks (FPN) [44] add deep
semantic information to shallow features using a U-shaped structure to obtain a more effective
feature pyramid, which improves small object detection. DetNet [45] uses dilated convolution
to reduce local information loss by decreasing the number of down-sampling steps, so that the
positioning accuracy of large objects is improved. These networks show that the features used
by each scale predictor need not only suitable semantic information but also local texture
information for more accurate positioning. The information richness of the features at each
level is important to detection quality. The problem is how to construct a high-performance
feature pyramid with as little overhead as possible.
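
The trade-off between down-sampling and dilated convolution mentioned above can be made concrete with a small receptive-field calculation. The sketch below is a general illustration only: the two layer stacks are hypothetical and are not the layers of DetNet, SSD, or PSSD. It uses the standard relations that a dilated kernel of size k and dilation d spans k + (k - 1)(d - 1) input positions, and that each layer enlarges the receptive field by (span - 1) times the product of the preceding strides.

```python
def effective_kernel(kernel_size, dilation):
    """Span of a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Return (receptive field, output stride) for a stack of conv layers.

    layers: list of (kernel_size, stride, dilation), ordered input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf, jump


# Hypothetical stacks of three 3x3 convolutions (not taken from any paper).
strided = [(3, 2, 1), (3, 2, 1), (3, 1, 1)]  # two down-sampling steps
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # no down-sampling, growing dilation

print(receptive_field(strided))  # (15, 4)
print(receptive_field(dilated))  # (15, 1)
```

Both stacks see a 15-pixel window, but the dilated stack keeps the full spatial resolution (output stride 1) while the strided stack reduces it by a factor of 4, which is the sense in which fewer down-sampling steps preserve local detail.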
1.2 Filtration of overlapping boxes:
Usually, to prevent the results from overlapping, NMS is applied as the final operation
of object detection. In the NMS algorithm, the prediction box with the highest classification
confidence is retained, and other boxes are filtered out when their IOU with the retained box
is greater than the threshold. As shown in Figure 1, this may lead to inaccurate results.
IOUNet [46] guides the NMS to alleviate this problem by predicting the IOU between the
regression boxes and their ground truth. The key question is how to make this more effective
in single-stage algorithms.
Figure 1: An example of test results without NMS. The blue box will be retained because of
its higher classification confidence. (Box labels in the figure: Ground Truth; IOU = 0.8,
Pcls = 0.7; IOU = 0.6, Pcls = 0.8.)
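
The failure case in Figure 1 can be reproduced in a few lines of Python. The sketch below is a plain greedy NMS, not the PSSD implementation. The box coordinates are made up so that the IOU values match the figure, and the final re-scoring step (classification score multiplied by IOU) is only an assumed illustration of IOU-guided ranking, with the true IOU standing in for a predicted one.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Made-up coordinates chosen to match the labels in Figure 1.
ground_truth = (0, 0, 10, 10)
boxes = [(0, 0, 10, 8), (0, 0, 10, 6)]  # IOU with ground truth: 0.8 and 0.6
cls_scores = [0.7, 0.8]                 # classification confidence

kept_by_cls = nms(boxes, cls_scores)    # keeps index 1, the less accurate box

# Assumed illustration: rank by score * IOU, using the true IOU as a stand-in
# for a predicted IOU. This is NOT the exact attenuation used by PSSD.
quality = [iou(b, ground_truth) for b in boxes]
rescored = [s * q for s, q in zip(cls_scores, quality)]
kept_by_iou = nms(boxes, rescored)      # keeps index 0, the more accurate box

print(kept_by_cls, kept_by_iou)
```

With classification confidence alone, the box with IOU 0.6 suppresses the better-localized box; once localization quality enters the ranking, the box with IOU 0.8 survives instead.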
1.3 Contribution:
To improve the detection effect, we mitigate the above problems based on the SSD model. Firstly,
we improve features by introducing extra layers in the SSD to make the basic feature pyramid