Precise Single-stage Detector

Aisha Chandio†1, Gong Gui†1, Teerath Kumar†2, Irfan Ullah3, Ramin Ranjbarzadeh4, Arunabha M Roy5, Akhtar Hussain6, and Yao Shen∗1

1Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China.

2Department of Software Engineering, School of Computing, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan.

3School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, 44000, Pakistan.

4School of Computing, Faculty of Engineering and Computing, Dublin City University, Dublin, Ireland.

5Aerospace Engineering Department, University of Michigan, Ann Arbor, MI 48109, USA.

6Department of Information Technology, Quaid-e-Awam University of Engineering, Science, and Technology, Nawabshah, Pakistan.

†These authors contributed equally to this work.

∗Corresponding author. Email address: yshen@cs.sjtu.edu.cn

arXiv:2210.04252v1 [cs.CV] 9 Oct 2022

Abstract

Background and objectives: Deep learning (DL) algorithms have shown impressive performance in various tasks. Among them, single-stage object detectors mainly depend on a classification network to extract features, multiple feature maps to make predictions, and classification confidence to guide the filtering of overlapping prediction boxes. However, two problems still cause inaccurate results: (1) during feature extraction, as semantic information is acquired layer by layer, local information is gradually lost, resulting in less representative feature maps; (2) in the Non-Maximum Suppression (NMS) algorithm, because the classification and regression tasks are inconsistent, the classification confidence cannot accurately indicate the positional accuracy of the prediction boxes. Methods: To address these issues, we propose a new architecture, a modified version of the Single Shot Multibox Detector (SSD), named Precise Single Stage Detector (PSSD). First, we improve the features by adding extra layers to SSD. Second, we construct a simple and effective feature enhancement module that expands the receptive field step by step for each layer and enhances its local and semantic information. Finally, we design a more efficient loss function to predict the IOU between the prediction boxes and ground truth boxes; the threshold IOU guides classification training and attenuates the scores used by the NMS algorithm. Main results: Benefiting from these optimizations, the proposed PSSD model achieves strong performance in real time. Specifically, on a Titan Xp with an input size of 320 pixels, PSSD achieves 33.8 mAP at 45 FPS on the MS COCO benchmark and 81.28 mAP at 66 FPS on Pascal VOC 2007, outperforming state-of-the-art object detection models. The proposed model also performs well with a larger input size: at 512 pixels, PSSD obtains 37.2 mAP at 27 FPS on MS COCO and 82.82 mAP at 40 FPS on Pascal VOC 2007. The experimental results show that the proposed model achieves a better trade-off between speed and accuracy.

Keywords: Object Detection (OD); Precise Single Stage Detector (PSSD); Deep Convolutional Neural Networks (D-CNNs); Deep Learning (DL); Machine Learning (ML).

1. Introduction:

In recent years, deep learning algorithms have become a powerful tool that can automatically capture nonlinear and hierarchical features, and they have shown great success in various applications, in particular in image domains such as classification, segmentation, detection, captioning, and others [ ]. Furthermore, they have been extended to different classification tasks, including audio classification [ ], text classification [ ], classification of various signals [ ], multi-modality object classification [ ], event detection [ ], and various other applications [ ]. Among them, object detection [ ] has been of central interest to the vast majority of researchers. To this end, various algorithms such as YOLO, Fast R-CNN, Faster R-CNN, and others have been successfully used for object detection over the years. Deep learning algorithms for object detection have gained significant attention over the last few decades [ ]. Object detection, which aims at locating object instances from a large number of predefined categories in natural images, is one of the most basic and challenging problems in computer vision [ ]. With the rapid development of CNNs, object detection has made remarkable progress and has gradually split into two main architectures: two-stage and single-stage. In two-stage algorithms such as the Faster Region-based Convolutional Neural Network (Faster R-CNN) [ ], the first stage only filters out a large number of background regions and obtains rough object proposals without considering the specific class of the object. It is then followed by the second stage, which classifies each proposal and optimizes its location according to the features extracted by the CNN [ ]. Because of the refinement conducted by the second stage, two-stage algorithms cannot achieve real-time performance [ ]. Therefore, single-stage algorithms have been the major priority for various object detection applications that require real-time detection [ ], and they are the particular interest of the current work. Single-stage algorithms directly perform classification and location optimization based on the default boxes [ ]. For example, You Only Look Once (YOLO) [ ] and SSD [ ] have achieved fast real-time detection speed but have sacrificed detection accuracy in exchange. In recent years, single-stage detectors have been improving their accuracy, but they still cannot achieve a better trade-off between speed and accuracy [38].

In this paper, on the premise of guaranteeing the real-time performance of the model, we propose a novel architecture named Precise Single Stage Detector (PSSD), based on the original SSD, that provides solutions to two key questions: (1) How can the information of the features used by the predictors be enriched without relying on a deep backbone such as ResNet-101 [ ]? (2) Is it reasonable to rely on classification confidence to decide which overlapping boxes are filtered out by the NMS algorithm? In the following subsections, we address these questions to establish the main challenges and the corresponding improvements that we have implemented in the proposed PSSD model.

1.1 Feature richness:

Considering the significant overhead caused by the image pyramid, SSD [ ] puts forward a feature pyramid to solve the problem of multi-scale detection. The deep features in the classification network contain more semantic information, which is suitable for identifying large objects, while the shallow low-level features are more suitable for small objects. However, the lack of semantic information in shallow features and the loss of local details in deep features degrade the precision of SSD. Feature Pyramid Networks (FPN) [ ] add deep semantic information to shallow features using a U-shaped structure to obtain a more effective feature pyramid, which improves small-object detection. DetNet [ ] uses dilated convolution to reduce local information loss by decreasing the number of down-sampling steps, so that the positioning accuracy of large objects is improved. These networks show that the features used by the predictor at each scale need not only suitable semantic information but also local texture information for more accurate positioning. The information richness of the features at each level is therefore significant to detection performance. The problem is how to construct a high-performance feature pyramid with as little overhead as possible.
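The trade-off between down-sampling and dilated convolution mentioned above can be made concrete with a small receptive-field calculation. The sketch below is only an illustration under our own assumptions: the helper name `receptive_field` and the three-layer stacks are hypothetical and are not taken from SSD, DetNet, or PSSD. It uses the standard recurrence in which each layer widens the receptive field by (kernel - 1) x dilation x (accumulated stride).

```python
from typing import Iterable, Tuple

Layer = Tuple[int, int, int]  # (kernel_size, stride, dilation)


def receptive_field(layers: Iterable[Layer]) -> Tuple[int, int]:
    """Return (receptive field, accumulated stride) of a stack of conv layers.

    Hypothetical helper for illustration; measured along one spatial axis.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; no down-sampling yet
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump  # growth contributed by this layer
        jump *= stride                        # later layers step over more input pixels
    return rf, jump


# Three plain 3x3 convolutions: small receptive field, full resolution.
plain = receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)])
# Two stride-2 convolutions then one stride-1: large receptive field, 4x down-sampled.
strided = receptive_field([(3, 2, 1), (3, 2, 1), (3, 1, 1)])
# Dilations 1, 2, 4 with stride 1: same receptive field, full resolution kept.
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)])

print(plain, strided, dilated)
```

In this toy setting the strided and dilated stacks both reach a receptive field of 15 pixels, but only the dilated stack keeps an accumulated stride of 1, which is the sense in which dilation can enlarge context without discarding local detail.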

1.2 Filtration of overlapping boxes:

[Figure 1 labels: Ground Truth; IOU = 0.8, Pcls = 0.7; IOU = 0.6, Pcls = 0.8]

Figure 1: An example of test results without NMS. The blue box will be retained because of its higher classification confidence.

Usually, to prevent the results from overlapping, we set the NMS as the final operation of object detection. In the NMS algorithm, the prediction box with the highest classification confidence is retained, and other boxes are filtered out when their IOU with it is greater than the threshold. As shown in Figure 1, this may lead to inaccurate results. IOUNet [ ] guides the NMS to alleviate this problem by predicting the IOU between the regression boxes and their ground truth. The key question that arises is how to make this approach more effective in single-stage algorithms.
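The failure case in Figure 1 can be reproduced with a minimal greedy NMS. The sketch below is an illustration under stated assumptions, not the PSSD implementation: the toy box coordinates are chosen by us so that the two predictions have IOU 0.8 and 0.6 with the ground truth, the true IOU stands in for a predicted IOU, and multiplying the confidence by that IOU is a hypothetical attenuation rule, since the exact rule is not given in this section.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, drop any that overlaps a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Toy boxes (our assumption) matching the numbers in Figure 1.
ground_truth: Box = (0.0, 0.0, 10.0, 10.0)
boxes: List[Box] = [(0.0, 0.0, 10.0, 8.0), (0.0, 0.0, 10.0, 6.0)]
cls_scores = [0.7, 0.8]
loc_quality = [iou(b, ground_truth) for b in boxes]  # stand-in for a predicted IOU

kept_by_cls = nms(boxes, cls_scores)
# Hypothetical attenuation rule: scale each confidence by its (predicted) IOU.
rescored = [p * q for p, q in zip(cls_scores, loc_quality)]
kept_by_rescored = nms(boxes, rescored)

print(kept_by_cls, kept_by_rescored)
```

With classification confidence alone, the box with IOU 0.6 survives (index 1); after the scores are attenuated by localization quality, the better-localized box with IOU 0.8 survives instead (index 0).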

1.3 Contribution:

To improve the detection effect, we mitigate the above problems based on the SSD model. Firstly,

we improve features by introducing extra layers in the SSD to make the basic feature pyramid
