Beta R-CNN: Looking into Pedestrian Detection from
Another Perspective
Zixuan Xu
Peking University
zixuanxu@pku.edu.cn
Banghuai Li
Megvii Research
libanghuai@megvii.com
Ye Yuan
Megvii Research
yuanye@megvii.com
Anhong Dang
Peking University
ahdang@pku.edu.cn
Abstract
Recently, significant progress has been made in pedestrian detection, but it remains
challenging to achieve high performance in occluded and crowded scenes. This can
be attributed mostly to the widely used representation of pedestrians, i.e., the 2D
axis-aligned bounding box, which describes only the approximate location and size
of the object. A bounding box models the object as a uniform distribution within the
boundary, which makes pedestrians hard to distinguish in occluded and crowded scenes
because of the noise it includes. To address the problem, we propose a novel representation
based on the 2D beta distribution, named Beta Representation. It pictures a pedestrian
by explicitly constructing the relationship between the full-body and visible boxes, and
emphasizes the center of visual mass by assigning different probability values to
pixels. As a result, together with a new NMS strategy named BetaNMS, Beta Representation
is much better at distinguishing highly overlapped instances in crowded scenes.
Moreover, to fully exploit Beta Representation, a novel pipeline, Beta R-CNN,
equipped with BetaHead and BetaMask is proposed, leading to high detection
performance in occluded and crowded scenes.
1 Introduction
Pedestrian detection is a critical research topic in the computer vision field, with various real-world
applications such as autonomous vehicles, intelligent video surveillance, and robotics. During
the last decade, with the rise of deep convolutional neural networks (CNNs), great progress has
been achieved in pedestrian detection. However, it remains challenging to accurately distinguish
pedestrians in occluded and crowded scenes.
Although many methods have been proposed to address occlusion and crowding, performance
is still limited by the pedestrian representation, i.e., the 2D bounding box. The axis-aligned
minimum bounding box is widely used to define a distinct object explicitly by its approximate
location and size. Although the box representation is parameterization- and
annotation-friendly as the identity of an object, it has some non-negligible drawbacks that limit the
performance of pedestrian detection, especially in occluded and crowded scenes. Firstly, the bounding
box can be regarded as modeling the object as a uniform distribution within the box, which goes
against our intuitive perception: given an occluded pedestrian, what attracts our attention should be
the visible part rather than the occluded noise. Secondly, based on the box representation, intersection
over union (IoU) serves as the metric to measure the difference between objects, which makes it
difficult to distinguish highly overlapped instances in crowded scenes. As shown in Fig. 2, even if
the detector succeeds in identifying different human instances in a crowded scene, the highly overlapped
detections may still be suppressed by the non-maximum suppression (NMS) post-processing step. Lastly,
the full-body and visible boxes treat a distinct person as two separate parts, which ignores their inner
relationship as a whole and makes model optimization difficult.

These authors contributed equally
34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.
arXiv:2210.12758v1 [cs.CV] 23 Oct 2022

Figure 1: Beta distributions have flexible shapes with different peaks and FWHMs. (Legend values in the plot: 5, 2; 2, 2; 2, 3; 2, 4; the FWHM is annotated.)

Figure 2: Beta Representation samples and comparisons between IoU and KL divergence. (Panels: BBox representation, 2-value Mask, Beta Representation. Legend: Full-body Box, Visible Box. Annotated values: fIoU 0.74, vIoU 0.21, KL 9.95; fIoU 0.68, vIoU 0.31, KL 10.34; fIoU 0.61, vIoU 0.45, KL 8.28; fIoU 0.84, vIoU 0.19, KL 12.47.)
To eliminate the weaknesses of the box representation while preserving its advantages,
we propose a novel representation for pedestrians based on the 2D beta distribution, named Beta
Representation. In probability theory, the beta distribution is a family of continuous probability
distributions defined on the interval [0, 1], as depicted in Fig. 1. By assigning different values to α and β,
we can control the shape of the beta distribution, especially the peak and the full width at half
maximum (FWHM), which makes it naturally suitable for representing pedestrians with unpredictable
visible patterns. We treat each pedestrian as a 2D beta distribution on the image and generate eight new
parameters as the Beta Representation. As illustrated in Fig. 2, the boundary of the 2D beta distribution is
consistent with the full-body box, while the peak and the FWHM depend on the relation between
the visible part and the full-body box. Compared with paired boxes, i.e., full-body and visible boxes,
the 2D beta distribution treats each pedestrian more as an integrated whole while emphasizing the
object's center of visual mass.
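To make the role of α and β concrete, the following minimal sketch evaluates the one-dimensional beta density and reports the peak location and a grid-based FWHM estimate. It assumes that the value pairs in the Fig. 1 legend are (α, β) pairs, which the extracted text does not state explicitly; the helper names `beta_pdf`, `beta_mode`, and `beta_fwhm` are illustrative and are not taken from the paper.

```python
import math

def beta_pdf(x, a, b):
    """Beta density at x; endpoints are treated as 0, which is adequate for a, b > 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    # Log-gamma keeps the normalization factor numerically stable.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_mode(a, b):
    """Peak location; this closed form is valid only for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def beta_fwhm(a, b, steps=20_000):
    """Grid estimate of the width of the region where the density is at least half its peak."""
    half = 0.5 * beta_pdf(beta_mode(a, b), a, b)
    above = [i / steps for i in range(1, steps) if beta_pdf(i / steps, a, b) >= half]
    # The density is unimodal for a, b > 1, so the region is one contiguous interval.
    return above[-1] - above[0]

# Assumed (alpha, beta) pairs read from the Fig. 1 legend.
for a, b in [(5, 2), (2, 2), (2, 3), (2, 4)]:
    print(f"alpha={a}, beta={b}: mode={beta_mode(a, b):.3f}, FWHM={beta_fwhm(a, b):.3f}")
```

Under this assumption, the pair (5, 2) peaks at x = 0.8 and the symmetric pair (2, 2) peaks at x = 0.5, which shows how shifting weight between α and β moves the peak toward one side of the interval.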
Besides, instead of IoU, Kullback-Leibler (KL) divergence is adopted as a new metric to measure
the distance between two objects, and the resulting beta-distribution-based NMS strategy is named BetaNMS. Fig. 2
illustrates that while the bounding boxes are too close to distinguish (fIoU > 0.5, vIoU > 0.3; see footnote 2), the
2D beta distributions still remain highly discriminative (KL > 7), thereby leading
to better performance in distinguishing highly overlapped instances.
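As a simplified illustration of why a distribution-based distance can separate objects that share the same extent, the sketch below numerically estimates the KL divergence between two one-dimensional beta densities on the same interval [0, 1]. This is only a 1D stand-in: the paper's 2D formulation is not given in this excerpt, so the numbers produced here are not comparable with the KL values quoted in Fig. 2. The function names and the midpoint-rule integration are assumptions made for illustration.

```python
import math

def beta_log_pdf(x, a, b):
    """Log-density of Beta(a, b) at an interior point 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

def kl_beta(p, q, steps=20_000):
    """Midpoint-rule estimate of KL(p || q) for beta densities p=(a1, b1), q=(a2, b2)."""
    (a1, b1), (a2, b2) = p, q
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps          # midpoints avoid log(0) at the endpoints
        lp = beta_log_pdf(x, a1, b1)
        lq = beta_log_pdf(x, a2, b2)
        total += math.exp(lp) * (lp - lq)
    return total / steps

# Same support (the analogue of identical boxes), different shapes.
right_heavy = (5, 2)   # mass concentrated near x = 0.8
left_heavy = (2, 4)    # mass concentrated near x = 0.25
print(kl_beta(right_heavy, left_heavy))
print(kl_beta(left_heavy, right_heavy))   # KL is not symmetric
print(kl_beta(right_heavy, right_heavy))  # identical densities
```

Both densities live on exactly the same interval, so an overlap measure on their supports cannot tell them apart, while the KL estimate is clearly positive. Because KL divergence is asymmetric, any NMS rule built on it has to fix which detection plays the role of the reference distribution.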
Moreover, to fully exploit Beta Representation in pedestrian detection, we design a novel pedestrian
detector named Beta R-CNN, equipped with two key modules, i.e., BetaHead and BetaMask.
BetaHead regresses the eight beta parameters and the class score, while BetaMask serves
as an attention mechanism that modulates the extracted features with beta-distribution-based masks.
Experiments on the extremely crowded benchmark CrowdHuman [1] and on CityPersons [2] show
that our proposed approach can outperform the state-of-the-art results, which strongly validates the
superiority of our method.
2 Related Work
Pedestrian Detection. Pedestrian detection can be viewed as object detection for a specific
category. With the development of deep learning, CNN-based detectors can be roughly divided into
two categories: two-stage approaches [3, 4], which comprise separate proposal generation followed by
a classification and regression module that refines the proposals; and one-stage approaches [5-7], which
perform localization and classification simultaneously on the feature maps without a separate
proposal generation module. Most existing pedestrian detection methods adopt either the one-stage
or the two-stage strategy as their model architecture.

Footnote 2: fIoU and vIoU are the IoUs calculated on the full-body and visible boxes, respectively.
Occlusion Handling. In pedestrian detection, occlusion leads to pedestrians being misclassified. A
common strategy is the part-based approach [8-11], which ensembles a series of body-part detectors
to localize partially occluded pedestrians. Some methods also train different models for the most frequent
occlusion patterns [12, 13] or model different occlusion patterns in a joint framework [14, 15], but
they are all designed for specific occlusion patterns and do not generalize well to
other occluded scenes. Besides, attention mechanisms have been applied to handle different occlusion
patterns [9, 16]. MGAN [16] introduces a mask-guided attention network, which emphasizes
visible pedestrian regions while suppressing the occluded parts by modulating the extracted features.
Moreover, a few recent works [17, 18] have used visible-box annotations as extra
supervision to improve pedestrian detection performance.
Crowdedness Handling. In crowded scenes, besides the misclassification issue, crowdedness
makes it difficult to distinguish highly overlapped pedestrians. A few previous works propose new
loss functions to address the problem of crowded detections. For example, OR-CNN [8] proposes an
aggregation loss that enforces proposals to be close to their corresponding objects and minimizes the
internal region distances of proposals associated with the same object. RepLoss [19] proposes the
Repulsion Loss, which adds an extra penalty to proposals intertwined with multiple ground truths.
Moreover, some advanced NMS strategies [20-23, 18] have been proposed to alleviate the crowding issue to
some extent, but they still take IoU as the metric to measure the difference between detected objects,
which limits their performance in identifying highly overlapped instances among crowded boxes.
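The limitation of IoU-based suppression can be seen in a minimal sketch of standard greedy NMS, shown below. This is the generic textbook procedure, not any of the advanced strategies cited above and not BetaNMS; the box coordinates, scores, and thresholds are made-up values chosen only to show two nearby pedestrians being merged into one detection.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, thresh=0.5):
    """Return indices of kept boxes, visiting candidates from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep

# Hypothetical detections: boxes 0 and 1 are two different people standing close together.
boxes = [(100, 50, 200, 350), (120, 50, 220, 350), (400, 60, 480, 300)]
scores = [0.95, 0.90, 0.80]

print(greedy_nms(boxes, scores, thresh=0.5))  # box 1 is suppressed by box 0
print(greedy_nms(boxes, scores, thresh=0.7))  # a looser threshold keeps all three
```

Raising the threshold keeps the second pedestrian here, but in general it also lets through more duplicate detections of the same person, which is the trade-off that motivates replacing IoU with a more discriminative metric.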
Object Representation. Object representation is a primary topic in computer vision, and there
are many representations for objects in 2D images, such as 2D bounding boxes [4], polygons [24],
splines [25], and pixels [26]. Each has strengths and weaknesses from the practical perspective of a
specific application, offering different trade-offs among annotation cost, information density, and level of fidelity.
Distribution-based representation has also been tried in [27], which uses the bivariate normal
distribution to represent objects. However, when the distribution is derived from bounding boxes rather
than from segmentation, the mean and variance of the bivariate normal distribution are still consistent with the
box center and scale. Besides, its performance is considerably worse than that of other methods.
In this paper, Beta Representation provides a more detailed representation for occluded pedestrians,
along with a new metric to substitute for IoU and a new detector, Beta R-CNN, thereby alleviating the
occlusion and crowd issues to a great extent.
3 Method
In this section, we first introduce the parameterized Beta Representation for pedestrians. Then, to
fully exploit the Beta Representation, we propose a novel pipeline, Beta R-CNN. Finally, a specific
NMS strategy based on the beta distribution and KL divergence, i.e., BetaNMS, is analyzed in detail.
3.1 Beta Representation
3.1.1 Beta Distribution
In probability theory and mathematical statistics, the beta distribution is a family of one-dimensional
continuous probability distributions defined on the interval [0, 1] and parameterized by two positive shape
parameters α and β. For 0 ≤ x ≤ 1 and shape parameters α, β > 0, the probability density function
(PDF) of the beta distribution is a power function of the variable x and of its reflection (1 − x), as
follows:

$$
\mathrm{Be}(x; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)} \cdot x^{\alpha-1}(1-x)^{\beta-1} = \frac{1}{B(\alpha, \beta)} \cdot x^{\alpha-1}(1-x)^{\beta-1}, \tag{1}
$$

where Γ(z) is the gamma function and B(α, β) is a normalization factor that ensures the total probability
is 1. Some beta distribution samples are shown in Fig. 1. According to the above definition, the mean