FRIDA: Fisheye Re-Identification Dataset with Annotations
Mertcan Cokbas, John Bolognino, Janusz Konrad, Prakash Ishwar*
Department of Electrical and Computer Engineering, Boston University
8 Saint Mary’s Street, Boston, MA 02215
[mcokbas, jcbolo, jkonrad, pi]@bu.edu

*This work was supported by ARPA-E (agreement DE-AR0000944) and by the Boston University Undergraduate Research Opportunities Program.
¹ vip.bu.edu/frida
Abstract
Person re-identification (PRID) from side-mounted
rectilinear-lens cameras is a well-studied problem. On the
other hand, PRID from overhead fisheye cameras is new
and largely unstudied, primarily due to the lack of suitable
image datasets. To fill this void, we introduce the “Fisheye
Re-IDentification Dataset with Annotations” (FRIDA)¹,
with 240k+ bounding-box annotations of people, captured
by 3 time-synchronized, ceiling-mounted fisheye cameras in
a large indoor space. Due to the field-of-view overlap, PRID
in this case differs from a typical PRID problem, which we
discuss in depth. We also evaluate the performance of 10
state-of-the-art PRID algorithms on FRIDA. We show that
for 6 CNN-based algorithms, training on FRIDA boosts the
performance by up to 11.64 percentage points in mAP compared to
training on a common rectilinear-camera PRID dataset.
1. Introduction
Knowing the number and location of people in public
spaces, office and school buildings, stores and shopping
malls, etc., is critical for public safety (fire, chemical haz-
ards), spatial analytics (optimization of office or store space
usage), HVAC energy reduction, and, recently, for pan-
demic management. Typically, people-detection systems
use standard surveillance cameras (equipped with rectilin-
ear lens) mounted high on walls above the scene of inter-
est. Since such cameras have a relatively narrow field of
view (FOV), a number of them must be installed and man-
aged, which significantly increases the system complexity
and cost, especially in large spaces.
Recently, overhead fisheye cameras have been success-
fully proposed for people counting [23,17,9]. However,
even a fisheye camera with its large FOV cannot accurately
detect people at the FOV periphery (large distance from the
camera) due to extreme foreshortening and geometric dis-
tortions. Clearly, in such spaces (e.g., a convention hall)
multiple overhead fisheye cameras are needed. However,
since the same person may appear in FOVs of multiple cam-
eras, person re-identification (PRID) is critical for accurate
people counting, tracking, etc.
While PRID for side-mounted, rectilinear-lens cameras
has been researched in depth [22,6,28,31,26,27], we
are aware of only three works exploring fisheye PRID
[1,2,3], each with its own limitations and none releasing
their fisheye data. Therefore, to inspire more research in this
area, we are proposing a first-of-its-kind dataset, “Fisheye
Re-IDentification Dataset with Annotations” (FRIDA), that
was captured by three overhead fisheye cameras in a large
space and includes over 240,000 bounding-box annotations
of people. In addition to introducing FRIDA, we explore
its use for image-based PRID. An alternative use-case for
FRIDA is as a video dataset for tracking, but this is not the
focus of our work.
Typical PRID datasets are not designed for people count-
ing and were captured by side-mounted, rectilinear-lens
cameras without FOV overlap. In this case, the goal is to
identify the same person in two images captured by two
cameras at different times. FRIDA, however, is meant for
people counting and was captured by time-synchronized,
overhead, fisheye cameras with fully-overlapping FOVs
(360°×185°). In this case, the goal is to identify the same
person in two images captured by two cameras at the same
time. This explains the difference between the gallery sets
of typical PRID datasets and FRIDA. In the former, for a
given query there may be multiple ground-truth gallery el-
ements, captured at different times. In FRIDA, for a given
query there may be at most one gallery element at a given
time instant. In case of occlusion, there is no gallery ele-
ment for a given query (see Section 3 for details).
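To make this protocol concrete, the following minimal sketch ranks, for a given query person, the people detected by a second camera at the same time instant. It assumes generic appearance descriptors; the function and variable names are illustrative and not part of any FRIDA tooling.

import numpy as np

def rank_gallery(query_feat, gallery_feats):
    # Rank gallery detections from the same time instant by cosine
    # similarity to the query descriptor (most similar first). In FRIDA,
    # at most one of these detections is the correct match, and none is
    # correct if the person is occluded in the gallery camera.
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

# Example with random 128-D descriptors: one query, five gallery detections.
rng = np.random.default_rng(0)
ranking = rank_gallery(rng.normal(size=128), rng.normal(size=(5, 128)))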
We also evaluate the performance of 10 state-of-the-art
PRID methods on FRIDA: 6 methods developed for typical
PRID datasets [22,6,26,31,28,13] and 4 methods devel-
oped for overhead fisheye cameras [3]. The results show
that training CNN-based methods on FRIDA (2-fold cross-
validation) improves performance by 4.99-11.64 percentage points in mAP compared to training on a typical PRID dataset [29].
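Note that with at most one correct gallery element per query, a query's average precision reduces to the reciprocal of the rank at which that element is retrieved. The following minimal sketch computes mAP under this assumption; the paper's exact handling of occluded queries may differ.

def mean_average_precision(rankings, gt_indices):
    # rankings: per-query lists of gallery indices, best match first.
    # gt_indices: per-query index of the single correct gallery element,
    # or None if occlusion leaves no correct element for that query.
    aps = []
    for ranking, gt in zip(rankings, gt_indices):
        if gt is None:
            continue  # nothing to retrieve for this query
        aps.append(1.0 / (list(ranking).index(gt) + 1))  # AP = 1 / rank
    return sum(aps) / len(aps) if aps else 0.0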
The main contributions of this work are:
• We introduce a new PRID dataset, FRIDA, for indoor
person re-identification using time-synchronized over-
head fisheye cameras. This is the first overhead fisheye
dataset for PRID and will be made publicly available.
• We evaluate the performance of 10 state-of-the-art
PRID methods on FRIDA using two metrics. We com-
pare the performance of 6 of those algorithms, when
training on FRIDA against training on the non-fisheye
Market-1501 dataset [29].
2. Related Work
2.1. Datasets
There exist several datasets for person re-identification
using side-mounted rectilinear-lens cameras. Table 1 lists
key statistics of the most common ones: VIPeR [11],
PRID 2011 [14], Airport [15], CUHK03 [18], GRID [20],
MSMT17 [25], Market-1501 [29] and iLIDS [30], but more
details can be found in [27]. All these datasets have been
designed with the goal of matching the image of a person
from the query set to an image from the gallery set, and the
query and gallery sets consist of images captured by differ-
ent cameras. Moreover, different cameras have no field-of-
view overlap, so query and gallery images of the same iden-
tity have been captured at different time instants. Finally, in
most of these datasets there are, typically, multiple gallery
images having the same ID as the query image.
While there exist people-focused datasets captured by
overhead fisheye-lens cameras (PIROPO [7], BOMNI [8],
MW [21], HABBOF [17], CEPDOF [9], WEPDTOF [24]),
they have been developed with the goal of people detection
and, in some cases, tracking. However, each dataset only
consists of frames from a single camera, which severely lim-
its the variability of body appearance, unlike in FRIDA.
Table 1. Commonly-used image datasets for person re-identification. (BBox = bounding box)

Dataset            Year   # BBoxes   # Cameras   Frame Resol.
VIPeR [11]         2007      1,264           2   Fixed
iLIDS [30]         2009        476           2   Variable
GRID [20]          2009      1,275           8   Variable
PRID 2011 [14]     2011     24,541           2   Fixed
CUHK03 [18]        2014     13,164           2   Variable
Market-1501 [29]   2015     32,668           6   Fixed
Airport [15]       2017     39,902           6   Fixed
MSMT17 [25]        2018    126,441          15   Variable
FRIDA              2022    242,809           3   Fixed
2.2. Algorithms
Person re-identification using rectilinear-lens cameras is
a well-studied problem. Early approaches were model-
based [12,10,19,16] and used hand-crafted features. Re-
cent approaches use deep learning [22,6,5,4,28,31,26,
27] and outperform the traditional methods.
Sun et al. proposed PCB [22], in which feature vectors are uniformly partitioned in an intermediate layer to obtain part-informed features. This structure allows the network to focus on different parts of an image separately and to extract local information from each part. Zheng et al. proposed a network called Pyramid [28], which focuses not only on part-informed local features, but also on global features and gradual cues. Pyramid achieves this through a coarse-to-fine model, which performs image matching by leveraging information from different spatial scales. Chen et al. proposed an attention-based network called ABD-Net [6], which, instead of focusing on a small portion of an image, covers a wider area of it by means of diverse attention maps. This is accomplished by combining two separate modules: one module focuses on the context-wise relevance of pixels, while the other focuses on their spatial relevance. Zhu et al. proposed a network called VA-reID [31] that allows matching of people regardless of the viewpoint from which they were captured. Instead of creating a separate space for each viewpoint (i.e., front, side, back), they create a unified hyperspace which accommodates viewpoints in-between the main viewpoints (e.g., side-front, side-back). Recently, Wieczorek et al. proposed the CTL (Centroid Triplet Loss) model [26], which extends the triplet loss. With a standard triplet loss, it is typical to choose one positive and one negative sample for an anchor; in the CTL model, a centroid is computed over a set of samples instead, which significantly improves performance.
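To illustrate the centroid idea, the sketch below computes a centroid-based triplet loss: the anchor is compared to centroids of positive and negative sample sets rather than to individual samples. This follows the general technique only; the exact CTL formulation in [26] differs in its aggregation and training details.

import numpy as np

def centroid_triplet_loss(anchor, pos_set, neg_set, margin=0.3):
    # anchor: (d,) embedding of the anchor image.
    # pos_set: (p, d) embeddings of other images of the same identity.
    # neg_set: (n, d) embeddings of images of a different identity.
    pos_centroid = pos_set.mean(axis=0)  # centroid over positive samples
    neg_centroid = neg_set.mean(axis=0)  # centroid over negative samples
    d_pos = np.linalg.norm(anchor - pos_centroid)
    d_neg = np.linalg.norm(anchor - neg_centroid)
    # Hinge form: pull the anchor toward the positive centroid and push
    # it at least `margin` farther from the negative centroid.
    return max(0.0, d_pos - d_neg + margin)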
The methods above have been designed for and tested on
images from rectilinear-lens cameras. Very few PRID meth-
ods have been developed for overhead fisheye cameras. An
early approach, proposed by Barman et al. [1], matches im-
ages of people who appear at the same radial distance from a
camera (similar viewpoints). This is restrictive, and leads to
sub-par performance, since people often appear at different
distances from FOV centers in different cameras. Another
algorithm proposed by Blott et al. [2] applies tracking to
extract front-, back- and side-view images of a person. A
person-descriptor is built by fusing features extracted from
individual views. The algorithm does not perform PRID for
each pose/viewpoint. Moreover, there is no guarantee that
a person will appear at all 3 viewpoints during a recording,
thus limiting performance. Recently, Bone et al. [3] pro-
posed a PRID method for fisheye-lens cameras with over-
lapping FOVs. This approach leverages locations of peo-
ple in images instead of their appearance. Using a calibrated fisheye-lens model, this method maps the pixel location of a person in a query image to a pixel location in a gallery image. The mapped query-person location is then compared to the locations of people in the gallery image to determine the match.
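As a rough illustration of such location-based matching, the sketch below maps a pixel through an idealized equidistant fisheye model (r = f·θ) and a shared ground plane. Camera positions, heights, and focal constants are assumed known here, and people are approximated as points on the floor; the calibrated model used in [3] is more elaborate.

import numpy as np

def pixel_to_floor(px, py, cam):
    # Map a fisheye pixel to floor-plane coordinates, assuming a camera
    # pointing straight down from height cam["h"] at position cam["xy"]
    # (a NumPy array), with principal point (cam["cx"], cam["cy"]) and
    # equidistant projection r = cam["f"] * theta.
    dx, dy = px - cam["cx"], py - cam["cy"]
    theta = np.hypot(dx, dy) / cam["f"]       # angle from optical axis
    rho = cam["h"] * np.tan(theta)            # horizontal distance on floor
    phi = np.arctan2(dy, dx)                  # azimuth is preserved
    return cam["xy"] + rho * np.array([np.cos(phi), np.sin(phi)])

def floor_to_pixel(xy, cam):
    # Inverse mapping: floor-plane coordinates to a pixel in another camera.
    d = np.asarray(xy) - cam["xy"]
    rho = np.linalg.norm(d)
    r = cam["f"] * np.arctan2(rho, cam["h"])
    phi = np.arctan2(d[1], d[0])
    return np.array([cam["cx"] + r * np.cos(phi), cam["cy"] + r * np.sin(phi)])

def match_by_location(q_pixel, cam_q, cam_g, gallery_pixels):
    # Project the query pixel onto the floor, re-project it into the
    # gallery camera, and pick the nearest gallery detection.
    mapped = floor_to_pixel(pixel_to_floor(*q_pixel, cam_q), cam_g)
    dists = [np.linalg.norm(mapped - np.asarray(p)) for p in gallery_pixels]
    return int(np.argmin(dists))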