
mAP compared to training on a typical PRID dataset [29].
The main contributions of this work are:
• We introduce a new PRID dataset, FRIDA, for indoor
person re-identification using time-synchronized over-
head fisheye cameras. This is the first overhead fisheye
dataset for PRID and will be made publicly available.
• We evaluate the performance of 10 state-of-the-art
PRID methods on FRIDA using two metrics. We com-
pare the performance of 6 of those algorithms, when
training on FRIDA against training on the non-fisheye
Market-1501 dataset [29].
2. Related Work
2.1. Datasets
There exist several datasets for person re-identification
using side-mounted rectilinear-lens cameras. Table 1 lists
key statistics of the most common ones: VIPeR [11],
PRID 2011 [14], Airport [15], CUHK03 [18], GRID [20],
MSMT17 [25], Market-1501 [29] and iLIDS [30], but more
details can be found in [27]. All these datasets have been
designed with the goal of matching the image of a person
from the query set to an image from the gallery set, and the
query and gallery sets consist of images captured by differ-
ent cameras. Moreover, different cameras have no field-of-view overlap, so query and gallery images of the same identity have been captured at different time instants. Finally, in
most of these datasets there are, typically, multiple gallery
images having the same ID as the query image.
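The protocol described above can be sketched as follows. This is a minimal illustration with synthetic features, not any dataset's actual evaluation code; all feature values, identities, and camera IDs are made up, and the same-camera exclusion rule follows the convention described in the text:

```python
import numpy as np

# Toy query/gallery matching: each image has a feature vector, an
# identity label, and a camera ID; gallery candidates captured by the
# query's own camera are excluded, as in the standard PRID protocol.
rng = np.random.default_rng(0)
gallery_feats = rng.normal(size=(5, 8))
gallery_ids = np.array([1, 1, 2, 3, 3])
gallery_cams = np.array([0, 1, 1, 0, 1])

# A query of identity 1, seen by camera 0; its features resemble
# gallery image 1 (same person, different camera).
query_feat = gallery_feats[1] + 0.01 * rng.normal(size=8)
query_id, query_cam = 1, 0

# Euclidean distance from the query to every gallery image.
dists = np.linalg.norm(gallery_feats - query_feat, axis=1)

# Exclude gallery images from the query's own camera.
dists[gallery_cams == query_cam] = np.inf

best = int(np.argmin(dists))
print(gallery_ids[best] == query_id)  # rank-1 correct for this query
```

Rank-1 accuracy and mAP are then computed by aggregating such per-query rankings over the whole query set.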
While there exist people-focused datasets captured by
overhead fisheye-lens cameras (PIROPO [7], BOMNI [8],
MW [21], HABBOF [17], CEPDOF [9], WEPDTOF [24]),
they have been developed with the goal of people detection
and, in some cases, tracking. However, each of these datasets consists of frames from a single camera, which severely limits the variability of body appearance, unlike in FRIDA.
Table 1. Commonly-used image datasets for person re-identification. (BBox = bounding box)

Dataset             Year   # BBoxes   # Cameras   Frame Resol.
VIPeR [11]          2007      1,264       2       Fixed
iLIDS [30]          2009        476       2       Variable
GRID [20]           2009      1,275       8       Variable
PRID 2011 [14]      2011     24,541       2       Fixed
CUHK03 [18]         2014     13,164       2       Variable
Market-1501 [29]    2015     32,668       6       Fixed
Airport [15]        2017     39,902       6       Fixed
MSMT17 [25]         2018    126,441      15       Variable
FRIDA               2022    242,809       3       Fixed
2.2. Algorithms
Person re-identification using rectilinear-lens cameras is
a well-studied problem. Early approaches were model-based [12, 10, 19, 16] and used hand-crafted features. Recent approaches use deep learning [22, 6, 5, 4, 28, 31, 26, 27] and outperform the traditional methods.
Sun et al. proposed PCB [22], in which feature vectors are uniformly partitioned in an intermediate layer to obtain part-informed features. This structure allows the network to focus separately on different parts of an image and to extract local information for each part. Zheng et al. proposed a network called Pyramid [28], which focuses not only on part-informed local features but also on global features and gradual cues. Pyramid achieves this through a
coarse-to-fine model, which performs image matching by
leveraging information from different spatial scales. Chen et al. proposed an attention-based network called ABD-Net [6], which, rather than focusing on a small portion of an image, attends to wider regions by means of diverse attention maps. This is accomplished by combining two separate modules: one module focuses on the context-wise relevance of pixels, while the other focuses on the spatial relevance of
these pixels. Zhu et al. proposed a network called VA-
reID [31] that allows matching of people regardless of the
viewpoint from which they were captured. Instead of cre-
ating a separate space for each viewpoint (i.e., front, side,
back), they create a unified hyperspace which accommo-
dates viewpoints in-between the main viewpoints (e.g., side-front, side-back, etc.). Recently, Wieczorek et al. proposed
a CTL model (Centroid Triplet Loss model) [26], which ex-
tends the triplet loss. When working with triplet loss, it is
typical to choose one positive sample and one negative sam-
ple for an anchor. However, in the CTL model, instead of
choosing a single sample, a centroid is computed over a set
of samples which significantly improves performance.
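The centroid idea behind CTL can be sketched in a few lines. This is a hedged illustration of the general mechanism only, with synthetic embeddings and an arbitrary margin value, not the authors' implementation:

```python
import numpy as np

# Centroid-based triplet comparison: instead of one positive and one
# negative sample per anchor, compare the anchor to the mean embedding
# (centroid) of each identity's samples.
rng = np.random.default_rng(2)
anchor = rng.normal(size=16)
pos_set = anchor + 0.1 * rng.normal(size=(4, 16))  # same identity
neg_set = rng.normal(size=(4, 16))                 # different identity

pos_centroid = pos_set.mean(axis=0)
neg_centroid = neg_set.mean(axis=0)

# Standard triplet-loss form, applied to centroids rather than samples.
margin = 0.3  # arbitrary value for illustration
d_pos = np.linalg.norm(anchor - pos_centroid)
d_neg = np.linalg.norm(anchor - neg_centroid)
loss = max(0.0, d_pos - d_neg + margin)
```

Averaging over a set of samples makes the target less sensitive to any single outlier sample, which is the intuition behind the reported performance gain.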
The methods above have been designed for and tested on
images from rectilinear-lens cameras. Very few PRID meth-
ods have been developed for overhead fisheye cameras. An
early approach, proposed by Barman et al. [1], matches im-
ages of people who appear at the same radial distance from a
camera (similar viewpoints). This is restrictive and leads to sub-par performance, since people often appear at different
distances from FOV centers in different cameras. Another
algorithm proposed by Blott et al. [2] applies tracking to
extract front-, back- and side-view images of a person. A
person-descriptor is built by fusing features extracted from
individual views. The algorithm does not perform PRID for
each pose/viewpoint. Moreover, there is no guarantee that
a person will appear at all 3 viewpoints during a recording,
thus limiting performance. Recently, Bone et al. [3] pro-
posed a PRID method for fisheye-lens cameras with over-
lapping FOVs. This approach leverages locations of peo-
ple in images instead of their appearance. Using a cali-
brated fisheye-lens model this method maps pixel-location
of a person in a query image to a pixel-location in a gallery
image. The mapped query-person location is compared to