Hierarchical Few-Shot Object Detection: Problem, Benchmark
and Method
Lu Zhang
l_zhang19@fudan.edu.cn
Fudan University
Shanghai, China
Yang Wang
tongji_wangyang@tongji.edu.cn
Tongji University
Shanghai, China
Jiaogen Zhou
zhoujg@hytc.edu.cn
Huaiyin Normal University
Huaian, China
Chenbo Zhang
cbzhang21@m.fudan.edu.cn
Fudan University
Shanghai, China
Yinglu Zhang
yingluzhang21@m.fudan.edu.cn
Fudan University
Shanghai, China
Jihong Guan
jhguan@tongji.edu.cn
Tongji University
Shanghai, China
Yatao Bian
yatao.bian@gmail.com
Tencent AI Lab
Shenzhen, China
Shuigeng Zhou
sgzhou@fudan.edu.cn
Fudan University
Shanghai, China
ABSTRACT
Few-shot object detection (FSOD) aims to detect objects from only a few
examples. However, existing FSOD methods do not consider the hierarchical
fine-grained category structures of objects that exist widely in real life.
For example, animals are taxonomically classified into orders, families,
genera, species, etc. In this paper, we propose and solve a new problem
called hierarchical few-shot object detection (Hi-FSOD), which aims to
detect objects with hierarchical categories in the FSOD paradigm. To this
end, on the one hand, we build the first large-scale and high-quality
Hi-FSOD benchmark dataset, HiFSOD-Bird, which contains 176,350 wild-bird
images falling into 1,432 categories. All the categories are organized into
a 4-level taxonomy consisting of 32 orders, 132 families, 572 genera and
1,432 species. On the other hand, we propose the first Hi-FSOD method,
HiCLPL, in which a hierarchical contrastive learning approach constrains
the feature space so that the feature distribution of objects is consistent
with the hierarchical taxonomy and the model's generalization power is
strengthened. Meanwhile, a probabilistic loss is designed to enable the
child nodes to correct the classification errors of their parent nodes in
the taxonomy. Extensive experiments on the benchmark dataset HiFSOD-Bird
show that our method HiCLPL outperforms the existing FSOD methods.
CCS CONCEPTS
Computing methodologies → Object detection.
Corresponding authors: Jiaogen Zhou (School of Urban and Environmental
Sciences, Huaiyin Normal University) and Shuigeng Zhou (School of Computer
Science, and Shanghai Key Lab of Intelligent Information Processing, Fudan
University).
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
MM ’22, October 10–14, 2022, Lisboa, Portugal
©2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9203-7/22/10...$15.00
https://doi.org/10.1145/3503161.3548412
KEYWORDS
Few-shot object detection; hierarchical few-shot object detection;
benchmark; hierarchical classification.
ACM Reference Format:
Lu Zhang, Yang Wang, Jiaogen Zhou, Chenbo Zhang, Yinglu Zhang, Jihong
Guan, Yatao Bian, and Shuigeng Zhou. 2022. Hierarchical Few-Shot Object
Detection: Problem, Benchmark and Method. In Proceedings of the 30th
ACM International Conference on Multimedia (MM ’22), Oct. 10–14, 2022,
Lisboa, Portugal. ACM, New York, NY, USA, 10 pages.
https://doi.org/10.1145/3503161.3548412
1 INTRODUCTION
Existing object detection methods [26] require huge amounts of annotated
training data. However, in the real world, samples of some categories are
difficult to acquire, and the cost of labeling high-quality samples can be
very high. In contrast, a child can recognize and locate elephants or
horses in a picture that s/he has never seen before after seeing only a few
examples. Thus, few-shot object detection (FSOD) [7, 14, 22, 24, 27, 31,
35, 36, 39, 42], which tries to detect novel objects with only a few
labeled examples, is attracting increasing research interest. However, many
objects in real life fall into hierarchical fine-grained category
structures. For example, elephants have different families and species,
e.g., African elephants and Asian elephants. African elephants in turn have
two subspecies, African savannah elephants and African forest elephants,
and Asian elephants are subdivided in the same way. Clearly, it is
difficult for an ordinary person (let alone a child) to distinguish between
an African savannah elephant and an African forest elephant if only a few
photos are given. Moreover, existing FSOD methods do not consider such
hierarchical fine-grained category structures, which are ubiquitous in real
life, and thus cannot cope well with such scenarios.
arXiv:2210.03940v1 [cs.CV] 8 Oct 2022
[Figure 1. Panel (a), "Hierarchical taxonomy": levels root, order, family,
genus, species. Panel (b): a partial hierarchical taxonomy T with nodes a
and b and their children a.1, a.2, a.3, b.1, b.2, b.3 (Levels 1 to 3),
shown next to the corresponding feature space.]
Figure 1: (a) The hierarchical taxonomy of our HiFSOD-Bird dataset;
(b) Illustration of the proposed hierarchical contrastive learning, which
constrains the feature space such that the distribution of object features
is consistent with the hierarchical taxonomy.
In this paper, we propose a new problem, hierarchical few-shot object
detection (Hi-FSOD for short), which aims to perform few-shot object
detection under a hierarchical taxonomy. The FSOD task is the special case
of Hi-FSOD in which the hierarchical taxonomy degenerates to a flat
category structure. Compared with FSOD, Hi-FSOD is therefore more
challenging and has wider applications, especially in scenarios where the
number of object categories is huge and existing FSOD methods are neither
efficient nor effective. To address the Hi-FSOD problem, we tackle two
major subproblems:
On the one hand, we construct the first high-quality and large-scale
Hi-FSOD benchmark dataset of wild birds, called HiFSOD-Bird. Although there
are already some wildlife datasets for computer vision (CV) tasks [30, 37,
38, 45], most of them are for classification tasks and only a few are
dedicated to object detection. Moreover, few of them have a strictly
hierarchical organization of categories. Existing FSOD methods are trained
and tested on modified versions of the COCO [21] and VOC [6] datasets,
whose label structures are flat and contain only 80 and 20 categories,
respectively, and which are therefore unsuitable for the Hi-FSOD task. Our
HiFSOD-Bird dataset contains 1,432 categories and 176,350 bird images in
total, with high-quality annotated bounding boxes. All categories are
organized into a 4-level hierarchical taxonomy, from top to bottom: order,
family, genus and species, as shown in Fig. 1(a). It consists of 32 orders,
132 families, 572 genera and 1,432 species, covering more than 90% of the
world's water birds and part of the forest birds. The bounding boxes and
class labels of each image are manually annotated and carefully
double-checked. Moreover, each bird category comes with a textual
description, so the dataset can also be used for the zero-shot object
detection task. The HiFSOD-Bird dataset is also of great significance to
the monitoring and protection of endangered birds, since samples of
endangered birds are difficult to acquire and the domain knowledge comes
mainly from expert annotations.
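To make the 4-level organization concrete, the following is a minimal sketch of how such a strict taxonomy could be stored and queried. It is an illustration under stated assumptions, not the dataset's actual file format: the three species paths are real bird names chosen for the example and are not claimed to be taken from HiFSOD-Bird, and the names `LEVELS`, `build_parent_map`, `labels_at_all_levels` and `nodes_per_level` are hypothetical.

```python
from collections import Counter

# The four taxonomy levels named in the text, from top to bottom.
LEVELS = ("order", "family", "genus", "species")

# Illustrative species paths (real bird names, NOT taken from HiFSOD-Bird).
SPECIES_PATHS = [
    ("Anseriformes", "Anatidae", "Anas", "Anas platyrhynchos"),
    ("Anseriformes", "Anatidae", "Cygnus", "Cygnus olor"),
    ("Podicipediformes", "Podicipedidae", "Podiceps", "Podiceps cristatus"),
]

ROOT = ("root", "root")


def build_parent_map(paths):
    """Map every (level, name) node to its parent; orders map to the root."""
    parent = {}
    for path in paths:
        prev = ROOT
        for level, name in zip(LEVELS, path):
            node = (level, name)
            # A strict hierarchy requires every node to have exactly one parent.
            if parent.setdefault(node, prev) != prev:
                raise ValueError(f"{node} has two different parents")
            prev = node
    return parent


def labels_at_all_levels(species, parent):
    """Walk from a species leaf up to the root, collecting one label per level."""
    node = ("species", species)
    labels = {}
    while node != ROOT:
        labels[node[0]] = node[1]
        node = parent[node]
    return labels


def nodes_per_level(parent):
    """Count how many distinct categories exist at each level."""
    return Counter(level for level, _ in parent)


PARENT = build_parent_map(SPECIES_PATHS)
print(labels_at_all_levels("Cygnus olor", PARENT))
print(dict(nodes_per_level(PARENT)))
```

With the full dataset, `nodes_per_level` would be expected to report the counts given above (32 orders, 132 families, 572 genera, 1,432 species). A flat FSOD label set corresponds to using a single level in `LEVELS`.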
On the other hand, we develop the first Hi-FSOD method, HiCLPL, a two-stage
method with hierarchical contrastive learning and a probabilistic loss.
Here, hierarchical contrastive learning (HiCL) is used to constrain the
feature space so that the feature distribution of objects is consistent
with the hierarchical category structure, and the probabilistic loss is
designed to enable the child nodes to correct the classification errors of
their parent nodes. Fig. 1(b) illustrates the HiCL mechanism. We use
memories to hold the prototypes of the classes in the hierarchical tree.
Then, a hierarchical contrastive loss is designed to control the distance
between box features and memories at different levels. Finally, we use an
exponential moving average to update the parameters of the memories. HiCL
can boost the generalization power of the model. Meanwhile, we found that
in top-down hierarchical classification, if a non-leaf node misclassifies
an instance, the classifications of that instance at the descendant nodes
are useless. Therefore, we design a probabilistic loss such that the child
nodes can learn to identify and correct the samples misclassified by their
parent nodes.
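The memory-and-prototype mechanism described above can be sketched as follows. This is a minimal illustration, not the HiCLPL implementation: the exact loss is not given in this section, so the sketch assumes a softmax cross-entropy over cosine similarities between a box feature and the prototypes at each level, summed over levels. The class `LevelMemory`, the function `hierarchical_contrastive_loss`, the momentum of 0.9 and the temperature of 0.1 are all assumptions made for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class LevelMemory:
    """Holds one prototype vector per class at a single taxonomy level."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed value, not from the paper
        self.prototypes = {}

    def update(self, label, feature):
        """Exponential-moving-average update of the prototype for `label`."""
        feature = l2_normalize(feature)
        old = self.prototypes.get(label)
        if old is None:
            self.prototypes[label] = feature
            return
        m = self.momentum
        mixed = [m * o + (1.0 - m) * f for o, f in zip(old, feature)]
        self.prototypes[label] = l2_normalize(mixed)

    def contrastive_loss(self, label, feature, temperature=0.1):
        """Cross-entropy over cosine similarities to all prototypes at this level."""
        feature = l2_normalize(feature)
        logits = {c: dot(feature, p) / temperature for c, p in self.prototypes.items()}
        top = max(logits.values())
        log_sum = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        return log_sum - logits[label]


def hierarchical_contrastive_loss(memories, labels, feature):
    """Sum the per-level losses; `labels` holds one label per level, top to bottom."""
    return sum(mem.contrastive_loss(lab, feature) for mem, lab in zip(memories, labels))


# Toy two-level tree mirroring Fig. 1(b): parents a, b and children a.1, a.2, b.1.
coarse, fine = LevelMemory(), LevelMemory()
coarse.update("a", [1.0, 0.0])
coarse.update("b", [0.0, 1.0])
fine.update("a.1", [1.0, 0.1])
fine.update("a.2", [1.0, -0.1])
fine.update("b.1", [0.0, 1.0])

box = [0.9, 0.2]  # a box feature that lies close to a.1
loss_correct = hierarchical_contrastive_loss([coarse, fine], ["a", "a.1"], box)
loss_wrong = hierarchical_contrastive_loss([coarse, fine], ["b", "b.1"], box)
print(loss_correct < loss_wrong)

sim_before = dot(coarse.prototypes["a"], l2_normalize(box))
coarse.update("a", box)  # the EMA step pulls the prototype toward the new feature
sim_after = dot(coarse.prototypes["a"], l2_normalize(box))
print(sim_after > sim_before)
```

Minimizing a loss of this kind pulls a box feature toward the prototypes of its own species, genus, family and order while pushing it away from the other prototypes at each level, which is one way to make the feature distribution follow the taxonomy.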
In summary, the contributions of this paper are as follows: 1) We propose a
new problem, hierarchical few-shot object detection (Hi-FSOD), which
extends the existing FSOD problem and is therefore more challenging and has
wider applications. 2) We establish the first large-scale and high-quality
benchmark dataset, HiFSOD-Bird, specifically for the Hi-FSOD problem. 3) We
develop the first Hi-FSOD method, HiCLPL, which uses hierarchical
contrastive learning to constrain the feature space and a probabilistic
loss to correct the classification errors of parent nodes. 4) We conduct
extensive experiments on the benchmark dataset HiFSOD-Bird to evaluate the
proposed method HiCLPL. Experimental results show that HiCLPL outperforms
the existing FSOD methods.
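The motivation behind the probabilistic loss in contribution 3, namely that a wrong decision at a parent node makes all decisions below it useless, can be seen in a small numeric example. The sketch below is not the paper's loss, whose definition is not part of this section. It only contrasts greedy top-down classification with the standard chain-rule factorization P(child) = P(parent) · P(child | parent), under made-up probabilities. All names and numbers are hypothetical.

```python
# Made-up classifier outputs for one box: the parent-level classifier slightly
# prefers "a", but the children of "b" are far more confident.
P_PARENT = {"a": 0.55, "b": 0.45}
P_CHILD_GIVEN_PARENT = {
    "a": {"a.1": 0.6, "a.2": 0.4},
    "b": {"b.1": 0.9, "b.2": 0.1},
}


def greedy_top_down(p_parent, p_child):
    """Commit to the best parent first, then pick the best child below it."""
    parent = max(p_parent, key=p_parent.get)
    child = max(p_child[parent], key=p_child[parent].get)
    return parent, child


def joint_leaf_probabilities(p_parent, p_child):
    """Chain rule: P(child) = P(parent) * P(child | parent) for every leaf."""
    return {
        child: p_parent[parent] * p
        for parent, children in p_child.items()
        for child, p in children.items()
    }


def best_leaf(p_parent, p_child):
    """Pick the leaf with the highest joint probability over the whole tree."""
    joint = joint_leaf_probabilities(p_parent, p_child)
    return max(joint, key=joint.get)


print(greedy_top_down(P_PARENT, P_CHILD_GIVEN_PARENT))
print(best_leaf(P_PARENT, P_CHILD_GIVEN_PARENT))
```

In this toy case the greedy path ends at a.1 with joint probability 0.55 × 0.6 = 0.33, whereas b.1 has joint probability 0.45 × 0.9 = 0.405. Child-level evidence can thus overturn a marginal parent-level decision, but only if the parent's choice is not treated as final.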
2 RELATED WORK
2.1 Few-shot Object Detection
Existing few-shot object detection (FSOD) methods roughly fall into two
types: meta-learning based and fine-tuning based. Meta-learning based
methods [7, 17, 36, 39, 40] learn meta knowledge from base classes to
facilitate model training for novel classes. Among them, FSRW [17] uses a
feature re-weighting strategy to construct a one-stage object detector.
Attention-RPN [7] integrates the information of the support images into the
RPN, so that more attention is paid to the foreground objects relevant to
the support classes. Meta-DETR [39] exploits inter-class correlation to
apply the detection transformer [44] to the FSOD task. Our earlier work
[40] proposed a support-query mutual guidance strategy that can generate
more support-relevant candidate regions, together with a hybrid loss to
enhance the metric space. Fine-tuning based methods [24, 27, 31, 35, 42]
formulate the FSOD problem in a transfer learning setting. TFA [31] is the
first work to propose a two-stage fine-tuning strategy. It first trains the
entire model on the base classes, and then fine-tunes the final classifier
on a balanced dataset containing base and novel data. Experiments show that
this fine-tuning method is simple yet very effective. Following TFA, a
number of methods have been developed. DeFRCN [24] adopts multi-stage and
multi-task decoupling to improve performance. FSCE [27] uses a contrastive
learning strategy to increase the intra-class similarity and the
inter-class variance of box features. Nevertheless, existing methods do not
consider scenarios where object classes form a hierarchical taxonomy, and
thus cannot be directly used to effectively handle the problem proposed in
this paper.
Dierent from these works above, here we address a new problem
— hierarchical few-shot object detection (Hi-FSOD). To this end, we
build a large-scale and high-quality benchmark dataset and develop
an eective method.