
Hierarchical Few-Shot Object Detection: Problem, Benchmark and Method

Lu Zhang

l_zhang19@fudan.edu.cn

Fudan University

Shanghai, China

Yang Wang

tongji_wangyang@tongji.edu.cn

Tongji University

Shanghai, China

Jiaogen Zhou∗

zhoujg@hytc.edu.cn

Huaiyin Normal University

Huaian, China

Chenbo Zhang

cbzhang21@m.fudan.edu.cn

Fudan University

Shanghai, China

Yinglu Zhang

yingluzhang21@m.fudan.edu.cn

Fudan University

Shanghai, China

Jihong Guan

jhguan@tongji.edu.cn

Tongji University

Shanghai, China

Yatao Bian

yatao.bian@gmail.com

Tencent AI Lab

Shenzhen, China

Shuigeng Zhou∗

sgzhou@fudan.edu.cn

Fudan University

Shanghai, China

ABSTRACT

Few-shot object detection (FSOD) aims to detect objects given only a few examples. However, existing FSOD methods do not consider the hierarchical fine-grained category structures of objects that exist widely in real life. For example, animals are taxonomically classified into orders, families, genera, species, etc. In this paper, we propose and solve a new problem called hierarchical few-shot object detection (Hi-FSOD), which aims to detect objects with hierarchical categories in the FSOD paradigm. To this end, on the one hand, we build the first large-scale and high-quality Hi-FSOD benchmark dataset, HiFSOD-Bird, which contains 176,350 wild-bird images falling into 1,432 categories. All the categories are organized into a 4-level taxonomy consisting of 32 orders, 132 families, 572 genera and 1,432 species. On the other hand, we propose the first Hi-FSOD method, HiCLPL, in which a hierarchical contrastive learning approach constrains the feature space so that the feature distribution of objects is consistent with the hierarchical taxonomy, which strengthens the model's generalization power. Meanwhile, a probabilistic loss is designed to enable child nodes to correct the classification errors of their parent nodes in the taxonomy. Extensive experiments on the HiFSOD-Bird benchmark show that HiCLPL outperforms existing FSOD methods.

CCS CONCEPTS

• Computing methodologies → Object detection.

∗Corresponding authors: Jiaogen Zhou (School of Urban and Environmental Sciences, Huaiyin Normal University) and Shuigeng Zhou (School of Computer Science, and Shanghai Key Lab of Intelligent Information Processing, Fudan University).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

MM ’22, October 10–14, 2022, Lisboa, Portugal

ACM ISBN 978-1-4503-9203-7/22/10. . . $15.00

https://doi.org/10.1145/3503161.3548412

KEYWORDS

Few-shot object detection; hierarchical few-shot object detection; benchmark; hierarchical classification.

ACM Reference Format:

Lu Zhang, Yang Wang, Jiaogen Zhou, Chenbo Zhang, Yinglu Zhang, Jihong Guan, Yatao Bian, and Shuigeng Zhou. 2022. Hierarchical Few-Shot Object Detection: Problem, Benchmark and Method. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), Oct. 10–14, 2022, Lisboa, Portugal. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3503161.3548412

1 INTRODUCTION

Existing object detection methods [ ] require huge amounts of annotated training data. However, in the real world, samples of some categories are difficult to acquire, and labeling high-quality samples can be very costly. In contrast, a child can recognize and locate elephants or horses in a picture after seeing only a few examples of them. Thus, few-shot object detection (FSOD) [ ], which tries to detect novel objects with only a few labeled examples, is attracting increasing research interest. However, many objects in real life fall into hierarchical fine-grained category structures. For example, elephants are divided into different genera and species, e.g., African elephants and Asian elephants. African elephants in turn comprise two subspecies, the African savannah elephant and the African forest elephant, and Asian elephants are likewise divided into subspecies. Obviously, it is difficult for an ordinary person (let alone a child) to distinguish an African savannah elephant from an African forest elephant if only a few photos are given. Moreover, existing FSOD methods do not consider such hierarchical fine-grained category structures, which exist ubiquitously in real life, so they cannot cope with such scenarios well.

arXiv:2210.03940v1 [cs.CV] 8 Oct 2022

[Figure 1. Panel (a): taxonomy levels root, order, family, genus, species. Panel (b): partial hierarchical taxonomy T with Levels 1 to 3 and classes a.1, a.2, a.3, b.1, b.2, b.3, shown next to the feature space.]

Figure 1: (a) The hierarchical taxonomy of our HiFSOD-Bird dataset; (b) Illustration of the proposed hierarchical contrastive learning, which constrains the feature space such that the distribution of object features is consistent with the hierarchical taxonomy.

In this paper, we propose a new problem, Hierarchical Few-Shot Object Detection (Hi-FSOD in short), which aims to perform few-shot object detection under a hierarchical taxonomy. The FSOD task is a special case of Hi-FSOD in which the hierarchical taxonomy degenerates to a flat category structure. Compared with FSOD, Hi-FSOD is therefore more challenging and has wider applications, especially in scenarios where the number of object categories is huge, in which existing FSOD methods are neither efficient nor effective. To address the Hi-FSOD problem, we tackle two major subproblems:

On the one hand, we construct the first high-quality and large-scale Hi-FSOD benchmark dataset of wild birds, called HiFSOD-Bird. Although there are already some wildlife datasets for computer vision (CV) tasks [ ], most of them are for classification, and only a few are dedicated to object detection. Furthermore, few of these datasets have a strictly hierarchical organization of categories. Existing FSOD methods are trained and tested on modified versions of the COCO [ ] and VOC [ ] datasets, whose label structures are flat and contain only 80 and 20 categories, respectively; these datasets are thus unsuitable for the Hi-FSOD task. Our HiFSOD-Bird dataset contains 1,432 categories and 176,350 bird images with high-quality bounding-box annotations. All categories are organized into a 4-level hierarchical taxonomy: from top to bottom, order, family, genus and species, as shown in Fig. 1(a). It consists of 32 orders, 132 families, 572 genera and 1,432 species, covering more than 90% of the world's water birds and some forest birds. The bounding boxes and class labels of each image are manually annotated and carefully double-checked. In addition, each bird category comes with a textual description, so the dataset can also be used for the zero-shot object detection task. The HiFSOD-Bird dataset is also of great significance to the monitoring and protection of endangered birds, since samples of endangered birds are difficult to acquire and the domain knowledge comes mainly from expert annotations.
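To make the 4-level label structure concrete, here is a minimal Python sketch of how a species-level label can be expanded into one label per level, and how a strict tree (every node has exactly one parent) can be checked. This is our illustration, not code from the HiFSOD-Bird release: `TaxonPath`, `build_vocab` and `encode` are hypothetical names, and the four example species are ordinary textbook water birds chosen for illustration, not entries confirmed to be in the dataset.

```python
from dataclasses import dataclass

# Level names of the 4-level taxonomy, from top to bottom.
LEVELS = ("order", "family", "genus", "species")


@dataclass(frozen=True)
class TaxonPath:
    """Hypothetical record: one species together with its ancestors."""
    order: str
    family: str
    genus: str
    species: str

    def names(self):
        return (self.order, self.family, self.genus, self.species)


def build_vocab(paths):
    """Give every node an integer id per level and check that the tree is strict.

    Returns (vocab, parent): vocab[lvl] maps a name to its id at that level;
    parent[lvl] maps a node id at lvl to its parent's id at lvl - 1.
    """
    vocab = [{} for _ in LEVELS]
    parent = [{} for _ in LEVELS]  # parent[0] stays empty: orders hang off the root
    for path in paths:
        ids = []
        for lvl, name in enumerate(path.names()):
            node_id = vocab[lvl].setdefault(name, len(vocab[lvl]))
            if lvl > 0:
                # setdefault keeps the first parent seen; a mismatch means two parents.
                known = parent[lvl].setdefault(node_id, ids[-1])
                if known != ids[-1]:
                    raise ValueError(f"{LEVELS[lvl]} {name!r} has more than one parent")
            ids.append(node_id)
    return vocab, parent


def encode(path, vocab):
    """Map a species to its per-level label ids (order, family, genus, species)."""
    return tuple(vocab[lvl][name] for lvl, name in enumerate(path.names()))


if __name__ == "__main__":
    birds = [
        TaxonPath("Anseriformes", "Anatidae", "Anas", "Anas platyrhynchos"),
        TaxonPath("Anseriformes", "Anatidae", "Anas", "Anas crecca"),
        TaxonPath("Pelecaniformes", "Ardeidae", "Egretta", "Egretta garzetta"),
        TaxonPath("Charadriiformes", "Laridae", "Sterna", "Sterna hirundo"),
    ]
    vocab, parent = build_vocab(birds)
    print([len(v) for v in vocab])  # [3, 3, 3, 4]
    print(encode(birds[1], vocab))  # (0, 0, 0, 1)
```

Per-level ids like these are what a hierarchical classifier would be supervised with at each level; the `parent` maps are what allow a prediction at one level to be checked against the level above.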

On the other hand, we develop the first Hi-FSOD method, HiCLPL, a two-stage method with Hierarchical Contrastive Learning and a Probabilistic Loss. Here, hierarchical contrastive learning (HiCL) is used to constrain the feature space so that the feature distribution of objects is consistent with the hierarchical category structure, and the probabilistic loss is designed to enable child nodes to correct the classification errors of their parent nodes. Fig. 1(b) illustrates the HiCL mechanism. We use memories to hold the prototypes of the classes in the hierarchical tree. A hierarchical contrastive loss then controls the distance between box features and the memories at different levels. Finally, we use an exponential moving average to update the memory parameters.
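The exact form of the hierarchical contrastive loss and of the memory update is not given at this point, so the following NumPy code is only a sketch under stated assumptions: one memory (a matrix of class prototypes) per taxonomy level, an InfoNCE-style loss between each box feature and the prototypes of that level, a weighted sum over levels, and an EMA prototype update with momentum `m`. `PrototypeMemory`, `level_weights`, `temperature` and `momentum` are hypothetical names and parameters, not taken from HiCLPL.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class PrototypeMemory:
    """Hypothetical memory holding one prototype vector per class at one level."""

    def __init__(self, num_classes, dim, momentum=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.protos = l2_normalize(rng.standard_normal((num_classes, dim)))
        self.momentum = momentum

    def ema_update(self, feats, labels):
        """EMA step: proto_c <- m * proto_c + (1 - m) * batch mean feature of class c."""
        feats = l2_normalize(feats)
        for c in np.unique(labels):
            batch_mean = feats[labels == c].mean(axis=0)
            self.protos[c] = self.momentum * self.protos[c] + (1.0 - self.momentum) * batch_mean
        self.protos = l2_normalize(self.protos)


def level_loss(feats, labels, protos, temperature):
    """Assumed InfoNCE-style loss: pull each box feature toward its class
    prototype and away from the other prototypes at the same level."""
    logits = feats @ protos.T / temperature              # (N, C) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())


def hierarchical_contrastive_loss(feats, level_labels, memories, level_weights, temperature=0.1):
    """Weighted sum of per-level losses, one memory per taxonomy level."""
    feats = l2_normalize(feats)
    return sum(
        w * level_loss(feats, labels, mem.protos, temperature)
        for labels, mem, w in zip(level_labels, memories, level_weights)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    box_feats = rng.standard_normal((4, 8))  # 4 toy RoI features of dimension 8
    genus = np.array([0, 0, 1, 1])           # toy parent-level labels
    species = np.array([0, 1, 2, 3])         # toy child-level labels
    memories = [PrototypeMemory(2, 8, seed=2), PrototypeMemory(4, 8, seed=3)]
    loss = hierarchical_contrastive_loss(box_feats, [genus, species], memories, level_weights=[0.5, 1.0])
    for mem, labels in zip(memories, [genus, species]):
        mem.ema_update(box_feats, labels)    # memories follow the features slowly
    print(loss)
```

In this sketch the memories are moved only by the EMA step, never by gradients, which keeps the prototypes stable from one mini-batch to the next; whether HiCLPL also propagates gradients into its memories is not stated here.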

HiCL can boost the generalization power of the model. Meanwhile, we found that in top-down hierarchical classification, if a non-leaf node misclassifies an instance, the classifications of that instance at its descendant nodes are useless. Therefore, we design a probabilistic loss with which child nodes can learn to identify and correct the samples misclassified by their parent nodes.
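The probabilistic loss itself is not specified here. As a hypothetical illustration of the underlying idea, letting evidence at the child level overrule a mistaken parent, the sketch below contrasts greedy top-down decoding with scoring whole root-to-leaf paths on a two-level toy hierarchy, and derives a path-level negative log-likelihood from the same scores. Scoring a path by the product of per-level probabilities is our assumption, not necessarily the paper's formulation, and all function names are hypothetical.

```python
import numpy as np


def greedy_decode(parent_probs, leaf_probs, leaf_to_parent):
    """Top-down: commit to the best parent, then pick its best child.
    A wrong parent decision can never be undone."""
    parent = int(np.argmax(parent_probs))
    children = np.flatnonzero(leaf_to_parent == parent)
    return int(children[np.argmax(leaf_probs[children])])


def path_probs(parent_probs, leaf_probs, leaf_to_parent):
    """Score each leaf by P(parent) * P(leaf) along its path, then renormalize."""
    scores = parent_probs[leaf_to_parent] * leaf_probs
    return scores / scores.sum()


def path_decode(parent_probs, leaf_probs, leaf_to_parent):
    """Pick the best full path, so a confident child can overrule its parent."""
    return int(np.argmax(path_probs(parent_probs, leaf_probs, leaf_to_parent)))


def path_nll(parent_probs, leaf_probs, leaf_to_parent, true_leaf, eps=1e-12):
    """Negative log-likelihood of the true leaf under the path distribution."""
    p = path_probs(parent_probs, leaf_probs, leaf_to_parent)
    return float(-np.log(p[true_leaf] + eps))


# Toy hierarchy: leaves 0 and 1 belong to parent 0, leaf 2 belongs to parent 1.
# Suppose the true leaf is 2: the parent level slightly prefers the wrong parent 0,
# while the child level is confident about leaf 2.
leaf_to_parent = np.array([0, 0, 1])
parent_probs = np.array([0.55, 0.45])
leaf_probs = np.array([0.1, 0.1, 0.8])
print(greedy_decode(parent_probs, leaf_probs, leaf_to_parent))  # 0
print(path_decode(parent_probs, leaf_probs, leaf_to_parent))    # 2
```

Greedy decoding returns leaf 0, while path decoding recovers leaf 2 because 0.45 × 0.8 = 0.36 exceeds 0.55 × 0.1 = 0.055. Training with `path_nll` couples the levels in the same way, so a gradient signal from the child level can also push against an overconfident parent.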

In summary, the contributions of this paper are as follows: 1) We propose a new problem, hierarchical few-shot object detection (Hi-FSOD), which extends the existing FSOD problem and is therefore more challenging and more widely applicable. 2) We establish the first large-scale and high-quality benchmark dataset, HiFSOD-Bird, specifically for the Hi-FSOD problem. 3) We develop the first Hi-FSOD method, HiCLPL, which uses hierarchical contrastive learning to constrain the feature space and a probabilistic loss to correct the classification errors of parent nodes. 4) We conduct extensive experiments on the HiFSOD-Bird benchmark to evaluate HiCLPL. Experimental results show that HiCLPL outperforms existing FSOD methods.

2 RELATED WORK

2.1 Few-shot Object Detection

Existing few-shot object detection (FSOD) methods roughly fall into two types: meta-learning based and fine-tuning based. Meta-learning based methods [ ] learn meta knowledge from base classes to facilitate model training for novel classes. Among them, FSRW [ ] uses a feature re-weighting strategy to construct a one-stage object detector. Attention-RPN [ ] integrates support information into the RPN so as to pay more attention to the foreground objects relevant to the support classes. Meta-DETR [ ] exploits inter-class correlation to apply the detection transformer [ ] to the FSOD task. We previously proposed a support-query mutual guidance strategy that generates more support-relevant candidate regions, together with a hybrid loss to enhance the metric space [ ].

Fine-tuning based methods [ ] formulate the FSOD problem in a transfer learning setting. TFA [ ] is the first work to propose a two-stage fine-tuning strategy: it first trains the entire model on the base classes, and then fine-tunes the final classifier on a balanced dataset containing base and novel data. Experiments show that this fine-tuning approach is simple yet very effective. Following TFA, a number of methods have been developed. DeFRCN [ ] adopts multi-stage and multi-task decoupling to improve performance. FSCE [ ] uses a contrastive learning strategy to increase the intra-class similarity and reduce the inter-class similarity of box features. Nevertheless, existing methods do not consider scenarios where object classes form a hierarchical taxonomy, so they cannot be directly used to handle the problem proposed in this paper effectively.
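TFA's second stage fine-tunes the final classifier on a small set that is balanced across base and novel classes. The sketch below covers only the data side of that step, sampling exactly K annotated instances per class with a fixed seed so the split is reproducible. The function name `build_balanced_kshot` and the `(image_id, class_name)` record format are hypothetical; the model side (which layers are frozen or updated) is not shown.

```python
import random
from collections import Counter, defaultdict


def build_balanced_kshot(instances, base_classes, novel_classes, k, seed=0):
    """Sample exactly k annotated instances per class, base and novel alike.

    instances: list of (image_id, class_name) pairs, one per annotated box.
    Returns the sampled pairs, grouped by class in sorted class order.
    """
    rng = random.Random(seed)  # fixed seed so the few-shot split is reproducible
    by_class = defaultdict(list)
    for image_id, cls in instances:
        by_class[cls].append((image_id, cls))

    subset = []
    for cls in sorted(set(base_classes) | set(novel_classes)):
        pool = by_class[cls]
        if len(pool) < k:
            raise ValueError(f"class {cls!r} has only {len(pool)} instances, need {k}")
        subset.extend(rng.sample(pool, k))  # k distinct instances, no replacement
    return subset


if __name__ == "__main__":
    # Toy data: plenty of base-class boxes, only two novel-class boxes.
    instances = (
        [(i, "dog") for i in range(10)]
        + [(i, "cat") for i in range(10, 13)]
        + [(13, "zebra"), (14, "zebra")]
    )
    subset = build_balanced_kshot(instances, ["dog", "cat"], ["zebra"], k=2)
    print(Counter(cls for _, cls in subset))
```

Capping abundant base classes at K is what makes the fine-tuning set balanced; without it, the base classes would dominate the final classifier's training signal.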

Dierent from these works above, here we address a new problem

— hierarchical few-shot object detection (Hi-FSOD). To this end, we

build a large-scale and high-quality benchmark dataset and develop

an eective method.
