

Exploring Effective Knowledge Transfer for Few-shot Object Detection

Zhiyuan Zhao
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
zhaozhiyuan@buaa.edu.cn

Qingjie Liu∗
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
qingjie.liu@buaa.edu.cn

Yunhong Wang
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
yhwang@buaa.edu.cn

ABSTRACT

Recently, few-shot object detection (FSOD) has received much attention from the community, and many methods have been proposed to address this problem from a knowledge transfer perspective. Though promising results have been achieved, these methods fail to be shot-stable: methods that excel in low-shot regimes are likely to struggle in high-shot regimes, and vice versa. We believe this is because the primary challenge of FSOD changes when the number of shots varies. In the low-shot regime, the primary challenge is the lack of inner-class variation. In the high-shot regime, as the variance approaches the real one, the main hindrance to performance comes from misalignment between the learned and true distributions. However, these two distinct issues remain unsolved in most existing FSOD methods. In this paper, we propose to overcome these challenges by exploiting the rich knowledge the model has learned and effectively transferring it to the novel classes. For the low-shot regime, we propose a distribution calibration method to deal with the lack of inner-class variation. Meanwhile, a shift compensation method is proposed to compensate for possible distribution shift during fine-tuning. For the high-shot regime, we propose to use the knowledge learned from ImageNet as guidance for feature learning in the fine-tuning stage, which implicitly aligns the distributions of the novel classes. Although targeted toward different regimes, these two strategies can work together to further improve FSOD performance. Experiments on both the VOC and COCO benchmarks show that our proposed method significantly outperforms the baseline method and produces competitive results in both low-shot settings (shot < 5) and high-shot settings (shot ≥ 5). Code is available at https://github.com/JulioZhao97/ETrans_Fsdet.git.

CCS CONCEPTS

• Computing methodologies → Object detection.

∗Corresponding Author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

MM ’22, October 10–14, 2022, Lisboa, Portugal

ACM ISBN 978-1-4503-9203-7/22/10. . . $15.00

https://doi.org/10.1145/3503161.3548062

KEYWORDS

few-shot object detection, knowledge transfer, distribution calibration, distribution regularization

ACM Reference Format:

Zhiyuan Zhao, Qingjie Liu, and Yunhong Wang. 2022. Exploring Effective Knowledge Transfer for Few-shot Object Detection. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), October 10–14, 2022, Lisboa, Portugal. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3503161.3548062

1 INTRODUCTION

Today, deep neural networks (DNNs) have achieved outstanding performance in a wide variety of data-intensive applications. Unfortunately, due to the data-driven nature of deep models, their performance is severely hampered in data-limited scenarios. Humans, on the other hand, have an incredible capacity for learning from a few examples and generalizing to new concepts from a small amount of information. For instance, a child who is shown a photograph of a stranger once can rapidly identify this stranger from a pile of pictures. To acquire this ability, researchers endeavor to give deep models a quick and robust learning ability in data-limited scenarios, termed few-shot learning [4, 21, 43, 53].

As a fundamental task in computer vision, object detection has witnessed tremendous progress in the last few years, yet it still suffers from the data curse. Preceded by few-shot recognition, efforts have been devoted to addressing few-shot object detection (FSOD), which is a much more challenging task. Earlier attempts [ ] inherit ideas from approaches for few-shot classification [ ] and adapt them to FSOD. For instance, following the meta-learning paradigm [ ], meta-detectors [ ] are trained on the base classes to learn prior knowledge of the base classes and are then updated on the novel classes to make predictions. Another line of work involves the fine-tuning framework. Fine-tuning based FSOD methods normally consist of two steps: (1) base detectors are pre-trained on abundant base classes; (2) the detectors are fine-tuned on novel classes for adaptation. These methods intend to transfer knowledge learned on base classes to the novel classes and are known as transfer learning based methods. However, due to the scarcity of the target data, fine-tuning all parameters of the models is inefficient. Wang et al. [ ] propose to fine-tune only the classification and regression branches of the detector while freezing the feature extractor. Their approach yields competitive results with this simple strategy and reinvigorates researchers’ interest in transfer learning methodology [6, 59].

arXiv:2210.02021v1 [cs.CV] 5 Oct 2022

[Figure 1 diagram. Base classes: cat, horse. Novel classes: dog, car. Low-shot panel: lack of variation. High-shot panel: misalignment. Legend: true distribution, learned distribution, true class boundary, learned class boundary.]

Figure 1: An illustration of the cause of shot-instability. This figure depicts a high-shot case (shot=5) and a low-shot case (shot=1). We believe that as the number of shots increases, the main challenge of FSOD also changes. (1) In the low-shot regime, the key issue of FSOD is the lack of variation in novel classes. (2) In the high-shot regime, the main challenge changes to distribution misalignment.

Despite the progress made, the performance of existing transfer learning based FSOD methods is still far from satisfactory. We notice that most methods that perform well in low-shot regimes are likely to be inferior in high-shot regimes, and vice versa. In other words, these methods are not shot-stable. We believe that this is because the primary challenges in the low-shot regime and the high-shot regime are very different from each other (as shown in Figure 1). In the low-shot regime, the primary difficulty is the lack of inner-class variation. In the high-shot regime, the key to improving detection performance is to tackle the misalignment between learned distributions and true distributions. However, most existing transfer learning based methods ignore these two distinct issues, and thus their performance is unstable across different shots.

To overcome these unsolved issues, a key point is to exploit the rich knowledge the model has learned in previous stages. To this end, we propose a calibration and regularization based method that enables effective knowledge transfer. Specifically, we propose a distribution calibration method to deal with the lack of variation in the low-shot regime. We calibrate the biased novel class distributions with the base class distributions. Synthetic training features are then sampled from the calibrated novel distributions and added to training. In this calibration-and-generation manner, the inner-class variation of novel classes can be greatly enriched. Besides, we notice that the distributions of base classes shift during the fine-tuning process, which may mislead the novel class distributions and limit the effectiveness of calibration. To overcome this limitation, we present a strategy that compensates for possible distribution shift, namely shift compensation.
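The calibration-and-generation idea can be illustrated with a small sketch. This excerpt does not give the actual calibration formula, so everything below is an assumption made for illustration: each class is modeled as a diagonal Gaussian over RoI features, the novel-class statistics are blended with those of the nearest base classes, and `calibrate`, `sample_features`, `top_k`, and `alpha` are hypothetical names and hyperparameters. Shift compensation is not modeled here.

```python
import math
import random


def class_stats(features):
    """Return the per-dimension mean and variance of a list of feature vectors."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dim)]
    return mean, var


def calibrate(novel_feats, base_stats, top_k=2, alpha=0.1):
    """Blend novel-class statistics with those of the nearest base classes.

    base_stats maps a base class name to (mean, var). top_k and alpha are
    illustrative hyperparameters (assumptions), not values from the paper.
    """
    novel_mean, _ = class_stats(novel_feats)
    # Rank base classes by Euclidean distance between class means.
    nearest = sorted(
        base_stats, key=lambda c: math.dist(novel_mean, base_stats[c][0])
    )[:top_k]
    k, dim = len(nearest), len(novel_mean)
    # Calibrated mean: average of the novel mean and the selected base means.
    mean = [
        (novel_mean[d] + sum(base_stats[c][0][d] for c in nearest)) / (k + 1)
        for d in range(dim)
    ]
    # Calibrated variance: borrowed from the base classes, plus a spread term.
    var = [
        sum(base_stats[c][1][d] for c in nearest) / k + alpha for d in range(dim)
    ]
    return mean, var, nearest


def sample_features(mean, var, n, rng):
    """Draw n synthetic feature vectors from a diagonal Gaussian."""
    return [
        [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)] for _ in range(n)
    ]


# Toy example: a single 2-D novel feature (1-shot) and two base classes.
base_stats = {
    "cat": ([0.0, 0.0], [1.0, 1.0]),
    "horse": ([10.0, 10.0], [4.0, 4.0]),
}
mean, var, nearest = calibrate([[1.0, 1.0]], base_stats, top_k=1, alpha=0.0)
synthetic = sample_features(mean, var, n=5, rng=random.Random(0))
```

With a single shot, the novel class has zero measured variance, which is the lack of inner-class variation described above. In this sketch the borrowed base-class variance is what lets the sampled features spread out around the calibrated mean.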

In the high-shot regime, we propose to use the knowledge learned from ImageNet as guidance for feature learning in the fine-tuning stage, which implicitly aligns the distributions of the novel classes. This is inspired by recent studies showing that ImageNet features are stable and expressive and thus can be used as teachers for downstream task learning [ ]. We also tried the base detector model as the teacher, which likewise adopts the ImageNet pre-trained backbone, but it gave unsatisfactory results. We suspect that the ImageNet features are corrupted by the base class training. In this paper, the knowledge transfer from ImageNet features is achieved with a regularization loss. Although the two solutions target different regimes of FSOD, they can be combined for a performance boost. Our contributions are as follows:

(1) We investigate the fundamental cause of the shot-unstable problem of FSOD: the key issues for FSOD in the low-shot regime and the high-shot regime are different from each other and need targeted solutions.

(2) We propose two effective knowledge transfer strategies targeted at the low-shot and high-shot regimes of FSOD, respectively. The two strategies can be combined for further improvement of FSOD.

(3) Experiments on the VOC and COCO benchmarks show that our method significantly outperforms the baseline method and achieves competitive results in both low-shot (shot < 5) and high-shot (shot ≥ 5) regimes.
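The high-shot strategy described in the introduction, guiding fine-tuning with ImageNet features through a regularization loss, can be pictured with a minimal sketch. The excerpt states only that a regularization loss is used, so the squared-error form, the weighting scheme, and the names `feature_regularization`, `total_loss`, and `weight` are assumptions chosen for illustration rather than the authors' formulation.

```python
def feature_regularization(student_feats, teacher_feats):
    """Mean squared difference between fine-tuned and frozen-teacher features.

    student_feats: features from the detector being fine-tuned.
    teacher_feats: features from a frozen ImageNet pre-trained model
    for the same inputs (assumed to have the same shape).
    """
    total, count = 0.0, 0
    for s_vec, t_vec in zip(student_feats, teacher_feats):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


def total_loss(detection_loss, student_feats, teacher_feats, weight=0.1):
    """Detection loss plus the weighted regularization term (weight is assumed)."""
    return detection_loss + weight * feature_regularization(
        student_feats, teacher_feats
    )


# Toy example: one 2-D feature vector per model.
reg = feature_regularization([[1.0, 2.0]], [[1.0, 0.0]])
loss = total_loss(1.0, [[1.0, 2.0]], [[1.0, 0.0]], weight=0.5)
```

In this sketch the teacher features are treated as constants, so only the student would receive gradients in a real training loop. A larger `weight` keeps the fine-tuned features closer to the ImageNet ones.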


2 RELATED WORK

2.1 Object Detection

Modern object detectors are built on top of deep neural networks. These methods can be broadly divided into single-stage detectors and two-stage detectors. Single-stage detectors usually offer high detection efficiency but relatively low detection accuracy [ ]. Two-stage detectors mostly refer to Faster-RCNN [ ] and its derivatives [ ]. They usually attain higher performance than single-stage detectors thanks to the stage-wise refining pipeline. Also, this flexible architecture makes them easily adaptable to extended tasks such as FSOD. In addition to these two families, some anchor-free detection methods [ ] have been proposed to free detectors from burdensome anchor settings.

2.2 Few-shot Object Detection

Few-shot object detection approaches can be broadly grouped into three branches: transfer learning based methods, metric-learning based methods, and meta-learning based methods. Transfer learning based methods mainly focus on transferring knowledge from base classes to novel classes and unleashing the potential of fine-tuning [ ]. Chen et al. [ ] combine the advantages of Faster RCNN and SSD to alleviate the difficulty of transferring from the source domain to the target domain. Recently, Wang et al. [ ] rekindled interest in transfer learning by showing its potential for improving FSOD, which inspired many follow-up works [ ]. Their proposed method, named TFA, freezes the parameters of the model trained on the base classes and fine-tunes only the detection head on the novel classes. Meta-learning based methods follow the ideas of meta-learning in the few-shot classification task [ ], aiming to learn generic knowledge across base classes and then generalize to novel classes. Specifically, Yan et al. [ ] propose to conduct meta-learning over RoI regions. Kang et al. [ ] design a few-shot detection method using a meta feature learner and a reweighting module.

Metric-learning based methods focus on learning better representations [ ]. Leonid et al. [ ] propose a sub-network to learn an embedding space and apply it to novel class detection. Sun et al. [ ] adopt supervised contrastive learning to learn better representations, implemented with an additional contrastive branch that guides RoI feature learning. Beyond these methods, other interesting works have been proposed for FSOD. Zhang et al. [ ] demonstrate that hallucination is helpful for few-shot detection. Wu et al. [ ] believe that there is a universal prototype across all categories, with which the features learned from base classes can be generalized well to novel classes. Qiao et al. [ ] propose DeFRCN, which largely improves few-shot detection performance through multi-stage and multi-task decoupling.

2.3 Distribution Calibration

The goal of Distribution Calibration (DC) is to align a target distribution to a reference distribution. The idea has been used for solving many unbalanced distribution problems. In [ ], Zhang et al. investigate the performance bottleneck of the two-stage learning framework and propose a unified distribution alignment strategy for long-tail visual recognition. Distribution calibration is also adopted in regression tasks [ ], where it is shown that predictions from previously trained regression models can be improved through distribution calibration. In [ ], Shen et al. propose to learn a binary network by calibrating the latent representation through a teacher-student paradigm. Recently, Yang et al. [ ] propose to reuse the statistics from many-shot classes and transfer them to better estimate the distributions of the few-shot classes according to their class similarities.

[Figure 2 diagram. Stage 1, Base Training (base classes): Backbone, RPN, RoIAlign, RoI Feat Extractor, Bbox Classifier, Bbox Regressor. Stage 2, Few-shot Fine-tuning (base classes and novel classes): the same components, with frozen modules marked "Parameter Fixed".]

Figure 2: Illustration of the baseline method TFA. The learning procedure of TFA consists of two stages: (1) base training and (2) few-shot fine-tuning. During fine-tuning, only the parameters of the detection head are updated.

3 OUR METHOD

Before delving into the details of our proposed method, we first review the basic problem setting of FSOD and then take a brief look at our baseline method TFA.

3.1 Problem Setting

We follow the FSOD setting introduced in [ ]. Given two sets of classes, base classes $C_b$ and novel classes $C_n$, the learning procedure of few-shot detection is normally divided into two stages. The first stage trains a base model on sufficient training data of the base classes $C_b$. In the second stage, the model is fine-tuned on the novel classes $C_n$, using $K$ samples (normally $K \le 10$) for each class. It is worth noting that there is no overlap between $C_b$ and $C_n$, that is, $C_b \cap C_n = \emptyset$. In the second stage, to preserve the performance on base classes, the model is fine-tuned on a balanced set containing training samples from both the base classes $C_b$ and the novel classes $C_n$.
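As a rough illustration of this setting, the sketch below builds a balanced K-shot fine-tuning set from a list of labeled instances. It assumes that each annotated object instance counts as one sample and that at most K instances are drawn per class for base and novel classes alike. The function name `build_balanced_kshot_set` and the `(sample_id, label)` data layout are hypothetical.

```python
import random
from collections import defaultdict


def build_balanced_kshot_set(annotations, base_classes, novel_classes, k, seed=0):
    """Pick at most k instances per class from base and novel classes.

    annotations: iterable of (sample_id, class_name) pairs.
    Returns a dict mapping each class name to its selected sample ids.
    """
    # The setting requires disjoint base and novel class sets.
    if set(base_classes) & set(novel_classes):
        raise ValueError("base and novel classes must not overlap")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, label in annotations:
        by_class[label].append(sample_id)
    subset = {}
    for cls in list(base_classes) + list(novel_classes):
        pool = by_class[cls]
        subset[cls] = rng.sample(pool, min(k, len(pool)))
    return subset


# Toy example: three "cat" (base) instances and two "dog" (novel) instances.
annotations = [
    ("img1_obj0", "cat"),
    ("img2_obj0", "cat"),
    ("img3_obj0", "cat"),
    ("img4_obj0", "dog"),
    ("img5_obj0", "dog"),
]
subset = build_balanced_kshot_set(annotations, ["cat"], ["dog"], k=2)
```

Drawing the same number of instances from base and novel classes is what keeps the fine-tuning set balanced, so the abundant base-class data does not dominate the second stage.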

3.2 Review of TFA

TFA is built on top of the two-stage detector Faster-RCNN. It consists of two learning stages. In the first stage, the Faster-RCNN is trained on the base classes $C_b$. In the second stage, the Faster-RCNN model is fine-tuned on the novel classes $C_n$, where each class contains $K$ training samples. Notably, during the fine-tuning stage, the parameters of the backbone network and RoI feature extractor are fixed, and only the parameters of the prediction head, i.e., the classification branch
