Exploring Effective Knowledge Transfer for Few-shot Object
Detection
Zhiyuan Zhao1
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
zhaozhiyuan@buaa.edu.cn
Qingjie Liu2
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
qingjie.liu@buaa.edu.cn
Yunhong Wang3
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
yhwang@buaa.edu.cn
ABSTRACT
Recently, few-shot object detection (FSOD) has received much attention
from the community, and many methods have been proposed to address this
problem from a knowledge transfer perspective. Though promising results
have been achieved, these methods fail to be shot-stable: methods that
excel in low-shot regimes are likely to struggle in high-shot regimes,
and vice versa. We believe this is because the primary challenge of FSOD
changes when the number of shots varies. In the low-shot regime, the
primary challenge is the lack of inner-class variation. In the high-shot
regime, as the variance approaches the real one, the main hindrance to
performance comes from misalignment between the learned and true
distributions. However, these two distinct issues remain unsolved in
most existing FSOD methods. In this paper, we propose to overcome these
challenges by exploiting the rich knowledge the model has learned and
effectively transferring it to the novel classes. For the low-shot
regime, we propose a distribution calibration method to deal with the
lack of inner-class variation. Meanwhile, a shift compensation method is
proposed to compensate for possible distribution shift during
fine-tuning. For the high-shot regime, we propose to use the knowledge
learned from ImageNet as guidance for feature learning in the
fine-tuning stage, which implicitly aligns the distributions of the
novel classes. Although targeted toward different regimes, these two
strategies can work together to further improve FSOD performance.
Experiments on both the VOC and COCO benchmarks show that our proposed
method significantly outperforms the baseline method and produces
competitive results in both low-shot settings (shot < 5) and high-shot
settings (shot ≥ 5). Code is available
at https://github.com/JulioZhao97/ETrans_Fsdet.git.
CCS CONCEPTS
• Computing methodologies → Object detection.
Corresponding Author.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
MM ’22, October 10–14, 2022, Lisboa, Portugal
©2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9203-7/22/10. . . $15.00
https://doi.org/10.1145/3503161.3548062
KEYWORDS
few-shot object detection, knowledge transfer, distribution calibra-
tion, distribution regularization
ACM Reference Format:
Zhiyuan Zhao, Qingjie Liu, and Yunhong Wang. 2022. Exploring Effective
Knowledge Transfer for Few-shot Object Detection. In Proceedings of the
30th ACM International Conference on Multimedia (MM ’22), October 10–14,
2022, Lisboa, Portugal. ACM, New York, NY, USA, 10 pages.
https://doi.org/10.1145/3503161.3548062
1 INTRODUCTION
Today, deep neural networks (DNNs) have achieved outstanding
performance in a wide variety of data-intensive applications. Un-
fortunately, due to the data-driven nature of deep models, their
performance is severely hampered in data-limited scenarios. On
the other hand, humans have an incredible capacity for learning
from a few examples and generalizing to new concepts from a
small amount of information. For instance, by showing a photo-
graph of a stranger to a child once, he/she can rapidly identify this
stranger from a pile of pictures. To acquire this ability, researchers
endeavor to empower the deep models with the quick and robust
learning ability in data-limited scenarios, termed few-shot learn-
ing [4, 21, 43, 53].
As a fundamental task in computer vision, object detection has witnessed
tremendous progress in the last few years, yet it still suffers from the
data curse. Preceded by few-shot recognition, efforts have been devoted
to addressing few-shot object detection (FSOD), which is a much more
challenging task. Earlier attempts [2] inherit ideas from approaches for
few-shot classification [3, 28] and adapt them to FSOD. For instance,
following the meta-learning paradigm [36, 42], meta-detectors [11, 52]
are trained on the base classes to learn prior knowledge of the
base classes and then are updated on the novel classes to make
predictions. Another line of work involves the fine-tuning framework.
Fine-tuning based FSOD methods normally consist of two steps: (1) base
detectors are pre-trained on abundant base classes; (2) the detectors
are fine-tuned on novel classes for adaptation. These methods intend to
transfer knowledge learned on base classes to the novel classes and are
known as transfer learning based methods. However, due to the rarity of
the target data, fine-tuning all parameters of the models is
inefficient. Wang et al. [44] propose to fine-tune only the
classification and regression branches of the detector while freezing
the feature extractor. Their approach yields competitive results with
this simple strategy and reinvigorates researchers’ interest in transfer
learning methodology [6, 59].
arXiv:2210.02021v1 [cs.CV] 5 Oct 2022
[Figure 1 graphic. Panels: low-shot (lack of variation) and high-shot (misalignment). Legend: true distribution, learned distribution, true class boundary, learned class boundary. Example classes: cat (base), horse (base), dog (novel), car (novel).]
Figure 1: An illustration of the cause of the shot-unstable problem. This figure depicts a high-shot case (shot=5) and a low-shot case (shot=1).
We believe that as the number of shots increases, the main challenge of FSOD also changes. (1) In the low-shot regime, the key issue
of FSOD is the lack of variation in novel classes. (2) In the high-shot regime, the main challenge changes to the distribution
misalignment.
Despite the progress made, the performance of existing transfer learning
based FSOD methods is still far from satisfactory. We notice that most
of the methods that perform well in low-shot regimes are likely to be
inferior in high-shot regimes, and vice versa. In other words, these
methods are not shot-stable. We believe that this is because the primary
challenges in the low-shot regime and the high-shot regime are very
different from each other (as shown in Figure 1). In the low-shot
regime, the primary difficulty is the lack of inner-class variation. In
the high-shot regime, the key to improving detection performance is to
tackle the misalignment between the learned distributions and the true
distributions. However, most existing transfer learning based methods
ignore these two distinct issues, and thus their performance is unstable
across different shots.
To overcome these unsolved issues, a key point is to exploit the rich
knowledge the model has learned in previous stages. To this end, we
propose a calibration and regularization based method that enables
effective knowledge transfer. Specifically, we propose a distribution
calibration method to deal with the lack of variation in the low-shot
regime. We calibrate the biased novel class distributions with the base
class distributions. Synthetic training features are then sampled from
the calibrated novel distributions and added to training. In this
calibration-and-generation manner, the inner-class variation of novel
classes can be greatly enriched. Besides, we notice that the
distributions of the base classes shift during fine-tuning, which may
mislead the novel class distributions and limit the effectiveness of
calibration. To overcome this limitation, we present a strategy that
compensates for possible distribution shift, namely shift compensation.
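The calibration-and-generation idea described above can be illustrated with a short sketch. This section does not give the exact formulation, so the code below assumes a Gaussian calibration in the style of Yang et al. [54]: the statistics of the nearest base classes are borrowed to estimate a novel class distribution, and synthetic features are drawn from it. The function names, `top_k`, `alpha`, and the use of a diagonal `alpha` term are illustrative assumptions, and shift compensation is not modeled.

```python
import numpy as np


def calibrate_distribution(novel_feats, base_means, base_covs, top_k=2, alpha=0.1):
    """Estimate a calibrated Gaussian for one novel class (illustrative sketch).

    novel_feats: (n, d) few-shot features of the novel class.
    base_means:  (b, d) per-class feature means of the base classes.
    base_covs:   (b, d, d) per-class feature covariances of the base classes.
    """
    novel_mean = novel_feats.mean(axis=0)
    # Pick the base classes whose means are closest to the novel class mean.
    dists = np.linalg.norm(base_means - novel_mean, axis=1)
    nearest = np.argsort(dists)[:top_k]
    # Calibrated mean: average of the selected base means and the novel mean.
    calib_mean = (base_means[nearest].sum(axis=0) + novel_mean) / (len(nearest) + 1)
    # Calibrated covariance: average base covariance plus a small spread term
    # (alpha on the diagonal is an assumption that keeps the matrix well-conditioned).
    dim = novel_feats.shape[1]
    calib_cov = base_covs[nearest].mean(axis=0) + alpha * np.eye(dim)
    return calib_mean, calib_cov


def sample_synthetic_features(mean, cov, n_samples=16, seed=0):
    """Draw synthetic training features from the calibrated Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_samples)
```

In such a setup, the sampled features would be mixed with the real few-shot features of the novel class when training the classifier, which is how the added variation reaches the model.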
In the high-shot regime, we propose to use the knowledge learned from
ImageNet as guidance for feature learning in the fine-tuning stage,
which implicitly aligns the distributions of the novel classes. This is
inspired by recent studies showing that ImageNet features are stable and
expressive and thus can be used as teachers for downstream task
learning [9, 25]. We also tried using the base detector, which adopts
the same ImageNet pre-trained backbone, as the teacher, but obtained
unsatisfactory results. We suspect that the ImageNet features are
corrupted by base class training. In this paper, the knowledge transfer
from ImageNet features is achieved with a regularization loss. Although
the two solutions target different regimes of FSOD, they can be combined
for a further performance boost. Our contributions are as follows:
(1) We investigate the fundamental cause of the shot-unstable
problem of FSOD: the key issues for FSOD in the low-shot regime
and the high-shot regime differ from each other and need targeted
solutions.
(2) We propose two effective knowledge transfer strategies targeted
at the low-shot and high-shot regimes of FSOD, respectively.
The two strategies can be combined for further improvement of FSOD.
(3) Experiments on the VOC and COCO benchmarks show that
our method significantly outperforms the baseline method and
achieves competitive results in both low-shot (shot < 5) and
high-shot (shot ≥ 5) regimes.
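The ImageNet-guided regularization mentioned for the high-shot regime can be sketched as a feature-matching penalty added to the detection loss. The exact loss is not specified in this section, so the form below (mean squared distance between L2-normalized features of the fine-tuned model and a frozen ImageNet pre-trained teacher) and the names `feature_regularization_loss`, `total_finetune_loss`, and `reg_weight` are assumptions made for illustration only.

```python
import numpy as np


def _l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def feature_regularization_loss(student_feats, teacher_feats):
    """Mean squared distance between normalized student and teacher features.

    student_feats: (n, d) features from the detector being fine-tuned.
    teacher_feats: (n, d) features of the same inputs from a frozen
                   ImageNet pre-trained network (assumed teacher).
    """
    student = _l2_normalize(np.asarray(student_feats, dtype=float))
    teacher = _l2_normalize(np.asarray(teacher_feats, dtype=float))
    return float(np.mean(np.sum((student - teacher) ** 2, axis=1)))


def total_finetune_loss(detection_loss, student_feats, teacher_feats, reg_weight=0.1):
    """Detection loss plus the weighted regularization term (reg_weight is hypothetical)."""
    return detection_loss + reg_weight * feature_regularization_loss(
        student_feats, teacher_feats
    )
```

The penalty is zero when the fine-tuned features point in the same direction as the teacher features and grows as they drift apart, which is one simple way to keep fine-tuning anchored to the ImageNet representation.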
2 RELATED WORK
2.1 Object Detection
Modern object detectors are built on top of deep neural networks.
These methods can be broadly divided into single-stage detectors
and two-stage detectors. Single-stage detectors usually offer high
detection efficiency but relatively low detection accuracy
[20, 22, 29, 30]. Two-stage detectors mostly refer to Faster-RCNN [32]
and its derivatives [1, 10, 24, 49]. They usually attain higher
performance than single-stage detectors thanks to the stage-wise
refining pipeline. This flexible architecture also makes them easily
adaptable to extended tasks such as FSOD. In addition to these two
families, some anchor-free detection methods [15, 40, 55, 61, 62]
have been proposed to release detectors from burdensome anchor settings.
2.2 Few-shot Object Detection
Few-shot object detection approaches can be broadly grouped into
three branches: transfer learning based methods, metric-learning
based methods, and meta-learning based methods. Transfer learning
based methods mainly focus on transferring knowledge from base classes
to novel classes and unleashing the potential of fine-tuning
[16, 52, 58]. Chen et al. [2] combine the advantages of Faster RCNN and
SSD to alleviate the transfer difficulties from the source domain to the
target domain. Recently, Wang et al. [44] rekindled interest in transfer
learning by showing its potential in improving FSOD, which inspired many
follow-up works [38, 59]. Their proposed method, named TFA, freezes the
parameters of the model trained on the base classes and only fine-tunes
the detection head with the novel classes. Meta-learning based methods
follow the ideas of meta-learning in the few-shot classification task
[7, 17, 23, 27, 31, 34, 36, 42], which intend to learn generic
knowledge across base classes and then generalize to novel classes.
Specifically, Yan et al. [52] propose to conduct meta-learning over
RoI regions. Kang et al. [11] design a few-shot detection method
using a meta feature learner and a reweighting module.
Metric-learning based methods focus on learning better representations
[13, 33, 39, 60]. Leonid et al. [12] propose a sub-network to learn an
embedding space and apply it for novel class detection. Sun et al. [38]
adopt supervised contrastive learning to learn better representations,
which is implemented with an additional contrastive branch to guide RoI
feature learning. Beyond these methods, some other interesting works
have been proposed for FSOD. Zhang et al. [59] demonstrate that
hallucination is helpful for few-shot detection. Wu et al. [47] believe
that there is a universal prototype across all categories, with which
the features learned from base classes can be generalized well to novel
classes. Qiao et al. [25] propose DeFRCN, which largely improves
few-shot detection performance through multi-stage and multi-task
decoupling.
2.3 Distribution Calibration
The goal of Distribution Calibration (DC) is to align a target
distribution to a reference distribution. The idea has been used for
solving many unbalanced distribution problems. In [57], Zhang et al.
investigate the performance bottleneck of the two-stage learning
framework and propose a unified distribution alignment strategy for
long-tail visual recognition. Distribution calibration is also adopted
in regression tasks [37], where it is shown that predictions from
previously trained regression models can be improved through
distribution calibration. In [35], Shen et al. propose to learn
a binary network by calibrating the latent representation through
a teacher-student paradigm. Recently, Yang et al. [54] propose to
reuse the statistics from many-shot classes and transfer them to
better estimate the distributions of the few-shot classes according
to their class similarities.
[Figure 2 diagram. Stage 1: Base Training and Stage 2: Few-shot Fine-tuning, each with Backbone, RPN, RoIAlign, RoI Feat Extractor, Bbox Classifier, and Bbox Regressor. Stage 1 takes base classes; Stage 2 takes base classes and novel classes. Legend: Parameter Fixed.]
Figure 2: Illustration of the baseline method TFA. The learning
procedure of TFA consists of 2 stages: (1) base training and
(2) few-shot fine-tuning. During fine-tuning, only parameters
of the detection head are updated.
3 OUR METHOD
Before delving into the details of our proposed method, we first
review the basic problem setting of FSOD and then take a brief look
at our baseline method TFA.
3.1 Problem Setting
We follow the FSOD setting introduced in [11]. Given two sets of
classes, base classes 𝐶𝑏 and novel classes 𝐶𝑛, the learning procedure
of few-shot detection is normally divided into two stages. The first
stage trains a base model on sufficient training data of the base
classes 𝐶𝑏. In the second stage, the model is fine-tuned on the novel
classes 𝐶𝑛, using 𝐾 samples (normally 𝐾 ≤ 10) for each class. It is
worth noting that there is no overlap between 𝐶𝑏 and 𝐶𝑛, that is,
𝐶𝑏 ∩ 𝐶𝑛 = ∅. In the second stage, to preserve the performance on base
classes, the model is fine-tuned on a balanced set containing training
samples from both base classes 𝐶𝑏
and novel classes 𝐶𝑛.
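The balanced fine-tuning set described above can be sketched in a few lines. This is a simplified illustration, not the benchmark's actual data pipeline: it assumes each annotation is a `(sample_id, class_name)` pair and treats one annotation as one shot, and the function name `build_balanced_kshot_set` is hypothetical.

```python
import random
from collections import defaultdict


def build_balanced_kshot_set(annotations, base_classes, novel_classes, k, seed=0):
    """Pick k samples per class from both base and novel classes.

    annotations: iterable of (sample_id, class_name) pairs (assumed format).
    Returns a dict mapping class_name -> list of k sample ids.
    """
    # The setting requires the base and novel class sets to be disjoint.
    overlap = set(base_classes) & set(novel_classes)
    if overlap:
        raise ValueError(f"base and novel classes overlap: {sorted(overlap)}")

    # Group sample ids by class.
    by_class = defaultdict(list)
    for sample_id, class_name in annotations:
        by_class[class_name].append(sample_id)

    # Draw exactly k samples for every class so the set stays balanced.
    rng = random.Random(seed)
    subset = {}
    for class_name in list(base_classes) + list(novel_classes):
        pool = by_class.get(class_name, [])
        if len(pool) < k:
            raise ValueError(f"class {class_name!r} has fewer than {k} samples")
        subset[class_name] = rng.sample(pool, k)
    return subset
```

Sampling the same number of shots for base and novel classes is what keeps the fine-tuning set balanced, so the abundant base data does not dominate the few novel samples.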
3.2 Review of TFA
TFA is built on top of the two-stage detector Faster-RCNN. It consists
of two learning stages. In the first stage, the Faster-RCNN is trained
on base classes 𝐶𝑏. In the second stage, the Faster-RCNN model is
fine-tuned on novel classes 𝐶𝑛, where each class contains 𝐾 training
samples. Notably, during the fine-tuning stage, the parameters of
the backbone network and RoI feature extractor are fixed; only the
parameters of the prediction head, i.e., the classification branch