
Exploring Effective Knowledge Transfer for Few-shot Object
Detection
Zhiyuan Zhao1
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
zhaozhiyuan@buaa.edu.cn
Qingjie Liu2∗
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
qingjie.liu@buaa.edu.cn
Yunhong Wang3
State Key Laboratory of Virtual
Reality Technology and Systems,
Beihang University
Beijing, China
yhwang@buaa.edu.cn
ABSTRACT
Recently, few-shot object detection (FSOD) has received much at-
tention from the community, and many methods have been proposed
to address this problem from a knowledge transfer perspective.
Though promising results have been achieved, these methods fail
to be shot-stable: methods that excel in low-shot regimes
are likely to struggle in high-shot regimes, and vice versa. We
believe this is because the primary challenge of FSOD changes
when the number of shots varies. In the low-shot regime, the pri-
mary challenge is the lack of inner-class variation. In the high-
shot regime, as the variance approaches the real one, the main
hindrance to the performance comes from misalignment between
learned and true distributions. However, these two distinct issues
remain unsolved in most existing FSOD methods. In this paper, we
propose to overcome these challenges by exploiting the rich knowledge
the model has learned and effectively transferring it to the
novel classes. For the low-shot regime, we propose a distribution
calibration method to deal with the lack of inner-class variation.
Meanwhile, a shift compensation method is proposed to
compensate for possible distribution shift during fine-tuning. For
the high-shot regime, we propose to use the knowledge learned
from ImageNet as guidance for feature learning in the fine-tuning
stage, which implicitly aligns the distributions of the
novel classes. Although targeted toward different regimes, these
two strategies can work together to further improve FSOD performance.
Experiments on both the VOC and COCO benchmarks
show that our proposed method significantly outperforms the
baseline method and produces competitive results in both low-shot
settings (shot < 5) and high-shot settings (shot ≥ 5). Code is available
at https://github.com/JulioZhao97/ETrans_Fsdet.git.
CCS CONCEPTS
• Computing methodologies → Object detection.
∗Corresponding Author.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
MM ’22, October 10–14, 2022, Lisboa, Portugal
©2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9203-7/22/10. . . $15.00
https://doi.org/10.1145/3503161.3548062
KEYWORDS
few-shot object detection, knowledge transfer, distribution calibra-
tion, distribution regularization
ACM Reference Format:
Zhiyuan Zhao, Qingjie Liu, and Yunhong Wang. 2022. Exploring Effective
Knowledge Transfer for Few-shot Object Detection. In Proceedings of the
30th ACM International Conference on Multimedia (MM '22), October 10–14,
2022, Lisboa, Portugal. ACM, New York, NY, USA, 10 pages. https://doi.org/
10.1145/3503161.3548062
1 INTRODUCTION
Today, deep neural networks (DNNs) have achieved outstanding
performance in a wide variety of data-intensive applications. Un-
fortunately, due to the data-driven nature of deep models, their
performance is severely hampered in data-limited scenarios. On
the other hand, humans have an incredible capacity for learning
from a few examples and generalizing to new concepts from a
small amount of information. For instance, after seeing a photograph
of a stranger only once, a child can rapidly identify that
stranger in a pile of pictures. Inspired by this ability, researchers
endeavor to empower the deep models with the quick and robust
learning ability in data-limited scenarios, termed few-shot learn-
ing [4, 21, 43, 53].
As a fundamental task in computer vision, object detection
has witnessed tremendous progress in the last few years, yet it
still suffers from the data curse. Preceded by few-shot recognition,
efforts have been devoted to addressing few-shot object
detection (FSOD), which is a much more challenging task. Earlier
attempts [2] inherit ideas from approaches for few-shot classification
[3, 28] and adapt them to FSOD. For instance, following
the meta-learning paradigm [36, 42], meta-detectors [11, 52]
are trained on the base classes to learn prior knowledge of the
base classes and then are updated on the novel classes to make
predictions. Another line of work involves the fine-tuning framework.
Fine-tuning based FSOD methods normally consist of two
steps: (1) firstly, base detectors are pre-trained on abundant base
classes; (2) secondly, the detectors are fine-tuned on novel classes
for adaptation. These methods intend to transfer knowledge learned
on base classes to the novel classes and are known as transfer learning
based methods. However, due to the rarity of the target data,
fine-tuning all parameters of the models is inefficient. Wang et
al. [44] propose to fine-tune only the classification and regression
branches of the detector while freezing the feature extractor. Their
approach yields competitive results with this simple strategy and
arXiv:2210.02021v1 [cs.CV] 5 Oct 2022
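The two-step transfer-learning recipe described above (pre-train the whole detector on the base classes, then fine-tune only the classification and regression branches while the feature extractor is frozen) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the detector is reduced to three named parameter groups, and `sgd_step` is a hypothetical helper that skips any group listed in `frozen`.

```python
def sgd_step(params, grads, frozen, lr=0.1):
    """One gradient-descent update that leaves frozen parameter groups untouched."""
    return {
        name: (value if name in frozen else value - lr * grads[name])
        for name, value in params.items()
    }

# Stage 1: base training on abundant base classes — every group is trainable.
detector = {"backbone": 1.0, "cls_head": 0.5, "reg_head": 0.5}
grads = {"backbone": 0.2, "cls_head": 0.4, "reg_head": 0.4}
detector = sgd_step(detector, grads, frozen=set())

# Stage 2: few-shot fine-tuning on novel classes — freeze the feature
# extractor and adapt only the classification and regression branches.
grads = {"backbone": 0.3, "cls_head": 0.1, "reg_head": 0.1}
detector = sgd_step(detector, grads, frozen={"backbone"})

# The backbone keeps its stage-1 value; only the two heads moved in stage 2.
print(detector)
```

In a real detector the same effect is achieved by excluding the backbone's parameters from the optimizer (or disabling their gradients) before the fine-tuning stage begins.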