
CD-FSOD: A BENCHMARK FOR CROSS-DOMAIN FEW-SHOT OBJECT DETECTION
Wuti Xiong
Center for Machine Vision and Signal Analysis, University of Oulu, Finland
wuti.xiong@oulu.fi
ABSTRACT
In this paper, we propose a study of the cross-domain few-shot object detection (CD-FSOD) benchmark, which consists of image data from diverse domains. On the proposed benchmark, we evaluate state-of-the-art FSOD approaches, including meta-learning and fine-tuning FSOD approaches. The results show that these methods tend to fail, and even underperform the naive fine-tuning model. We analyze the reasons for their failure and introduce a strong baseline that trains a teacher and a student in a mutually beneficial manner to alleviate the overfitting problem. Our approach outperforms existing approaches by a significant margin (2.0% on average) on the proposed benchmark. Our code is available at https://github.com/FSOD/CD-FSOD.
Index Terms—Few-shot Object Detection, Cross-domain.
1. INTRODUCTION
Few-shot object detection (FSOD) aims to detect novel classes of objects with a few annotated instances. In the previous FSOD setting [1,2], a detector is pre-trained on a source dataset consisting of base classes and then transferred to a target dataset consisting of novel classes with few instances, where the base and novel classes are disjoint but share a similar data domain. However, this underlying assumption does not apply to some real-world scenarios, because it is difficult or impossible to collect a sufficient amount of data in those domains. This leads to a new FSOD problem, where the detector must resort to pre-training on base classes from a different domain. In these cases, even humans have trouble recognizing new categories that vary too greatly between examples or differ from prior experience [3,4]. Thus, finding new approaches to tackle this problem remains a challenging but desirable goal.
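To make the standard setting concrete, the sketch below illustrates the usual two-stage transfer protocol (pre-train on base classes, then fine-tune on a few shots of novel classes) with a generic torchvision detector. This is an illustrative setup under our assumptions, not the exact pipeline of any method evaluated here; `base_loader` and `few_shot_loader` are hypothetical data loaders.

```python
# Illustrative two-stage FSOD transfer protocol (assumed torchvision setup,
# not the exact pipeline of the methods evaluated in this paper).
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_detector(num_classes):
    # Generic Faster R-CNN detector; the box classifier head is sized to
    # the current label space (background + object classes).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    return model

def train(model, loader, epochs, lr):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss_dict = model(images, targets)   # detection losses (train mode)
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: pre-train on abundant base-class data (e.g., MS COCO, 80 classes).
detector = build_detector(num_classes=81)
# train(detector, base_loader, epochs=12, lr=0.02)        # hypothetical loader

# Stage 2: swap the classifier head for the novel classes and fine-tune on
# only K annotated instances per class from the target dataset.
# in_feats = detector.roi_heads.box_predictor.cls_score.in_features
# detector.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_novel + 1)
# train(detector, few_shot_loader, epochs=50, lr=0.001)   # hypothetical loader
```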
Although conventional FSOD benchmarks [1,2] are well established, no prior work studies FSOD across different domains. To fill this gap, in this paper we introduce the Cross-Domain Few-Shot Object Detection (CD-FSOD) benchmark (shown in Fig. 1), which covers three target datasets: ArTaxOr [6], UODD [7], and DIOR [8]. On the proposed benchmark, we conduct extensive experiments to evaluate existing FSOD approaches, including meta-learning approaches
[2,9,10] and fine-tuning approaches [11,12,13].

Fig. 1: The CD-FSOD benchmark. MS COCO [5] is used for source training, and target domains of decreasing similarity to MS COCO (ArTaxOr, UODD, DIOR, with label spaces disjoint from the source) are used for evaluation.

The results
show that existing FSOD approaches cannot achieve satisfactory performance and even underperform the naive fine-tuning model due to parameter freezing. Even without freezing parameters, fine-tuning methods struggle to outperform the naive transfer model, while meta-learning methods still
fail. This finding shows that existing FSOD methods cannot
work for CD-FSOD, and there is an urgent need to develop
new methods.
Besides, we introduce a novel distillation-based baseline, which enables a “flywheel effect”: the student and teacher mutually reinforce each other, so both get better and better as the training goes on. Specifically, an exponential moving average (EMA) enables the teacher model to ensemble the student models from different time steps, and the student’s weights are optimized by a distillation loss between the pseudo-labels generated by the teacher and the student’s predictions on the same image (see the sketch after the contribution list below). Our approach outperforms existing FSOD approaches by a large margin on the proposed benchmark. In summary, our main contributions are as follows:
1) we establish the CD-FSOD benchmark, where there is a very large domain difference between the base and target datasets; 2) on the proposed benchmark, we evaluate existing FSOD approaches and analyze the reasons for their failure; 3) we introduce a strong baseline that achieves state-of-the-art performance on the proposed benchmark.
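To make the baseline concrete, here is a minimal sketch of the teacher-student update described above, based on our reading of the description; the score threshold and helper names are illustrative assumptions, not the paper’s exact implementation.

```python
# Minimal sketch of the distillation-based baseline (our reading of the
# description above; details such as the threshold are assumptions).
import copy
import torch

def build_teacher(student):
    # The teacher starts as a frozen copy of the student detector.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # EMA lets the teacher ensemble student weights from different time steps.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

def filter_pseudo_labels(pred, score_thr=0.7):
    # Keep only confident teacher detections as pseudo ground truth
    # (the threshold is an illustrative choice).
    keep = pred["scores"] > score_thr
    return {"boxes": pred["boxes"][keep], "labels": pred["labels"][keep]}

def train_step(student, teacher, images, optimizer):
    # 1) The teacher generates pseudo-labels on the images.
    teacher.eval()
    with torch.no_grad():
        pseudo = [filter_pseudo_labels(p) for p in teacher(images)]

    # 2) The student is optimized with a distillation loss between its
    #    predictions and the teacher's pseudo-labels on the same images
    #    (torchvision-style detectors return a loss dict in train mode).
    student.train()
    loss = sum(student(images, pseudo).values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # 3) The teacher tracks the student, closing the mutually
    #    reinforcing "flywheel" loop.
    ema_update(teacher, student)
```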