can first apply the base detector to detect the leaf base class
objects (“car”), as well as the candidates for the base super-
classes (“animal” and “other”). These candidates are then
processed by separately trained novel predictors to detect
the novel classes. See Figure 1 for an illustration.
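The routing described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the `Detection` type, the `hierarchical_detect` function, the predictor interface (a callable returning a label and a score), and the toy classes "dog" and "bicycle" are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Detection:
    box: Box
    label: str
    score: float


# Hypothetical interface: a novel predictor maps a super-class candidate
# to a (novel_label, score) pair. How scores are combined is an assumption.
NovelPredictor = Callable[[Detection], Tuple[str, float]]


def hierarchical_detect(
    base_outputs: List[Detection],
    leaf_classes: Set[str],
    novel_predictors: Dict[str, NovelPredictor],
) -> List[Detection]:
    """Keep leaf base detections as-is; refine super-class candidates."""
    results = []
    for det in base_outputs:
        if det.label in leaf_classes:
            # Leaf base class: the base detector's output is left untouched.
            results.append(det)
        elif det.label in novel_predictors:
            # Super-class candidate: a separately trained predictor decides
            # which novel class (if any) the candidate belongs to.
            label, score = novel_predictors[det.label](det)
            results.append(Detection(det.box, label, score))
    return results


# Toy stand-ins for the base detector output and the novel predictors.
base_outputs = [
    Detection((0, 0, 10, 10), "car", 0.9),
    Detection((5, 5, 20, 20), "animal", 0.6),
    Detection((30, 30, 40, 40), "other", 0.5),
]
novel_predictors = {
    "animal": lambda det: ("dog", 0.8),
    "other": lambda det: ("bicycle", 0.7),
}
final = hierarchical_detect(base_outputs, {"car"}, novel_predictors)
```

Because the leaf detections are passed through unchanged, nothing learned for the novel classes can alter the base-class outputs in this sketch.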
Our hierarchical approach decouples the weights of the
novel class predictors from the base detector. As a result,
our approach retains the performance of the pre-trained
detector on the base classes by design, addressing the “catastrophic
forgetting” issue. Furthermore, we introduce
a specialized optimization strategy, based on Newton’s
method, to speed up the learning of the novel predictors.
By exploiting second-order information, our approach can
adapt to detect the novel classes using only 30 update steps.
Consequently, our approach obtains a speed-up of over 10× in
computation time compared to our transfer learning-based
baseline, TFA [33].
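To illustrate why second-order information helps with a small step budget, the following toy sketch compares plain Newton steps with gradient descent on an L2-regularized logistic regression with two parameters. Everything here is an assumption made for illustration (the data, the regularization weight `LAM`, the learning rate, and the loss itself); it is not the predictor, loss, or optimizer used in this work.

```python
import math

# Toy, non-separable 1-D data (made up for illustration).
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
YS = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
LAM = 0.1  # assumed L2 regularization weight


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_grad_hess(w):
    """Loss, gradient, and 2x2 Hessian at w = [bias, slope]."""
    b, a = w
    n = len(XS)
    loss = 0.5 * LAM * (a * a + b * b)
    g = [LAM * b, LAM * a]
    h = [[LAM, 0.0], [0.0, LAM]]
    for x, y in zip(XS, YS):
        p = sigmoid(b + a * x)
        loss -= (y * math.log(p) + (1.0 - y) * math.log(1.0 - p)) / n
        r = (p - y) / n
        g[0] += r
        g[1] += r * x
        s = p * (1.0 - p) / n
        h[0][0] += s
        h[0][1] += s * x
        h[1][0] += s * x
        h[1][1] += s * x * x
    return loss, g, h


def newton(steps):
    """Full Newton steps: w <- w - H^{-1} g (2x2 system solved directly)."""
    w = [0.0, 0.0]
    for _ in range(steps):
        _, g, h = loss_grad_hess(w)
        det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
        d0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
        d1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
        w = [w[0] - d0, w[1] - d1]
    return w


def gradient_descent(steps, lr=0.1):
    """First-order baseline with an assumed fixed learning rate."""
    w = [0.0, 0.0]
    for _ in range(steps):
        _, g, _ = loss_grad_hess(w)
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w


def grad_norm(w):
    _, g, _ = loss_grad_hess(w)
    return math.hypot(g[0], g[1])


w_newton = newton(30)
w_gd = gradient_descent(30)
print("Newton gradient norm:", grad_norm(w_newton))
print("GD gradient norm:    ", grad_norm(w_gd))
```

With the same budget of 30 updates, the Newton iterate in this toy problem sits at a stationary point to numerical precision, while gradient descent with the assumed learning rate is still approaching it.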
Our contributions can thus be summarized as follows:
• We propose a simple yet effective hierarchical detection
approach which completely avoids “catastrophic
forgetting” on base classes, while obtaining competitive
results on the novel classes.
• We present a Newton’s method based optimization
strategy which achieves much faster convergence than
traditional gradient descent.
• We introduce a new class-refined few-shot detection
task where a method should also be able to learn
fine-grained classification for existing base classes.
II. RELATED WORK
Few-Shot Object Detection: The existing literature mainly
adopts two paradigms to tackle the few-shot object detection
problem: meta-learning-based approaches [14], [23], [10], [36]
and transfer learning-based approaches [8], [33], [17], [11],
[24]. Meta-learning-based approaches transfer
meta-learned task-level knowledge to the detection task
with limited training data. MetaYOLO [14] meta-learned
a feature learner module to extract generic features of
novel objects and a reweighting module to make predictions
from these features. Fan et al. [10] proposed Attention-RPN
and a Multi-Relation Detector to learn a metric space
that measures the similarity of object pairs for detection.
Meta-DETR [36] meta-learned an encoder-decoder transformer for
few-shot detection.
Among transfer learning-based approaches, LSTD [8] is one of
the early works that adapted a detector learned on
data-abundant objects to the target domain of few-shot novel
objects. Wang et al. [33] proposed the two-stage fine-tuning
approach TFA. In the first stage, a base predictor was
trained for data-abundant base objects. The final layers of
the detector were then tuned in the second stage, on a
balanced few-shot dataset containing both base and novel
classes. This tuning-based approach is simple yet effective,
and outperformed previous methods using meta-learning.
Compared to TFA, LEAST [17] fine-tuned more layers on
novel classes, leading to a better novel class performance,
albeit with a deterioration on the base class performance.
To mitigate this catastrophic forgetting, they further applied
knowledge distillation and the clustered exemplars of base
objects. DeFRCN [24] fine-tuned the entire detector of Faster
R-CNN by jointly training it with two auxiliary modules
to improve novel class performance. Fan et al. [11] pro-
posed Retentive R-CNN, which inherited the tuning approach
of TFA with an auxiliary consistency loss to distill the
knowledge of the base detector. Retentive R-CNN achieved
competitive performance on novel classes, while maintaining
the performance of the pre-trained detector on base classes.
In this paper, we propose an alternative hierarchical detection
approach which achieves results similar to Retentive
R-CNN, while being much simpler and more general.
Incremental Learning and Refined Classification:
Incremental learning aims to acquire new knowledge
from a stream of data while preserving previously learned
knowledge [18], [27], [13], [25], [30], [34], [35], [7]. A real-world
scenario which is often neglected is that over time, humans
learn not only new entities, but also refined granularity of
previously learned entities. Abdelsalam et al. [1] propose the
Incremental Implicitly-Refined Classification (IIRC) setup
related to this scenario. Here, each class has two granularity
levels of labels to simulate the process of incremental learn-
ing from coarse-grained categories to fine-grained categories.
Following the IIRC setup, Wang et al. [32] proposed HCV
to learn the fine-grained categories while retaining previous
knowledge. HCV aims to identify the hierarchical relationships
between classes and exploit this knowledge for the IIRC task.
Hierarchy for few-shot learning: Li et al. [16] perform
large-scale few-shot learning using a class hierarchy that
encodes semantic relations between base and novel classes.
The prior knowledge from the class hierarchy is used to learn
transferable visual features. Liu et al. [21] use a class
hierarchy to perform coarse-to-fine classification. In contrast
to these works, we show that the idea of hierarchy can be
effectively used to address the “catastrophic forgetting” issue
in few-shot detection.
Optimization Methods for Few-Shot Learning: Bertinetto
et al. [4] noted that updating only the parameters sensitive
to specific classes in a few-shot classification task leads to a
shallow learning problem. This enables adaptation
strategies that are more efficient than standard gradient
descent. Consequently, they proposed ridge and sigmoid
regression based classifiers with closed-form solutions to
achieve fast convergence for meta-learning-based
few-shot classification. Lee et al. [15] meta-learn representations
for few-shot classification using discriminative linear
classifiers. Several works have utilized a steepest-descent
optimization strategy to train shallow learners for tackling
few-shot learning problems arising in object tracking [9], [29], [5],
video object segmentation [6] and classification [31]. A few
works [2], [3] have employed conjugate gradient (CG) as a
black-box optimization tool for object detection. In this work,
we develop a specialized optimization strategy based on CG
to perform efficient few-shot detection. Through extensive
experiments, we show that our optimization approach obtains
performance similar to SGD while being much faster.
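For reference, the generic conjugate gradient iteration for a symmetric positive-definite linear system can be sketched as follows. This is the textbook algorithm only, shown on a made-up 2×2 system; the function name `conjugate_gradient` and the `matvec` interface are assumptions for the example, and the sketch does not reproduce the specialized strategy developed in this work.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(matvec, b, max_iters=None, tol=1e-12):
    """Solve A x = b for symmetric positive-definite A.

    `matvec(v)` must return A @ v; A itself is never formed explicitly,
    which is what makes CG attractive when A is a large Hessian.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iters or n):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)                      # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction, conjugate to the previous ones with respect to A.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Made-up symmetric positive-definite system for illustration.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(lambda v: [dot(row, v) for row in A], b)
```

In exact arithmetic CG solves an n-dimensional system of this kind in at most n iterations, and it only requires matrix-vector products, so it can stand in for the explicit linear solve inside a Newton-type update.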