on Multimedia (MM ’22), October 10–14, 2022, Lisboa, Portugal. ACM, New
York, NY, USA, 9 pages. https://doi.org/10.1145/3503161.3547995
1 INTRODUCTION
FSL mainly aims at transferring knowledge from a source dataset
to a novel target dataset with only one or a few labeled examples.
Generally, FSL assumes that the images of the source and target
datasets belong to the same domain. However, such an ideal
assumption may not hold in real-world multimedia applications.
For example, as revealed in [9], a model trained on ImageNet [11],
which is mainly composed of massive and diverse natural images,
still fails to recognize novel fine-grained birds. To this end, CD-FSL,
which is dedicated to addressing the domain gap problem of FSL,
has attracted rising attention.
Recently, various settings of CD-FSL have been extensively studied
in many previous methods [13, 14, 33, 40, 44]. Most of them [14, 40, 44]
use only the source domain images for training and focus
on improving the generalization ability of the FSL models.
Though some achievements have been made, it is still hard to
achieve impressive performance due to the huge domain gap
between the source and target datasets. Thus, some works [13, 33]
relax this most basic yet strict setting and allow target data to be
used during the training phase. More specifically, STARTUP [33]
proposes to make use of relatively massive unlabeled target data,
whilst Meta-FDMixup [13] advocates utilizing a few limited labeled
target data. Unfortunately, the massive unlabeled examples required
by the former may still be hard to obtain in many real-world
applications, such as the recognition of endangered wild animals
and specific buildings. By contrast, learning CD-FSL with a few
limited labeled target domain data, e.g., 5 images per class, is more
realistic. Thus, in this paper, we stick to the setting proposed in
Meta-FDMixup [13] to promote the learning process of models.
Formally, given a source domain dataset with enough examples
and an auxiliary target domain dataset with only a few labeled
examples, our goal is to learn a good FSL model that takes these two
sets as training data and achieves good results on the novel target
data. Notably, as in Meta-FDMixup, our setting does not violate the
basic FSL setting, as the class sets of the auxiliary target training
data and the novel target testing data are strictly disjoint.
This ensures that none of the novel target categories will appear
during the training stage. Critically, as shown in Figure 1, we
highlight two key challenges: 1) The numbers of labeled examples
for the source dataset and the auxiliary target dataset are
extremely unbalanced. Models learned on such unbalanced training
data will be biased towards the source dataset while performing
much worse on the target dataset. 2) Since the source dataset and
the auxiliary target dataset belong to two distinct domains, it may
be too difficult for a single model to learn knowledge from datasets
of different domains simultaneously. These challenges have
unfortunately been scarcely addressed in previous works [13].
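The setting and the first challenge can be made concrete with a small sketch. All dataset sizes and class counts below are illustrative assumptions for exposition, not the paper's actual benchmark statistics:

```python
# Hypothetical training configuration for CD-FSL with few labeled target data.
source_train = {f"src_class_{c}": 600 for c in range(64)}    # abundant source labels
aux_target_train = {f"tgt_class_{c}": 5 for c in range(20)}  # only 5 labeled images per class
novel_target_test_classes = {f"tgt_class_{c}" for c in range(20, 40)}

# The basic FSL assumption still holds: the auxiliary target training classes
# and the novel target testing classes are strictly disjoint.
assert set(aux_target_train).isdisjoint(novel_target_test_classes)

# Challenge 1: labeled data is extremely unbalanced across the two domains,
# so naive joint training is biased towards the source.
ratio = sum(source_train.values()) / sum(aux_target_train.values())
print(f"source/target label ratio: {ratio:.0f}x")
```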
To address these challenges, this paper presents a novel
Multi-Expert Domain Decompositional Network (ME-D2N) for CD-FSL.
Our key solutions are also illustrated in Figure 1. Specifically, taking
unbalanced datasets as training data leads to the model bias
problem [2, 49]. That is, the learned model tends to perform well on
the classes with more examples but suffers a performance degradation
on the categories with fewer examples. To tackle this data imbalance
issue, we propose to build our model upon the multi-expert
learning paradigm. Concretely, rather than learning a model on
the merged data of the source and auxiliary target datasets directly, we
train two teacher models on the source and the auxiliary target
dataset, respectively. Models trained in this way can be considered
experts in their specialized domain, unaffected by the training
data of the other domain. Then, we transfer the knowledge from
these two teachers to our student model. This is done via the
knowledge distillation technique, which constrains the student
model to produce predictions consistent with the teachers'. By
distilling the individual knowledge from both the source and target
teacher models, our student model picks up the ability to recognize
both the source and auxiliary target images while avoiding learning
from the unbalanced datasets. We take one step further, considering
that forcing a unified model to learn from teachers of different
domains may be nontrivial. Concretely, since each filter in the
network needs to be responsible for extracting the features of all
domains simultaneously, this vanilla learning method may limit the
performance of the network. A natural question is whether it is
possible to decompose the student model into two parts: one for
learning from the source teacher and the other for the auxiliary
target teacher. Based on the above insights, a novel domain
decomposition module, also termed D2N, is proposed. Specifically,
our D2N aims at building a one-to-one correspondence between the
network filters and the domains. That is, each filter is assigned to be
activated by only one specific domain. Technically, we achieve this
by proposing a novel domain-specific gate that dynamically learns
the activation state of filters for a specific domain. We insert the
D2N into the feature extractor of the student model and make it
learnable together with the model parameters.
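The two ideas above can be sketched in a few lines of PyTorch: a student supervised by a frozen per-domain teacher via knowledge distillation, and a domain-specific gate that masks the student's filters per domain. All module and variable names here are illustrative assumptions; the paper's actual architecture, gating mechanism, and losses may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainGate(nn.Module):
    """Learns a per-filter activation state for one specific domain."""
    def __init__(self, num_filters):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_filters))

    def forward(self, feat):                   # feat: (B, C, H, W)
        gate = torch.sigmoid(self.logits)      # soft filter mask in [0, 1]
        return feat * gate.view(1, -1, 1, 1)

class StudentBlock(nn.Module):
    """One conv block of the student, with one gate per domain (D2N-style)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.gates = nn.ModuleDict({
            "source": DomainGate(out_ch),
            "target": DomainGate(out_ch),
        })

    def forward(self, x, domain):
        return F.relu(self.gates[domain](self.conv(x)))

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Standard KD: the student matches the teacher's softened predictions."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T

# Usage sketch: a source batch is routed through the source gate, and the
# student is supervised by the teacher that is the expert for that domain.
block = StudentBlock(3, 16)
head = nn.Linear(16, 10)
x_src = torch.randn(2, 3, 32, 32)
feat = block(x_src, "source").mean(dim=(2, 3))  # global average pooling
student_logits = head(feat)
teacher_logits = torch.randn(2, 10)             # stands in for the frozen source teacher
loss = distill_loss(student_logits, teacher_logits)
```

A target batch would analogously use the `"target"` gate and the target teacher, so each filter's activation state is learned separately per domain while the convolution weights remain shared.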
We conduct extensive experiments on four different target datasets.
The results clearly indicate that our multi-expert learning strategy
helps address the data imbalance problem. Besides, our D2N further
improves the performance of the student model, showing the
advantages of decomposing the student model into two domains.
Contributions.
We summarize our contributions as follows: 1) For
the first time, we introduce the multi-expert learning paradigm
into the task of CD-FSL with few labeled target data to prevent the
model from learning on unbalanced datasets directly. By learning
from two teachers, we avoid biasing our model towards the
source dataset with significantly more samples. 2) A novel domain
decomposition module (D2N) is proposed to learn to decompose the
model's filters into source and target domain-specific parts. The
concept of domain decomposition has rarely been explored in previous
work, especially for the task of CD-FSL. 3) Extensive experiments
show the effectiveness of our modules, and our proposed
full model ME-D2N establishes a new state of the art.
2 RELATED WORK
Cross-Domain Few-Shot Learning.
A recent study [9] finds that
most existing FSL methods [12, 15, 17, 28, 37, 39, 41–43, 46, 52–54],
which assume that the source and target datasets belong
to the same distribution, fail to generalize to novel datasets with a
domain gap. Thus, CD-FSL, which aims at addressing FSL across
different domains, has attracted increasing attention [4, 13, 14, 18, 24,