invariant features in limited samples is a particularly critical problem. On the
other hand, it is difficult to distinguish subtle visual appearance cues because the
differences between categories are small. Therefore, we consider that
the invariant global structure and the discriminative local details of objects are
both crucial for fine-grained few-shot classification.
To effectively learn latent patterns from few labeled images, many approaches [7,
33] have been proposed in recent years. These methods can be roughly divided
into two branches: meta-learning methods and metric-learning methods.
Metric learning has attracted increasing attention due to its simplicity and
effectiveness, and our work focuses on such methods. Traditional approaches
such as the matching network [29] and the relation network [27] usually utilize global
features for recognition. However, the distribution of these image-level global fea-
tures cannot be accurately estimated because of the scarcity of samples.
In addition, discriminative clues may not be detected by relying on global
features alone. CovaMNet [15] and DN4 [16] introduce deep local descriptors, which
are exploited to describe the feature distribution of each class. However,
although these methods learn rich features, they treat each support
class independently and cannot exploit the contextual information of the whole
task to generate task-specific features. In fact, the importance of different
parts changes from task to task.
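To make the contrast concrete, the sketch below illustrates the difference between matching pooled image-level features (matching-network style) and a DN4-style image-to-class measure over deep local descriptors. It is only an illustrative sketch: the function names, tensor shapes, and the choice of k are our own assumptions, not the implementations of [16] or [15].

```python
import torch
import torch.nn.functional as F

def global_similarity(query_feat, class_feats):
    """Cosine similarity between a pooled query feature and pooled class
    features (image-level global features)."""
    # query_feat: [C], class_feats: [N, C] (both already average-pooled)
    q = F.normalize(query_feat, dim=-1)
    c = F.normalize(class_feats, dim=-1)
    return c @ q                                # [N] similarity scores

def local_knn_similarity(query_desc, class_desc, k=3):
    """DN4-style image-to-class measure: for each local descriptor of the
    query image, sum its k nearest cosine similarities among all local
    descriptors belonging to one support class."""
    # query_desc: [M, C] local descriptors of the query image
    # class_desc: [P, C] local descriptors pooled over a support class
    q = F.normalize(query_desc, dim=-1)
    s = F.normalize(class_desc, dim=-1)
    sim = q @ s.t()                             # [M, P] pairwise similarities
    topk = sim.topk(k, dim=-1).values           # [M, k] nearest neighbours
    return topk.sum()                           # scalar image-to-class score
```

Note that neither measure looks beyond a single support class at a time, which is exactly the task-level context that the discussion above identifies as missing.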
In this paper, we propose a Task-aware Dual Similarity Network (TDSNet)
for fine-grained few-shot learning, which makes full use of both global invari-
ant features and discriminative local details of images. More specifically, first,
a local feature enhancement module is employed to activate discriminative se-
mantic parts by matching the predicted distributions between objects and parts.
Second, in the dual similarity module, the proposed TDSNet calculates the class
prototypes as global invariant features. In particular, in the local similarity branch,
task-aware attention is adopted to select image patches that are important for the cur-
rent task. By considering the entire support set as the context, the
key patches in the task are selected and weighted, while unimportant parts receive
little attention. Finally, both global and local similarities are
employed for the final classification. We conduct comprehensive experiments on
three popular fine-grained datasets to demonstrate the effectiveness of our pro-
posed method. In particular, our method also performs well when only one
training image per class is available.
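Since this paragraph summarizes the architecture only at a high level, the following is a rough, runnable sketch of the dual-similarity idea under our own assumptions (tensor shapes, cosine similarity, top-3 neighbours, a max-based task attention, and a simple sum of the two branches); it is not the TDSNet implementation.

```python
import torch
import torch.nn.functional as F

def dual_similarity(query_global, query_local, support_global, support_local):
    """Illustrative sketch of a dual-similarity score (not the paper's code).
    query_global:   [C]          pooled query feature
    query_local:    [M, C]       query local descriptors
    support_global: [N, K, C]    pooled support features, N classes, K shots
    support_local:  [N, K*P, C]  support local descriptors per class
    """
    # Global branch: class prototypes as the mean of the support features.
    prototypes = support_global.mean(dim=1)                                   # [N, C]
    g_sim = F.normalize(prototypes, dim=-1) @ F.normalize(query_global, dim=-1)  # [N]

    # Local branch: weight query patches by a task-aware attention computed
    # against the whole support set, so unimportant patches contribute less.
    task_ctx = support_local.reshape(-1, support_local.size(-1))              # all support descriptors
    attn = (F.normalize(query_local, dim=-1)
            @ F.normalize(task_ctx, dim=-1).t()).max(-1).values               # [M]
    attn = torch.softmax(attn, dim=0)                                         # [M] patch weights

    l_sim = []
    for n in range(support_local.size(0)):
        sim = F.normalize(query_local, dim=-1) @ F.normalize(support_local[n], dim=-1).t()
        l_sim.append((attn * sim.topk(3, dim=-1).values.sum(-1)).sum())
    l_sim = torch.stack(l_sim)                                                # [N]

    # Combining both similarities by summation is our assumption for this sketch.
    return g_sim + l_sim
```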
2 Related Work
Few-shot learning. Few-shot learning aims at recognizing unseen classes with
only a few samples. Recent work on few-shot learning can be
roughly divided into two categories: meta-learning based methods
and metric-learning based methods.
Meta-learning based methods attempt to learn a good optimizer to update
model parameters. MAML [9] is dedicated to learning a good parameter ini-
tialization so that the model can adapt to the new task after training on few