A Task-aware Dual Similarity Network for
Fine-grained Few-shot Learning
Yan Qi1, Han Sun1( ), Ningzhong Liu1, and Huiyu Zhou2
1Nanjing University of Aeronautics and Astronautics, Jiangsu Nanjing, China
2School of Computing and Mathematical Sciences, University of Leicester, U.K
sunhan@nuaa.edu.cn
Abstract. The goal of fine-grained few-shot learning is to recognize
sub-categories under the same super-category from only a few labeled
samples. Most recent approaches adopt a single similarity mea-
sure, that is, global or local measure alone. However, for fine-grained
images with high intra-class variance and low inter-class variance, ex-
ploring global invariant features and discriminative local details is quite
essential. In this paper, we propose a Task-aware Dual Similarity Network
(TDSNet), which applies global features and local patches to achieve
better performance. Specifically, a local feature enhancement module is
adopted to activate the features with strong discriminability. Besides,
task-aware attention exploits the important patches among the entire
task. Finally, both the class prototypes obtained by global features and
discriminative local patches are employed for prediction. Extensive ex-
periments on three fine-grained datasets demonstrate that the proposed
TDSNet achieves competitive performance compared with other
state-of-the-art algorithms.
Keywords: Fine-grained image classification · Few-shot learning · Feature
enhancement
1 Introduction
As one of the most important problems in the field of artificial intelligence,
fine-grained image classification [6, 8] aims to identify objects of sub-categories
under the same super-category. Different from the traditional image classification
task [21,22], the images of sub-categories are similar to each other, which makes
fine-grained recognition still a popular and challenging topic in computer vision.
Benefiting from the development of Convolution Neural Networks (CNNs),
fine-grained image classification has made significant progress. Most approaches
typically rely on supervision from a large number of labeled samples. In con-
trast, humans can identify new classes with only a few labeled examples. Recently,
some studies [25, 31] focus on a more challenging setting, which aims to recog-
nize fine-grained images from only a few samples, and is called fine-grained few-shot
learning (FG-FSL). Learning from fine-grained images with few samples brings
two challenges. On the one hand, images in the same category are quite differ-
ent due to poses, illumination conditions, backgrounds, etc. So how to capture
arXiv:2210.12348v1 [cs.CV] 22 Oct 2022
invariant features in limited samples is a particularly critical problem. On the
other hand, it is complicated to distinguish subtle visual appearance clues on
account of the small differences between categories. Therefore, we consider that
the invariant global structure and the discriminative local details of objects are
both crucial for fine-grained few-shot classification.
To effectively learn latent patterns from few labeled images, many approaches [7,
33] have been proposed in recent years. These methods can be roughly divided
into two branches: the meta-learning methods and the metric learning ones.
Metric learning has attracted more and more attention due to its simplicity and
effectiveness, and our work will focus on such methods. Traditional approaches
such as matching network [29] and relation network [27] usually utilize global
features for recognition. However, the distribution of these image-level global fea-
tures cannot be accurately estimated because of the sparseness of the samples.
In addition, discriminative clues may not be detected only by relying on global
features. CovaMNet [15] and DN4 [16] introduce deep local descriptors to
describe the feature distribution of each class. However, although these methods
learn abundant features, they treat each support class independently and cannot
exploit the contextual information of the whole task to generate task-specific
features. In other words, the importance of different parts changes with the task.
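The deep local descriptor view used by methods such as CovaMNet and DN4 can be sketched as follows: the convolutional feature map of an image, of shape (C, H, W), is treated as a set of H×W local descriptors of dimension C. A minimal illustration (shapes and names are ours, not from any of the cited implementations):

```python
# Sketch: view a (C, H, W) conv feature map (nested lists here for
# simplicity) as H*W local descriptors, each of dimension C.

def to_local_descriptors(feature_map):
    """Reshape a C x H x W feature map into a list of H*W C-dim descriptors."""
    C = len(feature_map)
    H = len(feature_map[0])
    W = len(feature_map[0][0])
    # One descriptor per spatial position (i, j), gathering all channels.
    return [[feature_map[c][i][j] for c in range(C)]
            for i in range(H) for j in range(W)]
```

Each descriptor characterizes one spatial position of the image, which is what allows these methods to reason about local parts rather than a single pooled global vector.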
In this paper, we propose a Task-aware Dual Similarity Network (TDSNet)
for fine-grained few-shot learning, which makes full use of both global invari-
ant features and discriminative local details of images. More specifically, first,
a local feature enhancement module is employed to activate discriminative se-
mantic parts by matching the predicted distribution between objects and parts.
Second, in the dual similarity module, the proposed TDSNet calculates the class
prototypes as global invariant features. In particular, in the local similarity branch,
task-aware attention is adopted to select important image patches for the cur-
rent task. By considering the context of the entire support set as a whole, the
key patches in the task are selected and weighted without paying too much at-
tention to the unimportant parts. Finally, both global and local similarities are
employed for the final classification. We conduct comprehensive experiments on
three popular fine-grained datasets to demonstrate the effectiveness of our pro-
posed method. Notably, our method also performs well when only one labeled
image per class is available.
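The dual-similarity idea above can be sketched in a few lines: a global score from cosine similarity between the query embedding and the class prototype (mean of support embeddings), fused with a local score from matching query patches against the class's patches. The fusion weight `alpha` and the patch-matching rule below are illustrative assumptions, not the paper's exact formulation:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 if either is all-zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def class_prototype(support_embeddings):
    # Global branch: the prototype is the mean of the K support embeddings.
    K, D = len(support_embeddings), len(support_embeddings[0])
    return [sum(e[d] for e in support_embeddings) / K for d in range(D)]

def dual_score(query_global, query_patches, prototype, class_patches, alpha=0.5):
    g = cosine(query_global, prototype)
    # Local branch (assumed form): each query patch is matched to its most
    # similar patch from the class, and the matches are averaged.
    l = sum(max(cosine(p, cp) for cp in class_patches)
            for p in query_patches) / len(query_patches)
    return alpha * g + (1 - alpha) * l
```

The query is then assigned to the class with the highest fused score; how the two branches are weighted in TDSNet is determined by the model, and `alpha` here is only a stand-in.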
2 Related Work
Few-shot learning. Few-shot learning aims at recognizing unseen classes with
only a few samples. The recently popular literature on few-shot learning can be
roughly divided into the following two categories: meta-learning based methods
and metric-learning based methods.
Meta-learning based methods attempt to learn a good optimizer to update
model parameters. MAML [9] is dedicated to learning a good parameter ini-
tialization so that the model can adapt to the new task after training on few
samples. Ravi et al. [24] propose a meta-learner optimizer based on LSTM to
optimize a classifier while also studying an initialization for the learner that
contains task-aware knowledge.
Metric-learning based methods aim to measure the similarity by learning an
appropriate metric that quantifies the relationship between the query images
and support sets. Koch et al. [13] adopt a siamese convolutional neural network
to learn generic image representations, which is performed as a binary classifi-
cation network. Lifchitz et al. [18] directly predict the classification for each local
representation and calculate the loss. DN4 [16] employs k-nearest neighbors to
construct an image-to-class search space that utilizes deep local representations.
DN4 is the method most relevant to our work; unlike it, we argue that considering
each support class independently may capture features shared among classes that
are unimportant for classification. In this paper, task-aware local representations
will be detected to explore richer information.
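DN4's image-to-class measure mentioned above can be sketched briefly: each local descriptor of the query image is compared against all descriptors pooled from one support class, and the cosine similarities of its k nearest neighbours are summed. Function names and the default k below are illustrative:

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors; 0.0 if either is all-zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def image_to_class_score(query_descriptors, class_descriptors, k=3):
    """DN4-style measure: sum, over query descriptors, of the cosine
    similarities to each descriptor's k nearest class descriptors."""
    score = 0.0
    for q in query_descriptors:
        sims = sorted((cosine(q, d) for d in class_descriptors), reverse=True)
        score += sum(sims[:k])  # k nearest neighbours by cosine similarity
    return score
```

The query is classified into the support class with the highest score; note that the class descriptor pool here is built per class in isolation, which is exactly the task-agnostic behavior our method aims to avoid.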
Fine-grained image classification. Because some early approaches [1, 3]
require a lot of bounding boxes or part annotations as supervision that needs
a high cost of expert knowledge, more and more researchers are turning their
attention to weakly supervised methods [20, 23] that rely only on image-level
annotations. Inspired by different convolutional feature channels corresponding
to different types of visual modes, MC-Loss [5] proposes a mutual-channel loss
that consists of a discriminality component and a diversity component to get the
channels with locally discriminative regions for a specific class. TDSA-Loss [4]
obtains multi-regional and multi-granularity features by constraining mid-level
features with the attention generated by high-level features. Different from these
methods, we consider that the discriminability of local features obtained only
by the attention maps may not be guaranteed. In order to overcome this lim-
itation, the proposed TDSNet activates the local representations with strong
discriminability by matching the distribution between the global features and
their sub-features, so that the discriminability of global features at fine-grained
scales is improved.
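One way to realize the object-part distribution matching described above is to push the class distribution predicted from each local part toward the one predicted from the whole object, so that parts carrying discriminative evidence are activated. The KL-divergence form and all names below are assumptions for illustration, not the paper's exact loss:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) with a small epsilon to avoid log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def part_matching_loss(object_logits, part_logits_list):
    """Assumed form: average KL divergence between the object-level predicted
    distribution and each part-level one; agreeing parts contribute little."""
    p_obj = softmax(object_logits)
    return sum(kl_divergence(p_obj, softmax(pl))
               for pl in part_logits_list) / len(part_logits_list)
```

Under this sketch, a part whose prediction already matches the object's incurs near-zero loss, while a part that points at a different class is penalized and its features are pushed to become more discriminative for the correct class.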
3 Method
3.1 Problem Definition
In this paper, the proposed TDSNet also follows the common setup of other few-
shot learning methods. Specifically, few-shot classification is usually formalized
as an N-way K-shot classification problem. Let S denote a support set that contains
N distinct image classes, each of which contains K labeled samples. Given
a query set Q, the purpose of few-shot learning is to classify each unlabeled
sample in Q according to the support set S. However, the limited samples in S
make it difficult to train a network efficiently. Therefore, an auxiliary set A is
introduced to learn transferable knowledge and improve classification performance.
Note that S and A have their own distinct label spaces without intersection.
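The N-way K-shot setup above can be sketched as episode sampling: N classes are drawn, then K support samples and Q query samples per class. The dataset layout and default values below are illustrative assumptions:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_query=15, rng=None):
    """Sample one N-way K-shot episode.

    dataset: dict mapping class name -> list of samples (assumed layout).
    Returns (support, query) lists of (sample, episode_label) pairs.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(dataset), n_way)  # N distinct classes
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw K + Q samples of this class without replacement.
        picks = rng.sample(dataset[cls], k_shot + q_query)
        support += [(x, label) for x in picks[:k_shot]]
        query += [(x, label) for x in picks[k_shot:]]
    return support, query
```

Each training iteration consumes one such episode, so the model is optimized under the same N-way K-shot conditions it faces at test time.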
In order to learn transferable knowledge better, the episode training mecha-
nism [29] is adopted in the training phase. Specifically, at each iteration, support