invariant features in limited samples is a particularly critical problem. On the
other hand, it is difficult to distinguish subtle visual appearance cues because the
differences between categories are small. Therefore, we consider that
the invariant global structure and the discriminative local details of objects are
both crucial for fine-grained few-shot classification.
To effectively learn latent patterns from few labeled images, many approaches [7,
33] have been proposed in recent years. These methods can be roughly divided
into two branches: meta-learning methods and metric-learning methods.
Metric learning has attracted increasing attention due to its simplicity and
effectiveness, and our work focuses on such methods. Traditional approaches
such as the matching network [29] and the relation network [27] usually utilize global
features for recognition. However, the distribution of these image-level global fea-
tures cannot be accurately estimated because of the scarcity of samples.
In addition, discriminative clues may not be detected by relying on global
features alone. CovaMNet [15] and DN4 [16] introduce deep local descriptors, which
are exploited to describe the feature distribution of each class. However,
although these methods learn rich features, they treat each support
class independently and cannot exploit the contextual information of the whole
task to generate task-specific features. In fact, the importance of different
parts changes from task to task.
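To make the contrast concrete, the sketch below illustrates the difference between matching pooled image-level features (matching-network style) and a DN4-style image-to-class measure over deep local descriptors. It is only an illustrative sketch: the function names, tensor shapes, and the choice of k are our own assumptions, not the implementations of [16] or [15].

```python
import torch
import torch.nn.functional as F

def global_similarity(query_feat, class_feats):
    """Cosine similarity between a pooled query feature and pooled class
    features (image-level global features)."""
    # query_feat: [C], class_feats: [N, C] (both already average-pooled)
    q = F.normalize(query_feat, dim=-1)
    c = F.normalize(class_feats, dim=-1)
    return c @ q                                # [N] similarity scores

def local_knn_similarity(query_desc, class_desc, k=3):
    """DN4-style image-to-class measure: for each local descriptor of the
    query image, sum its k nearest cosine similarities among all local
    descriptors belonging to one support class."""
    # query_desc: [M, C] local descriptors of the query image
    # class_desc: [P, C] local descriptors pooled over a support class
    q = F.normalize(query_desc, dim=-1)
    s = F.normalize(class_desc, dim=-1)
    sim = q @ s.t()                             # [M, P] pairwise similarities
    topk = sim.topk(k, dim=-1).values           # [M, k] nearest neighbours
    return topk.sum()                           # scalar image-to-class score
```

Note that neither measure looks beyond a single support class at a time, which is exactly the task-level context that the discussion above identifies as missing.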
In this paper, we propose a Task-aware Dual Similarity Network (TDSNet)
for fine-grained few-shot learning, which makes full use of both global invari-
ant features and discriminative local details of images. More specifically, first,
a local feature enhancement module is employed to activate discriminative se-
mantic parts by matching the predicted distributions between objects and parts.
Second, in the dual similarity module, the proposed TDSNet calculates the class
prototypes as global invariant features. In particular, in the local similarity branch,
task-aware attention is adopted to select image patches that are important for the cur-
rent task. By considering the entire support set as the context, the
key patches in the task are selected and weighted, while unimportant parts receive
little attention. Finally, both global and local similarities are
employed for the final classification. We conduct comprehensive experiments on
three popular fine-grained datasets to demonstrate the effectiveness of our pro-
posed method. In particular, our method also performs well when only one
training image per class is available.
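Since this paragraph summarizes the architecture only at a high level, the following is a rough, runnable sketch of the dual-similarity idea under our own assumptions (tensor shapes, cosine similarity, top-3 neighbours, a max-based task attention, and a simple sum of the two branches); it is not the TDSNet implementation.

```python
import torch
import torch.nn.functional as F

def dual_similarity(query_global, query_local, support_global, support_local):
    """Illustrative sketch of a dual-similarity score (not the paper's code).
    query_global:   [C]          pooled query feature
    query_local:    [M, C]       query local descriptors
    support_global: [N, K, C]    pooled support features, N classes, K shots
    support_local:  [N, K*P, C]  support local descriptors per class
    """
    # Global branch: class prototypes as the mean of the support features.
    prototypes = support_global.mean(dim=1)                                   # [N, C]
    g_sim = F.normalize(prototypes, dim=-1) @ F.normalize(query_global, dim=-1)  # [N]

    # Local branch: weight query patches by a task-aware attention computed
    # against the whole support set, so unimportant patches contribute less.
    task_ctx = support_local.reshape(-1, support_local.size(-1))              # all support descriptors
    attn = (F.normalize(query_local, dim=-1)
            @ F.normalize(task_ctx, dim=-1).t()).max(-1).values               # [M]
    attn = torch.softmax(attn, dim=0)                                         # [M] patch weights

    l_sim = []
    for n in range(support_local.size(0)):
        sim = F.normalize(query_local, dim=-1) @ F.normalize(support_local[n], dim=-1).t()
        l_sim.append((attn * sim.topk(3, dim=-1).values.sum(-1)).sum())
    l_sim = torch.stack(l_sim)                                                # [N]

    # Combining both similarities by summation is our assumption for this sketch.
    return g_sim + l_sim
```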
2 Related Work
Few-shot learning. Few-shot learning aims at recognizing unseen classes with
only a few samples. Recent work on few-shot learning can be
roughly divided into two categories: meta-learning based methods
and metric-learning based methods.
Meta-learning based methods attempt to learn a good optimizer to update
model parameters. MAML [9] is dedicated to learning a good parameter ini-
tialization so that the model can adapt to the new task after training on few