A Unified Framework with Meta-dropout for
Few-shot Learning
Shaobo Lin, Xingyu Zeng, and Rui Zhao
Sensetime Research
{linshaobo,zengxingyu,zhaorui}@sensetime.com
Abstract. Conventional training of deep neural networks usually requires a substantial amount of data with expensive human annotations. In this paper, we utilize the idea of meta-learning to explain two very different streams of few-shot learning, i.e., the episodic meta-learning-based and pre-train finetune-based few-shot learning, and form a unified meta-learning framework. In order to improve the generalization power of our framework, we propose a simple yet effective strategy named meta-dropout, which is applied to the transferable knowledge generalized from base categories to novel categories. The proposed strategy can effectively prevent neural units from co-adapting excessively in the meta-training stage. Extensive experiments on few-shot object detection and few-shot image classification datasets, i.e., Pascal VOC, MS COCO, CUB, and mini-ImageNet, validate the effectiveness of our method.
1 Introduction
Deep Neural Networks (DNNs) have achieved great progress in many computer vision tasks [1,2,3,4]. However, the impressive performance of these models relies heavily on large amounts of data as well as expensive human annotation. When annotated data are scarce, DNNs cannot generalize well to testing data, especially when the testing data belong to classes different from those seen during training. In contrast, humans can learn to recognize or detect a novel object quickly from only a few labeled examples. Because some object categories naturally have few samples, or their annotations are extremely hard to obtain, the generalization ability of conventional neural networks is far from satisfactory.
Few-shot learning, therefore, has become an important research topic that pursues better generalization by learning from only a few examples. Mainstream few-shot learning approaches consist of episodic approaches [5,6,7,8] and pretrain-finetune based approaches [9,10,11]. Episodic meta-learning encapsulates the training samples into episodes [12] to mimic the procedure of few-shot testing. Pre-train finetune-based methods are composed of a pre-training stage and a fine-tuning stage: the former is responsible for obtaining a good initialization point from base classes, and the latter adapts the pre-trained model to a specific task. In order to transfer knowledge from base data to novel data, both kinds of methods are trained in two stages, where the data-sufficient base classes and the data-scarce novel classes are used separately.
Fig. 1. The generalization power of transferable knowledge across different source tasks is the key to few-shot learning, in which the transferable knowledge is adapted to the target task.
However, there is no framework that unifies these two very different streams, which hinders the study of both the common and the stream-specific problems of few-shot learning.
In this paper, we incorporate episodic meta-learning-based (denoted as episode-based for simplicity) and pre-train finetune-based few-shot methods into one unified optimization framework based on the idea of meta-learning. The framework consists of a novel, reformulated meta-training stage and a meta-testing stage. In the meta-training stage, our framework captures the common elements of few-shot learning, including meta-knowledge, task-knowledge, meta-loss, task-loss, and the distribution of the data and tasks. In the meta-testing stage, the final model for novel tasks is obtained from the learned model.
As shown in Fig. 1, a deep model can be divided into two components: task-specific knowledge and transferable knowledge. The former is the last fully-connected classifier for specific categories; the latter is the well-learned feature representation that needs to be generalized to novel tasks. Across different approaches, meta-knowledge can take different forms, such as the frozen features in TFA [11] or the initialization point of the backbone in FSCE [13]. Therefore, improving the generalization power of transferable knowledge is crucial if we want to carry it from source tasks to novel tasks in few-shot learning. To achieve this goal, we propose a simple yet effective strategy named meta-dropout. Because our unified framework can integrate two very different streams of few-shot learning and identify which part of a model constitutes the transferable knowledge, the proposed meta-dropout can be easily applied to existing few-shot models. We select several methods from the above two streams as baselines to validate the correctness of our framework and the effectiveness of our meta-dropout. With meta-dropout, our models show clear gains over strong existing methods on few-shot object detection and few-shot image classification tasks. Dropout is not a new idea; however, we use it to solve a new problem (few-shot learning) and provide more insights about how to use it, which are the major novelties. Our overall contributions can be summarized as three-fold:
- We utilize the idea of meta-learning to explain two different streams of few-shot learning, i.e., the episodic meta-learning-based and pre-train finetune-based few-shot learning, and form a unified meta-learning framework.
- We propose a simple yet effective strategy, named meta-dropout, to improve the generalization power of meta-knowledge in our framework (a minimal sketch follows this list).
- Experiments with baselines from both streams validate the effectiveness of our approach on few-shot object detection and image classification datasets, i.e., Pascal VOC, MS COCO, CUB, and mini-ImageNet.
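As a concrete illustration of the meta-dropout idea, the sketch below shows one way the strategy could be wired into a model: dropout is applied to the shared (transferable) representation during meta-training, while the task-specific head is left untouched. This is a minimal sketch assuming a PyTorch-style backbone/head split; the module names, the dropout placement, and the rate are our illustrative assumptions, not the paper's released code.

```python
import torch.nn as nn

class MetaDropoutModel(nn.Module):
    """Illustrative sketch: dropout on the transferable knowledge
    (the shared feature representation), not on the task head."""

    def __init__(self, backbone: nn.Module, head: nn.Module, p: float = 0.5):
        super().__init__()
        self.backbone = backbone        # transferable knowledge w
        self.dropout = nn.Dropout(p=p)  # meta-dropout on shared features
        self.head = head                # task-specific knowledge theta

    def forward(self, x):
        features = self.backbone(x)
        # Randomly zeroing units of the shared representation during
        # meta-training discourages neural units from co-adapting.
        # nn.Dropout is disabled automatically in eval() mode, so the
        # meta-testing stage sees the full representation.
        features = self.dropout(features)
        return self.head(features)
```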
2 Related Work
The episode-based and pre-train finetune-based methods are the two existing mainstream approaches in few-shot learning. The differences between them include the training pipeline (normal training versus episodic training) and the distribution of datasets during training (one overall task versus multiple serial tasks).
2.1 Episode-based Few-shot Learning
Few-shot learning is an important yet unsolved task in computer vision [14,15,12,16]. Nowadays, the meta-learning strategy, often called "learning-to-learn", has become an increasingly popular solution. The goal of meta-learning is to obtain task-level meta-knowledge that helps the model quickly generalize across tasks [17,18,19,20]. Recent methods for few-shot learning usually extract meta-knowledge from a set of auxiliary tasks via the episode-based strategy [12], where each episode contains C classes and K samples of each class, i.e., the C-way K-shot setting.
In few-shot image classification, [12] proposed Matching Networks to find the most similar class for a target image among a small set of labeled samples. Prototypical Networks (PN) [21] extended Matching Networks by producing a linear classifier instead of a weighted nearest-neighbor rule for each class. Cosine similarity-based classifiers further enhanced the discriminative power of the trained model [10,22]. Relation Network (RN) [23] used a neural network to learn a distance metric, with which unlabeled images could be classified according to the relation scores between the target sample and a few labeled images. Graph Neural Networks (GNN) [24,25] were also utilized to model relationships between different categories.
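To make the prototype idea concrete, the following minimal sketch shows the core computation behind such metric-based classifiers: class prototypes are averaged from support embeddings, and queries are scored by their distance to each prototype. This is our illustration in the spirit of PN [21]; the function signature and tensor shapes are assumptions, not code from the cited work.

```python
import torch

def prototypical_logits(support, support_labels, queries, num_classes):
    """Nearest-prototype classification in the spirit of Prototypical
    Networks [21]. `support`: (N_s, D) embedded support features with
    integer labels in [0, num_classes); `queries`: (N_q, D)."""
    # One prototype per class: the mean of its support embeddings.
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                              # (C, D)
    # Negative squared Euclidean distance serves as the logit.
    dists = torch.cdist(queries, prototypes) ** 2   # (N_q, C)
    return -dists
```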
In few-shot object detection, [5] applied a feature re-weighting module to a single-stage object detector (YOLOv2) with support masks as inputs. [6] introduced a Predictor-head Remodeling Network (PRN) that shares its backbone with Faster/Mask R-CNN. To disentangle the category-agnostic and category-specific components of a CNN model, [7] proposed a weight-prediction meta-model that predicts the parameters of the category-specific components from few samples. [26] proposed a distance metric learning (DML) module that can be used as the classification head of a standard object detection model. [8] introduced a few-shot object detection method consisting of an attention-RPN, a multi-relation detector, and a contrastive training strategy.
2.2 Pre-train Finetune-based Few-shot Learning
Pre-train finetune-based approaches are basic yet long overlooked in few-shot learning due to the excellent performance of episode-based methods. However, some simple pre-train finetune-based methods turn out to be more favorable than many episode-based works [10,9,27]. [10] introduced a pre-train finetune baseline with a distance-based classifier, achieving performance competitive with state-of-the-art episode-based classification approaches. [27] explored a simple process, meta-learning over a whole-classification pre-trained model, which also achieves competitive performance against state-of-the-art methods on standard benchmarks. With a proposed regularization, standard detectors such as SSD [28] and FRCNN [1] were fine-tuned for few-shot problems [9]. Furthermore, [11] demonstrated that fine-tuning only the last layer of existing models is crucial to the few-shot object detection task. Such a simple approach outperforms episode-based approaches by 2 to 20 points and even doubles the accuracy of prior works on current benchmarks.
3 Method
3.1 The Solution of Few-shot Problems
Common supervised learning problems with abundant training data can be solved by minimizing Eq. (1), in which Θ denotes the trainable parameters of a neural network and x is input data sampled from p(D). Since few-shot tasks offer only a limited number of samples, optimizing Eq. (1) directly is likely to result in overfitting of Θ due to its high dimensionality.
\min_{\Theta} \; \mathbb{E}_{x \sim p(\mathcal{D})} \, L(\Theta; x) \quad (1)
Few-shot learning aims to solve learning problems with just a few training examples. To achieve this goal, the common solution is to reduce the learnable dimension of Θ, so that Eq. (1) can be re-written as Eq. (2), in which Θ = [θ, w]. Here, w represents the useful foundation for few-shot learning, such as a good initialization point or a well-learned feature representation, obtained from source tasks, while θ is updated on target tasks based on the learned w.
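As a minimal sketch of Eq. (2), the snippet below freezes w (assumed here to be a feature extractor learned on source tasks) and optimizes only θ (a small task head) on the few-shot target data; the module names, loss, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

def adapt_theta(w_backbone: nn.Module, theta_head: nn.Module,
                target_loader, epochs: int = 10, lr: float = 1e-2):
    """Optimize theta on the target task with w held fixed, i.e.
    min_theta E_{x ~ p(D)} L(theta; x | w) from Eq. (2)."""
    for p in w_backbone.parameters():   # w is learned on source tasks
        p.requires_grad_(False)         # and kept fixed here
    w_backbone.eval()
    opt = torch.optim.SGD(theta_head.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in target_loader:      # a few labeled target samples
            loss = criterion(theta_head(w_backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```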
Fig. 2. The optimization processes of pre-train finetune-based and episodic meta-learning-based methods.
\min_{\theta} \; \mathbb{E}_{x \sim p(\mathcal{D})} \, L(\theta; x \mid w) \quad (2)
Given a labeled source dataset D_source, there are C_source source classes with a large number of images in each class. The novel dataset D_target, with novel classes C_target, consists of only a few samples per class. C_source and C_target have no overlapping categories. The learning goal is to adapt the model from D_source to D_target; D_source and D_target are used to optimize w and θ, respectively. The C-way K-shot setting used for evaluating the performance of a few-shot model means that D_target has C novel categories and each novel category has K images.
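For concreteness, a C-way K-shot episode (with Q additional query images per class, as is common in episodic training) can be sampled as in the sketch below; the dataset interface and names are our assumptions for illustration.

```python
import random
from collections import defaultdict

def sample_episode(labels, C, K, Q):
    """Sample a C-way K-shot episode with Q query samples per class.
    `labels` maps each sample index to its class id (assumed interface).
    Returns support indices, query indices, and the sampled classes."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    episode_classes = random.sample(list(by_class), C)  # pick C classes
    support, query = [], []
    for c in episode_classes:
        chosen = random.sample(by_class[c], K + Q)
        support += chosen[:K]                           # K shots per class
        query += chosen[K:]                             # Q queries per class
    return support, query, episode_classes
```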
Existing few-shot learning methods fall into two different streams, i.e., the episodic meta-learning-based and pre-train finetune-based methods. However, both streams can be explained by the above formulation, as shown in Fig. 2. In pre-train finetune-based methods such as TFA [11], w is the backbone, which is transferred to the second stage for initialization, and θ is the last fully-connected layer, optimized for novel tasks on top of the frozen w. In Meta R-CNN [6], a representative episodic meta-learning-based method, w provides a backbone with good generalization ability, while θ consists of the trainable backbone and the last fully-connected regression and classification layers for novel tasks.
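In code, the two instantiations differ only in which parameters play the role of w (and whether they stay frozen) versus θ. The sketch below expresses this mapping under assumed attribute names (model.backbone, model.head); it is an illustration of the unified view, not code from TFA or Meta R-CNN.

```python
import torch.nn as nn

def configure_unified(model: nn.Module, method: str):
    """Return the theta parameter group to optimize on novel tasks.
    `model.backbone` and `model.head` are assumed attribute names."""
    if method == "TFA":
        # w: the pre-trained backbone, frozen during fine-tuning;
        # theta: only the last fully-connected layers.
        for p in model.backbone.parameters():
            p.requires_grad_(False)
        return list(model.head.parameters())
    if method == "MetaRCNN":
        # w: the backbone initialization learned from base episodes;
        # theta: the trainable backbone plus the final cls/reg layers.
        return list(model.backbone.parameters()) + list(model.head.parameters())
    raise ValueError(f"unknown method: {method}")
```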
3.2 The Unified Meta-learning Framework
In order to explain the learning process of existing few-shot methods, we propose a unified meta-learning framework that re-formulates Eq. (2) as Eq. (3), in which the learning goal is to obtain general transferable knowledge via optimizing the