A Unified Framework with Meta-dropout for
Few-shot Learning
Shaobo Lin, Xingyu Zeng, and Rui Zhao
Sensetime Research
{linshaobo,zengxingyu,zhaorui}@sensetime.com
Abstract. Conventional training of deep neural networks usually requires a substantial amount of data with expensive human annotations. In this paper, we utilize the idea of meta-learning to explain two very different streams of few-shot learning, i.e., the episodic meta-learning-based and pre-train finetune-based few-shot learning, and form a unified meta-learning framework. In order to improve the generalization power of our framework, we propose a simple yet effective strategy named meta-dropout, which is applied to the transferable knowledge generalized from base categories to novel categories. The proposed strategy can effectively prevent neural units from co-adapting excessively in the meta-training stage. Extensive experiments on few-shot object detection and few-shot image classification datasets, i.e., Pascal VOC, MS COCO, CUB, and mini-ImageNet, validate the effectiveness of our method.
1 Introduction
Deep Neural Networks (DNNs) have achieved great progress in many computer vision tasks [1,2,3,4]. However, the impressive performance of these models relies heavily on large amounts of data as well as expensive human annotation. When annotated data are scarce, DNNs cannot generalize well to testing data, especially when the testing data belong to classes different from those seen during training. In contrast, humans can learn to recognize or detect a novel object quickly from only a few labeled examples. Because some object categories naturally have few samples, or their annotations are extremely hard to obtain, the generalization ability of conventional neural networks is far from satisfactory.
Few-shot learning, therefore, has become an important research topic that pursues better generalization by learning from only a few examples. Mainstream few-shot learning approaches consist of episodic approaches [5,6,7,8] and pretrain-finetune based approaches [9,10,11]. Episodic meta-learning encapsulates the training samples into episodes [12] to mimic the procedure of few-shot testing. Pre-train finetune-based methods are composed of a pre-training stage and a fine-tuning stage: the former is responsible for obtaining a good initialization point from base classes, and the latter adapts the pre-trained model to a specific task. In order to transfer knowledge from base data to novel data, both kinds of methods are trained in two stages, where the data-sufficient base classes and the data-scarce novel classes are used separately.
Fig. 1. The generalization power of transferable knowledge across different source tasks is the key to few-shot learning, in which the transferable knowledge is adapted to the target task.
However, there is no framework that unifies these two very different streams, which hinders the study of both the common and the stream-specific problems of few-shot learning.
In this paper, we incorporate episodic meta-learning-based (denoted as episode-based for simplicity) and pre-train finetune-based few-shot methods into one unified optimization framework based on the idea of meta-learning. The framework consists of a novel, reformulated meta-training stage and a meta-testing stage. In the meta-training stage, our framework captures the common elements of few-shot learning, including meta-knowledge, task-knowledge, meta-loss, task-loss, and the distribution of the data and tasks. In the meta-testing stage, the final model for novel tasks is obtained from the learned model.
As shown in Fig. 1, a deep model can be divided into two components: task-specific knowledge and transferable knowledge. The former is the last fully-connected classifier for specific categories; the latter is the well-learned feature representation that needs to be generalized to novel tasks. Across different approaches, meta-knowledge can take different forms, such as the frozen features in TFA [11] or the initialization point of the backbone in FSCE [13]. Therefore, improving the generalization power of transferable knowledge is crucial if we want to carry it from source tasks to novel tasks in few-shot learning. To achieve this goal, we propose a simple yet effective strategy named meta-dropout. Because our unified framework can integrate two very different streams of few-shot learning and identify which part of a model constitutes the transferable knowledge, the proposed meta-dropout can be easily applied to existing few-shot models. We select several methods from the above two streams as baselines to validate the correctness of our framework and the effectiveness of our meta-dropout. With meta-dropout, our models show clear gains over strong existing methods on few-shot object detection and few-shot image classification tasks. Dropout is not a new idea; however, we use it to solve a new problem (few-shot learning) and provide more insights about how to use it, which are the major novelties. Our overall contributions can be summarized as three-fold:
- We utilize the idea of meta-learning to explain two different streams of few-shot learning, i.e., the episodic meta-learning-based and pre-train finetune-based few-shot learning, and form a unified meta-learning framework.
- We propose a simple yet effective strategy, named meta-dropout, to improve the generalization power of meta-knowledge in our framework (a minimal sketch follows this list).
- Experiments with baselines from both streams validate the effectiveness of our approach on few-shot object detection and image classification datasets, i.e., Pascal VOC, MS COCO, CUB, and mini-ImageNet.
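As a concrete illustration of the meta-dropout idea, the sketch below shows one way the strategy could be wired into a model: dropout is applied to the shared (transferable) representation during meta-training, while the task-specific head is left untouched. This is a minimal sketch assuming a PyTorch-style backbone/head split; the module names, the dropout placement, and the rate are our illustrative assumptions, not the paper's released code.

```python
import torch.nn as nn

class MetaDropoutModel(nn.Module):
    """Illustrative sketch: dropout on the transferable knowledge
    (the shared feature representation), not on the task head."""

    def __init__(self, backbone: nn.Module, head: nn.Module, p: float = 0.5):
        super().__init__()
        self.backbone = backbone        # transferable knowledge w
        self.dropout = nn.Dropout(p=p)  # meta-dropout on shared features
        self.head = head                # task-specific knowledge theta

    def forward(self, x):
        features = self.backbone(x)
        # Randomly zeroing units of the shared representation during
        # meta-training discourages neural units from co-adapting.
        # nn.Dropout is disabled automatically in eval() mode, so the
        # meta-testing stage sees the full representation.
        features = self.dropout(features)
        return self.head(features)
```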
2 Related Work
The episode-based and pre-train finetune-based methods are the two existing mainstream approaches in few-shot learning. The differences between them include the training pipeline (normal training versus episodic training) and the distribution of datasets during training (one overall task versus multiple serial tasks).
2.1 Episode-based Few-shot Learning
Few-shot learning is an important yet unsolved task in computer vision [14,15,12,16]. Nowadays, the meta-learning strategy, often called "learning-to-learn", has become an increasingly popular solution. The goal of meta-learning is to obtain task-level meta-knowledge that helps the model quickly generalize across tasks [17,18,19,20]. Recent methods for few-shot learning usually extract meta-knowledge from a set of auxiliary tasks via the episode-based strategy [12], where each episode contains C classes and K samples of each class, i.e., the C-way K-shot setting.
In few-shot image classification, [12] proposed Matching Networks to find the most similar class for a target image among a small set of labeled samples. Prototypical Networks (PN) [21] extended Matching Networks by producing a linear classifier instead of a weighted nearest-neighbor rule for each class. Cosine similarity-based classifiers further enhanced the discriminative power of the trained model [10,22]. Relation Network (RN) [23] used a neural network to learn a distance metric, with which unlabeled images could be classified according to the relation scores between the target sample and a few labeled images. Graph Neural Networks (GNN) [24,25] were also utilized to model relationships between different categories.
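To make the prototype idea concrete, the following minimal sketch shows the core computation behind such metric-based classifiers: class prototypes are averaged from support embeddings, and queries are scored by their distance to each prototype. This is our illustration in the spirit of PN [21]; the function signature and tensor shapes are assumptions, not code from the cited work.

```python
import torch

def prototypical_logits(support, support_labels, queries, num_classes):
    """Nearest-prototype classification in the spirit of Prototypical
    Networks [21]. `support`: (N_s, D) embedded support features with
    integer labels in [0, num_classes); `queries`: (N_q, D)."""
    # One prototype per class: the mean of its support embeddings.
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                              # (C, D)
    # Negative squared Euclidean distance serves as the logit.
    dists = torch.cdist(queries, prototypes) ** 2   # (N_q, C)
    return -dists
```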
In few-shot object detection, [5] applied a feature re-weighting module to a single-stage object detector (YOLOv2) with support masks as inputs. [6] introduced a Predictor-head Remodeling Network (PRN) that shares its backbone with Faster/Mask R-CNN. To disentangle the category-agnostic and category-specific components of a CNN model, [7] proposed a weight-prediction meta-model that predicts the parameters of the category-specific components from few samples. [26] proposed a distance metric learning (DML) module that can be used as the classification head of a standard object detection model. [8] introduced a few-shot object detection method consisting of an attention-RPN, a multi-relation detector, and a contrastive training strategy.
2.2 Pre-train Finetune-based Few-shot Learning
Pre-train finetune-based approaches are basic yet long overlooked in few-shot learning due to the excellent performance of episode-based methods. However, some simple pre-train finetune-based methods turn out to be more favorable than many episode-based works [10,9,27]. [10] introduced a pre-train finetune baseline with a distance-based classifier, achieving performance competitive with state-of-the-art episode-based classification approaches. [27] explored a simple process, meta-learning over a whole-classification pre-trained model, which also achieves competitive performance against state-of-the-art methods on standard benchmarks. With a proposed regularization, standard detectors such as SSD [28] and FRCNN [1] were fine-tuned for few-shot problems [9]. Furthermore, [11] demonstrated that fine-tuning only the last layer of existing models is crucial to the few-shot object detection task. Such a simple approach outperforms episode-based approaches by 2 to 20 points and even doubles the accuracy of prior works on current benchmarks.
3 Method
3.1 The Solution of Few-shot Problems
Common supervised learning problems with abundant training data can be solved by minimizing Eq. (1), in which Θ denotes the trainable parameters of a neural network and x is input data sampled from p(D). Since few-shot tasks offer only a limited number of samples, optimizing Eq. (1) directly is likely to result in overfitting of Θ due to its high dimensionality.
\min_{\Theta} \; \mathbb{E}_{x \sim p(\mathcal{D})} \, L(\Theta; x) \quad (1)
Few-shot learning aims to solve learning problems with just a few training examples. To achieve this goal, the common solution is to reduce the learnable dimension of Θ, so that Eq. (1) can be re-written as Eq. (2), in which Θ = [θ, w]. Here, w represents the useful foundation for few-shot learning, such as a good initialization point or a well-learned feature representation, obtained from source tasks, while θ is updated on target tasks based on the learned w.
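As a minimal sketch of Eq. (2), the snippet below freezes w (assumed here to be a feature extractor learned on source tasks) and optimizes only θ (a small task head) on the few-shot target data; the module names, loss, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

def adapt_theta(w_backbone: nn.Module, theta_head: nn.Module,
                target_loader, epochs: int = 10, lr: float = 1e-2):
    """Optimize theta on the target task with w held fixed, i.e.
    min_theta E_{x ~ p(D)} L(theta; x | w) from Eq. (2)."""
    for p in w_backbone.parameters():   # w is learned on source tasks
        p.requires_grad_(False)         # and kept fixed here
    w_backbone.eval()
    opt = torch.optim.SGD(theta_head.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in target_loader:      # a few labeled target samples
            loss = criterion(theta_head(w_backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```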
Fig. 2. The optimization processes of pre-train finetune-based and episodic meta-learning-based methods.
\min_{\theta} \; \mathbb{E}_{x \sim p(\mathcal{D})} \, L(\theta; x \mid w) \quad (2)
Given a labeled source dataset D_source, there are C_source source classes with a large number of images in each class. The novel dataset D_target, with novel classes C_target, consists of only a few samples per class. C_source and C_target have no overlapping categories. The learning goal is to adapt the model from D_source to D_target; D_source and D_target are used to optimize w and θ, respectively. The C-way K-shot setting used for evaluating the performance of a few-shot model means that D_target has C novel categories and each novel category has K images.
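For concreteness, a C-way K-shot episode (with Q additional query images per class, as is common in episodic training) can be sampled as in the sketch below; the dataset interface and names are our assumptions for illustration.

```python
import random
from collections import defaultdict

def sample_episode(labels, C, K, Q):
    """Sample a C-way K-shot episode with Q query samples per class.
    `labels` maps each sample index to its class id (assumed interface).
    Returns support indices, query indices, and the sampled classes."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    episode_classes = random.sample(list(by_class), C)  # pick C classes
    support, query = [], []
    for c in episode_classes:
        chosen = random.sample(by_class[c], K + Q)
        support += chosen[:K]                           # K shots per class
        query += chosen[K:]                             # Q queries per class
    return support, query, episode_classes
```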
Existing few-shot learning methods fall into two different streams, i.e., the episodic meta-learning-based and pre-train finetune-based methods. However, both streams can be explained by the above formulation, as shown in Fig. 2. In pre-train finetune-based methods such as TFA [11], w is the backbone, which is transferred to the second stage for initialization, and θ is the last fully-connected layer, optimized for novel tasks on top of the frozen w. In Meta R-CNN [6], a representative episodic meta-learning-based method, w provides a backbone with good generalization ability, while θ consists of the trainable backbone and the last fully-connected regression and classification layers for novel tasks.
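In code, the two instantiations differ only in which parameters play the role of w (and whether they stay frozen) versus θ. The sketch below expresses this mapping under assumed attribute names (model.backbone, model.head); it is an illustration of the unified view, not code from TFA or Meta R-CNN.

```python
import torch.nn as nn

def configure_unified(model: nn.Module, method: str):
    """Return the theta parameter group to optimize on novel tasks.
    `model.backbone` and `model.head` are assumed attribute names."""
    if method == "TFA":
        # w: the pre-trained backbone, frozen during fine-tuning;
        # theta: only the last fully-connected layers.
        for p in model.backbone.parameters():
            p.requires_grad_(False)
        return list(model.head.parameters())
    if method == "MetaRCNN":
        # w: the backbone initialization learned from base episodes;
        # theta: the trainable backbone plus the final cls/reg layers.
        return list(model.backbone.parameters()) + list(model.head.parameters())
    raise ValueError(f"unknown method: {method}")
```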
3.2 The Unified Meta-learning Framework
In order to explain the learning process of existing few-shot methods, we propose a unified meta-learning framework that re-formulates Eq. (2) as Eq. (3), in which the learning goal is to obtain general transferable knowledge via optimizing the