Few-Shot Learning of Compact Models via Task-Specific Meta Distillation
Yong Wu1, Shekhor Chanda2, Mehrdad Hosseinzadeh3, Zhi Liu1, Yang Wang4*
1Shanghai University, 2University of Manitoba, 3Huawei Technologies Canada, 4Concordia University
yongwu@shu.edu.cn, chandas@myumanitoba.ca, mehrdad.hosseinzadeh@live.com
liuzhi@staff.shu.edu.cn, yang.wang@concordia.ca
Abstract
We consider a new problem of few-shot learning of compact models. Meta-learning is a popular approach for few-shot learning. Previous work in meta-learning typically assumes that the model architecture used during meta-training is the same as the model architecture used for final deployment. In this paper, we challenge this basic assumption. For final deployment, we often need the model to be small. But small models usually do not have enough capacity to effectively adapt to new tasks. Meanwhile, we often have access to a large dataset and extensive computing power during meta-training, since meta-training is typically performed on a server. In this paper, we propose task-specific meta distillation, which simultaneously learns two models in meta-learning: a large teacher model and a small student model. These two models are jointly learned during meta-training. Given a new task during meta-testing, the teacher model is first adapted to this task; the adapted teacher model is then used to guide the adaptation of the student model. The adapted student model is used for final deployment. We demonstrate the effectiveness of our approach on few-shot image classification using model-agnostic meta-learning (MAML). Our proposed method outperforms other alternatives on several benchmark datasets.
1. Introduction
Meta-learning techniques, such as model-agnostic meta-learning (MAML) [9], have been successfully applied to many computer vision and machine learning problems, including few-shot learning [9], domain adaptation [24], and domain generalization [25]. Meta-learning consists of a meta-training stage and a meta-testing stage. During meta-training, a global model is learned from a set of tasks. For example, in the case of few-shot learning (FSL), each task is a few-shot classification problem.

*Corresponding authors: Zhi Liu and Yang Wang.
Figure 1: Motivation of our problem. Consider the image classification problem with a server (e.g., a cloud vendor) and many clients. The cloud vendor has extensive computing power and training images of many classes. Each client may want to solve a client-specific image classification problem with some new classes not covered by the cloud vendor. For example, one client wants to classify different medical images, while another client wants to classify various merchandise in a particular store. Each client only has few-shot data. The cloud vendor can train some model on the server side. Due to privacy concerns, clients do not want to upload their own data to the cloud vendor. Instead, each client adapts the model from the cloud vendor to his/her specific image classification task on the client side. We would like the final model for deployment on the client side to be small.
During meta-testing, the learned global model can be adapted to a new few-shot classification problem with only a few labeled examples. Meta-training is typically done on a central server, where it is reasonable to assume access to extensive computational resources. In contrast, meta-testing is presumably done by an end-user or client who may not have the same computational resources as the central server, especially if the client's application needs to run on low-powered edge devices. Existing meta-learning literature has largely overlooked this gap in computational resources.
Figure 2: Key idea of our approach. On the server side, we jointly learn a large teacher model (T) and a small student model (S) in a meta-training framework. On the client side, the client first performs an adaptation stage. During this stage, the teacher model is first adapted to the task, then the adapted teacher model is used to guide the adaptation of the student model via distillation. The adapted student model is then used for final deployment. Different stages of this pipeline (meta-training, adaptation, deployment) involve different levels of computational resources.
Existing meta-learning (or, more broadly, few-shot learning) approaches typically assume that the architecture of the model during meta-training is the same as the one used by the client for final deployment. In this paper, we challenge this basic assumption of existing meta-learning solutions. We propose a new problem setting that takes into account the different levels of computational resources available during meta-training and meta-testing.
Our problem setting is motivated by a practical scenario (shown in Fig. 1) consisting of a server and many clients. For example, the server can be a cloud vendor that provides pretrained image classification models (possibly via a web API). On the server side, the cloud vendor may have a large image dataset with many object classes. The cloud vendor typically has access to significant computational resources to train very large models. We also have clients who are interested in solving application-specific image classification problems. Each client may only be interested in recognizing a handful of object classes that are potentially not covered by the training dataset on the server side. For example, one client might be a medical doctor interested in recognizing different tumors in medical images, while another client might be a retail owner interested in classifying different merchandise in a store. Because of the cost of acquiring labeled images, each client may only have a small number of labeled examples for the target application. Due to privacy concerns, clients may not want to send their data to the cloud vendor. In this case, a natural solution is for a client to reuse a pretrained model provided by the cloud vendor and perform few-shot learning to adapt the pretrained model to the new object classes of the target application.
At first glance, the scenario in Fig. 1 looks like a classic meta-learning problem. The cloud vendor can perform meta-training on the server to obtain a global model. On the client side, the client performs two steps. The first step (called adaptation) adapts the global model from the server to the target application; for example, the adaptation step in MAML performs a few gradient updates on the few-shot data (its standard single-step form is recalled below). After the adaptation, the second step (called deployment) deploys the adapted model for the end application. The combination of these two steps (adaptation and deployment) is commonly known as “meta-testing” in meta-learning. In this paper, we make a distinction between adaptation and deployment, since this distinction is important for motivating our problem setting. Our key observation is that the available computing resources are vastly different across these stages. The meta-training stage is done on a server or in the cloud with significant computing resources. The adaptation stage is often done on a client's local machine with moderate computing power (e.g., a desktop or laptop). For deployment, we may only have access to very limited computing power if the model is deployed on an edge device. If we want to use classic meta-learning in this case, we have to choose a small model architecture to make sure that the final model can be deployed on the edge device. Unfortunately, previous work [27, 34] has shown that a small model may not have enough capacity to consume the information from the large amount of data available during meta-training, so the learned model may not effectively adapt to a new task.
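For concreteness, the adaptation referred to above is the standard MAML inner loop [9]. Given a globally meta-trained model $f_\theta$ and the few labeled examples $\mathcal{D}_i$ of a new task $\mathcal{T}_i$, a single adaptation step computes

$$\theta_i' = \theta - \alpha \nabla_{\theta}\, \mathcal{L}_{\mathcal{T}_i}\!\big(f_\theta; \mathcal{D}_i\big),$$

where $\alpha$ is the inner-loop learning rate and $\mathcal{L}_{\mathcal{T}_i}$ is the task loss (e.g., cross-entropy); multiple adaptation steps simply repeat this update. The notation here is illustrative and follows the original MAML paper rather than the formulation developed later in this work.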
In this paper, we propose a new approach called task-specific meta distillation to solve this problem. The key idea of our approach is illustrated in Fig. 2. During meta-training, we simultaneously learn a large teacher model and a small student model. Since the teacher model has a larger capacity, it can better adapt to a new task. During meta-training, these two models are jointly learned in such a way that the teacher model can effectively guide the adaptation of the student model. During the adaptation step of meta-testing, we first adapt the teacher model to the target task, and then use the adapted teacher to guide the adaptation of the student model via knowledge distillation (a sketch of this client-side adaptation is given below). Finally, the adapted student model is used for the final deployment. In this paper, we apply our proposed approach to improve few-shot image classification with MAML, but our technique is generally applicable to other meta-learning tasks beyond few-shot learning.
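To make the client-side procedure concrete, one minimal instantiation of the adaptation step could combine the MAML-style task loss with a distillation term that pulls the student toward the adapted teacher. The form below is an illustrative sketch under our own assumptions; the choice of distillation loss, the weight $\lambda$, and the symbols $\phi$ (student parameters) and $\theta_T'$ (adapted teacher parameters) are not taken from this paper's later formulation:

$$\phi' = \phi - \beta \nabla_{\phi}\Big[ \mathcal{L}_{\mathcal{T}}\!\big(f_\phi; \mathcal{D}\big) + \lambda \, \mathrm{KL}\!\big(f_{\theta_T'}(x) \,\|\, f_\phi(x)\big) \Big],$$

where $\beta$ is the student's adaptation learning rate and the KL term encourages the student's predictions on the few-shot data to match those of the teacher after it has been adapted to the task. Any standard knowledge distillation loss could play the same role.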
The contributions of this work are manifold. First, previous work in meta-learning has largely overlooked the issue of the computational resource gap at different stages of meta-learning. This issue poses challenges for the real-world adoption of meta-learning applications. In this pa-