
Figure 2: Key idea of our approach. On the server side,
we jointly learn a large teacher model and a small student
model in a meta-training framework. On the client side,
the client first performs an adaptation stage: the teacher
model is adapted to the task, and the adapted teacher then
guides the adaptation of the
student model via distillation. The adapted student model
is then used for final deployment. Different stages (meta-
training, adaptation, deployment) of this pipeline involve
different levels of computational resources.
during meta-training is the same as the one used by the
client for the final deployment. In this paper, we challenge
this basic assumption of existing meta-learning solutions.
We propose a new problem setting that takes into account
the different levels of available computational resources
during meta-training and meta-testing.
Our problem setting is motivated by a practical scenario
(shown in Fig. 1) consisting of a server and many clients.
For example, the server can be a cloud vendor that pro-
vides pretrained image classification models (possibly via
web API). On the server side, the cloud vendor may have
a large image dataset with many object classes. The cloud
vendor typically has access to significant computational re-
sources to train very large models. We also have clients
who are interested in solving some application-specific im-
age classification problems. Each client may only be inter-
ested in recognizing a handful of object classes that are po-
tentially not covered by the training dataset from the server
side. For example, one client might be a medical doctor in-
terested in recognizing different tumors in medical images,
while another client might be a retail owner interested in
classifying different merchandise in a store. Because of the
cost of acquiring labeled images, each client may only have
a small number of labeled examples for the target applica-
tion. Due to privacy concerns, clients may not want to send
their data to the cloud vendor. In this case, a natural so-
lution is for a client to re-use a pretrained model provided
by the cloud vendor and perform few-shot learning to adapt
the pretrained model to the new object classes for the target
application.
At first glance, the scenario in Fig. 1 is a classic meta-
learning problem. The cloud vendor can perform meta-
training on the server to obtain a global model. On the client
side, the client performs two steps. The first step (called
adaptation) is to adapt the global model from the server
side to the target application. For example, the adaptation
step in MAML performs a few gradient updates on the few-
shot data. After the adaptation, the second step (called de-
ployment) is to deploy the adapted model for the end ap-
plication. The combination of these two steps (adaptation
and deployment) is commonly known as “meta-testing” in
meta-learning. In this paper, we make a distinction between
adaptation and deployment since this distinction is impor-
tant for motivating our problem setting. Our key observa-
tion is that the available computing resources are vastly dif-
ferent in these different stages. The meta-training stage is
done on a server or the cloud with significant computing re-
sources. The adaptation stage is often done on a client's
local machine with moderate computing power (e.g., a
desktop or laptop). For deployment, we may only have access to
very limited computing power if the model is deployed on
an edge device. If we want to use classic meta-learning in
this case, we have to choose a small model architecture to
make sure that the final model can be deployed on the edge
device. Unfortunately, previous work [27, 34] has shown
that a small model may not have enough capacity to fully
exploit the large amount of data available during meta-
training, so the learned model may not adapt effectively
to a new task.
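To make the adaptation step concrete: in MAML, adaptation is just a few gradient updates of the meta-learned initialization on the client's few-shot support set. The following is a minimal sketch with a toy linear regression model; the MSE objective, step counts, and all variable names are illustrative assumptions, not the exact setup used in the paper or in MAML's original formulation.

```python
import numpy as np

def adapt(theta, X, y, lr=0.1, steps=50):
    """MAML-style adaptation: a few gradient updates on few-shot data.

    theta : meta-learned initialization of a linear model y_hat = X @ theta
    X, y  : the client's small labeled support set
    """
    theta = theta.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)  # gradient of MSE loss
        theta -= lr * grad
    return theta

# Toy few-shot task: 10 labeled examples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = X @ np.array([1.0, -2.0, 0.5])
theta0 = np.zeros(3)              # stand-in for the meta-learned init
theta_task = adapt(theta0, X, y)  # task-specific parameters for deployment
```

The point of the sketch is only that adaptation is cheap relative to meta-training: a handful of gradient steps on a handful of examples, feasible on a client's local machine.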
In this paper, we propose a new approach called task-
specific meta distillation to solve this problem. The key
idea of our approach is illustrated in Fig. 2. During meta-
training, we simultaneously learn a large teacher model and
a small student model. Since the teacher model has a larger
capacity, it can better adapt to a new task. During meta-
training, these two models are jointly learned in a way that
the teacher model can effectively guide the adaptation of the
student model. During the adaptation step of meta-testing,
we first adapt the teacher model to the target task, then use
the adapted teacher to guide the adaptation of the student
model via knowledge distillation. Finally, the adapted stu-
dent model is used for the final deployment. In this paper,
we apply our proposed approach to improve few-shot im-
age classification with MAML, but our technique is generally
applicable to other meta-learning tasks beyond few-shot
learning.
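The adaptation stage of this pipeline can be sketched as two phases: first adapt the teacher to the task, then adapt the student against a mixture of the ground-truth labels and the adapted teacher's predictions. The toy same-size linear models, the MSE distillation loss, and the mixing weight `alpha` below are illustrative assumptions, not the paper's exact formulation (in practice the teacher is a much larger network than the student).

```python
import numpy as np

def grad_mse(theta, X, targets):
    """Gradient of the mean-squared error of a linear model X @ theta."""
    return 2.0 * X.T @ (X @ theta - targets) / len(targets)

def adapt_with_distillation(teacher, student, X, y,
                            lr=0.1, steps=50, alpha=0.5):
    """Two-phase adaptation: teacher first, then distillation-guided student.

    alpha balances the ground-truth loss against the distillation loss.
    """
    # Phase 1: adapt the large teacher to the few-shot task.
    t = teacher.copy()
    for _ in range(steps):
        t -= lr * grad_mse(t, X, y)
    # Phase 2: adapt the small student, guided by the adapted teacher's
    # predictions (soft targets) in addition to the ground-truth labels.
    soft_targets = X @ t
    s = student.copy()
    for _ in range(steps):
        g = alpha * grad_mse(s, X, y) + (1 - alpha) * grad_mse(s, X, soft_targets)
        s -= lr * g
    return t, s  # only s is deployed; t is discarded after adaptation

# Toy few-shot task: 20 labeled examples, 4 features
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0])
teacher0, student0 = np.zeros(4), np.zeros(4)
teacher_task, student_task = adapt_with_distillation(teacher0, student0, X, y)
```

Note the resource profile this implies: both models are needed during adaptation on the client's local machine, but only the small adapted student reaches the edge device.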
The contributions of this work are manifold. First, pre-
vious work in meta-learning has largely overlooked the is-
sue of the computational resource gap at different stages
of meta-learning. This issue poses challenges in the real-
world adoption of meta-learning applications. In this pa-