Learnware: Small Models Do Big
Zhi-Hua Zhou, Zhi-Hao Tan
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210023, China
zhouzh@nju.edu.cn
Abstract
There are complaints about current machine learning techniques, such as the requirement of
a huge amount of training data and proficient training skills, the difficulty of continual
learning, the risk of catastrophic forgetting, and the leakage of private/proprietary data.
Most research efforts have focused on one of these issues separately, paying less attention
to the fact that most of them are entangled in practice. The prevailing big model paradigm,
which has achieved impressive results in natural language processing and computer vision
applications, has not yet addressed these issues, while itself becoming a serious source of
carbon emissions. This article offers an overview of the learnware paradigm, which attempts
to free users from having to build machine learning models from scratch, in the hope of
reusing small models to do things even beyond their original purposes. The key ingredient
is the specification, which enables a trained model to be adequately identified for reuse
according to the requirements of future users who know nothing about the model in advance.
1. Introduction
Machine learning has achieved great success, yet there are many complaints about the
requirement of a huge amount of training data (particularly labeled data), the difficulty
of adapting a trained model to changing environments, and the embarrassment of catastrophic
forgetting when a trained model must be refined incrementally. Great efforts have been made,
such as weakly supervised learning [29] trying to reduce the requirement of labeled training
data, open-environment machine learning [30] trying to enable learning models to adapt to
changing environments, and continual learning [4] trying to help deep neural networks resist
forgetting; however, these issues are still far from solved.
Indeed, most efforts have focused on one of these issues separately, paying less attention
to the fact that most of them are entangled in practice. For example, a well-studied
technique in weakly supervised learning for reducing the requirement of labeled training
data is to collect and exploit a huge amount of unlabeled data drawn from the same
distribution as the labeled training data, overlooking the fact that in changing
environments the data distributions are inherently subject to change. For another example,
an effective approach to coping with changing environments is to emphasize data received in
very recent timeslots, since the changes have not yet caused significant differences; yet
this emphasis on very recent data may aggravate the severity of catastrophic forgetting.
There are many other issues. For example, most ordinary users can hardly produce
well-performed models from scratch, due to the lack of proficient training skills; in many
real-world tasks, data privacy/propriety concerns may prevent data sharing, making it
difficult to share experience among different users; and in really big data applications, it
is generally unaffordable or even infeasible to hold the whole dataset to support many
passes of scanning.
The prevailing deep learning big model paradigm, which has achieved impressive results in
natural language processing and computer vision applications [20, 3], has not yet addressed
the above issues. Note that each big model is targeted at a task (or task class) planned in
advance and is generally of little help to other tasks; e.g., a big model trained for face
recognition can hardly be helpful for financial futures trading. It would be too ambitious
to build a pre-trained big model for every possible task, because the number of possible
tasks can be unimaginably big or even infinite. In addition, sadly, the training of big
models is becoming a serious source of carbon emissions threatening our environment.
While admitting the usefulness of big models for their specifically targeted tasks, is there
any paradigm that offers the possibility of tackling the above issues simultaneously?
This article overviews the progress of learnware, a paradigm offering a promising answer to
the above question. It attempts to systematically reuse small models to do things that may
even be beyond their original purposes, and to free users from having to build their machine
learning models from scratch.
2. The Learnware Proposal
The learnware paradigm was proposed in [28]. A learnware is a well-performed trained
machine learning model with a specification which enables it to be adequately identified
for reuse according to the requirements of future users who know nothing about the learnware
in advance.
The developer or owner¹ of a trained machine learning model (no matter whether the
model is a deep neural network, a support vector machine, or a decision tree, etc.) can
spontaneously submit her trained model to a learnware market. If the learnware market
decides to accept the model, it assigns a specification to the model and accommodates it
in the market. The learnware market should not be small, as otherwise it can hardly offer
help for various tasks; it would be common for it to accommodate thousands or millions of
well-performed models submitted by different developers, for different tasks, using
different data, optimizing different objectives, etc.
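As a hedged illustration of this developer-side workflow, the following Python sketch shows how a trained model might be packaged with a specification and submitted to a market. The `Specification`, `Learnware`, and `LearnwareMarket` names are illustrative assumptions rather than an actual API; in particular, a real market would construct a much richer specification than the simple metadata placeholder shown here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Specification:
    """Illustrative specification: task metadata plus a statistical summary
    of the training data (represented here only as a placeholder dict)."""
    task_description: str
    input_dim: int
    output_type: str                      # e.g., "classification" or "regression"
    data_summary: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Learnware:
    """A learnware bundles a well-performed trained model with its specification."""
    model: Any                            # a neural net, SVM, decision tree, ...
    spec: Specification


class LearnwareMarket:
    """Hypothetical market that accepts models submitted by developers."""

    def __init__(self) -> None:
        self.learnwares: List[Learnware] = []

    def submit(self, model: Any, spec: Specification) -> bool:
        # A real market would verify model quality and build/refine the
        # specification itself before accommodating the model.
        self.learnwares.append(Learnware(model=model, spec=spec))
        return True
```

A developer who has trained, say, a scikit-learn classifier could then call `market.submit(clf, spec)`; no training data is shared at any point.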
Once the learnware market has been built, a user who is going to tackle a machine learning
task can proceed in the following way rather than building her model from scratch. As the
comic in Figure 1 illustrates, she can submit her requirement to the learnware market, and
the market will then identify and deploy some helpful learnware(s) by considering the
learnware specifications. The learnware can be applied by the user directly, adapted/polished
with the user's own data for better usage, or exploited in other ways to help improve the
model built from the user's own data. No matter which mechanism for model reuse is adopted,
the whole process can be much less expensive and more efficient than building a model from
scratch by herself.
¹ There are situations where the developer and owner of a trained machine learning model are different.
Here, for simplicity, we do not distinguish them and assume that the developer holds all rights to the model.
Figure 1: An analogy of learnware.
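Continuing the hypothetical sketch above, the user-side workflow in Figure 1 might look as follows: the user describes her requirement, the market identifies matching learnwares via their specifications, and the user applies a returned learnware directly or polishes it with her own small dataset. The matching rule below is a deliberately naive placeholder for whatever specification-matching mechanism a real market would use.

```python
def match_score(spec: Specification, requirement: Dict[str, Any]) -> float:
    """Toy matching rule: a real market would compare statistical summaries
    of data distributions, not just simple metadata."""
    same_task = spec.task_description == requirement.get("task_description")
    same_dim = spec.input_dim == requirement.get("input_dim")
    return float(same_task and same_dim)


def identify(market: LearnwareMarket, requirement: Dict[str, Any], top_k: int = 3):
    """Identification step: rank accommodated learnwares by how well their
    specifications match the user's requirement."""
    ranked = sorted(market.learnwares,
                    key=lambda lw: match_score(lw.spec, requirement),
                    reverse=True)
    return ranked[:top_k]


def reuse(learnware: Learnware, X_user, y_user=None):
    """Apply the learnware directly, or 'polish' it with the user's own small
    dataset when labels are available and the model supports further fitting."""
    if y_user is not None and hasattr(learnware.model, "fit"):
        learnware.model.fit(X_user, y_user)
    return learnware.model.predict(X_user)
```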
The learnware proposal offers the possibility of addressing most of the issues raised in
Section 1:
Lack of training data: Strong machine learning models can be attained even for tasks
with small data, because the models are built upon well-performed learnwares, and in most
cases only a small amount of data is needed for adaptation or refinement.
Lack of training skills: Strong machine learning models can be attained even by ordinary
users with little training skill, because the users can get help from well-performed
learnwares rather than building a model from scratch by themselves.
Catastrophic forgetting: A learnware will always be accommodated in the learnware
market once it is accepted, unless every aspect of its function can be replaced by other
learnwares. Thus, the old knowledge in the learnware market is always retained; nothing is
forgotten.
Continual learning: The learnware market naturally realizes continual and lifelong
learning, because with the constant submission of well-performed learnwares trained on
diverse tasks, the knowledge held in the learnware market is continually enriched.
Data privacy/proprietary: Developers only submit their models without sharing their own
data, and thus data privacy and proprietary information can be well preserved. Although one
cannot deny the possibility of reverse-engineering the models, the risk is small compared
with many other privacy-preserving solutions.
Unplanned tasks: The learnware market is open to all legal developers. Thus, helpful
learnwares are likely to exist in the market unless a task is new to all of them. Moreover,
some new tasks, though no developer has built models for them specifically, could be
addressed by selecting and assembling some existing learnwares (a minimal assembling sketch
is given after this list).
Carbon emission: Assembling small models may offer good-enough performance for most
applications; thus, there may be less interest in training too many big models. The
possibility of reusing other developers' models can help reduce repetitive development.
Besides, a not-so-good model for one user may be very helpful for another user, so no
training cost is wasted.
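As a minimal illustration of the "unplanned tasks" point above, the sketch below assembles several identified learnwares by majority voting or averaging. This is only one conceivable reuse mechanism, chosen here for illustration rather than prescribed by the learnware proposal, and it again builds on the hypothetical classes defined earlier.

```python
from collections import Counter

import numpy as np


def assemble_predict(learnwares, X, task: str = "classification"):
    """Combine several learnwares for a task none of them was built for
    specifically: averaging for regression, per-sample majority voting for
    classification."""
    all_preds = np.array([lw.model.predict(X) for lw in learnwares])
    if task == "regression":
        return all_preds.mean(axis=0)
    # all_preds has shape (n_learnwares, n_samples); vote over each column
    return np.array([Counter(column).most_common(1)[0][0]
                     for column in all_preds.T])
```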
Though the learnware proposal shows a bright future, much work remains to be done to make
it a reality. Sections 3-5 will present some of our progress.
3. The Design
There are three important entities: developers, users, and the market. The developers are
usually machine learning experts who produce and want to share/sell their well-performed
trained machine learning models. The users need machine learning services but usually
have only limited data and lack machine learning knowledge and skills. The learnware
market accepts/buys well-performed trained models from developers, accommodates them