Multi-Modal Fusion by Meta-Initialization
Matthew T. Jackson∗
Department of Engineering Science
University of Oxford
jackson@robots.ox.ac.uk
Shreshth A. Malik∗
Department of Engineering Science
University of Oxford
shreshth@robots.ox.ac.uk
Michael T. Matthews
Department of Computer Science
University College London
Yousuf Mohamed-Ahmed
Department of Computer Science
University College London
Abstract
When experience is scarce, models may have insufficient information to adapt to a new task. In this case, auxiliary information—such as a textual description of the task—can enable improved task inference and adaptation. In this work, we propose an extension to the Model-Agnostic Meta-Learning algorithm (MAML), which allows the model to adapt using auxiliary information as well as task experience. Our method, Fusion by Meta-Initialization (FuMI), conditions the model initialization on auxiliary information using a hypernetwork, rather than learning a single, task-agnostic initialization. Furthermore, motivated by the shortcomings of existing multi-modal few-shot learning benchmarks, we constructed iNat-Anim—a large-scale image classification dataset with succinct and visually pertinent textual class descriptions. On iNat-Anim, FuMI significantly outperforms uni-modal baselines such as MAML in the few-shot regime. The code for this project and a dataset exploration tool for iNat-Anim are publicly available at https://github.com/s-a-malik/multi-few.
1 Introduction
Learning effectively in resource-constrained environments is an open challenge in machine learning
[1, 2, 3]. Yet humans are capable of rapidly learning new tasks from limited experience, in part by
drawing on auxiliary information about the task. This information can be particularly helpful in
the few-shot regime, as it can highlight features that have not been seen directly in task experience,
but are necessary to solve the task. For example, Figure 1 shows an example image classification
task where a text description of the class contains discriminative information that is not contained in
the training (support) images. Designing algorithms that can incorporate auxiliary information into
meta-learning approaches has consequently attracted much attention [4, 5, 6, 7, 8, 9, 10].
Model-agnostic meta-learning (MAML) [1] is a popular method for few-shot learning. However, it cannot incorporate auxiliary task information. In this work, we propose Fusion by Meta-Initialization (FuMI), an extension of MAML which uses a hypernetwork [11] to learn a mapping from auxiliary task information to a parameter initialization. While MAML learns a single initialization that facilitates rapid learning across all tasks, FuMI conditions the initialization on the specific task to enable improved adaptation.
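To make this concrete, the following is a minimal PyTorch sketch of a hypernetwork that maps a pooled embedding of the task's text descriptions to an initialization for a linear classifier head. The class name, architecture, and dimensions are illustrative assumptions, not the actual FuMI implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class InitHypernetwork(nn.Module):
    """Sketch: map an auxiliary task embedding to an initialization for a
    linear classifier head (illustrative, not the authors' exact architecture)."""

    def __init__(self, aux_dim, feat_dim, n_classes, hidden=256):
        super().__init__()
        self.n_classes, self.feat_dim = n_classes, feat_dim
        self.net = nn.Sequential(
            nn.Linear(aux_dim, hidden),
            nn.ReLU(),
            # Emit all weights and biases of the head in one flat vector.
            nn.Linear(hidden, n_classes * feat_dim + n_classes),
        )

    def forward(self, aux_embedding):
        # aux_embedding: (aux_dim,) pooled encoding of the task's class descriptions.
        out = self.net(aux_embedding)
        W = out[: self.n_classes * self.feat_dim].view(self.n_classes, self.feat_dim)
        b = out[self.n_classes * self.feat_dim:]
        # (W, b) initialise the task head, which is then adapted on the
        # support set with a few gradient steps, as in MAML's inner loop.
        return W, b

# Hypothetical usage: a 5-way task, 512-d image features, 384-d text embedding.
hyper = InitHypernetwork(aux_dim=384, feat_dim=512, n_classes=5)
W0, b0 = hyper(torch.randn(384))
logits = torch.randn(16, 512) @ W0.T + b0  # classify a batch of image features
```

The generated initialization is then adapted on the support set exactly as MAML adapts its learned initialization; the key difference is that the starting point is task-conditioned rather than shared.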
Existing multi-modal few-shot learning benchmarks largely rely on hand-crafted feature vectors for
each class [12, 13], or use noisy language descriptions from sources such as Wikipedia [14, 15].
∗Equal Contribution
Preprint. Under review.
arXiv:2210.04843v1 [cs.LG] 10 Oct 2022
Figure 1: An example few-shot learning task, using images and class descriptions from our proposed
dataset, iNat-Anim. Here, we see the class description contains information (the colour of the bird’s
breast) which is not found in the class images (as they are all turned away).
For this reason, we release iNat-Anim—a large animal species image classification dataset with
high quality descriptions of visual features. On this benchmark, we find that FuMI significantly
outperforms MAML in the very-few-shot regime.
2 Background
In the meta-learning framework [1], we suppose tasks are drawn from a task distribution $p(\mathcal{T})$. At meta-train time, the model $f_\theta$ is evaluated on a series of tasks $\mathcal{T}_i \in \mathcal{D}_{\text{train}}$, where $\mathcal{D}_{\text{train}}$ is a finite set of samples from $p(\mathcal{T})$. This gives a task loss $\mathcal{L}_{\mathcal{T}_i}$, which is used to update the model parameters $\theta$ in accordance with the meta-learning algorithm. At meta-test time, the trained model is evaluated on all tasks in $\mathcal{D}_{\text{test}}$, another set of samples from $p(\mathcal{T})$.
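As a concrete reference point for the MAML update that FuMI extends, here is a minimal sketch of the two-level optimisation on a toy linear-regression model. It illustrates the inner (adaptation) and outer (meta) gradient steps only; it is not the paper's training code, and all names and hyperparameters are illustrative.

```python
import torch

# Minimal MAML sketch on a toy linear model f_theta(x) = x @ theta.

def task_loss(theta, x, y):
    return ((x @ theta - y) ** 2).mean()

def inner_adapt(theta, support, inner_lr=0.1, steps=1):
    # A few gradient steps on the support set, starting from the shared
    # initialization. create_graph=True keeps each step differentiable so
    # the meta-update can backpropagate through the adaptation.
    x_s, y_s = support
    for _ in range(steps):
        (grad,) = torch.autograd.grad(
            task_loss(theta, x_s, y_s), theta, create_graph=True
        )
        theta = theta - inner_lr * grad
    return theta

def meta_step(theta, tasks, meta_lr=0.01):
    # Update the shared initialization using query losses of adapted models.
    meta_loss = sum(task_loss(inner_adapt(theta, s), *q) for s, q in tasks) / len(tasks)
    (meta_grad,) = torch.autograd.grad(meta_loss, theta)
    return (theta - meta_lr * meta_grad).detach().requires_grad_(True)

# Toy usage: two regression tasks with different ground-truth weights.
theta = torch.zeros(3, requires_grad=True)
tasks = []
for w in (torch.tensor([1.0, -2.0, 0.5]), torch.tensor([-1.0, 0.0, 2.0])):
    x = torch.randn(8, 3)
    tasks.append(((x[:4], x[:4] @ w), (x[4:], x[4:] @ w)))
theta = meta_step(theta, tasks)
```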
In an $N$-shot, $K$-way multi-modal classification problem², a task $\mathcal{T} = (\mathcal{S}, \mathcal{Q})$ is defined by a support set $\mathcal{S} = \{(\{x_{i,j}\}_{j=1}^{N}, t_i, y_i)\}_{i=1}^{K}$ and a query set $\mathcal{Q} = \{(\{x_{i,j}\}_{j=1}^{M}, y_i)\}_{i=1}^{K}$, where $M$ is the number of query shots. The support set contains $N$ samples and auxiliary class information $t_i$ for each of the $K$ classes, which are used by the meta-learner to train an adapted model. Once this has been trained, the adapted model is evaluated on the unseen query set, giving task loss $\mathcal{L}_{\mathcal{Q}}$. In the context of our work, $t_i$ denotes the textual description of the class $y_i$, meaning each class has a textual description and $N$ support images. Figure 1 shows an example task using the notation outlined here.
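For concreteness, one way to represent an episode in this notation is sketched below; the field names and tensor shapes are illustrative assumptions, not the actual format used by the released code.

```python
from dataclasses import dataclass
from typing import List

import torch

@dataclass
class Task:
    # An N-shot, K-way multi-modal episode in the notation above.
    support_images: torch.Tensor  # (K, N, C, H, W): N support shots per class
    descriptions: List[str]       # t_i: one textual description per class
    query_images: torch.Tensor    # (K, M, C, H, W): M query shots per class
    labels: torch.Tensor          # y_i: (K,) class indices shared by both sets

# e.g. a 5-way, 1-shot episode with 4 query shots per class:
K, N, M = 5, 1, 4
task = Task(
    support_images=torch.randn(K, N, 3, 84, 84),
    descriptions=["<class description>"] * K,
    query_images=torch.randn(K, M, 3, 84, 84),
    labels=torch.arange(K),
)
```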
3 Data
Existing Multi-Modal Few-shot Benchmarks.
While there are a number of popular uni-modal
few-shot learning benchmarks [16, 17, 18], multi-modal benchmarks are less common. Some
works simply extend few-shot benchmarks by using the class label as auxiliary information [6, 19].
Benchmarks explicitly incorporating auxiliary modalities include Animals with Attributes (AWA)
[12] and Caltech-UCSD-Birds (CUB) [13] which augment images of animals/birds with hand-crafted
class attributes. While semantic class features can be highly discriminative, they require manual
labelling and are thus difficult to obtain at scale. Recent work instead takes the more general approach of using natural language descriptions, for example by augmenting CUB with Wikipedia articles [14, 15]. However, these articles are subject to change, and their visual information is sparse, reducing the relative benefit of the auxiliary information.
The iNat-Anim Dataset.
Motivated by these shortcomings, we constructed the iNat-Anim³ dataset. iNat-Anim consists of 195,605 images across 673 animal species, which is orders of magnitude larger than existing benchmarks (AWA and CUB). The images are a subset of the iNaturalist 2021 CVPR challenge [20] and have been augmented with textual descriptions from Animalia [21] to provide
²For consistency with our dataset, the problem setting formulation is for classification. However, our method can also be applied to regression and reinforcement learning.
³https://doi.org/10.5281/zenodo.6703088