
Hypernetwork approach to Bayesian MAML
P. Borycki, P. Kubacki, M. Przewięźlikowski, T. Kuśmierczyk, J. Tabor, P. Spurek
Faculty of Mathematics and Computer Science,
Jagiellonian University, Kraków, Poland
przemyslaw.spurek@gmail.com
Abstract
The main goal of Few-Shot learning algorithms is to enable learning from small
amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this
method is to learn the shared universal weights of a meta-model, which are then
adapted for specific tasks. However, the method suffers from overfitting and
poorly quantifies uncertainty due to limited data size. Bayesian approaches could,
in principle, alleviate these shortcomings by learning weight distributions in place
of point-wise weights. Unfortunately, previous Bayesian modifications of MAML are limited by the simplicity of Gaussian posteriors, by MAML-like gradient-based weight updates, or by the same structure being enforced for universal and adapted weights.
In this paper, we propose a novel framework for Bayesian MAML called
BayesianHMAML, which employs Hypernetworks for weight updates. It learns the
universal weights point-wise, but adds a probabilistic structure when adapting them for specific tasks. In such a framework, we can use simple Gaussian distributions
or more complicated posteriors induced by Continuous Normalizing Flows.
1 Introduction
Few-Shot learning models easily adapt to previously unseen tasks based on a few labeled samples.
One of the most popular and elegant among them is Model-Agnostic Meta-Learning (MAML) [14].
The main idea behind this method is to produce universal weights which can be rapidly updated
to solve new small tasks (see the first plot in Fig. 1). However, limited data sets lead to two
main problems. First, the method tends to overfit to training data, preventing us from using deep
architectures with large numbers of weights. Second, it lacks good quantification of uncertainty, i.e., the model does not know how reliable its predictions are. Both problems can be addressed by
employing Bayesian Neural Networks (BNNs) [26], which learn distributions in place of point-wise estimates.
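For concreteness, MAML [14] adapts the universal weights $\theta$ to a task $\mathcal{T}_i$ with a few gradient steps on the task's support set, and trains $\theta$ so that the adapted weights perform well on held-out query data. With a single inner step, the updates read
$$\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{\mathcal{T}_i}(f_{\theta}), \qquad \min_{\theta} \sum_{\mathcal{T}_i} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'}),$$
where $\alpha$ denotes the inner-loop learning rate.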
There exist a few Bayesian modifications of the classical MAML algorithm. Bayesian MAML [54], Amortized Bayesian Meta-Learning [38], PACOH [41, 40], FO-MAML [30], MLAP-M [1], and Meta-Mixture [22] learn distributions for the common universal weights, which are then updated to per-task local weight distributions. The above modifications of MAML, like the original MAML,
rely on gradient-based updates: weights specialized for small tasks are obtained by taking a fixed number of gradient steps from the universal weights. Such a procedure needs two levels of Bayesian regularization, and the universal distribution is usually employed as a prior for the per-task specializations (see the second plot in Fig. 1). However, the hierarchical structure complicates the optimization procedure and limits the updates possible within the MAML procedure.
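Schematically, these hierarchical methods fit each per-task posterior $q_i$ by maximizing a variational objective in which the universal distribution serves as the prior (a simplified sketch; the exact objectives differ across the cited methods):
$$q_i^{*} = \arg\max_{q_i}\; \mathbb{E}_{\theta \sim q_i}\!\left[\log p(\mathcal{D}_i \mid \theta)\right] - \mathrm{KL}\!\left(q_i(\theta) \,\|\, p(\theta \mid \theta_0)\right),$$
where $\mathcal{D}_i$ is the support set of task $\mathcal{T}_i$ and $\theta_0$ denotes the meta-level (universal) parameters.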
The paper presents BayesianHMAML – a new framework for Bayesian Few-Shot learning. It simplifies the weight-adapting procedure explained above and, thanks to the use of hypernetworks, enables learning more complicated posterior updates. Similarly to previous approaches, the final weight posteriors are obtained by updating from the universal weights. However, we avoid