Self-Attention Message Passing for Contrastive Few-Shot Learning
Ojas Kishorkumar Shirekar1,2, Anuj Singh1,2, Hadi Jamali-Rad1,2
1Delft University of Technology (TU Delft), The Netherlands
2Shell Global Solutions International B.V., Amsterdam, The Netherlands
{o.k.shirekar, a.r.singh}@student.tudelft.nl, h.jamalirad@tudelft.nl
Abstract
Humans have a unique ability to learn new representations from just a handful of examples with little to no supervision. Deep learning models, however, require an abundance of data and supervision to perform at a satisfactory level. Unsupervised few-shot learning (U-FSL) is the pursuit of bridging this gap between machines and humans. Inspired by the capacity of graph neural networks (GNNs) to discover complex inter-sample relationships, we propose a novel self-attention based message passing contrastive learning approach (coined SAMP-CLR) for U-FSL pre-training. We also propose an optimal transport (OT) based fine-tuning strategy (we call it OpT-Tune) to efficiently induce task awareness into our novel end-to-end unsupervised few-shot classification framework (SAMPTransfer). Our extensive experimental results corroborate the efficacy of SAMPTransfer in a variety of downstream few-shot classification scenarios, setting a new state-of-the-art for U-FSL on both the miniImageNet and tieredImageNet benchmarks, offering up to 7% and 5% improvements, respectively. Our further investigations also confirm that SAMPTransfer remains on par with some supervised baselines on miniImageNet and outperforms all existing U-FSL baselines in a challenging cross-domain scenario. Our code can be found in our GitHub repository: https://github.com/ojss/SAMPTransfer/. (This paper is accepted to appear in the proceedings of WACV 2023.)
1. Introduction
Deep learning models have become increasingly large and data-hungry in order to guarantee acceptable downstream performance. Humans, in contrast, need neither an abundance of data samples nor extensive forms of supervision to understand their surroundings and the semantics therein. Few-shot learning has garnered an upsurge of interest recently, as it underscores this fundamental gap between humans' adaptive learning capacity and that of data-demanding deep learning methods. In this realm, few-shot classification is cast as the task of predicting class labels for a set of unlabeled data points (query set) given only a small set of labeled ones (support set). Typically, query and support data samples are drawn from the same distribution.
Few-shot classification methods usually consist of two sequential phases: (i) pre-training on a large dataset of base classes, regardless of whether this pre-training is supervised or unsupervised, followed by (ii) fine-tuning on an unseen dataset consisting of novel classes. Normally, the classes used in pre-training and fine-tuning are mutually exclusive. In this paper, we focus on the self-supervised setting (also interchangeably called "unsupervised" in the literature), where we have no access to the actual class labels of the "base" dataset. Our motivation for tackling unsupervised few-shot learning (U-FSL) is that it poses a more realistic challenge, closer to humans' learning process.
The body of work around U-FSL can be broadly classified into two different approaches. The first approach relies on meta-learning and episodic pre-training, which involves the creation of synthetic "tasks" to mimic the subsequent episodic fine-tuning phase [1, 16, 23–25, 29, 56]. The second approach follows a transfer learning strategy, where the network is trained non-episodically to learn optimal representations from the abundance of unlabeled data in the pre-training phase, followed by an episodic fine-tuning phase [14, 32, 39]. To be more specific, a feature extractor is first pre-trained to capture the structure of the unlabeled data (present in the base classes) using some form of representation learning [5, 6, 32, 39]. Next, a prediction layer (a linear layer, by convention) is fine-tuned in conjunction with the pre-trained feature extractor for a swift adaptation to the novel classes. The better the feature extractor models the distribution of the unlabeled data, the fewer training samples the predictor requires, and the faster it adapts to the unseen classes in the fine-tuning and eventual testing phases. Some recent studies [11, 32, 42] argue that transfer learning approaches outperform their meta-learning counterparts in standard in-domain as well as cross-domain settings, where base and novel classes come from totally different distributions.
On the other side of the aisle, supervised FSL approaches that follow the episodic training paradigm may include a certain degree of task awareness. Such approaches exploit the information available in the query set during the training and testing phases [9, 54, 57] to alleviate the model's sample bias. As a result, the model learns to generate task-specific embeddings by better aligning the features of the support and query samples for optimal metric-based label assignment.
Some other supervised approaches do not rely purely on convolutional feature extractors. Instead, they use graph neural networks (GNNs) to model instance-level and class-level relationships [26, 37, 55, 58]. This is because GNNs are capable of exploiting the manifold structure of the novel classes [52]. However, looking at the recent literature, one can barely see any GNN-based architectures being used in the unsupervised setting.
Recent unsupervised methods use a successful form of contrastive learning [6] in their self-supervised pre-training phase. Contrastive learning methods typically treat each image in a batch as its own class; the only other images that share this class are the augmentations of the image in question. Such methods enforce similarity between the representations of an image and its augmentations (positive pairs), while enforcing dissimilarity between all other pairs of images (negative pairs) through a contrastive loss. Although these methods work well, they overlook the possibility that, within a randomly sampled batch of images, several images (apart from their augmentations) may in reality belong to the same class. By applying the contrastive loss, the network may inadvertently learn different representations for such images and classes.
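To make the instance-discrimination objective above concrete, the sketch below implements a standard SimCLR-style NT-Xent loss for a batch in which each image has exactly one augmented view. It is a minimal illustration of the principle being discussed, not the specific loss used later by SAMP-CLR; the function name and the single-view assumption are ours.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss for a batch of B images.

    z1, z2: (B, d) embeddings of two augmented views of the same images.
    Every image is treated as its own class: its other view is the only
    positive, and all remaining 2B - 2 embeddings in the batch are negatives.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, d) unit vectors
    sim = z @ z.t() / temperature                        # cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # never match a view with itself
    B = z1.shape[0]
    # row i of the first half pairs with row i + B of the second half, and vice versa
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets)
```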
To address this problem, recent methods such as SimCLR [6] introduce larger batch sizes in the pre-training phase to maximize the number of negative samples. However, this approach faces two shortcomings: (i) increasingly larger batch sizes mandate more costly training infrastructure, and (ii) it still does not ingrain intra-class dependencies into the network. Point (ii) applies even to more recent approaches, such as ProtoCLR [32]. A simple yet effective remedy for this problem is proposed in C3LR [39], where an intermediate clustering and re-ranking step is introduced, and the contrastive loss is accordingly adjusted to ingest a semblance of class-cognizance. However, the problem can also be approached from a different perspective, where the network explores the structure of the data samples within each batch.
We propose a novel U-FSL approach (coined SAMPTransfer) that marries the potential of GNNs in learning the global structure of data in the pre-training stage with the efficiency of optimal transport (OT) for inducing task awareness in the following fine-tuning phase. More concretely, with SAMPTransfer we introduce a novel self-attention message passing contrastive learning (SAMP-CLR) scheme that uses a form of graph attention, allowing the network to learn refined representations by looking beyond single-image instances per batch. Furthermore, the proposed OT-based fine-tuning strategy (we call it OpT-Tune) aligns the distributions of the support and query samples to improve the downstream adaptability of the pre-trained encoder, without requiring any additional parameters. Our contributions can be summarized as follows:
1. We propose SAMPTransfer, a novel U-FSL approach that introduces a self-attention message passing contrastive learning (SAMP-CLR) paradigm for unsupervised few-shot pre-training.
2. We propose applying an optimal transport (OT) based fine-tuning (OpT-Tune) strategy to efficiently induce task awareness in both the fine-tuning and inference stages.
3. We present a theoretical foundation for SAMPTransfer, as well as extensive experimental results corroborating its efficacy and setting a new state-of-the-art (to the best of our knowledge) on both the miniImageNet and tieredImageNet benchmarks. We also report competitive performance on the challenging CDFSL benchmark [20].
2. Related Work
Self-supervised learning. Self-supervised learning (SSL) is a term used for a collection of unsupervised methods that obtain supervisory signals from within the data itself, typically by leveraging its underlying structure. The general technique of self-supervised learning is to predict any unobserved part (or property) of the input from any observed part. Several recent advances in the SSL space have made waves by eclipsing their fully supervised counterparts [18]. Some examples of seminal works include SimCLR [6], BYOL [19], SwAV [5], MoCo [21], and SimSiam [7]. Our pre-training method SAMP-CLR is inspired by SimCLR [6], ProtoTransfer [32], and C3LR [39].
Metric learning. Metric learning aims to learn a representation function that maps the data to an embedding space in which distances preserve similarity (or dissimilarity): similar objects lie closer together, while dissimilar objects lie farther apart. For example, unsupervised methods based on some form of contrastive loss, such as SimCLR [6] or NNCLR [15], guide objects belonging to the same potential class to be mapped to the same point and those from different classes to be mapped to different points. Note that in an unsupervised setting, each image in a batch is its own class. This process generally involves taking two crops of the same image and encouraging the network to emit identical representations for the two, while ensuring that these representations remain different from those of all other images in a given batch. Metric learning methods have been shown to work quite well for few-shot learning. AAL-ProtoNets [1], ProtoTransfer [32], UMTRA [25], and certain GNN methods [37] are excellent examples that use metric learning for few-shot learning.
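As an illustration of how a metric-learned embedding is typically used at few-shot evaluation time, the sketch below performs nearest-prototype classification in the spirit of the prototype-based methods cited above. The function and its arguments are our own illustrative choices, not the exact evaluation routine of any cited work.

```python
import torch

def prototype_predict(support, support_labels, query, n_way):
    """Nearest-prototype classification in an embedding space.

    support: (N*K, d) embeddings of the labeled support set,
    support_labels: (N*K,) integer labels in [0, n_way),
    query: (Q, d) embeddings of the unlabeled query set.
    Each query is assigned the class whose mean support embedding
    (prototype) is closest in Euclidean distance.
    """
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                       # (n_way, d) class prototypes
    dists = torch.cdist(query, prototypes)  # (Q, n_way) distances to prototypes
    return dists.argmin(dim=1)              # predicted class per query
```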
Figure 1: SAMP-CLR schematic view and pre-training procedure. In the figure, $x_{i,a}$ is an image sampled from the augmented set $\mathcal{A}$. The $p$ message passing steps refine the features extracted using a CNN encoder. (The diagram also labels the overall network as $f = f_\theta f_\psi$.)
Figure 2: Features extracted from the pre-trained CNN are used to build a graph. The features are first refined using the pre-trained SAMP layer(s). Then OpT-Tune aligns support features with query features.
Graph Neural Networks for FSL. Since the first use of graphs for FSL in [37], there have been several advancements and continued interest in using graphs for supervised FSL. In [37], each node corresponds to one instance (labeled or unlabeled) and is represented as the concatenation of a feature embedding and a label embedding. The final layer of their model is a linear classifier that directly outputs the prediction scores for each unlabeled node. There has also been an increase in methods that use transduction. TPN [31] is one such method, which uses graphs to propagate labels [52] from labeled samples to unlabeled samples. Although methods such as EGNN [26] make use of both edge and node features, earlier methods focused only on node features. Graphs are attractive, as they can model intra-batch relations and can be extended for transduction, as done in [26, 31]. In addition to transduction and relation modeling, graphs are highly potent as task adaptation modules: HGNN [58] is an example in which a graph is used to refine and adapt feature embeddings. It must be noted that most graph-based methods have been applied in the supervised FSL setting. To the best of our knowledge, we are the first to use GNNs in any form for U-FSL. More specifically, we use a message passing network as part of our network architecture and pre-training scheme.
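To give a concrete picture of what such a message passing network can look like, the sketch below implements a single, generic self-attention message passing step over the batch of CNN features. It is an illustrative scaled dot-product attention update of our own devising, not the exact SAMP layer defined by SAMPTransfer.

```python
import torch
import torch.nn as nn

class SelfAttentionMessagePassing(nn.Module):
    """One generic self-attention message passing step over a batch graph.

    Node features are the CNN embeddings of the images in a batch; every
    node attends to every other node, so information flows between samples
    instead of each image being encoded in isolation.
    """
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h):                    # h: (B, d) node features
        attn = (self.q(h) @ self.k(h).t()) / h.shape[-1] ** 0.5
        attn = attn.softmax(dim=-1)          # (B, B) attention weights over the batch
        return h + attn @ self.v(h)          # residual message passing update
```

Stacking several such layers on top of a convolutional backbone lets the refined embedding of each image depend on the other images in the batch, which is the property exploited during pre-training.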
3. Proposed Method (SAMPTransfer)
In this section, we first describe our problem formulation. We then discuss the two subsequent phases of the proposed approach: (i) self-supervised pre-training (SAMP-CLR), and (ii) the optimal transport based episodic supervised fine-tuning (OpT-Tune). Together, these two phases constitute our overall approach, which we have coined SAMPTransfer. The mechanics of the proposed pre-training and fine-tuning procedures are illustrated in Figs. 1 and 2, respectively.
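Before the formal description, the sketch below illustrates the generic mechanism that OT-based feature alignment relies on: an entropic-regularized transport plan between support and query features computed with Sinkhorn iterations, followed by a barycentric mapping of the support features toward the query distribution. The function name, hyperparameters, and the barycentric-mapping step are our own assumptions for illustration, not the exact OpT-Tune procedure.

```python
import torch

def sinkhorn_align(support, query, eps=0.05, n_iters=50):
    """Align support features to query features via entropic optimal transport.

    support: (S, d), query: (Q, d). Returns transported support features,
    obtained as the barycentric projection of the Sinkhorn transport plan.
    """
    cost = torch.cdist(support, query) ** 2        # (S, Q) squared distances
    cost = cost / cost.max()                       # normalize for numerical stability
    K = torch.exp(-cost / eps)                     # Gibbs kernel
    a = torch.full((support.shape[0],), 1.0 / support.shape[0])  # uniform support mass
    b = torch.full((query.shape[0],), 1.0 / query.shape[0])      # uniform query mass
    u = torch.ones_like(a)
    for _ in range(n_iters):                       # Sinkhorn scaling iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)       # (S, Q) transport plan
    # barycentric mapping: each support point becomes a weighted mean of query points
    return (plan @ query) / plan.sum(dim=1, keepdim=True)
```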
3.1. Preliminaries
Let us denote the training data of size $D$ as $\mathcal{D}_{tr} = \{(x_i, y_i)\}_{i=1}^{D}$, with $(x_i, y_i)$ representing an image and its class label, respectively. In the pre-training phase, we sample $L$ random images from $\mathcal{D}_{tr}$ and augment each sample $A$ times by randomly sampling augmentation functions $\zeta_a(\cdot)$, $a \in [A]$, from the set $\mathcal{A}$. This results in a mini-batch of $B = (A + 1)L$ total samples. Note that in the unsupervised setting, we have no access to the data labels in the pre-training phase.
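A minimal sketch of this mini-batch construction is given below, assuming a generic list of augmentation callables standing in for the set $\mathcal{A}$ (the concrete augmentations are not specified in this excerpt):

```python
import random

def build_batch(dataset, L, A, augmentations):
    """Build an unsupervised pre-training mini-batch of B = (A + 1) * L samples.

    dataset: a sequence of images (labels are ignored / unavailable),
    augmentations: a list of callable augmentation functions to sample from.
    Each of the L sampled images contributes itself plus A augmented views.
    """
    indices = random.sample(range(len(dataset)), L)
    batch = []
    for i in indices:
        x = dataset[i]
        batch.append(x)                              # original view
        for _ in range(A):
            zeta = random.choice(augmentations)      # zeta_a drawn from the set A
            batch.append(zeta(x))                    # augmented view
    return batch                                     # len(batch) == (A + 1) * L
```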
Next, we fine-tune our model episodically [47] on a set of randomly sampled tasks $\mathcal{T}_i$ drawn from the test dataset $\mathcal{D}_{tst} = \{(x_i, y_i)\}_{i=1}^{D}$ of size $D$. A task, $\mathcal{T}_i$,