Unsupervised Few-shot Learning via Deep Laplacian
Eigenmaps
Kuilin Chen
University of Toronto
kuilin.chen@mail.utoronto.ca
Chi-Guhn Lee
University of Toronto
cglee@mie.utoronto.ca
Abstract
Learning a new task from a handful of examples remains an open challenge in machine learning. Despite
the recent progress in few-shot learning, most methods rely on supervised pretraining or meta-learning
on labeled meta-training data and cannot be applied to the case where the pretraining data is unlabeled.
In this study, we present an unsupervised few-shot learning method via deep Laplacian eigenmaps. Our
method learns representation from unlabeled data by grouping similar samples together and can be intuitively
interpreted by random walks on augmented training data. We analytically show how deep Laplacian
eigenmaps avoid collapsed representation in unsupervised learning without explicit comparison between
positive and negative samples. The proposed method significantly closes the performance gap between
supervised and unsupervised few-shot learning. Our method also achieves comparable performance to
current state-of-the-art self-supervised learning methods under the linear evaluation protocol.
1 Introduction
Few-shot learning (Fei-Fei et al., 2006) aims to learn a new classification or regression model on a novel task that is not seen during training, given only a few examples in the novel task. Existing few-shot learning methods rely on either episodic meta-learning (Finn et al., 2017; Snell et al., 2017) or standard pretraining (Chen et al., 2019; Tian et al., 2020b) in a supervised manner to extract transferable knowledge for a new few-shot task. Unfortunately, these methods require many labeled meta-training samples, and acquiring a lot of labeled data is costly or even impossible in practice. Recently, several unsupervised meta-learning approaches have attempted to address this problem by constructing synthetic tasks on unlabeled meta-training data (Hsu et al., 2019; Khodadadeh et al., 2019, 2021) or by meta-training on self-supervised pretrained features (Lee et al., 2021a). However, the performance of unsupervised meta-learning approaches still falls far short of their supervised counterparts. Empirical studies in supervised pretraining show that representation learning via grouping similar samples together (Chen et al., 2019; Dhillon et al., 2020; Laenen and Bertinetto, 2021; Tian et al., 2020b) outperforms a wide range of episodic meta-learning methods, where the definition of similar samples is given by class labels. The motivation of this study is to develop an unsupervised representation learning method that groups unlabeled meta-training data without episodic training and closes the performance gap between supervised and unsupervised few-shot learning.
Contrastive self-supervised learning has shown remarkable success in learning representation from unlabeled data and is competitive with supervised learning on multiple visual tasks (Hénaff et al., 2020; Tian et al., 2020a). The common underlying theme behind contrastive learning is to pull together the representations of augmented views of the same training sample (positive samples) and disperse the representations of augmented views from different training samples (negative samples) (Wang and Isola, 2020; Wu et al., 2018). Typically, contrastive learning methods require a large number of negative samples to learn high-quality representation from unlabeled data (Chen et al., 2020; He et al., 2020), which inevitably requires a large batch size and demands significant computing resources. Non-contrastive methods try to overcome this issue by accomplishing self-supervised learning with only positive pairs. However, non-contrastive methods suffer from trivial solutions in which the model maps all inputs to the same constant vector, known as collapsed representation. Various methods have been proposed to avoid collapsed representation on an ad hoc basis, such as asymmetric network architectures (Grill et al., 2020), stop gradient (Chen and He, 2021), and feature decorrelation (Ermolov et al., 2021; Hua et al., 2021; Zbontar et al., 2021). However, theoretical understanding of how non-contrastive methods avoid collapsed representation is limited, although some preliminary attempts have been made to analyze the training dynamics of non-contrastive methods (Tian et al., 2021). Moreover, most self-supervised learning methods focus on the linear evaluation task, where the training and test data come from the same classes. They do not account for the domain gap between training and test classes, which is the case in few-shot learning and cross-domain few-shot learning.

Figure 1: A graph on augmented views of unlabeled training data. The thickness of an edge indicates the transition probability between vertices, which is proportional to their similarity. We group similar vertices together by minimizing the total transition probability between different groups.
We develop a novel unsupervised representation learning method in which a weighted graph captures unlabeled samples as nodes and the similarity among samples as edge weights. Two samples are deemed similar if they are augmentations of the same sample, and clustering of samples is accomplished by partitioning the graph. We provide an intuitive understanding of the graph partition problem from the perspective of random walks on the graph, where the transition probability between two vertices is proportional to their similarity. The optimal partition can be found by minimizing the total transition probability between clusters, which links the problem to the well-known Laplacian eigenmaps in spectral analysis (Belkin and Niyogi, 2003; Meila and Shi, 2000; Shi and Malik, 2000). We replace the locality-preserving projection in Laplacian eigenmaps with deep neural networks for better scalability and flexibility in learning from high-dimensional data such as images.
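For reference, classical Laplacian eigenmaps (Belkin and Niyogi, 2003) compute the embedding from the generalized eigenproblem $Lf = \lambda Df$. The sketch below shows this shallow spectral baseline, which our deep network replaces; it is illustrative only and does not implement the deep variant proposed in this paper. The dense similarity matrix `S` is an assumed input for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(S, k):
    """Classical (shallow) Laplacian eigenmaps on a non-negative similarity
    matrix S of shape (N, N): solve L f = lambda D f with L = D - S and embed
    each vertex with the eigenvectors of the k smallest non-trivial eigenvalues.
    Illustrative baseline only; the paper replaces this spectral projection
    with a deep neural network."""
    d = S.sum(axis=1)            # vertex degrees
    D = np.diag(d)               # degree matrix
    L = D - S                    # unnormalized graph Laplacian
    # Generalized eigenproblem; eigenvalues are returned in ascending order.
    eigvals, eigvecs = eigh(L, D)
    return eigvecs[:, 1:k + 1]   # skip the trivial constant eigenvector
```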
An additional technique is integrated into deep Laplacian eigenmaps to handle the domain gap between the meta-training and meta-testing sets. Previous studies on word embeddings (e.g., king − man + woman ≈ queen) (Mikolov et al., 2013) and disentangled generative models (Karras et al., 2019) show that interpolation between latent embeddings may correspond to the representation of a realistic sample that is not seen in the training data, whereas interpolation in the input space does not result in realistic samples. In parallel, interpolation between the distributions of the nearest two meta-training classes in the embedding space can approximate the distribution of a meta-testing class after the feature extractor is trained (Yang et al., 2021). To enhance performance on downstream few-shot learning tasks, we interpolate unlabeled meta-training samples on the data manifold to mimic unseen meta-test samples and integrate them into the unsupervised training of the feature extractor.
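As a rough illustration of this idea, the sketch below performs mixup-style interpolation between the embeddings of two unlabeled samples. The Beta-distributed mixing coefficient and the function name are illustrative assumptions, not the exact interpolation scheme used in the paper.

```python
import torch

def interpolate_embeddings(z_a, z_b, alpha=0.5):
    """Interpolate two embedding vectors (or batches of embeddings) to mimic
    samples that are unseen in the meta-training data.
    The Beta(alpha, alpha) mixing coefficient is an illustrative choice."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * z_a + (1.0 - lam) * z_b
```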
Our contributions are summarized as follows:

• A new unsupervised few-shot learning method is developed based on deep Laplacian eigenmaps, with an intuitive explanation based on random walks.

• Our loss function is analyzed to show how collapsed representation is avoided without explicit comparison to negative samples, shedding light on existing feature-decorrelation-based self-supervised learning methods.

• The proposed method significantly closes the performance gap between unsupervised and supervised few-shot learning methods.

• Our method achieves comparable performance to current state-of-the-art (SOTA) self-supervised learning methods under the linear evaluation protocol.
2 Methodology
2.1 Graph from augmented data
First, we construct a graph using augmented views of unlabeled data. Let $\bar{x} \in \mathbb{R}^d$ be a raw sample without augmentation. For image data, augmented views are created by the commonly used augmentations defined in SimCLR (Chen et al., 2020), including horizontal flip, Gaussian blur, color jittering, and random cropping. $\mathcal{X}$ denotes the set of all augmented data and $N = |\mathcal{X}|$. We represent the augmented data in the form of a weighted graph $G = (\mathcal{X}, S)$, where each $x \in \mathcal{X}$ is a vertex of the graph and $S$ denotes the edge weights. The edge between two vertices $x_i, x_j \in \mathcal{X}$ is weighted by the non-negative similarity $s_{ij}$ between them. For unrelated $x_i$ and $x_j$, the similarity $s_{ij}$ should be small. On the other hand, the similarity $s_{ij}$ should be large when $x_i$ and $x_j$ are augmentations of the same image or augmentations of two images within the same latent class. An illustrative diagram is shown in Fig. 1.
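For concreteness, the following is a minimal sketch of such a SimCLR-style augmentation pipeline written with torchvision; the crop size, jitter strengths, and blur kernel are illustrative assumptions rather than the exact settings used in the paper.

```python
from torchvision import transforms

# SimCLR-style augmentations named above: random crop, horizontal flip,
# color jitter, Gaussian blur. Parameter values are illustrative assumptions.
simclr_augment = transforms.Compose([
    transforms.RandomResizedCrop(84),                           # random cropping
    transforms.RandomHorizontalFlip(p=0.5),                     # horizontal flip
    transforms.RandomApply(
        [transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),   # color jittering
    transforms.GaussianBlur(kernel_size=9, sigma=(0.1, 2.0)),   # Gaussian blur
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Return two augmented views of one raw sample x-bar; such a pair is
    connected by a high-similarity edge s_ij in the graph G = (X, S)."""
    return simclr_augment(pil_image), simclr_augment(pil_image)
```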
Let $d_i = \sum_{x_j \in \mathcal{X}} s_{ij}$ denote the total weight associated with $x_i$, which is the degree of the vertex $x_i$ in a weighted graph. The degree matrix $D$ is defined as the diagonal matrix with the degrees $d_1, \ldots, d_N$ on the diagonal. The volume of $\mathcal{X}$ is $\mathrm{Vol}(\mathcal{X}) = \sum_{x_i \in \mathcal{X}} d_i$. Similarly, the volume of a subset $C \subset \mathcal{X}$ is defined as $\mathrm{Vol}(C) = \sum_{x_i \in C} d_i$. Let $P = D^{-1} S$ be the random walk Laplacian of $G$, where $p_{ij} = (P)_{ij}$ represents the transition probability from $x_i$ to $x_j$, and each row of $P$ sums to 1. $L = D - S$ is the unnormalized Laplacian of $G$.
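As a quick reference, these quantities can be computed directly from a dense non-negative similarity matrix $S$, as in the numpy sketch below; materializing a full $N \times N$ matrix is an assumption made for illustration.

```python
import numpy as np

def graph_quantities(S):
    """Compute the degree matrix D, the row-stochastic matrix P = D^{-1} S
    (called the random walk Laplacian above), the unnormalized Laplacian
    L = D - S, and the stationary distribution pi from a dense non-negative
    similarity matrix S of shape (N, N)."""
    d = S.sum(axis=1)          # vertex degrees d_i = sum_j s_ij
    D = np.diag(d)             # degree matrix
    P = S / d[:, None]         # each row of P sums to 1
    L = D - S                  # unnormalized graph Laplacian
    vol = d.sum()              # Vol(X) = sum_i d_i
    pi = d / vol               # stationary distribution pi_i = d_i / Vol(X)
    return D, P, L, pi
```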
2.2 Random walks and graph partition
$\mathcal{X}$ can be grouped into $K$ clusters, where similar vertices should be grouped into the same cluster and dissimilar vertices into different clusters. This resembles supervised pretraining, which embeds samples from the same class close together. We will show later that $K$ is also the dimension of the embedding. Since we do not know the number of classes in the unlabeled training data, we set the embedding dimension to $K = 2048$, which works well on a wide range of datasets. The graph can be partitioned into clusters by minimizing the total similarity $\sum_{x_i \in C, x_j \in C'} s_{ij}$ between two clusters $C, C' \subset \mathcal{X}$ with $C \cap C' = \emptyset$. However, minimizing the total inter-cluster similarity is undesirable because it can simply separate one individual vertex from the rest of the graph. Instead, we run random walks on the vertices $\mathcal{X}$.
The random walk Laplacian $P$ defines a Markov chain on the vertices $\mathcal{X}$. The stationary distribution $\pi$ of this chain has an explicit form, $\pi_i = d_i / \mathrm{Vol}(\mathcal{X})$ for $x_i \in \mathcal{X}$ (Meila and Shi, 2000). The transition probability $P(C' \mid C) = P(X_1 \in C' \mid X_0 \in C)$ is given by

$$P(X_1 \in C' \mid X_0 \in C) = \left( \frac{1}{\mathrm{Vol}(\mathcal{X})} \sum_{x_i \in C,\, x_j \in C'} s_{ij} \right) \left( \frac{\mathrm{Vol}(C)}{\mathrm{Vol}(\mathcal{X})} \right)^{-1} = \frac{\sum_{x_i \in C,\, x_j \in C'} s_{ij}}{\mathrm{Vol}(C)} \tag{1}$$
The inter-cluster transition probability is a proper criterion because it has a small value only if $\sum_{x_i \in C, x_j \in C'} s_{ij}$ is small (low similarity for vertices in different clusters) and all clusters have sufficiently large volumes. As a result, minimization of the inter-cluster transition probability prevents trivial solutions that simply separate one individual vertex from the rest of the graph.
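The criterion in Eq. (1) can be evaluated directly from the similarity matrix. The sketch below is a literal numpy translation, assuming the two clusters are given as arrays of vertex indices (hypothetical inputs for illustration).

```python
import numpy as np

def inter_cluster_transition(S, C, C_prime):
    """Inter-cluster transition probability of Eq. (1):
    P(X1 in C' | X0 in C) = (sum_{i in C, j in C'} s_ij) / Vol(C),
    where Vol(C) = sum_{i in C} d_i. C and C_prime are index arrays."""
    d = S.sum(axis=1)                      # vertex degrees
    vol_C = d[C].sum()                     # Vol(C)
    cross = S[np.ix_(C, C_prime)].sum()    # total similarity between clusters
    return cross / vol_C
```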
Minimizing the inter-cluster transition probability is a constrained optimization problem. For the case of finding $K$ clusters, we define a matrix $Z \in \mathbb{R}^{N \times K}$, where $z_{ik}$ represents the cluster assignment of the vertex $x_i$:

$$z_{ik} = \begin{cases} 1/\sqrt{\mathrm{Vol}(C_k)} & \text{if } x_i \in C_k \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
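A literal construction of $Z$ from a hard cluster assignment might look as follows; the `clusters` vector mapping each vertex to a cluster id is a hypothetical input used only to illustrate Eq. (2).

```python
import numpy as np

def assignment_matrix(S, clusters):
    """Build Z in R^{N x K} of Eq. (2) from a hard clustering.
    `clusters[i]` is the cluster id of vertex x_i in {0, ..., K-1};
    z_ik = 1/sqrt(Vol(C_k)) if x_i is in C_k, and 0 otherwise."""
    clusters = np.asarray(clusters)
    d = S.sum(axis=1)                       # vertex degrees
    N, K = len(clusters), int(clusters.max()) + 1
    Z = np.zeros((N, K))
    for k in range(K):
        members = np.flatnonzero(clusters == k)
        Z[members, k] = 1.0 / np.sqrt(d[members].sum())   # 1 / sqrt(Vol(C_k))
    return Z
```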