and He, 2021), and feature decorrelation (Ermolov et al., 2021; Hua et al., 2021; Zbontar et al., 2021).
However, theoretical understanding of how non-contrastive methods avoid collapsed representations is limited, although some preliminary attempts have been made to analyze the training dynamics of non-contrastive methods (Tian et al., 2021). Moreover, most self-supervised learning methods focus on the linear evaluation task, where the training and test data come from the same classes. They do not account for the domain gap between training and test classes, which arises in few-shot learning and cross-domain few-shot learning.
We develop a novel unsupervised representation learning method in which a weighted graph captures unlabeled samples as nodes and similarities between samples as edge weights. Two samples are deemed similar if they are augmentations of the same raw sample, and clustering is accomplished by partitioning the graph. We provide an intuitive understanding of the graph partition problem from the perspective of random walks on the graph, where the transition probability between two vertices is proportional to their similarity. The optimal partition is found by minimizing the total transition probability between clusters. This formulation is linked to the well-known Laplacian eigenmaps in spectral analysis (Belkin and Niyogi, 2003; Meila and Shi, 2000; Shi and Malik, 2000). We replace the locality-preserving projection in Laplacian eigenmaps with deep neural networks for better scalability and flexibility when learning from high-dimensional data such as images.
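As a concrete illustration, the sketch below builds the weight matrix over augmented views, the random-walk transition matrix, and the classical spectral embedding that our deep variant replaces. The function names, the `views_per_sample` parameter, and the unit edge weights are illustrative assumptions, not the exact construction used in this paper.

```python
# Minimal sketch: augmentation graph, random walk, and classical eigenmaps.
import numpy as np

def augmentation_graph(n, views_per_sample=2):
    """Weight matrix over n * views_per_sample augmented views: two views are
    connected (weight 1) iff they are augmentations of the same raw sample,
    matching the similarity notion described above."""
    m = n * views_per_sample
    w = np.zeros((m, m))
    for i in range(n):
        idx = np.arange(i * views_per_sample, (i + 1) * views_per_sample)
        w[np.ix_(idx, idx)] = 1.0
    np.fill_diagonal(w, 0.0)  # no self-loops
    return w

def transition_matrix(w):
    """Random-walk view: P[i, j] = w[i, j] / degree(i), so the transition
    probability between two vertices is proportional to their similarity."""
    return w / w.sum(axis=1, keepdims=True)

def laplacian_eigenmaps(w, k):
    """Classical spectral embedding: the k eigenvectors of the normalized
    Laplacian L = I - D^{-1/2} W D^{-1/2} with the smallest eigenvalues; their
    zero-eigenvalue span indicates connected components, i.e. clusters. The
    deep variant replaces this closed-form projection with a neural network."""
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    l_sym = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(l_sym)  # eigenvalues in ascending order
    return eigvecs[:, :k]
```

Minimizing the total between-cluster transition probability under this random walk recovers the normalized-cut objective, which the bottom eigenvectors above relax.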
An additional technique is integrated into deep Laplacian eigenmaps to handle the domain gap between the meta-training and meta-testing sets. Previous studies on word embeddings (e.g., king - man + woman ≈ queen) (Mikolov et al., 2013) and disentangled generative models (Karras et al., 2019) show that interpolation between latent embeddings may correspond to the representation of a realistic sample that is not seen in the training data, whereas interpolation in the input space does not yield realistic samples. Relatedly, once the feature extractor is trained, interpolating between the distributions of the two nearest meta-training classes in the embedding space can approximate the distribution of a meta-testing class (Yang et al., 2021). To enhance performance on downstream few-shot learning tasks, we therefore interpolate unlabeled meta-training samples on the data manifold to mimic unseen meta-test samples and integrate them into the unsupervised training of the feature extractor.
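A minimal sketch of such embedding-space interpolation follows, assuming a PyTorch feature extractor `f`. The Beta(alpha, alpha) mixing distribution and the random pairing of samples within a batch are illustrative assumptions; the paper's actual scheme operates on unlabeled samples without these specific choices.

```python
# Hedged sketch: mixing embeddings of unlabeled meta-training samples to
# stand in for unseen meta-test samples during feature-extractor training.
import torch

def interpolate_embeddings(f, x, alpha=2.0):
    """x: (n, ...) batch of unlabeled meta-training samples.
    Returns the real embeddings together with convex combinations of
    randomly paired embeddings that mimic unseen meta-test samples."""
    z = f(x)                                                # (n, d) embeddings
    lam = torch.distributions.Beta(alpha, alpha).sample((z.size(0), 1))
    lam = lam.to(z.device)
    z_mix = lam * z + (1.0 - lam) * z[torch.randperm(z.size(0))]
    return torch.cat([z, z_mix], dim=0)                     # both enter the loss
```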
Our contributions are summarized as follows:
• A new unsupervised few-shot learning method is developed based on deep Laplacian eigenmaps, with an intuitive explanation based on random walks.
• Our loss function is analyzed to show how collapsed representations are avoided without explicit comparison to negative samples, shedding light on existing feature-decorrelation-based self-supervised learning methods.
• The proposed method significantly closes the performance gap between unsupervised and supervised few-shot learning methods.
• Our method achieves performance comparable to current state-of-the-art (SOTA) self-supervised learning methods under the linear evaluation protocol.
2 Methodology
2.1 Graph from augmented data
First, we construct a graph using augmented views of unlabeled data. Let $\bar{x} \in \mathbb{R}^d$ be a raw sample without augmentation. For image data, augmented views are created by the commonly