In contrast to conventional deterministic representation modelling, we model representations as random variables with learnable parameters and represent prototypes in the form of distributions. We adopt a multivariate Gaussian distribution for both representations and prototypes. An illustration of the proposed probabilistic representations and distribution prototypes is shown in Figure 1.
The involvement of probability is shown in $z \sim p(z|x)$. The pixel of the fuzzy train carriage $x_i$ is mapped to $z_i$ in the latent space, which contains two parts: the most likely representation $\mu$ and the probability $\sigma^2$, corresponding to the mean and variance of the distribution, respectively. Similarly, the pixels of the car $x_{j1}$ and $x_{j2}$ are mapped to $z_{j1}$ and $z_{j2}$, respectively. For comparison, the deterministic mapping is shown as $z = h(f(x))$. Consider the scenario where the distance from representation $z_i$ to prototype $\rho_i$ is the same as the distance from $z_i$ to $\rho_j$: with deterministic representations, the mapping of $z_i$ to either $\rho_i$ or $\rho_j$ is ambiguous. In contrast, $z_i$ is mapped to $\rho_i$ under the probabilistic representation, since $\rho_i$ has a smaller $\sigma^2$ than $\rho_j$. Note that $\sigma^2$ is inversely proportional to the probability, which implies that the mapping from $z_i$ to $\rho_i$ is more reliable. Furthermore, $z_{j1}$ and $z_{j2}$ contribute to the car prototype $\rho_j$ to different degrees. By taking the probability of representations into consideration, the prototypes can be estimated more accurately. Meanwhile, the variance $\sigma^2$ is constrained during the training procedure, which further improves the reliability of the representations and prototypes.
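As a concrete illustration (a minimal sketch under our own assumptions, not the paper's exact architecture: the layer sizes, the 1x1-convolution form of the MLP, and the softplus parameterization of the variance are all placeholders), a probability head attached to the encoder can output a per-pixel mean $\mu$ and variance $\sigma^2$:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticHead(nn.Module):
    """Minimal sketch of a probability head: maps per-pixel encoder features
    to the parameters (mu, sigma^2) of a diagonal Gaussian representation.
    Layer sizes and the softplus parameterization are illustrative assumptions."""

    def __init__(self, in_dim: int = 256, emb_dim: int = 64):
        super().__init__()
        # 1x1 convolutions act as a per-pixel MLP over the feature map
        self.mu_head = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, emb_dim, kernel_size=1))
        self.var_head = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, emb_dim, kernel_size=1))

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) encoder features
        mu = self.mu_head(feats)                          # most likely representation
        sigma2 = F.softplus(self.var_head(feats)) + 1e-6  # strictly positive variance
        return mu, sigma2
```

Keeping the variance strictly positive (here via softplus plus a small constant) is what allows $\sigma^2$ to be read as a reliability signal later on.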
In this paper, we define pixel-wise representations and prototypes from a new perspective of probability theory and design a new framework for pixel-wise Probabilistic Representation Contrastive Learning, named PRCL. Our key insight is to: (i) incorporate probability modelling into the representations and prototypes, and (ii) explore a more accurate similarity measurement between probabilistic representations and prototypes. For the first objective (i), we attach a probability head (a multilayer perceptron, MLP) to the encoder to predict the probabilities of representations and construct a distribution prototype with probabilistic representations as observations based on Bayesian estimation (Vaseghi 2008). In the latent space, each prototype is represented as a distribution rather than a point, which enables it to capture uncertainty. For objective (ii), we leverage the mutual likelihood score (MLS) (Shi and Jain 2019) to directly compute the similarity between probabilistic representations and distribution prototypes. MLS naturally adjusts the weight of each distance according to the uncertainty, i.e., it penalizes ambiguous representations and emphasizes reliable ones. By taking advantage of the confidence information contained in probabilistic representations, the model's robustness to inaccurate pseudo-labels is significantly enhanced, leading to stable training. In addition, we propose a soft freezing strategy to optimize the probability head, which prevents the predicted probability from converging sharply to $\infty$ when trained without constraint.
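For concreteness, the mutual likelihood score between two diagonal Gaussians has a closed form (Shi and Jain 2019); the sketch below (tensor names and shapes are illustrative assumptions) shows how the similarity between a probabilistic representation and a distribution prototype could be computed:

```python
import math
import torch

def mutual_likelihood_score(mu1, var1, mu2, var2):
    """Mutual likelihood score between two diagonal Gaussians (Shi and Jain 2019):
    log p(z1 = z2). Larger values mean the two distributions are more likely to
    describe the same latent point. Inputs are (..., D) means and variances."""
    var_sum = var1 + var2
    sq_term = (mu1 - mu2) ** 2 / var_sum   # squared distance scaled by joint uncertainty
    log_term = torch.log(var_sum)          # penalizes large (ambiguous) variances
    d = mu1.shape[-1]
    return -0.5 * (sq_term + log_term).sum(dim=-1) - 0.5 * d * math.log(2 * math.pi)

# Example: similarity between a pixel representation and a class prototype
mu_z, var_z = torch.randn(64), torch.rand(64) + 1e-6   # probabilistic representation
mu_p, var_p = torch.randn(64), torch.rand(64) + 1e-6   # distribution prototype
score = mutual_likelihood_score(mu_z, var_z, mu_p, var_p)
```

Because the squared distance is divided by the summed variances and a log-variance penalty is added, ambiguous (high-variance) representations receive lower scores, which is the tolerance to noisy pseudo-labels described above.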
In summary, we propose to alleviate the negative effects of inaccurate pseudo-labels by introducing probabilistic representations with the PRCL framework, which reduces the contribution of representations with high uncertainty and concentrates on more reliable ones in contrastive learning. To the best of our knowledge, we are the first to simultaneously train the representation and its probability. Extensive evaluations on Pascal VOC (Everingham et al. 2010) and Cityscapes (Cordts et al. 2016) demonstrate superior performance over the SOTA baselines.
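To make the down-weighting of unreliable representations concrete, the following sketch shows one standard way to fuse probabilistic representations into a Gaussian distribution prototype: a precision-weighted average under a flat prior. This is a generic Gaussian fusion rule offered purely as an illustration; the paper's exact Bayesian update may differ.

```python
import torch

def estimate_prototype(mu, var):
    """Fuse per-pixel Gaussian representations of one class into a distribution
    prototype via precision-weighted averaging (standard Gaussian fusion under a
    flat prior; the paper's exact Bayesian estimation may differ).
    mu, var: (N, D) means and variances of N representations assigned to the class."""
    precision = 1.0 / var                               # reliable observations get large weight
    proto_var = 1.0 / precision.sum(dim=0)              # fused (prototype) variance
    proto_mu = proto_var * (precision * mu).sum(dim=0)  # precision-weighted mean
    return proto_mu, proto_var

# Example: the low-variance observation dominates the prototype mean
mu = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
var = torch.tensor([[0.01, 0.01], [1.0, 1.0], [0.05, 0.05]])
proto_mu, proto_var = estimate_prototype(mu, var)
```

In the example, the confident observations pull the prototype toward themselves, mirroring the idea that reliable representations contribute more to prototype estimation.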
Related Work
Semi-supervised semantic segmentation
The goal of semantic segmentation is to classify each pixel
in an entire image by class. The training of such dense pre-
diction tasks relies on large amounts of data and tedious
manual annotations. Semi-supervised learning is a label-efficient paradigm that exploits a large amount of unlabeled data to improve model performance. Entropy
minimization (Hung et al. 2018; Ke et al. 2020a) and consis-
tency regularization (Ouali, Hudelot, and Tami 2020; Peng
et al. 2020; Fan et al. 2022) are two main branches. Recently,
self-training methods benefit from strong data augmenta-
tion (French et al. 2020; Olsson et al. 2021; Hu et al. 2021)
and well-refined pseudo-labels (Sohn et al. 2020; Feng et al.
2022). Besides, some methods (Guan et al. 2022) that balance subclass distributions are competitive in certain scenarios. Recent works based on self-training (Liu et al. 2021; Wang et al. 2022; Xie et al. 2022) attempt to regularize representations in the latent space to obtain a better embedding distribution. This improves the quality of features and leads to better model performance, which is also our goal.
Contrastive Learning
As a major branch of metric learning, contrastive learning aims to pull positive pairs close and push negative pairs apart in the latent feature space through a contrastive loss. At the instance level, it treats each image as
a single class and distinguishes the image from others in
multiple views (Wu et al. 2018; Ye et al. 2019; Chen et al.
2020; He et al. 2020; Grill et al. 2020). To alleviate the neg-
ative impact of sampling bias, some works (Chuang et al.
2020) try to correct for the sampling of same-label data, even without access to the true labels. Furthermore, in some
supervised or semi-supervised settings, some works (Zhao
et al. 2022) introduce class information to train models to
distinguish between classes. At the pixel level, pixel-wise
representations are distinguished by labels or pseudo-labels
(Lai et al. 2021; Wang et al. 2021). However, in the semi-
supervised setting, only a small amount of labeled data is
available. Most pixels are therefore partitioned according to pseudo-labels, and inaccurate pseudo-labels lead to disorder in the latent space. To address this issue, previous methods (Liu et al.
2021; Alonso et al. 2021) try to polish pseudo-labels. In our
approach, we focus on tolerating inaccurate pseudo-labels
rather than filtering them.
Probabilistic Embedding
Probabilistic Embeddings (PE) are an extension of conventional embeddings. PE methods usually predict the overall distribution of the embeddings, e.g., Gaussian (Shi and Jain 2019) or von Mises-Fisher (Li et al. 2021), rather