Boosting Semi-Supervised Semantic Segmentation with Probabilistic
Representations
Haoyu Xie1, Changqi Wang1, Mingkai Zheng2, 3, Minjing Dong2, Shan You3,
Chong Fu1, 4, Chang Xu 2
1School of Computer Science and Engineering, Northeastern University, Shenyang, China
2School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, Australia
3SenseTime Research, Beijing, China
4Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, China
{xiehaoyu, wangchangqi}@stumail.neu.edu.cn, mingkaizheng@outlook.com, mdon0736@uni.sydney.edu.au,
youshan@sensetime.com, fuchong@mail.neu.edu.cn, c.xu@sydney.edu.au
Abstract
Recent breakthroughs in semi-supervised semantic segmentation have been developed through contrastive learning. In prevalent pixel-wise contrastive learning solutions, the model maps pixels to deterministic representations and regularizes them in the latent space. However, inaccurate pseudo-labels map the ambiguous representations of pixels to the wrong classes due to the limited cognitive ability of the model. In this paper, we define pixel-wise representations from a new perspective of probability theory and propose a Probabilistic Representation Contrastive Learning (PRCL) framework that improves representation quality by taking its probability into consideration. By modelling the mapping from pixels to representations as multivariate Gaussian distributions, we can tune the contribution of ambiguous representations to tolerate the risk of inaccurate pseudo-labels. Furthermore, we define prototypes in the form of distributions, which indicate the confidence of a class, while point prototypes cannot. Moreover, we propose to regularize the distribution variance to enhance the reliability of representations. Taking advantage of these benefits, high-quality feature representations can be derived in the latent space, and the performance of semantic segmentation can be further improved. We conduct extensive experiments on Pascal VOC and Cityscapes to demonstrate the superiority of PRCL. The code is available at https://github.com/Haoyu-Xie/PRCL.
Introduction
Semantic segmentation is a pixel-level classification task, i.e., predicting the class of each pixel. Existing supervised methods rely on large-scale annotated data, which incurs high manual-labeling costs. Semi-supervised learning (Ouali, Hudelot, and Tami 2020; Ke et al. 2020b; Zou et al. 2021; Sohn et al. 2020) takes advantage of unlabeled data and relieves the labor of human annotation. Some methods use unlabeled data to improve segmentation models via adversarial learning (Ke et al. 2020b), consistency regularization (Peng et al. 2020), and self-training (Tarvainen and Valpola 2017). Self-training is a typical solution that uses
the prediction generated by a model trained on labeled data (called a pseudo-label) as the ground truth for training on unlabeled data.

Figure 1: Contradistinction between two types of representations and prototypes. Deterministic representations: left triangle symbol; probabilistic representations: right dotted cross symbol; point prototypes: left filled circle symbols; distribution prototypes: right radial circle symbols.
Recently, powerful methods based on self-training (Liu et al. 2021; Wang et al. 2022) additionally introduce pixel-wise contrastive learning as an auxiliary task to further explore unlabeled data. Contrastive learning benefits not only from the local context of neighbouring pixels, but also from global semantic class relations across the mini-batch or even the entire dataset. These methods map pixels to representations and regularize them in the latent space in a supervised way, i.e., they gather representations belonging to the same class and scatter representations belonging to different classes, where the semantic class information comes from both ground truths and pseudo-labels. Most contrastive learning methods are guided by pseudo-labels in semi-supervised settings. Therefore, the quality of pseudo-labels is critical, since inaccurate pseudo-labels assign representations to the wrong classes and cause disorder in the latent space. Existing efforts try to polish pseudo-labels via either confidence (Liu et al. 2021) or entropy (Feng et al. 2022). These techniques can improve the quality of pseudo-labels and eliminate inaccurate ones to some extent. However, the inherent noise and essential incorrectness in pseudo-labels are rather difficult for existing work to tackle perfectly. Thus, we propose to alleviate this risk from the opposite direction. Specifically, instead of paying more attention to polishing pseudo-labels to eliminate inaccuracy, we propose to improve the quality of representations by modelling probability from data, allowing them to perform better under inaccurate pseudo-labels.
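To make this setup concrete, the sketch below shows a generic pixel-wise supervised contrastive loss in which representations sharing a (pseudo-)label are pulled together and all others are pushed apart. It is a minimal illustration under assumed tensor shapes, a hypothetical function name, and an arbitrary temperature, not the exact loss of any cited method or of PRCL.

```python
# Illustrative sketch (not the exact loss of any cited method): a pixel-wise
# supervised contrastive loss where class assignments come from pseudo-labels.
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(reps, labels, temperature=0.1):
    """
    reps:   (N, D) pixel representations sampled from a mini-batch
    labels: (N,)   class indices from ground truth or pseudo-labels
    """
    reps = F.normalize(reps, dim=1)
    sim = reps @ reps.t() / temperature                     # (N, N) pairwise similarities
    # Positive pairs share the same (pseudo-)label; exclude self-pairs.
    pos_mask = (labels[:, None] == labels[None, :]).float()
    self_mask = torch.eye(len(reps), device=reps.device)
    pos_mask = pos_mask - self_mask
    # Log-softmax over all non-self pairs, averaged over the positives of each anchor.
    logits = sim - 1e9 * self_mask                          # mask out self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count
    return loss.mean()

# Usage: reps from a projection head, labels from the argmax of teacher predictions.
# loss = pixel_contrastive_loss(sampled_pixel_reps, sampled_pseudo_labels)
```

An anchor whose pseudo-label is wrong pulls its representation toward the wrong class under such a loss, which is exactly the failure mode the probabilistic treatment below is meant to tolerate.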
Compared with conventional deterministic representation modelling, we model representations as random variables with learnable parameters and represent prototypes in the form of distributions. We adopt the form of a multivariate Gaussian distribution for both representations and prototypes. An illustration of the proposed probabilistic representations and distribution prototypes is shown in Figure 1. The involvement of probability is shown in z ∼ p(z|x). The pixel of the fuzzy train carriage x_i is mapped to z_i in the latent space, which contains two parts: the most likely representation µ and the probability σ², corresponding to the mean and variance of the distribution respectively. Similarly, the pixels of the car x_j1 and x_j2 are mapped to z_j1 and z_j2 respectively. For comparison, the deterministic mapping is shown as z = h(f(x)). Considering the scenario where the distance from representation z_i to prototype ρ_i is the same as the distance from z_i to ρ_j, there exists an ambiguity in the mapping from z_i to ρ_i and ρ_j with deterministic representations. On the contrary, z_i is mapped to ρ_i with probabilistic representations, since ρ_i has a smaller σ² than ρ_j. Note that σ² is inversely proportional to the probability, which implies that the mapping from z_i to ρ_i is more reliable. Furthermore, z_j1 and z_j2 contribute to the car prototype ρ_j to different degrees. By taking the probability of representations into consideration, the prototypes can be estimated more accurately. Meanwhile, the variance σ² is constrained during the training procedure, which further improves the reliability of the representations and prototypes.
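To illustrate how low-variance (reliable) representations can contribute more to a prototype than ambiguous ones, the sketch below fuses same-class Gaussian representations by precision weighting, the standard Bayesian fusion rule for Gaussian observations. The shapes and function name are assumptions, and this is only an analogue of the prototype estimation described later, not the paper's exact formulation.

```python
# Illustrative sketch: fusing probabilistic representations (mu, sigma^2) of one
# class into a distribution prototype by precision (1 / sigma^2) weighting, so
# that reliable low-variance pixels dominate the estimate. Shapes are assumptions.
import torch

def fuse_prototype(mu, var, eps=1e-6):
    """
    mu:  (N, D) means of N probabilistic representations of one class
    var: (N, D) per-dimension variances (sigma^2) of those representations
    Returns the mean and variance of a distribution prototype.
    """
    precision = 1.0 / (var + eps)                 # reliable pixels -> large weights
    proto_var = 1.0 / precision.sum(dim=0)        # fused variance shrinks with evidence
    proto_mu = proto_var * (precision * mu).sum(dim=0)
    return proto_mu, proto_var

# A pixel with small variance (e.g., a clear car pixel) pulls the car prototype
# toward its mean more strongly than a fuzzy pixel with large variance.
```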
In this paper, we define pixel-wise representations and prototypes from a new perspective of probability theory and design a new framework for pixel-wise Probabilistic Representation Contrastive Learning, named PRCL. Our key insight is to (i) involve probability modelling in the representations and prototypes, and (ii) explore a more accurate similarity measurement between probabilistic representations and prototypes. For the first objective (i), we attach a probability head (a multilayer perceptron, MLP) to the encoder to predict the probabilities of representations, and construct a distribution prototype with probabilistic representations as observations based on Bayesian estimation (Vaseghi 2008). In the latent space, each prototype is represented as a distribution rather than a point, which enables it to capture uncertainty. For objective (ii), we leverage the mutual likelihood score (MLS) (Shi and Jain 2019) to directly compute the similarity between probabilistic representations and distribution prototypes. MLS naturally adjusts the weight of the distance based on the uncertainty, i.e., it penalizes ambiguous representations and vice versa. Taking advantage of the confidence information contained in probabilistic representations, the model's robustness to inaccurate pseudo-labels is significantly enhanced for stable training. In addition, we propose a soft freezing strategy to optimize the probability head, which prevents the probability from converging sharply when trained without constraint.
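As a concrete reference for objective (ii), the mutual likelihood score of two diagonal Gaussian embeddings has a closed form (Shi and Jain 2019); the minimal sketch below computes it between probabilistic representations and distribution prototypes. The function name and tensor shapes are our assumptions rather than the PRCL implementation.

```python
# Illustrative sketch of the mutual likelihood score (MLS) of Shi and Jain (2019)
# for diagonal Gaussian embeddings: a larger (less negative) score means the two
# distributions are more likely to share the same underlying representation.
import math
import torch

def mutual_likelihood_score(mu1, var1, mu2, var2):
    """
    mu1, var1: (N, D) means and variances of probabilistic representations
    mu2, var2: (M, D) means and variances of distribution prototypes
    Returns an (N, M) matrix of MLS values.
    """
    mu1, var1 = mu1[:, None, :], var1[:, None, :]     # (N, 1, D)
    mu2, var2 = mu2[None, :, :], var2[None, :, :]     # (1, M, D)
    var_sum = var1 + var2
    # A large variance (ambiguous representation) damps the squared-distance term
    # but pays a log-variance penalty, so uncertain pixels are down-weighted.
    mls = -0.5 * ((mu1 - mu2) ** 2 / var_sum + torch.log(var_sum)).sum(dim=-1)
    D = mu1.shape[-1]
    return mls - 0.5 * D * math.log(2 * math.pi)

# Usage: scores = mutual_likelihood_score(rep_mu, rep_var, proto_mu, proto_var)
# Such scores can replace cosine similarity inside a contrastive loss.
```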
In summary, we propose to alleviate the negative effects of inaccurate pseudo-labels by introducing probabilistic representations with the PRCL framework, which reduces the contribution of representations with high uncertainty and concentrates on more reliable ones in contrastive learning. To the best of our knowledge, we are the first to simultaneously train the representation and its probability. Extensive evaluations on Pascal VOC (Everingham et al. 2010) and Cityscapes (Cordts et al. 2016) demonstrate superior performance over state-of-the-art baselines.
Related Work
Semi-supervised semantic segmentation
The goal of semantic segmentation is to classify each pixel in an image by class. Training such dense prediction tasks relies on large amounts of data and tedious manual annotation. Semi-supervised learning is a label-efficient setting that takes advantage of a large amount of unlabeled data to improve model performance. Entropy minimization (Hung et al. 2018; Ke et al. 2020a) and consistency regularization (Ouali, Hudelot, and Tami 2020; Peng et al. 2020; Fan et al. 2022) are two main branches. Recently, self-training methods have benefited from strong data augmentation (French et al. 2020; Olsson et al. 2021; Hu et al. 2021) and well-refined pseudo-labels (Sohn et al. 2020; Feng et al. 2022). Besides, some methods (Guan et al. 2022) that balance subclass distributions are competitive in certain scenarios. Recent works based on self-training (Liu et al. 2021; Wang et al. 2022; Xie et al. 2022) attempt to regularize representations in the latent space for a better embedding distribution. This improves the quality of features and leads to better model performance, which is also our goal.
Contrastive Learning
As a major branch of metric learning, the key idea of contrastive learning is to pull positive pairs close and push negative pairs apart in the latent feature space through a contrastive loss. At the instance level, it treats each image as a single class and distinguishes the image from others across multiple views (Wu et al. 2018; Ye et al. 2019; Chen et al. 2020; He et al. 2020; Grill et al. 2020). To alleviate the negative impact of sampling bias, some works (Chuang et al. 2020) try to correct for the sampling of same-label data, even without access to the true labels. Furthermore, in supervised or semi-supervised settings, some works (Zhao et al. 2022) introduce class information to train models to distinguish between classes. At the pixel level, pixel-wise representations are distinguished by labels or pseudo-labels (Lai et al. 2021; Wang et al. 2021). However, in the semi-supervised setting, only a small amount of labeled data is available. Most pixel partitions are based on pseudo-labels, and inaccurate pseudo-labels lead to disorder in the latent space. To address these issues, previous methods (Liu et al. 2021; Alonso et al. 2021) try to polish pseudo-labels. In our approach, we focus on tolerating inaccurate pseudo-labels rather than filtering them.
Probabilistic Embedding
Probabilistic Embeddings (PE) are an extension of conventional embeddings. PE methods usually predict the overall distribution of the embeddings, e.g., Gaussian (Shi and Jain 2019) or von Mises-Fisher (Li et al. 2021), rather than a single deterministic point.