In contrast to conventional deterministic representation modelling, we model representations as random variables with learnable parameters and represent prototypes in the form of distributions. We adopt a multivariate Gaussian distribution for both representations and prototypes. An illustration of the proposed probabilistic representations and distribution prototypes is shown in Figure 1.
The involvement of probability is shown in $z \sim p(z|x)$. The pixel of the fuzzy train carriage $x_i$ is mapped to $z_i$ in the latent space, which contains two parts: the most likely representation $\mu$ and the probability $\sigma^2$, corresponding to the mean and variance of the distribution, respectively. Similarly, the pixels of the car $x_{j1}$ and $x_{j2}$ are mapped to $z_{j1}$ and $z_{j2}$, respectively. For comparison, the deterministic mapping is shown as $z = h(f(x))$. Consider the scenario where the distance from representation $z_i$ to prototype $\rho_i$ is the same as the distance from $z_i$ to $\rho_j$: with deterministic representations, the mapping of $z_i$ to either $\rho_i$ or $\rho_j$ is ambiguous. In contrast, $z_i$ is mapped to $\rho_i$ under the probabilistic representation, since $\rho_i$ has a smaller $\sigma^2$ than $\rho_j$. Note that $\sigma^2$ is inversely proportional to the probability, which implies that the mapping from $z_i$ to $\rho_i$ is more reliable. Furthermore, $z_{j1}$ and $z_{j2}$ contribute to the car prototype $\rho_j$ to different degrees. By taking the probability of representations into consideration, the prototypes can be estimated more accurately. Meanwhile, the variance $\sigma^2$ is constrained during the training procedure, which further improves the reliability of the representations and prototypes.
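As a concrete illustration (a minimal sketch under our own assumptions, not the paper's exact architecture: the layer sizes, the 1x1-convolution form of the MLP, and the softplus parameterization of the variance are all placeholders), a probability head attached to the encoder can output a per-pixel mean $\mu$ and variance $\sigma^2$:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticHead(nn.Module):
    """Minimal sketch of a probability head: maps per-pixel encoder features
    to the parameters (mu, sigma^2) of a diagonal Gaussian representation.
    Layer sizes and the softplus parameterization are illustrative assumptions."""

    def __init__(self, in_dim: int = 256, emb_dim: int = 64):
        super().__init__()
        # 1x1 convolutions act as a per-pixel MLP over the feature map
        self.mu_head = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, emb_dim, kernel_size=1))
        self.var_head = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, emb_dim, kernel_size=1))

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) encoder features
        mu = self.mu_head(feats)                          # most likely representation
        sigma2 = F.softplus(self.var_head(feats)) + 1e-6  # strictly positive variance
        return mu, sigma2
```

Keeping the variance strictly positive (here via softplus plus a small constant) is what allows $\sigma^2$ to be read as a reliability signal later on.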
In this paper, we define pixel-wise representations and prototypes from a new perspective of probability theory and design a new framework for pixel-wise Probabilistic Representation Contrastive Learning, named PRCL. Our key insight is to: (i) incorporate probability modelling into the representations and prototypes, and (ii) explore a more accurate similarity measurement between probabilistic representations and prototypes. For the first objective (i), we attach a probability head (a multilayer perceptron, MLP) to the encoder to predict the probabilities of representations and construct a distribution prototype with probabilistic representations as observations based on Bayesian estimation (Vaseghi 2008). In the latent space, each prototype is represented as a distribution rather than a point, which enables it to capture uncertainty. For objective (ii), we leverage the mutual likelihood score (MLS) (Shi and Jain 2019) to directly compute the similarity between probabilistic representations and distribution prototypes. MLS naturally adjusts the weight of each distance according to the uncertainty, i.e., it penalizes ambiguous representations and emphasizes reliable ones. By taking advantage of the confidence information contained in probabilistic representations, the model's robustness to inaccurate pseudo-labels is significantly enhanced, leading to stable training. In addition, we propose a soft freezing strategy to optimize the probability head, which prevents the predicted probability from converging sharply to $\infty$ when trained without constraint.
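For concreteness, the mutual likelihood score between two diagonal Gaussians has a closed form (Shi and Jain 2019); the sketch below (tensor names and shapes are illustrative assumptions) shows how the similarity between a probabilistic representation and a distribution prototype could be computed:

```python
import math
import torch

def mutual_likelihood_score(mu1, var1, mu2, var2):
    """Mutual likelihood score between two diagonal Gaussians (Shi and Jain 2019):
    log p(z1 = z2). Larger values mean the two distributions are more likely to
    describe the same latent point. Inputs are (..., D) means and variances."""
    var_sum = var1 + var2
    sq_term = (mu1 - mu2) ** 2 / var_sum   # squared distance scaled by joint uncertainty
    log_term = torch.log(var_sum)          # penalizes large (ambiguous) variances
    d = mu1.shape[-1]
    return -0.5 * (sq_term + log_term).sum(dim=-1) - 0.5 * d * math.log(2 * math.pi)

# Example: similarity between a pixel representation and a class prototype
mu_z, var_z = torch.randn(64), torch.rand(64) + 1e-6   # probabilistic representation
mu_p, var_p = torch.randn(64), torch.rand(64) + 1e-6   # distribution prototype
score = mutual_likelihood_score(mu_z, var_z, mu_p, var_p)
```

Because the squared distance is divided by the summed variances and a log-variance penalty is added, ambiguous (high-variance) representations receive lower scores, which is the tolerance to noisy pseudo-labels described above.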
In summary, we propose to alleviate the negative effects of inaccurate pseudo-labels by introducing probabilistic representations with the PRCL framework, which reduces the contribution of representations with high uncertainty and concentrates on more reliable ones in contrastive learning. To the best of our knowledge, we are the first to simultaneously train the representation and its probability. Extensive evaluations on Pascal VOC (Everingham et al. 2010) and Cityscapes (Cordts et al. 2016) demonstrate superior performance over the SOTA baselines.
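To make the down-weighting of unreliable representations concrete, the following sketch shows one standard way to fuse probabilistic representations into a Gaussian distribution prototype: a precision-weighted average under a flat prior. This is a generic Gaussian fusion rule offered purely as an illustration; the paper's exact Bayesian update may differ.

```python
import torch

def estimate_prototype(mu, var):
    """Fuse per-pixel Gaussian representations of one class into a distribution
    prototype via precision-weighted averaging (standard Gaussian fusion under a
    flat prior; the paper's exact Bayesian estimation may differ).
    mu, var: (N, D) means and variances of N representations assigned to the class."""
    precision = 1.0 / var                               # reliable observations get large weight
    proto_var = 1.0 / precision.sum(dim=0)              # fused (prototype) variance
    proto_mu = proto_var * (precision * mu).sum(dim=0)  # precision-weighted mean
    return proto_mu, proto_var

# Example: the low-variance observation dominates the prototype mean
mu = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
var = torch.tensor([[0.01, 0.01], [1.0, 1.0], [0.05, 0.05]])
proto_mu, proto_var = estimate_prototype(mu, var)
```

In the example, the confident observations pull the prototype toward themselves, mirroring the idea that reliable representations contribute more to prototype estimation.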
Related Work
Semi-supervised semantic segmentation
The goal of semantic segmentation is to classify each pixel
in an entire image by class. The training of such dense pre-
diction tasks relies on large amounts of data and tedious
manual annotations. Semi-supervised learning is a label-efficient paradigm that exploits a large amount of unlabeled data to improve model performance. Entropy
minimization (Hung et al. 2018; Ke et al. 2020a) and consis-
tency regularization (Ouali, Hudelot, and Tami 2020; Peng
et al. 2020; Fan et al. 2022) are two main branches. Recently,
self-training methods benefit from strong data augmenta-
tion (French et al. 2020; Olsson et al. 2021; Hu et al. 2021)
and well-refined pseudo-labels (Sohn et al. 2020; Feng et al.
2022). Besides, some methods (Guan et al. 2022) that balance subclass distributions are competitive in certain scenarios. Recent works based on self-training (Liu et al. 2021; Wang et al. 2022; Xie et al. 2022) attempt to regularize representations in the latent space to obtain a better embedding distribution. This improves the quality of features and leads to better model performance, which is also our goal.
Contrastive Learning
As a major branch of metric learning, contrastive learning aims to pull positive pairs close and push negative pairs apart in the latent feature space through a contrastive loss. At the instance level, it treats each image as
a single class and distinguishes the image from others in
multiple views (Wu et al. 2018; Ye et al. 2019; Chen et al.
2020; He et al. 2020; Grill et al. 2020). To alleviate the neg-
ative impact of sampling bias, some works (Chuang et al.
2020) try to correct for the sampling of same-label data, even without access to the true labels. Furthermore, in some
supervised or semi-supervised settings, some works (Zhao
et al. 2022) introduce class information to train models to
distinguish between classes. At the pixel level, pixel-wise
representations are distinguished by labels or pseudo-labels
(Lai et al. 2021; Wang et al. 2021). However, in the semi-
supervised setting, only a small amount of labeled data is
available. Most pixels are therefore partitioned according to pseudo-labels, and inaccurate pseudo-labels lead to disorder in the latent space. To address this issue, previous methods (Liu et al.
2021; Alonso et al. 2021) try to polish pseudo-labels. In our
approach, we focus on tolerating inaccurate pseudo-labels
rather than filtering them.
Probabilistic Embedding
Probabilistic Embeddings (PE) are an extension of conventional embeddings. PE methods usually predict the overall distribution of the embeddings, e.g., Gaussian (Shi and Jain 2019) or von Mises-Fisher (Li et al. 2021), rather