Semi-supervised Semantic Segmentation with
Prototype-based Consistency Regularization
Hai-Ming Xu1, Lingqiao Liu1, Qiuchen Bian2, Zhen Yang3
1Australian Institute for Machine Learning, The University of Adelaide,
2Northeastern University, 3Huawei Noah’s Ark Lab
{hai-ming.xu, lingqiao.liu}@adelaide.edu.au, bian.qiu@northeastern.edu, yang.zhen@huawei.com
Abstract
Semi-supervised semantic segmentation requires the model to effectively propagate label information from a limited number of annotated images to unlabeled ones. A challenge for such a per-pixel prediction task is the large intra-class variation, i.e., regions belonging to the same class may exhibit very different appearance even in the same picture. This diversity makes it hard to propagate label information from pixel to pixel. To address this problem, we propose a novel approach that regularizes the distribution of within-class features to ease label propagation. Specifically, our approach encourages consistency between the prediction of a linear predictor and the output of a prototype-based predictor, which implicitly encourages features from the same pseudo-class to be close to at least one within-class prototype while staying far from the other between-class prototypes. By further incorporating CutMix operations and a carefully designed prototype maintenance strategy, we create a semi-supervised semantic segmentation algorithm that demonstrates superior performance over state-of-the-art methods in extensive experimental evaluation on both the Pascal VOC and Cityscapes benchmarks (code is available at https://github.com/HeimingX/semi_seg_proto).
1 Introduction
Semantic segmentation is a fundamental task in computer vision and has been widely used in many vision applications [34, 2, 31]. Despite the advances, most existing successful semantic segmentation systems [27, 6, 9, 48] are supervised and require a large amount of annotated data, whose collection is a time-consuming and costly process. Semi-supervised semantic segmentation [51, 46, 32, 21, 8, 47, 20, 40] is a promising solution to this problem: it requires only a limited number of annotated images and aims to learn from both labeled and unlabeled data to improve segmentation performance.
Recent studies in semi-supervised learning suggest that pseudo-labeling [25, 1, 45] and consistency-based regularization [24, 3, 42] are two effective schemes for leveraging unlabeled data. These two schemes are often integrated into a teacher-student learning paradigm: the teacher model generates pseudo labels to train a student model that takes a perturbed input [36]. In such a scheme, and for most pseudo-labeling-based approaches, the key to success is how to effectively propagate labels from the limited annotated images to the unlabeled ones. A challenge for the semi-supervised semantic segmentation task is the large intra-class variation, i.e., regions belonging to the same class may exhibit very different appearance even in the same picture. This diversity makes it hard to propagate label information from pixel to pixel.
In this paper, we propose a novel approach to regularize the distribution of within-class features to ease the label propagation difficulty. Our method adopts two segmentation heads (a.k.a. predictors):
a standard linear predictor and a prototype-based predictor. The former has learnable parameters that can be updated through back-propagation, while the latter relies on a set of prototypes that are essentially local mean vectors computed as running averages. Our key idea is to encourage consistency between the prediction of the linear predictor and the output of the prototype-based predictor. Such a scheme implicitly regularizes the feature representation: features from the same class must be close to at least one class prototype while staying far from the prototypes of other classes. We further incorporate the CutMix operation [44] to ensure such consistency is also preserved for perturbed (mixed) input images, which enhances the robustness of the feature representation. This gives rise to a new semi-supervised semantic segmentation algorithm that adds only one extra consistency loss to a state-of-the-art framework and can be readily plugged into other semi-supervised semantic segmentation methods. Despite its simplicity, it demonstrates remarkable improvement over the baseline approach and competitive results compared to state-of-the-art approaches, as shown in our experimental study.
2 Related Work
Semi-supervised Learning has made great progress in recent years thanks to its economical learning philosophy [50]. The success of most semi-supervised learning research can be attributed to two learning schemes: pseudo-labeling and consistency regularization. Pseudo-labeling based methods [25, 5, 1, 45] train the model on unlabeled samples with pseudo labels generated from the up-to-date optimized model, while consistency regularization based methods [24, 37, 39, 3, 42] build upon the smoothness assumption [28] and encourage the model to make consistent predictions on the same example under different perturbations. The recently proposed FixMatch [36] successfully combines these two techniques to produce state-of-the-art classification performance. Our approach draws on the successful experience of general semi-supervised learning and applies it to the semi-supervised semantic segmentation task.
Semi-supervised Semantic Segmentation benefits from the development of general semi-supervised learning, and various semi-supervised semantic segmentation algorithms have been proposed. For example, PseudoSeg [51] utilizes the Grad-CAM [33] trick to calibrate the generated pseudo-labels for segmentation network training, while CPS [8] builds two parallel networks that generate cross pseudo labels for each other. CutMix-Seg [14] introduces the CutMix augmentation into semantic segmentation to construct consistency constraints on unlabeled samples. Alternatively, CCT [32] inserts perturbations into the manifold feature representation to enforce consistent predictions, and U2PL [40] proposes to make sufficient use of unreliable pseudo supervision. Meanwhile, several works [20, 19, 15] address the class-imbalance problem of semi-supervised semantic segmentation. Our approach is inspired by the observation that large intra-class variation hinders the propagation of label information from pixel to pixel in semi-supervised semantic segmentation, and we propose a prototype-based consistency regularization method to alleviate this problem, which is novel in the related literature.
Prototype-based Learning has been well studied in the machine learning area [17]. The nearest neighbors algorithm [11] is one of the earliest works to explore the use of prototypes. Recently, researchers have successfully applied prototype-based learning to various problems, e.g., prototypical networks [35] for few-shot learning and prototype-based classifiers for semantic segmentation [48]. Our work further introduces prototype-based learning into the semi-supervised setting and demonstrates its effectiveness.
3 Our Approach
In this section, we first give an overview of our approach and then introduce the core concept of prototype-based consistency regularization for semi-supervised semantic segmentation. Finally, we describe how the prototypes are constructed and maintained throughout the learning process.
3.1 Preliminary
Problem setting: Given a set of labeled training images $\mathcal{D}_l=\{(I^l_i, Y^l_i)\}_{i=1}^{N_l}$ and a set of unlabeled images $\mathcal{D}_u=\{I^u_i\}_{i=1}^{N_u}$, where $N_u \gg N_l$, semi-supervised semantic segmentation aims to learn a segmentation model from both the labeled and unlabeled images. We use $\tilde{Y}$ to denote the segmentation output and $\tilde{Y}[a, b]$ to indicate the output at the $(a, b)$ coordinate.
Figure 1: Overview of our method. Our method is built upon the popular student-teacher framework with CutMix operations. In addition to the existing modules in such a framework, we further introduce a prototype-based predictor for the student model. The output $p^{\text{prototype}}_s$ of the prototype-based predictor is supervised with the pseudo-label generated from the linear predictor of the teacher model. This consistency regularization encourages features from the same class to be closer to each other than to features of other classes and eases the difficulty of propagating label information from pixel to pixel. This simple modification brings a significant improvement.
Overview: The overall structure of the proposed method is shown in Figure 1. Our approach is built on top of the popular student-teacher framework for semi-supervised learning [37, 36, 49, 29, 45]. During training, the teacher model's predictions are selectively used as pseudo-labels to supervise the student model; in other words, back-propagation is performed on the student model only. More specifically, the parameters of the teacher network are the exponential moving average (EMA) of the student network parameters [37]. Following common practice [36], we also adopt the weak-strong augmentation paradigm by feeding the teacher model weakly-augmented images and the student strongly-augmented images. In the context of image segmentation, we take normal data augmentation (i.e., random crop and random horizontal flip of the input image) as the weak augmentation and CutMix [44] as the strong data augmentation.
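To make the teacher update concrete, below is a minimal PyTorch-style sketch of the EMA step (our own illustration, not the authors' released code; the function name and the decay value 0.99 are assumptions):

import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    # Teacher parameters track an exponential moving average of the student's;
    # only the student receives gradients, so this runs after every optimizer step.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.data.mul_(decay).add_(p_s.data, alpha=1.0 - decay)
    # Buffers (e.g., BatchNorm running statistics) are simply copied over.
    for b_t, b_s in zip(teacher.buffers(), student.buffers()):
        b_t.data.copy_(b_s.data)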
The key difference between our method and existing methods [14, 32, 43, 8, 40] is the use of both a linear predictor (in both the teacher and student models) and a prototype-based predictor (in the student model only). As will be explained in the following section, the prediction from the teacher model's linear predictor is used to create pseudo labels that supervise the training of the student model's prototype-based predictor. This process acts as a regularization that benefits label information propagation.
3.2 Prototype-based Predictor for Semantic Segmentation
The prototype-based classifier is a long-standing technique in machine learning [22, 4]. From its early forms, the nearest neighbour classifier and the nearest mean classifier, to prototypical networks in the few-shot learning literature [35], the idea of using prototypes instead of a parameterized classifier has been widely adopted in many fields. Very recently, a prototype-based variant has been introduced into the semantic segmentation task [48] and proved effective under the fully-supervised setting. Formally, a prototype-based classifier/predictor makes predictions by comparing test samples with a set of prototypes. A prototype can be a sample feature or the average of a set of sample features of the same class. Without loss of generality, we denote the prototype set as $\mathcal{P}=\{(p_i, y_i)\}$, where $p_i$ indicates the prototype and $y_i$ is its associated class. Note that the number of prototypes can be larger than the number of classes; in other words, one class can have multiple prototypes to model its diversity. More formally, with the prototype set, the classification decision can be made by
$$\tilde{y} = y_k \quad \text{s.t.} \quad k = \arg\max_i \, \mathrm{sim}(x, p_i), \qquad (1)$$
where $\mathrm{sim}(\cdot,\cdot)$ represents the similarity metric function, e.g., cosine similarity, and $\tilde{y}$ denotes the class assignment for the test sample $x$. The posterior probability of assigning a sample to the $c$-th class can also be estimated with the prototype-based classifier via:
$$p^{\text{prototype}}(y=c \mid x) = \frac{\exp\big(\max_{i \mid y_i=c} \mathrm{sim}(p_i, x)/T\big)}{\sum_{t=1}^{C} \exp\big(\max_{j \mid y_j=t} \mathrm{sim}(p_j, x)/T\big)}, \qquad (2)$$
where $T$ is the temperature parameter and can be empirically set. Note that Eq. 2 essentially uses the maximal similarity between a sample and the prototypes of a class as the similarity between the sample and that class.
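To illustrate Eqs. 1 and 2, the following PyTorch-style sketch (our own minimal illustration; the tensor shapes, variable names, and the assumption that every class has at least one prototype are ours) computes cosine similarities between per-pixel features and a prototype bank, takes the per-class maximum, and converts the result into class probabilities with temperature $T$:

import torch
import torch.nn.functional as F

def prototype_predict(feats, prototypes, proto_labels, num_classes, T=0.1):
    # feats:        (N, D) per-pixel features
    # prototypes:   (M, D) prototype bank, possibly several prototypes per class
    # proto_labels: (M,)   class index of each prototype (every class represented)
    feats = F.normalize(feats, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    sim = feats @ prototypes.t()                                # (N, M) cosine similarities
    # per-class similarity: max_{i | y_i = c} sim(x, p_i)
    class_sim = sim.new_full((feats.size(0), num_classes), float('-inf'))
    for c in range(num_classes):
        class_sim[:, c] = sim[:, proto_labels == c].max(dim=1).values
    probs = torch.softmax(class_sim / T, dim=1)                 # Eq. 2
    pred = class_sim.argmax(dim=1)                              # Eq. 1 (hard assignment)
    return probs, pred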
3.3 Consistency Between Linear Predictor and Prototype-based Predictor
Although both prototype-based classifiers and linear classifiers can be used for semantic segmentation [48], they have quite different characteristics due to the nature of their decision-making process. Specifically, linear classifiers allocate learnable parameters (i.e., parameters that can be updated via back-propagation) for each class, while prototype-based classifiers rely solely on a good feature representation such that samples from the same class are close to at least one within-class prototype while staying far from prototypes of other classes. Consequently, linear classifiers can leverage the learnable parameters to focus more on discriminative dimensions of a feature representation while suppressing irrelevant ones, i.e., by assigning higher or lower weights to different dimensions. In contrast, prototype-based classifiers cannot do this and tend to require a more discriminative feature representation.
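The contrast can be seen directly in code. In the toy sketch below (made-up dimensions, e.g., 21 classes as in Pascal VOC), the linear head owns learnable per-class weights that can re-scale individual feature dimensions, whereas the prototype-based head has no trainable parameters and depends entirely on how well the features cluster around the stored prototypes:

import torch
import torch.nn as nn
import torch.nn.functional as F

d, num_classes = 256, 21                           # assumed feature dim / number of classes
feats = torch.randn(4, d, 65, 65)                  # a toy batch of per-pixel features

# Linear predictor: learnable weights (updated by back-propagation) that can
# emphasize discriminative feature dimensions and suppress irrelevant ones.
linear_head = nn.Conv2d(d, num_classes, kernel_size=1, bias=False)
linear_logits = linear_head(feats)                 # (4, 21, 65, 65)

# Prototype-based predictor: no learnable weights, only stored class prototypes;
# the decision is a nearest-prototype lookup on normalized features.
prototypes = F.normalize(torch.randn(num_classes, d), dim=1)   # one prototype per class here
flat = F.normalize(feats.permute(0, 2, 3, 1).reshape(-1, d), dim=1)
proto_scores = (flat @ prototypes.t()).reshape(4, 65, 65, num_classes)
proto_pred = proto_scores.argmax(dim=-1)           # (4, 65, 65) class map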
The different characteristics of prototype-based and linear classifiers motivate us to design a loss that encourages the consistency of their predictions on unlabeled data so as to regularize the feature representation. Our key insight is that a good feature representation should support either type of classifier in making correct predictions. In addition to using two different types of classifiers, we also incorporate the CutMix [44] strategy to enhance the above consistency regularization. CutMix augmentation is a popular ingredient in many state-of-the-art semi-supervised semantic segmentation methods [8, 26, 40]. Specifically, we first apply weak augmentation, e.g., random flip and crop operations, to the input images of the teacher model and obtain pseudo-labels from its linear classifier. Next, we perform the CutMix operation by mixing two unlabeled images, $\mathrm{mix}(I_i, I_j)$, and their associated predictions, $\mathrm{mix}(\tilde{Y}_i, \tilde{Y}_j)$. The mixed image $\mathrm{mix}(I_i, I_j)$ is fed to the student model, and the output of its prototype-based classifier is then enforced to fit the pseudo-labels generated from $\mathrm{mix}(\tilde{Y}_i, \tilde{Y}_j)$.
Algorithm details: As a semi-supervised segmentation algorithm, we apply different loss functions for labeled images and unlabeled images.

For a batch of labeled images $\{(I^l_i, Y^l_i)\}_{i=1}^{B_l} \subset \mathcal{D}_l$, we train both the linear predictor and the prototype-based predictor. The linear classifier $\{w_i\}_{i=1}^{C}$ produces a posterior probability estimation $p^{\text{linear}}_s(Y[a, b] = c \mid I^l_i)$:
$$p^{\text{linear}}_s(Y[a, b] = c \mid I^l_i) = \frac{\exp\big(w_c^{\mathsf T} \cdot F^l_i[a, b]\big)}{\sum_{j=1}^{C} \exp\big(w_j^{\mathsf T} \cdot F^l_i[a, b]\big)}, \qquad (3)$$
where $F^l_i[a, b]$ denotes the feature extracted at location $(a, b)$ of $F^l_i = f(\mathcal{A}_0(I^l_i))$, obtained by first applying the weak data augmentation $\mathcal{A}_0$ to $I^l_i$ and then feeding the result to the feature extractor $f$. Meanwhile, the posterior probability of the prototype-based predictor, $p^{\text{prototype}}_s(Y[a, b] = c \mid I^l_i)$, can be estimated via Eq. 2. We use cosine similarity for $\mathrm{sim}(\cdot,\cdot)$ and empirically set the temperature hyperparameter $T$ to $0.1$. Based on the ground-truth label $Y^l_i$, the student model is optimized with gradients back-propagated from the two predictors simultaneously:
$$\mathcal{L}_l = \mathcal{L}^{\text{linear}}_l + \mathcal{L}^{\text{prototype}}_l, \quad \text{where} \qquad (4)$$
$$\mathcal{L}^{\text{linear}}_l = \frac{1}{B_l} \sum_{i}^{B_l} l_{ce}\big(p^{\text{linear}}_s(Y \mid I^l_i),\, Y^l_i\big); \qquad (5)$$
$$\mathcal{L}^{\text{prototype}}_l = \frac{1}{B_l} \sum_{i}^{B_l} l_{ce}\big(p^{\text{prototype}}_s(Y \mid I^l_i),\, Y^l_i\big). \qquad (6)$$
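As a sketch, Eqs. 3 to 6 amount to averaging a pixel-wise cross-entropy over the two heads. The snippet below is our own simplified illustration: `linear_head` is assumed to be a 1x1 convolution realizing Eq. 3 up to the softmax, `prototype_predict` is again assumed to be a closure over the prototype bank realizing Eq. 2, and the ignore-index convention is an assumption.

import torch
import torch.nn.functional as F

def labeled_loss(feats, labels, linear_head, prototype_predict, ignore_index=255):
    # feats:  (B, D, H, W) features of weakly-augmented labeled images
    # labels: (B, H, W)    ground-truth class indices
    d = feats.size(1)
    # Linear predictor, Eq. 5: cross-entropy on logits (the softmax of Eq. 3 is implicit).
    logits = linear_head(feats)                                        # (B, C, H, W)
    loss_linear = F.cross_entropy(logits, labels, ignore_index=ignore_index)
    # Prototype-based predictor, Eq. 6: cross-entropy on the Eq. 2 probabilities.
    probs, _ = prototype_predict(feats.permute(0, 2, 3, 1).reshape(-1, d))  # (B*H*W, C)
    loss_proto = F.nll_loss(torch.log(probs + 1e-8), labels.reshape(-1),
                            ignore_index=ignore_index)
    return loss_linear + loss_proto                                    # Eq. 4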