Semi-supervised Semantic Segmentation with
Prototype-based Consistency Regularization
Hai-Ming Xu1, Lingqiao Liu1, Qiuchen Bian2, Zhen Yang3
1Australian Institute for Machine Learning, The University of Adelaide,
2Northeastern University, 3Huawei Noah’s Ark Lab
{hai-ming.xu, lingqiao.liu}@adelaide.edu.au, bian.qiu@northeastern.edu, yang.zhen@huawei.com
Abstract
Semi-supervised semantic segmentation requires the model to effectively propagate label information from a limited number of annotated images to unlabeled ones. A challenge for such a per-pixel prediction task is the large intra-class variation, i.e., regions belonging to the same class may exhibit very different appearance even in the same picture. This diversity makes it hard to propagate label information from pixel to pixel. To address this problem, we propose a novel approach that regularizes the distribution of within-class features to ease label propagation. Specifically, our approach encourages consistency between the prediction of a linear predictor and the output of a prototype-based predictor, which implicitly encourages features from the same pseudo-class to be close to at least one within-class prototype while staying far from the other between-class prototypes. By further incorporating CutMix operations and a carefully designed prototype maintenance strategy, we create a semi-supervised semantic segmentation algorithm that demonstrates superior performance over state-of-the-art methods in extensive experimental evaluation on both the Pascal VOC and Cityscapes benchmarks (code is available at https://github.com/HeimingX/semi_seg_proto).
1 Introduction
Semantic segmentation is a fundamental task in computer vision and has been widely used in many vision applications [34, 2, 31]. Despite the advances, most existing successful semantic segmentation systems [27, 6, 9, 48] are supervised and require a large amount of annotated data, whose collection is a time-consuming and costly process. Semi-supervised semantic segmentation [51, 46, 32, 21, 8, 47, 20, 40] is a promising solution to this problem: it requires only a limited number of annotated images and aims to learn from both labeled and unlabeled data to improve segmentation performance.
Recent studies in semi-supervised learning suggest that pseudo-labeling [25, 1, 45] and consistency-based regularization [24, 3, 42] are two effective schemes for leveraging unlabeled data. These two schemes are often integrated into a teacher-student learning paradigm: the teacher model generates pseudo labels to train a student model that takes a perturbed input [36]. In such a scheme, and for most pseudo-labeling-based approaches, the key to success is how to effectively propagate labels from the limited annotated images to the unlabeled ones. A challenge for the semi-supervised semantic segmentation task is the large intra-class variation, i.e., regions belonging to the same class may exhibit very different appearance even in the same picture. This diversity makes it hard to propagate label information from pixel to pixel.
In this paper, we propose a novel approach to regularize the distribution of within-class features to ease the label propagation difficulty. Our method adopts two segmentation heads (a.k.a. predictors):
a standard linear predictor and a prototype-based predictor. The former has learnable parameters that can be updated through back-propagation, while the latter relies on a set of prototypes that are essentially local mean vectors computed as running averages. Our key idea is to encourage consistency between the prediction of the linear predictor and the output of the prototype-based predictor. Such a scheme implicitly regularizes the feature representation: features from the same class must be close to at least one class prototype while staying far from the prototypes of other classes. We further incorporate the CutMix operation [44] to ensure such consistency is also preserved for perturbed (mixed) input images, which enhances the robustness of the feature representation. This gives rise to a new semi-supervised semantic segmentation algorithm that adds only one extra consistency loss to a state-of-the-art framework and can be readily plugged into other semi-supervised semantic segmentation methods. Despite its simplicity, it demonstrates remarkable improvement over the baseline approach and competitive results compared to state-of-the-art approaches, as shown in our experimental study.
2 Related Work
Semi-supervised Learning has made great progress in recent years thanks to its economical learning philosophy [50]. The success of most semi-supervised learning research can be attributed to two learning schemes: pseudo-labeling and consistency regularization. Pseudo-labeling based methods [25, 5, 1, 45] train the model on unlabeled samples with pseudo labels generated from the up-to-date optimized model, while consistency regularization based methods [24, 37, 39, 3, 42] build upon the smoothness assumption [28] and encourage the model to make consistent predictions on the same example under different perturbations. The recently proposed FixMatch [36] successfully combines these two techniques to produce state-of-the-art classification performance. Our approach draws on the successful experience of general semi-supervised learning and applies it to the semi-supervised semantic segmentation task.
Semi-supervised Semantic Segmentation benefits from the development of general semi-supervised learning, and various semi-supervised semantic segmentation algorithms have been proposed. For example, PseudoSeg [51] utilizes the Grad-CAM [33] trick to calibrate the generated pseudo-labels for segmentation network training, while CPS [8] builds two parallel networks that generate cross pseudo labels for each other. CutMix-Seg [14] introduces the CutMix augmentation into semantic segmentation to construct consistency constraints on unlabeled samples. Alternatively, CCT [32] inserts perturbations into the manifold feature representation to enforce consistent predictions, and U2PL [40] proposes to make sufficient use of unreliable pseudo supervision. Meanwhile, several works [20, 19, 15] address the class-imbalance problem of semi-supervised semantic segmentation. Our approach is inspired by the observation that large intra-class variation hinders the propagation of label information from pixel to pixel in semi-supervised semantic segmentation, and we propose a prototype-based consistency regularization method to alleviate this problem, which is novel in the related literature.
Prototype-based Learning has been well studied in the machine learning area [17]. The nearest neighbors algorithm [11] is one of the earliest works to explore the use of prototypes. Recently, researchers have successfully applied prototype-based learning to various problems, e.g., prototypical networks [35] for few-shot learning and prototype-based classifiers for semantic segmentation [48]. Our work further introduces prototype-based learning into the semi-supervised setting and demonstrates its effectiveness.
3 Our Approach
In this section, we first give an overview of our approach and then introduce the core concept of prototype-based consistency regularization for semi-supervised semantic segmentation. Finally, we describe how the prototypes are constructed and maintained throughout the learning process.
3.1 Preliminary
Problem setting: Given a set of labeled training images $\mathcal{D}_l=\{(I^l_i, Y^l_i)\}_{i=1}^{N_l}$ and a set of unlabeled images $\mathcal{D}_u=\{I^u_i\}_{i=1}^{N_u}$, where $N_u \gg N_l$, semi-supervised semantic segmentation aims to learn a segmentation model from both the labeled and unlabeled images. We use $\tilde{Y}$ to denote the segmentation output and $\tilde{Y}[a, b]$ to indicate the output at the $(a, b)$ coordinate.
Figure 1: Overview of our method. Our method is built upon the popular student-teacher framework with CutMix operations. In addition to the existing modules in such a framework, we further introduce a prototype-based predictor for the student model. The output $p^{\text{prototype}}_s$ of the prototype-based predictor is supervised with the pseudo-label generated from the linear predictor of the teacher model. This consistency regularization encourages features from the same class to be closer to each other than to features of other classes and eases the difficulty of propagating label information from pixel to pixel. This simple modification brings a significant improvement.
Overview: The overall structure of the proposed method is shown in Figure 1. Our approach is built on top of the popular student-teacher framework for semi-supervised learning [37, 36, 49, 29, 45]. During training, the teacher model's predictions are selectively used as pseudo-labels to supervise the student model; in other words, back-propagation is performed on the student model only. More specifically, the parameters of the teacher network are the exponential moving average (EMA) of the student network parameters [37]. Following common practice [36], we also adopt the weak-strong augmentation paradigm by feeding the teacher model weakly-augmented images and the student strongly-augmented images. In the context of image segmentation, we take normal data augmentation (i.e., random crop and random horizontal flip of the input image) as the weak augmentation and CutMix [44] as the strong data augmentation.
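To make the teacher update concrete, below is a minimal PyTorch-style sketch of the EMA step (our own illustration, not the authors' released code; the function name and the decay value 0.99 are assumptions):

import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    # Teacher parameters track an exponential moving average of the student's;
    # only the student receives gradients, so this runs after every optimizer step.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.data.mul_(decay).add_(p_s.data, alpha=1.0 - decay)
    # Buffers (e.g., BatchNorm running statistics) are simply copied over.
    for b_t, b_s in zip(teacher.buffers(), student.buffers()):
        b_t.data.copy_(b_s.data)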
The key difference between our method and existing methods [14, 32, 43, 8, 40] is the use of both a linear predictor (in both the teacher and student models) and a prototype-based predictor (in the student model only). As will be explained in the following section, the prediction from the teacher model's linear predictor is used to create pseudo labels that supervise the training of the student model's prototype-based predictor. This process acts as a regularization that benefits label information propagation.
3.2 Prototype-based Predictor for Semantic Segmentation
The prototype-based classifier is a long-standing technique in machine learning [22, 4]. From its early forms, the nearest neighbour classifier and the nearest mean classifier, to prototypical networks in the few-shot learning literature [35], the idea of using prototypes instead of a parameterized classifier has been widely adopted in many fields. Very recently, a prototype-based variant has been introduced into the semantic segmentation task [48] and proved effective under the fully-supervised setting. Formally, a prototype-based classifier/predictor makes predictions by comparing test samples with a set of prototypes. A prototype can be a sample feature or the average of a set of sample features of the same class. Without loss of generality, we denote the prototype set as $\mathcal{P}=\{(p_i, y_i)\}$, where $p_i$ indicates the prototype and $y_i$ is its associated class. Note that the number of prototypes can be larger than the number of classes; in other words, one class can have multiple prototypes to model its diversity. More formally, with the prototype set, the classification decision can be made by
$$\tilde{y} = y_k \quad \text{s.t.} \quad k = \arg\max_i \, \mathrm{sim}(x, p_i), \qquad (1)$$
where $\mathrm{sim}(\cdot,\cdot)$ represents the similarity metric function, e.g., cosine similarity, and $\tilde{y}$ denotes the class assignment for the test sample $x$. The posterior probability of assigning a sample to the $c$-th class can also be estimated with the prototype-based classifier via:
$$p^{\text{prototype}}(y=c \mid x) = \frac{\exp\big(\max_{i \mid y_i=c} \mathrm{sim}(p_i, x)/T\big)}{\sum_{t=1}^{C} \exp\big(\max_{j \mid y_j=t} \mathrm{sim}(p_j, x)/T\big)}, \qquad (2)$$
where $T$ is the temperature parameter and can be empirically set. Note that Eq. 2 essentially uses the maximal similarity between a sample and the prototypes of a class as the similarity between the sample and that class.
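To illustrate Eqs. 1 and 2, the following PyTorch-style sketch (our own minimal illustration; the tensor shapes, variable names, and the assumption that every class has at least one prototype are ours) computes cosine similarities between per-pixel features and a prototype bank, takes the per-class maximum, and converts the result into class probabilities with temperature $T$:

import torch
import torch.nn.functional as F

def prototype_predict(feats, prototypes, proto_labels, num_classes, T=0.1):
    # feats:        (N, D) per-pixel features
    # prototypes:   (M, D) prototype bank, possibly several prototypes per class
    # proto_labels: (M,)   class index of each prototype (every class represented)
    feats = F.normalize(feats, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    sim = feats @ prototypes.t()                                # (N, M) cosine similarities
    # per-class similarity: max_{i | y_i = c} sim(x, p_i)
    class_sim = sim.new_full((feats.size(0), num_classes), float('-inf'))
    for c in range(num_classes):
        class_sim[:, c] = sim[:, proto_labels == c].max(dim=1).values
    probs = torch.softmax(class_sim / T, dim=1)                 # Eq. 2
    pred = class_sim.argmax(dim=1)                              # Eq. 1 (hard assignment)
    return probs, pred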
3.3 Consistency Between Linear Predictor and Prototype-based Predictor
Although both prototype-based classifiers and linear classifiers can be used for semantic segmentation [48], they have quite different characteristics due to the nature of their decision-making process. Specifically, linear classifiers allocate learnable parameters (i.e., parameters that can be updated via back-propagation) for each class, while prototype-based classifiers rely solely on a good feature representation such that samples from the same class are close to at least one within-class prototype while staying far from prototypes of other classes. Consequently, linear classifiers can leverage the learnable parameters to focus more on discriminative dimensions of a feature representation while suppressing irrelevant ones, i.e., by assigning higher or lower weights to different dimensions. In contrast, prototype-based classifiers cannot do this and tend to require a more discriminative feature representation.
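The contrast can be seen directly in code. In the toy sketch below (made-up dimensions, e.g., 21 classes as in Pascal VOC), the linear head owns learnable per-class weights that can re-scale individual feature dimensions, whereas the prototype-based head has no trainable parameters and depends entirely on how well the features cluster around the stored prototypes:

import torch
import torch.nn as nn
import torch.nn.functional as F

d, num_classes = 256, 21                           # assumed feature dim / number of classes
feats = torch.randn(4, d, 65, 65)                  # a toy batch of per-pixel features

# Linear predictor: learnable weights (updated by back-propagation) that can
# emphasize discriminative feature dimensions and suppress irrelevant ones.
linear_head = nn.Conv2d(d, num_classes, kernel_size=1, bias=False)
linear_logits = linear_head(feats)                 # (4, 21, 65, 65)

# Prototype-based predictor: no learnable weights, only stored class prototypes;
# the decision is a nearest-prototype lookup on normalized features.
prototypes = F.normalize(torch.randn(num_classes, d), dim=1)   # one prototype per class here
flat = F.normalize(feats.permute(0, 2, 3, 1).reshape(-1, d), dim=1)
proto_scores = (flat @ prototypes.t()).reshape(4, 65, 65, num_classes)
proto_pred = proto_scores.argmax(dim=-1)           # (4, 65, 65) class map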
The different characteristics of prototype-based and linear classifiers motivate us to design a loss that encourages the consistency of their predictions on unlabeled data so as to regularize the feature representation. Our key insight is that a good feature representation should support either type of classifier in making correct predictions. In addition to using two different types of classifiers, we also incorporate the CutMix [44] strategy to enhance the above consistency regularization. CutMix augmentation is a popular ingredient in many state-of-the-art semi-supervised semantic segmentation methods [8, 26, 40]. Specifically, we first apply weak augmentation, e.g., random flip and crop operations, to the input images of the teacher model and obtain pseudo-labels from its linear classifier. Next, we perform the CutMix operation by mixing two unlabeled images, $\mathrm{mix}(I_i, I_j)$, and their associated predictions, $\mathrm{mix}(\tilde{Y}_i, \tilde{Y}_j)$. The mixed image $\mathrm{mix}(I_i, I_j)$ is fed to the student model, and the output of its prototype-based classifier is then enforced to fit the pseudo-labels generated from $\mathrm{mix}(\tilde{Y}_i, \tilde{Y}_j)$.
Algorithm details: As a semi-supervised segmentation algorithm, we apply different loss functions for labeled images and unlabeled images.

For a batch of labeled images $\{(I^l_i, Y^l_i)\}_{i=1}^{B_l} \subset \mathcal{D}_l$, we train both the linear predictor and the prototype-based predictor. The linear classifier $\{w_i\}_{i=1}^{C}$ produces a posterior probability estimation $p^{\text{linear}}_s(Y[a, b] = c \mid I^l_i)$:
$$p^{\text{linear}}_s(Y[a, b] = c \mid I^l_i) = \frac{\exp\big(w_c^{\mathsf T} \cdot F^l_i[a, b]\big)}{\sum_{j=1}^{C} \exp\big(w_j^{\mathsf T} \cdot F^l_i[a, b]\big)}, \qquad (3)$$
where $F^l_i[a, b]$ denotes the feature extracted at location $(a, b)$ of $F^l_i = f(\mathcal{A}_0(I^l_i))$, obtained by first applying the weak data augmentation $\mathcal{A}_0$ to $I^l_i$ and then feeding the result to the feature extractor $f$. Meanwhile, the posterior probability of the prototype-based predictor, $p^{\text{prototype}}_s(Y[a, b] = c \mid I^l_i)$, can be estimated via Eq. 2. We use cosine similarity for $\mathrm{sim}(\cdot,\cdot)$ and empirically set the temperature hyperparameter $T$ to $0.1$. Based on the ground-truth label $Y^l_i$, the student model is optimized with gradients back-propagated from the two predictors simultaneously:
$$\mathcal{L}_l = \mathcal{L}^{\text{linear}}_l + \mathcal{L}^{\text{prototype}}_l, \quad \text{where} \qquad (4)$$
$$\mathcal{L}^{\text{linear}}_l = \frac{1}{B_l} \sum_{i}^{B_l} l_{ce}\big(p^{\text{linear}}_s(Y \mid I^l_i),\, Y^l_i\big); \qquad (5)$$
$$\mathcal{L}^{\text{prototype}}_l = \frac{1}{B_l} \sum_{i}^{B_l} l_{ce}\big(p^{\text{prototype}}_s(Y \mid I^l_i),\, Y^l_i\big). \qquad (6)$$
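As a sketch, Eqs. 3 to 6 amount to averaging a pixel-wise cross-entropy over the two heads. The snippet below is our own simplified illustration: `linear_head` is assumed to be a 1x1 convolution realizing Eq. 3 up to the softmax, `prototype_predict` is again assumed to be a closure over the prototype bank realizing Eq. 2, and the ignore-index convention is an assumption.

import torch
import torch.nn.functional as F

def labeled_loss(feats, labels, linear_head, prototype_predict, ignore_index=255):
    # feats:  (B, D, H, W) features of weakly-augmented labeled images
    # labels: (B, H, W)    ground-truth class indices
    d = feats.size(1)
    # Linear predictor, Eq. 5: cross-entropy on logits (the softmax of Eq. 3 is implicit).
    logits = linear_head(feats)                                        # (B, C, H, W)
    loss_linear = F.cross_entropy(logits, labels, ignore_index=ignore_index)
    # Prototype-based predictor, Eq. 6: cross-entropy on the Eq. 2 probabilities.
    probs, _ = prototype_predict(feats.permute(0, 2, 3, 1).reshape(-1, d))  # (B*H*W, C)
    loss_proto = F.nll_loss(torch.log(probs + 1e-8), labels.reshape(-1),
                            ignore_index=ignore_index)
    return loss_linear + loss_proto                                    # Eq. 4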