Dense FixMatch: a simple semi-supervised learning method
for pixel-wise prediction tasks
Miquel Martí i Rabadán1,2, Alessandro Pieropan2,
Hossein Azizpour1, and Atsuto Maki1
1KTH Royal Institute of Technology, Stockholm, Sweden
2Univrses AB, Stockholm, Sweden
Abstract
We propose Dense FixMatch, a simple method for
online semi-supervised learning of dense and struc-
tured prediction tasks combining pseudo-labeling
and consistency regularization via strong data aug-
mentation. We enable the application of FixMatch
in semi-supervised learning problems beyond im-
age classification by adding a matching operation
on the pseudo-labels. This allows us to still use
the full strength of data augmentation pipelines,
including geometric transformations.
We evaluate it on semi-supervised semantic seg-
mentation on Cityscapes and Pascal VOC with dif-
ferent percentages of labeled data and ablate design
choices and hyper-parameters. Dense FixMatch
significantly improves results compared to super-
vised learning using only labeled data, approaching
its performance with 1/4 of the labeled samples.
1 Introduction
Semi-supervised learning (SSL) has shown great
potential to reduce the annotation costs of train-
ing deep learning models. Modern methods achieve
competitive results at a fraction of the amount of
annotated samples required for standard supervised
learning [1, 2, 26]. The potential cost savings are
even larger for structured or dense prediction tasks,
such as object detection, instance or semantic seg-
mentation since the annotation cost for such tasks
is much larger than for image classification.
Corresponding Author: miquelmr@kth.se
Figure 1: Dense FixMatch (blue) on unlabeled data
improves the performance of semi-supervised se-
mantic segmentation on Cityscapes val set using
DeepLabv3+ with ResNet-101 backbone over su-
pervised baselines (red) across different amounts of
labeled samples. ?represents the mean over four
different runs with random labeled data splits. Re-
sults for individual runs are shown with circles.
However, SSL methods have been mainly devel-
oped and studied with image-level classification in
mind [28, 21, 31, 1, 26]. Only more recently, meth-
ods have appeared adapting or proposing solutions
to structured or dense tasks such as object de-
tection [38, 15, 27, 18, 32] or semantic segmenta-
tion [11, 22, 16, 39, 5, 14]. Still, most works have
focused on improving performance on specific tasks
and not aimed at finding methods that could be ap-
plied to different tasks. Only a handful of methods
are generic enough to be used for multiple tasks
with no or few changes [11, 31, 38, 5, 27]. Designing task-generic methods eases portability to new tasks, is a must in multi-task learning scenarios, and is the goal of our work.
arXiv:2210.09919v1 [cs.CV] 18 Oct 2022
Figure 2: Dense FixMatch diagram for semantic segmentation. From an input image (top-left), two different views are created via α and A, the weak and strong augmentation pipelines respectively. The squares represent the crops used for obtaining both views. Top: the first view is used by the teacher in the Mean Teacher [28] framework to generate pseudo-labels. These are matched to the second view by applying the inverse of the weak augmentation, α⁻¹, and then the strong one, A. Bottom: the second view is passed to the student model to obtain predictions that are trained against the pseudo-labels via the consistency loss, which can be defined thanks to the shared structure and reference frame between both views.
To this end, we perform simple but effective mod-
ifications to FixMatch [26] to adapt it for a larger
class of dense or structured tasks, staying as close
as possible to the original formulation. We call
our approach Dense FixMatch and summarise it
in Figure 2. We align the reference frame of the
pseudo-labels obtained from the weakly-augmented
view with that of the predictions obtained from
the strongly-augmented view. This way, we can
define a consistency loss at each output location
while still using the full set of possible augmenta-
tions. Using strong and varied augmentations has
been identified as a key component of self-training
with input-consistency [30, 11] since they allow ex-
ploring larger neighbourhoods of the training data
points in the input data manifold as well as in dif-
ferent directions. In addition, we incorporate the
Mean Teacher (MT) framework so that learning is
more robust to noisy pseudo-labels and imbalanced
class size [18, 33].
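As a minimal illustration of the Mean Teacher update, the sketch below (assuming parameters are a plain list of arrays; the decay value is a common choice, not taken from this paper) shows the teacher as an exponential moving average of the student:

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Mean Teacher: each teacher parameter is an exponential moving
    average of the corresponding student parameter."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

# Toy parameters: the teacher moves slightly toward the student each step.
teacher = [np.zeros(3)]
student = [np.ones(3)]
teacher = ema_update(teacher, student)
```

Because the teacher averages over many student states, its pseudo-labels change more smoothly during training, which is what makes learning more robust to noisy targets.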
We evaluate our approach on semantic seg-
mentation with Cityscapes and Pascal VOC 2012
datasets and show the results in Figure 1 and
Table 1, outperforming supervised baselines across
different labeled data regimes by a large margin
and achieving comparable results to other works
in the literature. We also compare different
mini-batch sampling approaches to assess whether
it is feasible to use our method for semi-supervised
multi-task learning where separate labeled and
unlabeled data sampling is not possible [19].
Our contributions are as follows:
We propose Dense FixMatch, a simple method
that adds a matching operation between
pseudo-labels and predictions to FixMatch
thereby enabling its use on semi-supervised
learning for any dense or structured task.
We study its performance on semi-supervised
semantic segmentation on Cityscapes and Pas-
cal VOC 2012, showing improvements across
multiple labeled data regimes over supervised
baselines. For Cityscapes, we get improve-
ments of up to +0.1 mIoU for 93 and 186
labeled samples reaching 0.6697 mIoU and
0.7110 mIoU respectively, and +0.04 mIoU
when using all labeled samples and extra un-
labeled data reaching 0.8082 mIoU.
We ablate our design choices and hyper-
parameters to give practitioners insights on
how to tune it for new tasks and datasets.
2 Related work
The success of semi-supervised learning [35] has
come mainly from its application to image classi-
fication with deep learning. Wei et al. [30] proved
that SSL methods based on (a) self-training and
(b) consistency regularization will achieve high ac-
curacy with respect to ground-truth labels with
the key to their success being to explore large
enough neighbourhoods of the pseudo-labeled ex-
amples in the input data manifold, for example
via aggressive data augmentation. Self-training
or pseudo-labeling methods rely on bootstrapping
current model predictions on unlabeled data and
using them as labels. Consistency regularization
relies on the assumption that small perturbations
of the data points in either input or latent space
should not change the output.
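The consistency assumption can be stated concretely; the following toy sketch (the function names and noise scale are our own) measures how much a model's output changes under a small input perturbation:

```python
import numpy as np

def consistency_penalty(f, x, noise_scale=0.01, seed=0):
    """Mean squared difference between outputs on x and on a slightly
    perturbed copy of x; small values mean f is locally consistent."""
    rng = np.random.default_rng(seed)
    x_noisy = x + noise_scale * rng.standard_normal(x.shape)
    return float(np.mean((f(x) - f(x_noisy)) ** 2))

# A smooth function barely changes under small perturbations.
penalty = consistency_penalty(np.tanh, np.zeros(8))
```

SSL methods turn this penalty into a training signal on unlabeled data, often replacing the noise with data augmentation and the squared difference with a task loss against pseudo-labels.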
FixMatch [26] and Noisy Student self-
training [31] are two methods combining such
building blocks: the former follows an online
approach where pseudo-labels are generated
during training, and the latter has subsequent
pseudo-labeling and re-training phases. Both use
strong data augmentation to train against the
pseudo-labels. Other works have also used other
kinds of perturbations for the consistency objec-
tive, such as adversarial examples [21], network
perturbations [17, 28, 31] or MixUp [37, 1] as
well as other techniques to tackle distribution
misalignment between true- and pseudo-labels [2].
SSL applied to tasks other than image classi-
fication has also seen significant developments in
recent years. For semi-supervised object detec-
tion, multiple works have used consistency regu-
larization and perturbations via data augmenta-
tion [15, 27, 18, 32]. For semi-supervised semantic
segmentation, the work in [11] found that strong
and varied perturbations are required and proposed
CutMix [36] as the strong augmentation. CCT [22]
enforces consistency between predictions perturb-
ing latent features. GCT [16] uses two differently
initialized networks for co-training and a flaw detec-
tion module. CPS [5] instead enforces consistency
against hard pseudo-labels. Pseudoseg [39] uses
strong augmentation and fuses pseudo-labels from
decoder predictions with ones from GradCAM [25].
ST++ [34] does self-training with strong data aug-
mentation in the re-training phase while selecting
and prioritizing reliable images. AEL [14] focuses
on balancing the performance between classes via
different task-specific strategies. U2PL [29] uses
unreliable pseudo-labels for negative learning.
In contrast, we follow FixMatch as closely as possible to keep the benefits of using online pseudo-
labeling and consistency regularization between
predictions on weakly and strongly augmented im-
ages. We add only a spatial matching operation to
enable its use in dense and structured tasks and the
MT framework for improving pseudo-label quality.
3 Dense FixMatch
We adapt FixMatch [26] for its use in structured
and dense prediction tasks in the semi-supervised
setting.
Our method assumes the standard framework of semi-supervised learning where labeled samples X_L contribute to the supervised objective L_s and unlabeled samples X_U are used in an unsupervised objective L_u, with the option to use the labeled samples also for the latter. The unsupervised loss weight λ trades off the contribution of both objectives to the final loss:

L = L_s(x, y, θ) + λ L_u(x, θ)    (1)
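A minimal sketch of this combined objective, using FixMatch's confidence-thresholded hard pseudo-labels for L_u (the array shapes, threshold value, and helper names are our assumptions, not the paper's exact implementation):

```python
import numpy as np

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target class per location."""
    return float(-np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12)))

def total_loss(sup_probs, labels, teacher_probs, student_probs, lam=1.0, tau=0.95):
    """L = L_s + lam * L_u over flattened output locations (N x C arrays).
    L_u trains the student against hard teacher pseudo-labels, masked by
    a confidence threshold tau as in FixMatch."""
    l_s = cross_entropy(sup_probs, labels)
    pseudo = teacher_probs.argmax(axis=1)        # hard pseudo-labels
    mask = teacher_probs.max(axis=1) >= tau      # keep confident locations only
    l_u = cross_entropy(student_probs[mask], pseudo[mask]) if mask.any() else 0.0
    return l_s + lam * l_u
```

For dense tasks, each row would correspond to one output location (e.g. a pixel) rather than one image.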
To define the unsupervised or consistency objec-
tive, FixMatch uses image-level pseudo-labels ob-
tained from a weakly-augmented version of the un-
labeled images (via augmentation pipeline α) to su-
pervise learning on the strongly-augmented version
of the same images (via A). For image classifi-
cation, the output is expected to be invariant to
the applied transformations and so the obtained
pseudo-label can be directly used for this purpose.
In contrast, this is not possible when the output
of the task at hand has a spatial structure related
to that of the input and thus will vary depend-
ing on the applied augmentations. This is the case
for dense or structured tasks such as semantic seg-
mentation or object detection, among others. For
those tasks, any geometric transformation of the in-
put equivariantly transforms its corresponding out-
put. Therefore, when using geometric transforma-
tions as part of the weak and strong augmentation
pipelines, the obtained pseudo-labels will not gen-
erally match pixel-to-pixel or at each location.
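To make the mismatch concrete, this toy sketch (with a horizontal flip as the only geometric transform, a simplification of the augmentation pipelines) shows that a pseudo-label produced on a flipped weak view does not align pixel-to-pixel with an unflipped strong view until the weak transform is undone and the strong one applied:

```python
import numpy as np

def hflip(y, flip):
    """Horizontal flip of a dense label map; flips are self-inverse."""
    return y[:, ::-1] if flip else y

def match(pseudo, weak_flip, strong_flip):
    """Undo the weak geometric transform, then apply the strong one,
    bringing the pseudo-label into the strong view's reference frame."""
    return hflip(hflip(pseudo, weak_flip), strong_flip)

label = np.array([[0, 1, 2],
                  [0, 1, 2]])
pseudo_weak = hflip(label, True)  # teacher predicts on the flipped weak view
aligned = match(pseudo_weak, weak_flip=True, strong_flip=False)
```

In practice the weak and strong pipelines also contain crops and scaling, so the matching operation composes the inverse of all weak geometric transforms with the strong ones.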
We adopt a simple approach to align the predic-
tions of one view (e.g. weak augmentation α) to