Dense FixMatch: a simple semi-supervised learning method
for pixel-wise prediction tasks
Miquel Martí i Rabadán1,2, Alessandro Pieropan2,
Hossein Azizpour1, and Atsuto Maki1
1KTH Royal Institute of Technology, Stockholm, Sweden
2Univrses AB, Stockholm, Sweden
Abstract
We propose Dense FixMatch, a simple method for
online semi-supervised learning of dense and struc-
tured prediction tasks combining pseudo-labeling
and consistency regularization via strong data aug-
mentation. We enable the application of FixMatch
in semi-supervised learning problems beyond im-
age classification by adding a matching operation
on the pseudo-labels. This allows us to still use
the full strength of data augmentation pipelines,
including geometric transformations.
We evaluate it on semi-supervised semantic seg-
mentation on Cityscapes and Pascal VOC with dif-
ferent percentages of labeled data and ablate design
choices and hyper-parameters. Dense FixMatch
significantly improves results compared to super-
vised learning using only labeled data, approaching
its performance with 1/4 of the labeled samples.
1 Introduction
Semi-supervised learning (SSL) has shown great
potential to reduce the annotation costs of train-
ing deep learning models. Modern methods achieve
competitive results at a fraction of the amount of
annotated samples required for standard supervised
learning [1, 2, 26]. The potential cost savings are
even larger for structured or dense prediction tasks,
such as object detection, instance or semantic seg-
mentation since the annotation cost for such tasks
is much larger than for image classification.
Corresponding Author: miquelmr@kth.se
Figure 1: Dense FixMatch (blue) on unlabeled data
improves the performance of semi-supervised se-
mantic segmentation on Cityscapes val set using
DeepLabv3+ with ResNet-101 backbone over su-
pervised baselines (red) across different amounts of
labeled samples. ?represents the mean over four
different runs with random labeled data splits. Re-
sults for individual runs are shown with circles.
However, SSL methods have been mainly devel-
oped and studied with image-level classification in
mind [28, 21, 31, 1, 26]. Only more recently, meth-
ods have appeared adapting or proposing solutions
to structured or dense tasks such as object de-
tection [38, 15, 27, 18, 32] or semantic segmenta-
tion [11, 22, 16, 39, 5, 14]. Still, most works have
focused on improving performance on specific tasks
and not aimed at finding methods that could be ap-
plied to different tasks. Only a handful of methods
are generic enough to be used for multiple tasks
with no or few changes [11, 31, 38, 5, 27]. Designing task-generic methods eases portability to new tasks, is a must in multi-task learning scenarios, and is the goal of our work.
arXiv:2210.09919v1 [cs.CV] 18 Oct 2022
Figure 2: Dense FixMatch diagram for semantic segmentation. From an input image (top-left), two different views are created via α and A, the weak and strong augmentation pipelines respectively. The squares represent the crops used for obtaining both views. Top: the first view is used by the teacher in the Mean Teacher [28] framework to generate pseudo-labels. These are matched to the second view by applying the inverse of the weak augmentation, α⁻¹, and then the strong one, A. Bottom: the second view is passed to the student model to obtain predictions that are trained against the pseudo-labels via the consistency loss, which can be defined thanks to the shared structure and reference frame between both views.
To this end, we perform simple but effective mod-
ifications to FixMatch [26] to adapt it for a larger
class of dense or structured tasks, staying as close
as possible to the original formulation. We call
our approach Dense FixMatch and summarise it
in Figure 2. We align the reference frame of the
pseudo-labels obtained from the weakly-augmented
view with that of the predictions obtained from
the strongly-augmented view. This way, we can
define a consistency loss at each output location
while still using the full set of possible augmenta-
tions. Using strong and varied augmentations has
been identified as a key component of self-training
with input-consistency [30, 11] since they allow ex-
ploring larger neighbourhoods of the training data
points in the input data manifold as well as in dif-
ferent directions. In addition, we incorporate the
Mean Teacher (MT) framework so that learning is
more robust to noisy pseudo-labels and imbalanced
class size [18, 33].
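As a minimal illustration of the Mean Teacher update, the sketch below (assuming parameters are a plain list of arrays; the decay value is a common choice, not taken from this paper) shows the teacher as an exponential moving average of the student:

```python
import numpy as np

def ema_update(teacher, student, decay=0.999):
    """Mean Teacher: each teacher parameter is an exponential moving
    average of the corresponding student parameter."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

# Toy parameters: the teacher moves slightly toward the student each step.
teacher = [np.zeros(3)]
student = [np.ones(3)]
teacher = ema_update(teacher, student)
```

Because the teacher averages over many student states, its pseudo-labels change more smoothly during training, which is what makes learning more robust to noisy targets.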
We evaluate our approach on semantic seg-
mentation with Cityscapes and Pascal VOC 2012
datasets and show the results in Figure 1 and
Table 1, outperforming supervised baselines across
different labeled data regimes by a large margin
and achieving comparable results to other works
in the literature. We also compare different
mini-batch sampling approaches to assess whether
it is feasible to use our method for semi-supervised
multi-task learning where separate labeled and
unlabeled data sampling is not possible [19].
Our contributions are as follows:
We propose Dense FixMatch, a simple method
that adds a matching operation between
pseudo-labels and predictions to FixMatch
thereby enabling its use on semi-supervised
learning for any dense or structured task.
We study its performance on semi-supervised
semantic segmentation on Cityscapes and Pas-
cal VOC 2012, showing improvements across
multiple labeled data regimes over supervised
baselines. For Cityscapes, we get improve-
ments of up to +0.1 mIoU for 93 and 186
labeled samples reaching 0.6697 mIoU and
0.7110 mIoU respectively, and +0.04 mIoU
when using all labeled samples and extra un-
labeled data reaching 0.8082 mIoU.
We ablate our design choices and hyper-
parameters to give practitioners insights on
how to tune it for new tasks and datasets.
2 Related work
The success of semi-supervised learning [35] has
come mainly from its application to image classi-
fication with deep learning. Wei et al. [30] proved
that SSL methods based on (a) self-training and
(b) consistency regularization will achieve high ac-
curacy with respect to ground-truth labels with
the key to their success being to explore large
enough neighbourhoods of the pseudo-labeled ex-
amples in the input data manifold, for example
via aggressive data augmentation. Self-training
or pseudo-labeling methods rely on bootstrapping
current model predictions on unlabeled data and
using them as labels. Consistency regularization
relies on the assumption that small perturbations
of the data points in either input or latent space
should not change the output.
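The consistency assumption can be stated concretely; the following toy sketch (the function names and noise scale are our own) measures how much a model's output changes under a small input perturbation:

```python
import numpy as np

def consistency_penalty(f, x, noise_scale=0.01, seed=0):
    """Mean squared difference between outputs on x and on a slightly
    perturbed copy of x; small values mean f is locally consistent."""
    rng = np.random.default_rng(seed)
    x_noisy = x + noise_scale * rng.standard_normal(x.shape)
    return float(np.mean((f(x) - f(x_noisy)) ** 2))

# A smooth function barely changes under small perturbations.
penalty = consistency_penalty(np.tanh, np.zeros(8))
```

SSL methods turn this penalty into a training signal on unlabeled data, often replacing the noise with data augmentation and the squared difference with a task loss against pseudo-labels.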
FixMatch [26] and Noisy Student self-
training [31] are two methods combining such
building blocks: the former follows an online
approach where pseudo-labels are generated
during training, and the latter has subsequent
pseudo-labeling and re-training phases. Both use
strong data augmentation to train against the
pseudo-labels. Other works have also used other
kinds of perturbations for the consistency objec-
tive, such as adversarial examples [21], network
perturbations [17, 28, 31] or MixUp [37, 1] as
well as other techniques to tackle distribution
misalignment between true- and pseudo-labels [2].
SSL applied to tasks other than image classi-
fication has also seen significant developments in
recent years. For semi-supervised object detec-
tion, multiple works have used consistency regu-
larization and perturbations via data augmenta-
tion [15, 27, 18, 32]. For semi-supervised semantic
segmentation, the work in [11] found that strong
and varied perturbations are required and proposed
CutMix [36] as the strong augmentation. CCT [22]
enforces consistency between predictions perturb-
ing latent features. GCT [16] uses two differently
initialized networks for co-training and a flaw detec-
tion module. CPS [5] instead enforces consistency
against hard pseudo-labels. Pseudoseg [39] uses
strong augmentation and fuses pseudo-labels from
decoder predictions with ones from GradCAM [25].
ST++ [34] does self-training with strong data aug-
mentation in the re-training phase while selecting
and prioritizing reliable images. AEL [14] focuses
on balancing the performance between classes via
different task-specific strategies. U2PL [29] uses
unreliable pseudo-labels for negative learning.
In contrast, we follow FixMatch as closely as possible to keep the benefits of using online pseudo-
labeling and consistency regularization between
predictions on weakly and strongly augmented im-
ages. We add only a spatial matching operation to
enable its use in dense and structured tasks and the
MT framework for improving pseudo-label quality.
3 Dense FixMatch
We adapt FixMatch [26] for its use in structured
and dense prediction tasks in the semi-supervised
setting.
Our method assumes the standard framework of semi-supervised learning where labeled samples X_L contribute to the supervised objective L_s and unlabeled samples X_U are used in an unsupervised objective L_u, with the option to use the labeled samples also for the latter. The unsupervised loss weight λ trades off the contribution of both objectives to the final loss:

L = L_s(x, y, θ) + λ L_u(x, θ)    (1)
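A minimal sketch of this combined objective, using FixMatch's confidence-thresholded hard pseudo-labels for L_u (the array shapes, threshold value, and helper names are our assumptions, not the paper's exact implementation):

```python
import numpy as np

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target class per location."""
    return float(-np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12)))

def total_loss(sup_probs, labels, teacher_probs, student_probs, lam=1.0, tau=0.95):
    """L = L_s + lam * L_u over flattened output locations (N x C arrays).
    L_u trains the student against hard teacher pseudo-labels, masked by
    a confidence threshold tau as in FixMatch."""
    l_s = cross_entropy(sup_probs, labels)
    pseudo = teacher_probs.argmax(axis=1)        # hard pseudo-labels
    mask = teacher_probs.max(axis=1) >= tau      # keep confident locations only
    l_u = cross_entropy(student_probs[mask], pseudo[mask]) if mask.any() else 0.0
    return l_s + lam * l_u
```

For dense tasks, each row would correspond to one output location (e.g. a pixel) rather than one image.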
To define the unsupervised or consistency objec-
tive, FixMatch uses image-level pseudo-labels ob-
tained from a weakly-augmented version of the un-
labeled images (via augmentation pipeline α) to su-
pervise learning on the strongly-augmented version
of the same images (via A). For image classifi-
cation, the output is expected to be invariant to
the applied transformations and so the obtained
pseudo-label can be directly used for this purpose.
In contrast, this is not possible when the output
of the task at hand has a spatial structure related
to that of the input and thus will vary depend-
ing on the applied augmentations. This is the case
for dense or structured tasks such as semantic seg-
mentation or object detection, among others. For
those tasks, any geometric transformation of the in-
put equivariantly transforms its corresponding out-
put. Therefore, when using geometric transforma-
tions as part of the weak and strong augmentation
pipelines, the obtained pseudo-labels will not gen-
erally match pixel-to-pixel or at each location.
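To make the mismatch concrete, this toy sketch (with a horizontal flip as the only geometric transform, a simplification of the augmentation pipelines) shows that a pseudo-label produced on a flipped weak view does not align pixel-to-pixel with an unflipped strong view until the weak transform is undone and the strong one applied:

```python
import numpy as np

def hflip(y, flip):
    """Horizontal flip of a dense label map; flips are self-inverse."""
    return y[:, ::-1] if flip else y

def match(pseudo, weak_flip, strong_flip):
    """Undo the weak geometric transform, then apply the strong one,
    bringing the pseudo-label into the strong view's reference frame."""
    return hflip(hflip(pseudo, weak_flip), strong_flip)

label = np.array([[0, 1, 2],
                  [0, 1, 2]])
pseudo_weak = hflip(label, True)  # teacher predicts on the flipped weak view
aligned = match(pseudo_weak, weak_flip=True, strong_flip=False)
```

In practice the weak and strong pipelines also contain crops and scaling, so the matching operation composes the inverse of all weak geometric transforms with the strong ones.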
We adopt a simple approach to align the predic-
tions of one view (e.g. weak augmentation α) to