Non-Iterative Scribble-Supervised Learning with Pacing Pseudo-Masks for Medical Image Segmentation

Zefan Yang (a,b,c), Di Lin (d), Dong Ni (a,b,c), Yi Wang (a,b,c,*)
aNational-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements
and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, Guangdong, China
bSmart Medical Imaging, Learning and Engineering (SMILE) Lab, Shenzhen, Guangdong, China
cMedical UltraSound Image Computing (MUSIC) Lab, Shenzhen, Guangdong, China
dCollege of Intelligence and Computing, Tianjin University, Tianjin, China
Abstract
Scribble-supervised medical image segmentation tackles the difficulty of learning from sparse scribble annotations. Conventional approaches alternate between two stages: labeling pseudo-masks and optimizing network parameters. However, such an iterative two-stage paradigm is unwieldy and can be trapped in poor local optima, since the networks undesirably regress to the erroneous pseudo-masks. To address these issues, we propose a non-iterative method, named PacingPseudo, in which a stream of varying ("pacing") pseudo-masks teaches a network via consistency training. Our contributions are summarized as follows. First, we design a non-iterative process. This process is achieved gracefully by a siamese architecture that comprises two weight-sharing networks. The siamese architecture naturally allows a stream of pseudo-masks to assimilate a stream of predicted-masks during training. Second, we make the consistency training effective with two necessary designs: (i) entropy regularization to obtain high-confidence pseudo-masks for effective teaching; and (ii) distorted augmentations to create discrepancy between the pseudo-mask and predicted-mask streams for consistency regularization. Third, we devise a new memory bank mechanism that provides an extra source of ensemble features to complement scarce labeled pixels. We evaluate the proposed PacingPseudo on public abdominal organ, cardiac structure, and myocardium datasets, named CHAOS T1&T2, ACDC, and LVSC. Evaluation metrics include the Dice similarity coefficient (DSC) and the 95th percentile of the Hausdorff distance (HD95). Experimental results show that PacingPseudo achieves 68.0% DSC and 14.1 mm HD95 on CHAOS T1, 73.7% DSC and 12.2 mm HD95 on CHAOS T2, 82.9% DSC and 4.3 mm HD95 on ACDC, and 61.4% DSC and 11.9 mm HD95 on LVSC. These results improve upon the baseline method by 3.1% in DSC and 14.2 mm in HD95, and also outcompete previous methods. The fully-supervised method attains 67.0% DSC and 16.7 mm HD95 on CHAOS T1, 71.2% DSC and 12.6 mm HD95 on CHAOS T2, 84.0% DSC and 3.9 mm HD95 on ACDC, and 72.9% DSC and 7.6 mm HD95 on LVSC. PacingPseudo's performance is thus comparable to the fully-supervised method on CHAOS T1&T2 and ACDC. Overall, the above results demonstrate the feasibility of PacingPseudo for challenging scribble-supervised segmentation tasks. The source code is publicly available at https://github.com/zefanyang/pacingpseudo.
Keywords: Scribble-supervised learning, medical image segmentation, consistency training, pseudo-mask, siamese
architecture, memory bank
* Corresponding author: Yi Wang
Email addresses: 2016222016@email.szu.edu.cn (Zefan Yang), ande.lin1988@gmail.com (Di Lin), nidong@szu.edu.cn (Dong Ni), onewang@szu.edu.cn (Yi Wang)
1. Introduction
The success of deep learning in semantic segmentation still relies on large amounts of fully annotated masks (Litjens et al., 2017; Shen et al., 2017; Isensee et al., 2021; Han et al., 2023; Uslu and Bharath, 2023; Qi et al., 2022). Annotating segmentation masks incurs high costs in the field of medical imaging because of the expertise and laborious workload required. Scribble-supervised medical image segmentation, which trains networks supervised by scribble annotations only, is a feasible way to reduce this burden. Created by dragging a cursor inside target regions, scribbles are flexible for annotating structures (Tajbakhsh et al., 2020), but provide only sparse labeled pixels while leaving vast regions unlabeled, which poses the primary challenge in algorithm design.
Conventional scribble-supervised segmentation approaches (Lin et al., 2016; Can et al., 2018) iterate between two stages, labeling pseudo-masks and optimizing network parameters: with the masks fixed, the parameters are optimized, and vice versa. However, such a paradigm has two major drawbacks. Firstly, it can be trapped in poor local optima, because the networks probably regress to errors in the initial pseudo-masks and are unable to considerably reduce such mistakes in later iterations. Secondly, it is unwieldy, especially when applied to large datasets. To bypass the iterative process, recent studies have pursued non-iterative alternatives. These non-iterative approaches, which use either a regularizer (Tang et al., 2018a,b), knowledge from full masks (Valvano et al., 2021), or mixed pseudo-masks (Zhang and Zhuang, 2022; Luo et al., 2022a), overlooked pure pseudo-masks, as opposed to artificially mixed ones, for network training.
We argue that such a stream of pure pseudo-masks can be useful and ask: in a non-iterative method, how, and to what extent, can pure pseudo-masks supervised by scribbles teach a network? We attempt to answer the first part of the question by means of a siamese architecture (Bromley et al., 1993), which applies two weight-sharing neural networks to two inputs, based on the following analysis. (i) Set up a non-iterative paradigm. With the siamese architecture, this paradigm can be achieved by translating the iterative two-stage process into: one network generating pseudo-masks supervised by scribbles (i.e., labeling) to assimilate the predicted-masks of the other network (i.e., optimizing) during training. (ii) Use pseudo-masks to teach a network. The pseudo-mask, supervised by scribbles, acts to regularize network parameters via consistency regularization (a regularizer) that maximizes the similarity between it and the predicted-mask. An advantage is that these pseudo-masks are continually diversified rather than of fixed quality, because the network parameters are updated at every training step and thus map each image differently. Each image's pseudo-masks therefore vary between epochs ("pacing"). Fig. 1 shows that the predictions of "PacingPseudo" gradually approximate the ground-truths as the network learns from correspondingly improving pseudo-masks throughout the training process.
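To make the paradigm concrete, below is a minimal PyTorch sketch of one siamese training step under this design. It assumes a generic segmentation network `net` applied twice with shared weights; the function and variable names are illustrative, and the exact consistency loss and stop-gradient choice used by PacingPseudo are detailed in Section 3.

```python
import torch
import torch.nn.functional as F

def siamese_step(net, weak_view, strong_view, scribbles, ignore_index=255):
    """One non-iterative step: the weak branch 'labels' (produces a
    pseudo-mask) while the strong branch 'optimizes' towards it."""
    logits_weak = net(weak_view)      # pseudo-mask branch, view omega(x)
    logits_strong = net(strong_view)  # predicted-mask branch, view beta(omega(x))

    # Partial cross-entropy: supervise only the scribble-labeled pixels;
    # unlabeled pixels carry ignore_index and contribute no gradient.
    loss_pce = F.cross_entropy(logits_weak, scribbles, ignore_index=ignore_index)

    # Consistency regularization: pull the strong-view prediction towards
    # the pseudo-mask. detach() is one possible stop-gradient choice.
    pseudo = logits_weak.softmax(dim=1).detach()
    loss_cr = F.mse_loss(logits_strong.softmax(dim=1), pseudo)

    return loss_pce + loss_cr
```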
To answer the second part of the question, which concerns improving PacingPseudo's level of performance, we leverage insights on pseudo-labeling and augmentation from consistency training. Firstly, since labeled pixels are scarce in scribble-supervised segmentation, the output pseudo-masks remain uncertain. Xie et al. (2020a); Berthelot et al. (2019b,a); Sohn et al. (2020) use artificial post-processing (e.g., thresholding, sharpening, or argmax) to obtain high-confidence pseudo labels, whereas MeanTeacher (Yu et al., 2019) takes a self-ensembling model's predictions as pseudo-masks. However, we empirically find that these approaches are of limited effectiveness in our task, while entropy regularization (Grandvalet and Bengio, 2004), which regularizes pseudo-masks end-to-end, performs satisfactorily. We then provide analysis of these findings.
ysis about these findings. Secondly, augmentation is crit-
ical as it creates discrepancy between the pseudo-mask
and predicted-mask branches to enable consistency reg-
ularization. Previous studies have promoted advanced
augmentation techniques (Berthelot et al., 2019a; Xie
et al., 2020a; Sohn et al., 2020) or spatial augmentations
(Bortsova et al., 2019; Patel and Dolz, 2022). In con-
trast, inspired by recent findings in representation learning
(Chen et al., 2020; Grill et al., 2020) where augmentation
serves a similar objective to create dierent views of an
image (a positive pair) for assimilation, our study inves-
tigates a composition of distorted augmentations, which
can be suitable and more convenient for consistency-
training-based scribble-supervised segmentation.
We benchmark PacingPseudo on three public medical image datasets: CHAOS T1 and T2 (abdominal multi-organs) (Kavur et al., 2021), ACDC (cardiac structures) (Bernard et al., 2018), and LVSC (myocardium) (Suinesiaputra et al., 2014).
Figure 1: Two examples showing the evolution of training-time inference predictions. Left (two columns): scribble annotations and ground-truth masks. Middle (four columns): the predictions of our PacingPseudo updating over epochs. Right (one column): the predictions in the final epoch (epoch 400) of the baseline (i.e., a network trained by a partial cross-entropy loss). It can be observed that the predictions of PacingPseudo gradually approximate the ground-truth masks while those of the baseline present inaccuracies. The images are from the training sets and training is supervised solely by scribbles.
Despite its simplicity, PacingPseudo improves the baseline by large margins and consistently outcompetes previous methods in the categories of consistency training, iterative training, and non-iterative training. In some cases, PacingPseudo achieves performance comparable to its fully-supervised counterpart trained with ground-truth segmentation masks.
In conclusion, we list our contributions as follows:

• We design a non-iterative paradigm to bypass the iterative two-stage paradigm of previous methods (Lin et al., 2016; Can et al., 2018). We opt for a siamese architecture that naturally performs "labeling" and "optimizing" during training, allowing a stream of pseudo-masks with decreasing errors to reinforce network learning.

• We make pure pseudo-masks sufficient for scribble-supervised learning, avoiding the redundant pseudo-mask manipulation operations introduced by previous methods (Luo et al., 2022a; Zhang and Zhuang, 2022; Lee and Jeong, 2020). We utilize entropy regularization to obtain high-confidence, accurate pseudo-masks, which teach a network via consistency training. We use distorted augmentations to create discrepancy for consistency training. We further study an open question about the influence of the stop-gradient operation.

• We develop a memory bank mechanism, whereby an extra source of information, the ensemble of embedded labeled pixels across images, is introduced to complement scarce labeled supervision.
2. Related Work
2.1. Iterative vs. Non-Iterative Weakly-Supervised Segmentation
In this section, we revisit weakly-supervised segmentation studies from an iterative or non-iterative perspective to position our study. Conventional iterative methods pre-process pseudo-masks for network training several times, relying on different techniques. For instance, Lin et al. (2016); Can et al. (2018) use graph cuts (Boykov and Kolmogorov, 2004) or dense conditional random fields (DCRFs) (Krähenbühl and Koltun, 2011) to refine pseudo-masks (network inference predictions); Khoreva et al. (2017) design heuristic prior rules to de-noise pseudo-masks for better precision; Papandreou et al. (2015) incorporate background and foreground biases given weak labels via expectation-maximization steps; Roth et al. (2021) extend extreme points to pseudo-masks to supervise network training. Other than pre-processing, Dai et al. (2015) select a small portion of candidate masks as supervision via a cost function for network training in each epoch; Zhao et al. (2018) use a two-step process in which a detector generates proposals to be segmented.
Besides ours, non-iterative methods have indeed been proposed for weakly-supervised segmentation (Tang et al., 2018a,b; Kervadec et al., 2019; Lee and Jeong, 2020; Dolz et al., 2021; Valvano et al., 2021; Patel and Dolz, 2022; Zhang and Zhuang, 2022; Luo et al., 2022a). Some studies add a regularizer to bypass pre-processing, based on shallow approaches (e.g., graph cuts) (Tang et al., 2018a,b) or cardinality constraints (Kervadec et al., 2019). Another category of studies transfers knowledge from full masks during training: while Dolz et al. (2021) constrain weakly-supervised predictions to be similar to fully-supervised ones, full masks train an auxiliary discriminator in Valvano et al. (2021). Beyond the above approaches, Zhang and Zhuang (2022); Luo et al. (2022a) use cutout or mixed (i.e., linearly interpolated) pseudo-masks for network training. In contrast, our study argues that pure pseudo-masks (rather than those artificial mixtures) can already be effective enough to teach a network, and we design our method in the spirit of recent consistency training.
2.2. Consistency Training
Two aspects have been purposefully emphasized in consistency training mechanisms: pseudo-labeling, which reduces uncertainty in pseudo-masks, and augmentation, which defines the neighborhood of an image to create the discrepancy that enables consistency regularization. Regarding pseudo-labeling, Berthelot et al. (2019b,a); Yu et al. (2019); Xie et al. (2020a); Sohn et al. (2020); Zou et al. (2020) obtain high-confidence pseudo labels using at least one of the following artificial post-processing operations: (i) thresholding, which eliminates a distribution whose maximum probability is smaller than a threshold; (ii) sharpening, which uses a temperature to sharpen a distribution; and (iii) argmax, which truncates a distribution to a one-hot encoding. But since none of these proves effective in our task, we resort to end-to-end entropy regularization (Grandvalet and Bengio, 2004). Owing to its simplicity, entropy regularization has been investigated in medical imaging (Dolz et al., 2021; Luo et al., 2022b). However, while Dolz et al. (2021) report collapse when incorporating it in point-supervised segmentation, Luo et al. (2022b) only test its efficacy upon the baseline. In contrast, we show that entropy regularization not only generally improves over the baseline in scribble-supervised segmentation, but can also regularize low-entropy (i.e., high-confidence) pseudo-masks to reinforce network learning via consistency training.
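For contrast with the end-to-end entropy regularization, the following is a minimal sketch of the three post-processing operations listed above; the threshold and temperature values are illustrative placeholders, not settings from the cited works.

```python
import torch
import torch.nn.functional as F

def threshold_mask(probs, tau=0.95):
    """(i) Thresholding: keep only pixels whose maximum class probability
    reaches tau; returns hard labels and a validity mask. probs: (N, C, H, W)."""
    conf, labels = probs.max(dim=1)
    return labels, conf >= tau

def sharpen(probs, temperature=0.5):
    """(ii) Sharpening: a temperature below 1 concentrates the distribution."""
    p = probs ** (1.0 / temperature)
    return p / p.sum(dim=1, keepdim=True)

def hard_argmax(probs):
    """(iii) Argmax: truncate the distribution to a one-hot encoding."""
    labels = probs.argmax(dim=1)
    return F.one_hot(labels, probs.shape[1]).permute(0, 3, 1, 2).float()
```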
In terms of augmentation in consistency training, while Xie et al. (2020a); Berthelot et al. (2019a); Sohn et al. (2020) favor advanced augmentation techniques (e.g., RandAugment (Cubuk et al., 2020), CTAugment (Berthelot et al., 2019a)), some studies (Bortsova et al., 2019; Li et al., 2020; Laradji et al., 2021; Patel and Dolz, 2022) apply a spatial augmentation and its inverse, and impose transformation equivariance. However, we note that augmentation for obtaining different views of the same image is not only required in consistency training, but also essential in representation learning (He et al., 2020; Chen et al., 2020; Grill et al., 2020; Chen and He, 2021). Both consistency training and representation learning share the objective that different views of the same image should be similar in the output space. Inspired by this insight, our study explores distortion augmentation, recently popularized in the representation-learning community (Chen et al., 2020; Grill et al., 2020), for scribble-supervised segmentation in medical imaging.
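As a rough illustration, a distortion-style composition in the spirit of SimCLR/BYOL might look as follows in torchvision; the specific operations and magnitudes used by PacingPseudo are given in the Method section, so these particular choices are placeholders (hue/saturation jitter is omitted since the medical images are single-channel).

```python
from torchvision import transforms

# Intensity distortions only (no spatial transform), so the predictions of
# the two views remain pixel-aligned for consistency regularization.
distort = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.RandomAdjustSharpness(sharpness_factor=2.0, p=0.5),
])
```

Avoiding spatial transforms is one reason distortion augmentation is convenient here: unlike the spatial-augmentation approaches above, no inverse transformation is needed before comparing the two branches' outputs.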
3. Method
Our framework (Fig. 2) uses two weight-sharing neural networks, denoted as fθ(·). An input image x undergoes ω(·) and then β(·) to produce two augmented views: a commonly-augmented view ω(x) and a further-augmented view β(ω(x)). The predictions of ω(x) serve as pseudo-masks. Labeled pixels in the pseudo-masks are penalized by a partial cross-entropy loss Lpce, described in Section 3.1; unlabeled pixels are regularized by an entropy regularization loss Lent, described in Section 3.3. To use the pseudo-masks to guide network training, a consistency regularization loss Lcr, described in Section 3.2, maximizes the similarity between the predicted-masks of β(ω(x)) and the pseudo-masks. To further regularize network training, we incorporate a memory bank, an auxiliary loss Laux, and a memory loss Lm, described in Section 3.4. The overall loss function is described in Section 3.5. The architecture of the network fθ(·) is described in Section 3.6. Training details are described in Section 3.7.
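Putting the pieces together, the overall objective could be sketched as a weighted sum of the five losses named above; the actual weights and the definitions of Laux and Lm follow in Sections 3.4 and 3.5, so the values here are placeholders.

```python
def total_loss(loss_pce, loss_cr, loss_ent, loss_aux, loss_m,
               lam_cr=1.0, lam_ent=0.1, lam_aux=1.0, lam_m=0.1):
    """Weighted combination of the losses in Fig. 2; weights are assumed."""
    return (loss_pce + lam_cr * loss_cr + lam_ent * loss_ent
            + lam_aux * loss_aux + lam_m * loss_m)
```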