1. Introduction
The success of deep learning in semantic segmentation
still relies on great amounts of fully annotated masks (Lit-
jens et al., 2017; Shen et al., 2017; Isensee et al., 2021;
Han et al., 2023; Uslu and Bharath, 2023; Qi et al., 2022).
Annotating segmentation masks incurs high costs in medical imaging because of the expertise and laborious workload required. Scribble-supervised medical image segmentation, which trains networks supervised by scribble annotations only, is a feasible way to reduce that burden. Created by dragging a cursor inside target regions, scribbles are flexible for annotating structures (Tajbakhsh et al., 2020), but provide only sparse labeled pixels while leaving vast regions unlabeled, which poses a primary challenge in algorithm design.
Conventional scribble-supervised segmentation approaches (Lin et al., 2016; Can et al., 2018) iterate between two stages, labeling pseudo-masks and optimizing network parameters: with the masks fixed, the parameters are optimized, and vice versa. However, this paradigm has two major drawbacks. Firstly, it can be trapped in poor local optima, because the networks tend to regress to errors in the initial pseudo-masks and are unable to substantially correct those errors in later iterations. Secondly, it is unwieldy, especially when applied to large datasets. To bypass the iterative process, recent studies have pursued non-iterative alternatives. These non-iterative approaches, which use either a regularizer (Tang et al., 2018a,b), knowledge from full masks (Valvano et al., 2021), or mixed pseudo-masks (Zhang and Zhuang, 2022; Luo et al., 2022a), overlook pure pseudo-masks, as opposed to artificially mixed ones, for network training.
We argue that this line of work can be useful and ask: in a non-iterative method, how, and to what extent, can pure pseudo-masks supervised by scribbles teach a network?
We attempt to answer the first part of the question by means of a siamese architecture (Bromley et al., 1993), which applies two weight-sharing neural networks to two inputs, based on the following analysis. (i) Set up a non-iterative paradigm. With the siamese architecture, this paradigm can be achieved by translating the iterative two-stage process into a single one: one network generates pseudo-masks supervised by scribbles (i.e., labeling) to assimilate the predicted-masks of the other network (i.e., optimizing) during training. (ii) Use pseudo-masks to teach a network. The pseudo-mask, supervised by scribbles, regularizes the network parameters via consistency regularization (a regularizer) that maximizes the similarity between
it and the predicted-mask. An advantage is that these pseudo-masks are diversified rather than of fixed quality, because the continually updated network parameters map images differently at each training step. Each image's pseudo-masks thereby vary between epochs ("pacing"). Fig. 1 shows that the predictions of "PacingPseudo" gradually approximate the ground-truths as the network learns from the steadily improving pseudo-masks throughout training.
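Concretely, the training objective of such a siamese scheme can be sketched as follows; the notation here is our illustrative shorthand rather than the exact formulation developed later. Writing $x_1, x_2$ for two augmented views of an image, $s$ for its scribble annotation, and $f_\theta$ for the weight-sharing network:

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{pCE}}\big(f_\theta(x_1),\, s\big)
  + \lambda\, \mathcal{L}_{\mathrm{cons}}\big(\hat{y}_1,\, f_\theta(x_2)\big),
```

where $\mathcal{L}_{\mathrm{pCE}}$ denotes a partial cross-entropy computed only on scribble-labeled pixels, $\hat{y}_1$ is the pseudo-mask derived from $f_\theta(x_1)$, $\mathcal{L}_{\mathrm{cons}}$ is the consistency term that maximizes pseudo/predicted-mask similarity, and $\lambda$ balances the two terms.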
To answer the second part of the question, which concerns the level of performance PacingPseudo can reach, we leverage insights on pseudo-labeling and augmentation from consistency training. Firstly, since labeled pixels are scarce in scribble-supervised segmentation, the output pseudo-masks remain uncertain. Xie et al. (2020a); Berthelot et al. (2019b,a); Sohn et al. (2020) use artificial post-processing (e.g., thresholding, sharpening, or argmax) to obtain high-confidence pseudo labels, whereas MeanTeacher (Yu et al., 2019) takes a self-ensembling model's predictions as pseudo-masks. However, we empirically find these approaches to be of limited effectiveness in our task, whereas entropy regularization (Grandvalet and Bengio, 2004), which regularizes pseudo-masks end-to-end, performs satisfactorily. We then provide an analysis of these findings. Secondly, augmentation is critical because it creates a discrepancy between the pseudo-mask and predicted-mask branches that enables consistency regularization. Previous studies have promoted advanced augmentation techniques (Berthelot et al., 2019a; Xie et al., 2020a; Sohn et al., 2020) or spatial augmentations (Bortsova et al., 2019; Patel and Dolz, 2022). In contrast, inspired by recent findings in representation learning (Chen et al., 2020; Grill et al., 2020), where augmentation serves a similar objective of creating different views of an image (a positive pair) for assimilation, our study investigates a composition of distortion augmentations, which can be more suitable and convenient for consistency-training-based scribble-supervised segmentation.
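To make these two ingredients concrete, the following minimal NumPy sketch combines a cross-entropy consistency term with the entropy regularizer of Grandvalet and Bengio (2004). The distortion function, the linear stand-in network, and the coefficient values are hypothetical illustrations, not the actual PacingPseudo implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distort(image, rng):
    """Hypothetical intensity distortions (gamma shift + noise), standing in
    for the composition of distortion augmentations discussed above."""
    gamma = rng.uniform(0.8, 1.2)
    noise = rng.normal(0.0, 0.02, size=image.shape)
    return np.clip(image, 0.0, 1.0) ** gamma + noise

def consistency_step(image, forward, rng, lam=1.0, mu=0.1):
    """One siamese consistency-training step (sketch, not the authors' code):
    two distorted views pass through the same weight-sharing network; the
    first view's softmax acts as the pseudo-mask that the second view's
    prediction is pulled towards, plus an entropy regularizer that keeps
    the pseudo-mask confident end-to-end."""
    p1 = softmax(forward(distort(image, rng)))  # pseudo-mask branch
    p2 = softmax(forward(distort(image, rng)))  # predicted-mask branch
    eps = 1e-8
    cons = -(p1 * np.log(p2 + eps)).sum(axis=-1).mean()  # consistency term
    ent = -(p1 * np.log(p1 + eps)).sum(axis=-1).mean()   # entropy regularizer
    # the partial cross-entropy on scribble-labeled pixels would be added here
    return lam * cons + mu * ent

# toy usage: a 4x4 one-channel "image", 3 classes, a random linear "network"
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 3))
image = rng.uniform(size=(4, 4, 1))
loss = consistency_step(image, lambda x: x @ W, rng)
assert np.isfinite(loss) and loss >= 0.0
```

In a real pipeline the pseudo-mask branch would be detached from the gradient and the scribble-supervised term included; the sketch only illustrates how the consistency and entropy terms compose.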
We benchmark PacingPseudo on three public med-
ical image datasets: CHAOS T1 and T2 (abdomi-
nal multi-organs) (Kavur et al., 2021), ACDC (car-
diac structures) (Bernard et al., 2018), and LVSC (my-