Complementary consistency semi-supervised learning for 3D left atrial image segmentation
Hejun Huang a, Zuguo Chen a,b,*, Chaoyang Chen a, Ming Lu a, Ying Zou a
aSchool of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
bShenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Abstract
A network based on complementary consistency training, called CC-Net, is proposed for semi-supervised left atrium image segmentation. CC-Net efficiently utilizes unlabeled data from the perspective of complementary information, addressing the limited ability of existing semi-supervised segmentation algorithms to extract information from unlabeled data. The complementary symmetric structure of CC-Net comprises a main model and two auxiliary models. Inter-model perturbations between the main and auxiliary models enforce consistency, forming complementary consistency. The complementary information obtained by the two auxiliary models helps the main model focus effectively on ambiguous regions, while enforcing consistency between the models yields decision boundaries with low uncertainty. CC-Net has been validated on two public datasets. At the evaluated proportions of labeled data, CC-Net achieves the best semi-supervised segmentation performance compared with current state-of-the-art algorithms. Our code is publicly available at https://github.com/Cuthbert-Huang/CC-Net.
Keywords: Complementary consistency, Semi-supervised segmentation, Complementary auxiliary models, Uncertainty
1. Introduction
Atrial fibrillation (AF) is the most common arrhythmia and has a significant impact on global mortality, making it one of the major burdens on global healthcare (Guglielmo et al., 2019). The structure of the left atrium (LA) is necessary information for clinicians to diagnose and treat atrial fibrillation (Ikenouchi et al., 2021). Traditional methods of manually segmenting the LA have serious limitations owing to their strong dependence on experience and susceptibility to error (Xiong et al., 2021). Deep learning-based methods have been developed for automatic segmentation of the LA. For example, a multi-task learning framework was constructed to share features among tasks and achieve accurate segmentation (Chen et al., 2019). An attention-based hierarchical aggregation network (HAANet) was proposed, using hierarchical aggregation to enhance the network's feature fusion ability and attention mechanisms to improve the extraction of effective features (Li et al., 2019). These supervised learning methods have demonstrated good segmentation performance but require a large amount of annotated data for training. Annotated 3D medical images are scarce because annotation is difficult and costly. Therefore, how to achieve good segmentation performance with less annotated data remains a pressing problem.
Semi-supervised learning typically refers to methods that learn jointly from a large amount of unlabeled data and a small amount of labeled data, and it is well suited to scenarios where labeled data are difficult to obtain (van Engelen and Hoos, 2020). Semi-supervised learning is particularly effective for 3D left atrium segmentation. Based on the mean teacher model, Yu et al. (2019) use an uncertainty map to guide the student model to gradually learn reliable information from the teacher model, yielding good left atrium segmentation results. Li et al. (2020) use signed distance map regression to introduce shape and position priors, while using a discriminator as a regularization term to enhance segmentation stability. Luo et al. (2021) recognize the disturbance between the regression and prediction tasks and construct a bi-task consistency loss through task conversion to learn from unlabeled data, enhancing the model's generalization ability. Although the above works achieve good left atrium segmentation performance, they are unable to learn information from difficult areas of unlabeled data. MC-Net+ (Wu et al., 2022) constructs mutual consistency between three different upsampling decoders, generating low-entropy predictions for uncertain areas, and achieves effective results. However, the probability maps generated by different upsampling methods contain only conservative learnable information, and because of the shared encoder this learnable information is weakened as training progresses, so MC-Net+ cannot obtain correct segmentation results in critical areas of uncertainty (see the comparison between our method and MC-Net+ in Section 4.2, Performance on the LA dataset, for details).
*Corresponding author. Email address: zg.chen@hnust.edu.cn (Zuguo Chen)
This article argues that accurate segmentation is achieved by combining high-level semantic information with high-resolution detail information. Focusing more on high-level semantic information expands the boundaries of the deterministic segmentation region, that is, it reduces the false negative rate. Focusing more on high-resolution detail information reduces the uncertainty of correct segmentation boundaries, that is, it increases the true positive rate. Can a model be adjusted to focus more on high-level semantic information or more on high-resolution detail information, so as to obtain probability maps rich in learnable information? Skip connections play an important role in V-Net, helping to restore, during upsampling, the high-resolution detail information lost during encoding (Milletari et al., 2016). Complementary A and Complementary B are obtained by changing whether the skip connection is used in certain layers of the V-Net decoder. Figure 1 compares the segmentation results on the LA dataset after training with 10% labeled data using Complementary A, Complementary B, and V-Net. Complementary A gives up some high-resolution detail information and focuses more on high-level semantic information. The second row of Figure 1(c) shows that Complementary A has wider segmentation boundaries in the challenging branch area (indicated by the arrow in the figure). The third row of Figure 1(c) clearly shows that the segmentation region of Complementary A essentially wraps around the true label. Complementary B gives up some high-level semantic information and focuses more on high-resolution detail information. The second row of Figure 1(d) shows that Complementary B has more reliable segmentation boundaries in the challenging branch area (indicated by the arrow in the figure). The third row of Figure 1(d) clearly shows that the segmentation region of Complementary B is essentially wrapped by the true label. These results show that there is rich learnable information between the probability maps generated by the Complementary A and Complementary B models.

Preprint submitted to Computerized Medical Imaging and Graphics, April 5, 2023. arXiv:2210.01438v5 [eess.IV], 4 Apr 2023.
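The complementary skip-connection scheme described above can be made concrete with a small sketch. This is an illustrative configuration only (the layer numbering follows the Figure 1 caption; the helper names are hypothetical, not from the paper's code):

```python
# Decoder layers are numbered 1..4; each complementary model keeps the skip
# connections the other one drops, per the Figure 1 caption.
DECODER_LAYERS = [1, 2, 3, 4]

# Complementary A: skip connections removed at decoder layers 2 and 4.
skips_a = {layer: layer not in (2, 4) for layer in DECODER_LAYERS}
# Complementary B: skip connections removed at decoder layers 1 and 3.
skips_b = {layer: layer not in (1, 3) for layer in DECODER_LAYERS}

def is_complementary(a, b):
    """Each layer keeps its skip connection in exactly one of the two models."""
    return all(a[layer] != b[layer] for layer in a)

print(is_complementary(skips_a, skips_b))  # True
```

The two masks partition the set of skip connections, which is what makes the pair "complementary": together they cover every layer, but they never overlap.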
Therefore, this paper proposes a new network based on complementary consistency training, using V-Net as the main model and constructing two complementary auxiliary models. The two auxiliary models form a complementary symmetric structure by changing whether skip connections are used in certain layers of the V-Net decoder. Drawing on cross-pseudo-supervision (Chen et al., 2021) and mutual consistency (Wu et al., 2021), a sharpening function converts the probability maps generated by the two auxiliary models into pseudo-labels that strengthen the training of the main model. At the same time, the high-quality probability maps generated by the main model are also converted into pseudo-labels to guide the training of the auxiliary models. The perturbation between the main model and the auxiliary models forms complementary consistency training. The Dice loss serves as the supervised loss for labeled input data, and a model consistency regularization loss serves as the unsupervised loss for all input data. After training, only the main model is used for testing, greatly reducing the number of network parameters at test time while achieving fine segmentation results. Consequently, the contributions and novelty of this paper are summarised as follows:
- Two complementary auxiliary models are constructed by alternating the use of skip connections. This creates a model disturbance from a complementary-information perspective, effectively utilizing unlabeled data.
- Model consistency methods allow the main model to learn complementary information from the auxiliary models.
- An independent encoder structure is proposed for complementary consistency learning.

The method is validated on two public datasets, and the results show that it effectively increases the utilization of unlabeled data and achieves excellent semi-supervised segmentation performance.
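The training objective outlined above, a supervised Dice term for labeled inputs plus a consistency regularization term for all inputs, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the mean-squared pairing of main and auxiliary predictions and the weight `lam` are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def dice_loss(prob, label, eps=1e-5):
    # Soft Dice loss on foreground probabilities.
    inter = (prob * label).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + label.sum() + eps)

def consistency_loss(p_main, p_aux1, p_aux2):
    # Mean-squared consistency between the main model and each auxiliary model.
    return ((p_main - p_aux1) ** 2).mean() + ((p_main - p_aux2) ** 2).mean()

def total_loss(p_main, p_aux1, p_aux2, label=None, lam=0.1):
    # The supervised Dice term applies only when a label is available;
    # the consistency term applies to all inputs, labeled or not.
    sup = 0.0 if label is None else (
        dice_loss(p_main, label)
        + dice_loss(p_aux1, label)
        + dice_loss(p_aux2, label))
    return sup + lam * consistency_loss(p_main, p_aux1, p_aux2)
```

When all three predictions agree with a perfect label, both terms vanish; disagreement among the models contributes an unsupervised penalty even without a label.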
2. Related work
2.1. Semi-supervised segmentation
Because labeled data are costly and difficult to obtain for segmentation tasks, semi-supervised segmentation has been developed vigorously to learn as much useful information as possible from unlabeled data. Zhai et al. (2022) defined two asymmetric generators and a discriminator for adversarial learning, in which the discriminator filters the masks generated by the generators to provide supervised information for the network to learn from unlabeled data. Xiao et al. (2022) added, on top of the mean teacher model, a teacher model that combines CNN and Transformer structures, aiming to guide the student model to learn more information. Zhang et al. (2022) proposed a dual correction method to improve the quality of pseudo-labels and obtain better segmentation performance. This paper constructs complementary auxiliary models to help the main model explore the ambiguous areas of unlabeled data, and the complementary consistency between the main model and the auxiliary models learns effectively from unlabeled data.
2.2. Consistency regularization
Consistency regularization is a common and effective method in semi-supervised learning. To address inherent perturbations among related tasks, Luo et al. (2021) introduced dual-task consistency between the segmentation map derived from the level set and the directly predicted segmentation map. Liu et al. (2022) added a classification model on top of the segmentation model and constructed a contrastive consistency loss using the class vectors obtained from the classification model. Wang et al. (2022a) added spatial context perturbations to the input on top of model-level perturbations, resulting in a dual-consistency segmentation network. Hu et al. (2022) embedded self-attention in the mean teacher model and encouraged the attention maps to remain consistent across feature layers, forming attention-guided consistency. In the cross-modal domain, Chen et al. (2022b) used cross-modal consistency between unpaired CT and MRI to learn modality-independent information. Ouali et al. (2020) proposed cross-consistency to enforce consistency between the primary decoder and auxiliary decoders when making decisions for low-density regions. To address perturbations between different levels of image augmentation, Zhong et al. (2021) introduced pixel-wise contrastive consistency based on the label consistency property and the contrastive feature space property between pixels. Inspired by mutual consistency, our approach uses complementary consistency based on model consistency to enable the main model to learn complementary information from the auxiliary models.
2.3. Multi-view training
The purpose of multi-view training is to utilize the redundant information between different views to improve learning efficiency (Yan et al., 2021). Chen et al. (2018) construct three divergent models and generate pseudo-labels using a voting mechanism; with a large amount of unlabeled data and the introduction of noise, this method yields good results. Xia et al. (2020) perform multi-view collaborative training by rotating the input and use the uncertainty estimates from each view to obtain accurate segmentation. This method is simple to implement but requires a relatively large number of views to achieve ideal performance, leading to redundant learning. Zheng et al. (2022) split labeled data into complementary subsets and train two models, one on each subset, effectively improving the network's ability to explore ambiguous areas. Our approach constructs two complementary auxiliary models to direct the main model's attention to ambiguous areas from two complementary views. Additionally, leveraging the multi-view information leads to low-entropy predictions.
2.4. Uncertainty estimation
Uncertainty estimation helps reduce the randomness of predictions and is crucial for learning reliable information from unlabeled data (Chen et al., 2022a). In (Yu et al., 2019), Monte Carlo sampling is used to obtain uncertainty maps from the teacher model, which guide the student model to gradually acquire reliable information. Zheng et al. (2022) use uncertainty maps as loss weights to learn high-confidence information. Wu et al. (2022) construct three different upsampling decoders and use mutual learning to obtain low-uncertainty predictions. Wang et al. (2022b) use a triple uncertainty-guided framework, allowing the student model to obtain more reliable knowledge from the teacher model for all three tasks.
teacher model for all three tasks. Our method reduces uncertainty
2
Figure 1: Comparison of segmentation results of Complementary A, Complementary B, and V-Net after training with 10% labeled data, where Complementary A is
obtained from the second and fourth layers of the V-Net decoder without skip-connection, and Complementary B is obtained from the first and third layers of the V-Net
decoder without skip-connection.
in complementary information through mutual learning between
the complementary auxiliary models and the main model.
3. Materials and Methods
3.1. Dataset and pre-processing
This article evaluates the proposed method using the LA dataset
and the Pancreas-CT dataset.
The LA dataset (Xiong et al., 2021) used in this study is from the 2018 Atrial Segmentation Challenge and consists of 154 3D LGE-MRIs from 60 diagnosed atrial fibrillation patients. Each 3D LGE-MRI scan has an isotropic resolution of 0.625 × 0.625 × 0.625 mm³ and spatial dimensions of 576 × 576 × 88 or 640 × 640 × 88 pixels. The segmentation labels were produced manually by three trained observers and stored in NRRD format. Since only 100 labeled images are available, this study follows the settings of (Yu et al., 2019; Li et al., 2020; Luo et al., 2021; Wu et al., 2022), splitting the 100 images into 80 for training and 20 for validation. The proposed method's performance was compared with other methods on the same validation set.
The Pancreas-CT dataset (Clark et al., 2013) consists of 82 3D contrast-enhanced CT scans from 53 male and 27 female patients. In 2020, cases #25 and #70 were found to be duplicates of case #2 with minor cropping differences and were removed from the dataset. The CT scans have an in-plane resolution of 512 × 512 pixels, with pixel sizes and slice thicknesses between 1.5 and 2.5 mm. Following [31], we resampled the voxels to a uniform isotropic resolution of 1.0 × 1.0 × 1.0 mm³ and truncated the Hounsfield Units (HU) to the range [-125, 275]. We used 60 samples for training and report performance on the remaining 20 samples.
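The HU truncation step above can be sketched as follows. The rescaling to [0, 1] after clipping is an assumption for illustration (the paper does not state its intensity normalization), and the isotropic resampling would normally be done first with an image library such as SimpleITK, which is omitted here:

```python
import numpy as np

def preprocess_ct(volume_hu):
    # Truncate Hounsfield Units to [-125, 275] as described in the text,
    # then rescale linearly to [0, 1] (assumed normalization).
    clipped = np.clip(volume_hu, -125.0, 275.0)
    return (clipped + 125.0) / 400.0

vol = np.array([[-500.0, 0.0], [300.0, 275.0]])
out = preprocess_ct(vol)
print(out.min(), out.max())  # 0.0 1.0
```

Clipping bounds the soft-tissue window so that extreme values (air, bone) do not dominate the intensity range seen by the network.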
3.2. Network overall architecture
The overall architecture of the proposed method is illustrated in Figure 2. It consists of a main model and two auxiliary models. The main model employs V-Net (Milletari et al., 2016), and the encoders of the two auxiliary models are identical to that of the main model. The decoders of the auxiliary models form a complementary symmetric structure by using skip connections in an interleaved manner: the second and fourth layers of the decoder in auxiliary model 1 have no skip connections, while the first and third layers of the decoder in auxiliary model 2 have no skip connections. The segmentation network takes 3D medical images as input. If the input is labeled, the main and auxiliary models are trained with supervision from the real labels. If the input is unlabeled, the two auxiliary models learn complementary information about the same input through the complementary symmetric structure, effectively utilizing the unlabeled data. The probability maps generated by the main and auxiliary models are sharpened into pseudo-labels. The complementary information learned by each auxiliary model guides, via its pseudo-labels, the training of the main model and of the other auxiliary model. At the same time, the pseudo-labels generated by the main model guide the training of both auxiliary models. This forms a complementary consistency training network (CC-Net) consisting of a main model and two complementary auxiliary models. Notably, the two auxiliary models are used only during training, and only the main model is used for inference, which greatly improves inference efficiency. The next two subsections describe the complementary symmetric structure and complementary consistency training in detail.
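The sharpening step that converts probability maps into pseudo-labels can be sketched as temperature sharpening, a common choice in related mutual-consistency work. The exact functional form and temperature used by CC-Net are assumptions here:

```python
import numpy as np

def sharpen(prob, T=0.1):
    # Temperature sharpening of a foreground probability map. As T -> 0
    # the output approaches a hard 0/1 pseudo-label; T = 1 leaves the
    # map unchanged. The value T = 0.1 is an illustrative assumption.
    p = prob ** (1.0 / T)
    q = (1.0 - prob) ** (1.0 / T)
    return p / (p + q)

probs = np.array([0.3, 0.5, 0.9])
pseudo = sharpen(probs)
print(np.round(pseudo, 3))
```

Sharpening pushes confident predictions toward hard labels while leaving genuinely ambiguous voxels (probability near 0.5) close to undecided, which is what lets the pseudo-labels carry low-entropy supervision without inventing certainty where none exists.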
3.3. Complementary symmetric structure
Inspired by (Wang et al., 2022b), this paper utilizes two complementary auxiliary models to learn complementary information and effectively utilize unannotated data. As shown in Figure 3, the main model and the auxiliary models adopt independent encoders. The three encoders have the same structure, denoted A^M, A^Aux1, and A^Aux2, respectively. For an input x, each encoder consists of 5 encoding blocks and produces 5 outputs after 5 encoding steps:
a_i^{Aux1_j} = A_j^{Aux1}(A_{j-1}^{Aux1}(x_i)),
a_i^{M_j} = A_j^{M}(A_{j-1}^{M}(x_i)),
a_i^{Aux2_j} = A_j^{Aux2}(A_{j-1}^{Aux2}(x_i))    (1)

where a_i^{Aux1_j} denotes the j-th (0 < j ≤ 5) output produced by the i-th input to the encoder of the first auxiliary model, a_i^{M_j} denotes the j-th (0 < j ≤ 5) output produced by the i-th input to the main model encoder, and a_i^{Aux2_j} denotes the j-th (0 < j ≤ 5) output produced by the i-th input to the encoder of the second auxiliary model. A_j^{Aux1} denotes the j-th (0 < j ≤ 5) encoding block
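The chained-block structure of Eq. (1) can be sketched as follows. The placeholder block below (stride-2 averaging of a 1D signal) only stands in for the real V-Net encoding blocks, to show how the j-th feature is the j-th block applied to the previous block's output:

```python
import numpy as np

def encoding_block(x):
    # Placeholder encoding block: stride-2 average pooling, halving the
    # spatial size the way each V-Net encoder stage downsamples.
    return 0.5 * (x[::2] + x[1::2])

def encode(x, num_blocks=5):
    # Apply the 5 encoding blocks in sequence, collecting each stage's
    # output a^1 .. a^5 as in Eq. (1).
    feats = []
    for _ in range(num_blocks):
        x = encoding_block(x)
        feats.append(x)
    return feats

x = np.ones(64)
feats = encode(x)
print([f.shape[0] for f in feats])  # [32, 16, 8, 4, 2]
```

In CC-Net the same composition is run three times with independent weights (A^M, A^Aux1, A^Aux2), so the three encoders produce separate feature pyramids from the same input.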