Complementary consistency semi-supervised learning for 3D left atrial image segmentation
Hejun Huang a, Zuguo Chen a,b,*, Chaoyang Chen a, Ming Lu a, Ying Zou a
aSchool of Information and Electrical Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
bShenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Abstract
A network based on complementary consistency training, called CC-Net, is proposed for semi-supervised left atrium image segmentation. CC-Net efficiently utilizes unlabeled data from the perspective of complementary information, addressing the limited ability of existing semi-supervised segmentation algorithms to extract information from unlabeled data. The complementary symmetric structure of CC-Net comprises a main model and two auxiliary models. Inter-model perturbations between the main and auxiliary models enforce consistency, forming complementary consistency. The complementary information obtained by the two auxiliary models helps the main model focus effectively on ambiguous regions, while enforcing consistency between the models yields decision boundaries with low uncertainty. CC-Net has been validated on two public datasets. At the evaluated proportions of labeled data, CC-Net achieves the best semi-supervised segmentation performance compared with current state-of-the-art algorithms. Our code is publicly available at https://github.com/Cuthbert-Huang/CC-Net.
Keywords: Complementary consistency, Semi-supervised segmentation, Complementary auxiliary models, Uncertainty
1. Introduction
Atrial fibrillation (AF) is the most common arrhythmia and has a significant impact on global mortality, making it one of the major burdens on global healthcare (Guglielmo et al., 2019). The structure of the left atrium (LA) is necessary information for clinicians to diagnose and treat atrial fibrillation (Ikenouchi et al., 2021). Traditional methods of manually segmenting the LA have serious limitations owing to their strong dependence on experience and susceptibility to error (Xiong et al., 2021). Deep learning-based methods have been developed for automatic segmentation of the LA. For example, a multi-task learning framework was constructed to share features among tasks and achieve accurate segmentation (Chen et al., 2019). An attention-based hierarchical aggregation network (HAANet) was proposed, using hierarchical aggregation to enhance the network's feature fusion ability and attention mechanisms to improve the extraction of effective features (Li et al., 2019). These supervised learning methods have demonstrated good segmentation performance but require a large amount of annotated data for training. Annotated 3D medical images are scarce because annotation is difficult and costly. Therefore, how to achieve good segmentation performance with less annotated data remains a pressing problem.
Semi-supervised learning typically refers to methods that learn jointly from a large amount of unlabeled data and a small amount of labeled data, and it is well suited to scenarios where labeled data are difficult to obtain (van Engelen and Hoos, 2020). Semi-supervised learning is particularly effective for 3D left atrium segmentation. Based on the mean teacher model, Yu et al. (2019) use an uncertainty map to guide the student model to gradually learn reliable information from the teacher model, yielding good left atrium segmentation results. Li et al. (2020) use signed distance map regression to introduce shape and position priors, while using a discriminator as a regularization term to enhance segmentation stability. Luo et al. (2021) recognize the disturbance between the regression and prediction tasks and construct a bi-task consistency loss through task conversion to learn from unlabeled data, enhancing the model's generalization ability. Although the above works achieve good left atrium segmentation performance, they are unable to learn information from difficult areas of unlabeled data. MC-Net+ (Wu et al., 2022) constructs mutual consistency between three different upsampling decoders, generating low-entropy predictions for uncertain areas, and achieves effective results. However, the probability maps generated by different upsampling methods contain only conservative learnable information, and because of the shared encoder this learnable information is weakened as training progresses, so MC-Net+ cannot obtain correct segmentation results in critical areas of uncertainty (see the comparison between our method and MC-Net+ in Section 4.2, Performance on the LA dataset, for details).
*Corresponding author. Email address: zg.chen@hnust.edu.cn (Zuguo Chen)
This article argues that accurate segmentation is achieved by combining high-level semantic information with high-resolution detail information. Focusing more on high-level semantic information expands the boundaries of the deterministic segmentation region, that is, it reduces the false negative rate. Focusing more on high-resolution detail information reduces the uncertainty of correct segmentation boundaries, that is, it increases the true positive rate. Can a model be adjusted to focus more on high-level semantic information or more on high-resolution detail information, so as to obtain probability maps rich in learnable information? Skip connections play an important role in V-Net, helping to restore, during upsampling, the high-resolution detail information lost during encoding (Milletari et al., 2016). Complementary A and Complementary B are obtained by changing whether the skip connection is used in certain layers of the V-Net decoder. Figure 1 compares the segmentation results on the LA dataset after training with 10% labeled data using Complementary A, Complementary B, and V-Net. Complementary A gives up some high-resolution detail information and focuses more on high-level semantic information. The second row of Figure 1(c) shows that Complementary A has wider segmentation boundaries in the challenging branch area (indicated by the arrow in the figure). The third row of Figure 1(c) clearly shows that the segmentation region of Complementary A essentially wraps around the true label. Complementary B gives up some high-level semantic information and focuses more on high-resolution detail information. The second row of Figure 1(d) shows that Complementary B has more reliable segmentation boundaries in the challenging branch area (indicated by the arrow in the figure). The third row of Figure 1(d) clearly shows that the segmentation region of Complementary B is essentially wrapped by the true label. These results show that there is rich learnable information between the probability maps generated by the Complementary A and Complementary B models.

Preprint submitted to Computerized Medical Imaging and Graphics, April 5, 2023. arXiv:2210.01438v5 [eess.IV], 4 Apr 2023.
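The complementary skip-connection scheme described above can be made concrete with a small sketch. This is an illustrative configuration only (the layer numbering follows the Figure 1 caption; the helper names are hypothetical, not from the paper's code):

```python
# Decoder layers are numbered 1..4; each complementary model keeps the skip
# connections the other one drops, per the Figure 1 caption.
DECODER_LAYERS = [1, 2, 3, 4]

# Complementary A: skip connections removed at decoder layers 2 and 4.
skips_a = {layer: layer not in (2, 4) for layer in DECODER_LAYERS}
# Complementary B: skip connections removed at decoder layers 1 and 3.
skips_b = {layer: layer not in (1, 3) for layer in DECODER_LAYERS}

def is_complementary(a, b):
    """Each layer keeps its skip connection in exactly one of the two models."""
    return all(a[layer] != b[layer] for layer in a)

print(is_complementary(skips_a, skips_b))  # True
```

The two masks partition the set of skip connections, which is what makes the pair "complementary": together they cover every layer, but they never overlap.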
Therefore, this paper proposes a new network based on complementary consistency training, using V-Net as the main model and constructing two complementary auxiliary models. The two auxiliary models form a complementary symmetric structure by changing whether skip connections are used in certain layers of the V-Net decoder. Drawing on cross-pseudo-supervision (Chen et al., 2021) and mutual consistency (Wu et al., 2021), a sharpening function converts the probability maps generated by the two auxiliary models into pseudo-labels that strengthen the training of the main model. At the same time, the high-quality probability maps generated by the main model are also converted into pseudo-labels to guide the training of the auxiliary models. The perturbation between the main model and the auxiliary models forms complementary consistency training. The Dice loss serves as the supervised loss for labeled input data, and a model consistency regularization loss serves as the unsupervised loss for all input data. After training, only the main model is used for testing, greatly reducing the number of network parameters at test time while achieving fine segmentation results. Consequently, the contributions and novelty of this paper are summarised as follows:
- Two complementary auxiliary models are constructed by alternating the use of skip connections. This creates a model disturbance from a complementary-information perspective, effectively utilizing unlabeled data.
- Model consistency methods allow the main model to learn complementary information from the auxiliary models.
- An independent encoder structure is proposed for complementary consistency learning.

The method is validated on two public datasets, and the results show that it effectively increases the utilization of unlabeled data and achieves excellent semi-supervised segmentation performance.
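The training objective outlined above, a supervised Dice term for labeled inputs plus a consistency regularization term for all inputs, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the mean-squared pairing of main and auxiliary predictions and the weight `lam` are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def dice_loss(prob, label, eps=1e-5):
    # Soft Dice loss on foreground probabilities.
    inter = (prob * label).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + label.sum() + eps)

def consistency_loss(p_main, p_aux1, p_aux2):
    # Mean-squared consistency between the main model and each auxiliary model.
    return ((p_main - p_aux1) ** 2).mean() + ((p_main - p_aux2) ** 2).mean()

def total_loss(p_main, p_aux1, p_aux2, label=None, lam=0.1):
    # The supervised Dice term applies only when a label is available;
    # the consistency term applies to all inputs, labeled or not.
    sup = 0.0 if label is None else (
        dice_loss(p_main, label)
        + dice_loss(p_aux1, label)
        + dice_loss(p_aux2, label))
    return sup + lam * consistency_loss(p_main, p_aux1, p_aux2)
```

When all three predictions agree with a perfect label, both terms vanish; disagreement among the models contributes an unsupervised penalty even without a label.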
2. Related work
2.1. Semi-supervised segmentation
Because labeled data are costly and difficult to obtain for segmentation tasks, semi-supervised segmentation has been developed vigorously to learn as much useful information as possible from unlabeled data. Zhai et al. (2022) defined two asymmetric generators and a discriminator for adversarial learning, in which the discriminator filters the masks generated by the generators to provide supervised information for the network to learn from unlabeled data. Xiao et al. (2022) added, on top of the mean teacher model, a teacher model that combines CNN and Transformer structures, aiming to guide the student model to learn more information. Zhang et al. (2022) proposed a dual correction method to improve the quality of pseudo-labels and obtain better segmentation performance. This paper constructs complementary auxiliary models to help the main model explore the ambiguous areas of unlabeled data, and the complementary consistency between the main model and the auxiliary models learns effectively from unlabeled data.
2.2. Consistency regularization
Consistency regularization is a common and effective method in semi-supervised learning. To address inherent perturbations among related tasks, Luo et al. (2021) introduced dual-task consistency between the segmentation map derived from the level set and the directly predicted segmentation map. Liu et al. (2022) added a classification model on top of the segmentation model and constructed a contrastive consistency loss using the class vectors obtained from the classification model. Wang et al. (2022a) added spatial context perturbations to the input on top of model-level perturbations, resulting in a dual-consistency segmentation network. Hu et al. (2022) embedded self-attention in the mean teacher model and encouraged the attention maps to remain consistent across feature layers, forming attention-guided consistency. In the cross-modal domain, Chen et al. (2022b) used cross-modal consistency between unpaired CT and MRI to learn modality-independent information. Ouali et al. (2020) proposed cross-consistency to enforce consistency between the primary decoder and auxiliary decoders when making decisions for low-density regions. To address perturbations between different levels of image augmentation, Zhong et al. (2021) introduced pixel-wise contrastive consistency based on the label consistency property and the contrastive feature space property between pixels. Inspired by mutual consistency, our approach uses complementary consistency based on model consistency to enable the main model to learn complementary information from the auxiliary models.
2.3. Multi-view training
The purpose of multi-view training is to utilize the redundant information between different views to improve learning efficiency (Yan et al., 2021). Chen et al. (2018) construct three divergent models and generate pseudo-labels using a voting mechanism; with a large amount of unlabeled data and the introduction of noise, this method yields good results. Xia et al. (2020) perform multi-view collaborative training by rotating the input and use the uncertainty estimates from each view to obtain accurate segmentation. This method is simple to implement but requires a relatively large number of views to achieve ideal performance, leading to redundant learning. Zheng et al. (2022) split labeled data into complementary subsets and train two models, one on each subset, effectively improving the network's ability to explore ambiguous areas. Our approach constructs two complementary auxiliary models to direct the main model's attention to ambiguous areas from two complementary views. Additionally, leveraging the multi-view information leads to low-entropy predictions.
2.4. Uncertainty estimation
Uncertainty estimation helps reduce the randomness of predictions and is crucial for learning reliable information from unlabeled data (Chen et al., 2022a). In (Yu et al., 2019), Monte Carlo sampling is used to obtain uncertainty maps from the teacher model, which guide the student model to gradually acquire reliable information. Zheng et al. (2022) use uncertainty maps as loss weights to learn high-confidence information. Wu et al. (2022) construct three different upsampling decoders and use mutual learning to obtain low-uncertainty predictions. Wang et al. (2022b) use a triple uncertainty-guided framework, allowing the student model to obtain more reliable knowledge from the teacher model for all three tasks.
teacher model for all three tasks. Our method reduces uncertainty
2
Figure 1: Comparison of segmentation results of Complementary A, Complementary B, and V-Net after training with 10% labeled data, where Complementary A is
obtained from the second and fourth layers of the V-Net decoder without skip-connection, and Complementary B is obtained from the first and third layers of the V-Net
decoder without skip-connection.
in complementary information through mutual learning between
the complementary auxiliary models and the main model.
3. Materials and Methods
3.1. Dataset and pre-processing
This article evaluates the proposed method using the LA dataset
and the Pancreas-CT dataset.
The LA dataset (Xiong et al., 2021) used in this study is from the 2018 Atrial Segmentation Challenge and consists of 154 3D LGE-MRIs from 60 diagnosed atrial fibrillation patients. Each 3D LGE-MRI scan has an isotropic resolution of 0.625 × 0.625 × 0.625 mm³ and spatial dimensions of 576 × 576 × 88 or 640 × 640 × 88 pixels. The segmentation labels were produced manually by three trained observers and stored in NRRD format. Since only 100 labeled images are available, this study follows the settings of (Yu et al., 2019; Li et al., 2020; Luo et al., 2021; Wu et al., 2022), splitting the 100 images into 80 for training and 20 for validation. The proposed method's performance was compared with other methods on the same validation set.
The Pancreas-CT dataset (Clark et al., 2013) consists of 82 3D contrast-enhanced CT scans from 53 male and 27 female patients. In 2020, cases #25 and #70 were found to be duplicates of case #2 with minor cropping differences and were removed from the dataset. The CT scans have an in-plane resolution of 512 × 512 pixels, with pixel sizes and slice thicknesses between 1.5 and 2.5 mm. Following [31], we resampled the voxels to a uniform isotropic resolution of 1.0 × 1.0 × 1.0 mm³ and truncated the Hounsfield Units (HU) to the range [-125, 275]. We used 60 samples for training and report performance on the remaining 20 samples.
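The HU truncation step above can be sketched as follows. The rescaling to [0, 1] after clipping is an assumption for illustration (the paper does not state its intensity normalization), and the isotropic resampling would normally be done first with an image library such as SimpleITK, which is omitted here:

```python
import numpy as np

def preprocess_ct(volume_hu):
    # Truncate Hounsfield Units to [-125, 275] as described in the text,
    # then rescale linearly to [0, 1] (assumed normalization).
    clipped = np.clip(volume_hu, -125.0, 275.0)
    return (clipped + 125.0) / 400.0

vol = np.array([[-500.0, 0.0], [300.0, 275.0]])
out = preprocess_ct(vol)
print(out.min(), out.max())  # 0.0 1.0
```

Clipping bounds the soft-tissue window so that extreme values (air, bone) do not dominate the intensity range seen by the network.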
3.2. Network overall architecture
The overall architecture of the proposed method is illustrated in Figure 2. It consists of a main model and two auxiliary models. The main model employs V-Net (Milletari et al., 2016), and the encoders of the two auxiliary models are identical to that of the main model. The decoders of the auxiliary models form a complementary symmetric structure by using skip connections in an interleaved manner: the second and fourth layers of the decoder in auxiliary model 1 have no skip connections, while the first and third layers of the decoder in auxiliary model 2 have no skip connections. The segmentation network takes 3D medical images as input. If the input is labeled, the main and auxiliary models are trained with supervision from the real labels. If the input is unlabeled, the two auxiliary models learn complementary information about the same input through the complementary symmetric structure, effectively utilizing the unlabeled data. The probability maps generated by the main and auxiliary models are sharpened into pseudo-labels. The complementary information learned by each auxiliary model guides, via its pseudo-labels, the training of the main model and of the other auxiliary model. At the same time, the pseudo-labels generated by the main model guide the training of both auxiliary models. This forms a complementary consistency training network (CC-Net) consisting of a main model and two complementary auxiliary models. Notably, the two auxiliary models are used only during training, and only the main model is used for inference, which greatly improves inference efficiency. The next two subsections describe the complementary symmetric structure and complementary consistency training in detail.
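The sharpening step that converts probability maps into pseudo-labels can be sketched as temperature sharpening, a common choice in related mutual-consistency work. The exact functional form and temperature used by CC-Net are assumptions here:

```python
import numpy as np

def sharpen(prob, T=0.1):
    # Temperature sharpening of a foreground probability map. As T -> 0
    # the output approaches a hard 0/1 pseudo-label; T = 1 leaves the
    # map unchanged. The value T = 0.1 is an illustrative assumption.
    p = prob ** (1.0 / T)
    q = (1.0 - prob) ** (1.0 / T)
    return p / (p + q)

probs = np.array([0.3, 0.5, 0.9])
pseudo = sharpen(probs)
print(np.round(pseudo, 3))
```

Sharpening pushes confident predictions toward hard labels while leaving genuinely ambiguous voxels (probability near 0.5) close to undecided, which is what lets the pseudo-labels carry low-entropy supervision without inventing certainty where none exists.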
3.3. Complementary symmetric structure
Inspired by (Wang et al., 2022b), this paper utilizes two complementary auxiliary models to learn complementary information and effectively utilize unannotated data. As shown in Figure 3, the main model and the auxiliary models adopt independent encoders. The three encoders have the same structure, denoted A^M, A^Aux1, and A^Aux2, respectively. For an input x, each encoder consists of 5 encoding blocks and produces 5 outputs after 5 encoding steps:
a_i^{Aux1_j} = A_j^{Aux1}(A_{j-1}^{Aux1}(x_i)),
a_i^{M_j} = A_j^{M}(A_{j-1}^{M}(x_i)),
a_i^{Aux2_j} = A_j^{Aux2}(A_{j-1}^{Aux2}(x_i))    (1)

where a_i^{Aux1_j} denotes the j-th (0 < j ≤ 5) output produced by the i-th input to the encoder of the first auxiliary model, a_i^{M_j} denotes the j-th (0 < j ≤ 5) output produced by the i-th input to the main model encoder, and a_i^{Aux2_j} denotes the j-th (0 < j ≤ 5) output produced by the i-th input to the encoder of the second auxiliary model. A_j^{Aux1} denotes the j-th (0 < j ≤ 5) encoding block
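The chained-block structure of Eq. (1) can be sketched as follows. The placeholder block below (stride-2 averaging of a 1D signal) only stands in for the real V-Net encoding blocks, to show how the j-th feature is the j-th block applied to the previous block's output:

```python
import numpy as np

def encoding_block(x):
    # Placeholder encoding block: stride-2 average pooling, halving the
    # spatial size the way each V-Net encoder stage downsamples.
    return 0.5 * (x[::2] + x[1::2])

def encode(x, num_blocks=5):
    # Apply the 5 encoding blocks in sequence, collecting each stage's
    # output a^1 .. a^5 as in Eq. (1).
    feats = []
    for _ in range(num_blocks):
        x = encoding_block(x)
        feats.append(x)
    return feats

x = np.ones(64)
feats = encode(x)
print([f.shape[0] for f in feats])  # [32, 16, 8, 4, 2]
```

In CC-Net the same composition is run three times with independent weights (A^M, A^Aux1, A^Aux2), so the three encoders produce separate feature pyramids from the same input.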