Diversity-Promoting Ensemble for Medical Image Segmentation

Mariana-Iuliana Georgescu
University of Bucharest
Romania
Radu Tudor Ionescu
University of Bucharest
Romania
raducu.ionescu@gmail.com
Andreea-Iuliana Miron
“Carol Davila” University of Medicine
and Pharmacy, Colţea Hospital
Romania
ABSTRACT
Medical image segmentation is an actively studied task in medical
imaging, where the precision of the annotations is of utmost
importance for accurate diagnosis and treatment. In recent years, the
task has been approached with various deep learning systems, U-Net
being among the most popular models. In this work, we propose a novel
strategy to generate ensembles of different architectures for medical
image segmentation, by leveraging the diversity (decorrelation) of the
models forming the ensemble. More specifically, we utilize the Dice
score among model pairs to estimate the correlation between the
outputs of the two models forming each pair. To promote diversity, we
select models with low Dice scores among each other. We carry out
gastro-intestinal tract image segmentation experiments to compare our
diversity-promoting ensemble (DiPE) with another strategy to create
ensembles based on selecting the top scoring U-Net models. Our
empirical results show that DiPE surpasses both individual models and
the ensemble creation strategy based on selecting the top scoring
models.
CCS CONCEPTS
• Computing methodologies → Supervised learning; Image processing;
Image segmentation; • Applied computing → Health informatics.
KEYWORDS
medical imaging; medical image segmentation; model ensemble;
neural network ensemble; deep learning; neural networks; voting-
based ensemble; plurality voting.
ACM Reference Format:
Mariana-Iuliana Georgescu, Radu Tudor Ionescu, and Andreea-Iuliana Miron.
2023. Diversity-Promoting Ensemble for Medical Image Segmentation. In
The 37th ACM/SIGAPP Symposium on Applied Computing (SAC ’23), March
27-April 2, 2023, Tallinn, Estonia. ACM, New York, NY, USA, Article 12927.99,
8 pages. https://doi.org/10.1145/3555776.3577682
Corresponding author: Radu Tudor Ionescu.
Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage
and that copies bear this notice and the full citation on the first
page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To
copy otherwise, or republish, to post on servers or to redistribute to
lists, requires prior specific permission and/or a fee. Request
permissions from permissions@acm.org.
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9517-5/23/03...$15.00
1 INTRODUCTION
Physicians extensively use medical imaging techniques, e.g. Computed
Tomography (CT), Magnetic Resonance Imaging (MRI) and
Optical Coherence Tomography (OCT) [28], as one of the least invasive
investigation alternatives to diagnose lesions inside the human body.
Segmenting (delimiting) regions of interest, such as organs or tumors,
is often required for precise diagnosis and treatment. For example, a
precise segmentation of a malignant tumor can lead to an accurate
calibration of the radiation dosage in radiotherapy [14, 20, 25, 29].
In recent years, the medical image segmentation task has been
approached with various deep learning systems, ranging from
convolutional neural networks (CNNs) [1, 22, 25] to transformers
[4, 8, 11]. Among these, U-Net [22] remains one of the most popular
methods. Although U-Net was introduced in 2015, it consistently
received updates [1, 21, 26, 35], keeping its performance at a
competitive level. However, using a single neural network to perform
segmentation is not always the best solution. Indeed, constructing
ensembles of multiple neural networks is an extensively validated
method [2, 7, 19, 33] to boost accuracy.
Since the precision of the medical image segmentation output is of
utmost importance for accurate diagnosis and treatment, we focus on
combining multiple U-Net architectures to address the task. We
conjecture that decorrelated models lead to a superior ensemble, since
decorrelated models can better complement each other's decisions. To
this end, we propose a novel strategy to construct ensembles of
different models for medical image segmentation by promoting the
diversity (decorrelation) of the models comprising the ensemble, while
also giving equal importance to accuracy. To measure the correlation
between two models, we compute the Dice score between the outputs of
the respective models. We then construct the ensemble in a bottom-up
fashion, starting from the best model and gradually adding, one by
one, the models that are least correlated with those already included.
At the same time, our ensemble creation strategy assigns equal
importance to the performance level of the model to be added at each
step. Since we select models with lower Dice scores at each step, our
strategy promotes diversity among the models comprising the ensemble,
hence the name Diversity-Promoting Ensemble (DiPE).
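The bottom-up construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `dice` and `select_diverse`, the flattened binary masks, and in particular the scoring rule (an unweighted mean of a model's accuracy and of one minus its mean Dice score with the models already selected) are assumptions made only to show how accuracy and diversity might be balanced at each step.

```python
from typing import Dict, List, Sequence

Mask = Sequence[int]  # flattened binary mask: 1 = foreground, 0 = background


def dice(a: Mask, b: Mask) -> float:
    """Dice score 2|A∩B| / (|A| + |B|) between two binary masks."""
    intersection = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Two empty masks agree perfectly, so their Dice score is defined as 1.
    return 1.0 if total == 0 else 2.0 * intersection / total


def select_diverse(preds: Dict[str, Mask], accuracy: Dict[str, float],
                   size: int) -> List[str]:
    """Greedy, bottom-up selection of `size` models.

    ASSUMPTION (hypothetical scoring rule): a candidate is ranked by the
    unweighted mean of its accuracy and its diversity, where diversity is
    1 - mean Dice score with the models already chosen.
    """
    # Start from the best individual model.
    chosen = [max(accuracy, key=accuracy.get)]
    while len(chosen) < min(size, len(preds)):
        def score(name: str) -> float:
            mean_dice = sum(dice(preds[name], preds[c]) for c in chosen) / len(chosen)
            return 0.5 * accuracy[name] + 0.5 * (1.0 - mean_dice)

        candidates = [n for n in preds if n not in chosen]
        chosen.append(max(candidates, key=score))
    return chosen


# Toy data (made up): model "b" duplicates "a", while "c" disagrees with "a".
preds = {
    "a": [1, 1, 1, 0, 0, 0],
    "b": [1, 1, 1, 0, 0, 0],
    "c": [0, 1, 1, 1, 0, 0],
}
accuracy = {"a": 0.90, "b": 0.89, "c": 0.88}
print(select_diverse(preds, accuracy, 2))  # ['a', 'c']
```

In this toy example the slightly less accurate model "c" is preferred over "b", because "b" produces exactly the same mask as the model already in the ensemble and therefore adds no diversity.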
We conduct image segmentation experiments on the gastro-intestinal
tract data set provided by the UW-Madison Carbone Cancer Center [17].
We evaluate nine individual U-Net models based on three different
backbones (ResNet-34 [12], EfficientNet-B0 [27], EfficientNet-B1 [27])
with or without multi-head convolutional attention [9]. Along with the
individual models, we evaluate two strategies to create voting-based
ensembles, namely (i) a baseline (conventional) strategy selecting the
top scoring models and (ii) our strategy promoting diversity among the
selected models. The empirical results indicate that our strategy,
DiPE, outperforms both the individual models and the baseline
ensemble.
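As an illustration of what a voting-based ensemble computes, the following sketch fuses per-pixel labels by plurality voting. The function name `plurality_vote`, the flattened label lists and the tie-breaking rule are assumptions for this example, not details taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def plurality_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Fuse per-pixel class labels from several models by plurality voting.

    ASSUMPTION: ties are resolved in favor of the tied label predicted by
    the earliest model in `masks` (Counter.most_common keeps first-seen
    order for equal counts).
    """
    fused = []
    for labels in zip(*masks):  # labels assigned to one pixel by every model
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused


# Three hypothetical models labeling four pixels (labels are arbitrary).
masks = [
    [0, 1, 2, 2],
    [0, 1, 1, 2],
    [3, 1, 1, 0],
]
print(plurality_vote(masks))  # [0, 1, 1, 2]
```

Voting only changes the outcome at pixels where the models disagree, which is one way to see why weakly correlated models can contribute more to such an ensemble than near-duplicates.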
In summary, our contribution is twofold:
• We introduce a diversity-promoting strategy to create an ensemble
of medical image segmentation models that are weakly correlated with
each other, by leveraging the Dice score between the outputs of
various models.
• We provide empirical evidence showing that our diversity-promoting
ensemble leads to superior performance levels compared with individual
models and the conventional strategy selecting the top scoring models.
arXiv:2210.12388v2 [eess.IV] 21 Dec 2022
2 RELATED WORK
Medical image segmentation can be divided into two tasks, with respect
to the input image. Indeed, there are works that tackle the
segmentation task on 2D images [18, 22, 29, 32], while others rely on
3D images [2, 3, 5, 10, 14, 15, 18, 19, 25, 30, 32]. The works using
2D images as input naturally produce 2D slices as output, while the
works using entire 3D volumes as input produce 3D volumes as output.
Perhaps the most popular architecture for 2D segmentation is U-Net
[22]. U-Net is a fully convolutional (conv) network designed for
medical image segmentation. The architecture follows a “U” shape and
is composed of a contracting and an expansive path. Each step of the
expansive path is composed of an upsampling operation, a convolution
layer which halves the number of feature maps, and a concatenation
with the corresponding cropped feature maps from the contracting path.
Seo et al. [25] proposed the mU-Net model, a modified version of the
U-Net architecture. mU-Net [25] adds a residual path to the
deconvolution operations, and an additional convolutional layer to the
skip connections in order to extract high-level global features of
small objects.
Chen et al. [3] proposed the voxel-wise residual network (VoxResNet),
a 3D CNN formed of 25 layers with residual connections. Multimodal and
multi-level contextual information is introduced into the VoxResNet
model. The multimodal information is added by concatenating multimodal
data before giving it as input to the model. To improve the 3D
segmentation performance of brain lesions, Kamnitsas et al. [15]
employed a 3D CNN comprising 11 layers with parallel convolutional
pathways for multi-scale processing. Rather than modifying the layers
of their architecture, Zhao et al. [32] inserted a lesion-related
spatial attention mechanism into the network.
In order to help physicians obtain better segmentation results, Luo et
al. [18] proposed interactive segmentation to further improve the
performance of CNN models, even on previously unseen objects.
Closer to our study, the work of Gibson et al. [10] shares the same
target task, being focused on multi-organ abdominal segmentation.
Gibson et al. [10] presented a registration-free approach based on
Dense V-Networks for multi-organ abdominal segmentation of 3D images.
They also proposed a batch-wise spatial dropout to lower the memory
usage and processing time of dropout.
Different from the aforementioned works, which are trained in a
fully-supervised learning setting, there are several works [6, 34]
proposing weakly-supervised learning frameworks. Zhou et al. [34]
found that data sets having only one organ annotated as the positive
class, leaving the other organs as part of the background, attain
misleading results in multi-organ segmentation, since the background
class contains many organs. In order to alleviate this problem, Zhou
et al. [34] proposed a prior-aware neural network, incorporating
anatomical priors on abdominal organ sizes into the training
objective.
Similar to our approach proposing an ensemble of multiple networks to
improve the segmentation results, Lyksborg [19] proposed to use a
model for each of the axial, sagittal and coronal planes, fusing the
corresponding segmentations into a single 3D segmentation. Baldeon et
al. [2] proposed AdaEn-Net, an ensemble of networks that boosts the
segmentation performance. AdaEn-Net [2] firstly employs an ensemble of
2D and 3D models to predict the output segmentation. Then, it trains
the 2D-3D ensemble on k folds, obtaining k models. The final
segmentation mask is the average of the k models forming the final
ensemble.
Different from previous works, such as [2, 19], which directly
combined models into ensembles without taking into account their
output correlation, we propose a novel ensemble creation algorithm
which promotes the diversity among the models comprising the ensemble.
3 METHOD
3.1 Neural Architectures
To address our medical image segmentation task, we employ the
well-known U-Net architecture [22]. The U-Net architecture is a fully
convolutional network that belongs to the family of encoder-decoder
neural networks. In the encoding part, the spatial information is
downsampled through convolution and pooling operations. In the
decoding part, the spatial information is upsampled back to the
original size via transposed convolutions. High-resolution features
from the encoder are passed through skip connections and concatenated
to the corresponding features from the decoder, thus infusing
high-resolution information into the decoder. The introduction of skip
connections gives the network its “U” shape. We further present our
changes to the U-Net model, leading to a total of nine distinct model
variants forming the basis of our ensemble.
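To make the encoder-decoder structure concrete, the sketch below traces feature-map shapes through a U-Net-style network. It is plain shape bookkeeping rather than a trainable model, and it rests on simplifying assumptions that are not taken from the paper: padded convolutions (so no cropping is needed), 2x pooling and upsampling, channel counts that double at every encoder level, and the hypothetical defaults `size=256`, `base_channels=64`, `depth=4`.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def unet_shapes(size: int = 256, base_channels: int = 64,
                depth: int = 4) -> Tuple[List[Shape], List[Shape]]:
    """Return the skip-connection shapes and the post-concatenation shapes."""
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2**depth")

    # Encoder: store each level for its skip connection, then downsample.
    encoder: List[Shape] = []
    channels, res = base_channels, size
    for _ in range(depth):
        encoder.append((channels, res, res))
        channels, res = channels * 2, res // 2  # pooling halves the resolution

    # Decoder: upsample, halve the channels, concatenate the matching skip.
    decoder: List[Shape] = []
    for skip_channels, _, _ in reversed(encoder):
        channels, res = channels // 2, res * 2
        decoder.append((channels + skip_channels, res, res))
        # The convolutions after the concatenation are assumed to reduce the
        # feature maps back to `channels` before the next upsampling step.
    return encoder, decoder


enc, dec = unet_shapes()
print(enc[-1], dec[0])  # (512, 32, 32) (1024, 32, 32)
```

The first decoder entry shows the concatenation at work: 512 upsampled channels are joined with the 512 channels saved by the deepest encoder level at the same 32x32 resolution.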
3.1.1 Backbone Variations. To build an ensemble of a diverse set of
models, we first introduce variations in terms of the backbone
architecture. Therefore, we try the following three encoder
architectures: ResNet-34 [12], EfficientNet-B0 [27], and
EfficientNet-B1 [27]. We choose ResNet-34 due to its fairly good
trade-off between running time and accuracy level. The reason behind
adding EfficientNet-B0 and EfficientNet-B1 into our study is the
superior performance levels of these models compared to ResNet-34.
The residual network (ResNet) architecture was proposed by He et al.
[12]. ResNet models are composed of residual blocks. A residual block
consists of a few stacked conv layers and a skip connection from the
first layer to the last layer of the block. Skip connections allow the
training of very deep neural networks, alleviating the vanishing
gradient problem. He et al. [12] proposed five ResNet variants of
different depth, namely ResNet-18, ResNet-34, ResNet-50, ResNet-101
and ResNet-152. Among these, we select ResNet-34 to serve as backbone
for some of our U-Net models.
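The skip connection of a residual block can be summarized as output = F(x) + x, where F stands for the stacked conv layers. The sketch below shows only this addition; `residual_block` and the toy `layers` callables are hypothetical stand-ins and do not reproduce the actual ResNet-34 layers.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, layers: Callable[[Vector], Vector]) -> Vector:
    """Return layers(x) + x: the skip connection adds the input to the output."""
    fx = layers(x)
    if len(fx) != len(x):
        raise ValueError("the stacked layers must preserve the input shape")
    return [a + b for a, b in zip(fx, x)]


# If the stacked layers output zeros, the block reduces to the identity.
print(residual_block([1.0, -2.0], lambda v: [0.0 for _ in v]))  # [1.0, -2.0]
# Otherwise the layers only need to model a correction on top of the input.
print(residual_block([1.0, -2.0], lambda v: [0.5 * t for t in v]))  # [1.5, -3.0]
```

Because the input is added back unchanged, the derivative of the output with respect to the input contains an identity term, which is the usual explanation for why such blocks ease the training of very deep networks.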
The EfficientNet architecture was introduced by Tan et al. [27] to
efficiently scale convolutional neural networks. Tan et al. [27]
demonstrated that, in order to obtain better performance under a
certain computational budget, all three components of the network,
namely the depth, the width and the resolution, should be