Improved Abdominal Multi-Organ Segmentation via 3D Boundary-Constrained Deep
Neural Networks
Samra Irshada,*, Douglas P.S. Gomesb, Seong Tae Kimc
aSwinburne University of Technology, Hawthorn, Australia
bVictoria University, Melbourne, Australia
cKyung Hee University, Yongin-si, Gyeonggi-do, South Korea
Abstract
Background and Objective: Quantitative assessment of the abdominal region from clinically acquired CT scans requires the
simultaneous segmentation of abdominal organs. Therefore, for the past two decades, automatic abdominal image segmentation
has been the subject of intensive research, aiming to assist health professionals and ease the clinical workflow. Thanks to the
availability of high-performance and powerful computational resources, deep learning-based methods have resulted in state-of-
the-art performance for the segmentation of 3D abdominal CT scans. However, the complex characterization of organs with
fuzzy and weak boundaries prevents the deep learning methods from accurately segmenting these anatomical organs. Specifically,
the voxels on the boundary of organs are more vulnerable to misprediction due to the highly-varying intensity of inter-organ
boundaries, and the misprediction of these voxels is detrimental to overall segmentation performance. This paper investigates
the possibility of improving the abdominal image segmentation performance of the existing 3D encoder-decoder networks by
leveraging organ-boundary prediction as a complementary task.
Method: To address the problem of abdominal multi-organ segmentation, we train the 3D encoder-decoder network to
simultaneously segment the abdominal organs and their corresponding boundaries in CT scans via multi-task learning. The
network is trained end-to-end using a loss function that combines two task-specific losses, i.e., complete organ segmentation loss
and boundary prediction loss. We explore two different network topologies based on the extent of weights shared between the
two tasks within a unified multi-task framework. In the first topology, the whole-organ prediction task and the boundary detection
task share all the layers in the encoder-decoder network except for the last task-specific prediction layers. In contrast, the second
topology employs a single shared encoder but two separate task-specific decoders. To evaluate the utility of the complementary
boundary-prediction task in improving abdominal multi-organ segmentation, we use three state-of-the-art encoder-decoder
networks: 3D UNet, 3D UNet++, and 3D Attention-UNet.
Results: The effectiveness of utilizing the organs' boundary information for abdominal multi-organ segmentation is evaluated on
two publicly available abdominal CT datasets: Pancreas-CT and the BTCV dataset. The improvements shown in segmentation
results (evaluated via Dice Score, Average Hausdorff Distance, Recall, and Precision) reveal the advantage of the multi-task training
that forces the network to pay attention to ambiguous boundaries of organs. A maximum relative improvement of 3.5% and 3.6%
is observed in Mean Dice Score for the Pancreas-CT and BTCV datasets, respectively. All source code is publicly available at
https://github.com/samra-irshad/3d-boundary-constrained-networks.
Keywords:
Abdominal multi-organ segmentation, Fully convolutional neural networks, Boundary-constrained segmentation, Multi-task
learning
1. Introduction
Multi-organ segmentation on abdominal Computed
Tomography (CT) scans is an essential prerequisite for
computer-assisted surgery and organ transplantation [1], [2].
Particularly, quantitative assessment of abdominal regions
enables accurate organ dose calculation, required in numerous
radiotherapy treatment options. (*Corresponding author. Email address: sam.ershad@yahoo.com (Samra Irshad).) Erroneous delineation
of abdominal organs prevents harnessing the benefits of
radiotherapeutic advancements. In clinical practice, physicians
delineate abdominal organs using manual segmentation
tools, which are time-consuming, observer-dependent, and
error-prone. With the increased use of imaging facilities and
production of a large number of abdominal CT scans, the
utilization of automated, robust, and efficient organ-delineation
tools has become essential [2], [3], [4]. Automatic
segmentation tools delineate the abdominal structures much
faster and overcome issues like variability in human
expertise and inherent subjectivity.
(Preprint submitted to Elsevier, October 11, 2022. arXiv:2210.04285v1 [eess.IV], 9 Oct 2022.)
Abdominal CT scans often present weak inter-organ
boundaries characterized by regions of similar voxel intensities,
which in turn results in low-contrast representations. Such
appearances are usually caused by the representation of
abdominal soft tissues in a narrow band of Hounsfield
(HU) values. Another factor that enhances the already
complex representation of abdominal organs is the existence of
artifacts occurring due to blood flow, respiratory, and cardiac
motion. Accurate delineation of abdominal organs with unclear
boundaries and complex geometrical shapes is one of the
ongoing challenges that hinder abdomen-related clinical diagnosis.
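A common practical remedy for the narrow soft-tissue HU band is intensity windowing of the CT volume before it is fed to a network. The sketch below illustrates the idea; the window bounds are illustrative assumptions for abdominal soft tissue, not values prescribed by this paper.

```python
import numpy as np

def window_ct(volume_hu, hu_min=-135.0, hu_max=215.0):
    """Clip a CT volume to a soft-tissue HU window and rescale to [0, 1].

    The default bounds are an assumed abdominal soft-tissue window,
    not values taken from the paper.
    """
    clipped = np.clip(volume_hu, hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

# Toy 3D volume containing air (-1000 HU), soft tissue (40 HU), bone (700 HU)
vol = np.array([[[-1000.0, 40.0, 700.0]]])
windowed = window_ct(vol)  # air -> 0.0, bone -> 1.0, soft tissue mid-range
```

Windowing spreads the narrow soft-tissue range over the full input scale, which raises the contrast between adjacent organs before any learning takes place.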
Fig. 1. Exemplary 2D abdominal CT image showing the visual characteristics of organs. (a) 2D abdominal image; (b) abdominal organs annotated on the CT image: pancreas, spleen, liver, stomach, and gallbladder (color-coded in the original figure); (c) 3D multi-organ voxel map.
Earlier methods proposed for the abdominal multi-organ
segmentation mainly were based on multi-atlas [5], [6] or
statistical models [7], [8]. Some methods also made use
of handcrafted or learned features to segment abdominal
organs [9], [10]. However, the recent Fully Convolutional
Network (FCN) based approaches have presented better results
due to improved organ representation learning [2], [11].
Owing to their ability to preserve image structure and to
provide efficient learning and inference, FCN-based methods
are currently considered state-of-the-art for abdominal multi-
organ segmentation [2], [12], [13], [14]. Specifically, these
networks follow the encoder-decoder architectural design [15].
In such networks, the shallow layers in the encoder aim to
extract low-level features, and the deep layers encode high-level
features. The mirrored decoder maps the learned features
back to an output of the same size as the input, with skip
connections assisting in retaining the crucial features extracted
in the encoding path [15].
Existing FCN-based methods for abdominal multi-
organ segmentation employ either 2D or 3D convolutional
architectures [13], [12]. 2D methods process the CT scans in a
slice-by-slice fashion and predict the organ labels on individual
slices [13]. Despite being memory- and parameter-efficient,
2D methods are unable to make full use of 3D contextual
information [2]. 3D methods make use of rich volumetric
context by processing the whole CT volume and generating
voxel-maps in a single forward propagation pass, leading
to better abdominal CT segmentation performance than 2D
approaches [16], [17].
The existing 3D methods have primarily focused on
designing better architectures for improved abdominal multi-
organ representation learning [12], [2]. However, they treat all
the anatomical parts within a single organ equally since they
solely rely on voxel-level information and do not specifically
focus on improving the segmentation of voxels in vulnerable
regions/parts of organs. As an example, we highlight some
of the important characteristics of abdominal organs in Fig. 1.
From Figures 1a and 1b, it can be noticed that the adjacent
organs have weak contours which sometimes touch each other.
As an example, observe the low-contrast and touching
boundaries between the stomach and the pancreas.
Moreover, 3D multi-organ visualization in Fig. 1c shows that
the adjacent positioning of organs in the abdominal cavity
aggravates the complex spatial relationship among the organs.
Simultaneously segmenting the abdominal organs with soft
contours and complex spatial relationships is a challenging task.
The boundaries of anatomical regions in medical scans
serve as an important cue for facilitating manual and
automated delineation [18]. Numerous existing deep learning-
based studies leveraged learning of features corresponding to
boundary of regions for improved medical image segmentation
via multitask learning paradigm [19], [20], [21], [22], [23]. In
recent years, deep multitask learning paradigm has been widely
used due to its potential to solve multiple tasks in one forward
propagation and ability to learn better representations because
of the multiple supervisory signals [24], [25]. In this paper,
we propose to improve the segmentation of abdominal organs
on CT scans by enhancing the segmentation of boundary of
organs. Particularly, we train the 3D deep learning networks
to simultaneously predict the boundary and the entire region of
organs. The inclusion of boundary information is motivated
by the fact that the voxels on the boundary of organs are
more vulnerable to misprediction because of their ambiguous
appearance and complex relationship with adjacent organs.
Specifically, our work makes the following contributions:
(i) We develop an end-to-end trainable 3D multi-task learning
framework that simultaneously predicts the voxel-labels of
abdominal organs and their corresponding boundaries. By
integrating the boundary features, our proposed boundary-
constrained 3D deep learning framework focuses on the
accurate prediction of the edges of organs in addition to
whole organs.
(ii) Instead of relying on a single network topology,
we explore and compare two network topologies for
conducting multi-task learning. In the first topology, the
whole encoder-decoder network is shared with separate
task-specific prediction layers at the end for predicting
boundaries and entire organs’ maps. In the second
topology, an encoder is shared with separate task-specific
decoders for decoding the features, jointly learned by
the shared encoder to predict the boundary and organ
probability maps. With an extensive comparison, we
reveal that integration of boundary features invariably
improves the multi-organ segmentation performance,
independent of the multi-task network design.
(iii) We utilize three state-of-the-art 3D encoder-decoder
architectures, i.e., UNet [26], UNet++ [27], and Attention-
UNet [28] as baseline networks for evaluating the effect
of incorporating boundary information. We modify each
baseline architecture according to our proposed multi-
task topologies. We demonstrate significant performance
improvements with a negligible increase in trainable
parameters.
(iv) We validate the performance of baseline and counterpart
boundary-constrained models on two publically available
datasets (Pancreas-CT [29] and BTCV [30]) using Dice
Score, Average Hausdorff Distance, Recall, and Precision.
Furthermore, we conduct additional experiments to
evaluate the improvement in the segmentation of
regions around the boundaries. The results show
that the boundary-constrained networks learn feature
representations that account for both accurate whole-organ
segmentation and the challenging parts around the borders
of the organs.
The rest of the article is organized as follows. In section 2,
we review the existing methods for abdominal multi-organ
segmentation. Section 3 describes our framework for
incorporating the boundary information into the 3D fully
convolutional networks, including the multi-task loss function
and the details of boundary-constrained network topologies.
Next, we describe the dataset specifications and implementation
details in section 4. We then present the experimental results,
comparisons with existing single-task approaches, and in-
depth performance analysis of boundary-constrained models in
section 5. Finally, we discuss the important highlights and
some directions for future work in section 6 and present the
conclusion in section 7.
2. Related Work
Segmentation of anatomical structures from abdominal scans
is a prerequisite for various high-level CT-based clinical
applications. Existing computerized tools for abdominal
image segmentation are either based on deep learning or
non-deep learning methods. In this section, we first briefly
discuss the non-deep learning methods (section 2.1) and then
present a review of deep learning-based methods for abdominal
multi-organ segmentation (section 2.2). We conclude this
section with a discussion on multi-task deep neural networks
being employed for complementary boundary learning task to
improve medical image segmentation (section 2.3).
2.1. Non-deep learning-based abdominal organs segmentation
Earlier methods proposed for abdominal multi-organ
segmentation have primarily utilized registration-based
approaches [7], [8]. Among the registration-based approaches,
the widely used ones include statistical shape models [7],
[8] and multi-atlas label fusion techniques [5], [6]. The
development of statistical models requires registration of
training images for estimating the shape or appearance of
anatomical organs followed by fitting constructed models to
test images for generating segmentations [31], [32]. Multi
atlas-based methods utilize an atlas created using multiple
labelled images in the training set, and the test image is
segmented by propagating the reference segmentations.
Atlases are constructed by capturing the prior anatomical
knowledge relevant to target organs. However, it is difficult to
build an adequate model to capture the large variability of the
deformable organs with limited data [33]. Furthermore, the
performance of both these approaches is restricted by image
registration accuracy.
Registration-free approaches train a classifier using either
handcrafted or learned features to segment abdominal images
[9]. Extraction of robust and deformation-invariant features
relies on expert knowledge about abdominal organs [34].
Having the ability to learn features automatically, FCN-
based methods have rapidly replaced the traditional solutions
that require image registration or handcrafted features and have
shown improved performance for abdominal CT segmentation
[2], [12], [13], [35].
2.2. Fully Convolutional Networks for abdominal multi-organ
segmentation
In recent years, Fully Convolution Network (FCN) and its
variants (e.g., UNet [15]) have become a common choice
for medical image segmentation. This dominance can be
attributed to their ability to learn effective task representations
and to perform efficient inference. UNet has an encoder-decoder style
architecture and consists of skip connections, joining the
encoding and decoding layers on the same level. Despite
being trained from scratch, UNet demonstrated state-of-the-
art performance for various medical image segmentation tasks
[36], [37]. Built on top of UNet, several other modified
architectures were subsequently proposed, e.g., UNet++ [27],
Attention-UNet [28], etc.
Existing deep learning-based studies for abdominal multi-
organ segmentation have utilized 2D or 3D convolutional
networks. 2D methods are less parameter-intensive;
however, they cannot exploit the 3D contextual information
and consequently deliver less accurate organ-delineation
performance. 3D convolutional networks are facilitated with
3D convolutions, 3D pooling, and 3D normalization to exploit
the rich volumetric context and generate dense voxel-wise
predictions [26]. Advances in efficient 3D convolutional
implementation and increased GPU memory have enabled
the adoption of 3D convolutional models for abdominal
multi-organ segmentation [38], [3].
Roth et al. [16] proposed a cascaded architecture based on
two 3D UNets where the first UNet is trained to separate the
abdominal area from the background, and the latter utilized
the output from the first UNet to simultaneously segment the
abdominal organs. Peng et al. [39] delineated abdominal organs
using 3D UNet with residual-learning based units (ResNets) to
calculate patient-specific CT organ dose. In another study [2],
abdominal organs are segmented using a 3D FCN with dilated
convolutions based densely connected units. Heinrich et al.
[11] leveraged 3D deformable convolutions to spatially adapt
Fig. 2. Multi-task topologies of the 3D boundary-constrained network. (a) Multi-task topology with shared encoder-decoder network and task-specific prediction layers; (b) multi-task topology with shared encoder and task-specific decoders.
the receptive field for abdominal multi-organ segmentation.
In [40], abdominal scans were segmented using a 3D deeply
supervised patch-based UNet with grid-based attention gates
to encourage the network to focus on useful salient features
propagated through the skip connections. Some existing
methods have employed post-processing steps, including level-
sets [3] and graph-cut [4] to refine initial segmentation maps
obtained from 3D deep convolutional networks.
Through the efforts mentioned above, the existing 3D
methods have mostly emphasized developing better deep
learning architectures and did not attempt to improve the
segmentation of challenging parts of abdominal organs, e.g.,
voxels that belong to the contour of organs and regions within
the vicinity of the organ contour. The fuzzy appearance of
organ boundaries and the low contrast between adjacent
abdominal structures make the voxels belonging to these
regions more susceptible to wrong label prediction.
2.3. Boundary-constrained medical image segmentation
Several existing deep learning-based medical image
segmentation methods have utilized the boundary information
of regions of interest to overcome the misprediction of
boundary pixels [19], [20], [21], [41]. In these methods,
the networks are trained in a multi-task learning fashion to
simultaneously predict the probability maps of entire organs
and their corresponding boundaries. Most of these methods
have resorted to the hard-parameter sharing technique, where
a single network contains shared and task-specific parameters
and is jointly trained to solve multiple tasks.
Chen et al. [19] segmented the glands and their
corresponding boundaries via multi-task training. By training
the model to learn the co-representations, the model achieved
better gland segmentation performance than the single-task
models. In [42], a dual-decoder-based network is presented that
simultaneously detects the boundaries and predicts the semantic
labels of cells. Features from the boundary-decoding path
were concatenated with those learned in the entire cell region
decoding path via additional skip connections. This led to
improved histopathological image segmentation performance.
In [43], boundary and distance maps were used for improved
polyp and optic disk segmentation, respectively. Tan et al. [20]
proposed a multi-task medical image segmentation network
consisting of a single encoder and separate dedicated arms for
decoding regions and boundaries. The study was evaluated on
numerous applications, including MR femur and CT kidney
segmentation. Zhang et al. [44] presented an edge-based deeply
supervised network for predicting the regions of interest and
their corresponding boundaries. The method was validated for
retinal, x-ray, and CT image segmentation. Wang et al. [45]
proposed a two-parallel stream model in which each of the two
streams was trained to segment region and detect boundary
followed by fusion of contour and region prediction maps.
Lee et al. [41] proposed a framework that predicts boundary
keypoint maps and makes use of adversarial loss for improved
boundary preserving in medical image segmentation.
Given the challenge presented by voxels on the organs’
boundaries and the evidence in the literature that focusing
on boundaries is beneficial for performance, we integrate
the organs' boundary prediction as an auxiliary task into the
training of state-of-the-art 3D medical image segmentation
networks. Since the design choice of network topology impacts
the learning process, we explore two multi-task network
designs and analyze their performance. The boundary co-
training resulted in improved performance on abdominal CT
segmentation tasks compared to several state-of-the-art 3D
fully convolutional baseline architectures.
3. Proposed Method
In this section, we first describe the boundary-constrained
loss for training the 3D encoder-decoder network to
simultaneously predict the boundaries and entire abdominal
organ regions via multi-task learning (Section 3.1), followed
by a description of our proposed multi-task network topologies
(Section 3.2). After that, we discuss the architecture of the 3D
networks that we have as baselines in our work (Section 3.3).
Finally, we present the architectural design of the counterpart
3D boundary-constrained models (Section 3.4).
3.1. Boundary-Constrained Loss
Consider a 3D encoder-decoder network trained to predict
the voxel labels of the abdominal CT scan with W × H × Z
dimensions, where W, H, and Z denote the length, width,
and depth of the scan, respectively. Such a network takes
an abdominal multi-organ CT scan as an input and outputs a
labelled voxel map of the same size as the input. To utilize
the boundary information of abdominal organs for improved
representation learning, we train the network to predict the 3D
organ-semantic masks and 3D organ-boundaries in one forward
propagation pass. We formulate this problem using a multi-
task learning paradigm where multiple tasks are learned jointly
using shared and task-specific representations. The loss $\mathcal{L}$ for
this multi-task learning problem is a weighted combination
of per-task losses, the organ segmentation loss $\mathcal{L}_{RS}$ and the organ
boundary detection loss $\mathcal{L}_{BD}$. We use the multi-class Dice loss [46]
for evaluating the performance of the multi-organ segmentation
task, given as

$$\mathcal{L}_{RS} = -\sum_{c=0}^{C-1} \frac{2\,(\hat{y}_{i,c} \times y_{i,c})}{\hat{y}_{i,c}^{2} + y_{i,c}^{2}} \qquad (1)$$

where $\hat{y}_{i,c}$ and $y_{i,c}$ denote the 3D multi-organ probability map
and ground-truth mask, respectively, of the $i$th abdominal CT
scan, and $C$ denotes the number of organ classes.
$$\hat{y}_i = \hat{p}(x_i;\,\theta_s) \qquad (2)$$

$$\hat{p}(x_i;\,\theta_s) = \sum_{n=0}^{N-1} \hat{p}(x_{i,n};\,\theta_s) = \sum_{w=0}^{W-1}\sum_{h=0}^{H-1}\sum_{z=0}^{Z-1} \hat{p}(x_{i,w,h,z};\,\theta_s) \qquad (3)$$

where $\hat{p}(x_{i,n})$ represents the label probability of the $n$th voxel in the $i$th
scan and $N$ refers to the total number of voxels in a scan.
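The per-class Dice terms of Eq. 1 translate almost directly into code. The NumPy sketch below is a minimal prototype; the small `eps` smoothing constant is an added numerical-stability assumption, not part of the paper's formulation.

```python
import numpy as np

def multiclass_dice_loss(probs, target, eps=1e-6):
    """Multi-class soft Dice loss in the form of Eq. 1.

    probs  : (C, W, H, Z) predicted per-class probability maps
    target : (C, W, H, Z) one-hot ground-truth masks
    eps    : smoothing constant (an added assumption, avoids 0/0)
    """
    loss = 0.0
    for c in range(probs.shape[0]):
        intersection = 2.0 * np.sum(probs[c] * target[c])
        denominator = np.sum(probs[c] ** 2) + np.sum(target[c] ** 2)
        loss -= (intersection + eps) / (denominator + eps)
    return loss
```

A perfect prediction drives every per-class term to 1, so the loss approaches $-C$; minimizing it therefore maximizes the per-class overlap.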
To evaluate the model's performance in predicting the
boundaries, we use the binary cross-entropy loss (shown in
Eq. 4). The binary cross-entropy loss for predicting 3D boundaries
is given as

$$\mathcal{L}_{BD} = -\sum_{n=0}^{N-1} \big[\, e_{i,n}\log(\hat{e}_{i,n}) + (1 - e_{i,n})\log(1 - \hat{e}_{i,n}) \,\big]$$

$$= -\sum_{w=0}^{W-1}\sum_{h=0}^{H-1}\sum_{z=0}^{Z-1} \big[\, p(x_{i,w,h,z};\,\theta_s)\log(\hat{p}(x_{i,w,h,z};\,\theta_s)) + (1 - p(x_{i,w,h,z};\,\theta_s))\log(1 - \hat{p}(x_{i,w,h,z};\,\theta_s)) \,\big] \qquad (4)$$

where $\hat{e}_i$ and $e_i$ represent the edge probability map and the
corresponding ground truth, $\hat{p}(x_{i,w,h,z})$ represents the edge
probability of the voxel at position $(w,h,z)$ in the $i$th scan, and $\theta_s$
represents the weights of the entire deep multi-task encoder-decoder network.
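Eq. 4 is the standard voxel-wise binary cross-entropy summed over the scan. A minimal NumPy sketch follows; clipping the probabilities away from {0, 1} is an added assumption to avoid log(0), not something the equation itself specifies.

```python
import numpy as np

def boundary_bce_loss(edge_probs, edge_gt, clip=1e-7):
    """Binary cross-entropy over all boundary voxels (Eq. 4).

    edge_probs : predicted 3D edge probability map, values in (0, 1)
    edge_gt    : binary 3D ground-truth boundary map
    clip       : probability clipping (added numerical-safety assumption)
    """
    p = np.clip(edge_probs, clip, 1.0 - clip)
    return -np.sum(edge_gt * np.log(p) + (1.0 - edge_gt) * np.log(1.0 - p))
```

Because the sum runs over every voxel, confidently wrong predictions on the thin boundary set incur a large penalty relative to their small voxel count, which is the mechanism the hypothesis below relies on.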
The combined total loss $\mathcal{L}$ is minimized with respect to the
parameters $\theta_s$, as shown in Eq. 5. Thus, our goal is to evaluate
whether a network can learn more robust features, and subsequently
produce improved organ segmentations, by being trained to
explicitly recognize the boundaries.

$$\mathcal{L}(\theta_s) = \sum_{i=1}^{M} \mathcal{L}_{RS} + \lambda \sum_{i=1}^{M} \mathcal{L}_{BD} \qquad (5)$$

where $M$ and $\lambda$ represent the total number of CT scans in the training
set and the weight assigned to the edge detection loss in Eq. 5,
respectively.
We hypothesize that the additional boundary loss ($\mathcal{L}_{BD}$)
imposes a larger penalty on erroneously predicted contour voxels,
pushing the optimization of the segmentation network towards
solutions with more accurate boundaries. This should strengthen
the boundary-constrained network's ability to extract features that
account for both the semantic abdominal organ regions and their
boundaries.
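Training against $\mathcal{L}_{BD}$ requires ground-truth boundary maps $e_i$ derived from the organ masks. One common recipe, sketched below, marks every foreground voxel that has at least one background face-neighbor; this is an illustrative assumption, and the paper's exact boundary-extraction procedure may differ.

```python
import numpy as np

def boundary_map(mask):
    """Derive a binary 3D boundary map from a binary organ mask.

    A voxel counts as boundary if it is foreground and at least one
    of its six face-neighbors is background (6-connectivity). This is
    one common recipe, not necessarily the paper's exact procedure.
    """
    padded = np.pad(mask, 1, mode="constant")  # zero border = background
    interior = np.ones_like(mask, dtype=bool)
    for axis in (0, 1, 2):
        for shift in (-1, 1):
            # neighbor value along this axis/direction for every voxel
            neighbor = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            interior &= neighbor.astype(bool)
    # boundary = foreground minus fully-surrounded interior
    return (mask.astype(bool) & ~interior).astype(np.uint8)
```

Applying this per organ class (or to the union of organs) yields the binary edge targets that the boundary head is supervised against.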
3.2. Boundary-Constrained Network Topologies
Multi-task learning is generally formulated via hard-
parameter sharing and soft-parameter sharing. In the hard-
parameter sharing paradigm, multiple tasks share a subset of
jointly optimized parameters, whereas task-specific parameters
are optimized separately. In soft-parameter sharing, each
task is parameterized using its own set of parameters which
are jointly regularized using constraints [47]. In practice,
hard-parameter sharing approaches incur much less parameter
and computational cost. In our work, we formulate the
multi-task learning problem via hard-parameter sharing to
train the encoder-decoder network to do multiple tasks, i.e.,
organ segmentation and boundary detection. For deep neural
networks, the hard-parameter sharing approach is realized by
sharing some network layers between the tasks while keeping
some layers task-specific.
We explore two different network topologies to conduct
multi-task training, as shown in Figures 2a and 2b. The
motivation to explore multiple topologies is to investigate the
impact of sharing the larger and smaller number of parameters
in the network between the two tasks. We explain these multi-
task topologies below.
3.2.1. Task-Specific Output Layers (TSOL)
The first multi-task topology that we explore is formulated
by appending two separate prediction layers for predicting the
boundaries and semantic organ masks. This topology employs
an encoder-decoder network whose weights are shared between
the tasks, except for the last output layers, as shown in Fig. 2a.
Technically, it encourages the use of compact and tightly shared
feature representations. As evident, this configuration has
negligibly more parameters than the single-task network.
We denote this configuration as TSOL.
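Structurally, TSOL can be sketched as one shared trunk feeding two prediction heads. In the toy sketch below, `block` is a stand-in for a 3D convolutional block (a linear map plus ReLU on flattened features), and the channel sizes and organ-class count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(n_in, n_out):
    """Toy stand-in for a 3D conv block: random linear map + ReLU."""
    w = rng.standard_normal((n_in, n_out)) * 0.1
    return lambda x: np.maximum(x @ w, 0.0)

# One shared encoder-decoder trunk; only the final prediction layers differ.
trunk = [block(16, 32), block(32, 16)]
region_head = block(16, 5)    # C = 5 organ classes (an illustrative choice)
boundary_head = block(16, 1)  # single-channel boundary map

def forward_tsol(x):
    for layer in trunk:
        x = layer(x)          # features computed once, shared by both tasks
    return region_head(x), boundary_head(x)

feats = rng.standard_normal((10, 16))  # 10 "voxels" with 16 input features
regions, boundaries = forward_tsol(feats)
```

Since only the two final heads are task-specific, the parameter overhead relative to a single-task network is just the second head, matching the "negligibly more parameters" observation above.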
3.2.2. Task-Specific Decoders (TSD)
In the second multi-task topology, we modify the 3D encoder-
decoder model to have a single shared encoder but two
separate decoding arms for predicting the semantic regions
and boundaries. The sibling-decoding arms upsample the
region and boundary maps separately. This type of formulation
ensures sparse representation sharing amongst the two tasks
since decoders have been parameterized separately, as shown in
Fig. 2b. The presence of two synthesis paths results in having
significantly more parameters than its counterpart single-task
network. We refer to this configuration as TSD.
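In the same toy style as before (layer sizes and class count are illustrative assumptions), TSD keeps a single shared encoder but duplicates the synthesis path, which is where the extra parameters of this configuration come from.

```python
import numpy as np

rng = np.random.default_rng(1)

def block(n_in, n_out):
    """Toy stand-in for a 3D conv block: a weight matrix (ReLU applied later)."""
    return rng.standard_normal((n_in, n_out)) * 0.1

def apply(blocks, x):
    for w in blocks:
        x = np.maximum(x @ w, 0.0)
    return x

encoder = [block(16, 32)]                          # shared analysis path
region_decoder = [block(32, 16), block(16, 5)]     # task-specific synthesis paths
boundary_decoder = [block(32, 16), block(16, 1)]

def forward_tsd(x):
    shared = apply(encoder, x)  # features shared only up to the bottleneck
    return apply(region_decoder, shared), apply(boundary_decoder, shared)

# Duplicating the decoder roughly doubles the synthesis-path parameters,
# which is why TSD is heavier than TSOL.
n_params = sum(w.size for w in encoder + region_decoder + boundary_decoder)
```

Compared with TSOL, the two decoders let each task shape its own upsampling features at the cost of significantly more parameters, mirroring the trade-off described above.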