Improved Abdominal Multi-Organ Segmentation via 3D Boundary-Constrained Deep
Neural Networks
Samra Irshada,*, Douglas P.S. Gomesb, Seong Tae Kimc
aSwinburne University of Technology, Hawthorn, Australia
bVictoria University, Melbourne, Australia
cKyung Hee University, Yongin-si, Gyeonggi-do, South Korea
Abstract
Background and Objective: Quantitative assessment of the abdominal region from clinically acquired CT scans requires the
simultaneous segmentation of abdominal organs. Therefore, for the past two decades, automatic abdominal image segmentation
has been the subject of intensive research, aiming to assist health professionals and ease the clinical workflow. Thanks to the
availability of high-performance and powerful computational resources, deep learning-based methods have resulted in state-of-
the-art performance for the segmentation of 3D abdominal CT scans. However, the complex characterization of organs with
fuzzy and weak boundaries prevents the deep learning methods from accurately segmenting these anatomical organs. Specifically,
the voxels on the boundary of organs are more vulnerable to misprediction due to the highly-varying intensity of inter-organ
boundaries, and the misprediction of these voxels is detrimental to overall segmentation performance. This paper investigates
the possibility of improving the abdominal image segmentation performance of the existing 3D encoder-decoder networks by
leveraging organ-boundary prediction as a complementary task.
Method: To address the problem of abdominal multi-organ segmentation, we train the 3D encoder-decoder network to
simultaneously segment the abdominal organs and their corresponding boundaries in CT scans via multi-task learning. The
network is trained end-to-end using a loss function that combines two task-specific losses, i.e., complete organ segmentation loss
and boundary prediction loss. We explore two different network topologies based on the extent of weights shared between the
two tasks within a unified multi-task framework. In the first topology, the whole-organ prediction task and the boundary detection
task share all the layers in the encoder-decoder network except for the last task-specific prediction layers. In contrast, the second
topology employs a single shared encoder but two separate task-specific decoders. To evaluate the utility of the complementary
boundary-prediction task in improving abdominal multi-organ segmentation, we use three state-of-the-art encoder-decoder
networks: 3D UNet, 3D UNet++, and 3D Attention-UNet.
Results: The effectiveness of utilizing the organs' boundary information for abdominal multi-organ segmentation is evaluated on
two publicly available abdominal CT datasets: Pancreas-CT and the BTCV dataset. The improvements shown in segmentation
results (evaluated via Dice Score, Average Hausdorff Distance, Recall, and Precision) reveal the advantage of the multi-task training
that forces the network to pay attention to ambiguous boundaries of organs. A maximum relative improvement of 3.5% and 3.6%
is observed in Mean Dice Score for the Pancreas-CT and BTCV datasets, respectively. All source code is publicly available at
https://github.com/samra-irshad/3d-boundary-constrained-networks.
Keywords:
Abdominal multi-organ segmentation, Fully convolutional neural networks, Boundary-constrained segmentation, Multi-task
learning
1. Introduction
Multi-organ segmentation on abdominal Computed
Tomography (CT) scans is an essential prerequisite for
computer-assisted surgery and organ transplantation [1], [2].
Particularly, quantitative assessment of abdominal regions
enables accurate organ dose calculation, required in numerous
radiotherapy treatment options. (*Corresponding author. Email address: sam.ershad@yahoo.com (Samra Irshad).) Erroneous delineation
of abdominal organs prevents harnessing the benefits of
radiotherapeutic advancements. In clinical practice, physicians
delineate abdominal organs using manual segmentation
tools, which are time-consuming, observer-dependent, and
error-prone. With the increased use of imaging facilities and
production of a large number of abdominal CT scans, the
utilization of automated, robust, and efficient organ-delineation
tools has become essential [2], [3], [4]. Automatic
segmentation tools delineate the abdominal structures much
faster and overcome issues like variability in human
expertise and inherent subjectivity.
(Preprint submitted to Elsevier, October 11, 2022. arXiv:2210.04285v1 [eess.IV], 9 Oct 2022.)
Abdominal CT scans often present weak inter-organ
boundaries characterized by regions of similar voxel intensities,
which in turn results in low-contrast representations. Such
appearances are usually caused by the representation of
abdominal soft tissues in a narrow band of Hounsfield
(HU) values. Another factor that enhances the already
complex representation of abdominal organs is the existence of
artifacts occurring due to blood flow, respiratory, and cardiac
motion. Accurate delineation of abdominal organs with unclear
boundaries and complex geometrical shapes is one of the
ongoing challenges that hinder abdomen-related clinical diagnosis.
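A common practical remedy for the narrow soft-tissue HU band is intensity windowing of the CT volume before it is fed to a network. The sketch below illustrates the idea; the window bounds are illustrative assumptions for abdominal soft tissue, not values prescribed by this paper.

```python
import numpy as np

def window_ct(volume_hu, hu_min=-135.0, hu_max=215.0):
    """Clip a CT volume to a soft-tissue HU window and rescale to [0, 1].

    The default bounds are an assumed abdominal soft-tissue window,
    not values taken from the paper.
    """
    clipped = np.clip(volume_hu, hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)

# Toy 3D volume containing air (-1000 HU), soft tissue (40 HU), bone (700 HU)
vol = np.array([[[-1000.0, 40.0, 700.0]]])
windowed = window_ct(vol)  # air -> 0.0, bone -> 1.0, soft tissue mid-range
```

Windowing spreads the narrow soft-tissue range over the full input scale, which raises the contrast between adjacent organs before any learning takes place.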
Fig. 1. Exemplary 2D abdominal CT image showing the visual characteristics of organs. (a) 2D abdominal image; (b) abdominal organs annotated on the CT image: pancreas, spleen, liver, stomach, and gallbladder (color-coded in the original figure); (c) 3D multi-organ voxel map.
Earlier methods proposed for the abdominal multi-organ
segmentation mainly were based on multi-atlas [5], [6] or
statistical models [7], [8]. Some methods also made use
of handcrafted or learned features to segment abdominal
organs [9], [10]. However, the recent Fully Convolutional
Network (FCN) based approaches have presented better results
due to improved organ representation learning [2], [11].
Owing to their ability to preserve image structure and to
provide efficient learning and inference, FCN-based methods
are currently considered state-of-the-art for abdominal multi-
organ segmentation [2], [12], [13], [14]. Specifically, these
networks follow the encoder-decoder architectural design [15].
In such networks, the shallow layers in the encoder aim to
extract low-level features, and the deep layers encode high-level
features. The mirrored decoder maps the learned features
back to an output of the same size as the input, with skip
connections assisting in retaining the crucial features extracted
in the encoding path [15].
Existing FCN-based methods for abdominal multi-
organ segmentation employ either 2D or 3D convolutional
architectures [13], [12]. 2D methods process the CT scans in a
slice-by-slice fashion and predict the organ labels on individual
slices [13]. Despite being memory- and parameter-efficient,
2D methods are unable to make full use of 3D contextual
information [2]. 3D methods make use of rich volumetric
context by processing the whole CT volume and generating
voxel-maps in a single forward propagation pass, leading
to better abdominal CT segmentation performance than 2D
approaches [16], [17].
The existing 3D methods have primarily focused on
designing better architectures for improved abdominal multi-
organ representation learning [12], [2]. However, they treat all
the anatomical parts within a single organ equally since they
solely rely on voxel-level information and do not specifically
focus on improving the segmentation of voxels in vulnerable
regions/parts of organs. As an example, we highlight some
of the important characteristics of abdominal organs in Fig. 1.
From Figures 1a and 1b, it can be noticed that the adjacent
organs have weak contours which sometimes touch each other.
As an example, observe the low-contrast and touching
boundaries between the stomach and the pancreas.
Moreover, 3D multi-organ visualization in Fig. 1c shows that
the adjacent positioning of organs in the abdominal cavity
aggravates the complex spatial relationship among the organs.
Simultaneously segmenting the abdominal organs with soft
contours and complex spatial relationships is a challenging task.
The boundaries of anatomical regions in medical scans
serve as an important cue for facilitating manual and
automated delineation [18]. Numerous existing deep learning-
based studies leveraged learning of features corresponding to
boundary of regions for improved medical image segmentation
via multitask learning paradigm [19], [20], [21], [22], [23]. In
recent years, deep multitask learning paradigm has been widely
used due to its potential to solve multiple tasks in one forward
propagation and ability to learn better representations because
of the multiple supervisory signals [24], [25]. In this paper,
we propose to improve the segmentation of abdominal organs
on CT scans by enhancing the segmentation of boundary of
organs. Particularly, we train the 3D deep learning networks
to simultaneously predict the boundary and the entire region of
organs. The inclusion of boundary information is motivated
by the fact that the voxels on the boundary of organs are
more vulnerable to misprediction because of their ambiguous
appearance and complex relationship with adjacent organs.
Specifically, our work makes the following contributions:
(i) We develop an end-to-end trainable 3D multi-task learning
framework that simultaneously predicts the voxel-labels of
abdominal organs and their corresponding boundaries. By
integrating the boundary features, our proposed boundary-
constrained 3D deep learning framework focuses on the
accurate prediction of the edges of organs in addition to
whole organs.
(ii) Instead of relying on a single network topology,
we explore and compare two network topologies for
conducting multi-task learning. In the first topology, the
whole encoder-decoder network is shared with separate
task-specific prediction layers at the end for predicting
boundaries and entire organs’ maps. In the second
topology, an encoder is shared with separate task-specific
decoders for decoding the features, jointly learned by
the shared encoder to predict the boundary and organ
probability maps. With an extensive comparison, we
reveal that integration of boundary features invariably
improves the multi-organ segmentation performance,
independent of the multi-task network design.
(iii) We utilize three state-of-the-art 3D encoder-decoder
architectures, i.e., UNet [26], UNet++ [27], and Attention-
UNet [28] as baseline networks for evaluating the effect
of incorporating boundary information. We modify each
baseline architecture according to our proposed multi-
task topologies. We demonstrate significant performance
improvements with a negligible increase in trainable
parameters.
(iv) We validate the performance of baseline and counterpart
boundary-constrained models on two publically available
datasets (Pancreas-CT [29] and BTCV [30]) using Dice
Score, Average Hausdorff Distance, Recall, and Precision.
Furthermore, we conduct additional experiments to
evaluate the improvement in the segmentation of
regions around the boundaries. The results show
that the boundary-constrained networks learn feature
representations that account for both accurate whole-organ
segmentation and the challenging parts around the borders
of the organs.
The rest of the article is organized as follows. In section 2,
we review the existing methods for abdominal multi-organ
segmentation. Section 3 describes our framework for
incorporating the boundary information into the 3D fully
convolutional networks, including the multi-task loss function
and the details of boundary-constrained network topologies.
Next, we describe the dataset specifications and implementation
details in section 4. We then present the experimental results,
comparisons with existing single-task approaches, and in-
depth performance analysis of boundary-constrained models in
section 5. Finally, we discuss the important highlights and
some directions for future work in section 6 and present the
conclusion in section 7.
2. Related Work
Segmentation of anatomical structures from abdominal scans
is a prerequisite for various high-level CT-based clinical
applications. Existing computerized tools for abdominal
image segmentation are either based on deep learning or
non-deep learning methods. In this section, we first briefly
discuss the non-deep learning methods (section 2.1) and then
present a review of deep learning-based methods for abdominal
multi-organ segmentation (section 2.2). We conclude this
section with a discussion on multi-task deep neural networks
being employed for complementary boundary learning task to
improve medical image segmentation (section 2.3).
2.1. Non-deep learning-based abdominal organs segmentation
Earlier methods proposed for abdominal multi-organ
segmentation have primarily utilized registration-based
approaches [7], [8]. Among the registration-based approaches,
the widely used ones include statistical shape models [7],
[8] and multi-atlas label fusion techniques [5], [6]. The
development of statistical models requires registration of
training images for estimating the shape or appearance of
anatomical organs followed by fitting constructed models to
test images for generating segmentations [31], [32]. Multi
atlas-based methods utilize an atlas created using multiple
labelled images in the training set, and the test image is
segmented by propagating the reference segmentations.
Atlases are constructed by capturing the prior anatomical
knowledge relevant to target organs. However, it is difficult to
build an adequate model to capture the large variability of the
deformable organs with limited data [33]. Furthermore, the
performance of both these approaches is restricted by image
registration accuracy.
Registration-free approaches train a classifier using either
handcrafted or learned features to segment abdominal images
[9]. Extraction of robust and deformation-invariant features
relies on expert knowledge about abdominal organs [34].
Having the ability to learn features automatically, FCN-
based methods have rapidly replaced the traditional solutions
that require image registration or handcrafted features and have
shown improved performance for abdominal CT segmentation
[2], [12], [13], [35].
2.2. Fully Convolutional Networks for abdominal multi-organ
segmentation
In recent years, Fully Convolution Network (FCN) and its
variants (e.g., UNet [15]) have become a common choice
for medical image segmentation. This dominance can be
attributed to their ability to learn effective task representations
and to perform efficient inference. UNet has an encoder-decoder style
architecture and consists of skip connections, joining the
encoding and decoding layers on the same level. Despite
being trained from scratch, UNet demonstrated state-of-the-
art performance for various medical image segmentation tasks
[36], [37]. Built on top of UNet, several other modified
architectures were subsequently proposed, e.g., UNet++ [27],
Attention-UNet [28], etc.
Existing deep learning-based studies for abdominal multi-
organ segmentation have utilized 2D or 3D convolutional
networks. 2D methods are less parameter-intensive;
however, they cannot exploit the 3D contextual information
and consequently deliver less accurate organ-delineation
performance. 3D convolutional networks are facilitated with
3D convolutions, 3D pooling, and 3D normalization to exploit
the rich volumetric context and generate dense voxel-wise
predictions [26]. Advances in efficient 3D convolutional
implementation and increased GPU memory have enabled
the adoption of 3D convolutional models for abdominal
multi-organ segmentation [38], [3].
Roth et al. [16] proposed a cascaded architecture based on
two 3D UNets where the first UNet is trained to separate the
abdominal area from the background, and the latter utilized
the output from the first UNet to simultaneously segment the
abdominal organs. Peng et al. [39] delineated abdominal organs
using 3D UNet with residual-learning based units (ResNets) to
calculate patient-specific CT organ dose. In another study [2],
abdominal organs are segmented using a 3D FCN with dilated
convolutions based densely connected units. Heinrich et al.
[11] leveraged 3D deformable convolutions to spatially adapt
Fig. 2. Multi-task topologies of the 3D boundary-constrained network. (a) Multi-task topology with shared encoder-decoder network and task-specific prediction layers; (b) multi-task topology with shared encoder and task-specific decoders.
the receptive field for abdominal multi-organ segmentation.
In [40], abdominal scans were segmented using a 3D deeply
supervised patch-based UNet with grid-based attention gates
to encourage the network to focus on useful salient features
propagated through the skip connections. Some existing
methods have employed post-processing steps, including level-
sets [3] and graph-cut [4] to refine initial segmentation maps
obtained from 3D deep convolutional networks.
Through the efforts mentioned above, the existing 3D
methods have mostly emphasized developing better deep
learning architectures and did not attempt to improve the
segmentation of challenging parts of abdominal organs, e.g.,
voxels that belong to the contour of organs and regions within
the vicinity of the organ contour. The fuzzy appearance of
organ boundaries and the low contrast between adjacent
abdominal structures make the voxels belonging to these
regions more susceptible to wrong label prediction.
2.3. Boundary-constrained medical image segmentation
Several existing deep learning-based medical image
segmentation methods have utilized the boundary information
of regions of interest to overcome the misprediction of
boundary pixels [19], [20], [21], [41]. In these methods,
the networks are trained in a multi-task learning fashion to
simultaneously predict the probability maps of entire organs
and their corresponding boundaries. Most of these methods
have resorted to the hard-parameter sharing technique, where
a single network contains shared and task-specific parameters
and is jointly trained to solve multiple tasks.
Chen et al. [19] segmented the glands and their
corresponding boundaries via multi-task training. By training
the model to learn the co-representations, the model achieved
better gland segmentation performance than the single-task
models. In [42], a dual-decoder-based network is presented that
simultaneously detects the boundaries and predicts the semantic
labels of cells. Features from the boundary-decoding path
were concatenated with those learned in the entire cell region
decoding path via additional skip connections. This led to
improved histopathological image segmentation performance.
In [43], boundary and distance maps were used for improved
polyp and optic disk segmentation, respectively. Tan et al. [20]
proposed a multi-task medical image segmentation network
consisting of a single encoder and separate dedicated arms for
decoding regions and boundaries. The study was evaluated on
numerous applications, including MR femur and CT kidney
segmentation. Zhang et al. [44] presented an edge-based deeply
supervised network for predicting the regions of interest and
their corresponding boundaries. The method was validated for
retinal, x-ray, and CT image segmentation. Wang et al. [45]
proposed a two-parallel stream model in which each of the two
streams was trained to segment region and detect boundary
followed by fusion of contour and region prediction maps.
Lee et al. [41] proposed a framework that predicts boundary
keypoint maps and makes use of adversarial loss for improved
boundary preserving in medical image segmentation.
Given the challenge presented by voxels on the organs’
boundaries and the evidence in the literature that focusing
on boundaries is beneficial for performance, we integrate
the organs' boundary prediction as an auxiliary task into the
training of state-of-the-art 3D medical image segmentation
networks. Since the design choice of network topology impacts
the learning process, we explore two multi-task network
designs and analyze their performance. The boundary co-
training resulted in improved performance on abdominal CT
segmentation tasks compared to several state-of-the-art 3D
fully convolutional baseline architectures.
3. Proposed Method
In this section, we first describe the boundary-constrained
loss for training the 3D encoder-decoder network to
simultaneously predict the boundaries and entire abdominal
organ regions via multi-task learning (Section 3.1), followed
by a description of our proposed multi-task network topologies
(Section 3.2). After that, we discuss the architecture of the 3D
networks that we have as baselines in our work (Section 3.3).
Finally, we present the architectural design of the counterpart
3D boundary-constrained models (Section 3.4).
3.1. Boundary-Constrained Loss
Consider a 3D encoder-decoder network trained to predict
the voxel labels of the abdominal CT scan with W × H × Z
dimensions, where W, H, and Z denote the length, width,
and depth of the scan, respectively. Such a network takes
an abdominal multi-organ CT scan as an input and outputs a
labelled voxel map of the same size as the input. To utilize
the boundary information of abdominal organs for improved
representation learning, we train the network to predict the 3D
organ-semantic masks and 3D organ-boundaries in one forward
propagation pass. We formulate this problem using a multi-
task learning paradigm where multiple tasks are learned jointly
using shared and task-specific representations. The loss $\mathcal{L}$ for
this multi-task learning problem is a weighted combination
of per-task losses, the organ segmentation loss $\mathcal{L}_{RS}$ and the organ
boundary detection loss $\mathcal{L}_{BD}$. We use the multi-class Dice loss [46]
for evaluating the performance of the multi-organ segmentation
task, given as

$$\mathcal{L}_{RS} = -\sum_{c=0}^{C-1} \frac{2\,(\hat{y}_{i,c} \times y_{i,c})}{\hat{y}_{i,c}^{2} + y_{i,c}^{2}} \qquad (1)$$

where $\hat{y}_{i,c}$ and $y_{i,c}$ denote the 3D multi-organ probability map
and ground-truth mask, respectively, of the $i$th abdominal CT
scan, and $C$ denotes the number of organ classes.
$$\hat{y}_i = \hat{p}(x_i;\,\theta_s) \qquad (2)$$

$$\hat{p}(x_i;\,\theta_s) = \sum_{n=0}^{N-1} \hat{p}(x_{i,n};\,\theta_s) = \sum_{w=0}^{W-1}\sum_{h=0}^{H-1}\sum_{z=0}^{Z-1} \hat{p}(x_{i,w,h,z};\,\theta_s) \qquad (3)$$

where $\hat{p}(x_{i,n})$ represents the label probability of the $n$th voxel in the $i$th
scan and $N$ refers to the total number of voxels in a scan.
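The per-class Dice terms of Eq. 1 translate almost directly into code. The NumPy sketch below is a minimal prototype; the small `eps` smoothing constant is an added numerical-stability assumption, not part of the paper's formulation.

```python
import numpy as np

def multiclass_dice_loss(probs, target, eps=1e-6):
    """Multi-class soft Dice loss in the form of Eq. 1.

    probs  : (C, W, H, Z) predicted per-class probability maps
    target : (C, W, H, Z) one-hot ground-truth masks
    eps    : smoothing constant (an added assumption, avoids 0/0)
    """
    loss = 0.0
    for c in range(probs.shape[0]):
        intersection = 2.0 * np.sum(probs[c] * target[c])
        denominator = np.sum(probs[c] ** 2) + np.sum(target[c] ** 2)
        loss -= (intersection + eps) / (denominator + eps)
    return loss
```

A perfect prediction drives every per-class term to 1, so the loss approaches $-C$; minimizing it therefore maximizes the per-class overlap.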
To evaluate the model's performance in predicting the
boundaries, we use the binary cross-entropy loss (shown in
Eq. 4). The binary cross-entropy loss for predicting 3D boundaries
is given as

$$\mathcal{L}_{BD} = -\sum_{n=0}^{N-1} \big[\, e_{i,n}\log(\hat{e}_{i,n}) + (1 - e_{i,n})\log(1 - \hat{e}_{i,n}) \,\big]$$

$$= -\sum_{w=0}^{W-1}\sum_{h=0}^{H-1}\sum_{z=0}^{Z-1} \big[\, p(x_{i,w,h,z};\,\theta_s)\log(\hat{p}(x_{i,w,h,z};\,\theta_s)) + (1 - p(x_{i,w,h,z};\,\theta_s))\log(1 - \hat{p}(x_{i,w,h,z};\,\theta_s)) \,\big] \qquad (4)$$

where $\hat{e}_i$ and $e_i$ represent the edge probability map and the
corresponding ground truth, $\hat{p}(x_{i,w,h,z})$ represents the edge
probability of the voxel at position $(w,h,z)$ in the $i$th scan, and $\theta_s$
represents the weights of the entire deep multi-task encoder-decoder network.
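Eq. 4 is the standard voxel-wise binary cross-entropy summed over the scan. A minimal NumPy sketch follows; clipping the probabilities away from {0, 1} is an added assumption to avoid log(0), not something the equation itself specifies.

```python
import numpy as np

def boundary_bce_loss(edge_probs, edge_gt, clip=1e-7):
    """Binary cross-entropy over all boundary voxels (Eq. 4).

    edge_probs : predicted 3D edge probability map, values in (0, 1)
    edge_gt    : binary 3D ground-truth boundary map
    clip       : probability clipping (added numerical-safety assumption)
    """
    p = np.clip(edge_probs, clip, 1.0 - clip)
    return -np.sum(edge_gt * np.log(p) + (1.0 - edge_gt) * np.log(1.0 - p))
```

Because the sum runs over every voxel, confidently wrong predictions on the thin boundary set incur a large penalty relative to their small voxel count, which is the mechanism the hypothesis below relies on.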
The combined total loss $\mathcal{L}$ is minimized with respect to the
parameters $\theta_s$, as shown in Eq. 5. Thus, our goal is to evaluate
whether a network can learn more robust features, and subsequently
produce improved organ segmentations, by being trained to
explicitly recognize the boundaries.

$$\mathcal{L}(\theta_s) = \sum_{i=1}^{M} \mathcal{L}_{RS} + \lambda \sum_{i=1}^{M} \mathcal{L}_{BD} \qquad (5)$$

where $M$ and $\lambda$ represent the total number of CT scans in the training
set and the weight assigned to the edge detection loss in Eq. 5,
respectively.
We hypothesize that the additional boundary loss ($\mathcal{L}_{BD}$)
imposes a larger penalty on erroneously predicted contour voxels,
pushing the optimization of the segmentation network towards
solutions with more accurate boundaries. This should strengthen
the boundary-constrained network's ability to extract features that
account for both the semantic abdominal organ regions and their
boundaries.
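Training against $\mathcal{L}_{BD}$ requires ground-truth boundary maps $e_i$ derived from the organ masks. One common recipe, sketched below, marks every foreground voxel that has at least one background face-neighbor; this is an illustrative assumption, and the paper's exact boundary-extraction procedure may differ.

```python
import numpy as np

def boundary_map(mask):
    """Derive a binary 3D boundary map from a binary organ mask.

    A voxel counts as boundary if it is foreground and at least one
    of its six face-neighbors is background (6-connectivity). This is
    one common recipe, not necessarily the paper's exact procedure.
    """
    padded = np.pad(mask, 1, mode="constant")  # zero border = background
    interior = np.ones_like(mask, dtype=bool)
    for axis in (0, 1, 2):
        for shift in (-1, 1):
            # neighbor value along this axis/direction for every voxel
            neighbor = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            interior &= neighbor.astype(bool)
    # boundary = foreground minus fully-surrounded interior
    return (mask.astype(bool) & ~interior).astype(np.uint8)
```

Applying this per organ class (or to the union of organs) yields the binary edge targets that the boundary head is supervised against.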
3.2. Boundary-Constrained Network Topologies
Multi-task learning is generally formulated via hard-
parameter sharing and soft-parameter sharing. In the hard-
parameter sharing paradigm, multiple tasks share a subset of
jointly optimized parameters, whereas task-specific parameters
are optimized separately. In soft-parameter sharing, each
task is parameterized using its own set of parameters which
are jointly regularized using constraints [47]. In practice,
hard-parameter sharing approaches incur much less parameter
and computational cost. In our work, we formulate the
multi-task learning problem via hard-parameter sharing to
train the encoder-decoder network to do multiple tasks, i.e.,
organ segmentation and boundary detection. For deep neural
networks, the hard-parameter sharing approach is realized by
sharing some network layers between the tasks while keeping
some layers task-specific.
We explore two different network topologies to conduct
multi-task training, as shown in Figures 2a and 2b. The
motivation to explore multiple topologies is to investigate the
impact of sharing the larger and smaller number of parameters
in the network between the two tasks. We explain these multi-
task topologies below.
3.2.1. Task-Specific Output Layers (TSOL)
The first multi-task topology that we explore is formulated
by appending two separate prediction layers for predicting the
boundaries and semantic organ masks. This topology employs
an encoder-decoder network whose weights are shared between
the tasks, except for the last output layers, as shown in Fig. 2a.
Technically, it encourages the use of compact and tightly shared
feature representations. As evident, this configuration has
negligibly more parameters than the single-task network.
We denote this configuration as TSOL.
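Structurally, TSOL can be sketched as one shared trunk feeding two prediction heads. In the toy sketch below, `block` is a stand-in for a 3D convolutional block (a linear map plus ReLU on flattened features), and the channel sizes and organ-class count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(n_in, n_out):
    """Toy stand-in for a 3D conv block: random linear map + ReLU."""
    w = rng.standard_normal((n_in, n_out)) * 0.1
    return lambda x: np.maximum(x @ w, 0.0)

# One shared encoder-decoder trunk; only the final prediction layers differ.
trunk = [block(16, 32), block(32, 16)]
region_head = block(16, 5)    # C = 5 organ classes (an illustrative choice)
boundary_head = block(16, 1)  # single-channel boundary map

def forward_tsol(x):
    for layer in trunk:
        x = layer(x)          # features computed once, shared by both tasks
    return region_head(x), boundary_head(x)

feats = rng.standard_normal((10, 16))  # 10 "voxels" with 16 input features
regions, boundaries = forward_tsol(feats)
```

Since only the two final heads are task-specific, the parameter overhead relative to a single-task network is just the second head, matching the "negligibly more parameters" observation above.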
3.2.2. Task-Specific Decoders (TSD)
In the second multi-task topology, we modify the 3D encoder-
decoder model to have a single shared encoder but two
separate decoding arms for predicting the semantic regions
and boundaries. The sibling-decoding arms upsample the
region and boundary maps separately. This type of formulation
ensures sparse representation sharing amongst the two tasks
since decoders have been parameterized separately, as shown in
Fig. 2b. The presence of two synthesis paths results in having
significantly more parameters than its counterpart single-task
network. We refer to this configuration as TSD.
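In the same toy style as before (layer sizes and class count are illustrative assumptions), TSD keeps a single shared encoder but duplicates the synthesis path, which is where the extra parameters of this configuration come from.

```python
import numpy as np

rng = np.random.default_rng(1)

def block(n_in, n_out):
    """Toy stand-in for a 3D conv block: a weight matrix (ReLU applied later)."""
    return rng.standard_normal((n_in, n_out)) * 0.1

def apply(blocks, x):
    for w in blocks:
        x = np.maximum(x @ w, 0.0)
    return x

encoder = [block(16, 32)]                          # shared analysis path
region_decoder = [block(32, 16), block(16, 5)]     # task-specific synthesis paths
boundary_decoder = [block(32, 16), block(16, 1)]

def forward_tsd(x):
    shared = apply(encoder, x)  # features shared only up to the bottleneck
    return apply(region_decoder, shared), apply(boundary_decoder, shared)

# Duplicating the decoder roughly doubles the synthesis-path parameters,
# which is why TSD is heavier than TSOL.
n_params = sum(w.size for w in encoder + region_decoder + boundary_decoder)
```

Compared with TSOL, the two decoders let each task shape its own upsampling features at the cost of significantly more parameters, mirroring the trade-off described above.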