faster and overcome issues such as variability in human
expertise and inherent subjectivity.
Abdominal CT scans often present weak inter-organ
boundaries characterized by regions of similar voxel intensities,
which results in low-contrast representations. Such
appearances usually arise because abdominal soft tissues
occupy a narrow band of Hounsfield unit (HU) values.
A further factor that compounds the already complex
appearance of abdominal organs is the presence of artifacts
caused by blood flow and by respiratory and cardiac
motion. Accurate delineation of abdominal organs with unclear
boundaries and complex geometrical shapes remains one of the
ongoing challenges hindering abdomen-related clinical
diagnosis.
Fig. 1. Exemplary 2D abdominal CT image showing the visual characteristics
of organs. (a) 2D abdominal CT image, (b) abdominal organs annotated on the CT
image: pancreas, spleen, liver, stomach, gallbladder (color-coded in the figure),
(c) 3D multi-organ voxel map.
Earlier methods proposed for abdominal multi-organ
segmentation were mainly based on multi-atlas techniques [5], [6] or
statistical models [7], [8]. Some methods also made use
of handcrafted or learned features to segment abdominal
organs [9], [10]. However, recent Fully Convolutional
Network (FCN) based approaches have produced better results
owing to improved organ representation learning [2], [11].
Because they preserve image structure and offer efficient
learning and inference, FCN-based methods are currently
considered the state of the art for abdominal multi-
organ segmentation [2], [12], [13], [14]. Specifically, these
networks follow an encoder-decoder architectural design [15].
In such networks, the shallow layers of the encoder
extract low-level features while the deep layers encode high-level
features. The mirrored decoder then maps the learned
features back to an output of the same size as the input, with skip
connections helping to retain the crucial features extracted
along the encoding path [15].
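The encoder-decoder design with skip connections can be illustrated with a toy sketch (plain NumPy, not the paper's implementation): the encoder downsamples the input, the decoder upsamples it back to the input resolution, and the skip connection concatenates the encoder feature saved at the same scale, as in U-Net-style architectures.

```python
import numpy as np

# Toy 1D "encoder-decoder" illustrating skip connections.
# Real FCNs use learned 3D convolutions; here pooling and
# upsampling stand in for the encoder and decoder blocks.

def encode(x):
    skip = x                             # feature kept for the skip connection
    down = x.reshape(-1, 2).max(axis=1)  # 2x max-pool downsampling
    return down, skip

def decode(down, skip):
    up = np.repeat(down, 2)              # nearest-neighbour 2x upsampling
    return np.stack([up, skip])          # channel-wise concatenation

x = np.array([1., 4., 2., 8., 3., 5., 7., 6.])
down, skip = encode(x)
out = decode(down, skip)
# out restores the input's spatial size (8) with 2 channels,
# one of them carrying the encoder feature unchanged
```

The skip connection is what lets the decoder recover fine spatial detail that pooling discards, which is exactly the property that matters for delineating thin organ boundaries.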
Existing FCN-based methods for abdominal multi-
organ segmentation employ either 2D or 3D convolutional
architectures [13], [12]. 2D methods process the CT scans in a
slice-by-slice fashion and predict the organ labels on individual
slices [13]. Despite being memory- and parameter-efficient,
2D methods are unable to make full use of 3D contextual
information [2]. 3D methods exploit rich volumetric
context by processing the whole CT volume and generating
voxel maps in a single forward pass, leading
to better abdominal CT segmentation performance than 2D
approaches [16], [17].
The existing 3D methods have primarily focused on
designing better architectures for improved abdominal multi-
organ representation learning [12], [2]. However, they treat all
the anatomical parts within a single organ equally since they
solely rely on voxel-level information and do not specifically
focus on improving the segmentation of voxels in vulnerable
regions/parts of organs. As an example, we highlight some
of the important characteristics of abdominal organs in Fig. 1.
From Figs. 1a and 1b, it can be seen that adjacent
organs have weak contours that sometimes touch each other;
observe, for example, the low-contrast, touching
boundaries between the stomach and the pancreas.
Moreover, the 3D multi-organ visualization in Fig. 1c shows that
the adjacent positioning of organs in the abdominal cavity
aggravates their complex spatial relationships.
Simultaneously segmenting the abdominal organs with soft
contours and complex spatial relationships is a challenging task.
The boundaries of anatomical regions in medical scans
serve as an important cue for facilitating manual and
automated delineation [18]. Numerous deep learning-based
studies have leveraged boundary-related features, learned
via the multitask learning paradigm, to improve medical
image segmentation [19], [20], [21], [22], [23]. In
recent years, deep multitask learning has been widely
adopted for its ability to solve multiple tasks in one forward
pass and to learn better representations from
multiple supervisory signals [24], [25]. In this paper,
we propose to improve the segmentation of abdominal organs
in CT scans by enhancing the segmentation of organ
boundaries. Specifically, we train 3D deep learning networks
to simultaneously predict the boundary and the entire region of each
organ. The inclusion of boundary information is motivated
by the fact that voxels on organ boundaries are
more vulnerable to misprediction because of their ambiguous
appearance and complex relationships with adjacent organs.
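The boundary targets for such an auxiliary task can be derived directly from the segmentation masks. The paper's exact boundary definition is not given in this excerpt; one common choice is to subtract a morphological erosion of the mask from the mask itself, which keeps exactly the foreground voxels that touch the background. A minimal 2D NumPy sketch:

```python
import numpy as np

def boundary_map(mask):
    """Binary boundary of a 2D mask: foreground pixels with at
    least one background neighbour (4-connectivity erosion)."""
    m = mask.astype(bool)
    eroded = m.copy()
    eroded[1:, :]  &= m[:-1, :]   # require the neighbour above
    eroded[:-1, :] &= m[1:, :]    # ... below
    eroded[:, 1:]  &= m[:, :-1]   # ... to the left
    eroded[:, :-1] &= m[:, 1:]    # ... to the right
    return m & ~eroded            # mask minus its erosion

mask = np.zeros((5, 5), dtype=int)
mask[1:4, 1:4] = 1                # a 3x3 square "organ"
b = boundary_map(mask)
# the 8 border pixels of the square are boundary; the centre is not
```

In the 3D setting the same idea applies voxel-wise with six-connectivity, giving a boundary map that can supervise the auxiliary prediction head alongside the usual region labels.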
Specifically, our work makes the following contributions:
(i) We develop an end-to-end trainable 3D multi-task learning
framework that simultaneously predicts the voxel labels of
abdominal organs and their corresponding boundaries. By
integrating boundary features, our boundary-
constrained 3D deep learning framework focuses on
accurately predicting the edges of organs in addition to
whole organs.
(ii) Instead of relying on a single network topology,
we explore and compare two network topologies for
multi-task learning. In the first topology, the
whole encoder-decoder network is shared, with separate
task-specific prediction layers at the end for the
boundary and whole-organ maps. In the second
topology, a shared encoder feeds two task-specific
decoders that decode the jointly learned features
into boundary and organ probability maps. Through
an extensive comparison, we show that integrating
boundary features consistently improves multi-organ
segmentation performance, independent of the
multi-task network design.
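The two topologies in (ii) can be sketched structurally. The `block` helper below is a hypothetical stand-in for a convolutional block (a fixed random linear map), not the authors' architecture; the point is only where the branches split.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(in_dim, out_dim):
    """Stand-in for a conv block: a fixed random linear map + tanh."""
    W = rng.standard_normal((out_dim, in_dim))
    return lambda x: np.tanh(W @ x)

# Topology 1: fully shared encoder-decoder, two prediction heads.
enc, dec = block(16, 8), block(8, 16)
head_organ, head_boundary = block(16, 5), block(16, 1)

def topology1(x):
    f = dec(enc(x))                  # single shared trunk
    return head_organ(f), head_boundary(f)

# Topology 2: shared encoder, two task-specific decoders.
enc2 = block(16, 8)
dec_organ, dec_boundary = block(8, 5), block(8, 1)

def topology2(x):
    z = enc2(x)                      # jointly learned features
    return dec_organ(z), dec_boundary(z)

x = rng.standard_normal(16)
o1, b1 = topology1(x)
o2, b2 = topology2(x)
# both topologies emit an organ map (5 classes here) and a boundary map
```

Topology 1 shares almost all parameters between tasks, while topology 2 gives each task its own decoding path at the cost of extra parameters; the comparison in (ii) asks whether the boundary-driven gain survives both choices.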