for the polyp miss-rate is either that the polyp was not visible during the
examination or that it was not recognized despite being in the visual field
because of a fast colonoscope withdrawal time. Deep learning-based algorithms can
highlight the presence of pre-cancerous tissue in the colon and have the potential
to improve the diagnostic performance of endoscopists. Improving the polyp
detection rate and achieving accurate polyp segmentation remain unmet clinical
needs. In practice, precise polyp segmentation provides important information for
the early detection of colorectal cancer through the shape, texture, and location
of polyps.
Tomar et al. [17] proposed a feedback attention network for biomedical image
segmentation that reuses the mask from the previous epoch in the current
training epoch in an iterative fashion to further improve performance. Fan
et al. [3] used a Res2Net-based [4] backbone with a parallel partial decoder and
a parallel reverse attention mechanism for accurate polyp segmentation. Jha et
al. [9] proposed an efficient architecture that combines residual blocks, atrous
spatial pyramid pooling, and squeeze and excitation blocks for polyp
segmentation. Shen et al. [15] proposed a hard region enhancement network
(HRENet) that consists of an informative context enhancement (ICE) module and
trained the model with an edge and structure consistency aware loss (ESCLoss) to
improve segmentation precision along polyp edges. Zhao et al. [21] proposed a
multi-scale subtraction network (MSNet) for
automatic polyp segmentation. Although several architectures have been proposed
in the literature, most existing methods neglect the encoder and focus more on
the decoder part of the network, which leads to the loss of significant features
from the encoder. In our proposed method, we focus more on the encoder by taking
features at different scales and passing them through multiple dilated
convolutions to capture features over an enlarged receptive field, leading
to improved polyp segmentation. Unlike other decoders, the design of our decoder
is straightforward. It uses a simple sequence of layers: an upsampling layer,
a concatenation, a residual block, and an attention layer. We introduce a novel
deep learning architecture, DilatedSegNet, to address the critical need for
clinical integration of a polyp segmentation routine that runs in real time and
retains high accuracy. The main contributions of this study are as follows:
1. We introduce a novel network named DilatedSegNet for polyp segmentation.
The architecture begins with a pre-trained ResNet50 [5] and utilizes a dilated
convolution [19] pooling block to increase the receptive field, capturing
more diverse and reliable features for better delineation.
2. DilatedSegNet outperformed nine standard benchmarking methods on two
widely used, publicly available polyp segmentation datasets.
3. Extensive experimental results and cross-dataset tests on two unseen
datasets showed the better generalizability of DilatedSegNet.
Heatmaps of the explored deep features showed that the proposed network
focuses on the target polyp regions and their boundaries, supporting the
visual interpretability of the model.
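
The receptive-field argument behind the first contribution can be illustrated
with a minimal sketch in plain Python. This is not the authors' implementation:
the function names are hypothetical, and the dilation rates (1, 3, 6, 9) and the
3-tap kernel are assumptions chosen only for illustration. The sketch shows that
a kernel with k taps and dilation d spans d(k - 1) + 1 input positions while
keeping the same number of weights.

```python
# Illustrative sketch only; names, kernel size, and dilation rates are
# assumptions and are not taken from the paper.

def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding per side that preserves the length at stride 1 (odd kernel sizes)."""
    return dilation * (kernel_size - 1) // 2


def dilated_conv1d(signal, kernel, dilation):
    """Unpadded ('valid') 1D dilated cross-correlation at stride 1."""
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are placed `dilation` positions apart in the input.
        outputs.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return outputs


ASSUMED_RATES = (1, 3, 6, 9)  # hypothetical dilation rates for parallel branches

spans = [effective_kernel_size(3, d) for d in ASSUMED_RATES]
pads = [same_padding(3, d) for d in ASSUMED_RATES]
print(spans)  # [3, 7, 13, 19]
print(pads)   # [1, 3, 6, 9]

# A 3-tap kernel with dilation 2 reads inputs at offsets 0, 2, and 4.
print(dilated_conv1d(list(range(10)), [1, 1, 1], 2))  # [6, 9, 12, 15, 18, 21]
```

Under these assumptions, the four parallel branches cover spans of 3 to 19
positions per axis with three weights each, which is the sense in which dilation
enlarges the receptive field without adding parameters.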