constructs the segmentation map from the encoded data. The encoder and decoder
are linked by a series of skip connections, which are the most innovative
component of the U-Net architecture because they enable the network to recover
spatial information lost through pooling operations. Abdelhafiz et al. [12]
used a vanilla U-Net model to segment mass lesions in whole mammograms.
To segment suspicious regions in mammograms, Ravitha Rajalakshmi et al. [13]
presented a deeply supervised U-Net model (DS U-Net) combined with a dense
Conditional Random Field (CRF). Li et al. [14] proposed a Conditional Resid-
ual U-Net, named CRUNet, to improve the performance of the basic U-Net for
breast mass segmentation.
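The role of the skip connections described above can be illustrated with a minimal NumPy sketch. It is not taken from any of the cited works; the helper names (`max_pool2x2`, `upsample2x`, `skip_concat`) and the toy 4x4 input are assumptions made purely for illustration. The sketch shows that pooling followed by upsampling blurs the position of a single bright pixel, while concatenating the encoder features gives the decoder access to the original location.

```python
import numpy as np

def max_pool2x2(x):
    """2x2 max pooling on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(encoder_feat, decoder_feat):
    """Concatenate same-resolution encoder and decoder features along channels."""
    assert encoder_feat.shape[1:] == decoder_feat.shape[1:]
    return np.concatenate([encoder_feat, decoder_feat], axis=0)

# Toy one-channel 4x4 "feature map" with one bright pixel at row 1, column 2.
enc = np.zeros((1, 4, 4))
enc[0, 1, 2] = 1.0

pooled = max_pool2x2(enc)        # (1, 2, 2): the exact pixel position is lost
dec = upsample2x(pooled)         # (1, 4, 4): a 2x2 block is lit, not a point
merged = skip_concat(enc, dec)   # (2, 4, 4): the decoder sees both versions
```

In a real U-Net the concatenated tensor is then passed through learned convolutions, which are omitted here.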
Though U-Net is among the most popular and successful deep learning mod-
els for biomedical image segmentation, several improvements are still possible.
Specifically, the concatenation of encoder and decoder features reveals a signif-
icant semantic gap despite the preservation of dispersed spatial features, which
is a shortcoming of the simple skip connections. To deal with this issue, Ibte-
haz et al. [15] proposed the MultiResUNet architecture by incorporating some
convolutional layers along with shortcut connections in U-Net. Instead of sim-
ply concatenating the feature maps from the encoder stage to the decoder stage,
they first pass them through a chain of convolutional layers and then concatenate
them with the decoder features, which makes learning substantially easier. This
idea is inspired by image-to-image conversion using convolutional neural
networks [16], where pooling layers are unfavorable because they cause a loss
of information. MultiResUNet has shown excellent results on different biomedical
images; however, the authors did not experiment with mammograms.
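The idea of refining encoder features before concatenation can be sketched as follows. This is a simplified NumPy illustration under stated assumptions, not the implementation of [15]: each block is assumed to combine a 3x3 convolution with a 1x1 shortcut followed by a ReLU, normalization layers are omitted, the weights are random, and the names `conv2d_same` and `res_path` are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' 2D convolution (cross-correlation, as in deep learning).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Contract over input channels for each kernel offset (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def res_path(enc_feat, weights):
    """Chain of residual blocks applied to encoder features.
    weights: list of (w3x3, w1x1) pairs, one per block (assumed layout)."""
    x = enc_feat
    for w3, w1 in weights:
        x = relu(conv2d_same(x, w3) + conv2d_same(x, w1))  # 3x3 branch + 1x1 shortcut
    return x

rng = np.random.default_rng(0)
channels, depth = 4, 3
weights = [(rng.normal(size=(channels, channels, 3, 3)) * 0.1,
            rng.normal(size=(channels, channels, 1, 1)) * 0.1)
           for _ in range(depth)]

enc_feat = rng.normal(size=(channels, 8, 8))   # stand-in encoder features
dec_feat = rng.normal(size=(channels, 8, 8))   # stand-in decoder features

refined = res_path(enc_feat, weights)                  # processed, not raw, features
merged = np.concatenate([refined, dec_feat], axis=0)   # (8, 8, 8)
```

The only difference from a plain skip connection is that `refined`, rather than `enc_feat`, is concatenated with the decoder features.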
Based on the U-Net architecture, Baccouche et al. [8] proposed an improved
architecture that connects two simple U-Nets, called Connected-UNets. In addi-
tion to the original idea of the U-Net architecture, which includes skip connec-
tions between the encoder and decoder networks, it cascades a second U-Net and
adds skip connections between the decoder of the first U-Net and the encoder
of the second U-Net. The key idea was to recover fine-grained characteristics
lost in U-Net’s encoding process. However, the authors first used YOLO [17] to
detect the location of masses in mammograms, and then applied their method
to segment only correctly localized masses. Such an approach is not optimal
in practical settings where it is desirable to simultaneously localize and segment
masses in whole mammograms rather than processing cropped mammograms.
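The cascaded wiring described above can be made concrete with a small shape-level sketch. It is an assumption-laden illustration of the connectivity only, not the architecture of [8]: the depth (two resolution levels), the channel counts, and the `block` stand-in (a random 1x1 channel mix with ReLU instead of learned convolutions) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def block(x, c_out):
    """Stand-in for a convolutional block: random 1x1 channel mix plus ReLU."""
    w = rng.normal(size=(c_out, x.shape[0])) * 0.1
    return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)

def down(x):
    """2x2 max pooling on a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def up(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cat(*feats):
    """Channel-wise concatenation."""
    return np.concatenate(feats, axis=0)

def connected_unets(image, c=4):
    # First U-Net: two resolution levels plus a bottleneck.
    e1 = block(image, c)                   # full resolution
    e2 = block(down(e1), 2 * c)            # 1/2 resolution
    b = block(down(e2), 4 * c)             # 1/4 resolution
    d2 = block(cat(up(b), e2), 2 * c)      # standard skip from e2
    d1 = block(cat(up(d2), e1), c)         # standard skip from e1

    # Second U-Net: its encoder also receives the first decoder's features.
    f1 = block(d1, c)
    f2 = block(cat(down(f1), d2), 2 * c)   # cross skip from first decoder (d2)
    g = block(down(f2), 4 * c)
    h2 = block(cat(up(g), f2), 2 * c)
    h1 = block(cat(up(h2), f1), c)
    return block(h1, 1)                    # one-channel output map

out = connected_unets(np.ones((1, 16, 16)))
```

The line computing `f2` is the point of the sketch: the second encoder concatenates its own downsampled features with the first decoder's features at the same resolution.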
Several modifications of the U-Net architecture have also been proposed by
incorporating an attention mechanism, which has been shown to be extremely effective
in medical image segmentation. Oktay et al. [18] proposed a new attention
U-Net by adding an attention gate into the conventional U-Net. This enhanced
the accuracy of the predictions. However, they did not evaluate their model for
breast mass segmentation. Similarly, Li et al. [19] built an attention dense U-
Net for breast mass segmentation, which was compared to U-Net [6], Attention
U-Net [18], and DenseNet [20]. In another study by Sun et al. [9], an attention-
guided dense upsampling network, called AUNet, was built for breast mass seg-
mentation in full mammograms. The major drawback of the papers mentioned