shelf (COTS) hardware, also known as CubeSats, could be
used for disaster response in the case of floods. The authors
proposed the use of deep learning algorithms to produce
high-accuracy multi-class segmentation on board very
low-cost satellite hardware. Although the deep learning
approaches proposed for this challenging task have produced
great results in terms of accuracy, they are black-box and are
extremely difficult to audit [14].
In this paper, a new prototype-based approach called
interpretable deep semantic segmentation (IDSS) is proposed, which
extends the recently introduced explainable Deep Neural Net-
work (xDNN) [15] through a new clustering and decision
mechanism. The prototype-based nature of IDSS allows clear
interpretability of its decision mechanism [16]. This is im-
portant for the human-in-the-loop process involved in this
application. Results demonstrate that IDSS is able to surpass
xDNN and U-Net in terms of IoU and recall for water
detection.
II. LITERATURE REVIEW
A. Water/Flood mapping
Water/Flood mapping requires semantic segmentation for
allocating a class label to each multidimensional pixel. Traditionally,
water-index-based methods have been used to detect liquid water.
These indices are ratios of different spectral bands of the raw
satellite signal that characterize water absorption. The rationale
behind the usefulness of
the water indices is that the water absorbs energy in the near
infrared (NIR) and short-wave infrared (SWIR) wavelengths,
making it significantly different from other objects on the
ground [17]. The most widely used water index is NDWI
[10]. Other water indices also exist, such as MNDWI [11],
WNDWI [18] and AWEI [19]. However, such methods typically
require thresholds to be set manually, which is challenging,
and the choice of threshold significantly influences the result.
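To illustrate the threshold-sensitivity issue, the following is a minimal sketch of NDWI-based water masking, assuming NumPy arrays of Green and NIR reflectance; the function names, the toy values and the default threshold of 0.0 are illustrative choices, not taken from the cited works:

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Normalized Difference Water Index: (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir + eps)

def water_mask(green: np.ndarray, nir: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Per-pixel water mask; the threshold must be tuned per scene."""
    return ndwi(green, nir) > threshold

# Toy 2x2 scene: water reflects more in Green than in NIR,
# so the left column is classified as water.
green = np.array([[0.30, 0.05], [0.28, 0.04]])
nir = np.array([[0.05, 0.30], [0.06, 0.35]])
print(water_mask(green, nir))
```

Shifting the threshold even slightly flips pixels whose NDWI lies near it, which is exactly why the manual choice matters so much in practice.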
Machine learning methods have also been applied to flood mapping.
For example, K-means was used by [20] to perform clustering
based on Synthetic Aperture Radar (SAR) optical features, with
thresholds then applied to perform classification.
The SVM classifier was used to generate reference labels, as it
is considered to provide better results than thresholding
methods [21]. KNN was used by [22] to perform per-pixel
classification based on a water index.
More recently, with the rapid development of deep learning,
convolutional neural networks started to be used more widely
for Earth Observation and flood mapping, in particular. For
example, the fully convolutional neural network Resnet50
trained on Sentinel-1 satellite images was used to segment
permanent water and floods by [23], while U-Net and a simple CNN
trained on Sentinel-2 satellite images were used by [13] to
perform onboard flood segmentation. The main advantage
of deep convolutional networks is their high accuracy,
measured for this particular problem primarily by IoU
(intersection over union) and recall.
They also offer powerful latent feature extraction capabilities.
However, the downside is that they require large amounts of
labeled training data and computational power and, most of
all, they lack interpretability. They are often considered
"black-box" models because they have millions of abstract
parameters (weights) that have no direct physical
meaning and are hard to interpret, or to check for corruption by noise
or adversarial actions, especially when transmitted. Attempts
have been made to provide some explainability, but these
are mostly post-hoc partial solutions or surrogate models.
Therefore, currently, there is a powerful trend aiming to
develop explainable or interpretable-by-design alternatives that
are as powerful as such Deep Neural Networks, yet offer
human-intelligible models [24], [15], [14].
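For reference, the two accuracy measures mentioned above can be computed per class directly from the predicted and ground-truth segmentation masks; this is a minimal sketch with illustrative variable names and toy masks, not code from any of the cited works:

```python
import numpy as np

def iou_and_recall(pred: np.ndarray, truth: np.ndarray, cls: int) -> tuple:
    """Per-class IoU and recall for a segmentation mask."""
    p = pred == cls
    t = truth == cls
    tp = np.logical_and(p, t).sum()    # true positives
    union = np.logical_or(p, t).sum()  # TP + FP + FN
    iou = tp / union if union else 1.0
    recall = tp / t.sum() if t.sum() else 1.0
    return iou, recall

pred = np.array([[1, 1], [0, 1]])   # 1 = water, 0 = land
truth = np.array([[1, 0], [1, 1]])
print(iou_and_recall(pred, truth, cls=1))  # → (0.5, 0.666...)
```

IoU penalizes both missed water pixels and false alarms, while recall only measures the fraction of true water pixels recovered, which is why both are reported for flood detection.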
The method proposed in this paper offers visual and linguistic
forms of interpretation based on prototypes, providing layers
with clear interpretation, linguistic IF...THEN rules, and a
clear, similarity-based decision-making process.
These forms of representation of the model can be inspected
by a human and have clear meaning. The prototypes can also
be visualized using RGB colours as well as raw features such
as NIR, SWIR, etc., which are easy for a human to interpret.
In the next sub-section, prototype-based machine learning
methods are briefly reviewed: although often underestimated
and overlooked, they offer clear advantages
in terms of interpretability while achieving high levels of
accuracy and performance.
B. Prototype-based models
Prototype-based machine learning methods share concepts with
methods from cognitive psychology and neuroscience, in which
new observations or stimuli are compared to a set of
prototypes [25]. Prototype-based machine
learning models have been attracting much attention due
to their easy-to-understand decision-making processes and
interpretability. K-nearest neighbors [26] and K-means [27]
are the most representative prototype-based algorithms.
The most typical prototype-based neural network algorithm is
the Radial Basis Function (RBF) method [28], which acts as
a bridge between neural networks and linguistic IF...THEN
rules. There are also other prototype-based machine learning
algorithms such as fuzzy c-means clustering [29] and Learning
Vector Quantization (LVQ) [30].
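The decision mechanism shared by these methods can be sketched as nearest-prototype classification, where each prototype also reads directly as a linguistic IF...THEN rule. The following is a generic illustration of that idea, not the IDSS algorithm itself; the class names, prototype values and use of Euclidean distance are assumptions made for the example:

```python
import numpy as np

# Hypothetical per-class prototypes in a 2-band (Green, NIR) feature space.
prototypes = {
    "water": np.array([[0.30, 0.05], [0.25, 0.08]]),
    "land":  np.array([[0.10, 0.35], [0.08, 0.40]]),
}

def classify(x: np.ndarray) -> str:
    """Assign x the label of its nearest prototype (Euclidean distance)."""
    best_label, best_dist = None, np.inf
    for label, protos in prototypes.items():
        d = np.linalg.norm(protos - x, axis=1).min()
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Each prototype reads as a rule a human can inspect:
#   IF (pixel is similar to prototype [0.30, 0.05]) THEN class is "water"
print(classify(np.array([0.28, 0.06])))  # → water
```

Because every decision traces back to a concrete, visualizable prototype, the model can be audited by inspecting the prototypes themselves rather than millions of abstract weights.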
Recently, with the rapid development of deep learning
algorithms, prototype-based models have increasingly been
integrated with neural networks; examples include ProtoPNet [31],
xDNN [15] and a nonparametric segmentation framework [32]. The
IDSS method proposed in this paper differs from these works:
xDNN [15] does not perform clustering to generate
prototypes and has not been applied to flood mapping, while
the work by [32] only looks for prototypes in the embedding
space, which greatly reduces the interpretability of the model.
The method proposed in this paper not only outperforms
state-of-the-art deep convolutional networks such as U-Net and
SCNN, as well as various water-index-based models, but also
greatly improves the interpretability of the model by further
finding the mean value in the raw feature space corresponding