Boosting Kidney Stone Identification in Endoscopic
Images Using Two-Step Transfer Learning
Francisco Lopez-Tiro1,2,3, Juan Pablo Betancur-Rengifo3, Arturo Ruiz-Sanchez3, Ivan Reyes-Amezcua3,4,
Jonathan El-Beze5, Jacques Hubert5, Michel Daudon6, Gilberto Ochoa-Ruiz*,1,3, Christian Daul*,2
Abstract—Knowing the cause of kidney stone formation is
crucial to establish treatments that prevent recurrence. There are
currently different approaches for determining the kidney stone
type. However, the reference ex-vivo identification procedure can
take up to several weeks, while an in-vivo visual recognition
requires highly trained specialists. Machine learning models
have been developed to provide urologists with an automated
classification of kidney stones during a ureteroscopy; however,
the quality of the training data and methods is generally
lacking. In this work, a two-step transfer learning
approach is used to train the kidney stone classifier. The proposed
approach transfers knowledge learned on a set of images of
kidney stones acquired with a CCD camera (ex-vivo dataset)
to a final model that classifies images acquired with an endoscope
(ex-vivo dataset). The results show that learning features from
different domains with similar information helps to improve
the performance of a model that performs classification in real
conditions (for instance, uncontrolled lighting conditions and
blur). Finally, compared with models that are trained from
scratch or initialized with ImageNet weights, the obtained results
suggest that the two-step approach extracts features that improve
the identification of kidney stones in endoscopic images.
Index Terms—Transfer learning, kidney stones, deep learning
I. INTRODUCTION
The formation of kidney stones in the urinary tract is a
public health issue [1], [2]. In industrialized countries, 10%
of the population suffers from an episode of kidney stones
during their lifetime. Recent studies have determined that the
risk of recurrence increases up to 40% in less than 5 years
[3], [4]. Thus, determining the root cause of kidney stone
formation is crucial to avoid relapses through personalized
treatments [3], [5], [6]. To this end, different approaches for
visually identifying some of the most common types (or
classes) of kidney stones have been proposed in recent years
[7], [8].
Morpho-Constitutional Analysis (MCA) is currently
the reference method for identifying the type of
extracted kidney stone fragments [9]. This ex-vivo procedure
consists of two complementary analyses of the extracted
kidney stone parts, which were fragmented with a laser.
1Tecnologico de Monterrey, School of Sciences and Engineering, Mexico
2CRAN (UMR 7039, Université de Lorraine and CNRS), Nancy, France
3CV-INSIDE Lab Member, Mexico
4CINVESTAV, Computer Sciences Department, Mexico
5CHU Nancy, Service d’urologie de Brabois, Vandœuvre-lès-Nancy, France
6Hôpital Tenon, AP-HP, Paris, France
*Corresponding authors:
gilberto.ochoa@tec.mx, christian.daul@univ-lorraine.fr
The fragments are first visually inspected under a microscope to
observe the colors and textures of their surface and section.
Then, an infrared-spectrophotometry analysis identifies
the molecular and crystalline composition of the
different areas (layers) of the kidney stone [10]. However,
in numerous hospitals, the MCA results are only available
after some weeks. This delay makes it difficult to establish an
immediate and appropriate treatment for the patient. In
addition, removing large kidney stone fragments is often
difficult in practice. Moreover, the biochemical composition
can be altered by the laser during the fragmentation [11],
making the MCA procedure challenging in some cases.
Endoscopic Stone Recognition (ESR) is a promising technique
to immediately determine the type of kidney stones during
the ureteroscopy (i.e., in-vivo recognition). The advantage
of ESR is twofold: kidney stones can be pulverized (dusting
procedure with a laser) instead of being fragmented, and an appropriate
treatment can be immediately defined. ESR relies only on
a visual inspection of in-vivo endoscopic images observed
on a screen. For trained urologists, ESR results are strongly
correlated with those of MCA [12]. However, only a few
highly trained experts are currently able to recognize the type
of kidney stones using only endoscopic images. Moreover, the
visual classification by urologists is operator-dependent and
subjective, and the required experience takes a long time to acquire [13].
Several recent studies have proposed automating ESR [12],
[14], [15]. These Deep Learning (DL) based methods have led
to promising results. However, one of the most common
challenges for DL-based kidney stone classification
is the lack of a large image set for model training.
In addition, the similarity between the distribution of the
pretraining data and that of the target data is another
important factor in obtaining an adequate model. This
suggests a trade-off between the amount of available
data and the data distribution when fitting the network weights.
Most DL-based models are fine-tuned from weights learned on
distributions other than that of kidney stone images
(commonly ImageNet [16]).
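As a rough illustration of the fine-tuning strategy described above, and of the two-step variant studied in this work, the following minimal Python sketch chains two fine-tuning stages. It is not the authors' implementation: the tiny two-layer network merely stands in for a CNN, and all function names (new_model, fine_tune, transfer, two_step_transfer) and hyperparameters are hypothetical. It assumes that the first step adapts generically pretrained weights (e.g., from ImageNet) to an intermediate domain such as the CCD-camera images, and that the second step adapts the result to the endoscopic images.

```python
import math
import random


def new_head(n_hidden, n_classes, rng):
    """Randomly initialised classification layer (one weight row per class)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_classes)]


def new_model(n_inputs, n_hidden, n_classes, rng):
    """Toy stand-in for a CNN: a feature-extracting backbone plus a head."""
    backbone = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    return {"backbone": backbone, "head": new_head(n_hidden, n_classes, rng)}


def forward(model, x):
    """Return hidden features and softmax class probabilities for one sample."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in model["backbone"]]
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in model["head"]]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return h, [e / total for e in exps]


def fine_tune(model, data, epochs=20, lr=0.1):
    """Plain SGD on cross-entropy; updates backbone and head in place."""
    for _ in range(epochs):
        for x, y in data:
            h, p = forward(model, x)
            d_logits = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
            # Backpropagate to the hidden layer before the head is modified.
            d_h = [
                (1.0 - h[j] ** 2)
                * sum(d_logits[k] * model["head"][k][j] for k in range(len(p)))
                for j in range(len(h))
            ]
            for k, row in enumerate(model["head"]):
                for j in range(len(row)):
                    row[j] -= lr * d_logits[k] * h[j]
            for j, row in enumerate(model["backbone"]):
                for i in range(len(row)):
                    row[i] -= lr * d_h[j] * x[i]
    return model


def transfer(model, n_classes, rng):
    """Copy the learned backbone and attach a fresh head for the new task."""
    backbone = [row[:] for row in model["backbone"]]
    return {"backbone": backbone, "head": new_head(len(backbone), n_classes, rng)}


def two_step_transfer(pretrained, intermediate, n_mid, target, n_target, rng):
    """Step 1: adapt to the intermediate domain. Step 2: adapt to the target."""
    step1 = fine_tune(transfer(pretrained, n_mid, rng), intermediate)
    return fine_tune(transfer(step1, n_target, rng), target)
```

In this sketch, each dataset is a list of (feature_vector, label) pairs. Skipping the first call to fine_tune would correspond to the usual single-step baseline, in which generic weights are fine-tuned directly on the target images.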
Transfer Learning (TL) is used when features learned from
a given domain (or class of images) can bring appropriate
knowledge to another domain for which the available image
set is too small to train a large model from scratch [17], [22].
In the context of ureteroscopy, a large dataset of in-vivo images
is not currently available, and collecting such a large database
of endoscopic images during ureteroscopies is a long-term
effort. However, in the context of this work, images of ex-vivo
arXiv:2210.13654v1 [cs.CV] 24 Oct 2022