
Loc-VAE: Learning Structurally Localized Representation from 3D Brain MR Images for Content-Based Image Retrieval
Kei Nishimaki1, Kumpei Ikuta1, Yuto Onga1, Hitoshi Iyatomi1, Kenichi Oishi2
for the Alzheimer’s Disease Neuroimaging Initiative*
1Department of Applied Informatics, Graduate School of Science and Engineering, Hosei University, Tokyo, Japan
2Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, USA
{kei.nishimaki.1106, kunpei.ikuta, yuuto.onnga.23}@gmail.com, iyatomi@hosei.ac.jp, koishi2@jhmi.edu
Abstract—Content-based image retrieval (CBIR) systems are an emerging technology that supports reading and interpreting medical images. Since 3D brain MR images are high dimensional, dimensionality reduction is necessary for CBIR using machine learning techniques. In addition, for a reliable CBIR system, each dimension in the resulting low-dimensional representation must be associated with a neurologically interpretable region. We propose a localized variational autoencoder (Loc-VAE) that provides neuroanatomically interpretable low-dimensional representations of 3D brain MR images for clinical CBIR. Loc-VAE is based on β-VAE with the additional constraint that each dimension of the low-dimensional representation corresponds to a local region of the brain. The proposed Loc-VAE acquires representations that preserve disease features and are highly localized, even at a high compression ratio (4096:1). The low-dimensional representation obtained by Loc-VAE improved the locality measure of each dimension by 4.61 points compared to naïve β-VAE, while maintaining comparable brain reconstruction capability and information about the diagnosis of Alzheimer's disease.
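Loc-VAE starts from the standard β-VAE objective, so it helps to see that baseline concretely. The following is a minimal NumPy sketch of the usual β-VAE loss (reconstruction error plus β times the KL divergence between a diagonal Gaussian encoder and a standard normal prior). It is only the baseline: the Loc-VAE locality constraint is not specified in this section and is deliberately left out, and the function name `beta_vae_loss` and the default `beta=4.0` are illustrative assumptions, not values from this paper.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Batch-averaged beta-VAE loss (baseline only, no locality term).

    Assumes a diagonal Gaussian encoder q(z|x) = N(mu, exp(logvar)) and a
    standard normal prior p(z) = N(0, I). Squared error stands in for the
    negative log-likelihood of a Gaussian decoder (up to constants).
    `beta=4.0` is an illustrative default, not a value from the paper.
    """
    n = x.shape[0]
    # Reconstruction term: summed squared error, averaged over the batch.
    recon = np.sum((x - x_recon) ** 2) / n
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims,
    # averaged over the batch.
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar)) / n
    return recon + beta * kl, recon, kl
```

Raising `beta` above 1 weights the KL term more heavily, which pushes the latent dimensions toward independence at some cost in reconstruction. As a point of scale, a hypothetical 64×64×64 volume (262,144 voxels) encoded into 64 latent dimensions corresponds to a 262,144 / 64 = 4096:1 compression ratio.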
Index Terms—ADNI, CBIR, VAE, dimensionality reduction, 3D brain MRI
I. INTRODUCTION
Magnetic resonance (MR) images are stored in the picture archiving and communication system (PACS) [1] along with the corresponding clinical information, which enables centralized management of scanned images. These stored images are retrieved for diagnostic and research purposes. When querying and registering images in such databases, it is common to use keywords describing structural brain features, clinical features, and so on. However, selecting appropriate keywords requires considerable experience in the specialized field. Therefore, it is desirable to develop a content-based image retrieval (CBIR) [2] system for medical practice that retrieves MR images by querying with the images themselves rather than with keywords.
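Once every stored scan has been encoded into a compact vector, querying by image content rather than by keyword reduces to a nearest-neighbor search over those vectors. A minimal sketch, assuming the images are already encoded and using Euclidean distance as the similarity measure (both are assumptions for illustration; `retrieve` is a hypothetical helper, not part of the paper):

```python
import numpy as np

def retrieve(query_vec, database, k=3):
    """Return indices and distances of the k stored entries nearest the query.

    Assumes each stored image is already encoded as one fixed-length row of
    `database` (shape: n_images x n_dims); Euclidean distance is an
    illustrative choice of similarity measure.
    """
    dists = np.linalg.norm(database - query_vec, axis=1)
    # Stable sort keeps the original order among equally distant entries.
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]
```

For a database of a few thousand scans a brute-force scan like this is sufficient; larger archives would typically swap in an approximate nearest-neighbor index without changing the idea.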
Since MR images are usually composed of millions of voxels or more, CBIR based on machine learning techniques must avoid the curse of dimensionality. Classic and widely used dimensionality reduction methods for this purpose fall mainly into two groups: (i) feature extraction, which transforms the region of interest in the data into a compact vector [3]–[5], and (ii) compressed representation acquisition, which converts the entire data into a summary vector using, for example, singular value decomposition [6] or other means. However, feature extraction is generally not easy and requires specialized feature engineering, whereas compressed representation acquisition struggles to balance low dimensionality against the preservation of important features. With the recent advancement of deep learning-based techniques in computer vision, convolutional neural networks (CNNs) that encompass both (i) and (ii) have been proposed and applied to brain MR images [7]–[9]. In addition, several CNN-based algorithms for CBIR have been proposed [10]–[13]. In particular, convolutional autoencoder (CAE)-based dimensionality reduction methods [12] have achieved high compression ratios on brain MR images. Moreover, an extension of the 3D-CAE uses metric learning to acquire more disease-specific low-dimensional representations [13].
*Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
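To make approach (ii) concrete, the following is a minimal sketch of compressed representation acquisition with truncated singular value decomposition, assuming each scan has been flattened into one row of a matrix and mean-centered across subjects. The helper names `svd_compress` and `svd_reconstruct` are hypothetical, and this is a generic SVD illustration rather than the specific procedure of [6].

```python
import numpy as np

def svd_compress(volumes, n_components):
    """Compress flattened volumes into n_components-dimensional summaries.

    volumes: array of shape (n_subjects, n_voxels), one flattened scan per row.
    Returns the low-dimensional codes, the basis, and the centering mean.
    """
    mean = volumes.mean(axis=0)
    centered = volumes - mean
    # Economy-size SVD: centered = U @ diag(S) @ Vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]          # (n_components, n_voxels)
    codes = centered @ basis.T         # (n_subjects, n_components)
    return codes, basis, mean

def svd_reconstruct(codes, basis, mean):
    """Map low-dimensional codes back to voxel space."""
    return codes @ basis + mean
```

Each row of `basis` is typically a dense pattern spread over the whole volume, so a single code dimension is not tied to one brain region. This is the kind of interpretability gap that motivates a localized representation.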
For a CBIR system to reliably support clinical decisions, it must provide users with human-interpretable reasons why the retrieved images are similar. However, the CNN- and CAE-based dimensionality reduction methods above do not consider the interpretability of the obtained low-dimensional representation. For a CBIR system to enable image retrieval based on disease-related neuropathological features, each dimension in the resulting representation must be associated with a neurologically interpretable region containing known disease-related pathology. With these capabilities, such a CBIR system does not merely list the results but also provides the radiologist with the rationale for its recommendations, allowing the results to be used more effectively.
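Whether a latent dimension is associated with a region can be probed empirically: shift that one dimension and observe where the decoded image changes. The sketch below is a hypothetical illustration of this idea only; it is not the locality measure used in this paper. `decode` stands for any decoder mapping a latent vector to a flattened volume, `region_mask` for an atlas-defined boolean region, and `delta=1.0` for an arbitrary perturbation size, all of which are assumptions.

```python
import numpy as np

def dimension_footprint(decode, z, k, delta=1.0):
    """Voxel-wise absolute change in the decoded volume when latent dim k
    is shifted by delta (hypothetical probe, not the paper's measure)."""
    z_shift = z.copy()
    z_shift[k] += delta
    return np.abs(decode(z_shift) - decode(z))

def fraction_in_region(footprint, region_mask):
    """Share of the total change that falls inside a boolean region mask."""
    total = footprint.sum()
    if total == 0:
        return 0.0
    return float(footprint[region_mask].sum() / total)
```

With a toy linear decoder, a dimension whose effect is confined to the masked voxels scores 1.0, while a dimension that spreads its effect evenly over the whole volume scores only the masked fraction of voxels. A localized representation would push most dimensions toward the first case.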
This paper proposes a general-purpose, highly interpretable low-dimensional representation acquisition method, the localized variational autoencoder (Loc-VAE), and applies it to brain MR images to implement a practical CBIR system. The proposed Loc-VAE adds a new constraint to the β-variational autoencoder (β-VAE) [14], resulting in a highly interpretable low-dimensional representation in which each
arXiv:2210.00506v1 [eess.IV] 2 Oct 2022