Loc-VAE: Learning Structurally Localized
Representation from 3D Brain MR Images
for Content-Based Image Retrieval
Kei Nishimaki1, Kumpei Ikuta1, Yuto Onga1, Hitoshi Iyatomi1, Kenichi Oishi2
for the Alzheimer’s Disease Neuroimaging Initiative*
1Department of Applied Informatics, Graduate School of Science and Engineering, Hosei University, Tokyo, Japan
2Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, USA
{kei.nishimaki.1106, kunpei.ikuta, yuuto.onnga.23}@gmail.com, iyatomi@hosei.ac.jp, koishi2@jhmi.edu
Abstract—Content-based image retrieval (CBIR) systems are
an emerging technology that supports reading and interpret-
ing medical images. Since 3D brain MR images are high
dimensional, dimensionality reduction is necessary for CBIR
using machine learning techniques. In addition, for a reliable
CBIR system, each dimension in the resulting low-dimensional
representation must be associated with a neurologically inter-
pretable region. We propose a localized variational autoencoder
(Loc-VAE) that provides neuroanatomically interpretable low-
dimensional representation from 3D brain MR images for
clinical CBIR. Loc-VAE is based on β-VAE with the additional
constraint that each dimension of the low-dimensional represen-
tation corresponds to a local region of the brain. The proposed
Loc-VAE is capable of acquiring representation that preserves
disease features and is highly localized, even under a high compression ratio (4096:1). The low-dimensional
representation obtained by Loc-VAE improved the locality
measure of each dimension by 4.61 points compared to naïve β-VAE, while maintaining comparable brain reconstruction
capability and information about the diagnosis of Alzheimer’s
disease.
Index Terms—ADNI, CBIR, VAE, dimensionality reduction,
3D brain MRI
I. INTRODUCTION
Magnetic resonance (MR) images are stored in the picture
archiving and communication system (PACS) [1] along with
the corresponding clinical information, which enables the
centralized management of scanned images. These stored
images are retrieved for diagnostic and research purposes.
When querying and registering images in such databases, it
is common to use keywords that describe brain structural
and clinical features. However, selecting the
appropriate keywords requires sufficient experience in the
specialized field. Therefore, it is desirable to develop a
content-based image retrieval (CBIR) [2] system in medical
practice to retrieve MR images by querying the images
themselves rather than keywords.
*Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

Since MR images are usually composed of millions of voxels or more, CBIR based on machine learning techniques must avoid the curse of dimensionality. Classic and widely
used methods for dimensionality reduction for this purpose
can be broadly categorized into two groups: (i) feature extraction, which transforms the part of interest in the data into a compact vector [3]–[5], and (ii) compressed representation acquisition, which converts the entire data into a summary vector using, for example, singular value decomposition [6]. However, feature extraction to obtain compact vectors is generally not easy and requires specialized feature engineering, while compressed representation acquisition struggles to balance low dimensionality with the preservation of important features. With the recent advancement of deep
learning-based techniques in computer vision, convolutional
neural networks (CNNs) that can encompass (i) and (ii) have
been proposed, and have been applied to brain MR images
[7]–[9]. In addition, several CNN-based algorithms for CBIR
have been proposed [10]–[13]. In particular, convolutional autoencoder (CAE)-based dimensionality reduction methods [12] have achieved a high compression ratio on brain MR images. Moreover, an extended 3D-CAE utilizes metric learning to acquire a more disease-specific low-dimensional representation [13].
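As an aside, category (ii) above can be illustrated with a simple truncated-SVD compression. The sketch below is only a toy example: the matrix sizes and the number of retained components are placeholders, not settings used in any of the cited works.

```python
# Toy example of category (ii): compressing flattened brain volumes with a
# truncated SVD. Matrix sizes and the number of kept components are placeholders.
import torch

X = torch.rand(100, 4096)                 # 100 images, each flattened to 4096 voxels
mean = X.mean(dim=0, keepdim=True)
Xc = X - mean                             # center the data before decomposition

U, S, Vh = torch.linalg.svd(Xc, full_matrices=False)
k = 16                                    # keep the 16 strongest components
Z = Xc @ Vh[:k].T                         # (100, 16) compressed representation
X_approx = Z @ Vh[:k] + mean              # approximate reconstruction from Z
```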
To achieve a reliable CBIR system to support clinical
decisions, users must be provided with human-interpretable
reasons for the similarity of the images. However, the CNN- and CAE-based dimensionality reduction methods above do not
consider the readability and interpretability of the obtained
low-dimensional representation. For the CBIR system to
enable image retrieval based on disease-related neuropatho-
logical features, each dimension in the resulting representa-
tion must be associated with a neurologically interpretable
region containing known disease-related pathology. With
these capabilities, such a CBIR system does not merely list
the results but also provides the radiologist with the rationale
for the system’s recommendations, offering the possibility of
using the results even more effectively.
This paper proposes a general-purpose, highly inter-
pretable low-dimensional representation acquisition method,
localized variational autoencoder (Loc-VAE), and applies it
to brain MR images to implement a practical CBIR system.
The proposed Loc-VAE adds a new constraint to the β-
variational autoencoder (β-VAE) [14], resulting in a highly
interpretable low-dimensional representation in which each
dimension is independent and responsible for a specific portion of the input data, i.e., a local brain region.

Fig. 1. The schematics of the proposed method. (Panels: input image, duplicate, perturbation, reconstructed images, difference image.)
II. RELATED WORKS
In this section, we mainly focus on the properties of the
two CNN-based dimensionality reduction methods for CBIR
and the interpretability of β-VAE.
Swati et al. [11] proposed a framework for CBIR using VGG19 [15] pre-trained on ImageNet [16] and closed-form metric learning (CFML) [17] for the similarity distance. The pre-trained VGG19 is fine-tuned on brain MR images with metric learning, which determines an optimal metric that increases intra-class similarity while decreasing inter-class similarity. Similar cases are then retrieved by applying CFML to the FC7-layer features of VGG19 to compute the similarity between the query and database images.
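For illustration only, a retrieval step of this kind might look like the following sketch. The feature-extraction slice of VGG19, the metric matrix `M` (assumed to be given, e.g., by CFML), and the function names are our own placeholders rather than Swati et al.'s implementation.

```python
# Illustrative sketch (not Swati et al.'s code): rank database images by a
# Mahalanobis-style distance over FC7 features of an (assumed fine-tuned) VGG19.
import torch
import torchvision.models as models

vgg19 = models.vgg19(weights="IMAGENET1K_V1").eval()

def fc7_features(x: torch.Tensor) -> torch.Tensor:
    """Return the 4096-d FC7 activations for a batch of images."""
    with torch.no_grad():
        f = vgg19.avgpool(vgg19.features(x)).flatten(1)
        # classifier[:5] stops right after the FC7 linear layer and its ReLU.
        return vgg19.classifier[:5](f)

def rank_by_metric(query: torch.Tensor, database: torch.Tensor, M: torch.Tensor):
    """Rank database rows by (q - x)^T M (q - x), where M is the learned metric."""
    diff = database - query                        # (N, 4096) feature differences
    d2 = torch.einsum("nd,de,ne->n", diff, M, diff)
    return torch.argsort(d2)                       # most similar cases first
```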
Swati et al.'s CNN-based model can acquire the features needed to find similar cases without manual feature engineering. However, their model requires disease label information in the fine-tuning stage. Since CBIR is expected to cover a wide variety of cases, it is not reasonable to build the model on a classification task over all possible labels. The resulting low-dimensional representation retains features useful for classification, but the interpretability of each dimension is not addressed.
Arai et al. proposed CAE-based dimensionality reduction
in CBIR [12]. A CAE is an extension of an autoencoder that uses a CNN for compression and restoration. The basic
idea behind dimensionality reduction with a CAE is that if
the reconstruction error between the input and the output
is small, the low-dimensional representation retains a large
amount of input information. This methodology is practical
in CBIR because the model can be trained without specific
label information. Arai et al. have successfully compressed
brain MR images of 5 million dimensions down to 150
dimensions while preserving clinically relevant neuroradi-
ological features. Although a CAE provides a high com-
pression performance by learning to reduce reconstruction
errors, the image and its low-dimensional representation
can be obtained only as a point-to-point relationship in the
respective data space, and continuity around a data point is
not guaranteed. In addition, the interpretability of the low-dimensional representation is not taken into consideration.
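A minimal sketch of such a CAE, written for 3D volumes, is given below. The layer configuration, the 80-voxel cube input, and the 150-dimensional bottleneck are assumptions chosen only to mirror the compression idea described above, not Arai et al.'s actual architecture.

```python
# Minimal 3D convolutional autoencoder sketch: the bottleneck vector z is the
# low-dimensional representation; training minimizes reconstruction error only.
import torch
import torch.nn as nn

class CAE3D(nn.Module):
    def __init__(self, latent_dim: int = 150):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 80^3 -> 40^3
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 40^3 -> 20^3
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 20^3 -> 10^3
            nn.Flatten(),
            nn.Linear(64 * 10 * 10 * 10, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 10 * 10 * 10),
            nn.Unflatten(1, (64, 10, 10, 10)),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = CAE3D()
x = torch.rand(2, 1, 80, 80, 80)           # toy batch of 3D volumes
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)    # reconstruction error drives training
```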
Higgins et al. proposed β-VAE [14], a deep unsupervised
generative approach for disentangled low-dimensional rep-
resentation. Like a CAE, β-VAE is a CNN-based encoder-
decoder model, with the most significant difference being
that it assumes the input data are generated from latent variables that follow a multivariate normal distribution. The encoder of β-VAE converts
the input data into a low-dimensional probability distribution
where each dimension follows a normal distribution, and the
decoder reconstructs the original data from the distribution.
In other words, in β-VAE, a single data point is embedded as
a low-dimensional probability distribution. Thus, unlike the
CAE, β-VAE guarantees continuity around data points, so
data that are close in the input space are expected to be placed
close in lower-dimensional space. Moreover, since each
dimension of the distribution is independent and regularized,
the resulting low-dimensional representation is much more
neuroanatomically interpretable than in the CAE case. These
are important features for realizing CBIR. However, few studies have obtained and evaluated disentangled representations of brain MR images.
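For concreteness, the β-VAE objective can be sketched as follows. The mean-squared-error reconstruction term, the value of β, and the surrounding model interface are assumptions for illustration; the KL term is the standard closed-form divergence between the diagonal Gaussian posterior and the N(0, I) prior.

```python
# Sketch of the beta-VAE objective: reconstruction term plus a beta-weighted KL
# divergence pulling each independent latent dimension toward the prior N(0, 1).
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, sigma^2) in a differentiable way."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def beta_vae_loss(x, recon, mu, logvar, beta=4.0):
    recon_loss = F.mse_loss(recon, x, reduction="sum")            # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return recon_loss + beta * kl
```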
III. PROPOSED METHOD
In this paper, we propose the localized variational au-
toencoder (Loc-VAE), an encoder for acquiring interpretable
low-dimensional representation from brain MR images for
CBIR. Fig. 1 shows an overview of the proposed Loc-VAE.
Loc-VAE is a learning model based on β-VAE [14], which
provides independent embedding for each dimension while
ensuring continuity for each localized region of the brain.
The loss function of Loc-VAE consists of the following two
terms:
$$\mathcal{L} = \mathcal{L}_{\beta\text{-VAE}} + \mathcal{L}_{\mathrm{Local}}. \tag{1}$$
The first term, $\mathcal{L}_{\beta\text{-VAE}}$, is the loss used in general VAE models, and the second term, $\mathcal{L}_{\mathrm{Local}}$, is a newly introduced term that localizes the brain region carried by each dimension of the low-dimensional representation.
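The precise definition of $\mathcal{L}_{\mathrm{Local}}$ is given in the remainder of the paper. As a purely illustrative placeholder consistent with Fig. 1, the sketch below (reusing `beta_vae_loss` from the earlier sketch) perturbs one latent dimension, decodes again, and penalizes the spatial spread of the resulting difference image; the perturbation size, the spread penalty, and the assumed `model` interface are our own assumptions, not the paper's formulation.

```python
# Hedged sketch of Eq. (1). "spatial_spread" is an illustrative stand-in for
# L_Local (NOT the paper's definition): it perturbs one latent dimension,
# decodes again, and penalizes how widely the resulting difference image is
# spread over the volume, so that each dimension affects a compact region.
import torch

def spatial_spread(diff: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
    """Variance of voxel coordinates weighted by |difference|; small = localized."""
    w = diff.abs().flatten(1)                      # (B, V) per-voxel weights
    w = w / (w.sum(dim=1, keepdim=True) + 1e-8)
    mean = w @ coords                              # (B, 3) weighted centroid
    var = (w @ coords.pow(2)) - mean.pow(2)        # per-axis weighted variance
    return var.sum(dim=1).mean()

def loc_vae_loss(model, x, beta=4.0, eps=1.0):
    recon, mu, logvar = model(x)                   # assumed encoder/decoder interface
    loss_beta_vae = beta_vae_loss(x, recon, mu, logvar, beta)

    base = model.decode(mu)                        # reconstruction of the input code
    z = mu.clone()                                 # duplicate the latent code
    dim = torch.randint(z.size(1), (1,)).item()    # perturb one random dimension
    z[:, dim] += eps
    diff = model.decode(z) - base                  # difference of the two reconstructions

    B, _, D, H, W = diff.shape
    coords = torch.stack(torch.meshgrid(
        torch.arange(D), torch.arange(H), torch.arange(W), indexing="ij"
    ), dim=-1).reshape(-1, 3).float()              # (V, 3) voxel coordinates

    return loss_beta_vae + spatial_spread(diff, coords)  # Eq. (1): L_beta-VAE + L_Local
```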