
Loc-VAE: Learning Structurally Localized Representation from 3D Brain MR Images for Content-Based Image Retrieval

Kei Nishimaki1, Kumpei Ikuta1, Yuto Onga1, Hitoshi Iyatomi1, Kenichi Oishi2

for the Alzheimer’s Disease Neuroimaging Initiative*

1Department of Applied Informatics, Graduate School of Science and Engineering, Hosei University, Tokyo, Japan

2Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, USA

{kei.nishimaki.1106, kunpei.ikuta, yuuto.onnga.23}@gmail.com, iyatomi@hosei.ac.jp, koishi2@jhmi.edu

Abstract—Content-based image retrieval (CBIR) systems are an emerging technology that supports the reading and interpretation of medical images. Because 3D brain MR images are high dimensional, dimensionality reduction is necessary for CBIR using machine learning techniques. In addition, for a CBIR system to be reliable, each dimension of the resulting low-dimensional representation must be associated with a neurologically interpretable region. We propose a localized variational autoencoder (Loc-VAE) that provides a neuroanatomically interpretable low-dimensional representation from 3D brain MR images for clinical CBIR. Loc-VAE is based on β-VAE, with the additional constraint that each dimension of the low-dimensional representation corresponds to a local region of the brain. The proposed Loc-VAE acquires representations that preserve disease features and are highly localized, even at a high compression ratio (4096:1). The low-dimensional representation obtained by Loc-VAE improved the locality measure of each dimension by 4.61 points compared with naïve β-VAE, while maintaining comparable brain reconstruction capability and information about the diagnosis of Alzheimer’s disease.

Index Terms—ADNI, CBIR, VAE, dimensionality reduction, 3D brain MRI

I. INTRODUCTION

Magnetic resonance (MR) images are stored in the picture archiving and communication system (PACS) [1] along with the corresponding clinical information, which enables centralized management of scanned images. These stored images are retrieved for diagnostic and research purposes. When querying and registering images in such databases, it is common to use keywords that describe structural brain features, clinical features, and so on. However, selecting appropriate keywords requires sufficient experience in the specialized field. It is therefore desirable to develop a content-based image retrieval (CBIR) [2] system for medical practice that retrieves MR images by querying with the images themselves rather than with keywords.

*Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

Since MR images usually consist of millions of voxels or more, CBIR based on machine learning techniques must avoid the curse of dimensionality. Classic and widely used dimensionality reduction methods for this purpose fall mainly into two groups: (i) feature extraction, which transforms the part of interest in the data into a compact vector [3]–[5], and (ii) compressed representation acquisition, which converts the entire data into a vector of summaries using singular value decomposition (SVD) [6] or other means. However, obtaining compact vectors by feature extraction is generally not easy and requires specialized feature engineering. Compressed representation acquisition, on the other hand, must balance low dimensionality against the preservation of important features, which is challenging. With recent advances in deep learning-based techniques in computer vision, convolutional neural networks (CNNs) that can encompass both (i) and (ii) have been proposed and applied to brain MR images [7]–[9]. In addition, several CNN-based algorithms for CBIR have been proposed [10]–[13]. In particular, convolutional autoencoder (CAE)-based dimensionality reduction methods [12] have achieved high compression ratios for brain MR images. Moreover, an extension of the 3D-CAE also uses metric learning to acquire more disease-specific low-dimensional representations [13].
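As a rough illustration of group (ii), compressed representation acquisition via SVD, the sketch below flattens each volume into a vector, centers the data, and keeps the top-k right singular directions as a compact code. The function names `svd_compress` and `svd_reconstruct`, the mean-centering step, and the random toy volumes are assumptions for illustration; this is not the pipeline of [6] or of this paper.

```python
import numpy as np

def svd_compress(volumes: np.ndarray, k: int):
    """Compress a stack of 3D volumes (n, X, Y, Z) into n vectors of length k.

    Returns the k-dimensional codes, the basis (k, voxels), and the voxel-wise mean.
    """
    n = volumes.shape[0]
    flat = volumes.reshape(n, -1).astype(np.float64)  # (n, voxels)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Thin SVD: centered = U @ diag(s) @ Vt; rows of Vt are orthonormal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                    # keep the k leading directions
    codes = centered @ basis.T        # (n, k) low-dimensional summaries
    return codes, basis, mean

def svd_reconstruct(codes, basis, mean, shape):
    """Map k-dimensional codes back to voxel space with the given volume shape."""
    return (codes @ basis + mean).reshape((codes.shape[0],) + tuple(shape))

rng = np.random.default_rng(0)
vols = rng.normal(size=(20, 8, 8, 8))          # 20 toy volumes, 512 voxels each
codes, basis, mean = svd_compress(vols, k=5)
recon = svd_reconstruct(codes, basis, mean, vols.shape[1:])
print(codes.shape)                              # (20, 5)
print(vols[0].size / codes.shape[1])            # 102.4 voxels per code entry
```

Raising k lowers the reconstruction error at the cost of a longer code, which is the trade-off between low dimensionality and feature preservation described above.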

To achieve a reliable CBIR system that supports clinical decisions, users must be given human-interpretable reasons for the similarity of the retrieved images. However, the CNN- and CAE-based dimensionality reduction methods above do not consider the readability and interpretability of the obtained low-dimensional representation. For a CBIR system to enable image retrieval based on disease-related neuropathological features, each dimension of the resulting representation must be associated with a neurologically interpretable region containing known disease-related pathology. With these capabilities, such a CBIR system does not merely list results but also provides the radiologist with the rationale for its recommendations, making it possible to use the results even more effectively.

This paper proposes a general-purpose method for acquiring highly interpretable low-dimensional representations, the localized variational autoencoder (Loc-VAE), and applies it to brain MR images to implement a practical CBIR system. The proposed Loc-VAE adds a new constraint to the β-variational autoencoder (β-VAE) [14], resulting in a highly interpretable low-dimensional representation in which each dimension is independent and responsible for a specific portion of the input data, i.e., a local brain region.

[Fig. 1 panels: input image, duplicate, perturbation, reconstructed images, difference image]
Fig. 1. Schematic of the proposed method.

arXiv:2210.00506v1 [eess.IV] 2 Oct 2022

II. RELATED WORKS

In this section, we mainly focus on the properties of the two CNN-based dimensionality reduction methods for CBIR and the interpretability of β-VAE.

Swati et al. [11] proposed a CBIR framework using VGG19 [15] pre-trained on ImageNet [16] and closed-form metric learning (CFML) [17] of the similarity distance. The pre-trained VGG19 is fine-tuned on brain MR images, and metric learning is used to determine the optimal metric, one that increases intraclass similarity while decreasing interclass similarity. Similar cases are found by calculating the similarity between the query and database images, applying CFML to features from the FC7 layer of VGG19. Swati et al.’s CNN-based model can acquire features for finding similar cases without manual feature design. However, their model needs disease label information in the fine-tuning stage. Since a CBIR system is expected to cover a wide variety of cases, it is not reasonable to build a model based on a classification task over all labels. The low-dimensional representation does retain features useful for classification, but the readability of each dimension is not addressed.
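The retrieval step described above, ranking database images by their distance to a query in a feature space under a learned metric, can be sketched generically. This is a minimal Mahalanobis-style ranking in which a fixed linear map `W` stands in for a learned metric; it does not reproduce CFML, and the 2D toy vectors are placeholders for VGG19 FC7 features. The function name `rank_by_metric` is hypothetical.

```python
import numpy as np

def rank_by_metric(query: np.ndarray, database: np.ndarray, W: np.ndarray, top_k: int = 3):
    """Return indices and distances of the top_k database rows closest to `query`.

    Distance is d(x, y) = ||W (x - y)||_2, i.e. a Mahalanobis distance with M = W^T W.
    W is a hypothetical stand-in for a metric learned from labeled data.
    """
    diffs = (database - query) @ W.T             # row i holds W (x_i - query)
    dists = np.linalg.norm(diffs, axis=1)        # one distance per database item
    order = np.argsort(dists, kind="stable")     # nearest first
    return order[:top_k], dists[order[:top_k]]

database = np.array([[0.0, 5.0], [1.0, 0.0], [3.0, 0.0], [0.1, 0.1]])
query = np.array([0.0, 0.0])
W = np.diag([1.0, 0.1])   # toy metric: feature 0 weighted 10x more than feature 1

idx, dists = rank_by_metric(query, database, W)
print(idx)                # [3 0 1]
```

With the identity metric the ranking becomes `[3 1 2]`, so the choice of metric directly changes which cases are retrieved; this is why metric learning matters in such frameworks.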

Arai et al. proposed CAE-based dimensionality reduction for CBIR [12]. A CAE is an extension of an autoencoder that uses CNNs for compression and restoration. The basic idea behind dimensionality reduction with a CAE is that if the reconstruction error between the input and the output is small, the low-dimensional representation retains a large amount of the input information. This methodology is practical for CBIR because the model can be trained without specific label information. Arai et al. successfully compressed brain MR images of 5 million dimensions down to 150 dimensions while preserving clinically relevant neuroradiological features. Although a CAE achieves high compression performance by learning to reduce reconstruction errors, an image and its low-dimensional representation are related only point to point in their respective data spaces, and continuity around a data point is not guaranteed. In addition, the interpretability of the low-dimensional representation is not taken into consideration.
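The CAE principle (a small reconstruction error means the code retains much of the input) can be illustrated with a deliberately simplified linear autoencoder trained by plain gradient descent on the mean squared reconstruction error. This is a toy stand-in for the CAE of [12], with no convolutions and no nonlinearity; the function name, data sizes, learning rate, and step count are arbitrary assumptions.

```python
import numpy as np

def train_linear_autoencoder(x, k, lr=0.1, steps=3000, seed=0):
    """Fit encoder E (k x d) and decoder D (d x k) to minimize mean_i ||D E x_i - x_i||^2."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    enc = rng.normal(scale=0.1, size=(k, d))
    dec = rng.normal(scale=0.1, size=(d, k))
    history = []
    for _ in range(steps):
        z = x @ enc.T                  # (n, k) low-dimensional codes
        x_hat = z @ dec.T              # (n, d) reconstructions
        err = x_hat - x
        history.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Analytic gradients of the mean squared reconstruction error.
        grad_dec = 2.0 / n * err.T @ z             # (d, k)
        grad_enc = 2.0 / n * (err @ dec).T @ x     # (k, d)
        dec -= lr * grad_dec
        enc -= lr * grad_enc
    return enc, dec, history

# Toy data that truly lives on a 2D subspace of a 10D space.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
x /= np.sqrt(np.mean(np.sum(x ** 2, axis=1)))     # scale so the mean squared norm is 1

enc, dec, hist = train_linear_autoencoder(x, k=2)
print(f"initial MSE {hist[0]:.3f}, final MSE {hist[-1]:.5f}")
```

No labels are used anywhere in training, which mirrors why this family of methods is attractive for CBIR.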

Higgins et al. proposed β-VAE [14], a deep unsupervised generative approach for learning disentangled low-dimensional representations. Like a CAE, β-VAE is a CNN-based encoder-decoder model; the most significant difference is that it assumes the input data are generated from latent variables following a multivariate normal distribution. The encoder of β-VAE converts the input data into a low-dimensional probability distribution in which each dimension follows a normal distribution, and the decoder reconstructs the original data from this distribution. In other words, in β-VAE a single data point is embedded as a low-dimensional probability distribution. Thus, unlike the CAE, β-VAE guarantees continuity around data points, so data that are close in the input space are expected to be placed close together in the low-dimensional space. Moreover, since each dimension of the distribution is independent and regularized, the resulting low-dimensional representation is much more neuroanatomically interpretable than that of a CAE. These are important properties for realizing CBIR. However, few studies have evaluated disentangled representations obtained from brain MR images.
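The β-VAE objective is commonly written as a reconstruction term plus β times the KL divergence between the encoder's diagonal Gaussian q(z|x) = N(μ, diag σ²) and a standard normal prior, with latents sampled via the reparameterization trick. The sketch below computes that loss for a batch, assuming a squared-error reconstruction term (i.e. a Gaussian likelihood) and the closed-form Gaussian KL; the encoder and decoder networks are omitted, the noisy copy standing in for the decoder output is a placeholder, and β = 4 is only an example value.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, I); in an autodiff framework this keeps z differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def gaussian_kl(mu, logvar):
    """Per-sample KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Batch mean of (squared-error reconstruction + beta * KL)."""
    recon = np.sum((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))  # sum over voxels
    return float(np.mean(recon + beta * gaussian_kl(mu, logvar)))

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 8))                  # toy batch of 3D volumes
mu = rng.normal(size=(2, 16))                      # hypothetical encoder outputs
logvar = rng.normal(scale=0.1, size=(2, 16))
z = reparameterize(mu, logvar, rng)                # would be fed to a decoder
x_hat = x + 0.1 * rng.normal(size=x.shape)         # placeholder for decoder(z)
print(beta_vae_loss(x, x_hat, mu, logvar, beta=4.0))
```

Setting β above 1 strengthens the pull of each latent dimension toward the independent N(0, 1) prior, which is the regularization behind the disentanglement discussed above.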

III. PROPOSED METHOD

In this paper, we propose the localized variational autoencoder (Loc-VAE), an encoder for acquiring interpretable low-dimensional representations from brain MR images for CBIR. Fig. 1 shows an overview of the proposed Loc-VAE. Loc-VAE is a learning model based on β-VAE [14] that provides an independent embedding for each dimension while ensuring continuity for each localized region of the brain.
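One generic way to inspect whether a latent dimension is tied to a local brain region is a latent traversal: perturb that one dimension, decode, and see where the decoded volume changes. The sketch below uses a hypothetical linear decoder and a hypothetical concentration score (the share of total absolute change that falls in the top 5% of voxels). Neither is the paper's decoder, its localization loss term, or the locality measure reported in the abstract; the function names are illustrative only.

```python
import numpy as np

def difference_map(decode, z, dim, delta=1.0):
    """|decode(z with z[dim] += delta) - decode(z)|: where the output responds to one latent dim."""
    z_pert = np.array(z, dtype=np.float64)   # copy so the caller's z is untouched
    z_pert[dim] += delta
    return np.abs(decode(z_pert) - decode(np.asarray(z, dtype=np.float64)))

def concentration(diff, top_fraction=0.05):
    """Hypothetical locality score: share of the total change inside the top `top_fraction` voxels."""
    flat = np.sort(diff.ravel())[::-1]       # largest changes first
    total = flat.sum()
    if total == 0.0:
        return 0.0
    n_top = max(1, int(round(top_fraction * flat.size)))
    return float(flat[:n_top].sum() / total)

# Hypothetical linear decoder on a 10x10x10 volume with two latent dimensions:
# dimension 0 changes only a 3x3x3 cube, dimension 1 changes every voxel equally.
shape = (10, 10, 10)
local = np.zeros(shape)
local[2:5, 2:5, 2:5] = 1.0
basis = np.stack([local, np.ones(shape)])    # (2, 10, 10, 10)

def decode(z):
    return np.tensordot(z, basis, axes=1)    # weighted sum of the two basis volumes

z0 = np.zeros(2)
print(concentration(difference_map(decode, z0, 0)))   # 1.0
print(concentration(difference_map(decode, z0, 1)))   # 0.05
```

A localized dimension concentrates all of its effect in a small region, whereas a dimension that changes the whole volume scores no higher than the top fraction itself.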

The loss function of Loc-VAE consists of the following two terms:

$$\mathcal{L} = \mathcal{L}_{\beta\text{-VAE}} + \mathcal{L}_{\text{Local}}. \qquad (1)$$

The first term, $\mathcal{L}_{\beta\text{-VAE}}$, is the term used in general VAE models, and the second term, $\mathcal{L}_{\text{Local}}$, is a newly introduced term that localizes the range carried by each dimension of
