Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features

Deepak Gupta, Russell Loane, Soumya Gayen, Dina Demner-Fushman
Lister Hill National Center for Biomedical Communications
National Library of Medicine, National Institutes of Health
Bethesda, MD, USA
Abstract
Nearest neighbor search (NNS) aims to locate the points in high-dimensional
space that are closest to a given query point. The brute-force approach to finding
the nearest neighbor becomes computationally infeasible when the number
of points is large. NNS has multiple applications in medicine, such as
searching large medical imaging databases, disease classification, and diagnosis.
With a focus on medical imaging, this paper proposes DenseLinkSearch, an
effective and efficient algorithm that searches and retrieves the relevant images
from heterogeneous sources of medical images. To this end, given a medical
database, the proposed algorithm builds an index that consists of pre-computed
links for each point in the database. The search algorithm utilizes the index to
efficiently traverse the database in search of the nearest neighbor. We extensively
tested the proposed NNS approach and compared its performance with state-
of-the-art NNS approaches on benchmark datasets and our newly created medical
image datasets. The proposed approach outperformed the existing approaches
in terms of retrieving accurate neighbors and retrieval speed. We also explore
the role of medical image feature representation in content-based medical image
retrieval tasks. We propose a Transformer-based feature representation technique
that outperformed the existing pre-trained Transformer-based approaches on
CLEF 2011 medical image retrieval task. The source code and datasets of our
experiments are available at https://github.com/deepaknlp/DLS.
Keywords: Content-based image retrieval, Nearest neighbor search, Image
feature representation, Indexing and searching in high dimensions
Email addresses: deepak.gupta@nih.gov (Deepak Gupta), russellloane@gmail.com (Russell Loane), soumya.gayen@nih.gov (Soumya Gayen), ddemner@mail.nih.gov (Dina Demner-Fushman)
1. Introduction
Over the past few decades, medical imaging has significantly improved
healthcare services. Medical imaging helps to save lives, increase life expectancy,
lower mortality rates, reduce the need for exploratory surgery, and shorten
hospital stays. With medical imaging, physicians can make better-informed
decisions regarding diagnosis and treatment. Medical imaging procedures are
non-invasive and painless, and often do not require any particular preparation
beforehand. With the growing demand for medical imaging, the workload of
radiologists has increased significantly over the past decades. Mayo Clinic has
observed a ten-fold increase in the demand for radiology imaging from just over
9 million in 1999 to more than 94 million in 2010 (McDonald et al., 2015). To
meet the growing demand, radiologists must process one image every three to
four seconds (McDonald et al., 2015). Consequently, the increase in workload
may lead to the incorrect interpretation of the radiology images and compromise
the quality and safety of patient care.
Recent advances in the artificial intelligence (AI) fields of computer
vision and machine learning have the potential to quickly interpret and analyze
different forms of medical images (Lambin et al., 2012; Gupta et al., 2021;
Yu et al., 2020) and videos (Gupta et al., 2022; Gupta & Demner-Fushman,
2022). Content-based image retrieval (CBIR) is one of the key tasks in analyzing
medical images. It involves indexing large-scale medical image datasets and
retrieving visually similar images from them. With an efficient
CBIR system, one can browse, search, and retrieve from a database images
that are visually similar to the query image.
CBIR systems are used to support cancer diagnosis (Wei et al., 2009; Bressan
et al., 2019), diagnosis of infectious diseases (Zhong et al., 2021), analysis of the
central nervous system (Mesbah et al., 2015; Conjeti et al., 2016; Li et al., 2018b),
biomedical image archives (Antani et al., 2004), and malaria parasite detection (Khan
et al., 2011; Rajaraman et al., 2018; Kassim et al., 2020). Given the growing
size of the medical imaging databases, efficiently finding the relevant images
is still an important issue to address. Consider a large-scale medical imaging
database with hundreds of thousands to millions of medical images, in which
each image is represented by a high-dimensional (thousands of features) dense
vector. Searching over millions of images in such a high-dimensional space
requires an efficient search algorithm. The features used to represent the images are
another key aspect that affects image search results. Image features with limited
expressive ability often fail to discriminate between images with near-identical
visual appearance. The role of image features becomes more prominent with
image search applications that search over millions of images and demand a
higher degree of precision. To address the aforementioned challenges, we focus
on developing an algorithm that can efficiently search over millions of medical
images. We also examine the role of image features in obtaining relevant and
similar images from large-scale medical imaging datasets.
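To make the scale of the problem concrete, the following is a minimal sketch of brute-force k-nearest-neighbor search over a matrix of pre-computed feature vectors. The dataset size, dimensionality, and variable names are illustrative assumptions, not the setup used in this paper; the point is that the per-query cost grows with both the number of images and the feature dimension, which is what makes exhaustive search impractical at the scale discussed above.

```python
import numpy as np

def brute_force_nn(features: np.ndarray, query: np.ndarray, k: int = 5):
    """Exhaustive k-nearest-neighbor search by Euclidean distance.

    features: (n_images, dim) matrix of pre-computed image features.
    query:    (dim,) feature vector of the query image.
    Returns the indices of the k closest images and their distances.
    """
    # One pass over all n_images * dim values, hence O(n * d) per query.
    diffs = features - query                 # broadcast subtraction, shape (n, d)
    dists = np.linalg.norm(diffs, axis=1)
    # Partial sort: only the k smallest distances are fully ordered.
    nearest = np.argpartition(dists, k)[:k]
    nearest = nearest[np.argsort(dists[nearest])]
    return nearest, dists[nearest]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative scale only: 50,000 images with 512-dimensional features.
    features = rng.standard_normal((50_000, 512), dtype=np.float32)
    query = rng.standard_normal(512, dtype=np.float32)
    idx, d = brute_force_nn(features, query, k=5)
    print(idx, d)
```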
This study presents DenseLinkSearch, an efficient algorithm to search and
retrieve the relevant images from heterogeneous sources of medical images
and nearest neighbor search benchmark datasets. We first index the feature
vectors of the images. The indexing produces a graph with feature vectors as
vertices and the Euclidean distance between the endpoint vectors as edge weights. In the
literature, tree-based data structures have been used to build indexes that speed
up retrieval. Beygelzimer et al. (2006) proposed the Cover Tree, which was
specifically designed to speed up nearest neighbor search through efficient
index construction. We compare our proposed DenseLinkSearch
with the existing tree-based and approximate nearest neighbor approaches and
provide a detailed quantitative analysis.
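As rough intuition for how an index of pre-computed links can replace exhaustive search, the sketch below shows a simple greedy traversal of a neighbor graph: each vector keeps a small set of links to nearby vectors, and the search hops along those links toward the query. The link-selection rule, starting point, and termination condition here are simplifying assumptions for illustration, not the DenseLinkSearch algorithm itself, which is described in later sections.

```python
import numpy as np

def greedy_graph_search(features, links, query, start, max_hops=1000):
    """Greedy best-first walk over a pre-built neighbor graph.

    features: (n, d) array of feature vectors (the graph vertices).
    links:    dict mapping a vertex id to a list of linked vertex ids
              (the pre-computed edges of the index).
    start:    vertex id where the traversal begins.
    Returns a (locally) nearest vertex id and its distance to the query.
    """
    current = start
    current_dist = np.linalg.norm(features[current] - query)
    for _ in range(max_hops):
        # Inspect only the current vertex's pre-computed links instead of
        # scanning the whole database.
        best, best_dist = current, current_dist
        for nb in links[current]:
            d = np.linalg.norm(features[nb] - query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:      # no linked neighbor is closer: local optimum
            break
        current, current_dist = best, best_dist
    return current, current_dist

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((500, 32))
    # Toy index: link every vertex to its 8 exact nearest neighbors.
    all_d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    links = {i: list(np.argsort(all_d[i])[1:9]) for i in range(len(feats))}
    query = rng.standard_normal(32)
    print(greedy_graph_search(feats, links, query, start=0))
```

In such schemes, the per-query cost depends on the number of hops and the out-degree of each vertex rather than on the full database size, which is where the speedup over brute-force search comes from.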
To evaluate the proposed DenseLinkSearch algorithm, we collected 12,851,263
medical images from the Open-i (https://openi.nlm.nih.gov/) biomedical search engine. We extend our
experiments to 11 benchmark NNS datasets (Artificial, Faces, Corel, MNIST,
FMNIST, TinyImages, CovType, Twitter, YearPred, SIFT, and GIST). The
experimental results show that our proposed DenseLinkSearch is more
efficient and accurate in finding the nearest neighbors in comparison to the
existing approaches.
We summarize the contributions of our study as follows:
1. We devise a robust nearest neighbor search algorithm, DenseLinkSearch, to efficiently search large-scale datasets in which the data points are often represented by high-dimensional vectors. To perform the search, we develop an indexing technique that processes the dataset and builds a graph to store the link information of each data point in the dataset. The created graph, in the form of an index, is used to quickly scan over millions of data points in search of the nearest neighbors of a query point.
2. We also perform an extensive study on the role of the features used to represent medical images in the dataset. To assess the effectiveness of the features in retrieving relevant images, we explore multiple deep neural network-based features such as ResNet, ViT, and ConvNeXt and analyze their effectiveness in accurately representing the images in high-dimensional spaces (a minimal feature-extraction sketch follows this list).
3. We demonstrate the effectiveness of our proposed DenseLinkSearch on the newly created Open-i medical imaging dataset and eleven other benchmark NNS datasets. The results show that our proposed NNS technique accurately finds the nearest neighbors orders of magnitude faster than comparable algorithms.
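As a minimal sketch of the kind of pre-trained feature extraction referred to in the second contribution, the pooled activations of an ImageNet-pretrained backbone can serve as a dense image descriptor. The backbone choice, weights, and preprocessing below are illustrative assumptions (using a recent torchvision release), not the exact pipeline evaluated in this paper.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# ImageNet-pretrained ResNet-50; drop the classification head and keep the
# 2048-dimensional pooled activations as the image descriptor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(path: str) -> torch.Tensor:
    """Return a 2048-dimensional feature vector for one image file."""
    image = Image.open(path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)   # shape (1, 3, 224, 224)
    return backbone(batch).squeeze(0)        # shape (2048,)
```

Analogous descriptors can be obtained from ViT or ConvNeXt backbones by swapping the model and keeping the same indexing and search pipeline.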
2. Related Work
2.1. Content-based Image Retrieval
Content-based image retrieval focuses on retrieving images by considering the
visual content of the image, such as color, texture, shape, size, intensity, location,
etc. In the case of medical image retrieval, Xue et al. (2008) introduced the
CervigramFinder system, which operates on cervicographic images and aims to
find similar images in the database with respect to a user-defined region. The system
extracted color, texture, and size as the visual features. Antani et al. (2007)
developed SPIRS-IRMA, which combines the capabilities of the IRMA system (Lehmann et al.,
2004) (global image data) and the SPIRS system (Hsu et al., 2007) (local
region-of-interest image data) to facilitate retrieval based not only on the whole
image but also on local image features, so that users can retrieve images that are
similar not only in terms of their overall appearance but also in terms
of the pathology displayed locally. Depeursinge et al. (2011) proposed a
3D localization system based on lung anatomy that is used to localize low-level
features used for CBIR. The image retrieval track of the Conference and Labs
of the Evaluation Forum (ImageCLEF) organized multiple medical image
retrieval tasks (Clough et al., 2004; Müller et al., 2009; Kalpathy-Cramer et al.,
2011; Müller et al., 2012) from 2004 to 2013. ImageCLEF has provided
a venue for researchers to present their findings and engage in head-to-head
comparisons of the efficiency of their medical image retrieval strategies. Over the
years, the participants at ImageCLEF made use of a diverse selection of local
and global textural features. These included the Tamura features: coarseness,
contrast, directionality, line-likeness, regularity, and roughness. Multiple filters,
such as Gabor, Haar, and Gaussian filters, have been used to generate a diverse
set of visual features. The visual features (Kalpathy-Cramer et al., 2015) for
medical image retrieval are also generated using Haralick’s co-occurrence matrix
and fractal dimensions.
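As an illustration of this family of handcrafted texture descriptors, the snippet below computes a few co-occurrence-based statistics with scikit-image (the graycomatrix API of recent versions); the offsets, angles, and chosen properties are assumptions for demonstration, not the exact configurations used by ImageCLEF participants.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_texture_features(gray_image: np.ndarray) -> np.ndarray:
    """Haralick-style texture features from a gray-level co-occurrence matrix.

    gray_image: 2-D uint8 array (e.g., a grayscale medical image patch).
    Returns a small feature vector of co-occurrence statistics.
    """
    # Co-occurrence matrix over 4 directions at a pixel offset of 1.
    glcm = graycomatrix(gray_image,
                        distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    # Average each property over the four directions.
    return np.array([graycoprops(glcm, p).mean() for p in props])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    print(glcm_texture_features(patch))
```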
Rahman et al. (2008) proposed a content-based image retrieval framework
that deals with the diverse collections of medical images of different modalities,