
Localizing Anatomical Landmarks in Ocular
Images using Zoom-In Attentive Networks
Xiaofeng Lei1, Shaohua Li1B, Xinxing Xu1B, Huazhu Fu1, Yong Liu1,
Yih-Chung Tham2,3, Yangqin Feng1, Mingrui Tan1, Yanyu Xu1, Jocelyn Hui
Lin Goh2, Rick Siow Mong Goh1, and Ching-Yu Cheng2,3
1Institute of High Performance Computing, A*STAR, Singapore
{lei_xiaofeng,li_shaohua,xuxinx}@ihpc.a-star.edu.sg
2Singapore Eye Research Institute, Singapore National Eye Centre
3Department of Ophthalmology, Yong Loo Lin School of Medicine, NUS, Singapore
Abstract. Localizing anatomical landmarks is an important task in medical image analysis. However, the landmarks to be localized often lack prominent visual features. Their locations are elusive and easily confused with the background, so precise localization depends heavily on the context formed by their surrounding areas. In addition, the required precision is usually higher than in segmentation and object detection tasks. Localization therefore poses challenges distinct from segmentation or detection. In this paper, we propose a zoom-in attentive network (ZIAN) for anatomical landmark localization in ocular images. First, a coarse-to-fine, or "zoom-in", strategy is used to learn contextualized features at different scales. Then, an attentive fusion module aggregates the multi-scale features; it consists of 1) a co-attention network with a multiple regions-of-interest (ROIs) scheme that learns complementary features from the multiple ROIs, and 2) an attention-based fusion module that integrates the multi-ROI features with non-ROI features. We evaluated ZIAN on two open challenge tasks, i.e., fovea localization in fundus images and scleral spur localization in AS-OCT images. Experiments show that ZIAN achieves promising performance and outperforms state-of-the-art localization methods. The source code and trained models of ZIAN are available at https://github.com/leixiaofeng-astar/OMIA9-ZIAN.
Keywords: fovea localization · scleral spur localization · self-attention.
1 Introduction
Localization of anatomical landmarks is a common task in medical image analysis. Precise localization plays an important role in medical diagnosis. For example, the fovea is an important anatomical landmark on the posterior pole of the retina, located at the center of a darker area of the eye [1]. The fovea location is important in diagnosing eye diseases such as glaucoma, diabetic retinopathy, and macular edema. Similarly, the scleral spur (SS) is an important anatomical landmark in imaging the anterior chamber angle, as it serves as a reference point for distinguishing open from narrow/closed angles in Optical Coherence Tomography (OCT) images (Fig. 1).

Fig. 1: Two typical localization tasks in ocular images. Left: fovea location in a fundus image. Right: scleral spur location in an AS-OCT image.
Manually labeling these landmarks requires medical experts and is expensive and tedious. Developing automated approaches to landmark localization is therefore desirable and has been studied for decades. Conventional computer vision methods mainly use template matching or mathematical morphology to localize anatomical landmarks [2–5]. However, these methods are sensitive to low image contrast, and their results vary when images come from different sources. Machine learning based approaches, which are more robust, are now predominantly used for automatic localization of anatomical landmarks [6–8].
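To make the conventional baseline concrete, below is a minimal OpenCV sketch of template-matching localization. The file names and the template are hypothetical, and real systems in the vein of [2–5] combine such matching with further morphological processing.

```python
# Minimal sketch of template-matching landmark localization.
# The image paths and template are hypothetical placeholders.
import cv2

image = cv2.imread("fundus.png", cv2.IMREAD_GRAYSCALE)        # query image
template = cv2.imread("fovea_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the image and score each position with
# normalized cross-correlation (robust to uniform brightness shifts).
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, _, _, max_loc = cv2.minMaxLoc(scores)                      # best match (x, y)

# Landmark estimate = center of the best-matching window.
th, tw = template.shape
landmark = (max_loc[0] + tw // 2, max_loc[1] + th // 2)
print("estimated landmark (x, y):", landmark)
```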
In general, there are three types of machine learning approaches to localization [9]: 1) localization is treated as a coordinate regression problem [10, 11], and the coordinates of the target location are predicted directly; 2) localization is treated as a binary segmentation problem that extends the single-pixel label to a small region, and the center of the segmented mask is used as the target position [12]; 3) localization is treated as a heat-map regression task, where a heat-map is first generated around the target position and regression, morphological, or other mathematical methods then estimate the target point [13–17]. Recently, the third, heat-map regression approach has outperformed the other two, and our method is also based on it.
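Since ZIAN follows this heat-map regression paradigm, the sketch below shows its two bookends: rendering a Gaussian training target around a landmark, and decoding a predicted heat-map back to coordinates via its peak. The image size and sigma are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma=3.0):
    """Render a 2D Gaussian training target centered on a landmark.
    `sigma` controls the spread and is typically tuned per task."""
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def decode_peak(heatmap):
    """Decode a heat-map to (x, y) coordinates by locating its peak pixel."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)

# Round trip: a target for a landmark at (120, 80) in a 256x256 image
# decodes back to the same coordinate.
target = gaussian_heatmap(256, 256, center=(120, 80))
print(decode_peak(target))  # -> (120, 80)
```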
Despite the huge progress in recent years, challenges still limit the precision of these methods. A common challenge is that input images may have highly varying scales. A second challenge is that anatomical landmarks often lack prominent visual features, so localization depends heavily on the context formed by their surrounding areas.
In this paper, we propose the "Zoom-In Attentive Network" (ZIAN) to address the two challenges above, with ocular images as a case study. First, to adapt to the varying scales of input images, ZIAN adopts zoom-in and multi-scale ROI schemes; second, to better incorporate surrounding areas as context for more precise localization, ZIAN adopts co-attention [18] and self-attention [19] mechanisms.
In particular, unlike the common "zoom-in" strategy of [16, 20], which refines the prediction within a region approximated in the coarse stage, ZIAN combines a "zoom-in" strategy with Regions-of-Interest (ROI) co-attention and a self-attention mechanism that effectively fuse multi-scale features for precise localization. Specifically, in the zoom-in step, our model performs preliminary positioning of the target through a coarse network. Multiple ROIs at different scales are then cropped around the preliminary position and used as the input to the fine network. In the attention step, an ROI co-attention [21, 22] module and a self-attention [23–26] module work together to fuse the multi-ROI features: the co-attention module lets the features of the multiple ROIs complement each other, while the self-attention module fuses the multi-ROI features with the output features of the coarse network for more accurate localization.
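To make the co-attention step concrete, here is a minimal PyTorch sketch of bidirectional co-attention between two ROI feature maps. The affinity projection, residual fusion, and shapes are illustrative assumptions in the spirit of [21, 22], not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ROICoAttention(nn.Module):
    """Bidirectional co-attention between two ROI feature maps."""
    def __init__(self, channels):
        super().__init__()
        # Learned affinity projection between the two feature spaces.
        self.proj = nn.Linear(channels, channels, bias=False)

    def forward(self, feat_a, feat_b):
        n, c, h, w = feat_a.shape
        seq_a = feat_a.flatten(2).transpose(1, 2)  # (N, HW, C)
        seq_b = feat_b.flatten(2).transpose(1, 2)  # (N, HW, C)
        # Affinity between every position in ROI a and every position in ROI b.
        affinity = self.proj(seq_a) @ seq_b.transpose(1, 2)  # (N, HW, HW)
        # Each ROI attends to the other, so the two feature sets complement
        # each other in both directions.
        att_a = torch.softmax(affinity, dim=-1) @ seq_b                  # b enriches a
        att_b = torch.softmax(affinity.transpose(1, 2), dim=-1) @ seq_a  # a enriches b
        as_map = lambda s: s.transpose(1, 2).reshape(n, c, h, w)
        return feat_a + as_map(att_a), feat_b + as_map(att_b)  # residual fusion

# Usage: two ROI feature maps resized to the same spatial size.
fa, fb = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
out_a, out_b = ROICoAttention(64)(fa, fb)
```

Each ROI queries the other through a shared learned affinity, so context missing at one scale can be recovered from the other.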
The main contributions of this paper are summarized as follows:
1. Different from most existing localization frameworks, we present a "Zoom-In Attentive Network" (ZIAN) that uses a coarse-to-fine zoom-in strategy and an ROI co-attention/self-attention scheme for landmark localization.
2. A novel attentive fusion module is proposed to adaptively fuse features from different ROIs and then fuse the multi-scale ROI features with the coarse features, so that the model learns to combine features of multiple scales and multiple ROIs for better prediction.
3. We evaluated ZIAN on two common ocular image tasks, i.e., fovea localization in fundus images and scleral spur (SS) localization in Anterior Segment Optical Coherence Tomography (AS-OCT) images. The effectiveness of the method is validated by comparison with various state-of-the-art methods.
2 Method
In this section, we detail the proposed Zoom-In Attentive Network (ZIAN), which consists of two main components: the zoom-in module and the attentive fusion module, the latter comprising the ROI co-attention and self-attention fusion submodules.
2.1 Zoom-In Module
As shown in Fig. 2, ZIAN has a coarse network and a fine network. The input image $I_{input}$ is down-sampled by $4\times$ and fed into a pre-trained HRNet [27] base network, which produces per-pixel heat-maps in the coarse network. The peak pixel is then taken as the preliminary position of the target. Next, ROIs at multiple scales, centered at the preliminary location, are cropped as the input to the fine network. The resized ROI images $I^a_{roi}$ and $I^b_{roi}$ are fed in parallel into the pre-trained model to build their feature representations individually. The multi-ROI features $V^a_{roi}$ and $V^b_{roi}$ are then processed by an attentive fusion module to obtain a fine-scale heat-map, and the peak pixel of this heat-map is taken as the final coordinate of the target. We use HRNet [27] as the pre-trained backbone in the figure; it can be replaced with any state-of-the-art backbone (U-Net [28], EfficientNet [29], YOLO [30], RCNN [31], etc.).
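To summarize the coarse-to-fine flow, here is a minimal end-to-end sketch. `coarse_net`, `fine_net`, and `attentive_fusion` are placeholders for the HRNet backbones and the fusion module, and the ROI sizes, heat-map resolutions, and coordinate mapping are illustrative assumptions rather than the released configuration.

```python
import torch
import torch.nn.functional as F

def locate_peak(heatmap):
    """Return the (x, y) peak of a (1, 1, H, W) heat-map tensor."""
    _, _, h, w = heatmap.shape
    idx = int(heatmap.flatten().argmax())
    return idx % w, idx // w

def crop_roi(image, center, size, out_size):
    """Crop a size x size window around `center`, clamped to the image,
    and resize it to out_size x out_size."""
    _, _, h, w = image.shape
    x0 = max(0, min(center[0] - size // 2, w - size))
    y0 = max(0, min(center[1] - size // 2, h - size))
    roi = image[:, :, y0:y0 + size, x0:x0 + size]
    return F.interpolate(roi, size=(out_size, out_size),
                         mode="bilinear", align_corners=False), (x0, y0)

def zoom_in_localize(image, coarse_net, fine_net, attentive_fusion,
                     roi_sizes=(128, 256), out_size=128):
    # Coarse stage: 4x down-sampled input -> coarse heat-map -> preliminary peak.
    # We assume the coarse heat-map matches the resolution of its input.
    small = F.interpolate(image, scale_factor=0.25,
                          mode="bilinear", align_corners=False)
    cx, cy = locate_peak(coarse_net(small))
    center = (cx * 4, cy * 4)  # map the peak back to full resolution
    # Fine stage: crop ROIs at multiple scales around the preliminary position.
    rois, offsets = zip(*[crop_roi(image, center, s, out_size) for s in roi_sizes])
    feats = [fine_net(r) for r in rois]
    # Fuse the multi-ROI features into one fine-scale heat-map and decode it,
    # assumed here to live in the frame of the first (smallest) ROI.
    fx, fy = locate_peak(attentive_fusion(feats))
    scale = roi_sizes[0] / out_size
    x0, y0 = offsets[0]
    return x0 + fx * scale, y0 + fy * scale
```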