
Localizing Anatomical Landmarks in Ocular
Images using Zoom-In Attentive Networks
Xiaofeng Lei1, Shaohua Li1B, Xinxing Xu1B, Huazhu Fu1, Yong Liu1,
Yih-Chung Tham2,3, Yangqin Feng1, Mingrui Tan1, Yanyu Xu1, Jocelyn Hui
Lin Goh2, Rick Siow Mong Goh1, and Ching-Yu Cheng2,3
1Institute of High Performance Computing, A*STAR, Singapore
{lei_xiaofeng,li_shaohua,xuxinx}@ihpc.a-star.edu.sg
2Singapore Eye Research Institute, Singapore National Eye Centre
3Department of Ophthalmology, Yong Loo Lin School of Medicine, NUS, Singapore
Abstract. Localizing anatomical landmarks is an important task in medical image analysis. However, the landmarks to be localized often lack prominent visual features. Their locations are elusive and easily confused with the background, so precise localization depends heavily on the context formed by their surrounding areas. In addition, the required precision is usually higher than in segmentation and object detection tasks. Localization therefore poses challenges distinct from segmentation or detection. In this paper, we propose a zoom-in attentive network (ZIAN) for anatomical landmark localization in ocular images. First, a coarse-to-fine, or "zoom-in", strategy is used to learn contextualized features at different scales. Then, an attentive fusion module aggregates the multi-scale features; it consists of 1) a co-attention network with a multiple regions-of-interest (ROIs) scheme that learns complementary features from the multiple ROIs, and 2) an attention-based fusion module that integrates the multi-ROI features with non-ROI features. We evaluated ZIAN on two open challenge tasks, i.e., fovea localization in fundus images and scleral spur localization in AS-OCT images. Experiments show that ZIAN achieves promising performance and outperforms state-of-the-art localization methods. The source code and trained models of ZIAN are available at https://github.com/leixiaofeng-astar/OMIA9-ZIAN.
Keywords: fovea localization · scleral spur localization · self-attention.
1 Introduction
Localization of anatomical landmarks is a common task in medical image analysis. Precise localization plays an important role in medical diagnosis. For example, the fovea is an important anatomical landmark on the posterior pole of the retina, located at the center of a darker area of the eye [1]. The fovea location is important in diagnosing eye diseases such as glaucoma, diabetic retinopathy, and macular edema. Similarly, the scleral spur (SS) is an important anatomical landmark in imaging the anterior chamber angle, as it serves as a reference point for distinguishing open from narrow/closed angles in Optical Coherence Tomography (OCT) images (Fig. 1).

Fig. 1: Two typical localization tasks in ocular images. Left: fovea location in a fundus image. Right: scleral spur location in an AS-OCT image.
Manually labeling these landmarks requires medical experts and is expensive and tedious. Developing automated approaches to landmark localization is therefore desirable and has been studied for decades. Conventional computer vision methods mainly use template matching or mathematical morphology to localize anatomical landmarks [2–5]. However, these methods are sensitive to low image contrast, and their results vary when images come from different sources. Machine learning based approaches, which are more robust, are now predominantly used for automatic localization of anatomical landmarks [6–8].
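To make the conventional baseline concrete, below is a minimal OpenCV sketch of template-matching localization. The file names and the template are hypothetical, and real systems in the vein of [2–5] combine such matching with further morphological processing.

```python
# Minimal sketch of template-matching landmark localization.
# The image paths and template are hypothetical placeholders.
import cv2

image = cv2.imread("fundus.png", cv2.IMREAD_GRAYSCALE)        # query image
template = cv2.imread("fovea_template.png", cv2.IMREAD_GRAYSCALE)

# Slide the template over the image and score each position with
# normalized cross-correlation (robust to uniform brightness shifts).
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, _, _, max_loc = cv2.minMaxLoc(scores)                      # best match (x, y)

# Landmark estimate = center of the best-matching window.
th, tw = template.shape
landmark = (max_loc[0] + tw // 2, max_loc[1] + th // 2)
print("estimated landmark (x, y):", landmark)
```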
In general, there are three types of machine learning approaches to localization [9]: 1) localization is treated as a coordinate regression problem [10, 11], and the coordinates of the target location are predicted directly; 2) localization is treated as a binary segmentation problem that extends the single-pixel label to a small region, and the center of the segmented mask is used as the target position [12]; 3) localization is treated as a heat-map regression task, where a heat-map is first generated around the target position and regression, morphological, or other mathematical methods then estimate the target point [13–17]. Recently, the third, heat-map regression approach has outperformed the other two, and our method is also based on it.
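Since ZIAN follows this heat-map regression paradigm, the sketch below shows its two bookends: rendering a Gaussian training target around a landmark, and decoding a predicted heat-map back to coordinates via its peak. The image size and sigma are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma=3.0):
    """Render a 2D Gaussian training target centered on a landmark.
    `sigma` controls the spread and is typically tuned per task."""
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def decode_peak(heatmap):
    """Decode a heat-map to (x, y) coordinates by locating its peak pixel."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)

# Round trip: a target for a landmark at (120, 80) in a 256x256 image
# decodes back to the same coordinate.
target = gaussian_heatmap(256, 256, center=(120, 80))
print(decode_peak(target))  # -> (120, 80)
```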
Despite the huge progress in recent years, challenges still limit the precision of these methods. A common challenge is that input images may have highly varying scales. A second challenge is that anatomical landmarks often lack prominent visual features, so localization depends heavily on the context formed by their surrounding areas.
In this paper, we propose the "Zoom-In Attentive Network" (ZIAN) to address the two challenges above, with ocular images as a case study. First, to adapt to the varying scales of input images, ZIAN adopts zoom-in and multi-scale ROI schemes; second, to better incorporate surrounding areas as context for more precise localization, ZIAN adopts co-attention [18] and self-attention [19] mechanisms.
In particular, unlike the common "zoom-in" strategy of [16, 20], which refines the prediction within a region approximated in the coarse stage, ZIAN combines a "zoom-in" strategy with Regions-of-Interest (ROI) co-attention and a self-attention mechanism that effectively fuse multi-scale features for precise localization. Specifically, in the zoom-in step, our model performs preliminary positioning of the target through a coarse network. Multiple ROIs at different scales are then cropped around the preliminary position and used as the input to the fine network. In the attention step, an ROI co-attention [21, 22] module and a self-attention [23–26] module work together to fuse the multi-ROI features: the co-attention module lets the features of the multiple ROIs complement each other, while the self-attention module fuses the multi-ROI features with the output features of the coarse network for more accurate localization.
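To make the co-attention step concrete, here is a minimal PyTorch sketch of bidirectional co-attention between two ROI feature maps. The affinity projection, residual fusion, and shapes are illustrative assumptions in the spirit of [21, 22], not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ROICoAttention(nn.Module):
    """Bidirectional co-attention between two ROI feature maps."""
    def __init__(self, channels):
        super().__init__()
        # Learned affinity projection between the two feature spaces.
        self.proj = nn.Linear(channels, channels, bias=False)

    def forward(self, feat_a, feat_b):
        n, c, h, w = feat_a.shape
        seq_a = feat_a.flatten(2).transpose(1, 2)  # (N, HW, C)
        seq_b = feat_b.flatten(2).transpose(1, 2)  # (N, HW, C)
        # Affinity between every position in ROI a and every position in ROI b.
        affinity = self.proj(seq_a) @ seq_b.transpose(1, 2)  # (N, HW, HW)
        # Each ROI attends to the other, so the two feature sets complement
        # each other in both directions.
        att_a = torch.softmax(affinity, dim=-1) @ seq_b                  # b enriches a
        att_b = torch.softmax(affinity.transpose(1, 2), dim=-1) @ seq_a  # a enriches b
        as_map = lambda s: s.transpose(1, 2).reshape(n, c, h, w)
        return feat_a + as_map(att_a), feat_b + as_map(att_b)  # residual fusion

# Usage: two ROI feature maps resized to the same spatial size.
fa, fb = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
out_a, out_b = ROICoAttention(64)(fa, fb)
```

Each ROI queries the other through a shared learned affinity, so context missing at one scale can be recovered from the other.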
The main contributions of this paper are summarized as follows:
1. Different from most existing localization frameworks, we present a "Zoom-In Attentive Network" (ZIAN) that uses a coarse-to-fine zoom-in strategy and an ROI co-attention/self-attention scheme for landmark localization.
2. A novel attentive fusion module is proposed to adaptively fuse features from different ROIs and then fuse the multi-scale ROI features with the coarse features, so that the model learns to combine features of multiple scales and multiple ROIs for better prediction.
3. We evaluated ZIAN on two common ocular image tasks, i.e., fovea localization in fundus images and scleral spur (SS) localization in Anterior Segment Optical Coherence Tomography (AS-OCT) images. The effectiveness of the method is validated by comparison with various state-of-the-art methods.
2 Method
In this section, we detail the proposed Zoom-In Attentive Network (ZIAN), which consists of two main components: the zoom-in module and the attentive fusion module, the latter comprising the ROI co-attention and self-attention fusion submodules.
2.1 Zoom-In Module
As shown in Fig. 2, ZIAN has a coarse network and a fine network. The input image $I_{input}$ is down-sampled by $4\times$ and fed into a pre-trained HRNet [27] base network, which produces per-pixel heat-maps in the coarse network. The peak pixel is then taken as the preliminary position of the target. Next, ROIs at multiple scales, centered at the preliminary location, are cropped as the input to the fine network. The resized ROI images $I^a_{roi}$ and $I^b_{roi}$ are fed in parallel into the pre-trained model to build their feature representations individually. The multi-ROI features $V^a_{roi}$ and $V^b_{roi}$ are then processed by an attentive fusion module to obtain a fine-scale heat-map, and the peak pixel of this heat-map is taken as the final coordinate of the target. We use HRNet [27] as the pre-trained backbone in the figure; it can be replaced with any state-of-the-art backbone (U-Net [28], EfficientNet [29], YOLO [30], RCNN [31], etc.).
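To summarize the coarse-to-fine flow, here is a minimal end-to-end sketch. `coarse_net`, `fine_net`, and `attentive_fusion` are placeholders for the HRNet backbones and the fusion module, and the ROI sizes, heat-map resolutions, and coordinate mapping are illustrative assumptions rather than the released configuration.

```python
import torch
import torch.nn.functional as F

def locate_peak(heatmap):
    """Return the (x, y) peak of a (1, 1, H, W) heat-map tensor."""
    _, _, h, w = heatmap.shape
    idx = int(heatmap.flatten().argmax())
    return idx % w, idx // w

def crop_roi(image, center, size, out_size):
    """Crop a size x size window around `center`, clamped to the image,
    and resize it to out_size x out_size."""
    _, _, h, w = image.shape
    x0 = max(0, min(center[0] - size // 2, w - size))
    y0 = max(0, min(center[1] - size // 2, h - size))
    roi = image[:, :, y0:y0 + size, x0:x0 + size]
    return F.interpolate(roi, size=(out_size, out_size),
                         mode="bilinear", align_corners=False), (x0, y0)

def zoom_in_localize(image, coarse_net, fine_net, attentive_fusion,
                     roi_sizes=(128, 256), out_size=128):
    # Coarse stage: 4x down-sampled input -> coarse heat-map -> preliminary peak.
    # We assume the coarse heat-map matches the resolution of its input.
    small = F.interpolate(image, scale_factor=0.25,
                          mode="bilinear", align_corners=False)
    cx, cy = locate_peak(coarse_net(small))
    center = (cx * 4, cy * 4)  # map the peak back to full resolution
    # Fine stage: crop ROIs at multiple scales around the preliminary position.
    rois, offsets = zip(*[crop_roi(image, center, s, out_size) for s in roi_sizes])
    feats = [fine_net(r) for r in rois]
    # Fuse the multi-ROI features into one fine-scale heat-map and decode it,
    # assumed here to live in the frame of the first (smallest) ROI.
    fx, fy = locate_peak(attentive_fusion(feats))
    scale = roi_sizes[0] / out_size
    x0, y0 = offsets[0]
    return x0 + fx * scale, y0 + fy * scale
```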