Memory transformers for full context and
high-resolution 3D Medical Segmentation
Loic Themyr1,2[0000-0003-1396-2383], Clément Rambour1[0000-0002-9899-3201],
Nicolas Thome1[0000-0003-4871-3045], Toby Collins2[0000-0002-9441-8306], and
Alexandre Hostettler2[0000-0001-8269-6766]
1 Conservatoire National des Arts et Métiers, Paris 75014, France
2 IRCAD, Strasbourg 67000, France
loic.themyr@lecnam.net
Abstract. Transformer models achieve state-of-the-art results for image segmentation. However, achieving long-range attention, necessary to capture global context, with high-resolution 3D images is a fundamental challenge. This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome this issue. The core idea behind FINE is to learn memory tokens to indirectly model full-range interactions while scaling well in both memory and computational costs. FINE introduces memory tokens at two levels: the first allows full interaction between voxels within local image regions (patches), while the second allows full interactions between all regions of the 3D volume. Combined, they allow full attention over high-resolution images, e.g. 512 × 512 × 256 voxels and above. Experiments on the BCV image segmentation dataset show better performance than state-of-the-art CNN and transformer baselines, highlighting the superiority of our full attention mechanism over recent transformer baselines such as CoTr and nnFormer.
Keywords: Transformers · 3D segmentation · Full context · High resolution.
1 Introduction
Convolutional encoder-decoder models have achieved remarkable performance for medical image segmentation [1,10]. U-Net [24] and other U-shaped architectures remain popular and competitive baselines. However, the receptive fields of these CNNs are small, both in theory and in practice [17], preventing them from exploiting global context information.
Transformers have witnessed huge success in natural language processing [26,4] and more recently in vision for image classification [5]. One key challenge in 3D semantic segmentation is their scalability, since the complexity of attention is quadratic with respect to the number of inputs.
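To make this scaling issue concrete, the short sketch below computes the size of a single dense self-attention matrix over a full-resolution 3D volume. This is an illustrative back-of-the-envelope estimate, not taken from the paper; the volume shape, patch size, and fp32 storage are assumptions.

```python
# Back-of-the-envelope cost of dense self-attention over a 3D volume.
# All numbers (volume shape, patch size, fp32 storage) are illustrative assumptions.
def dense_attention_cost(volume_shape=(512, 512, 256), patch=(4, 4, 4), bytes_per_entry=4):
    # One token per non-overlapping patch.
    n_tokens = 1
    for dim, p in zip(volume_shape, patch):
        n_tokens *= dim // p
    # A dense attention matrix has n_tokens^2 entries (per head, per layer).
    return n_tokens, n_tokens ** 2 * bytes_per_entry

tokens, attn_bytes = dense_attention_cost()
print(f"{tokens:,} tokens -> {attn_bytes / 1e12:.1f} TB per attention matrix")
# ~1,048,576 tokens -> ~4.4 TB for a single fp32 attention matrix,
# which is why dense full-range attention is intractable at this resolution.
```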
Efficient attention mechanisms have been proposed, including sparse or low-rank attention matrices [21,28], kernel-based methods [20,12], window-based attention [16,6], and memory transformers [22,14]. Multi-resolution transformers [16,29,30] apply attention in a hierarchical manner by chaining multiple window transformers. Attention at the highest resolution level is thus limited to local image sub-windows.
The receptive field is gradually increased through pooling operations. Multi-resolution transformers have recently shown impressive performance for various 2D medical image segmentation tasks such as multi-organ [11,25,2], histopathological [15], skin [27], or brain [23] segmentation.
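To illustrate the windowing mechanism these methods rely on, the sketch below implements a minimal 3D local window attention layer in the spirit of SWIN-style transformers [16]; the window size, embedding dimension, and module structure are illustrative assumptions rather than the exact implementation of the cited methods.

```python
import torch
import torch.nn as nn

# Minimal 3D window attention sketch (SWIN-like, illustrative only):
# attention is restricted to voxels inside each local window, so the
# receptive field at a given resolution never exceeds the window size.
class WindowAttention3D(nn.Module):
    def __init__(self, dim=96, window=(4, 4, 4), heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, D, H, W, C)
        B, D, H, W, C = x.shape
        wd, wh, ww = self.window
        # Partition the volume into non-overlapping (wd, wh, ww) windows.
        x = x.view(B, D // wd, wd, H // wh, wh, W // ww, ww, C)
        x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, wd * wh * ww, C)
        # Full self-attention, but only among tokens of the same window.
        x, _ = self.attn(x, x, x)
        # Reverse the partition back to the original voxel layout.
        x = x.reshape(B, D // wd, H // wh, W // ww, wd, wh, ww, C)
        x = x.permute(0, 1, 4, 2, 5, 3, 6, 7).reshape(B, D, H, W, C)
        return x

feat = torch.randn(1, 16, 16, 16, 96)
print(WindowAttention3D()(feat).shape)          # torch.Size([1, 16, 16, 16, 96])
```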
Fig. 1. Proposed full resolution memory transformer (FINE). To segment the kidney voxel in a) (red cross), FINE combines high-resolution and full contextual information, as shown in the attention map in b). This is in contrast to nnFormer [33] (resp. CoTr [31]), whose receptive field is limited to the green (resp. blue) region in a). FINE thus properly segments the organs, as shown in d).
Recent attempts have been made to apply transformers to 3D medical image segmentation. nnFormer [33] is a 3D extension of SWIN [16] with a U-shaped architecture. One limitation relates to the inherent compromise in multi-resolution, which prevents it from jointly using global context and high-resolution information. In [33], only local context is leveraged in the highest-resolution feature maps. Models using deformable transformers such as CoTr [31] are able to leverage sparse global context. A strong limitation shared by nnFormer and CoTr is that they cannot process large volumes at once and must rely on training the segmentation model on local 3D random crops. Consequently, full global contextual information is unavailable and positional encoding can be meaningless. On BCV [13], the cropped patch size is about 128 × 128 × 64, which only covers about 6% of the original volume.
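The random-crop training regime described above can be sketched as follows; the crop size matches the one quoted for BCV, but the full-volume shape is an assumption for illustration, so the printed coverage fraction is only indicative.

```python
import numpy as np

# Sketch of random 3D crop sampling used when full volumes do not fit in memory.
def random_crop(volume, crop_shape=(128, 128, 64), rng=np.random):
    starts = [rng.randint(0, s - c + 1) for s, c in zip(volume.shape, crop_shape)]
    return volume[tuple(slice(s, s + c) for s, c in zip(starts, crop_shape))]

volume = np.zeros((512, 512, 256), dtype=np.float32)   # assumed CT volume shape
crop = random_crop(volume)
print(f"crop {crop.shape} covers {crop.size / volume.size:.1%} of the volume")
# The exact percentage depends on the assumed volume shape, but a crop of this
# size always sees only a small fraction of the scan, so context from distant
# organs is never visible during training.
```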
This paper introduces the Full resolutIoN mEmory (FINE) transformer. This is, to the best of our knowledge, the first attempt at processing full-range interactions at all resolution levels with transformers for 3D medical image segmentation. To achieve this goal, memory tokens are used to indirectly enable full-range interactions between all volume elements, even when training with 3D crops. Inside each 3D crop, FINE introduces memory tokens associated with local windows. A second level of localized memory is introduced at the volume level to enable full interactions between all 3D volume patches. We show that FINE outperforms state-of-the-art CNN, transformer, and hybrid methods on the 3D multi-organ BCV dataset [13]. Fig. 1 illustrates the rationale of FINE for segmenting the kidney voxel marked with a red cross in a). We can see that FINE's attention map covers the whole image, enabling it to model long-range interactions between organs. In contrast, the receptive field of state-of-the-art methods only covers a limited local region, shown in green and blue in a).
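A minimal sketch of the token layout this two-level memory implies is given below; the token counts, dimensions, and the way both memory levels are mixed into a single attention call are assumptions for illustration, not the exact FINE architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch of window attention augmented with two memory levels:
# (i) window-level memory tokens attached to each local window, and
# (ii) volume-level memory tokens shared by all windows (and all crops).
# Shapes and counts are assumptions, not the FINE paper's exact design.
class MemoryWindowAttention(nn.Module):
    def __init__(self, dim=96, heads=4, n_win_mem=2, n_vol_mem=8):
        super().__init__()
        self.win_mem = nn.Parameter(torch.zeros(1, n_win_mem, dim))  # per-window memory
        self.vol_mem = nn.Parameter(torch.zeros(1, n_vol_mem, dim))  # volume-level memory
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, windows):                 # windows: (n_windows, tokens, dim)
        n, t, _ = windows.shape
        win_mem = self.win_mem.expand(n, -1, -1)
        vol_mem = self.vol_mem.expand(n, -1, -1)
        # Concatenate voxel tokens with both memory levels, then run standard
        # self-attention inside each window: the memory tokens act as a relay
        # that carries context across windows and beyond the current crop.
        x = torch.cat([vol_mem, win_mem, windows], dim=1)
        x, _ = self.attn(x, x, x)
        # Return only the updated voxel tokens; updated memories would be
        # written back and reused by other windows / crops in a full model.
        return x[:, -t:, :]

tokens = torch.randn(64, 4 * 4 * 4, 96)          # 64 windows of 4x4x4 voxels
print(MemoryWindowAttention()(tokens).shape)     # torch.Size([64, 64, 96])
```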