Memory transformers for full context and
high-resolution 3D Medical Segmentation
Loic Themyr1,2[0000-0003-1396-2383], Clément Rambour1[0000-0002-9899-3201],
Nicolas Thome1[0000-0003-4871-3045], Toby Collins2[0000-0002-9441-8306], and
Alexandre Hostettler2[0000-0001-8269-6766]
1 Conservatoire National des Arts et Métiers, Paris 75014, France
2 IRCAD, Strasbourg 67000, France
loic.themyr@lecnam.net
Abstract. Transformer models achieve state-of-the-art results for image segmentation. However, achieving long-range attention, necessary to capture global context, with high-resolution 3D images is a fundamental challenge. This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome this issue. The core idea behind FINE is to learn memory tokens to indirectly model full-range interactions while scaling well in both memory and computational costs. FINE introduces memory tokens at two levels: the first allows full interaction between voxels within local image regions (patches), while the second allows full interactions between all regions of the 3D volume. Combined, they allow full attention over high-resolution images, e.g. 512 × 512 × 256 voxels and above. Experiments on the BCV image segmentation dataset show better performance than state-of-the-art CNN and transformer baselines, highlighting the superiority of our full attention mechanism over recent transformer baselines such as CoTr and nnFormer.
Keywords: Transformers · 3D segmentation · Full context · High resolution.
1 Introduction
Convolutional encoder-decoder models have achieved remarkable performance for medical image segmentation [1,10]. U-Net [24] and other U-shaped architectures remain popular and competitive baselines. However, the receptive fields of these CNNs are small, both in theory and in practice [17], preventing them from exploiting global context information.
Transformers have witnessed huge success in natural language processing [26,4] and more recently in vision for image classification [5]. One key challenge in 3D semantic segmentation is their scalability, since the complexity of attention is quadratic with respect to the number of inputs.
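To make this scaling issue concrete, the short sketch below computes the size of a single dense self-attention matrix over a full-resolution 3D volume. This is an illustrative back-of-the-envelope estimate, not taken from the paper; the volume shape, patch size, and fp32 storage are assumptions.

```python
# Back-of-the-envelope cost of dense self-attention over a 3D volume.
# All numbers (volume shape, patch size, fp32 storage) are illustrative assumptions.
def dense_attention_cost(volume_shape=(512, 512, 256), patch=(4, 4, 4), bytes_per_entry=4):
    # One token per non-overlapping patch.
    n_tokens = 1
    for dim, p in zip(volume_shape, patch):
        n_tokens *= dim // p
    # A dense attention matrix has n_tokens^2 entries (per head, per layer).
    return n_tokens, n_tokens ** 2 * bytes_per_entry

tokens, attn_bytes = dense_attention_cost()
print(f"{tokens:,} tokens -> {attn_bytes / 1e12:.1f} TB per attention matrix")
# ~1,048,576 tokens -> ~4.4 TB for a single fp32 attention matrix,
# which is why dense full-range attention is intractable at this resolution.
```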
Efficient attention mechanisms have been proposed, including sparse or low-rank attention matrices [21,28], kernel-based methods [20,12], window-based attention [16,6], and memory transformers [22,14]. Multi-resolution transformers [16,29,30] apply attention in a hierarchical manner by chaining multiple window transformers. Attention at the highest resolution level is thus limited to local image sub-windows.
The receptive field is gradually increased through pooling operations. Multi-resolution transformers have recently shown impressive performance for various 2D medical image segmentation tasks such as multi-organ [11,25,2], histopathological [15], skin [27], or brain [23] segmentation.
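To illustrate the windowing mechanism these methods rely on, the sketch below implements a minimal 3D local window attention layer in the spirit of SWIN-style transformers [16]; the window size, embedding dimension, and module structure are illustrative assumptions rather than the exact implementation of the cited methods.

```python
import torch
import torch.nn as nn

# Minimal 3D window attention sketch (SWIN-like, illustrative only):
# attention is restricted to voxels inside each local window, so the
# receptive field at a given resolution never exceeds the window size.
class WindowAttention3D(nn.Module):
    def __init__(self, dim=96, window=(4, 4, 4), heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, D, H, W, C)
        B, D, H, W, C = x.shape
        wd, wh, ww = self.window
        # Partition the volume into non-overlapping (wd, wh, ww) windows.
        x = x.view(B, D // wd, wd, H // wh, wh, W // ww, ww, C)
        x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, wd * wh * ww, C)
        # Full self-attention, but only among tokens of the same window.
        x, _ = self.attn(x, x, x)
        # Reverse the partition back to the original voxel layout.
        x = x.reshape(B, D // wd, H // wh, W // ww, wd, wh, ww, C)
        x = x.permute(0, 1, 4, 2, 5, 3, 6, 7).reshape(B, D, H, W, C)
        return x

feat = torch.randn(1, 16, 16, 16, 96)
print(WindowAttention3D()(feat).shape)          # torch.Size([1, 16, 16, 16, 96])
```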
Fig. 1. Proposed full resolution memory transformer (FINE). To segment the kidney voxel in a) (red cross), FINE combines high-resolution and full contextual information, as shown in the attention map in b). This is in contrast to nnFormer [33] (resp. CoTr [31]), whose receptive field is limited to the green (resp. blue) region in a). FINE thus properly segments the organs, as shown in d).
Recent attempts have been made to apply transformers to 3D medical image segmentation. nnFormer [33] is a 3D extension of SWIN [16] with a U-shaped architecture. One limitation relates to the inherent compromise in multi-resolution, which prevents it from jointly using global context and high-resolution information. In [33], only local context is leveraged in the highest-resolution feature maps. Models using deformable transformers such as CoTr [31] are able to leverage sparse global context. A strong limitation shared by nnFormer and CoTr is that they cannot process large volumes at once and must rely on training the segmentation model on local 3D random crops. Consequently, full global contextual information is unavailable and positional encoding can be meaningless. On BCV [13], the cropped patch size is about 128 × 128 × 64, which only covers about 6% of the original volume.
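The random-crop training regime described above can be sketched as follows; the crop size matches the one quoted for BCV, but the full-volume shape is an assumption for illustration, so the printed coverage fraction is only indicative.

```python
import numpy as np

# Sketch of random 3D crop sampling used when full volumes do not fit in memory.
def random_crop(volume, crop_shape=(128, 128, 64), rng=np.random):
    starts = [rng.randint(0, s - c + 1) for s, c in zip(volume.shape, crop_shape)]
    return volume[tuple(slice(s, s + c) for s, c in zip(starts, crop_shape))]

volume = np.zeros((512, 512, 256), dtype=np.float32)   # assumed CT volume shape
crop = random_crop(volume)
print(f"crop {crop.shape} covers {crop.size / volume.size:.1%} of the volume")
# The exact percentage depends on the assumed volume shape, but a crop of this
# size always sees only a small fraction of the scan, so context from distant
# organs is never visible during training.
```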
This paper introduces the Full resolutIoN mEmory (FINE) transformer. This is, to the best of our knowledge, the first attempt at processing full-range interactions at all resolution levels with transformers for 3D medical image segmentation. To achieve this goal, memory tokens are used to indirectly enable full-range interactions between all volume elements, even when training with 3D crops. Inside each 3D crop, FINE introduces memory tokens associated with local windows. A second level of localized memory is introduced at the volume level to enable full interactions between all 3D volume patches. We show that FINE outperforms state-of-the-art CNN, transformer, and hybrid methods on the 3D multi-organ BCV dataset [13]. Fig. 1 illustrates the rationale of FINE for segmenting the kidney voxel marked with a red cross in a). We can see that FINE's attention map covers the whole image, enabling it to model long-range interactions between organs. In contrast, the receptive field of state-of-the-art methods only covers a limited local region, shown in green and blue in a).
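A minimal sketch of the token layout this two-level memory implies is given below; the token counts, dimensions, and the way both memory levels are mixed into a single attention call are assumptions for illustration, not the exact FINE architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch of window attention augmented with two memory levels:
# (i) window-level memory tokens attached to each local window, and
# (ii) volume-level memory tokens shared by all windows (and all crops).
# Shapes and counts are assumptions, not the FINE paper's exact design.
class MemoryWindowAttention(nn.Module):
    def __init__(self, dim=96, heads=4, n_win_mem=2, n_vol_mem=8):
        super().__init__()
        self.win_mem = nn.Parameter(torch.zeros(1, n_win_mem, dim))  # per-window memory
        self.vol_mem = nn.Parameter(torch.zeros(1, n_vol_mem, dim))  # volume-level memory
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, windows):                 # windows: (n_windows, tokens, dim)
        n, t, _ = windows.shape
        win_mem = self.win_mem.expand(n, -1, -1)
        vol_mem = self.vol_mem.expand(n, -1, -1)
        # Concatenate voxel tokens with both memory levels, then run standard
        # self-attention inside each window: the memory tokens act as a relay
        # that carries context across windows and beyond the current crop.
        x = torch.cat([vol_mem, win_mem, windows], dim=1)
        x, _ = self.attn(x, x, x)
        # Return only the updated voxel tokens; updated memories would be
        # written back and reused by other windows / crops in a full model.
        return x[:, -t:, :]

tokens = torch.randn(64, 4 * 4 * 4, 96)          # 64 windows of 4x4x4 voxels
print(MemoryWindowAttention()(tokens).shape)     # torch.Size([64, 64, 96])
```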