FocalUNETR: A Focal Transformer for
Boundary-aware Prostate Segmentation using
CT Images
Chengyin Li1, Yao Qiang1, Rafi Ibn Sultan1, Hassan Bagher-Ebadian2,
Prashant Khanduri1, Indrin J. Chetty2, and Dongxiao Zhu1(B)
1Department of Computer Science, Wayne State University, Detroit MI, USA
dzhu@wayne.edu
2Department of Radiation Oncology, Henry Ford Cancer Institute, Detroit MI, USA
Abstract. Precise prostate segmentation in Computed Tomography (CT) for
treatment planning is challenging due to (1) the unclear prostate boundary
caused by CT's poor soft tissue contrast and (2)
the limitation of convolutional neural network-based models in capturing
long-range global context. Here we propose a novel focal transformer-
based image segmentation architecture to effectively and efficiently ex-
tract local visual features and global context from CT images. Addi-
tionally, we design an auxiliary boundary-induced label regression task
coupled with the main prostate segmentation task to address the unclear
boundary issue in CT images. We demonstrate that this design signifi-
cantly improves CT-based prostate segmentation over competing methods,
yielding a higher Dice Similarity Coefficient and lower Hausdorff Distance
and Average Symmetric Surface Distance on both private and public CT
image datasets. Our code is available at this link.
Keywords: Focal transformer · Prostate segmentation · Computed tomography ·
Boundary-aware
1 Introduction
Prostate cancer is a leading cause of cancer-related deaths in adult males, as
reported in studies such as [17]. A common treatment option for prostate cancer
is external beam radiation therapy (EBRT) [4], where CT scanning is a
cost-effective tool for the treatment planning process compared with the more
expensive magnetic resonance imaging (MRI). As a result, precise prostate seg-
mentation in CT images becomes a crucial step, as it helps to ensure that the
radiation doses are delivered effectively to the tumor tissues while minimizing
harm to the surrounding healthy tissues.
Due to the relatively low spatial resolution and soft tissue contrast in CT im-
ages compared to MRI images, manual prostate segmentation in CT images can
be time-consuming and may result in significant variations between operators
[10]. Several automated segmentation methods have been proposed to alleviate
arXiv:2210.03189v2 [eess.IV] 18 Jul 2023
these issues, most notably the fully convolutional network (FCN)-based U-Net [19]
(an encoder-decoder architecture with skip connections to preserve details and
extract local visual features) and its variants [14,23,26]. Despite good progress,
these methods often have limitations in capturing long-range relationships and
global context information [2] due to the inherent bias of convolutional opera-
tions. Researchers naturally turn to ViT [5], powered by self-attention (SA):
TransUNet [2] first adapts ViT to medical image segmentation tasks by
connecting several transformer layers (multi-head SA) to the FCN-based encoder
to better capture the global context information
from the high-level feature maps. TransFuse [25] and MedT [21] use a combined
FCN and Transformer architecture with two branches to capture global depen-
dency and low-level spatial details more effectively. Swin-UNet [1] is the first
U-shaped network based purely on the more efficient Swin Transformer [12] and
outperforms FCN-based methods. UNETR [6] and SwinUNETR [20] extend
transformer architectures to 3D inputs.
Despite the improved performance of the aforementioned ViT-based networks,
these methods rely on standard or shifted-window SA, a fine-grained local SA
that may overlook interactions between local and global features [24,18]. As
reported in [20], even when pre-trained on a massive amount of medical data
with self-supervised learning, prostate segmentation performance on
high-resolution, higher soft-tissue-contrast MRI images is still not completely
satisfactory, let alone on lower-quality CT images. Additionally, the unclear
boundary of the prostate in CT images, caused by low soft tissue contrast, has
not been properly addressed [7,22].
Recently, the Focal Transformer [24] was proposed for general computer vision
tasks; it leverages focal self-attention to incorporate both fine-grained local
and coarse-grained global interactions. Each token attends to its closest
surrounding tokens at fine granularity and to far-away tokens at coarse granu-
larity; thus, focal SA captures both short- and long-range visual dependencies
efficiently and effectively. Inspired by this work, we propose FocalUNETR
(Focal U-NEt TRansformers), a novel focal transformer architecture for CT-
based medical image segmentation (Fig. 1A). Although prior works such as
Psi-Net [15] incorporate additional decoders for boundary detection and
distance-map estimation, they either lack the capacity to capture global
context effectively, being FCN-based, or overlook the inherent uncertainty of
the boundary, which is particularly pronounced in poor soft-tissue-contrast CT
images of the prostate. In contrast, our approach utilizes a
multi-task learning strategy that applies a Gaussian kernel over the boundary
of the ground-truth segmentation mask [11] to form an auxiliary boundary-aware
contour regression task (Fig. 1B). This auxiliary task serves as a regularizer
for the main task of generating the segmentation mask and enhances the model's
generalizability by addressing the challenge of unclear boundaries in
low-contrast CT images.
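Such a boundary-induced regression target can be sketched as follows. This is
an illustrative NumPy implementation under our own assumptions, not the
authors' code: the function name, the σ value, and the brute-force
nearest-boundary distance computation are all ours.

```python
import numpy as np

def boundary_gaussian_label(mask: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Regression target: a Gaussian bump centered on the mask contour."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    # A pixel is on the boundary if it is foreground and touches background
    # in any of the four axis-aligned directions.
    neigh_min = np.minimum.reduce([
        padded[:-2, 1:-1], padded[2:, 1:-1],
        padded[1:-1, :-2], padded[1:-1, 2:],
    ])
    boundary = (mask == 1) & (neigh_min == 0)
    ys, xs = np.nonzero(boundary)
    if len(ys) == 0:
        return np.zeros((h, w))
    gy, gx = np.mgrid[0:h, 0:w]
    # Squared distance from every pixel to its nearest boundary pixel
    # (brute force; fine for a toy example).
    d2 = ((gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2).min(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

mask = np.zeros((7, 7), dtype=int)
mask[2:5, 2:5] = 1            # a small square standing in for the prostate
label = boundary_gaussian_label(mask, sigma=1.0)
```

The soft target equals 1 exactly on the contour and decays smoothly away from
it, which is precisely what lets the regression head encode boundary
uncertainty rather than a hard 0/1 edge.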
In this paper, we make several new contributions. First, we develop a novel
focal transformer model (FocalUNETR) for CT-based prostate segmentation,
[Figure 1 shows the FocalUNETR architecture: (A) patch partition and linear
embedding followed by four focal-transformer stages with patch merging, a
bottleneck, and a residual-block (Res-Block) decoder with deconvolution, skip
connections, and a segmentation head; (B) an auxiliary head regressing a
boundary-sensitive label induced from the ground truth. Each focal transformer
block comprises LayerNorm, focal self-attention, LayerNorm, and a multi-layer
perceptron.]
Fig. 1. The architecture of FocalUNETR as (A) the main task for prostate
segmentation and (B) a boundary-aware regression auxiliary task.
which uses focal SA to hierarchically learn feature maps that account
for both short- and long-range visual dependencies efficiently and effectively.
Second, we address the challenge of unclear boundaries specific to CT images
by incorporating an auxiliary contour regression task. Third, our method
advances the state of the art, as demonstrated by extensive experiments on both
real-world and benchmark datasets.
2 Methods
2.1 FocalUNETR
Our FocalUNETR architecture (Fig. 1) follows a multi-scale design similar to
[6,20], enabling us to obtain hierarchical feature maps at different stages. The
input medical image X ∈ R^(C×H×W) is first split into a sequence of tokens
of spatial dimension ⌈H/H′⌉ × ⌈W/W′⌉, where H and W denote the spatial height
and width, respectively, and C denotes the number of channels. These tokens are
then projected into an embedding space of dimension D using a patch of
resolution (H′, W′).
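To make this tokenization concrete, here is a minimal sketch (our own
illustration: a random matrix stands in for the learned linear embedding, and
zero-padding for non-divisible sizes is an assumption):

```python
import math
import numpy as np

def patch_embed(x: np.ndarray, patch: tuple, dim: int,
                rng=np.random.default_rng(0)) -> np.ndarray:
    """Split a C x H x W image into patch tokens and project them to `dim`."""
    c, h, w = x.shape
    ph, pw = patch
    nh, nw = math.ceil(h / ph), math.ceil(w / pw)   # token grid: ceil(H/H') x ceil(W/W')
    # Zero-pad so the image divides evenly into patches.
    xp = np.zeros((c, nh * ph, nw * pw))
    xp[:, :h, :w] = x
    tokens = (xp.reshape(c, nh, ph, nw, pw)
                .transpose(1, 3, 0, 2, 4)
                .reshape(nh * nw, c * ph * pw))
    proj = rng.standard_normal((c * ph * pw, dim))  # stand-in for the learned embed
    return tokens @ proj

img = np.ones((1, 224, 224))
emb = patch_embed(img, patch=(4, 4), dim=96)
# 224/4 = 56 tokens per side -> 3136 tokens, each of dimension 96
```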
The SA is computed at two focal levels [24]: fine-grained and coarse-grained, as
illustrated in Fig. 2A. The focal SA attends to fine-grained tokens locally, while
summarized tokens are attended to globally (reducing computational cost). We
perform focal SA at the window level: a feature map x ∈ R^(d×H′′×W′′) with
spatial size H′′ × W′′ and d channels is partitioned into a grid of windows of
size s_w × s_w. For each window, we extract its surroundings using focal SA.
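The window partitioning itself is a pure reshape; a minimal sketch (assuming,
for illustration, that H′′ and W′′ divide evenly by s_w):

```python
import numpy as np

def window_partition(x: np.ndarray, sw: int) -> np.ndarray:
    """Split a d x H'' x W'' feature map into non-overlapping sw x sw windows."""
    d, h, w = x.shape
    assert h % sw == 0 and w % sw == 0, "spatial size must divide the window size"
    nh, nw = h // sw, w // sw
    return (x.reshape(d, nh, sw, nw, sw)
             .transpose(1, 3, 0, 2, 4)   # -> (nh, nw, d, sw, sw)
             .reshape(nh * nw, d, sw, sw))

feat = np.arange(2 * 8 * 8).reshape(2, 8, 8).astype(float)
wins = window_partition(feat, sw=4)      # 4 windows, each of shape (2, 4, 4)
```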
For window-wise focal SA [24], there are three terms {L, s_w, s_r}. The focal
level L is the number of granularity levels at which we extract tokens for
focal SA. We present an example, depicted in Fig. 2B, that illustrates the use
of two focal levels (fine and coarse) for capturing the interaction of local
and global tokens.
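The two focal levels for a single query window can be sketched as follows.
This is our own simplification, not the paper's implementation: level 1 keeps
the fine tokens inside the window, and level 2 mean-pools the whole map into
s_r × s_r sub-window summaries, whereas the real focal SA also restricts the
coarse region around the window and uses learned pooling.

```python
import numpy as np

def focal_keys(x: np.ndarray, window: tuple, sr: int) -> np.ndarray:
    """Gather keys/values for one query window at two focal levels:
    level 1 = fine-grained tokens inside the window,
    level 2 = the whole map pooled into sr x sr sub-window summaries."""
    d, h, w = x.shape
    fine = x[:, window[0], window[1]].reshape(d, -1)   # fine local tokens
    pooled = (x.reshape(d, h // sr, sr, w // sr, sr)
                .mean(axis=(2, 4))                     # coarse summary tokens
                .reshape(d, -1))
    return np.concatenate([fine, pooled], axis=1)      # keys for focal SA

feat = np.ones((16, 8, 8))
keys = focal_keys(feat, (slice(0, 4), slice(0, 4)), sr=2)
# 16 fine tokens + 16 coarse tokens -> 32 keys of dimension 16
```

The point of the coarse level is cost: attending to pooled summaries instead
of every distant token keeps the attention span global while the number of
keys stays small.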