An Anatomy-aware Framework for Automatic Segmentation of Parotid Tumor from Multimodal MRI

Yifan Gaoa,b,1, Yin Daia,c,∗,1, Fayu Liud, Weibing Chena,c and Lifu Shie
aCollege of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China
bSchool of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei,
230026, China
cEngineering Center on Medical Imaging and Intelligent Analysis, Ministry of Education, Northeastern University, Shenyang, 110169, China
dDepartment of Oromaxillofacial-Head and Neck Surgery, School of Stomatology, China Medical University, Shenyang, 110002, China
eLiaoning Jiayin Medical Technology Co., LTD, Shenyang, 110170, China
ARTICLE INFO
Keywords:
Parotid tumor segmentation
Multimodal fusion
Anatomy-aware loss
Deep learning
Transformer
ABSTRACT
Magnetic Resonance Imaging (MRI) plays an important role in diagnosing parotid tumors,
where accurate segmentation of tumors is highly desired for determining appropriate treatment
plans and avoiding unnecessary surgery. However, the task remains nontrivial and challenging
due to ambiguous boundaries and various sizes of the tumor, as well as the presence of a large
number of anatomical structures around the parotid gland that are similar to the tumor. To over-
come these problems, we propose a novel anatomy-aware framework for automatic segmentation
of parotid tumors from multimodal MRI. First, a Transformer-based multimodal fusion network
PT-Net is proposed in this paper. The encoder of PT-Net extracts and fuses contextual informa-
tion from three modalities of MRI from coarse to fine, to obtain cross-modality and multi-scale
tumor information. The decoder stacks the feature maps of different modalities and calibrates the
multimodal information using the channel attention mechanism. Second, considering that the
segmentation model is prone to be disturbed by similar anatomical structures and make wrong
predictions, we design anatomy-aware loss. By calculating the distance between the activation
regions of the prediction segmentation and the ground truth, our loss function forces the model
to distinguish similar anatomical structures from the tumor and make correct predictions. Extensive experiments with MRI scans of the parotid tumor showed that our PT-Net achieved higher
segmentation accuracy than existing networks. The anatomy-aware loss outperformed state-of-
the-art loss functions for parotid tumor segmentation. Our framework can potentially improve
the quality of preoperative diagnosis and surgery planning of parotid tumors.
1. Introduction
Parotid tumors are the most common salivary gland tumors, accounting for approximately 2% to 6% of head and
neck tumors Jones et al. (2008). Parotid tumors are categorized into five types according to their clinical characteristics:
pleomorphic adenomas, Warthin tumors, basal cell adenomas, malignant tumors, and other minor benign lesions Jang
et al. (2004); Mendenhall et al. (2008); Zheng et al. (2021). It has been estimated that about 20% of all parotid tumor
cases are malignant Bussu et al. (2011).
Despite the relatively low incidence of the parotid tumor, the rate of clinical misdiagnosis before surgery is high due
to the heterogeneity and the diversity of types Assadsangabi et al. (2022). Currently, the primary treatment for parotid
tumors is resection surgery Poletti et al. (2018). Inappropriate surgical planning, however, can result in incomplete
tumor resection or damage to the facial nerves Espinosa et al. (2018); Stathopoulos et al. (2018); Grasso et al. (2021).
On the one hand, incomplete tumor resection may lead to the recurrence of the tumor. Even more dangerously, re-
operation to remove the tumor can be a very complicated process, and there is an unfavorable prognosis for the patient
Abu-Ghanem et al. (2016); Kanatas et al. (2018). On the other hand, loss of facial nerve function in severe cases can
lead to permanent facial paralysis, significantly impacting the patient’s postoperative recovery and quality of life Tseng
et al. (2007). Therefore, developing techniques for accurate and personalized preoperative diagnosis of parotid tumor
patients is of crucial importance Matsuo et al. (2020); Dai et al. (2021,2022).
Corresponding author
yifangao@mail.ustc.edu.cn (Y. Gao); daiyin@bmie.neu.edu.cn (Y. Dai)
ORCID(s): 0000-0002-9184-0085 (Y. Gao)
1These authors contributed equally to this work.
Yifan Gao et al.: Preprint submitted to Elsevier Page 1 of 17
arXiv:2210.01467v1 [eess.IV] 4 Oct 2022
Automatic parotid tumor segmentation
(a) (b) (c) (d)
(e) (f) (g) (h)
Figure 1: Schematic diagram of parotid MRI. Figures (a), (c), (e), and (g) indicate the original image, while (b), (d),
(f), and (h) show the ground truth corresponding to it. The yellow arrow denotes the location of the parotid tumor. The
green mask indicates the actual segmentation label, and the blue mask means the position where the deep model is likely
to make mistakes.
Modern medical imaging plays an essential role in the preoperative diagnosis of parotid tumors. Among them,
multimodal Magnetic Resonance Imaging (MRI) can provide the most accurate results for diagnosing tumors due to its
good contrast and rich information Soler et al. (1997); Stoia et al. (2021). For preoperative diagnosis and quantitative
assessment of the disease, automatic parotid tumor segmentation from MRI is necessary. As performing manual
segmentation of the parotid tumors from 3D volumes is tedious, time-consuming, and often subject to interference from anatomical
structures within or around the parotid gland, automatic segmentation of tumors is highly preferable in clinical practice.
Recent advances in artificial intelligence, notably deep learning, have contributed to significant breakthroughs in
image recognition and have been widely applied in data-driven medical image analysis. Deep learning models depicted
by convolutional neural networks (CNNs) Krizhevsky et al. (2017); He et al. (2016) and fast-evolving Transformer
Vaswani et al. (2017); Dosovitskiy et al. (2020); Liu et al. (2021) have achieved impressive performance in various
medical image tasks, such as computer-aided diagnosis Jian et al. (2021,2022); Zhao et al. (2021); Chen et al. (2022);
Amador et al. (2022) and medical image segmentation Zhao et al. (2020); Wang et al. (2020,2021); Elsawy and Abdel-
Mottaleb (2022). These advances have demonstrated that deep learning-based technologies are promising for studying
the automatic segmentation of parotid tumors in multimodal MRI.
Despite the success of state-of-the-art segmentation techniques in multiple organs and lesions, they have not been applied to the parotid tumor to provide personalized treatment. In addition to the limited clinically available imaging
data, there are two important challenges to overcome.
On the one hand, the parotid gland and its surroundings are complex, with a large number of anatomical structures.
This characteristic makes it challenging to develop automatic segmentation techniques. The major anatomical struc-
tures within and around the parotid gland can be roughly divided into muscular tissues and neurovascular structures.
The muscular tissues include the internal medial pterygoid muscle, the posterior belly of digastric muscle, and the
sternocleidomastoid muscle. The neurovascular structures mainly include the retromandibular vein, internal carotid
artery, internal jugular vein, external carotid artery, and other facial nerves. In most cases, the facial nerve is not discernible on MRI Prevost et al. (2022). While other vessels are usually visible on MRI, they may undergo anatomical variation in the presence of tumor-occupying lesions and become invisible Prevost et al. (2022).
These anatomical structures often have similar signal intensities to the tumor in MRI and even adhere together,
bringing major problems to automatic segmentation. Fig. 1 demonstrates some hard samples in parotid tumor segmentation. The comparison in Fig. 1(a) and 1(b) shows that the tumor and the muscle tissue have almost the same
signal intensity. It is difficult to differentiate the parotid tumor from this anatomical structure in this case. In addition,
the scale of parotid tumors is highly variable, ranging from less than one millimeter to several centimeters in radius.
In the small tumor segmentation, the model is highly likely to confuse it with vascular tissues or muscle tissues. The
comparisons from Fig. 1(e) to 1(h) highlight many anatomical structures with similar intensity and shape to the parotid
tumor. Therefore, this feature makes it very hard for the model to focus on the segmentation of the ground truth tumor.
In addition, the location of the parotid gland in the images often shows a small amount of signal from the facial nerve.
As seen in Fig. 1(e), it further increases the difficulty of automatic segmentation. In summary, unlike most organs
and lesions, the automatic segmentation of parotid tumors is a challenging task. It requires the introduction of prior
anatomical knowledge to improve the robustness and reliability of the model.
On the other hand, parotid tumors themselves have a large number of types. The signal intensity, morphology, and
size of different tumor types in MRI are very different. Therefore, it is difficult for the deep learning-based model to
learn robust feature information related to tumors. As seen in Fig. 1(c) and 1(d), some tumors exhibit higher signal intensities, making it difficult both to distinguish them from the parotid gland and to compare them with other tumor types.
However, experienced radiologists have good consistency in parotid tumor segmentation. The critical factor in this
is the effective extraction and combination of multimodal image information by the expert, which allows for accurate
manual segmentation. The parotid MRI examination produces multiple images. Among them, the most informative
and commonly used modalities are T1 images, T2 images, and STIR images. Although anatomical variations are very
common in individual MRI modalities, it is rare for parotid anatomy and tumors to have abnormal morphology and
signal intensity in all three modalities. Therefore, the expert can make the final decision based on a comprehensive observation and comparison of the different modalities. In summary, deep models for parotid tumor
segmentation need to learn cross-modal representations from multimodal MRI and fuse features from three modalities
to improve the model’s performance.
Therefore, this paper develops an anatomy-aware framework for automatic segmentation of parotid tumors from
multimodal MRI to leverage rich anatomical prior knowledge. The framework contains the Transformer-based seg-
mentation network (PT-Net) and the anatomy-aware loss function.
First, we propose PT-Net, a novel Transformer-based coarse-to-fine multimodal fusion network for parotid tumor
segmentation. The encoder of the network is built on the Transformer, while the decoder is a CNN-based architecture.
Such a design has been shown to balance local feature extraction and global information modeling. Different from the
existing multimodal fusion approaches, the encoder extracts and merges contextual information from three modality-
specific parotid MRI at different scales. It can better obtain cross-modality and multi-scale tumor information. The
decoder stacks the feature maps of various modalities and calibrates the multimodal information using the channel
attention mechanism. Experiments demonstrate that our method has significant advantages over highly competitive
baseline methods in parotid tumor segmentation.
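The paper does not detail the internal design of the decoder's channel-attention calibration at this point; as one plausible reading of "stack the feature maps of various modalities and calibrate with channel attention", here is a hypothetical squeeze-and-excitation-style sketch in PyTorch. The reduction ratio, gating layout, and module name are our assumptions, not the paper's exact Information Calibration Module:

```python
import torch
import torch.nn as nn

class ChannelCalibration(nn.Module):
    """Hypothetical SE-style channel attention over stacked modality features."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)  # global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.GELU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gates in (0, 1)
        )

    def forward(self, feats):
        # feats: list of per-modality maps, each of shape (B, C, D, H, W)
        x = torch.cat(feats, dim=1)            # stack along the channel axis
        w = self.fc(self.pool(x).flatten(1))   # one gate per stacked channel
        return x * w[:, :, None, None, None]   # recalibrate modality channels
```

The gating lets the decoder suppress channels from a modality that is uninformative for a given case while amplifying the others, which matches the calibration behavior described above.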
Second, this paper presents the anatomy-aware loss for guiding the deep model to distinguish parotid anatomical
structures from tumors. Considering that segmentation models are prone to be disturbed by irrelevant anatomy and
make wrong predictions, we develop this novel distance-based loss function. In contrast to previous methods Kervadec
et al. (2019); Karimi and Salcudean (2019), the anatomy-aware loss is computed from the distance between the center coordinates of the binary masks of the model-predicted segmentation and the ground truth. Hence this loss function can force
the model to identify anatomical structures far from the ground truth, thereby predicting the correct tumor location.
It is worth noting that compared with other distance-based loss functions applied to medical image segmentation, our
anatomy-aware loss does not require additional computation and has high training stability.
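To make the idea concrete, the following is a minimal NumPy sketch of a center-distance penalty in the spirit described here. It is an illustration under our own assumptions (soft, intensity-weighted centroids and normalization by the volume diagonal), not the paper's exact formulation:

```python
import numpy as np

def soft_centroid(vol):
    """Intensity-weighted center of mass of a 3D activation map."""
    coords = np.indices(vol.shape).reshape(3, -1).astype(np.float64)
    w = vol.reshape(-1).astype(np.float64)
    return (coords * w).sum(axis=1) / (w.sum() + 1e-8)

def anatomy_aware_penalty(pred_prob, gt_mask):
    """Distance between the centers of the predicted activation and the
    ground-truth tumor mask, normalized by the volume diagonal so the
    penalty is scale-free and lies in [0, 1]."""
    dist = np.linalg.norm(soft_centroid(pred_prob) - soft_centroid(gt_mask))
    diag = np.linalg.norm(np.array(pred_prob.shape, dtype=np.float64))
    return dist / diag
```

A prediction that activates on a distant, tumor-like anatomical structure receives a large penalty even when its overlap-based terms look plausible, which pushes the model toward the correct tumor location.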
Based on our experimental results with MRI of 187 parotid tumor patients, we demonstrated the effectiveness of
the proposed PT-Net and the anatomy-aware loss. This approach has the potential to reduce the annotation burden associated with large-scale parotid tumor image datasets and to mitigate the scarcity of high-quality labels provided by experienced radiologists.
The main contribution of this paper is summarized as follows:
1. To the best of our knowledge, we are the first to study parotid tumor segmentation, and we propose an automatic segmentation framework with high performance and robustness.
2. We propose a Transformer-based segmentation network that fuses multimodal information from coarse to fine.
The proposed PT-Net captures compact and high-level tumor features through the self-attention mechanism.
3. This work presents the anatomy-aware loss function. It exploits the prior knowledge of anatomy in parotid MRI
to reduce segmentation mistakes from tumor-similar anatomical structures.
Figure 2: Overview of the proposed PT-Net, which consists of the Transformer-based encoder and CNNs-based decoder.
Conv: Convolution Block. MFB: Multimodal Fusion Block. SWB: Shift Window Block. PM: Patch Merging. ICM:
Information Calibration Module.
The remainder of this paper is organized as follows: In Section 2, we describe the proposed automatic segmentation
framework in detail, including the multimodal fusion network called PT-Net, and the anatomy-aware loss function. In
Section 3, we show the experiment design and the results. The experiment results are further analyzed and discussed
in Section 4. Finally, we summarize our work in Section 5.
2. Method
2.1. Parotid tumor segmentation network
2.1.1. Network architecture overview
PT-Net follows the classical design of encoder-decoder architecture Ronneberger et al. (2015), which uses both
CNNs and Transformer for modeling low-level features and long-range dependencies. Compared with other Transformer-
based segmentation networks, our method uses a novel multimodal fusion module. The proposed approach can effec-
tively extract and fuse information from coarse to fine to improve the segmentation performance of the model. Fig. 2
shows the basic structure of our proposed model. In the next section, we will elaborate on the structure of the segmentation network, namely the Transformer-based encoder and the CNN-based decoder. Besides, we will also introduce the multimodal fusion block, an important component of our PT-Net.
2.1.2. Transformer-based encoder
As seen in Fig. 2, the encoder of PT-Net consists of three modality-independent networks that do not share parameters. Assume that the input of each modality is a randomly cropped 3D patch $X \in \mathbb{R}^{D \times H \times W}$ from the original image, where $D$, $H$, and $W$ denote the depth, height, and width of the image, respectively.
First, similar to Zhou et al. (2021), the image of each modality is fed to the embedding layer to reduce the original
resolution. The embedding layer contains two successive convolutional blocks. Each convolution block contains two
convolutional layers with a kernel size of 3. After each convolution layer, we perform the GELU activation function
Hendrycks and Gimpel (2016) and instance normalization Ulyanov et al. (2016). Since the 3D parotid MRI data used
in this paper is significantly smaller in depth than the other two dimensions, the downsampling ratios of the three
dimensions are different in the two convolution layers. Specifically, we reduce the stride of the depth dimension of
the first convolution block from 2 to 1. Therefore, the downsampling ratio is 2 for the depth dimension and 4 for the
other two dimensions. Finally, we obtain the high-dimensional vector $X_e \in \mathbb{R}^{C \times \frac{D}{2} \times \frac{H}{4} \times \frac{W}{4}}$, where $C$ is the length of the
representation vector. We summarize the specific structure of the embedding layer in Table 1.
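Under the description above (two convolution blocks, kernel size 3, GELU followed by instance normalization, and asymmetric strides so that depth is downsampled by 2 and height/width by 4), a minimal PyTorch sketch of the embedding layer might look like the following. The intermediate channel width and the placement of the downsampling stride on the first convolution of each block are our assumptions, since Table 1 is not reproduced here:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    """Two 3x3x3 convolutions; only the first one downsamples.
    Each convolution is followed by GELU and instance normalization."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.GELU(),
        nn.InstanceNorm3d(out_ch),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.GELU(),
        nn.InstanceNorm3d(out_ch),
    )

class PatchEmbedding(nn.Module):
    """Embedding layer: depth downsampled by 2, height/width by 4."""

    def __init__(self, in_ch=1, embed_dim=32):
        super().__init__()
        # first block: depth stride reduced from 2 to 1; 2 in H and W
        self.block1 = conv_block(in_ch, embed_dim // 2, stride=(1, 2, 2))
        # second block: stride 2 in all three dimensions
        self.block2 = conv_block(embed_dim // 2, embed_dim, stride=(2, 2, 2))

    def forward(self, x):
        # x: (B, 1, D, H, W) -> (B, C, D/2, H/4, W/4)
        return self.block2(self.block1(x))
```

With an input patch of size 16 × 64 × 64, this produces a feature map of size 8 × 16 × 16, matching the stated 2× depth and 4× in-plane downsampling ratios.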
Next, the feature map of the three modalities is fed into four stages of structurally identical Transformer blocks.
Each Transformer block contains a multimodal fusion block, a shift-window block, and a patch merging operation. As