An Anatomy-aware Framework for Automatic Segmentation of Parotid Tumor from Multimodal MRI

Yifan Gaoa,b,1, Yin Daia,c,∗,1, Fayu Liud, Weibing Chena,c and Lifu Shie
aCollege of Medicine and Biological Information Engineering, Northeastern University, Shenyang, 110169, China
bSchool of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei,
230026, China
cEngineering Center on Medical Imaging and Intelligent Analysis, Ministry of Education, Northeastern University, Shenyang, 110169, China
dDepartment of Oromaxillofacial-Head and Neck Surgery, School of Stomatology, China Medical University, Shenyang, 110002, China
eLiaoning Jiayin Medical Technology Co., LTD, Shenyang, 110170, China
ARTICLE INFO
Keywords:
Parotid tumor segmentation
Multimodal fusion
Anatomy-aware loss
Deep learning
Transformer
ABSTRACT
Magnetic Resonance Imaging (MRI) plays an important role in diagnosing parotid tumors,
where accurate segmentation of tumors is highly desired for determining appropriate treatment
plans and avoiding unnecessary surgery. However, the task remains nontrivial and challenging
due to ambiguous boundaries and various sizes of the tumor, as well as the presence of a large
number of anatomical structures around the parotid gland that are similar to the tumor. To over-
come these problems, we propose a novel anatomy-aware framework for automatic segmentation
of parotid tumors from multimodal MRI. First, a Transformer-based multimodal fusion network
PT-Net is proposed in this paper. The encoder of PT-Net extracts and fuses contextual informa-
tion from three modalities of MRI from coarse to fine, to obtain cross-modality and multi-scale
tumor information. The decoder stacks the feature maps of different modalities and calibrates the
multimodal information using the channel attention mechanism. Second, considering that the
segmentation model is prone to be disturbed by similar anatomical structures and make wrong
predictions, we design anatomy-aware loss. By calculating the distance between the activation
regions of the prediction segmentation and the ground truth, our loss function forces the model
to distinguish similar anatomical structures from the tumor and make correct predictions. Extensive experiments with MRI scans of the parotid tumor showed that our PT-Net achieved higher
segmentation accuracy than existing networks. The anatomy-aware loss outperformed state-of-
the-art loss functions for parotid tumor segmentation. Our framework can potentially improve
the quality of preoperative diagnosis and surgery planning of parotid tumors.
1. Introduction
Parotid tumors are the most common salivary gland tumors, accounting for approximately 2% to 6% of head and
neck tumors Jones et al. (2008). Parotid tumors are categorized into five types according to their clinical characteristics:
pleomorphic adenomas, Warthin tumors, basal cell adenomas, malignant tumors, and other minor benign lesions Jang
et al. (2004); Mendenhall et al. (2008); Zheng et al. (2021). It has been estimated that about 20% of all parotid tumor
cases are malignant Bussu et al. (2011).
Despite the relatively low incidence of the parotid tumor, the rate of clinical misdiagnosis before surgery is high due
to the heterogeneity and the diversity of types Assadsangabi et al. (2022). Currently, the primary treatment for parotid
tumors is resection surgery Poletti et al. (2018). Inappropriate surgical planning, however, can result in incomplete
tumor resection or damage to the facial nerves Espinosa et al. (2018); Stathopoulos et al. (2018); Grasso et al. (2021).
On the one hand, incomplete tumor resection may lead to the recurrence of the tumor. Even more dangerously, re-
operation to remove the tumor can be a very complicated process, and there is an unfavorable prognosis for the patient
Abu-Ghanem et al. (2016); Kanatas et al. (2018). On the other hand, loss of facial nerve function in severe cases can
lead to permanent facial paralysis, significantly impacting the patient’s postoperative recovery and quality of life Tseng
et al. (2007). Therefore, developing techniques for accurate and personalized preoperative diagnosis of parotid tumor
patients is of crucial importance Matsuo et al. (2020); Dai et al. (2021,2022).
Corresponding author
yifangao@mail.ustc.edu.cn (Y. Gao); daiyin@bmie.neu.edu.cn (Y. Dai)
ORCID(s): 0000-0002-9184-0085 (Y. Gao)
1These authors contributed equally to this work.
Yifan Gao et al.: Preprint submitted to Elsevier Page 1 of 17
arXiv:2210.01467v1 [eess.IV] 4 Oct 2022
Automatic parotid tumor segmentation
(a) (b) (c) (d)
(e) (f) (g) (h)
Figure 1: Schematic diagram of parotid MRI. Figures (a), (c), (e), and (g) indicate the original image, while (b), (d),
(f), and (h) show the ground truth corresponding to it. The yellow arrow denotes the location of the parotid tumor. The
green mask indicates the actual segmentation label, and the blue mask means the position where the deep model is likely
to make mistakes.
Modern medical imaging plays an essential role in the preoperative diagnosis of parotid tumors. Among them,
multimodal Magnetic Resonance Imaging (MRI) can provide the most accurate results for diagnosing tumors due to its
good contrast and rich information Soler et al. (1997); Stoia et al. (2021). For preoperative diagnosis and quantitative
assessment of the disease, automatic parotid tumor segmentation from MRI is necessary. As performing manual
segmentation of the parotid tumors from 3D volumes is tedious, time-consuming, and often subject to interference from anatomical
structures within or around the parotid gland, automatic segmentation of tumors is highly preferable in clinical practice.
Recent advances in artificial intelligence, notably deep learning, have contributed to significant breakthroughs in
image recognition and have been widely applied in data-driven medical image analysis. Deep learning models depicted
by convolutional neural networks (CNNs) Krizhevsky et al. (2017); He et al. (2016) and fast-evolving Transformer
Vaswani et al. (2017); Dosovitskiy et al. (2020); Liu et al. (2021) have achieved impressive performance in various
medical image tasks, such as computer-aided diagnosis Jian et al. (2021,2022); Zhao et al. (2021); Chen et al. (2022);
Amador et al. (2022) and medical image segmentation Zhao et al. (2020); Wang et al. (2020,2021); Elsawy and Abdel-
Mottaleb (2022). These advances have demonstrated that deep learning-based technologies are promising for studying
the automatic segmentation of parotid tumors in multimodal MRI.
Despite the success of state-of-the-art segmentation techniques in multiple organs and lesions, they have not been applied to the parotid tumor to provide personalized treatment. In addition to the limited clinically available imaging
data, there are two important challenges to overcome.
On the one hand, the parotid gland and its surroundings are complex, with a large number of anatomical structures.
This characteristic makes it challenging to develop automatic segmentation techniques. The major anatomical struc-
tures within and around the parotid gland can be roughly divided into muscular tissues and neurovascular structures.
The muscular tissues include the internal medial pterygoid muscle, the posterior belly of digastric muscle, and the
sternocleidomastoid muscle. The neurovascular structures mainly include the retromandibular vein, internal carotid
artery, internal jugular vein, external carotid artery, and other facial nerves. In most cases, the facial nerve is not discernible on MRI Prevost et al. (2022). While other vessels are usually visible on MRI, they may undergo anatomical variation in the presence of tumor-occupying lesions and become invisible Prevost et al. (2022).
These anatomical structures often have similar signal intensities to the tumor in MRI and even adhere together,
bringing major problems to automatic segmentation. Fig. 1 demonstrates some hard samples in parotid tumor segmentation. The comparison in Fig. 1(a) and 1(b) shows that the tumor and the muscle tissue have almost the same
signal intensity. It is difficult to differentiate the parotid tumor from this anatomical structure in this case. In addition,
the scale of parotid tumors is highly variable, ranging from less than one millimeter to several centimeters in radius.
In the small tumor segmentation, the model is highly likely to confuse it with vascular tissues or muscle tissues. The
comparisons from Fig. 1(e) to 1(h) highlight many anatomical structures with similar intensity and shape to the parotid
tumor. Therefore, this feature makes it very hard for the model to focus on the segmentation of the ground truth tumor.
In addition, the location of the parotid gland in the images often shows a small amount of signal from the facial nerve.
As seen in Fig. 1(e), it further increases the difficulty of automatic segmentation. In summary, unlike most organs
and lesions, the automatic segmentation of parotid tumors is a challenging task. It requires the introduction of prior
anatomical knowledge to improve the robustness and reliability of the model.
On the other hand, parotid tumors themselves have a large number of types. The signal intensity, morphology, and
size of different tumor types in MRI are very different. Therefore, it is difficult for the deep learning-based model to
learn robust feature information related to tumors. As seen in Fig. 1(c) and 1(d), some tumors exhibit higher signal intensities, making it difficult both to distinguish them from the parotid gland and to compare them with other tumor types.
However, experienced radiologists have good consistency in parotid tumor segmentation. The critical factor in this
is the effective extraction and combination of multimodal image information by the expert, which allows for accurate
manual segmentation. The parotid MRI examination produces multiple images. Among them, the most informative
and commonly used modalities are T1 images, T2 images, and STIR images. Although anatomical variations are very
common in individual MRI modalities, it is rare for parotid anatomy and tumors to have abnormal morphology and
signal intensity in all three modalities. Therefore, the expert can make the final decision based on a comprehensive observation and comparison of the different modalities. In summary, deep models for parotid tumor
segmentation need to learn cross-modal representations from multimodal MRI and fuse features from three modalities
to improve the model’s performance.
Therefore, this paper develops an anatomy-aware framework for automatic segmentation of parotid tumors from
multimodal MRI to leverage rich anatomical prior knowledge. The framework contains the Transformer-based seg-
mentation network (PT-Net) and the anatomy-aware loss function.
First, we propose PT-Net, a novel Transformer-based coarse-to-fine multimodal fusion network for parotid tumor
segmentation. The encoder of the network is built on the Transformer, while the decoder is a CNN-based architecture.
Such a design has been shown to balance local feature extraction and global information modeling. Different from the
existing multimodal fusion approaches, the encoder extracts and merges contextual information from three modality-
specific parotid MRI at different scales. It can better obtain cross-modality and multi-scale tumor information. The
decoder stacks the feature maps of various modalities and calibrates the multimodal information using the channel
attention mechanism. Experiments demonstrate that our method has significant advantages over highly competitive
baseline methods in parotid tumor segmentation.
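The paper does not detail the internal design of the decoder's channel-attention calibration at this point; as one plausible reading of "stack the feature maps of various modalities and calibrate with channel attention", here is a hypothetical squeeze-and-excitation-style sketch in PyTorch. The reduction ratio, gating layout, and module name are our assumptions, not the paper's exact Information Calibration Module:

```python
import torch
import torch.nn as nn

class ChannelCalibration(nn.Module):
    """Hypothetical SE-style channel attention over stacked modality features."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)  # global context per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.GELU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gates in (0, 1)
        )

    def forward(self, feats):
        # feats: list of per-modality maps, each of shape (B, C, D, H, W)
        x = torch.cat(feats, dim=1)            # stack along the channel axis
        w = self.fc(self.pool(x).flatten(1))   # one gate per stacked channel
        return x * w[:, :, None, None, None]   # recalibrate modality channels
```

The gating lets the decoder suppress channels from a modality that is uninformative for a given case while amplifying the others, which matches the calibration behavior described above.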
Second, this paper presents the anatomy-aware loss for guiding the deep model to distinguish parotid anatomical
structures from tumors. Considering that segmentation models are prone to be disturbed by irrelevant anatomy and
make wrong predictions, we develop this novel distance-based loss function. In contrast to previous methods Kervadec
et al. (2019); Karimi and Salcudean (2019), the anatomy-aware loss is computed from the distance between the center coordinates of the binary masks of the model-predicted segmentation and the ground truth. Hence this loss function can force
the model to identify anatomical structures far from the ground truth, thereby predicting the correct tumor location.
It is worth noting that compared with other distance-based loss functions applied to medical image segmentation, our
anatomy-aware loss does not require additional computation and has high training stability.
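To make the idea concrete, the following is a minimal NumPy sketch of a center-distance penalty in the spirit described here. It is an illustration under our own assumptions (soft, intensity-weighted centroids and normalization by the volume diagonal), not the paper's exact formulation:

```python
import numpy as np

def soft_centroid(vol):
    """Intensity-weighted center of mass of a 3D activation map."""
    coords = np.indices(vol.shape).reshape(3, -1).astype(np.float64)
    w = vol.reshape(-1).astype(np.float64)
    return (coords * w).sum(axis=1) / (w.sum() + 1e-8)

def anatomy_aware_penalty(pred_prob, gt_mask):
    """Distance between the centers of the predicted activation and the
    ground-truth tumor mask, normalized by the volume diagonal so the
    penalty is scale-free and lies in [0, 1]."""
    dist = np.linalg.norm(soft_centroid(pred_prob) - soft_centroid(gt_mask))
    diag = np.linalg.norm(np.array(pred_prob.shape, dtype=np.float64))
    return dist / diag
```

A prediction that activates on a distant, tumor-like anatomical structure receives a large penalty even when its overlap-based terms look plausible, which pushes the model toward the correct tumor location.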
Based on our experimental results with MRI of 187 parotid tumor patients, we demonstrated the effectiveness of
the proposed PT-Net and the anatomy-aware loss. This approach has the potential to reduce the annotation burden associated with large-scale parotid tumor image datasets and to mitigate the scarcity of high-quality labels provided by experienced radiologists.
The main contribution of this paper is summarized as follows:
1. To the best of our knowledge, we are the first to study parotid tumor segmentation, and we propose an automatic segmentation framework with high performance and robustness.
2. We propose a Transformer-based segmentation network that fuses multimodal information from coarse to fine.
The proposed PT-Net captures compact and high-level tumor features through the self-attention mechanism.
3. This work presents the anatomy-aware loss function. It exploits the prior knowledge of anatomy in parotid MRI
to reduce segmentation mistakes from tumor-similar anatomical structures.
Figure 2: Overview of the proposed PT-Net, which consists of the Transformer-based encoder and CNNs-based decoder.
Conv: Convolution Block. MFB: Multimodal Fusion Block. SWB: Shift Window Block. PM: Patch Merging. ICM:
Information Calibration Module.
The remainder of this paper is organized as follows: In Section 2, we describe the proposed automatic segmentation
framework in detail, including the multimodal fusion network called PT-Net, and the anatomy-aware loss function. In
Section 3, we show the experiment design and the results. The experiment results are further analyzed and discussed
in Section 4. Finally, we summarize our work in Section 5.
2. Method
2.1. Parotid tumor segmentation network
2.1.1. Network architecture overview
PT-Net follows the classical design of encoder-decoder architecture Ronneberger et al. (2015), which uses both
CNNs and Transformer for modeling low-level features and long-range dependencies. Compared with other Transformer-
based segmentation networks, our method uses a novel multimodal fusion module. The proposed approach can effec-
tively extract and fuse information from coarse to fine to improve the segmentation performance of the model. Fig. 2
shows the basic structure of our proposed model. In the next section, we will elaborate on the structure of the segmentation network, namely the Transformer-based encoder and the CNN-based decoder. Besides, we will also introduce the multimodal fusion block, an important component of our PT-Net.
2.1.2. Transformer-based encoder
As seen in Fig. 2, the encoder of PT-Net consists of three modality-independent networks that do not share parameters. Assume that the input of each modality is a randomly cropped 3D patch $X \in \mathbb{R}^{D \times H \times W}$ from the original image, where $D$, $H$, and $W$ denote the depth, height, and width of the image, respectively.
First, similar to Zhou et al. (2021), the image of each modality is fed to the embedding layer to reduce the original
resolution. The embedding layer contains two successive convolutional blocks. Each convolution block contains two
convolutional layers with a kernel size of 3. After each convolution layer, we perform the GELU activation function
Hendrycks and Gimpel (2016) and instance normalization Ulyanov et al. (2016). Since the 3D parotid MRI data used
in this paper is significantly smaller in depth than the other two dimensions, the downsampling ratios of the three
dimensions are different in the two convolution layers. Specifically, we reduce the stride of the depth dimension of
the first convolution block from 2 to 1. Therefore, the downsampling ratio is 2 for the depth dimension and 4 for the
other two dimensions. Finally, we obtain the high-dimensional vector $X_e \in \mathbb{R}^{C \times \frac{D}{2} \times \frac{H}{4} \times \frac{W}{4}}$, where $C$ is the length of the
representation vector. We summarize the specific structure of the embedding layer in Table 1.
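Under the description above (two convolution blocks, kernel size 3, GELU followed by instance normalization, and asymmetric strides so that depth is downsampled by 2 and height/width by 4), a minimal PyTorch sketch of the embedding layer might look like the following. The intermediate channel width and the placement of the downsampling stride on the first convolution of each block are our assumptions, since Table 1 is not reproduced here:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    """Two 3x3x3 convolutions; only the first one downsamples.
    Each convolution is followed by GELU and instance normalization."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.GELU(),
        nn.InstanceNorm3d(out_ch),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.GELU(),
        nn.InstanceNorm3d(out_ch),
    )

class PatchEmbedding(nn.Module):
    """Embedding layer: depth downsampled by 2, height/width by 4."""

    def __init__(self, in_ch=1, embed_dim=32):
        super().__init__()
        # first block: depth stride reduced from 2 to 1; 2 in H and W
        self.block1 = conv_block(in_ch, embed_dim // 2, stride=(1, 2, 2))
        # second block: stride 2 in all three dimensions
        self.block2 = conv_block(embed_dim // 2, embed_dim, stride=(2, 2, 2))

    def forward(self, x):
        # x: (B, 1, D, H, W) -> (B, C, D/2, H/4, W/4)
        return self.block2(self.block1(x))
```

With an input patch of size 16 × 64 × 64, this produces a feature map of size 8 × 16 × 16, matching the stated 2× depth and 4× in-plane downsampling ratios.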
Next, the feature map of the three modalities is fed into four stages of structurally identical Transformer blocks.
Each Transformer block contains a multimodal fusion block, a shift-window block, and a patch merging operation. As