Joint localization and classification of breast tumors on
ultrasound images using a novel auxiliary attention-based
framework
Zong Fan^a, Ping Gong^b, Shanshan Tang^b, Christine U. Lee^b, Xiaohui Zhang^a, Pengfei
Song^a,c,e,f, Shigao Chen^b, and Hua Li^a,d,e
^a Department of Bioengineering, University of Illinois at Urbana-Champaign, IL, USA
^b Mayo Clinic, Rochester, Minnesota, USA
^c Department of Elect. & Computer Eng., University of Illinois at Urbana-Champaign, IL, USA
^d Department of Radiation Oncology, Washington University in St. Louis, MO, USA
^e Cancer Center at Illinois, Urbana, IL, USA
^f Beckman Institute, University of Illinois at Urbana-Champaign, IL, USA
ABSTRACT
Automatic breast lesion detection and classification is an important task in computer-aided diagnosis, in which
breast ultrasound (BUS) imaging is a common and frequently used screening tool. Recently, a number of deep
learning-based methods have been proposed for joint localization and classification of breast lesions using BUS
images. In these methods, features extracted by a shared network trunk are appended by two independent
network branches to achieve classification and localization. Improper information sharing might cause conflicts
in feature optimization in the two branches and lead to performance degradation. Also, these methods generally
require large amounts of pixel-level annotated data for model training. To overcome these limitations, we
proposed a novel joint localization and classification model based on the attention mechanism and disentangled
semi-supervised learning strategy. The model used in this study is composed of a classification network and
an auxiliary lesion-aware network. By use of the attention mechanism, the auxiliary lesion-aware network can
optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification
and localization performance. The disentangled semi-supervised learning strategy only requires incomplete training
datasets for model training. The proposed modularized framework allows flexible network replacement to
be generalized for various applications. Experimental results on two different breast ultrasound image datasets
demonstrate the effectiveness of the proposed method. The impacts of various network factors on model performance
are also investigated to gain deep insights into the designed framework.
Send correspondence to Hua Li. E-mail: li.hua@wustl.edu
arXiv:2210.05762v1 [eess.IV] 11 Oct 2022
Keywords: Breast Tumor Detection; Ultrasound Imaging; Multi-task Learning; Semi-supervised Learning;
Attention Mechanism; Computer-aided Diagnosis
1. INTRODUCTION
Breast cancer is the most frequent cause of death in women aged 35-55 years.1,2 Ultrasonography
screening is a common tool for early diagnosis of breast lesions due to its cost-effectiveness and safety.3 During
the past years, a number of computer-aided diagnosis methods have been proposed to assist in lesion localization
and classification. These automatic screening methods range from conventional machine learning techniques4–6
to deep learning (DL) techniques.7–9 Particularly, DL-based methods have achieved great success due to their
powerful learning capabilities.9,10 Generally, the localization task is represented as either lesion segmentation
or detection. The segmentation task aims to accurately delineate the lesion regions,11,12 while the detection
task is to simply predict lesion locations in the form of bounding boxes.9,13 DL-based classification methods are
developed to stratify lesions into subgroups to help clinicians design appropriate treatment strategies.
Multi-task learning (MTL) models have been proposed to conduct these two tasks simultaneously to increase
data efficiency without sacrificing the performance of each task. They generally consist of a shared feature
extractor with two appended task-specific branches. Given the discriminative features extracted by the shared
feature extractor, the classification branch differentiates lesion types and the localization branch confines the
potential lesion regions.14 This shared design leverages semantic information to decode the lesion
type and location simultaneously, which can reduce the risk of overfitting and improve learning efficiency and
robustness.14–20 Zhou et al. employed an encoder-decoder network (VNet) for the segmentation task, while
the intermediate feature maps were reused for classification by a lightweight network only consisting of a global
average pooling layer and three fully-connected layers.17 Chowdary et al. employed a residual U-Net architecture
for segmentation and shared the intermediate feature maps for classification with a two-layer fully-connected (FC)
network.18 Some MTL methods simplify pixel-wise lesion segmentation to detection in the form of bounding
boxes instead.9,13 For instance, Cao et al. studied and compared the performance of several popular object
detection methods such as YOLO and SSD.13 Shin et al. employed Faster-RCNN for detection and classification
of breast tumors on a BUS image dataset.9
In these traditional MTL methods, balancing the degree of the information shared between the two different
tasks is critical to ensure the model performance.21 Improper information sharing may decrease the model
performance due to the conflicts in optimizing extracted features between the two tasks with different objectives.14
Loss weighting is a common method that balances and tunes the individual loss functions for different tasks.14,22
Liu et al. proposed an adaptive weighting method to dynamically balance the learning rate of each task.22
Gradient demodulation methods modify training gradients to alleviate the conflicts of learning dynamics between
different tasks.14,23 Sinha et al. employed adversarial training to align the gradients from different tasks to
boost the model performance.23 Also, the attention mechanism is widely employed to consider the correlation
of different tasks to improve model learning capability and performance.19,20,24,25 This technique enables the
extracted features to focus on more discriminative information. Xu et al. proposed a self-attention module on
top of a U-Net to utilize the context information to improve both breast tumor segmentation performance and
classification performance.20
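For reference, the loss weighting mentioned above is commonly written as a weighted sum of the per-task losses. The symbols below are illustrative assumptions and are not taken from the cited works; adaptive weighting methods can be read as schemes that update the two weights during training rather than fixing them in advance.

```latex
\mathcal{L}_{\text{total}} = \lambda_{\text{cls}}\,\mathcal{L}_{\text{cls}} + \lambda_{\text{loc}}\,\mathcal{L}_{\text{loc}},
\qquad \lambda_{\text{cls}},\ \lambda_{\text{loc}} \ge 0
```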
Conventional DL-based methods for detection and classification of breast lesions typically require large amounts
of fully-annotated training images, which are time-consuming and laborious to produce. Semi-supervised learning
techniques, which automatically exploit incomplete or inexact supervision, can alleviate the burden of annotating
localization labels and improve model performance.26–30 Han et al. adopted a generative adversarial network (GAN)-based
model for semi-supervised breast tumor segmentation on BUS images.27 This method employed an evaluation
network to assess the quality of the segmentation outcomes in order to enhance the model performance through
an adversarial training strategy. Mittal et al. proposed a dual-branch model using the consistency regularization
technique, which combined a GAN-based network for segmentation and a multi-label teacher network to filter
false positive segmentation predictions to improve model performance.28
In this study, we proposed a novel MTL method to address these two problems for joint breast tumor
lesion localization and classification based on a disentangled semi-supervised learning strategy and attention
mechanism. The proposed model was composed of a shared feature extractor appended by an auxiliary lesion-aware
network and a classifier for joint lesion localization and classification. Multiple attention modules were
employed in the auxiliary lesion-aware network to optimize the multi-scale intermediate feature maps from the
feature extractor. This design can leverage the intensity-level and geometrical-level knowledge and improve the
representativeness of the extracted feature maps by focusing on the lesion region via the channel and spatial
attention, thus achieving better performance in both classification and localization tasks. The disentangled
semi-supervised learning strategy was designed for training the model by use of incomplete training datasets
with partial lesion location annotations. It was adopted from the pseudo-labeling method, which is a simple
but efficient semi-supervised learning approach.31 By assigning high-confidence pseudo-labels to unlabeled images
to increase the number of labeled training samples, this learning strategy can significantly reduce the burden
on data annotation and fully utilize the unlabeled data to improve localization performance. In addition, the
proposed model was modularized so that each network component can be flexibly configured and adjusted to
satisfy specific objectives in various potential applications. Experimental results on two breast ultrasound image
datasets demonstrate the effectiveness of the proposed method. The impacts of various network factors on model
performance are also investigated to gain deep insights into the designed model.
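The pseudo-labeling idea described above can be illustrated with a minimal sketch. It assumes that a trained model has already produced a per-pixel lesion probability mask for each unlabeled image, that confidence is scored as the mean of max(p, 1 - p) over pixels, and that a threshold of 0.9 separates confident from uncertain predictions. The function names, the confidence score, and the threshold are all hypothetical choices for illustration and are not the paper's actual selection rule.

```python
from typing import List, Tuple

Mask = List[List[float]]


def mask_confidence(prob_mask: Mask) -> float:
    """Mean per-pixel confidence, where confidence is max(p, 1 - p)."""
    values = [max(p, 1.0 - p) for row in prob_mask for p in row]
    return sum(values) / len(values)


def select_pseudo_labels(
    predictions: List[Mask], threshold: float = 0.9
) -> List[Tuple[int, List[List[int]]]]:
    """Return (image index, binarized mask) pairs for confident predictions.

    The threshold and the 0.5 binarization cut-off are illustrative assumptions.
    """
    selected = []
    for idx, prob_mask in enumerate(predictions):
        if mask_confidence(prob_mask) >= threshold:
            binary = [[1 if p >= 0.5 else 0 for p in row] for row in prob_mask]
            selected.append((idx, binary))
    return selected


# Two toy 2x2 predictions: the first is confident, the second is not.
confident = [[0.98, 0.02], [0.95, 0.03]]
uncertain = [[0.55, 0.45], [0.60, 0.40]]
pseudo = select_pseudo_labels([confident, uncertain], threshold=0.9)
```

In this toy case only the first prediction passes the threshold, so only its binarized mask would be added to the labeled training pool.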
The remainder of the paper is organized as follows. Section 2 describes the proposed lesion-aware classification
method. Section 3 describes the dataset and implementation details of the proposed method, and the
experimental results are shown in Section 4. The discussion and conclusion are presented in Section 5 and
Section 6, respectively.
2. METHODS
2.1 The proposed framework architecture
As shown in Figure 1, the proposed framework consists of a feature extractor (FEX) followed by a classifier
and an auxiliary lesion-aware network (LA-Net). The FEX extracts feature maps from hierarchical intermediate
convolutional layers, which contain rich multi-scale lesion-relevant information. These features are shared for
the classification and localization task via two branches. The auxiliary LA-Net branch leverages these multi-
scale feature maps via multiple attention modules to predict the potential lesion location. The classifier branch
predicts the class labels by combining the learned lesion localization knowledge from LA-Net with extracted
feature maps of FEX through a self-attention module. This design explicitly utilizes correlation and alleviates
the potential optimization conflicts between the classification task and localization task. The model architecture
is discussed in terms of each network as follows.
Figure 1: The proposed framework for joint localization and classification. FEX: feature extractor; C: classifier;
$\{f_1, \ldots, f_n\}$: the extracted intermediate feature maps of FEX; MAM: mask attention module, where $\otimes$ denotes
element-wise multiplication and $\oplus$ denotes element-wise addition.
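The caption states only that the MAM combines element-wise multiplication and element-wise addition. One plausible reading, sketched below as an assumption rather than the authors' confirmed design, is a residual-style attention in which the top feature map is multiplied by the predicted lesion mask and then added back to itself. The function name and array layout (channels first) are hypothetical.

```python
import numpy as np


def mask_attention(f_n: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Assumed MAM form: f_att = (f_n * mask) + f_n.

    f_n:  feature map of shape (C, S, S)
    mask: lesion probability mask of shape (S, S), values in [0, 1]
    """
    if f_n.shape[1:] != mask.shape:
        raise ValueError("mask must match the spatial size of f_n")
    # Broadcast the single-channel mask over all C channels.
    return f_n * mask[None, :, :] + f_n


f_n = np.ones((3, 2, 2))
mask = np.array([[1.0, 0.0], [0.5, 0.0]])
f_att = mask_attention(f_n, mask)
```

Under this reading, locations with a high lesion probability are amplified while background locations keep their original activations, so the mask cannot zero out features entirely.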
2.1.1 Feature extractor (FEX)
The FEX is hierarchically structured with $n$ convolutional blocks. Given a 2-dimensional BUS image $X \in \mathbb{R}^{M \times N}$,
a set of feature maps $\{f_1, \ldots, f_n\}$ is extracted, one from each convolutional block, and used as the
input of LA-Net. Only the top extracted feature map $f_n$ with the lowest dimensionality is used as the input of
the classifier to improve the classification robustness. The mathematical representation of FEX is:
$$\{f_1, f_2, \ldots, f_n\} = F(X, \Theta_F), \tag{1}$$
where $F$ represents the mapping function of FEX parameterized by $\Theta_F$.
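To make the shape of the FEX output concrete, the following is a minimal NumPy sketch of a hierarchical extractor that returns one feature map per block. Each "block" here is a stand-in (a 1x1 channel projection, ReLU, and 2x2 average pooling) with random weights; the block design, channel counts, and image size are assumptions for illustration and do not reflect the backbone used in the paper.

```python
import numpy as np


def conv_block(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Stand-in block: 1x1 projection, ReLU, then 2x2 average pooling.

    x: (C_in, H, W); weight: (C_out, C_in). Returns (C_out, H // 2, W // 2).
    """
    y = np.einsum("oc,chw->ohw", weight, x)
    y = np.maximum(y, 0.0)
    c, h, w = y.shape
    y = y[:, : h - h % 2, : w - w % 2]  # drop an odd trailing row/column
    return y.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def feature_extractor(image: np.ndarray, weights: list) -> list:
    """Apply the blocks in sequence and collect every intermediate map."""
    x = image[None, :, :]  # add a channel axis: (1, M, N)
    features = []
    for w in weights:
        x = conv_block(x, w)
        features.append(x)
    return features  # [f_1, ..., f_n]


rng = np.random.default_rng(0)
channels = [1, 4, 8, 16]  # hypothetical channel widths for n = 3 blocks
weights = [rng.standard_normal((channels[i + 1], channels[i])) for i in range(3)]
image = rng.random((32, 32))  # toy 32x32 "BUS image"
features = feature_extractor(image, weights)
shapes = [f.shape for f in features]
```

The last entry of `features` plays the role of $f_n$: it has the smallest spatial size and is the only map that would be passed to the classifier, while the full list would be passed to LA-Net.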
2.1.2 Lesion-aware network (LA-Net)
The architecture of LA-Net is shown in Figure 2. A shared convolutional block attention module (CBAM)32 is
employed to process the set of extracted multi-scale intermediate features $\{f_1, \ldots, f_n\}$ from FEX. CBAM includes
a channel attention module (CAM) and a spatial attention module (SAM). By fusing channel attention and
spatial attention, CBAM can exploit channel and spatial knowledge to enhance the informativeness of extracted
feature maps. Next, a feature fusion module is designed to fuse the CBAM-processed feature maps and predict
the lesion location mask. First, a convolutional (Conv) layer is employed to squeeze the multiple channels of
the input feature map into a single channel, distilling the learned knowledge to highlight the lesion regions of
interest (ROIs). These squeezed feature maps are resized to the size of feature $f_n$ and concatenated into a feature
map $f_{\text{merge}}$ with dimensions $S_n \times S_n \times n$, where $S_n$ is the width and height of $f_n$. The merged feature map
is processed by a Conv layer, a batch normalization (BN) layer, and a sigmoid activation layer to output a lesion
location mask $Y_{\text{pixel}}$ with the shape $S_n \times S_n$. Each pixel of the lesion location mask indicates its probability
of belonging to the lesion or background,
$$P(Y_{\text{pixel}} \mid X) = D(\{f_1, \ldots, f_n\}, \Theta_D), \tag{2}$$
where $D$ represents the mapping function of LA-Net, which is parameterized by trainable parameters $\Theta_D$.
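The feature fusion steps described above (squeeze to one channel, resize to $S_n \times S_n$, concatenate, then Conv, BN, and sigmoid) can be sketched as follows. This is a simplified illustration under several assumptions: the CBAM stage is omitted, both Conv layers are treated as 1x1 convolutions with random weights, resizing is done by block averaging (which requires each map size to be a multiple of $S_n$), and BN is replaced by standardizing the single output map. All function names are hypothetical.

```python
import numpy as np


def squeeze_channels(f: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution from C channels to one channel: (C, H, W) -> (H, W)."""
    return np.tensordot(w, f, axes=([0], [0]))


def resize_to(m: np.ndarray, size: int) -> np.ndarray:
    """Downsample a square map to (size, size) by block averaging."""
    h, w = m.shape
    if h % size or w % size:
        raise ValueError("map size must be a multiple of the target size")
    return m.reshape(size, h // size, size, w // size).mean(axis=(1, 3))


def fuse(features, squeeze_weights, merge_weights, eps=1e-5):
    """Predict an (S_n, S_n) lesion probability mask from multi-scale maps."""
    size = features[-1].shape[-1]  # S_n, taken from the top feature map
    squeezed = [
        resize_to(squeeze_channels(f, w), size)
        for f, w in zip(features, squeeze_weights)
    ]
    f_merge = np.stack(squeezed, axis=-1)  # (S_n, S_n, n)
    logits = f_merge @ merge_weights  # 1x1 conv over the n stacked maps
    # Stand-in for batch normalization: standardize the single output map.
    logits = (logits - logits.mean()) / np.sqrt(logits.var() + eps)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


rng = np.random.default_rng(0)
features = [rng.random((4, 16, 16)), rng.random((8, 8, 8)), rng.random((16, 4, 4))]
squeeze_weights = [rng.standard_normal(f.shape[0]) for f in features]
merge_weights = rng.standard_normal(len(features))
mask = fuse(features, squeeze_weights, merge_weights)
```

With three toy feature maps whose top map is 4x4, the result is a 4x4 array of values strictly between 0 and 1, matching the $S_n \times S_n$ shape of $Y_{\text{pixel}}$ in Eq. (2).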
2.1.3 Classifier
The last extracted feature map $f_n$ of FEX is first enhanced by $P(Y_{\text{pixel}})$ through a mask-attention module
(MAM), as shown in Figure 1. Then the classifier uses the enhanced feature map $f_{\text{att}}$ to predict the probability
of the input BUS image belonging to each of $K$ classes, $P(Y_{\text{cls}} \mid X)$. With the MAM design, the lesion location
information is introduced into the classification process, which improves the classification performance by learning