Joint localization and classification of breast tumors on ultrasound images using a novel auxiliary attention-based framework


Zong Fan^a, Ping Gong^b, Shanshan Tang^b, Christine U. Lee^b, Xiaohui Zhang^a, Pengfei Song^{a,c,e,f}, Shigao Chen^b, and Hua Li^{a,d,e}

^a Department of Bioengineering, University of Illinois at Urbana-Champaign, IL, USA

^b Mayo Clinic, Rochester, Minnesota, USA

^c Department of Elect. & Computer Eng., University of Illinois at Urbana-Champaign, IL, USA

^d Department of Radiation Oncology, Washington University in St. Louis, MO, USA

^e Cancer Center at Illinois, Urbana, IL, USA

^f Beckman Institute, University of Illinois at Urbana-Champaign, IL, USA

ABSTRACT

Automatic breast lesion detection and classification is an important task in computer-aided diagnosis, in which breast ultrasound (BUS) imaging is a common and frequently used screening tool. Recently, a number of deep learning-based methods have been proposed for joint localization and classification of breast lesions using BUS images. In these methods, features extracted by a shared network trunk are passed to two independent network branches that perform classification and localization. Improper information sharing might cause conflicts in feature optimization between the two branches and lead to performance degradation. Also, these methods generally require large amounts of pixel-level annotated data for model training. To overcome these limitations, we propose a novel joint localization and classification model based on the attention mechanism and a disentangled semi-supervised learning strategy. The model is composed of a classification network and an auxiliary lesion-aware network. By use of the attention mechanism, the auxiliary lesion-aware network can optimize multi-scale intermediate feature maps and extract rich semantic information to improve classification and localization performance. The disentangled semi-supervised learning strategy requires only incomplete training datasets for model training. The proposed modularized framework allows flexible network replacement so that it can be generalized to various applications. Experimental results on two different breast ultrasound image datasets demonstrate the effectiveness of the proposed method. The impacts of various network factors on model performance are also investigated to gain deep insights into the designed framework.

Send correspondence to Hua Li. E-mail: li.hua@wustl.edu

arXiv:2210.05762v1 [eess.IV] 11 Oct 2022

Keywords: Breast Tumor Detection; Ultrasound Imaging; Multi-task Learning; Semi-supervised Learning; Attention Mechanism; Computer-aided Diagnosis

1. INTRODUCTION

Breast cancer is the most frequent cause of death in women aged 35 to 55 years [1,2]. Ultrasonography screening is a common tool for early diagnosis of breast lesions due to its cost-effectiveness and safety [3]. In recent years, a number of computer-aided diagnosis methods have been proposed to assist in lesion localization and classification. These automatic screening methods range from conventional machine learning techniques [4–6] to deep learning (DL) techniques [7–9]. In particular, DL-based methods have achieved great success due to their powerful learning capabilities [9,10]. Generally, the localization task is formulated as either lesion segmentation or detection. The segmentation task aims to accurately delineate the lesion regions [11,12], while the detection task simply predicts lesion locations in the form of bounding boxes [9,13]. DL-based classification methods are developed to stratify lesions into subgroups to help clinicians design appropriate treatment strategies.

Multi-task learning (MTL) models have been proposed to conduct these two tasks simultaneously to increase data efficiency without sacrificing the performance of each task. They generally consist of a shared feature extractor with two appended task-specific branches. Given the discriminative features extracted by the shared feature extractor, the classification branch differentiates lesion types and the localization branch localizes the potential lesion regions [14]. This shared design leverages semantic information to decode the lesion type and location simultaneously, which can reduce the risk of overfitting and improve learning efficiency and robustness [14–20]. Zhou et al. employed an encoder-decoder network (VNet) for the segmentation task, while the intermediate feature maps were reused for classification by a lightweight network consisting only of a global average pooling layer and three fully-connected layers [17]. Chowdary et al. employed a residual U-Net architecture for segmentation and shared the intermediate feature maps for classification with a two-layer fully-connected (FC) network [18]. Some MTL methods simplify pixel-wise lesion segmentation to detection in the form of bounding boxes instead [9,13]. For instance, Cao et al. studied and compared the performance of several popular object detection methods such as YOLO and SSD [13]. Shin et al. employed Faster-RCNN for detection and classification of breast tumors on a BUS image dataset [9].

In these traditional MTL methods, balancing the degree of information shared between the two different tasks is critical to ensure the model performance [21]. Improper information sharing may decrease the model performance due to conflicts in optimizing the extracted features between two tasks with different objectives [14]. Loss weighting is a common method that balances and tunes the individual loss functions for different tasks [14,22]. Liu et al. proposed an adaptive weighting method to dynamically balance the learning rate of each task [22]. Gradient demodulation methods modify training gradients to alleviate the conflicts of learning dynamics between different tasks [14,23]. Sinha et al. employed adversarial training to align the gradients from different tasks to boost the model performance [23]. Also, the attention mechanism is widely employed to exploit the correlation between different tasks to improve model learning capability and performance [19,20,24,25]. This technique enables the extracted features to focus on more discriminative information. Xu et al. proposed a self-attention module on top of a U-Net to utilize the context information to improve both breast tumor segmentation and classification performance [20].

Conventional DL-based methods for detection and classification of breast lesions typically require large amounts of fully annotated training images, which are time-consuming and labor-intensive to produce. Semi-supervised learning techniques, which automatically exploit incomplete or inexact supervision to improve model performance, can alleviate the burden of annotating localization labels [26–30]. Han et al. adopted a generative adversarial network (GAN)-based model for semi-supervised breast tumor segmentation on BUS images [27]. This method employed an evaluation network to assess the quality of the segmentation outcomes in order to enhance the model performance through an adversarial training strategy. Mittal et al. proposed a dual-branch model using the consistency regularization technique, which combined a GAN-based network for segmentation and a multi-label teacher network to filter false positive segmentation predictions to improve model performance [28].

In this study, we proposed a novel MTL method for joint breast tumor localization and classification that addresses these two problems by means of a disentangled semi-supervised learning strategy and the attention mechanism. The proposed model was composed of a shared feature extractor followed by an auxiliary lesion-aware network and a classifier for joint lesion localization and classification. Multiple attention modules were employed in the auxiliary lesion-aware network to optimize the multi-scale intermediate feature maps from the feature extractor. This design can leverage intensity-level and geometry-level knowledge and improve the representativeness of the extracted feature maps by focusing on the lesion region via channel and spatial attention, thus achieving better performance in both classification and localization tasks. The disentangled semi-supervised learning strategy was designed for training the model by use of incomplete training datasets with partial lesion location annotations. It was adapted from the pseudo-labeling method, which is a simple but efficient semi-supervised learning approach [31]. By assigning high-confidence pseudo-labels to unlabeled images to increase the number of labeled training samples, this learning strategy can significantly reduce the burden of data annotation and fully utilize the unlabeled data to improve localization performance. In addition, the proposed model was modularized so that each network component can be flexibly configured and adjusted to satisfy specific objectives in various potential applications. Experimental results on two breast ultrasound image datasets demonstrate the effectiveness of the proposed method. The impacts of various network factors on model performance are also investigated to gain deep insights into the designed model.
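As a rough illustration of the generic pseudo-labeling idea mentioned above, the following minimal sketch keeps only those unlabeled samples whose predicted confidence exceeds a threshold and assigns them the most probable class. The function name `select_pseudo_labels`, the threshold of 0.9, and the toy probabilities are illustrative assumptions; this is plain confidence-thresholded pseudo-labeling, not the disentangled strategy proposed in this study.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Return (index, label) pairs for predictions confident enough to keep.

    probabilities: list of per-class probability lists, one per unlabeled sample.
    threshold: hypothetical confidence cut-off (an assumption, not from the paper).
    """
    selected = []
    for index, class_probs in enumerate(probabilities):
        confidence = max(class_probs)
        if confidence >= threshold:
            # The most probable class becomes the pseudo-label.
            selected.append((index, class_probs.index(confidence)))
    return selected


# Toy predictions for three unlabeled images over two classes.
unlabeled_probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92]]
pseudo_labeled = select_pseudo_labels(unlabeled_probs)
print(pseudo_labeled)
```

In this toy run the second image is left unlabeled because neither class reaches the assumed threshold, which is how low-confidence predictions are kept out of the enlarged training set.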

The remainder of the paper is organized as follows. Section 2 describes the proposed lesion-aware classification method. Section 3 describes the dataset and implementation details of the proposed method, and the experimental results are shown in Section 4. The discussion and conclusion are presented in Section 5 and Section 6, respectively.

2. METHODS

2.1 The proposed framework architecture

As shown in Figure 1, the proposed framework consists of a feature extractor (FEX) followed by a classifier and an auxiliary lesion-aware network (LA-Net). The FEX extracts feature maps from hierarchical intermediate convolutional layers, which contain rich multi-scale lesion-relevant information. These features are shared by the classification and localization tasks via two branches. The auxiliary LA-Net branch leverages these multi-scale feature maps via multiple attention modules to predict the potential lesion location. The classifier branch predicts the class labels by combining the learned lesion localization knowledge from LA-Net with the extracted feature maps of FEX through a self-attention module. This design explicitly utilizes the correlation between the classification and localization tasks and alleviates the potential optimization conflicts between them. Each network of the model architecture is described below.

Figure 1: The proposed framework for joint localization and classification. FEX: feature extractor; C: classifier; $\{f_1, \ldots, f_n\}$: the extracted intermediate feature maps of FEX; MAM: mask attention module, where ⊗ denotes element-wise multiplication and ⊕ denotes element-wise addition.
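The caption of Figure 1 names the two operations used inside the MAM, element-wise multiplication and element-wise addition. One plausible reading, stated here as an assumption because the exact formula is not given in this part of the text, is a residual form in which the feature map is multiplied by the lesion mask and then added back to itself. The sketch below uses nested lists, and the names `mask_attention`, `f_n`, and `lesion_mask` are hypothetical.

```python
def mask_attention(feature_map, mask):
    """Apply an assumed residual mask attention: f_att = f * mask + f.

    feature_map: nested list with shape [channels][height][width].
    mask: nested list with shape [height][width], values in [0, 1].
    The same mask is reused for every channel.
    """
    return [
        [
            [value * mask[r][c] + value for c, value in enumerate(row)]
            for r, row in enumerate(channel)
        ]
        for channel in feature_map
    ]


# Toy 2-channel, 2x2 feature map and a mask that highlights the left column.
f_n = [[[1.0, 2.0], [3.0, 4.0]],
       [[0.5, 0.5], [0.5, 0.5]]]
lesion_mask = [[1.0, 0.0],
               [1.0, 0.0]]
f_att = mask_attention(f_n, lesion_mask)
print(f_att)
```

Under this assumed form, locations with a high mask value are amplified while locations with a mask value of zero keep their original response, so background features are de-emphasized rather than discarded.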

2.1.1 Feature extractor (FEX)

The FEX is hierarchically structured with $n$ convolutional blocks. Given a 2-dimensional BUS image $X \in \mathbb{R}^{M \times N}$, a set of feature maps $\{f_1, \ldots, f_n\}$ is extracted, one by each convolutional block, and used as the input of LA-Net. Only the top extracted feature map $f_n$, which has the lowest dimensionality, is used as the input of the classifier to improve the classification robustness. The mathematical representation of FEX is:

$$\{f_1, f_2, \ldots, f_n\} = F(X, \Theta_F), \tag{1}$$

where $F$ represents the mapping function of FEX parameterized by $\Theta_F$.
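Equation (1) can be read as a function that returns one feature map per block, with each map coarser than the previous one. The toy sketch below only illustrates this hierarchy of scales: each learned convolutional block is replaced by a 2x2 average pooling, which is an assumption made for brevity, and the maps are single-channel. The names `average_pool_2x2` and `extract_features` are hypothetical.

```python
def average_pool_2x2(image):
    """Halve height and width by averaging non-overlapping 2x2 windows."""
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            window = (image[r][c] + image[r][c + 1]
                      + image[r + 1][c] + image[r + 1][c + 1])
            row.append(window / 4.0)
        pooled.append(row)
    return pooled


def extract_features(image, num_blocks):
    """Return [f_1, ..., f_n]; each toy 'block' is a 2x2 average pooling."""
    features = []
    current = image
    for _ in range(num_blocks):
        current = average_pool_2x2(current)
        features.append(current)
    return features


# An 8x8 toy image with constant intensity 1.0 and n = 3 blocks.
toy_image = [[1.0] * 8 for _ in range(8)]
feature_maps = extract_features(toy_image, num_blocks=3)
feature_sizes = [(len(f), len(f[0])) for f in feature_maps]
print(feature_sizes)
```

All maps in the returned list would feed the LA-Net, while only the last and smallest one would feed the classifier, mirroring the roles of $\{f_1, \ldots, f_n\}$ and $f_n$ described above.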

2.1.2 Lesion-aware network (LA-Net)

The architecture of LA-Net is shown in Figure 2. A shared convolutional block attention module (CBAM) [32] is employed to process the set of extracted multi-scale intermediate features $\{f_1, \ldots, f_n\}$ from FEX. CBAM includes a channel attention module (CAM) and a spatial attention module (SAM). By fusing channel attention and spatial attention, CBAM can exploit channel and spatial knowledge to enhance the informativeness of the extracted feature maps. Next, a feature fusion module is designed to fuse the CBAM-processed feature maps and predict the lesion location mask. First, a convolutional (Conv) layer is employed to squeeze the multiple channels of each input feature map into a single channel, distilling the learned knowledge to highlight the lesion regions of interest (ROIs). These squeezed feature maps are resized to the size of feature $f_n$ and concatenated into a feature map $f_{\text{merge}}$ with dimensions $S_n \times S_n \times n$, where $S_n$ is the width and height of $f_n$. The merged feature map is processed by a Conv layer, a batch normalization (BN) layer, and a sigmoid activation layer to output a lesion location mask $Y_{\text{pixel}}$ with the shape $S_n \times S_n$. Each pixel of the lesion location mask indicates its probability of belonging to the lesion as opposed to the background:

$$P(Y_{\text{pixel}} \mid X) = D(\{f_1, \ldots, f_n\}, \Theta_D), \tag{2}$$

where $D$ represents the mapping function of LA-Net, which is parameterized by trainable parameters $\Theta_D$.
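The fusion steps described above (squeeze each feature map to one channel, resize to the size of $f_n$, merge, then apply a sigmoid) can be sketched as follows. This is a simplified illustration under several assumptions: the inputs are taken to be already CBAM-processed, both Conv layers are modeled as 1x1 convolutions with hand-picked weights, resizing uses nearest-neighbour sampling, and the BN layer is omitted. All function names and weights are hypothetical.

```python
import math


def squeeze_channels(feature_map, weights):
    """1x1 convolution: weighted sum over channels -> single-channel map."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [sum(w * ch[r][c] for w, ch in zip(weights, feature_map))
         for c in range(width)]
        for r in range(height)
    ]


def resize_nearest(single_map, size):
    """Nearest-neighbour resize of a square map to size x size."""
    scale = len(single_map) / size
    return [
        [single_map[int(r * scale)][int(c * scale)] for c in range(size)]
        for r in range(size)
    ]


def fuse_to_mask(feature_maps, squeeze_weights, fusion_weights, bias=0.0):
    """Squeeze, resize to the size of f_n, merge, and apply a sigmoid."""
    target = len(feature_maps[-1][0])  # S_n, the height of the last map
    squeezed = [
        resize_nearest(squeeze_channels(f, w), target)
        for f, w in zip(feature_maps, squeeze_weights)
    ]
    mask = []
    for r in range(target):
        row = []
        for c in range(target):
            # 1x1 fusion over the n stacked single-channel maps (BN omitted).
            logit = bias + sum(
                w * s[r][c] for w, s in zip(fusion_weights, squeezed)
            )
            row.append(1.0 / (1.0 + math.exp(-logit)))
        mask.append(row)
    return mask


# Toy inputs: f_1 has 1 channel at 4x4, f_2 has 2 channels at 2x2.
f_1 = [[[2.0, 2.0, -2.0, -2.0],
        [2.0, 2.0, -2.0, -2.0],
        [-2.0, -2.0, -2.0, -2.0],
        [-2.0, -2.0, -2.0, -2.0]]]
f_2 = [[[1.0, -1.0], [-1.0, -1.0]],
       [[1.0, -1.0], [-1.0, -1.0]]]
mask = fuse_to_mask([f_1, f_2], [[1.0], [0.5, 0.5]], [1.0, 1.0])
print(mask)
```

In this toy case both scales respond strongly in the top-left region, so the fused mask assigns that location a probability above 0.5 and the remaining locations a probability below 0.5.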

2.1.3 Classifier

The last extracted feature map $f_n$ of FEX is first enhanced by $P(Y_{\text{pixel}})$ through a mask-attention module (MAM), as shown in Figure 1. Then the classifier uses the enhanced feature map $f_{\text{att}}$ to predict the probability of the input BUS image belonging to each of $K$ classes, $P(Y_{\text{cls}} \mid X)$. With the MAM design, the lesion location information is introduced into the classification process, which improves the classification performance by learning
