MASKED AUTOENCODERS FOR LOW DOSE CT DENOISING
Dayang Wang, Yongshun Xu, Shuo Han, Hengyong Yu*
Department of Electrical and Computer Engineering,
University of Massachusetts Lowell, Lowell, MA, USA, 01854.
ABSTRACT
Low-dose computed tomography (LDCT) reduces the
X-ray radiation dose but compromises image quality with
increased noise and artifacts. A plethora of transformer models have
been developed recently to improve LDCT image quality.
However, the success of a transformer model relies on a large
amount of paired noisy and clean data, which is often unavail-
able in clinical applications. In computer vision and natural
language processing fields, masked autoencoders (MAE)
have been proposed as an effective label-free self-pretraining
method for transformers, owing to their excellent feature
representation ability. Here, we redesign the classical encoder-decoder
learning model to match the denoising task and apply
it to the LDCT denoising problem. The MAE can leverage the
unlabeled data and facilitate structural preservation for the
LDCT denoising model when there are insufficient ground
truth data. Experiments on the Mayo dataset validate that the
MAE can boost the transformer’s denoising performance and
relieve the dependence on the ground truth data.
Index Terms— Low-dose CT, Masked Autoencoder,
Self-pretraining, Transformer.
1. INTRODUCTION
In recent years, LDCT has become mainstream in clinical
applications due to the substantial X-ray radiation risk of
normal-dose CT (NDCT). However, LDCT compromises
image quality and diagnostic value, which has been a
barrier to its applications. To overcome this issue, numerous
deep learning models have been explored along this direction [1–5].
Transformer Models. In the past few years, transformer
models have gained considerable attention due to their ability to
capture global contextual information. Several studies have
also applied them to LDCT denoising. Zhang et
al. employed a Gaussian filter to decompose LDCT images
into high/low frequency parts [4]. Then, a transformer module
was applied to the two parts for feature inference and contextual
information fusion. Wang et al. proposed a more advanced
convolution-free encoder-decoder transformer; the static and
dynamic latent learning behavior of the model was then revealed
by analyzing the attention maps [5]. Recently, the Swin
transformer has become very popular as a backbone architecture
for a variety of downstream tasks [6, 7]. SwinIR is one
of its important adaptations for image denoising, super-resolution,
and artifact reduction [8]. Notably, some SwinIR-based
transformer models or modules have also been proposed
for CT image enhancement/denoising and have achieved
excellent results [9]. Therefore, the application of the MAE
in this paper is also based on SwinIR.

*Email: hengyong-yu@ieee.org. This work has been submitted to the
IEEE for possible publication. Copyright may be transferred without notice,
after which this version may no longer be accessible.
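The high/low-frequency decomposition attributed to Zhang et al. [4] can be sketched as follows. This is a minimal illustration, not the authors' implementation; the filter width `sigma` is an assumed hyperparameter, and the subsequent transformer modules are omitted:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose_frequency(image: np.ndarray, sigma: float = 1.5):
    """Split a CT slice into low- and high-frequency parts with a Gaussian filter.

    The low-frequency part is the Gaussian-blurred image; the high-frequency
    part is the residual, so low + high reconstructs the input (up to
    floating-point error).
    """
    low = gaussian_filter(image, sigma=sigma)
    high = image - low
    return low, high

# Toy example: a random array standing in for a 512x512 LDCT slice.
img = np.random.rand(512, 512).astype(np.float32)
low, high = decompose_frequency(img)
```

Because the decomposition is a simple residual split, the two branches can be processed independently and summed back without information loss.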
Masked Autoencoder. Recently, the MAE has emerged as an
excellent self-supervised learning strategy for various computer
vision tasks [10]. He et al. revealed that a small fraction
of an image suffices to infer complex and holistic visual
concepts such as semantics [10]. Zhou et al. showed that the MAE
can enhance medical image segmentation and classification
performance [11]. However, these works address high-level
vision tasks. Currently, there is no exploration of the MAE as
a self-pretraining strategy for low-level vision tasks such as LDCT
denoising.
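The masking step at the heart of the MAE [10] can be sketched as follows. The 16-pixel patch size and 75% mask ratio are illustrative assumptions (75% follows He et al.'s default), and the transformer encoder/decoder that reconstructs the hidden patches is omitted:

```python
import numpy as np

def random_mask_patches(image: np.ndarray, patch: int = 16,
                        mask_ratio: float = 0.75, seed: int = 0):
    """Split an image into non-overlapping patches and zero out a random subset.

    Returns the masked image and a boolean grid (True = patch hidden).
    An MAE is trained to reconstruct the hidden patches from the visible ones.
    """
    h, w = image.shape
    gh, gw = h // patch, w // patch
    n = gh * gw
    rng = np.random.default_rng(seed)
    hidden = np.zeros(n, dtype=bool)
    hidden[rng.choice(n, size=int(n * mask_ratio), replace=False)] = True
    masked = image.copy()
    for idx in np.flatnonzero(hidden):
        r, c = divmod(idx, gw)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return masked, hidden.reshape(gh, gw)

# Toy example: a random array standing in for a 512x512 LDCT slice.
img = np.random.rand(512, 512).astype(np.float32)
masked, hidden = random_mask_patches(img)
```

Because the reconstruction target is the (unlabeled) input itself, this pretraining step needs no paired NDCT ground truth.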
Therefore, we are motivated to explore the potential of
the MAE in LDCT denoising. We believe this study is significant
for two reasons: i) There are usually insufficient ground
truth data in clinical applications, and the self-pretraining
paradigm of the MAE can reap the benefits of the unlabeled
data. Therefore, it is an ideal choice for LDCT
denoising. ii) Structural preservation and enhancement
are crucial goals in LDCT denoising. Moreover, the anatomical
structures in a CT image are connected with each other
mechanically and functionally. The MAE can aggregate
contextual information to infer the masked structures.
Therefore, it can potentially strengthen the dependence
between anatomical regions and compensate for the loss
of structure.
2. METHODS
2.1. Masked Autoencoder Design
As shown in the model flowchart in Fig. 1, the MAE learning
paradigm is employed for LDCT denoising. In the self-pretraining
stage, the model is trained to map LDCT images to
LDCT images to learn the structural relationships, and
arXiv:2210.04944v1 [eess.IV] 10 Oct 2022