A Self-attention Guided Multi-scale Gradient GAN for Diversified X-ray Image Synthesis*

Muhammad Muneeb Saad (ORCID 0000-0002-0204-0597), Mubashir Husain Rehmani (ORCID 0000-0002-3565-7390), and Ruairi O'Reilly (ORCID 0000-0001-7990-3461)

Munster Technological University, Cork, Ireland

muhammad.saad@mycit.ie, mubashir.rehmani@mtu.ie, ruairi.oreilly@mtu.ie

Abstract. Imbalanced image datasets are common in the domain of biomedical image analysis. Biomedical images contain diverse features that are significant in predicting the targeted diseases. Generative Adversarial Networks (GANs) are used to address the data limitation problem by generating synthetic images. Training challenges such as mode collapse, non-convergence, and instability degrade a GAN's performance in synthesizing diverse, high-quality images. In this work, MSG-SAGAN, an attention-guided multi-scale gradient GAN architecture, is proposed. It models the long-range dependencies between biomedical image features and improves training performance through a flow of multi-scale gradients at multiple resolutions in the layers of the generator and discriminator models. The intent is to reduce the impact of mode collapse and stabilize GAN training by combining an attention mechanism with multi-scale gradient learning for diversified X-ray image synthesis. The Multi-scale Structural Similarity Index Measure (MS-SSIM) and the Fréchet Inception Distance (FID) are used to identify the occurrence of mode collapse and to evaluate the diversity of the generated synthetic images. The proposed architecture is compared with the multi-scale gradient GAN (MSG-GAN) to assess the diversity of the generated synthetic images. Results indicate that MSG-SAGAN outperforms MSG-GAN in synthesizing diversified images, as evidenced by the MS-SSIM and FID scores.

Keywords: GANs · Self-Attention · Multi-scale Gradients · Mode Collapse · Diversity · X-ray images · Synthesis · MS-SSIM · FID

1 Introduction

Generative adversarial networks (GANs) are generative models used for image synthesis in the computer vision domain [1]. A GAN is composed of a generator model and a discriminator model. The generator takes a random vector as input and generates a noisy image. This image is passed to the discriminator. The discriminator distinguishes the generated images from the real images and provides gradient feedback to the generator. The generator updates its learning of the feature distribution of real images through this feedback. GANs are trained adversarially: the generator and the discriminator each try to improve their performance based on the other's feedback [2].

* This work is supported by the Munster Technological University's Risam Scholarship Award.

arXiv:2210.06334v2 [eess.IV] 12 Nov 2022

GANs face difficulty in synthesizing images with complex and diverse features. This problem arises from technical challenges that occur during training, including mode collapse, non-convergence, and instability [3]. Mode collapse refers to the generator producing identical synthetic images regardless of the diversity of the real images, while non-convergence and instability unbalance the training due to the vanishing gradient problem. These problems limit the utility of GANs for image datasets with a diverse range of salient image features [4]. In general, GANs are built on convolutional neural networks (CNNs), which can fail to capture image features such as the texture, geometry, position, and color of objects. One reason could be that CNNs rely mostly on convolutional features when modeling the dependencies across diverse image regions [5].

In the domain of biomedical imaging, the diverse features of the images are important to consider in disease recognition and computer-based diagnosis tasks [6]. These features carry significant information about the disease being diagnosed and analyzed. GANs have been used for biomedical image synthesis across several imaging modalities, including X-ray, Computed Tomography (CT), Magnetic Resonance (MR), Ultrasound, and Positron Emission Tomography (PET) [7]. However, generating diversified synthetic images remains a significant barrier that limits the utility of GANs in the biomedical imaging domain.

X-ray images are widely used to diagnose diseases in the human body. They contain a wide spectrum of disease features that help physicians monitor diseases more accurately [8]. Publicly available X-ray image datasets are limited and imbalanced [9]. Image synthesis is a potential means of augmenting and balancing these datasets. In image synthesis, synthetic images are produced by replicating the actual distributions of image features. This makes it a significant alternative to traditional augmentation approaches such as geometric transformations [10]. GANs have demonstrated remarkable advancements in image synthesis in the biomedical imaging domain [11].

State-of-the-art GANs such as ProGAN [12], StyleGAN [13], and MSG-GAN [14] have been used for biomedical image synthesis. These architectures have demonstrated significant performance in generating diverse images [15]. Techniques such as minibatch discrimination, pixel-wise normalization (PixNorm), progressive growing of GAN layers, and spectral normalization have also been used to enhance the diversity of synthetic images. The multi-scale gradient technique makes the discriminator's learning more robust for classifying real and synthetic images [16]. Biomedical images contain salient disease features such as the location, size, color, and structure of the disease region of interest. These features are sensitive and important for the prediction and analysis of the disease. GANs learn images through convolutional features without paying attention to these salient features when generating synthetic images. However, it is important for a GAN to learn these biomedical image features during training.

In the domain of image recognition, self-attention is considered a leading approach for focusing on the diverse features of images [17]. Self-attention measures the relative importance of features from their feature maps and combines them globally through a weighted scoring function. Consequently, it helps the model focus on the features that are significant for a specific application task [5].
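The weighted scoring described above can be illustrated with a minimal sketch. It assumes a dot-product scoring function followed by a softmax, plus a scaled residual connection, as is common in attention-based GANs. The function names, the projection matrices `w_q`, `w_k`, `w_v`, and the scale `gamma` are hypothetical and are not taken from the paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v, gamma=1.0):
    """Apply self-attention to a flattened feature map.

    x        : N positions x C channels
    w_q, w_k : C x D projection matrices (hypothetical learned weights)
    w_v      : C x C projection matrix, so the output matches x in shape
    gamma    : scale on the attended features before the residual add
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    k_t = [list(col) for col in zip(*k)]
    # attention[i][j] is how much position i draws on position j.
    attention = [softmax(row) for row in matmul(q, k_t)]
    attended = matmul(attention, v)
    # Residual connection: every position keeps its own features and
    # adds a globally weighted mix of all other positions.
    out = [[gamma * a + xi for a, xi in zip(a_row, x_row)]
           for a_row, x_row in zip(attended, x)]
    return out, attention

# Four spatial positions with two channels each; identity projections for clarity.
identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
output, weights = self_attention(features, identity, identity, identity)
```

Each row of `weights` sums to one, so every output position is a convex combination of all positions in the feature map. This is what lets distant image regions influence one another, in contrast to the limited receptive field of a single convolution.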

To address the training challenges of GANs, several GAN variants based on attention mechanisms have been proposed to improve training performance on natural and biomedical images [17]. Self-attention improves the learning of the generator and discriminator models in generating diversified biomedical images [18].

The loss function also has a great impact on balancing and stabilizing the training of a GAN so that it generates realistic synthetic images. Loss functions such as WGAN-GP, hinge, and relativistic hinge losses have shown a reasonable improvement in generating diversified synthetic images [19]. In particular, the hinge loss has shown a great capacity to improve a GAN's learning to generate diverse biomedical images [20].
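For concreteness, the hinge loss and a relativistic variant can be sketched as follows. This assumes the relativistic average formulation, in which each score is compared against the mean score of the opposite class. The text above does not say which variant is meant, so treat the choice and the function names as illustrative. The inputs are raw (unbounded) discriminator scores for a batch of real and generated images.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def hinge_d_loss(real_scores, fake_scores):
    """Standard hinge loss for the discriminator."""
    return (mean([max(0.0, 1.0 - r) for r in real_scores])
            + mean([max(0.0, 1.0 + f) for f in fake_scores]))

def hinge_g_loss(fake_scores):
    """Standard hinge loss for the generator."""
    return -mean(fake_scores)

def _relativistic(real_scores, fake_scores):
    """Score each sample relative to the average score of the other class."""
    real_rel = [r - mean(fake_scores) for r in real_scores]
    fake_rel = [f - mean(real_scores) for f in fake_scores]
    return real_rel, fake_rel

def ra_hinge_d_loss(real_scores, fake_scores):
    """Relativistic average hinge loss for the discriminator."""
    real_rel, fake_rel = _relativistic(real_scores, fake_scores)
    return (mean([max(0.0, 1.0 - r) for r in real_rel])
            + mean([max(0.0, 1.0 + f) for f in fake_rel]))

def ra_hinge_g_loss(real_scores, fake_scores):
    """Relativistic average hinge loss for the generator (roles swapped)."""
    real_rel, fake_rel = _relativistic(real_scores, fake_scores)
    return (mean([max(0.0, 1.0 + r) for r in real_rel])
            + mean([max(0.0, 1.0 - f) for f in fake_rel]))

# A discriminator that separates the two classes by a wide margin.
real = [2.0, 3.0]
fake = [-2.0, -3.0]
d_loss = ra_hinge_d_loss(real, fake)
g_loss = ra_hinge_g_loss(real, fake)
```

In the relativistic form the generator loss depends on the real scores as well as the fake ones. The generator is therefore rewarded for making its images look more realistic than real images on average, not merely for raising its own scores.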

The occurrence of mode collapse and the diversity of synthetic images are assessed with the Multi-scale Structural Similarity Index Measure (MS-SSIM) and the Fréchet Inception Distance (FID). The MS-SSIM score detects a lack of diversity by measuring perceptual similarity among synthetic images, while the FID score provides a distance between the feature distributions of real and synthetic images [21].
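The following sketch shows the intuition behind both scores using deliberately simplified stand-ins. It computes a single-scale SSIM from global image statistics instead of the windowed, multi-scale MS-SSIM, and a Fréchet distance that assumes diagonal covariances and takes arbitrary feature vectors instead of Inception-v3 features with full covariance matrices. All function names are hypothetical, and the numbers it produces are not comparable to published MS-SSIM or FID values.

```python
import math
from itertools import combinations

def global_ssim(a, b, data_range=1.0):
    """Single-scale SSIM from global statistics of two flattened images."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return (((2 * mu_a * mu_b + c1) * (2 * cov + c2))
            / ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))

def mean_pairwise_similarity(images):
    """Average similarity over all image pairs; values near 1 suggest mode collapse."""
    pairs = list(combinations(images, 2))
    return sum(global_ssim(a, b) for a, b in pairs) / len(pairs)

def frechet_distance_diag(real_feats, fake_feats):
    """Frechet distance between two Gaussians fitted per feature dimension.

    Simplification: covariances are assumed diagonal, so the matrix
    square root in the full FID formula reduces to a scalar square root.
    """
    total = 0.0
    for d in range(len(real_feats[0])):
        r = [f[d] for f in real_feats]
        g = [f[d] for f in fake_feats]
        mu_r, mu_g = sum(r) / len(r), sum(g) / len(g)
        var_r = sum((v - mu_r) ** 2 for v in r) / len(r)
        var_g = sum((v - mu_g) ** 2 for v in g) / len(g)
        total += (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * math.sqrt(var_r * var_g)
    return total

# A collapsed generator repeats one image; a diverse one does not.
collapsed = [[0.1, 0.5, 0.9, 0.3]] * 3
diverse = [[0.1, 0.5, 0.9, 0.3], [0.9, 0.5, 0.1, 0.7], [0.5, 0.5, 0.5, 0.5]]
collapsed_score = mean_pairwise_similarity(collapsed)
diverse_score = mean_pairwise_similarity(diverse)
```

The two scores are read in opposite directions. A lower pairwise similarity among synthetic images indicates more diversity, and a lower Fréchet distance indicates that the synthetic feature distribution lies closer to the real one.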

This work contributes a novel GAN architecture for diversified X-ray image synthesis. The generator and discriminator models use multi-scale gradient learning, in which gradient information flows through their intermediate layers at multiple image resolutions during training. A self-attention layer is added to both the generator and the discriminator to learn the long-range dependencies of X-ray image features during multi-scale gradient training. The relativistic hinge loss is used to stabilize training and generate diverse synthetic images. The MS-SSIM and FID scores are used to evaluate the diversity of the generated images.
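One ingredient of multi-scale gradient training can be sketched under an assumption taken from the MSG-GAN design cited earlier, not from anything this section spells out: the discriminator receives an image at every intermediate resolution, so each real image has to be downsampled to match the generator's intermediate outputs. The 2x2 average pooling, the function names, and the 4x4 minimum size are illustrative choices, and the sketch assumes square images whose side length is a power of two.

```python
def avg_pool_2x2(image):
    """Halve the resolution of a 2-D image (list of rows) by 2x2 average pooling."""
    height, width = len(image), len(image[0])
    return [[(image[i][j] + image[i][j + 1]
              + image[i + 1][j] + image[i + 1][j + 1]) / 4.0
             for j in range(0, width, 2)]
            for i in range(0, height, 2)]

def real_image_pyramid(image, min_size=4):
    """Return the image at every resolution from min_size up to full size.

    The coarsest scale comes first, mirroring the order in which a
    generator would emit its intermediate outputs.
    """
    pyramid = [image]
    while len(pyramid[-1]) > min_size:
        pyramid.append(avg_pool_2x2(pyramid[-1]))
    return pyramid[::-1]

# A flat 16x16 "X-ray" yields 4x4, 8x8, and 16x16 versions.
xray = [[0.5] * 16 for _ in range(16)]
scales = real_image_pyramid(xray)
sizes = [len(level) for level in scales]
```

With one real image per scale, a discriminator can compare real and synthetic inputs at each resolution. That is what allows gradients to reach the generator's intermediate layers directly instead of only through its final output.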

2 Related Work

Several GAN models with modified architectures and loss functions have been proposed to improve the generation of diverse synthetic images. GAN architectures have been proposed with novel discriminators and generators tailored to their application domains. The performance of GANs has been improved by embedding new convolutional layers, normalization, and regularization techniques in the generator and discriminator models [29][30][31]. Several loss functions
