A Self-attention Guided Multi-scale Gradient
GAN for Diversified X-ray Image Synthesis*
Muhammad Muneeb Saad1[0000-0002-0204-0597], Mubashir Husain
Rehmani1[0000-0002-3565-7390], and Ruairi O’Reilly1[0000-0001-7990-3461]
Munster Technological University Cork, Ireland
muhammad.saad@mycit.ie, mubashir.rehmani@mtu.ie, and
ruairi.oreilly@mtu.ie
Abstract. Imbalanced image datasets are common in the domain of
biomedical image analysis. Biomedical images contain diversified
features that are significant in predicting targeted diseases.
Generative Adversarial Networks (GANs) are utilized to address the data
limitation problem via the generation of synthetic images. Training
challenges such as mode collapse, non-convergence, and instability degrade
a GAN’s performance in synthesizing diversified and high-quality
images. In this work, MSG-SAGAN, an attention-guided multi-scale
gradient GAN architecture, is proposed to model the long-range
dependencies of biomedical image features and to improve
training performance using a flow of multi-scale gradients at multiple
resolutions in the layers of the generator and discriminator models. The
intent is to reduce the impact of mode collapse and stabilize GAN
training using an attention mechanism with multi-scale gradient learning for
diversified X-ray image synthesis. The Multi-scale Structural Similarity
Index Measure (MS-SSIM) and Fréchet Inception Distance (FID) are used
to identify the occurrence of mode collapse and evaluate the diversity
of the generated synthetic images. The proposed architecture is compared
with the multi-scale gradient GAN (MSG-GAN) to assess the diversity
of generated synthetic images. Results indicate that MSG-SAGAN
outperforms MSG-GAN in synthesizing diversified images, as evidenced
by the MS-SSIM and FID scores.
Keywords: GANs · Self-Attention · Multi-scale Gradients · Mode Collapse ·
Diversity · X-ray images · Synthesis · MS-SSIM · FID.
*This work is supported by the Munster Technological University’s Risam Scholarship
Award.
arXiv:2210.06334v2 [eess.IV] 12 Nov 2022
1 Introduction
Generative adversarial networks (GANs) are generative models used for image
synthesis in the computer vision domain [1]. A GAN is composed of a generator
model and a discriminator model. The generator takes a random vector as input
and generates a noisy image. This image is passed to the discriminator model,
which distinguishes the generated images from the real images and
provides gradient feedback to the generator. The generator model updates its
learning of the feature distribution of real images through the feedback provided
by the discriminator. GANs are trained adversarially: the generator
and the discriminator each try to improve their performance based on the other’s
feedback [2].
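The adversarial training described above is commonly written as the minimax objective of the original GAN formulation [1]. The notation below is the standard one and is not defined in this paper: G is the generator, D the discriminator, p_data the distribution of real images, and p_z the prior over the random input vector z.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator maximizes V by scoring real images high and generated images low, while the generator minimizes it by producing images that the discriminator scores as real.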
GANs face difficulty in synthesizing images with complex and diverse
features. This problem arises from technical challenges that occur during the
training of GANs, including mode collapse, non-convergence, and
instability [3]. Mode collapse refers to the generator producing identical
synthetic images regardless of the diversity of the real images, while
non-convergence and instability unbalance the training due to the vanishing
gradient problem. These problems limit the utility of GANs for image datasets
with a diverse range of salient image features [4]. In general, GANs are designed
with convolutional neural networks (CNNs), which can fail to capture image
features such as the texture, geometry, position, and color of objects. One reason
could be that CNNs mostly rely on convolutional features when modeling the
dependencies across diverse image regions [5].
In the domain of biomedical imaging, the diverse features of biomedical im-
ages are important to consider in disease recognition or computer-based diagnosis
tasks [6]. These diverse features contain significant information about the disease
being diagnosed and analyzed. GANs have been utilized for biomedical image
synthesis. Several imaging modalities such as X-rays, Computed Tomography
(CT), Magnetic Resonance (MR), Ultrasound, and Positron Emission Tomogra-
phy (PET) have utilized GANs to generate synthetic samples [7]. The generation
of diversified synthetic images is a significant barrier for GANs that limits their
utility in the biomedical imaging domain.
X-ray images are widely utilized to diagnose diseases in the human body. X-
ray images contain a wide spectrum of disease features that help physicians to
monitor diseases more accurately [8]. Publicly available X-ray image datasets are
limited and imbalanced [9]. Image synthesis is a potential means of augmenting
and balancing these X-ray datasets. In image synthesis, synthetic images are
produced by replicating the actual distributions of image features. This
method therefore offers an advantage over traditional augmentation approaches
such as geometrical transformations [10]. GANs have demonstrated remarkable
advancements in image synthesis in the biomedical imaging domain [11].
State-of-the-art GANs such as ProGAN [12], StyleGAN [13], and MSG-GAN
[14] have been used for biomedical image synthesis. These GAN architectures
have demonstrated significant performance in generating diverse images [15].
Minibatch discrimination, PixNorm, progressive growth of GAN layers, and
spectral normalization techniques have also been utilized to enhance the
diversity of synthetic images. The multi-scale gradient technique makes the
discriminator’s learning more robust for the classification of real and synthetic
images [16]. Biomedical images contain salient disease features such as the
location, size, color, and structure of the disease region of interest. These features
are sensitive and important for the prediction and analysis of the disease. GANs
learn images through convolutional features without paying attention to these
salient features when generating synthetic images. However, it is important for a
GAN to learn these biomedical image features during the training process.
In the domain of image recognition, self-attention is considered one of the most
effective approaches for focusing on diverse features of images [17]. Self-attention
measures the relative information of features based on their feature maps and
combines them globally with a weighted scoring function. Consequently, it helps to
focus on the features that are significant for the specific application task [5].
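The weighted combination described above can be sketched in a few lines of plain Python. This is only an illustrative dot-product self-attention over a flattened feature map, loosely in the style of self-attention GAN layers; it is not the layer used in this paper. The function names, the projection matrices `Wq`, `Wk`, `Wv`, and the residual scale `gamma` are all assumptions made for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def self_attention(features, Wq, Wk, Wv, gamma=0.0):
    """features: N position vectors of length C (a flattened feature map).

    Wq and Wk project to a shared query/key space; Wv is assumed to be
    C x C so the attended output can be added back to the input.
    """
    queries = [matvec(Wq, f) for f in features]
    keys = [matvec(Wk, f) for f in features]
    values = [matvec(Wv, f) for f in features]
    channels = len(values[0])
    out = []
    for i, q in enumerate(queries):
        # Score position i against every position, then normalize.
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Global weighted sum of the value vectors.
        attended = [sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(channels)]
        # Residual connection: gamma controls how much attention is mixed in.
        out.append([gamma * a + x for a, x in zip(attended, features[i])])
    return out
```

With `gamma=0.0` the layer returns its input unchanged, so in a learned setting the network could start from purely convolutional features and gradually mix in global context as `gamma` grows.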
To address the training challenges of GANs, several GAN variants based on
attention mechanisms have attempted to improve the training performance
of GANs for natural and biomedical images [17]. Self-attention improves the
learning of generator and discriminator models in generating diversified biomed-
ical images [18].
The loss function also has a great impact on a GAN’s training performance,
helping to balance and stabilize training for generating realistic
synthetic images. Loss functions such as WGAN-GP, hinge, and relativistic
hinge losses have shown a reasonable improvement in generating diversified
synthetic images [19]. In particular, the hinge loss has shown a great capacity to
improve a GAN’s learning to generate diverse biomedical images [20].
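For reference, the hinge and relativistic average hinge losses are often written as below. This is a minimal sketch of the commonly cited forms operating on raw discriminator scores; the exact formulation used in this paper may differ, and the function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def hinge_d_loss(real_scores, fake_scores):
    """Discriminator hinge loss: push real scores above +1, fake below -1."""
    return (mean([max(0.0, 1.0 - r) for r in real_scores])
            + mean([max(0.0, 1.0 + f) for f in fake_scores]))


def hinge_g_loss(fake_scores):
    """Generator hinge loss: raise the discriminator's score on fakes."""
    return -mean(fake_scores)


def rahinge_d_loss(real_scores, fake_scores):
    """Relativistic average hinge loss for the discriminator.

    Each score is compared against the mean score of the opposite class
    rather than against a fixed target.
    """
    r_mean, f_mean = mean(real_scores), mean(fake_scores)
    return (mean([max(0.0, 1.0 - (r - f_mean)) for r in real_scores])
            + mean([max(0.0, 1.0 + (f - r_mean)) for f in fake_scores]))


def rahinge_g_loss(real_scores, fake_scores):
    """Relativistic average hinge loss for the generator (roles swapped)."""
    r_mean, f_mean = mean(real_scores), mean(fake_scores)
    return (mean([max(0.0, 1.0 + (r - f_mean)) for r in real_scores])
            + mean([max(0.0, 1.0 - (f - r_mean)) for f in fake_scores]))
```

When the discriminator separates the two classes by a wide margin, both discriminator losses drop to zero while the generator losses stay large, which is the signal that drives the generator's update.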
The occurrence of mode collapse and the diversity of synthetic images are assessed
using the Multi-scale Structural Similarity Index Measure (MS-SSIM) and Fréchet
Inception Distance (FID). The MS-SSIM score can detect a lack of diversity
in synthetic images using perceptual similarity measures, while the FID score
provides a distance between the feature distributions of real and synthetic images
[21].
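The FID is usually defined as the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The symbols below follow the standard definition and do not appear in this excerpt: (μ_r, Σ_r) and (μ_g, Σ_g) are the mean and covariance of the real and generated feature sets.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2}
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g
  - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

A lower FID indicates that the generated feature distribution is closer to the real one, and the score is zero when the two fitted Gaussians coincide.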
This work contributes a novel GAN architecture for diversified X-ray image
synthesis. The generator and discriminator models use multi-scale gradient
learning: gradient information is learned at their intermediate layers
using multiple image resolutions during GAN training.
A self-attention layer is added to the generator and discriminator
models to learn the long-range dependencies of X-ray image features
during training through the multi-scale gradient approach. The relativistic hinge
loss is used to stabilize the training and generate diverse synthetic images. The
MS-SSIM and FID scores are used to evaluate the diversity of generated images.
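Multi-scale gradient learning relies on comparing images at several resolutions. One way a real image could be turned into a matching multi-resolution set is sketched below, assuming the real image is repeatedly downsampled to the resolutions at which the generator's intermediate layers emit images. The 2x2 average pooling, the `min_size` cutoff, and the function names are assumptions for illustration, not details taken from this paper.

```python
def avg_pool_2x2(img):
    """Halve the resolution of a 2-D image (list of rows) by 2x2 averaging."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1]
              + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


def image_pyramid(img, min_size=4):
    """Return [full, half, quarter, ...] down to min_size x min_size."""
    pyramid = [img]
    while (len(pyramid[-1]) // 2 >= min_size
           and len(pyramid[-1][0]) // 2 >= min_size):
        pyramid.append(avg_pool_2x2(pyramid[-1]))
    return pyramid


# Usage: a 16x16 image yields scales of 16, 8, and 4 pixels per side.
image = [[float(x + y) for x in range(16)] for y in range(16)]
scales = image_pyramid(image)
print([len(level) for level in scales])
```

Each level of the pyramid would be paired with the generator output of the same resolution, so the discriminator's feedback can reach the generator's intermediate layers directly instead of only through its final output.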
2 Related Work
Several GAN models with modified architectures and loss functions have been
proposed to improve the generation of diverse synthetic images. GAN archi-
tectures have been proposed with novel discriminators and generators based
on the application domains. The performance of GANs has been improved by
embedding new convolutional layers, normalization, and regularization techniques
in the generator and discriminator models [29][30][31]. Several loss functions