A Comparative Study on 1.5T - 3T MRI
Conversion through Deep Neural Network Models
Binhua Liao†, Yani Chen, Zhewei Wang, Charles D. Smith‡, Jundong Liu
School of Electrical Engineering and Computer Science, Ohio University, USA
†College of Mathematics and Statistics, Huazhong Normal University, PR China
‡Department of Neurology, University of Kentucky, USA
Abstract—In this paper, we explore the capabilities of a number
of deep neural network models in generating whole-brain 3T-
like MR images from clinical 1.5T MRIs. The models include
a fully convolutional network (FCN) method and three state-
of-the-art super-resolution solutions, ESPCN [26], SRGAN [17]
and PRSR [7]. The FCN solution, U-Convert-Net, carries out
mapping of 1.5T-to-3T slices through a U-Net-like architecture,
with 3D neighborhood information integrated through a multi-
view ensemble. The pros and cons of the models, as well as the associated evaluation metrics, are examined through experiments and discussed in depth. To the best of our knowledge, this study
is the first work to evaluate multiple deep learning solutions for
whole-brain MRI conversion, as well as the first attempt to utilize an FCN/U-Net-like structure for this purpose.
Index Terms—FCN, MRI, modality conversion, U-Net, U-
Convert-Net, GAN, SRGAN
I. INTRODUCTION
Magnetic resonance imaging (MRI) is widely used in neuroimaging; its popularity stems from its non-invasive nature, high soft-tissue contrast, and the availability of safe intracellular contrast agents. Currently, 1.5 tesla (T) short-bore
MRI is the standard technology for clinical use. However,
3T (and even 7T) MRI scanners are becoming increasingly desirable, as they provide markedly clearer and more detailed images. Compared with 1.5T, 3T MR images have higher signal-to-noise ratios (SNR) and higher contrast-to-noise ratios (CNR) between gray and white matter. These properties make 3T MRI a better choice for brain tissue segmentation, as well as a generally preferred modality in neuroimaging studies.
While the availability of 3T MRI has increased significantly
over the past decade, the majority of clinical scanners across
the US are still 1.5T systems. Converting 1.5T images into 3T-like images, if achieved with high fidelity, would help physicians make better-informed diagnosis and treatment decisions. In addition, historical 1.5T MR images in various ongoing longitudinal studies could also be put to better use. One such example is the Alzheimer’s Disease Neuroimaging Initiative (ADNI) project – 1.5T was the major MRI modality in ADNI
1, the first stage of the project, but the acquisition switched
to 3T alone in later stages (ADNI GO, 2 and 3). Converting
1.5T images into 3T-like counterparts may allow the datasets generated in such studies to be delivered in a more uniform format.
Establishing a nonlinear spatially-varying intensity mapping
between two images is a challenging task. The efforts to tackle this problem can be traced back at least to the Image Analogies model [12], which relies on a nonparametric texture model [9] to learn the mapping from a single pair of input-output images. The emergence of the powerful deep learning paradigm in recent years has made the task more viable. Generative adversarial net-
works (GAN) [15], [17], [34], [37], [41] and pixel-RNN/CNN
[7] are among the models that have been applied for modality
conversion, producing impressive results.
The original GAN model by Goodfellow et al. [11] was
designed to generate images that are similar to the training
samples. Several later solutions, including DualGAN [37], CycleGAN [41] and DiscoGAN [16], follow a similar idea to train image-to-image translation with unpaired natural images. The CycleGAN model has been adopted to synthesize CT images from MRIs [34]. While flexible and broadly applicable, this group of solutions relies on the distribution of real samples instead of paired inputs/outputs, even when the latter are available. Consequently, their results can be rather unstable and far from uniformly positive [41]. Some GANs, including pix2pix [15] and PAN [31], take paired training samples to trade flexibility for stability.
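For concreteness, a paired objective in the spirit of pix2pix can be sketched as an adversarial term plus a pixel-wise reconstruction term over the paired ground truth. The PyTorch sketch below is our illustration under that assumption; the weight lambda_l1 and the function name are ours, not the exact formulation of [15].

    import torch
    import torch.nn.functional as F

    def paired_generator_loss(d_fake_logits, fake, target, lambda_l1=100.0):
        # Adversarial term: push the discriminator's verdict on the
        # generated image toward "real".
        adv = F.binary_cross_entropy_with_logits(
            d_fake_logits, torch.ones_like(d_fake_logits))
        # Paired reconstruction term: only possible because paired
        # input/output samples are available.
        rec = F.l1_loss(fake, target)
        return adv + lambda_l1 * rec

The reconstruction term is what stabilizes training: it anchors the generator to the paired target instead of only to the distribution of real samples.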
With paired input/output samples, MR modality conversion
could be implemented as a special case of super-resolution,
where one or multiple low-resolution images are combined
to generate images with higher spatial resolution. Traditional
super-resolution solutions include reconstruction-based meth-
ods [25], [27], [30], [35], and example-based methods [3],
[10], [14], [20], [24], [36], [39]. Under the deep learning
framework, numerous new super-resolution solutions have
recently been developed, in both the computer vision [7], [26] and medical image computing [1], [2], [8], [18], [28], [40] communities. SRGAN [17], a model designed to recover fine texture details even with large upscaling factors, is commonly regarded as one of the state-of-the-art solutions.
The fully convolutional network (FCN) proposed by Long et al. [19] was primarily designed for image segmentation, which
can also be regarded as a special type of modality mapping
– from gray-valued intensities to binary-valued labels. U-Net
[23] and its variants [4], [5], [29], [32], [33] follow the same idea as FCN and rely on skip connections to concatenate features from the contracting (convolution) and expanding (deconvolution) paths.
Fig. 1: The overall architecture of SRGAN for MRI conversion: generator and discriminator, with the corresponding kernel size (k), number of feature maps (n) and stride (s) indicated for each convolutional layer.
In theory, an FCN with a proper setup can potentially describe any intensity mapping between two modalities. However, this capacity of FCNs has yet to be explored for general-purpose modality conversion. It should be noted that Nie et al. [21] used a convolutional network for MR-to-CT conversion, but their network structure is not FCN/U-Net equivalent, as it utilizes no pooling, skip connections, or contracting and expanding components.
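To make the contracting/expanding pattern concrete, the following is a minimal U-Net-like sketch in PyTorch with a single skip connection; it illustrates the general structure only, and is not the U-Convert-Net configuration used in our experiments.

    import torch
    import torch.nn as nn

    def conv_block(c_in, c_out):
        # Two 3x3 convolutions with ReLU, the basic U-Net building block.
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

    class TinyUNet(nn.Module):
        """Minimal U-Net-like intensity mapper: one contracting and one
        expanding level, joined by a concatenating skip connection."""
        def __init__(self):
            super().__init__()
            self.enc = conv_block(1, 32)            # contracting path
            self.down = nn.MaxPool2d(2)
            self.mid = conv_block(32, 64)
            self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
            self.dec = conv_block(64, 32)           # expanding path (after concat)
            self.out = nn.Conv2d(32, 1, 1)          # per-pixel intensity regression

        def forward(self, x):                       # x: (N, 1, H, W) 1.5T slice
            e = self.enc(x)
            m = self.mid(self.down(e))
            d = self.dec(torch.cat([self.up(m), e], dim=1))  # skip connection
            return self.out(d)                      # 3T-like slice, same size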
In this paper, we explore the capability of a number of
super-resolution (SR) and segmentation models in handling
modality conversion. More specifically, we adopt SR models
including ESPCN [26], SRGAN and PRSR [7], and modify
Chen’s segmentation model [5], to convert 1.5T whole-brain
MR images into 3T-like counterparts. Experiments are conducted with ADNI
data. To the best of our knowledge, this study is the first
work to compare and evaluate multiple deep learning solutions,
based on various performance metrics, for whole-brain MRI
conversion.
II. METHOD
The MR conversion models analyzed in this study are adapted from SR and segmentation solutions. In this section, we introduce them in detail.
A. Modified from Super-Resolution Solutions
SRGAN is designed to generate 4× upscaled, photo-realistic natural images with high perceptual quality. Focusing on recovering fine texture details in upscaled images, SRGAN adopts a perceptual loss function that consists of an adversarial loss and a content loss. As a super-resolution solution, SRGAN produces outputs whose sizes differ from those of the inputs. To suit our MRI conversion task, we remove one upsampling layer to make the input and output of equal size.
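A minimal PyTorch sketch of the resulting equal-size generator is given below. The single-channel input/output and the omission of the long skip connection of Fig. 1 are illustrative simplifications, not the exact configuration used in our experiments.

    import torch.nn as nn

    class ResBlock(nn.Module):
        """SRGAN-style residual block: conv-BN-PReLU-conv-BN plus identity."""
        def __init__(self, ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.PReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

        def forward(self, x):
            return x + self.body(x)

    def equal_size_generator(n_blocks=16):
        # No sub-pixel upsampling stage: every layer is stride-1, so the
        # output has the same spatial size as the 1.5T input.
        return nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.PReLU(),
            *[ResBlock() for _ in range(n_blocks)],
            nn.Conv2d(64, 1, 9, padding=4))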
As shown in Fig. 1, the modified SRGAN model for MR conversion consists of two major components: a generator and a discriminator. The generator is a deep residual network (ResNet) with skip connections, generating 3T-like images from 1.5T inputs. Its goal is to produce 3T-like images so realistic that they can fool the discriminator. The discriminator, on the other hand, is configured as a classification CNN, and its goal is to distinguish fake 3T from real 3T images as sharply as possible. With this setup, the generator can eventually learn to create outputs that are highly similar to real 3T images.
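For reference, the generator objective can be sketched as SRGAN's perceptual loss: a content term plus a lightly weighted adversarial term (the 10^-3 weight follows [17]). Using pixel-wise MSE in place of the VGG feature loss is a simplifying assumption in this sketch.

    import torch
    import torch.nn.functional as F

    def srgan_generator_loss(d_fake, fake, real):
        # Content loss: pixel-wise MSE between the generated 3T-like slice
        # and the real 3T slice (stand-in for the VGG feature loss of [17]).
        content = F.mse_loss(fake, real)
        # Adversarial loss: -log D(G(x)); D outputs sigmoid probabilities,
        # matching the Dense(1) + Sigmoid head in Fig. 1.
        adversarial = -torch.log(d_fake + 1e-8).mean()
        return content + 1e-3 * adversarial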
ESPCN uses two convolutional layers to extract feature maps from the low-resolution image, and then applies a sub-pixel convolution layer to transform these feature maps into an enlarged super-resolution image. The sub-pixel layer is designed to be very efficient, which reduces the computational complexity of the model and enables the system to achieve real-time super-resolution of 1080p videos on a single K2 GPU.
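The sub-pixel operation itself is simple: the last convolution produces c·r² channels, which are rearranged into an r-times larger c-channel image. A minimal sketch, with feature-map sizes following the original ESPCN configuration (64 and 32 maps); the single-channel setting is our assumption.

    import torch.nn as nn

    def espcn(scale=2, c=1):
        # Two feature-extraction convolutions, then a convolution to
        # c * scale**2 channels that PixelShuffle rearranges spatially.
        return nn.Sequential(
            nn.Conv2d(c, 64, 5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, 3, padding=1), nn.Tanh(),
            nn.Conv2d(32, c * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))  # (N, c*r^2, H, W) -> (N, c, r*H, r*W)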
PRSR is a super-resolution model built upon ResNet and PixelCNN (a probabilistic generative model) that is capable of enlarging a small input image into a wide range of plausible high-resolution images with large amplification factors. Experiments show that the transformed images are rated highly in human perceptual evaluations.
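Schematically, PRSR predicts a full categorical distribution over intensities for each output pixel: a conditioning CNN encodes the low-resolution input, a PixelCNN prior encodes the pixels generated so far, and their logits are summed before sampling. The step below is a toy sketch of that combination; the networks producing the logits are omitted, and the names are hypothetical.

    import torch

    def prsr_pixel_step(cond_logits, prior_logits, temperature=1.0):
        # cond_logits, prior_logits: (N, 256) logits over intensity values
        # for the single pixel currently being generated.
        probs = torch.softmax((cond_logits + prior_logits) / temperature, dim=-1)
        # Sample the pixel's intensity; repeating this in raster order
        # yields one plausible high-resolution image among many.
        return torch.multinomial(probs, num_samples=1)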