A Comparative Study on 1.5T - 3T MRI
Conversion through Deep Neural Network Models
Binhua Liao∗†, Yani Chen∗, Zhewei Wang∗, Charles D. Smith‡, Jundong Liu∗
∗School of Electrical Engineering and Computer Science, Ohio University, USA
†College of Mathematics and Statistics, Huazhong Normal University, PR China
‡Department of Neurology, University of Kentucky, USA
Abstract—In this paper, we explore the capabilities of a number
of deep neural network models in generating whole-brain 3T-
like MR images from clinical 1.5T MRIs. The models include
a fully convolutional network (FCN) method and three state-
of-the-art super-resolution solutions, ESPCN [26], SRGAN [17]
and PRSR [7]. The FCN solution, U-Convert-Net, carries out
mapping of 1.5T-to-3T slices through a U-Net-like architecture,
with 3D neighborhood information integrated through a multi-
view ensemble. The pros and cons of the models, as well as the
associated evaluation metrics, are examined through experiments
and discussed in depth. To the best of our knowledge, this study
is the first work to evaluate multiple deep learning solutions for
whole-brain MRI conversion, as well as the first attempt to utilize
an FCN/U-Net-like structure for this purpose.
Index Terms—FCN, MRI, modality conversion, U-Net, U-
Convert-Net, GAN, SRGAN
I. INTRODUCTION
Magnetic resonance imaging (MRI) is widely used in neuroimaging,
and its popularity is due to its non-invasive nature, high soft
tissue contrast, as well as the availability of safe intracellular
contrast agents. Currently, 1.5 tesla (T) short-bore MRI is the
standard technology for clinical use. However, 3T (and even 7T)
MRI scanners are becoming increasingly desirable, as they can
provide considerably clearer and more detailed images. Compared
with 1.5T, 3T MR images have higher signal-to-noise ratios (SNR)
and higher contrast-to-noise ratios (CNR) between gray and white
matter. The latter makes 3T MRI a better choice for brain tissue
segmentation, as well as a generally preferred modality in
neuroimaging studies.
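As a point of reference, the CNR between gray matter (GM) and white
matter (WM) is commonly defined as

\[ \mathrm{CNR}_{\mathrm{GM,WM}} = \frac{|S_{\mathrm{GM}} - S_{\mathrm{WM}}|}{\sigma_{\mathrm{noise}}}, \]

where $S_{\mathrm{GM}}$ and $S_{\mathrm{WM}}$ denote the mean signal
intensities of the two tissues and $\sigma_{\mathrm{noise}}$ is the
standard deviation of the background noise; SNR is analogously
$S_{\mathrm{tissue}}/\sigma_{\mathrm{noise}}$.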
While the availability of 3T MRI has increased significantly
over the past decade, the majority of clinical scanners across
the US are still 1.5T systems. Converting 1.5T images into 3T-like
images, if achieved with high fidelity, would help physicians make
better-informed diagnosis and treatment decisions. In addition,
historical 1.5T MR images in various ongoing longitudinal studies
could be put to better use. One such example is the Alzheimer's
Disease Neuroimaging Initiative (ADNI) project: 1.5T was the major
MRI modality in ADNI
1, the first stage of the project, but the acquisition switched
to 3T alone in later stages (ADNI GO, 2 and 3). Converting
1.5T images into 3T-like counterparts may allow the datasets
generated in such studies to be delivered in a more uniform
form.
Establishing a nonlinear spatially-varying intensity mapping
between two images is a challenging task. Efforts to tackle this
problem can be traced back to at least the Image Analogies model
[12], which relies on a nonparametric texture model [9] to learn
the mapping from a single pair of input-output images. The
emergence of the powerful deep learning paradigm in recent years
has made the task more viable. Generative adversarial net-
works (GAN) [15], [17], [34], [37], [41] and pixel-RNN/CNN
[7] are among the models that have been applied for modality
conversion, producing impressive results.
The original GAN model by Goodfellow et al. [11] was
designed to generate images that are similar to the training
samples. Several later solutions, including DualGAN [37],
CycleGAN [41] and DiscoGAN [16], adopt a similar idea
to train image-to-image translation with unpaired natural im-
ages. The CycleGAN model has been adopted to synthesize
CT images from MRIs [34]. While flexible and with broad
applicability, this group of solutions relies on the distribution
of real samples instead of paired inputs/outputs, even if the
latter are available. Consequently, the results from this group
can be rather unstable and far from uniformly positive [41].
Some GANs, including pix2pix [15] and PAN [31], take paired
training samples to trade flexibility for stability.
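To make the paired setting concrete, the sketch below (PyTorch
assumed; the generator G, discriminator D, and optimizers are
placeholders rather than the configurations evaluated in this paper)
shows one pix2pix-style training step: the discriminator scores
(input, output) pairs, and the generator is additionally penalized by
an L1 distance to the paired 3T target, which is what stabilizes
training relative to the unpaired setting.

# Minimal sketch of a paired (conditional) adversarial training step,
# with a 1.5T slice x_15t and its registered 3T counterpart y_3t.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

def paired_gan_step(G, D, opt_G, opt_D, x_15t, y_3t, lambda_l1=100.0):
    # ---- discriminator update: real pair vs. generated pair ----
    with torch.no_grad():
        y_fake = G(x_15t)
    d_real = D(torch.cat([x_15t, y_3t], dim=1))
    d_fake = D(torch.cat([x_15t, y_fake], dim=1))
    loss_D = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # ---- generator update: fool D and stay close to the paired 3T target ----
    y_fake = G(x_15t)
    d_fake = D(torch.cat([x_15t, y_fake], dim=1))
    loss_G = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(y_fake, y_3t)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()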
With paired input/output samples, MR modality conversion
could be implemented as a special case of super-resolution,
where one or multiple low-resolution images are combined
to generate images with higher spatial resolution. Traditional
super-resolution solutions include reconstruction-based meth-
ods [25], [27], [30], [35], and example-based methods [3],
[10], [14], [20], [24], [36], [39]. Under the deep learning
framework, numerous new super-resolution solutions have
recently been developed, in both the computer vision [7],
[26] and medical image computing [1], [2], [8], [18], [28],
[40] communities. SRGAN [17], a model designed to recover
fine texture details even at large upscaling factors, is
commonly regarded as one of the state-of-the-art solutions.
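As an illustration of the sub-pixel upscaling idea behind ESPCN [26],
the minimal PyTorch-style sketch below (layer widths are illustrative,
not the exact configuration compared in this study) performs all
convolutional feature extraction in the low-resolution space and
relies on a final PixelShuffle layer to rearrange channel groups into
a higher-resolution grid:

import torch.nn as nn

class ESPCNLike(nn.Module):
    # ESPCN-style network: convolutions at low resolution, then a
    # sub-pixel layer that turns r^2 channel groups into an r-times
    # larger spatial grid.
    def __init__(self, scale=2, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.Tanh(),
            nn.Conv2d(32, channels * scale ** 2, kernel_size=3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x):
        return self.shuffle(self.body(x))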
The fully convolutional network (FCN) proposed by Long et
al. [19] was primarily designed for image segmentation, which
can also be regarded as a special type of modality mapping
– from gray-valued intensities to binary-valued labels. U-Net
[23] and its variants [4], [5], [29], [32], [33] follow a similar
idea to the FCN and rely on skip connections to concatenate
features from the contracting (convolution) and expanding