Flexible Alignment Super-Resolution Network for
Multi-Contrast MRI
Yiming Liu, Member, IEEE, Mengxi Zhang, Bo Jiang, Bo Hou, Dan Liu, Jie Chen, Member, IEEE, Heqing Lian, Member, IEEE
Abstract—Magnetic resonance imaging (MRI) plays an essential role in clinical diagnosis by acquiring structural information about biological tissue. Recently, many multi-contrast MRI super-resolution networks have achieved good results. However, most studies ignore the impact of an inappropriate foreground scale and patch size in multi-contrast MRI, which can lead to inaccurate feature alignment. To tackle this problem, we propose the Flexible Alignment Super-Resolution Network (FASR-Net) for multi-contrast MRI Super-Resolution. The Flexible Alignment module of FASR-Net consists of two sub-modules for feature alignment. (1) The Single-Multi Pyramid Alignment (S-A) module handles the case where the low-resolution (LR) and reference (Ref) images have different scales. (2) The Multi-Multi Pyramid Alignment (M-A) module handles the case where the LR and Ref images have the same scale. In addition, we propose the Cross-Hierarchical Progressive Fusion (CHPF) module to fuse the features effectively, further improving image quality. Compared with other state-of-the-art methods, FASR-Net achieves the most competitive results on the FastMRI and IXI datasets. Our code will be available at https://github.com/yimingliu123/FASR-Net.
Index Terms—Magnetic resonance imaging, Multi-Contrast
Super-Resolution, feature fusion, feature alignment.
I. INTRODUCTION
Magnetic Resonance Imaging (MRI) is a non-invasive imaging modality that produces detailed three-dimensional anatomical images and plays a significant role in providing clear information about soft tissue structure. However, while magnetic resonance images are being acquired, patients have to endure physical and psychological discomfort, including irritating noise and acute anxiety. To make patients more comfortable, the time they spend in the strong magnetic field is often reduced at the expense of image quality. Super-Resolution (SR) reconstruction can improve image quality without changing the hardware settings, and it is extensively used as a post-processing tool to overcome the difficulty of obtaining high-resolution (HR) MRI scans [1], [2].
This work was supported by the China National Funds for Distinguished
Young Scientists under Grant 81601485. (Yiming Liu and Mengxi Zhang
contributed equally to this work. Corresponding authors: Jie Chen; Heqing
Lian.)
Yiming Liu and Heqing Lian are with the Xiao Ying AI Lab, Beijing, China (liuyiming@xiaoyingai.com, lianheqing@xiaoyingai.com).
Mengxi Zhang is with the School of Electrical and Information Engineering, Tianjin University, Tianjin, China (mengxizhang@tju.edu.cn).
Bo Jiang, Dan Liu and Bo Hou are with Peking Union Medical College Hospital, Beijing, China (jbpumch@163.com, liud2104@163.com, houbo97@pumch.cn).
Jie Chen is with the School of Electronics Engineering and Computer Science, Peking University, China (jiechen2019@pku.edu.cn).
Research on image SR falls into two categories: single-image Super-Resolution (SISR) and reference-based image Super-Resolution (Ref-SR). SISR [3], [4], [5], [6], [7], [8], [9] uses only a single low-resolution (LR) image to recover the HR image, and therefore depends heavily on the prior knowledge learned from the training set. Some transformer-based methods, such as [7], [10], achieve strong results in SISR. However, because of the domain shift between training and test sets, and because the complementary details are inferred from the training set, SISR often produces blurry results: the HR textural features cannot be effectively recovered during reconstruction. The texture information of medical images is crucial evidence for doctors' diagnoses. Therefore, SISR is not well suited to medical image Super-Resolution.
[Figure 1. Column labels: T2-Weighted Images, PD-Weighted Images (Cross-Attention Method); T2-Weighted Images, PD-Weighted Images (Flexible Align Method). Panels: (a), (b), (c).]
Fig. 1. Comparison between the Cross-Attention (CA) method and our Flexible Align (FA) method. FA contains two parts: the S-A module and the M-A module. Specifically, when the scales of the LR and Ref images differ, the S-A module corrects the mismatch produced by CA, as shown in (a). The M-A module resolves the mismatch caused by an inappropriate patch size. If the patch size is too large, the similarity between patches is dominated by background noise, as illustrated in (b). If the patch size is too small, insufficient semantics lead to mismatch, as shown in (c).
arXiv:2210.03460v2 [eess.IV] 8 Jan 2023
Ref-SR [11], [12], [13], [14], [15], [16] uses an additional high-resolution reference image as an auxiliary input, transferring HR textures and details to the LR image during SR. In clinical settings, MRI produces multi-contrast images that are used together for diagnosis. Because of their different acquisition settings, the appearances and functions of these images diverge widely. However, they provide complementary information for diagnosing the same anatomical structure. Generally, T1, T2, PD, and FS-PD weighted images are produced together during MRI acquisition. Clinically, PD-weighted images have shorter repetition and echo times than T2-weighted images [17]. Inspired by this, some Ref-SR based methods leverage HR PD-weighted images to recover HR T2 images from LR T2 images. Some Ref-SR studies simply complement LR image features with Ref image features through addition or concatenation, which yields only a limited improvement in LR image quality. Subsequently, a series of methods adopted deformable convolution [18], [14] to fuse the Ref and LR image features. Existing state-of-the-art (SOTA) feature fusion methods unfold the image into patches and adopt a Transformer-based cross-attention (CA) mechanism [14], [15] to calculate the correlation between patches of the LR and Ref images. These methods have verified that feature alignment, which means matching valuable information in patches of the Ref image to the corresponding regions of the LR image, strongly affects the reconstruction of HR images.
Unlike natural images, MRI images are monochrome and their object boundaries are more ambiguous. Relevant experiments indicate that, even though the Ref images contain authentic high-frequency details, the network cannot completely transfer these details into the HR images. We divide an MRI image into two parts: foreground and background. Specifically, the foreground contains the tissue and texture of interest, which are important in SR. The background consists of less important regions, such as the black region and the skeleton, where the pixel values are nearly equal. In principle, cross-attention (CA) methods only search for the most relevant regions and ignore variation in the scale of the foreground. Through a large number of experiments and observations, we find that the flexibility of patches has a significant effect on feature alignment. Specifically, consider two cases:
1) When the foregrounds of the LR and Ref images have different scales, a patch of a given size covers different areas of the foreground in the two images. However, the foregrounds of the LR and Ref images carry the same semantic information. This semantic affinity leads to mismatches between LR and Ref image patches, as illustrated in Fig. 1(a).
2) Assuming that the foreground scales of the LR and Ref images are the same, the cross-attention (CA) method ignores the relationship between the patch size and the scale of the foreground. A fixed patch size can barely adapt to foregrounds of various scales, so the patch size rarely fits the foreground scale. For example, if the foreground is smaller than the patch, the patch will contain a large amount of background information, which interferes with the calculation of the correlation matrix, as illustrated in Fig. 1(b). If the patch size is too small, different patches will be mismatched because their local features are similar, as illustrated in Fig. 1(c).
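To make the role of the fixed patch size concrete, the following is a minimal sketch of cross-attention style patch matching on plain Python lists. It is not the authors' implementation: the function names (`unfold`, `normalize`, `match_patches`), the cosine-similarity scoring, and the toy 4×4 image are assumptions chosen for illustration. Every LR patch is compared with every Ref patch and assigned to the most similar one, and `patch` is a single fixed parameter, which is the rigidity discussed in the two cases.

```python
import math


def unfold(img, patch, stride):
    """Return flattened patch vectors and their top-left (row, col) coordinates."""
    h, w = len(img), len(img[0])
    vecs, coords = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            vecs.append([img[y + dy][x + dx]
                         for dy in range(patch) for dx in range(patch)])
            coords.append((y, x))
    return vecs, coords


def normalize(v):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)


def match_patches(lr, ref, patch, stride=1):
    """For each LR patch, find the Ref patch with the highest cosine similarity."""
    q, q_xy = unfold(lr, patch, stride)
    k, k_xy = unfold(ref, patch, stride)
    q = [normalize(v) for v in q]
    k = [normalize(v) for v in k]
    matches = []
    for qi, qv in enumerate(q):
        # One row of the correlation matrix: this LR patch against all Ref patches.
        scores = [sum(a * b for a, b in zip(qv, kv)) for kv in k]
        best = max(range(len(k)), key=scores.__getitem__)
        matches.append((q_xy[qi], k_xy[best], scores[best]))
    return matches


# Toy image whose four 2x2 patches are mutually orthogonal (illustrative only).
toy = [[1, 0, 0, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 1, 1, 0]]
matches = match_patches(toy, toy, patch=2, stride=2)
```

In this toy setting each patch matches itself. Changing `patch` changes how much foreground and background each vector mixes, and therefore which Ref patch wins the comparison.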
In fact, scale diversity has been shown to be important in feature expression [19] and image restoration [20], [21]. Small-scale features can provide more complete semantic features, while large-scale feature maps can provide texture details. Based on this consideration, the requirements on the patch size can be stated as follows. (1) The patch should contain sufficient foreground information, which contributes to the alignment. (2) At the same time, the patch should not contain too much distracting background information. To meet these demands, the receptive field of a patch should be adjustable. Therefore, we propose the Flexible Alignment (FA) module, which generates various patch sizes and receptive fields to improve the precision of feature alignment. Specifically, FA contains the Single-Multi Pyramid Alignment (S-A) module and the Multi-Multi Pyramid Alignment (M-A) module, which address case 1 and case 2, respectively. S-A leverages various receptive fields to ensure the completeness of the foreground information. M-A dynamically adjusts the patch size of the LR and Ref images to reduce the influence of the background. Additionally, we fuse the multi-scale features with the Cross-Hierarchical Progressive Fusion (CHPF) module, further improving the image quality. Furthermore, a Fourier loss function is introduced to optimize the model. Our contributions can be summarized as follows:
1) We propose FASR-Net to transfer the textural information of high-resolution PD images to low-resolution T2 images and make the texture more realistic.
2) Our model combines the Multi-Multi Pyramid Alignment (M-A) module and the Single-Multi Pyramid Alignment (S-A) module to make feature alignment flexible.
3) We introduce an effective feature fusion backbone, Cross-Hierarchical Progressive Fusion (CHPF), which takes advantage of the textural information and details of multi-scale features.
4) Our code will be available at https://github.com/yimingliu123/FASR-Net.
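The Fourier loss mentioned in the introduction is not defined in this part of the paper. As a rough illustration only, one common form of frequency-domain loss is the mean absolute difference between the discrete Fourier transforms of the reconstructed and ground-truth images. The sketch below assumes that form; the function names `dft2` and `fourier_l1_loss` are hypothetical, and the naive DFT is written for clarity on tiny inputs rather than for speed.

```python
import cmath


def dft2(img):
    """Naive 2-D discrete Fourier transform of a list-of-lists image."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2j * cmath.pi * (u * y / h + v * x / w)
                    s += img[y][x] * cmath.exp(angle)
            out[u][v] = s
    return out


def fourier_l1_loss(sr, hr):
    """Mean absolute difference between the spectra of two same-sized images.

    Assumed form of the loss; the paper's exact definition may differ.
    """
    fs, fh = dft2(sr), dft2(hr)
    h, w = len(sr), len(sr[0])
    total = sum(abs(fs[u][v] - fh[u][v]) for u in range(h) for v in range(w))
    return total / (h * w)


# Toy check: a constant intensity offset only changes the zero-frequency term.
hr = [[float(4 * y + x) for x in range(4)] for y in range(4)]
sr = [[value + 0.5 for value in row] for row in hr]
loss_same = fourier_l1_loss(hr, hr)
loss_shift = fourier_l1_loss(sr, hr)
```

A loss of this kind penalizes errors per frequency rather than per pixel, which is one plausible reason to pair it with a pixel-wise loss when high-frequency texture matters.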
II. RELATED WORK
A. Single Image Super-Resolution
Over the past few years, deep learning-based SISR methods have achieved impressive performance. Some coarse-to-fine works [20], [21] report attractive results. Cai et al. [20] proposed a Transformer-based method, the coarse-to-fine sparse Transformer (CST). Specifically, CST uses a spectra-aware screening mechanism (SASM) for coarse patch selection. The selected patches are then fed into spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing. Liang et al. [21] focus on speeding up high-resolution photorealistic image-to-image translation (I2IT) tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Beyond coarse-to-fine methods, Dong et al. propose SRCNN [4], which introduces deep convolutional neural networks to the field of image Super-Resolution. Thereafter, residual blocks [6] and attention mechanisms [22], [5], [3] were introduced to deepen the network. However, these approaches improve image quality only to a limited extent because of the difficulty of recovering high-frequency details. Ledig et al. [23] were the first to adopt a Generative Adversarial Network (GAN) in SR tasks. According to Ledig et al., results obtained by minimizing the mean squared error (MSE) often lack high-frequency details. Thus, [23] uses a perceptual loss that consists of a content loss and an adversarial loss. Wang et al. [24] further propose an enhanced generator and discriminator, obtaining more perceptually competitive results. Yan et al. [25] adopt FASRGAN to discriminate each pixel of real and fake images. Knowledge distillation frameworks have also been introduced in SR tasks, such as [8], [9]. Recently, some Transformer-based networks have been applied to SISR tasks [7], [10]. ESRT [7] adopts a high-preserving block and a lightweight transformer backbone, achieving satisfying results with low computational cost. SwinIR [10] adopts the shifted-window-based self-attention mechanism of Swin Transformer [26].
[Figure 2: block diagram. Block labels: Shallow Feature Extractor, Texture Extractor (×3), Unfold, Flexible Alignment Module, Single-Multi Pyramid Alignment Module (S-A), Multi-Multi Pyramid Alignment Module (M-A), Cross-Hierarchical Progressive Fusion Module (CHPF).]
Fig. 2. Overview of FASR-Net. FASR-Net is composed of three parts: the shallow feature extractor, the Flexible Alignment module, and the Cross-Hierarchical Progressive Fusion module. $Q_{n\times}$, $K_{n\times}$, and $V_{n\times}$ are multi-scale shallow features extracted from $I_{LR}$, $I_{Ref\downarrow\uparrow}$, and $I_{Ref}$. S-A and M-A are the two parts of the Flexible Alignment module introduced in Section III-C. $F_{SA}$ and $F_{MA}$ are the aligned features produced by the S-A and M-A modules, respectively. After the features are concatenated, the Cross-Hierarchical Progressive Fusion module is used to reconstruct $I_{SR}$.
Although SISR approaches achieve excellent results in the natural image domain, these methods are not suitable for medical images. The details of HR medical images, which are significant for diagnosis, are generated by the network and are therefore not authentic. A more reasonable approach to MRI Super-Resolution is to complement high-frequency information from additional HR images. Ref-SR methods are therefore better suited to producing trustworthy results for medical images.
B. Reference-Based Image Super-Resolution
Ref-SR uses an additional high-resolution reference image to super-resolve a low-resolution image. Compared with SISR, Ref-SR is more likely to recover accurate textural information. One branch of Ref-SR methods aligns the LR and Ref images. CrossNet [11] adopts an end-to-end, fully convolutional neural network with an optical flow estimator to align the Ref and LR images. However, this approach depends to a great extent on how well the Ref and LR images are aligned. Additionally, the use of optical flow neglects long-range dependencies. SSEN [12] introduces a stack of deformable convolution [27] layers, enlarging the receptive field on Ref images. C2-Matching [13] introduces a contrastive correspondence network and teacher-student correlation distillation to align images at the pixel level. However, because the restoration module of C2-Matching contains only simple residual blocks, misalignment between images drastically degrades the performance of this method.
Another mainstream family of Ref-SR approaches is based on patch matching [28]. SRNTT [14] adopts a cross-attention mechanism to perform patch matching, endowing LR images with HR details by transferring textural information from Ref images according to the correlation. TTSR [15] retains the idea of cross-attention and introduces a soft-attention module, which computes the relevance between the original and swapped features and feeds all swapped features, weighted accordingly, into the main network. MASA [16] takes potentially large differences, such as color and luminance distribution, into consideration and reduces the computational cost. Cao et al. [29] combine deformable attention with the cross-attention mechanism, further improving the performance of Ref-SR at the expense of higher computational cost.
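For reference, the patch-matching step shared by these cross-attention methods can be written compactly. The notation below follows the formulation commonly associated with TTSR [15] and is an assumption here, since this paper has not yet defined its own symbols: $q_i$ and $k_j$ are unfolded patches of the LR and degraded Ref features, $v_j$ is the corresponding patch of the original Ref features, $r_{i,j}$ is their relevance, $h_i$ is the hard-attention index, $t_i$ is the transferred patch, and $s_i$ is the soft-attention weight.

```latex
\begin{align}
r_{i,j} &= \left\langle \frac{q_i}{\lVert q_i \rVert},\ \frac{k_j}{\lVert k_j \rVert} \right\rangle, \\
h_i &= \arg\max_j \, r_{i,j}, \qquad t_i = v_{h_i}, \\
s_i &= \max_j \, r_{i,j}.
\end{align}
```

The hard index $h_i$ decides which Ref patch is transferred, and $s_i$ down-weights transfers whose best match is weak.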
III. METHOD
A. Overview
Ref-SR methods, which aim to transfer the details of high-resolution reference (Ref) images to low-resolution (LR) images, have achieved strong results recently. The details recovered by Ref-SR methods are more reliable than those recovered by SISR methods. Clinically, to observe a tissue comprehensively, a series of multi-contrast images is produced together during MRI acquisition. An intuitive idea is therefore that the low-cost images (PD) can serve as references that offer helpful detail information for the high-cost images (T2). Based on this consideration, we propose a novel Multi-Contrast Flexible Alignment Super-Resolution Network (FASR-Net) for MRI.
The architecture of FASR-Net is shown in Fig. 2. $I_{LR}$ and $I_{Ref}$ represent the upsampled T2 images and the PD images, respectively. We sequentially apply downsampling and upsampling with the same factor of 4× to the PD images to obtain $I_{Ref\downarrow\uparrow}$.
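The construction of the down- and up-sampled reference image can be sketched as follows. The text specifies only the factor (4×); the interpolation kernel is not stated in this section, so block averaging for downsampling and nearest-neighbour repetition for upsampling are assumptions made to keep the example dependency-free, and the function names are hypothetical. The point is that the output has the size of the input reference but keeps only the detail that survives a 4× reduction.

```python
def downsample(img, factor):
    """Average non-overlapping factor x factor blocks (assumed kernel)."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [img[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample(img, factor):
    """Repeat every pixel factor times along both axes (assumed kernel)."""
    out = []
    for row in img:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def degrade_reference(ref, factor=4):
    """Downsample then upsample with the same factor, keeping the input size."""
    return upsample(downsample(ref, factor), factor)


# Toy 8x8 reference image with a simple intensity ramp (illustrative only).
ref = [[float(8 * y + x) for x in range(8)] for y in range(8)]
ref_du = degrade_reference(ref, factor=4)
```

In practice a smoother kernel such as bicubic interpolation would likely replace both helpers; the structure of the operation stays the same.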
Functionally, FASR-Net can be roughly divided into three parts: the shallow feature extractor, the Flexible Alignment (FA) module, and the Cross-Hierarchical Progressive Fusion (CHPF) module. Specifically, the shallow feature extractor aims to obtain robust semantic features. The FA module, which is composed of M-A and S-A, performs feature alignment. After feature alignment, we leverage CHPF to fuse the aligned features.