diagnosing the same anatomical structure. Generally, T1-, T2-, PD-, and FS-PD-weighted images are produced together in the acquisition of MRI. Clinically, PD-weighted images have shorter repetition and echo times than T2-weighted images [17].
Inspired by this, some Ref-SR based methods leverage HR PD-weighted images to recover HR T2 images from LR T2 images. For Ref-SR, some studies simply complement LR image features with Ref image features through element-wise addition or concatenation, which yields only a limited improvement in LR image quality. Subsequently, a series of methods adopt deformable convolution [18], [14] to fuse Ref image features and LR image features. Existing state-of-the-art (SOTA) feature fusion methods further unfold the images into patches and adopt a Transformer-based cross-attention (CA) mechanism [14], [15] to calculate the correlation between patches of the LR and Ref images. These methods have verified that feature alignment, i.e., matching valuable information from Ref image patches to the corresponding LR image patches, strongly impacts the reconstruction of HR images.
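To make the patch-matching step concrete, the following minimal PyTorch-style sketch illustrates how a correlation matrix between unfolded LR and Ref feature patches can be computed. The helper name, feature shapes, and cosine-similarity formulation are illustrative assumptions, not the exact procedure of [14], [15].

```python
import torch
import torch.nn.functional as F

def patch_correlation(lr_feat, ref_feat, patch_size=3, stride=1):
    """Correlation between LR and Ref feature patches (illustrative sketch).

    lr_feat, ref_feat: (B, C, H, W) feature maps.
    Returns a (B, N_lr, N_ref) cosine-similarity matrix and, for every LR
    patch, the index of its most relevant Ref patch (hard attention).
    """
    # Unfold feature maps into flattened patches: (B, C*k*k, N).
    lr_patches = F.unfold(lr_feat, kernel_size=patch_size,
                          stride=stride, padding=patch_size // 2)
    ref_patches = F.unfold(ref_feat, kernel_size=patch_size,
                           stride=stride, padding=patch_size // 2)

    # Normalize so the inner product becomes a cosine similarity.
    lr_patches = F.normalize(lr_patches, dim=1)
    ref_patches = F.normalize(ref_patches, dim=1)

    # Correlation matrix between every LR patch and every Ref patch.
    corr = torch.bmm(lr_patches.transpose(1, 2), ref_patches)  # (B, N_lr, N_ref)

    # Index of the most relevant Ref patch per LR patch.
    best_idx = corr.argmax(dim=-1)                              # (B, N_lr)
    return corr, best_idx

# Toy usage with assumed shapes.
lr = torch.randn(1, 64, 40, 40)
ref = torch.randn(1, 64, 40, 40)
corr, idx = patch_correlation(lr, ref)
```

In typical Ref-SR pipelines, such indices are then used to gather the corresponding Ref features onto the LR grid before fusion.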
Distinct from natural images, MRI images are monochromatic and their object boundaries are more ambiguous. Relevant experiments indicate that, despite the existence of authentic high-frequency details in the Ref images, the network cannot completely transfer these details to the HR images. We divide MRI images into two parts: foreground and background. Specifically, the foreground contains the tissues and textures of interest, which are important for SR. The background consists of less important regions, such as the black regions and the skeleton, where the pixel intensities are nearly uniform. In principle, the cross-attention (CA) methods only search for the most relevant regions but ignore the variety of foreground scales. Through extensive experiments and observations, we find that the flexibility of patches has a significant effect on feature alignment. Specifically, we consider two cases:
1) When the foregrounds of the LR and Ref images have different scales, a patch of fixed size covers different portions of the foreground in each image. However, since the two foregrounds share the same semantic information, this semantic similarity leads to mismatches between LR and Ref image patches, as illustrated in Fig. 1(a).
2) Assuming that the foreground scales of the LR and Ref images are the same, the cross-attention (CA) method ignores the harmony between the patch size and the scale of the foreground. A fixed patch size can hardly adapt to the various foreground scales. For example, if the foreground is smaller than the patch, the patch will contain a large amount of extraneous information, which further interferes with the calculation of the correlation matrix, as illustrated in Fig. 1(b). If the patch size is too small, different patches will be mismatched due to the similarity of local features, as illustrated in Fig. 1(c).
In fact, scale diversity has been shown to be important for feature representation [19] and image restoration [20], [21]. Small-scale feature maps can provide more complete semantic information, while large-scale feature maps can provide texture details. Based on this consideration, the core requirements on the patch size can be stated as follows: (1) the patch should contain sufficient foreground information, which contributes to the alignment; (2) at the same time, the patch should not contain too much disturbing background information. To meet these demands, the receptive field of a patch should be adjustable.
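As a concrete but deliberately simplified illustration of adjustable receptive fields, the sketch below matches patches at several sizes and keeps, for each LR position, the scale whose best Ref match is strongest. It reuses the patch_correlation helper from the previous sketch; the specific patch sizes and the winner-take-all selection are assumptions and not the actual S-A/M-A design.

```python
import torch

def multiscale_patch_match(lr_feat, ref_feat, patch_sizes=(3, 5, 9)):
    """Match LR and Ref patches at several patch sizes (receptive fields)
    and, per LR position, keep the scale with the strongest best match.

    With stride=1 and padding=k//2 (odd k), N_lr is the same for every
    patch size, so the per-position comparison across scales is valid.
    """
    best_sim = best_idx = best_scale = None
    for s, k in enumerate(patch_sizes):
        corr, idx = patch_correlation(lr_feat, ref_feat, patch_size=k)
        sim = corr.max(dim=-1).values              # (B, N_lr) best similarity at this scale
        if best_sim is None:
            best_sim, best_idx = sim, idx
            best_scale = torch.zeros_like(idx)
        else:
            better = sim > best_sim                # positions better served by this patch size
            best_sim = torch.where(better, sim, best_sim)
            best_idx = torch.where(better, idx, best_idx)
            best_scale = torch.where(better, torch.full_like(idx, s), best_scale)
    return best_sim, best_idx, best_scale
```

This winner-take-all selection is only meant to show that the effective receptive field can differ per position; the proposed S-A and M-A modules realize this flexibility with pyramid alignment instead.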
Therefore, we propose the Flexible Alignment (FA) module, which generates various patch sizes and receptive fields to improve the precision of feature alignment. Specifically, FA contains the Single-Multi Pyramid Alignment module (S-A) and the Multi-Multi Pyramid Alignment module (M-A), which serve Case 1 and Case 2, respectively. S-A leverages various receptive fields to ensure the completeness of the foreground information. M-A dynamically adjusts the patch sizes of the LR and Ref images to reduce the influence of the background. Additionally, we fuse the multi-scale features with the Cross-Hierarchical Progressive Fusion (CHPF) module, further improving the image quality. Furthermore, a Fourier loss function is introduced to optimize the model (an illustrative formulation is sketched after the contribution list below). Our contributions can be summarized as follows:
• We propose the FASR-Net to transfer the textural information of high-resolution PD-weighted images to low-resolution T2-weighted images and make the recovered textures more realistic.
• Our model combines the Multi-Multi Pyramid Alignment module (M-A) and the Single-Multi Pyramid Alignment module (S-A) to endow feature alignment with flexibility.
• We introduce an effective feature fusion backbone, Cross-Hierarchical Progressive Fusion (CHPF), which takes advantage of the textural information and details of multi-scale features.
Our code will be available at FASR-Net.
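The Fourier loss is not specified in detail above; as a minimal illustrative sketch (an assumed formulation, not necessarily the one used by FASR-Net), a Fourier-domain term can be written as an L1 penalty on the 2-D FFT spectra of the SR output and the HR target:

```python
import torch

def fourier_l1_loss(sr, hr):
    """L1 distance between the 2-D Fourier spectra of SR and HR images.

    sr, hr: (B, C, H, W) tensors. Assumed formulation; the exact Fourier
    loss of FASR-Net may weight amplitude and phase differently.
    """
    sr_fft = torch.fft.fft2(sr, norm="ortho")  # complex spectrum of the SR image
    hr_fft = torch.fft.fft2(hr, norm="ortho")  # complex spectrum of the HR image
    # The modulus of the complex difference penalizes both amplitude and phase errors.
    return (sr_fft - hr_fft).abs().mean()
```

Such frequency-domain terms are commonly used to complement pixel-wise losses, which tend to under-penalize high-frequency errors.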
II. RELATED WORK
A. Single Image Super-Resolution
In recent years, deep learning-based SISR methods have achieved impressive performance. Some coarse-to-fine works [20], [21] have produced attractive results. Cai et al. [20] proposed a novel Transformer-based method, the coarse-to-fine sparse Transformer (CST). Specifically, CST uses a spectra-aware screening mechanism (SASM) for coarse patch selection; the selected patches are then fed into spectra-aggregation hashing multi-head self-attention (SAH-MSA) for fine pixel clustering and self-similarity capturing. Liang et al. [21] focus on speeding up high-resolution photorealistic image-to-image translation (I2IT) tasks based on closed-form Laplacian pyramid decomposition and reconstruction.
Apart from coarse-to-fine methods, Dong et al. propose SRCNN [4], which introduces deep convolutional neural networks to the field of image super-resolution. Thereafter, residual blocks [6] and attention mechanisms [22], [5], [3] are introduced to deepen the network. However, these approaches improve image quality only to a limited extent because of the difficulty of recovering high-frequency details. Ledig et al. [23] first adopt a Generative Adversarial Network (GAN) for SR tasks. According to Ledig et al., minimizing the mean squared error (MSE) loss often fails to recover high-frequency details. Thus, [23] utilizes a perceptual loss which consists of a content loss and an adversarial loss. Wang et al. [24] further propose an enhanced generator and discriminator, obtaining more perceptually convincing results. Yan et al. [25]