in the following steps. The main contributions of this paper
can be summarized as follows:
• We present a novel FSR framework based on a recurrent
network that takes full advantage of multi-scale
facial prior information (i.e., landmarks and AUs) to
generate realistic HR face images.
• Global shape information and local texture information
are progressively embedded into the recurrent network.
We also introduce AU classification results as
a novel quantitative metric for the restoration of facial
details in FSR.
• Extensive qualitative and quantitative experiments
demonstrate that, compared with similar state-of-the-
art methods, our proposed framework achieves supe-
rior results in terms of both image quality and facial
details restoration.
2. Related Work
2.1. Single Image Super-Resolution
FSR is a special case of single image super-resolution
(SISR). Recently, owing to their strong learning ability, deep
convolutional neural networks have shown clear advantages
on SISR tasks. Dong et al. [7] first presented
SRCNN for SISR and achieved promising performance
compared with traditional methods. Inspired by this pioneering
work, many deep network based SISR methods have been
proposed. Kim et al. [15] designed the VDSR network with
more convolutional layers based on residual learning [11].
Ledig et al. [19] proposed SRGAN for generating photo-
realistic images based on generative adversarial network
(GAN) [9]. Shi and Liu [37] proposed DPA-Net for infant
fingerprint super-resolution and enhancement. Some
attention-based methods [46, 6, 22] have also been proposed to
further improve SR performance. However, most of the
above methods use deep networks with a large number of
parameters and may therefore suffer from overfitting. To obtain
better generalization without introducing an excessive number of
parameters, recurrent structures have also been employed
for SISR. Kim et al. [16] first introduced recursive
learning in DRCN for parameter sharing. Later, Tai et al.
designed a recursive block with enhanced residual units in
DRRN [38] and memory blocks with the recursive unit and
gate unit in MemNet [39]. Han et al. [10] presented DSRN,
which uses a dual-state design to exploit features from both
LR and HR states for final predictions. Li et al. [21] devel-
oped a novel feedback block consisting of up- and down-
sampling layers with dense skip connections in SRFBN.
These SISR methods are designed for general images, and
most of them handle only up to 4× super-resolution. They
fail to restore the details of face images well, especially
when the scaling factor is large (e.g., 8×).
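The parameter saving that motivates the recursive designs above can be checked with simple arithmetic. The following minimal sketch compares a stack of independent convolutional layers with a single layer that is reused at every step. The function names, the channel width of 64, and the depth of 16 are illustrative assumptions and are not taken from any of the cited networks.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one 2D convolutional layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    biases = out_channels
    return weights + biases


def stacked_params(depth, channels, kernel_size=3):
    """Parameter count when each of `depth` layers owns its weights."""
    return depth * conv_params(channels, channels, kernel_size)


def recursive_params(depth, channels, kernel_size=3):
    """Parameter count when one layer is applied `depth` times.

    The weights are shared across steps, so the count does not
    grow with `depth`.
    """
    return conv_params(channels, channels, kernel_size)


if __name__ == "__main__":
    channels, depth = 64, 16  # illustrative values only
    print("stacked  :", stacked_params(depth, channels))
    print("recursive:", recursive_params(depth, channels))
```

Under these assumptions, the effective depth of the two variants is the same, while the recursive variant stores the parameters of only one layer.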
2.2. Face Super-Resolution
Since the concept of face hallucination was first proposed
by Baker and Kanade [1], many methods have been proposed
to improve FSR performance, especially with
the development of deep learning. Yu et al. [43] pro-
posed a GAN-based network URDGN to super-resolve very
low-resolution face images. Huang et al. [12] presented
a wavelet-based method that transforms the FSR problem into
a wavelet coefficient prediction task. Cao et al. [4] proposed
Attention-FH, which uses reinforcement learning to discover
attended patches and then enhances the facial parts sequentially.
Dou et al. [8] introduced incremental orthogonal projection
discrimination in the principal component analysis
subspace to improve FSR. Lu et al. [24] designed
SISN to reconstruct photorealistic high-resolution facial
images by fusing the features from two paths.
Compared with general images, face images carry
unique prior information that can be exploited. Chen
et al. [5] introduced geometry priors including landmark
heatmaps and parsing maps. Kim et al. [14] used landmark
heatmaps to design an attention loss. Ma et al. [26] introduced
a recurrent network for face SR and designed a deep
iterative collaboration framework that optimizes face recovery
and landmark estimation alternately. In addition, facial
attributes, such as age and gender, are also employed
in some FSR methods [25, 42, 41]. However, most
existing methods exploit only single-level prior information
and pay more attention to global shape and structure
information than to local texture information.
3. Proposed Method
In this section, we propose a multi-scale prior infor-
mation embedded recursive network (MPENet) for FSR,
which can be unfolded as shown in Figure 1 and consists
of three branches: a face super-resolution branch, a face alignment
branch, and an AU detection branch. In addition, to generate
photo-realistic face images, we use MPENet as the generator
network G and introduce a discriminator network D
to build our generative adversarial model, MPEGAN.
3.1. Network Architecture
Face Super-Resolution Branch: As shown in Figure 1,
our proposed MPENet can be unfolded into 3 iterations.
The face super-resolution branch in each iteration contains
three parts: a shallow feature extractor (SFE), a prior em-
bedded recurrent block (PERB), and a high resolution re-
construction block (RECB). Given a low-resolution (LR)
input $I_{LR}$, we use a 3×3 convolutional layer and a pixel
shuffle layer to extract the shallow feature $F_{sf}^{t}$ at the $t$-th iteration
as:
$F_{sf}^{t} = H_{SFE}(I_{LR})$, (1)
where $H_{SFE}(\cdot)$ denotes the operations of the SFE. Then,
we use a 1×1 and a 3×3 convolutional layer to fuse shal-