state of the art (Figure 1). Our contributions are summarized
as follows.
• We develop a novel dual interactive implicit neural net-
work (DIINN) for SISR that handles image content
features in a modulation branch and positional features
in a synthesis branch, while allowing for interactions
between the two.
• We learn an implicit neural network with a pixel-level
representation, which allows for locally continuous
super-resolution synthesis with respect to the nearest
LR pixel.
• We demonstrate the effectiveness of our proposed net-
work by setting new benchmarks on public datasets.
Our source code is available at [10]. The remainder of
this paper is organized as follows. Related research is dis-
cussed in Section 2. In Section 3, we present a detailed de-
scription of our model for SISR at arbitrary scales using an
implicit representation. Experimental results are presented
in Section 4. The paper concludes in Section 5 and dis-
cusses future work.
2. Related Work
This section highlights pertinent literature on the task of
SISR. First, we discuss deep learning techniques for SISR.
Then, we provide an overview of implicit neural represen-
tations. Lastly, we review the nascent domain of implicit representations for images. The SISR problem is ill-posed in the sense that many distinct HR images can be downsampled to the same LR image. In this
work, we focus on learning deterministic mappings, rather
than stochastic mappings (i.e., generative models). In gen-
eral, the input to an SISR system is an LR image, and the
output is a super-resolved (SR) image that may or may not
have the same resolution as a target HR image.
2.1. Deep Learning for SISR
Existing work on SISR typically utilizes convolutional
neural networks (CNNs) coupled with upsampling opera-
tors to increase the resolution of the input image.
2.1.1 Upscaling + Refinement
SRCNN [11], VDSR [20], and DRCN [21] first interpolate
an LR image to a desired resolution using bicubic interpola-
tion, and then apply a CNN to enhance
the interpolated image and produce an SR image. The re-
fining network acts as a nonlinear mapping, which aims to
improve the quality of the interpolation. These methods can
produce SR images at arbitrary scales, but the performance
is severely degraded by artifacts introduced during the interpolation process. The refining CNNs also have to operate at
the desired resolution, thus leading to a longer runtime.
2.1.2 Learning Features + Upscaling
Methods following this approach first feed an LR image
through a CNN to obtain a deep feature map at the same
resolution. Because these CNNs operate at the low resolution, they are computationally cheap, which allows for deeper architectures. Next, an upscaling operator is used to produce an SR
image. The most common upscaling operators are decon-
volution (FSRCNN [12], DBPN [14]), and sub-pixel con-
volution (ESPCN [32], EDSR [23]). It is also possible to
perform many iterations of learning features + upscaling
and explicitly exploit the relationship between intermediate
representations [14]. These methods only work with integer
scale factors and produce fixed-sized outputs.
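To make the sub-pixel convolution operator concrete, the sketch below (an illustrative NumPy implementation, not code from any cited work) shows its rearrangement step: a convolution first produces C·r² channels at LR resolution, and a periodic "depth-to-space" shuffle then folds those channel groups into an image that is r× larger in each spatial dimension.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space rearrangement used by sub-pixel convolution.

    Maps a (C*r*r, H, W) feature map to a (C, H*r, W*r) image:
    out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w].
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

# A 4-channel 1x1 map becomes a 1-channel 2x2 image for r = 2.
y = pixel_shuffle(np.arange(4.0).reshape(4, 1, 1), 2)
```

Note that the shuffle itself is a pure memory rearrangement with no arithmetic; all learned computation stays at the cheap LR resolution, which is precisely why this operator fixes the scale factor at training time.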
EDSR [23] attempts to mitigate these problems by train-
ing a separate upscaling head for each scale factor. On the
other hand, Meta-SR [15] is among the first attempts to
solve SISR at arbitrary real-valued scale factors via a soft
version of the sub-pixel convolution. To predict the sig-
nal at each pixel in the SR image, Meta-SR uses a meta-
network to determine the weights for features of a (3 × 3)
window around the nearest pixel in the LR image. Effec-
tively, each channel of the predicted pixel in the SR image
is a weighted sum of a (C × 3 × 3) volume, where C is the
number of channels in the deep feature map. Although Meta-SR
generalizes poorly to scale factors larger than its training
scales, it can be viewed as a hybrid implicit/explicit model.
2.2. Implicit Neural Representations
Implicit neural representations are an elegant way to pa-
rameterize signals continuously in comparison to conven-
tional representations, which are usually discrete. Chen et
al. [7], Mescheder et al. [27], and Park et al. [28] are among
the first to show that implicit neural representations outper-
form 3D representations (e.g., meshes, voxels, and point
clouds) in 3D modeling. Many works that achieve state-
of-the-art results in 3D computer vision have followed. For
example, Chabra et al. [5] learned local shape priors for the
reconstruction of 3D surfaces coupled with a deep signed
distance function. A new implicit representation for 3D
shape learning called a neural distance field was proposed
by Chibane et al. [9]. Jiang et al. [18] leveraged voxel
representations to enable implicit functions to fit large 3D
scenes, and Peng et al. [30] increased the expressiveness of
3D scenes with various convolutional models. It is also possible to condition the implicit neural representations on the
input signals [5, 8, 18, 30], which can be considered as a
hybrid implicit/explicit model.
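The core idea of such a conditioned implicit representation can be sketched in a few lines: a network maps a continuous coordinate, together with a latent code describing one particular signal, to a signal value, so the same network can be queried at arbitrary off-grid locations. The toy NumPy model below uses random, untrained weights and is purely illustrative of the interface, not of any cited model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy implicit representation: a small MLP mapping a continuous 2D
# coordinate plus a 64-dim conditioning latent code to a scalar value.
# Weights are random and untrained (illustrative only).
W1 = rng.standard_normal((66, 64))   # 2 coordinate dims + 64 latent dims
b1 = rng.standard_normal(64)
W2 = rng.standard_normal((64, 1))
b2 = rng.standard_normal(1)
z = rng.standard_normal(64)          # latent code for one signal

def query(x, y):
    """Evaluate the represented signal at any real-valued (x, y)."""
    h = np.tanh(np.concatenate(([x, y], z)) @ W1 + b1)
    return (h @ W2 + b2)[0]

# The same network answers at arbitrary, non-grid coordinates, and
# nearby queries give nearby values (the representation is continuous).
v = query(0.5, 0.25)
```

Because the signal lives in the network weights (plus the latent code) rather than on a pixel grid, resolution is decoupled from storage, which is exactly the property the image-domain works below exploit.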