state of the art (Figure 1). Our contributions are summarized
as follows.
• We develop a novel dual interactive implicit neural net-
work (DIINN) for SISR that handles image content
features in a modulation branch and positional features
in a synthesis branch, while allowing for interactions
between the two.
• We learn an implicit neural network with a pixel-level
representation, which allows for locally continuous
super-resolution synthesis with respect to the nearest
LR pixel.
• We demonstrate the effectiveness of our proposed net-
work by setting new benchmarks on public datasets.
Our source code is available at [10]. The remainder of
this paper is organized as follows. Related research is dis-
cussed in Section 2. In Section 3, we present a detailed de-
scription of our model for SISR at arbitrary scales using an
implicit representation. Experimental results are presented
in Section 4. The paper concludes in Section 5 and dis-
cusses future work.
2. Related Work
This section highlights pertinent literature on the task of
SISR. First, we discuss deep learning techniques for SISR.
Then, we provide an overview of implicit neural represen-
tations. Lastly, we review the nascent domain of implicit representations for images. The SISR problem is ill-posed in the sense that many distinct HR images can be downsampled to the same LR image. In this
work, we focus on learning deterministic mappings, rather
than stochastic mappings (i.e., generative models). In gen-
eral, the input to an SISR system is an LR image, and the
output is a super-resolved (SR) image that may or may not
have the same resolution as a target HR image.
2.1. Deep Learning for SISR
Existing work on SISR typically utilizes convolutional
neural networks (CNNs) coupled with upsampling opera-
tors to increase the resolution of the input image.
2.1.1 Upscaling + Refinement
SRCNN [11], VDSR [20], and DRCN [21] first interpolate
an LR image to a desired resolution using bicubic interpola-
tion, and then apply a CNN to enhance
the interpolated image and produce an SR image. The re-
fining network acts as a nonlinear mapping, which aims to
improve the quality of the interpolation. These methods can
produce SR images at arbitrary scales, but the performance
is severely degraded by artifacts introduced during the interpolation process. The refining CNNs also have to operate at
the desired resolution, thus leading to a longer runtime.
2.1.2 Learning Features + Upscaling
Methods following this approach first feed an LR image
through a CNN to obtain a deep feature map at the same
resolution. Because these CNNs operate at the low resolution, they are computationally cheap, which allows for deeper architectures. Next, an upscaling operator is used to produce an SR
image. The most common upscaling operators are decon-
volution (FSRCNN [12], DBPN [14]), and sub-pixel con-
volution (ESPCN [32], EDSR [23]). It is also possible to
perform many iterations of learning features + upscaling
and explicitly exploit the relationship between intermediate
representations [14]. These methods only work with integer
scale factors and produce fixed-sized outputs.
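To make the sub-pixel convolution operator concrete, the sketch below (an illustrative NumPy implementation, not code from any cited work) shows its rearrangement step: a convolution first produces C·r² channels at LR resolution, and a periodic "depth-to-space" shuffle then folds those channel groups into an image that is r× larger in each spatial dimension.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space rearrangement used by sub-pixel convolution.

    Maps a (C*r*r, H, W) feature map to a (C, H*r, W*r) image:
    out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w].
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)

# A 4-channel 1x1 map becomes a 1-channel 2x2 image for r = 2.
y = pixel_shuffle(np.arange(4.0).reshape(4, 1, 1), 2)
```

Note that the shuffle itself is a pure memory rearrangement with no arithmetic; all learned computation stays at the cheap LR resolution, which is precisely why this operator fixes the scale factor at training time.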
EDSR [23] attempts to mitigate these problems by train-
ing a separate upscaling head for each scale factor. On the
other hand, Meta-SR [15] is among the first attempts to
solve SISR at arbitrary real-valued scale factors via a soft
version of the sub-pixel convolution. To predict the sig-
nal at each pixel in the SR image, Meta-SR uses a meta-
network to determine the weights for features of a (3 × 3)
window around the nearest pixel in the LR image. Effec-
tively, each channel of the predicted pixel in the SR image
is a weighted sum of a (C × 3 × 3) volume, where C is the
number of channels in the deep feature map. Although Meta-SR
generalizes poorly to scale factors larger than its training
scales, it can be viewed as a hybrid implicit/explicit model.
2.2. Implicit Neural Representations
Implicit neural representations are an elegant way to pa-
rameterize signals continuously in comparison to conven-
tional representations, which are usually discrete. Chen et
al. [7], Mescheder et al. [27], and Park et al. [28] are among
the first to show that implicit neural representations outper-
form 3D representations (e.g., meshes, voxels, and point
clouds) in 3D modeling. Many works that achieve state-
of-the-art results in 3D computer vision have followed. For
example, Chabra et al. [5] learned local shape priors for the
reconstruction of 3D surfaces coupled with a deep signed
distance function. A new implicit representation for 3D
shape learning called a neural distance field was proposed
by Chibane et al. [9]. Jiang et al. [18] leveraged voxel
representations to enable implicit functions to fit large 3D
scenes, and Peng et al. [30] increased the expressiveness of
3D scenes with various convolutional models. It is also possible to condition the implicit neural representations on the
input signals [5, 8, 18, 30], which can be considered as a
hybrid implicit/explicit model.
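The core idea of such a conditioned implicit representation can be sketched in a few lines: a network maps a continuous coordinate, together with a latent code describing one particular signal, to a signal value, so the same network can be queried at arbitrary off-grid locations. The toy NumPy model below uses random, untrained weights and is purely illustrative of the interface, not of any cited model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy implicit representation: a small MLP mapping a continuous 2D
# coordinate plus a 64-dim conditioning latent code to a scalar value.
# Weights are random and untrained (illustrative only).
W1 = rng.standard_normal((66, 64))   # 2 coordinate dims + 64 latent dims
b1 = rng.standard_normal(64)
W2 = rng.standard_normal((64, 1))
b2 = rng.standard_normal(1)
z = rng.standard_normal(64)          # latent code for one signal

def query(x, y):
    """Evaluate the represented signal at any real-valued (x, y)."""
    h = np.tanh(np.concatenate(([x, y], z)) @ W1 + b1)
    return (h @ W2 + b2)[0]

# The same network answers at arbitrary, non-grid coordinates, and
# nearby queries give nearby values (the representation is continuous).
v = query(0.5, 0.25)
```

Because the signal lives in the network weights (plus the latent code) rather than on a pixel grid, resolution is decoupled from storage, which is exactly the property the image-domain works below exploit.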