
Jointly Resampling and Reconstructing Corrupted Images
for Image Classification using
Frequency-Selective Mesh-to-Grid Resampling
Viktoria Heimann, Andreas Spruck, and André Kaup
Multimedia Communications and Signal Processing
Friedrich-Alexander-Universität Erlangen-Nürnberg
Cauerstraße 7, 91058 Erlangen
Email: {viktoria.heimann, andreas.spruck, andre.kaup}@fau.de
©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works.
Abstract—Neural networks have become the standard technique for image
classification in recent years. They extract image features
from a large number of images in a training phase. In a subsequent test
phase, the network is applied to the problem it was trained for and its
performance is measured. In this paper, we focus on image classification.
The amount of visual data that is interpreted by neural networks grows
with the increasing usage of neural networks. In most cases, the visual data
is transmitted from the application side to a central server where
the interpretation is conducted. If the transmission is disturbed, losses
occur in the transmitted images. These losses have to be reconstructed
using postprocessing. In this paper, we incorporate the widely applied
bilinear and bicubic interpolation and the high-quality
Frequency-Selective Reconstruction (FSR) for the reconstruction of
corrupted images. As an alternative, we propose to use Frequency-Selective
Mesh-to-Grid Resampling (FSMR) for the joint reconstruction and resizing of
corrupted images. The performance in terms of classification accuracy of
EfficientNetB0, DenseNet121, DenseNet201, ResNet50 and ResNet152 is
examined. Results show that the reconstruction with FSMR leads to the
highest classification accuracy for most networks. Average improvements
of up to 6.7 percentage points are possible for DenseNet121.
I. INTRODUCTION
Neural networks have been widely applied to image processing in recent
years. They are especially used for interpreting images, a task that
is easy for humans but difficult for machines. Neural
networks are able to extract features from image content. They do
not require the image content to be described analytically. They learn
to extract the important image features by training. In the training
phase, a large number of images is presented to the network. Thus,
if only a small number of images is available, techniques such as
data augmentation have to be applied in order to create a larger
number of images for training. Data augmentation refers to the task
of artificially creating additional data with new characteristics from
the already available data. Typical techniques for data augmentation
are geometric transformations such as zooming and rotating input
images. Furthermore, most networks demand a fixed input size
and thus the input images have to be resized. On pixel level, resizing
means that pixels are transformed to new positions. As a result, the
resized pixels are mostly located on mesh positions that do not
coincide with the integer pixel grid. An example on pixel level is shown in Fig. 1a.
The red circles denote the resized pixels on mesh positions, whereas
the black dots denote the pixel grid. The dashed lines serve as a visual
guide to the pixel grid. For further processing, the resized
pixels at mesh positions have to be resampled to the pixel grid.
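
The following is a minimal sketch of why resized pixels end up off the integer grid. It is an illustration only and not the authors' implementation: the function names `resized_positions` and `off_grid_fraction`, the 4x4 grid, and the scale factor of 1.5 are assumptions chosen for the example, and the transformation is taken about the origin for simplicity.

```python
import numpy as np

def resized_positions(height, width, scale, angle_deg=0.0):
    """Map integer pixel coordinates to their positions after zoom and rotation.

    Hypothetical helper: zoom and rotation are applied about the origin (0, 0).
    Returns an array of shape (N, 2) holding (x, y) mesh positions.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=0).astype(float)  # (2, N)
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return ((scale * rot) @ coords).T

def off_grid_fraction(points, tol=1e-9):
    """Fraction of points that do not coincide with an integer grid position."""
    on_grid = np.all(np.abs(points - np.round(points)) < tol, axis=1)
    return 1.0 - on_grid.mean()

# Zoom a 4x4 pixel grid by a factor of 1.5 (assumed example values).
mesh = resized_positions(4, 4, scale=1.5)
fraction = off_grid_fraction(mesh)
```

In this sketch, every pixel whose scaled coordinate is non-integer in x or y counts as a mesh point. Those are the points that a resampling method such as bilinear interpolation, bicubic interpolation, or FSMR would have to map back onto the pixel grid.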
In recent years, efficient network architectures have been
developed [1]–[4]. They provide good results for image classification.
Fig. 1: Original data given as red circles. The full pixel grid as goal
of resizing and reconstruction given as black dots.
(a) Original points after resizing on mesh points. Resampling onto pixel grid necessary.
(b) BLOCK loss covers the center (gray). No original points in this area. Pixel grid must be reconstructed.
With the increasing usage of neural networks and the production of
a growing number of images and videos, an increasing amount of
visual data is interpreted by machines. Typically, the interpretation is
conducted after transmitting the data to a central server in both cloud
and edge computing scenarios [5], [6]. For transmission, the visual
content has to be compressed. Current compression schemes typically
separate the images into smaller blocks [7], [8]. If the transmission
is disturbed, only a subset of blocks can be transmitted properly.
The remaining image parts have to be filled using reconstruction
algorithms. The scenario is illustrated on pixel level in Fig. 1b.
As in Fig. 1a, the black dots denote the pixel grid. The gray
area in the middle represents a BLOCK loss. The original
data is given as red circles outside the gray area. The information
for the black dots in the gray-shaded area has to be reconstructed
from the red data points. If such an image is to be analyzed by
a neural network, the loss in the image has to be reconstructed
first. Thereafter, the image is resized to the expected input size of
the network. The reconstruction of missing image parts has mainly been
investigated with humans as observers and thus the algorithms are
optimized mainly in terms of Peak Signal-to-Noise Ratio (PSNR) and
Structural Similarity Index Measure (SSIM) [9], [10]. One high-quality
reconstruction technique is Frequency-Selective Reconstruction
(FSR) [9]. For the resizing step, a classical interpolation such
as bilinear or bicubic interpolation can typically be used.
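
To make the loss scenario and the PSNR criterion concrete, the sketch below simulates a block loss and measures the resulting quality. It is an assumption-laden illustration rather than the evaluation setup of this paper: the helper names `apply_block_loss` and `psnr`, the block size of 4, the 8x8 test image, and the zero-filling of lost pixels are all hypothetical choices. Zero-filling is only a placeholder for a real reconstruction method such as FSR.

```python
import numpy as np

def apply_block_loss(image, block_size, lost_blocks):
    """Zero out the listed blocks and return the corrupted image and a mask.

    lost_blocks holds (block_row, block_col) indices. The mask is True where
    the original pixel is still available and False inside lost blocks.
    """
    corrupted = image.astype(float).copy()
    mask = np.ones(image.shape[:2], dtype=bool)
    for by, bx in lost_blocks:
        rows = slice(by * block_size, (by + 1) * block_size)
        cols = slice(bx * block_size, (bx + 1) * block_size)
        corrupted[rows, cols] = 0.0
        mask[rows, cols] = False
    return corrupted, mask

def psnr(reference, estimate, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB, assuming 8-bit data (peak = 255)."""
    diff = reference.astype(float) - estimate.astype(float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

# Assumed toy example: constant 8x8 image, one lost 4x4 block in the corner.
image = np.full((8, 8), 100, dtype=np.uint8)
corrupted, mask = apply_block_loss(image, block_size=4, lost_blocks=[(0, 0)])
quality_db = psnr(image, corrupted)
```

In a pipeline of the kind described above, the pixels where `mask` is False would first be filled by a reconstruction algorithm. The completed image would then be resized to the input size expected by the network.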
arXiv:2210.15444v3 [eess.IV] 27 Nov 2023