Jointly Resampling and Reconstructing Corrupted Images for Image Classification using Frequency-Selective Mesh-to-Grid Resampling

Viktoria Heimann, Andreas Spruck, and André Kaup
Multimedia Communications and Signal Processing
Friedrich-Alexander-Universität Erlangen-Nürnberg
Cauerstraße 7, 91058 Erlangen
Email: {viktoria.heimann, andreas.spruck, andre.kaup}@fau.de
©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including
reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or
reuse of any copyrighted component of this work in other works.
Abstract—Neural networks have become the standard technique for image
classification in recent years. They extract image features from a large
number of images in a training phase. In the subsequent test phase, the
network is applied to the problem it was trained for and its performance
is measured. In this paper, we focus on image classification. The amount
of visual data that is interpreted by neural networks grows with the
increasing usage of neural networks. In most cases, the visual data is
transmitted from the application side to a central server where the
interpretation is conducted. If the transmission is disturbed, losses
occur in the transmitted images. These losses have to be reconstructed
in a postprocessing step. In this paper, we incorporate the widely
applied bilinear and bicubic interpolation and the high-quality
Frequency-Selective Reconstruction (FSR) for the reconstruction of
corrupted images. In contrast, we propose to use Frequency-Selective
Mesh-to-Grid Resampling (FSMR) for the joint reconstruction and resizing
of corrupted images. The performance in terms of classification accuracy
of EfficientNetB0, DenseNet121, DenseNet201, ResNet50 and ResNet152 is
examined. Results show that the reconstruction with FSMR leads to the
highest classification accuracy for most networks. Average improvements
of up to 6.7 percentage points are possible for DenseNet121.
I. INTRODUCTION
Neural networks have been widely applied in image processing in recent
years. They are especially used for interpreting images, a task that is
easy for humans but difficult for machines. Neural networks are able to
extract features from image content. They do not require the image
content to be described analytically. Instead, they learn to extract the
important image features by training. In the training phase, a large
number of images is presented to the network. Thus, if only a small
number of images is available, techniques such as data augmentation have
to be applied in order to create a larger number of images for training.
Data augmentation refers to the task of artificially creating additional
data with new characteristics from the already available data. Typical
techniques for data augmentation are geometric transformations such as
zooming and rotating input images. Furthermore, most networks demand a
fixed input size and thus, the input images have to be resized. Resizing
on pixel level means that pixels are transformed to a new position.
Thereby, the resized pixels are mostly located on mesh positions that
are not fixed to an integer grid. An example on pixel level is shown in
Fig. 1a. The red circles denote the resized pixels on mesh positions,
whereas the black dots denote the pixel grid. The dashed lines support
the presentation of the pixel grid. For further processing, the resized
pixels at mesh positions have to be resampled to the pixel grid.
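To illustrate why resized pixels leave the integer grid, the following minimal sketch scales the coordinates of a small image by a non-integer factor. The function name `resized_mesh_positions` and the scale factor 1.5 are assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np

def resized_mesh_positions(height, width, scale):
    """Return the (row, col) coordinates of all source pixels after scaling.

    Hypothetical helper: each pixel keeps its value but moves to a new,
    generally non-integer, position (a "mesh" position).
    """
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    return rows * scale, cols * scale

# A 4x4 image enlarged by a factor of 1.5 (assumed value).
mesh_r, mesh_c = resized_mesh_positions(4, 4, scale=1.5)

# Mark the resized pixels that still coincide with an integer grid position.
on_grid = np.isclose(mesh_r, np.round(mesh_r)) & np.isclose(mesh_c, np.round(mesh_c))

print(mesh_c[0])            # column coordinates of the first row
print(int(on_grid.sum()))   # how many of the 16 pixels land on the grid
```

Only a few of the scaled pixels coincide with grid positions, so the remaining grid values have to be obtained by resampling.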
In recent years, efficient network architectures have been developed
[1]–[4]. They provide good results for image classification.
Fig. 1: Original data given as red circles. The full pixel grid as goal
of resizing and reconstruction given as black dots. (a) Original points
after resizing on mesh points. Resampling onto pixel grid necessary.
(b) BLOCK loss covers the center (gray). No original points in this
area. Pixel grid must be reconstructed.
With the increasing usage of neural networks and the production of a
growing number of images and videos, an increasing amount of visual data
is interpreted by machines. Typically, the interpretation is conducted
after transmitting the data to a central server in both cloud and edge
computing scenarios [5], [6]. For transmission, the visual content has
to be compressed. Current compression schemes typically separate the
images into smaller blocks [7], [8]. If the transmission is disturbed,
only a subset of blocks can be transmitted properly. The remaining image
parts have to be filled using reconstruction algorithms. The scenario is
demonstrated on pixel level in Fig. 1b. As in Fig. 1a, the black dots
denote the pixel grid. The gray area in the middle represents a BLOCK
loss. The original data is given as red circles outside the gray area.
The information for the black dots in the gray area has to be
reconstructed from the red data points. If such an image is to be
analyzed by a neural network, the loss in the image has to be
reconstructed first. Thereafter, the image is resized to the expected
input size of the network. Reconstructing missing parts of images was
mainly investigated for humans as observers and thus, the algorithms are
optimized mainly in terms of Peak Signal-to-Noise Ratio (PSNR) and
Structural Similarity Index Measure (SSIM) [9], [10]. A high-quality
reconstruction technique is Frequency-Selective Reconstruction (FSR)
[9]. For the resizing step, a classical interpolation such as bilinear
or bicubic interpolation can typically be incorporated.
arXiv:2210.15444v3 [eess.IV] 27 Nov 2023
Fig. 2: Training setup as flow chart. [Flow chart with the boxes:
Training Set, Validation Set, Data Augmentation (Rotation & Zoom),
Resample, Resize, Resample, Training of network.]
However, this preprocessing pipeline consists of two processing parts
that might both introduce errors to the final image. Hence, we propose
to resize the image including its losses and combine reconstruction and
resampling into one computation step. Thus, we aim at minimizing the
error that is made in the preprocessing for a neural network and thereby
increase the classification accuracy. We incorporate Frequency-Selective
Mesh-to-Grid Resampling (FSMR) for this purpose. FSMR has already proven
to be beneficial as a preprocessing method for the resizing of test
images [11].
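The input to such a joint step can be sketched as follows: after resizing, the received pixels sit at mesh positions and the pixels of lost blocks are simply absent, so one mesh-to-grid resampler has to fill the gap and produce the grid at once. The helper names, the 8x8 image, the block size of 4 and the scale factor 0.75 are hypothetical choices for illustration only.

```python
import numpy as np

def block_loss_mask(height, width, block, lost_blocks):
    """Boolean mask: True where a pixel was received, False inside lost blocks.

    `lost_blocks` holds (block_row, block_col) indices of the lost blocks.
    """
    mask = np.ones((height, width), dtype=bool)
    for br, bc in lost_blocks:
        mask[br * block:(br + 1) * block, bc * block:(bc + 1) * block] = False
    return mask

def surviving_mesh_samples(image, mask, scale):
    """Scale the coordinates of all received pixels; lost pixels are dropped."""
    rows, cols = np.nonzero(mask)
    return rows * scale, cols * scale, image[rows, cols]

image = np.arange(64, dtype=float).reshape(8, 8)       # toy 8x8 image
mask = block_loss_mask(8, 8, block=4, lost_blocks=[(0, 1)])
mesh_r, mesh_c, values = surviving_mesh_samples(image, mask, scale=0.75)

print(values.size)                  # number of samples passed to the resampler
print(mesh_r.max(), mesh_c.max())   # largest mesh coordinates after resizing
```

In the sequential approach, the masked pixels would first be reconstructed on the original grid and the full image would be resized afterwards. Here, the scattered samples are handed over directly.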
We briefly introduce FSMR in the upcoming section. Thereafter, we
present our proposed procedure in Sec. III. Our experimental setup
follows in Sec. IV. Next, we show the results of our extensive
experiments in Sec. V. Finally, the paper closes with a conclusion.
II. FREQUENCY-SELECTIVE MESH-TO-GRID RESAMPLING
A common approach in image signal processing is to view the image in
the spectral domain. Images can be represented in the spatial domain as
well as in the spectral domain. The spectral domain of images can be
reached by transforming the image using the Discrete Fourier Transform
or the Discrete Cosine Transform (DCT). This representation gives the
frequency distribution of the image. Natural images can usually be
represented in terms of only a few basis functions. If pixels are
located on mesh positions due to, e.g., a geometrical transform, the
discrete spectral transforms cannot be carried out directly any more.
Thus, the frequencies contained in the geometrically transformed image
have to be estimated with a model. The main assumption of our model is
that an image $f[m, n]$ can be represented in terms of a weighted
superposition of basis functions $\varphi_{k,l}[m, n]$
$$f[m, n] = \sum_{(k,l) \in \mathcal{K}} c_{k,l} \, \varphi_{k,l}[m, n], \tag{1}$$
where the indexes $k$ and $l$ denote the frequency index in horizontal
and vertical direction, respectively. They are chosen from the set of
all possible basis functions $\mathcal{K}$ at every coordinate position
$[m, n]$. The model itself is set to zero in the beginning, i.e.,
$g^{(0)}[m, n] = 0$. During the model generation process, the best
fitting basis function $\varphi_{u,v}[m, n]$ with its corresponding
expansion coefficient $\hat{c}_{u,v}$ is added in the current iteration
$\nu$ to the model from the previous iteration $g^{(\nu-1)}[m, n]$
$$g^{(\nu)}[m, n] = g^{(\nu-1)}[m, n] + \hat{c}_{u,v} \, \varphi_{u,v}[m, n]. \tag{2}$$
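The superposition in (1) and the update in (2) can be sketched in a few lines. The sketch assumes unnormalized two-dimensional DCT-II basis functions, which can be evaluated at non-integer coordinates. The text names the DCT only as one possible transform, so this choice, the function names, the block size and the coefficient values are assumptions for illustration.

```python
import numpy as np

def dct_basis(k, l, m, n, size):
    """Unnormalized 2-D DCT-II basis function (assumed basis).

    m and n may be real-valued, so mesh positions are allowed.
    """
    return (np.cos(np.pi * (2 * m + 1) * k / (2 * size)) *
            np.cos(np.pi * (2 * n + 1) * l / (2 * size)))

def superpose(coeffs, m, n, size):
    """Eq. (1): weighted sum of basis functions; coeffs maps (k, l) -> c_kl."""
    out = np.zeros_like(np.asarray(m, dtype=float))
    for (k, l), c in coeffs.items():
        out += c * dct_basis(k, l, m, n, size)
    return out

# Three mesh positions inside a 4x4 block (made-up coordinates).
m = np.array([0.0, 1.5, 3.0])
n = np.array([0.0, 0.75, 2.25])

g = np.zeros(3)                                # g^(0)[m, n] = 0
g = g + 2.0 * dct_basis(0, 0, m, n, size=4)    # eq. (2), first iteration
g = g + 0.5 * dct_basis(1, 0, m, n, size=4)    # eq. (2), second iteration

print(g)
```

After two updates, the model equals the superposition of the two selected basis functions with their coefficients, which is what `superpose` computes directly.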
Fig. 3: Testing setups as flow chart. (a) Common sequential approach:
Test Set, Add pattern, Reconstruct, Resize, Resample, Testing of network
(steps labeled 1, 2, 3). (b) Proposed joint approach: Test Set, Add
pattern, Resize, Reconstruct and Resample, Testing of network (steps
labeled 1, 3*).
As the aim of the generated model is to estimate the original image
signal as well as possible, the differences between the model and the
original have to be minimized. Thus, the residual $r^{(\nu)}$ is given as
$$r^{(\nu)}[m, n] = f[m, n] - g^{(\nu)}[m, n]. \tag{3}$$
The minimization of the residual is then formulated in terms of the
residual energy
$$E^{(\nu)} = \sum_{(m,n)} w[m, n] \left( r^{(\nu)}[m, n] \right)^2, \tag{4}$$
where a spatial weighting function $w[m, n]$ is incorporated
additionally. It is defined as an isotropically decaying window function
with its center in the middle of the currently processed block. This
enables the model to prefer center regions and adapt to local
characteristics of the image. The indexes $(u, v)$ of the best fitting
basis function that is added in the current iteration can be determined
from the residual energy reduction $\Delta E_{k,l}^{(\nu)}$ achieved by
each candidate basis function according to
$$(u, v) = \operatorname*{argmax}_{(k,l)} \left( \Delta E_{k,l}^{(\nu)} \, w_f[k, l] \right). \tag{5}$$
In (5), a spectral weighting function $w_f[k, l]$ is incorporated.
It favors low frequencies, as it has been shown that natural images
are mainly composed of low-frequency components [12]. High
frequencies tend to produce artifacts such as ringing. Nevertheless, if
high frequencies are dominant in the image, they can still be included
in the model. In every iteration, the basis function that yields the
largest residual energy reduction is chosen, as the gap between the
original signal and the model should be closed as fast as possible.
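The interplay of the two weighting functions can be illustrated with a small sketch. The text specifies only that the spatial window decays isotropically from the block center and that the spectral weighting favors low frequencies. The concrete forms used here (a power `rho**distance` and an exponential decay with rate `sigma`), their parameter values and all function names are assumptions.

```python
import numpy as np

def spatial_weight(m, n, size, rho=0.7):
    """Isotropic window decaying with the distance to the block centre.

    Assumed form rho**distance; rho=0.7 is a hypothetical value.
    """
    centre = (size - 1) / 2.0
    return rho ** np.hypot(m - centre, n - centre)

def spectral_weight(size, sigma=0.5):
    """Hypothetical low-pass weighting over the frequency indexes (k, l)."""
    k, l = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    return np.exp(-sigma * np.hypot(k, l))

def select_basis(delta_e, wf):
    """Selection rule: index maximizing the weighted energy reduction."""
    idx = np.unravel_index(np.argmax(delta_e * wf), delta_e.shape)
    return int(idx[0]), int(idx[1])

wf = spectral_weight(4)
delta_e = np.zeros((4, 4))      # made-up energy reductions per basis function
delta_e[0, 1] = 1.0             # a low-frequency candidate
delta_e[3, 3] = 1.2             # a slightly stronger high-frequency candidate
weak_high = select_basis(delta_e, wf)

delta_e[3, 3] = 10.0            # now the high frequency clearly dominates
dominant_high = select_basis(delta_e, wf)

print(weak_high, dominant_high)
```

With these made-up numbers, the low-frequency candidate wins while the high-frequency one is only slightly stronger, and the high-frequency candidate is selected once it clearly dominates.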
Finally, the model is evaluated for the goal coordinates $[o, p]$ on the
grid
$$f[o, p] = \sum_{(k,l) \in \mathcal{K}} \hat{c}_{k,l} \, \varphi_{k,l}[o, p]. \tag{6}$$
For the final evaluation, the estimated expansion coefficient of every
possible basis function is multiplied with the corresponding basis
function and the products are summed up.
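The complete procedure of (2) to (6) can be summarized in a simplified sketch for a single block. It is not the authors' implementation. The DCT-II basis, the weighting functions, the parameter values and the coefficient estimate (a weighted projection of the residual onto each basis function, the least-squares choice for one added term) are assumptions that the text does not spell out, and `fsmr_sketch` is a hypothetical name.

```python
import numpy as np

def basis(k, l, m, n, size):
    """Unnormalized 2-D DCT-II basis function (assumed), valid for real m, n."""
    return (np.cos(np.pi * (2 * m + 1) * k / (2 * size)) *
            np.cos(np.pi * (2 * n + 1) * l / (2 * size)))

def fsmr_sketch(m, n, f, size, iterations=50, rho=0.7, sigma=0.5):
    """Greedy model generation on mesh samples, then evaluation on the grid."""
    centre = (size - 1) / 2.0
    w = rho ** np.hypot(m - centre, n - centre)            # spatial weighting
    freqs = [(k, l) for k in range(size) for l in range(size)]
    phi = np.stack([basis(k, l, m, n, size) for k, l in freqs])
    wf = np.array([np.exp(-sigma * np.hypot(k, l)) for k, l in freqs])
    norm = (phi ** 2) @ w + 1e-12                          # weighted basis energy
    coeffs = np.zeros(len(freqs))
    residual = np.asarray(f, dtype=float).copy()           # eq. (3) with g^(0) = 0
    energies = [float(np.sum(w * residual ** 2))]          # eq. (4)
    for _ in range(iterations):
        proj = phi @ (w * residual)                        # weighted projections
        delta_e = proj ** 2 / norm                         # energy reductions
        best = int(np.argmax(delta_e * wf))                # eq. (5)
        c_hat = proj[best] / norm[best]                    # assumed estimate
        coeffs[best] += c_hat                              # eq. (2)
        residual -= c_hat * phi[best]
        energies.append(float(np.sum(w * residual ** 2)))
    o, p = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    grid = np.zeros((size, size))
    for c, (k, l) in zip(coeffs, freqs):                   # eq. (6)
        grid += c * basis(k, l, o, p, size)
    return grid, energies

# Toy data: random mesh positions in an 8x8 block with a lost region.
rng = np.random.default_rng(0)
size = 8
m = rng.uniform(0, size - 1, 80)
n = rng.uniform(0, size - 1, 80)
received = ~((m > 2) & (m < 5) & (n > 2) & (n < 5))
m, n = m[received], n[received]
f = 2.0 + 0.5 * basis(0, 1, m, n, size)                    # smooth test signal

grid, energies = fsmr_sketch(m, n, f, size)
print(grid.shape, energies[0], energies[-1])
```

The weighted residual energy cannot increase from one iteration to the next, because each step subtracts the weighted projection onto the selected basis function. How closely the grid values inside the lost region match the true signal depends on the sample layout and on the assumed parameters.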
FSMR was shown to be a high-performing method for various
resampling scenarios such as affine transforms [13] and
motion-compensated frame-rate up-conversion [14].
III. PROPOSED APPROACH
In this study, we assume a transmission scenario where losses occur.
The transmitted images should be classified into object categories
by a neural network. The network itself was trained on non-corrupted
images. Thus, the losses have to be reconstructed and the images