Jointly Resampling and Reconstructing Corrupted Images for Image Classification using Frequency-Selective Mesh-to-Grid Resampling


Viktoria Heimann, Andreas Spruck, and André Kaup

Multimedia Communications and Signal Processing

Friedrich-Alexander-Universität Erlangen-Nürnberg

Cauerstraße 7, 91058 Erlangen

Email: {viktoria.heimann, andreas.spruck, andre.kaup}@fau.de


Abstract—Neural networks have become the standard technique for image classification in recent years. They extract image features from a large number of images in a training phase. In a subsequent test phase, the network is applied to the problem it was trained for and its performance is measured. In this paper, we focus on image classification. The amount of visual data that is interpreted by neural networks grows with the increasing usage of neural networks. Mostly, the visual data is transmitted from the application side to a central server where the interpretation is conducted. If the transmission is disturbed, losses occur in the transmitted images. These losses have to be reconstructed using postprocessing. In this paper, we incorporate the widely applied bilinear and bicubic interpolation and the high-quality Frequency-Selective Reconstruction (FSR) for the reconstruction of corrupted images. As an alternative, we propose to use Frequency-Selective Mesh-to-Grid Resampling (FSMR) for the joint reconstruction and resizing of corrupted images. The performance in terms of classification accuracy of EfficientNetB0, DenseNet121, DenseNet201, ResNet50 and ResNet152 is examined. Results show that the reconstruction with FSMR leads to the highest classification accuracy for most networks. Average improvements of up to 6.7 percentage points are possible for DenseNet121.

I. INTRODUCTION

Neural networks have been widely applied to image processing in recent years. They are especially used for interpreting images, a task that is easy for humans but difficult for machines. Neural networks are able to extract features from image content. They do not require the image content to be described analytically; instead, they learn to extract the important image features by training. In the training phase, a large number of images is presented to the network. Thus, if only a small number of images is available, techniques such as data augmentation have to be applied in order to create a larger number of images for training. Data augmentation refers to the task of artificially creating additional data with new characteristics from the already available data. Typical techniques for data augmentation are geometric transformations such as zooming and rotating input images. Furthermore, most networks demand a fixed input size and thus, the input images have to be resized. Resizing on pixel level means that pixels are transformed to a new position. Thereby, the resized pixels are mostly located on mesh positions that are not fixed to an integer grid. An example on pixel level is shown in Fig. 1a. The red circles denote the resized pixels on mesh positions, whereas the black dots denote the pixel grid. The dashed lines support the presentation of the pixel grid. For further processing, the resized pixels at mesh positions have to be resampled to the pixel grid.
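The effect described above, resized pixels landing between grid points, can be illustrated with a minimal sketch. The helper names `resized_positions` and `off_grid_fraction` and the zoom factor of 1.5 are assumptions made for this illustration only and do not come from the paper.

```python
import numpy as np

def resized_positions(height, width, scale):
    """Return the positions of all pixels of a height x width image after zooming by `scale`."""
    rows, cols = np.mgrid[0:height, 0:width].astype(float)
    return rows * scale, cols * scale

def off_grid_fraction(rows, cols):
    """Fraction of positions whose row or column coordinate is not an integer."""
    on_grid = np.isclose(rows, np.round(rows)) & np.isclose(cols, np.round(cols))
    return float(np.mean(~on_grid))

rows, cols = resized_positions(3, 3, 1.5)
print(cols[0])                        # column coordinates of the first row after zooming
print(off_grid_fraction(rows, cols))  # share of pixels that no longer sit on the integer grid
```

For a zoom factor that is not an integer, most positions fall between grid points, which is why a resampling step back onto the pixel grid is needed.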

In recent years, efficient network architectures have been developed [1]–[4]. They provide good results for image classification.

Fig. 1: Original data given as red circles. The full pixel grid as goal of resizing and reconstruction given as black dots. (a) Original points after resizing on mesh points. Resampling onto pixel grid necessary. (b) BLOCK loss covers the center (gray). No original points in this area. Pixel grid must be reconstructed.

With the increasing usage of neural networks and the production of a growing number of images and videos, an increasing amount of visual data is interpreted by machines. Typically, the interpretation is conducted after transmitting the data to a central server in both cloud and edge computing scenarios [5], [6]. For transmission, the visual content has to be compressed. Current compression schemes typically separate the images into smaller blocks [7], [8]. If the transmission is disturbed, only a subset of blocks can be transmitted properly. The remaining image parts have to be filled using reconstruction algorithms. The scenario is demonstrated on pixel level in Fig. 1b. As in Fig. 1a, the black dots denote the pixel grid. The gray area in the middle represents a BLOCK loss. The original data is given as red circles outside the gray area. The information for the black dots in the gray area has to be reconstructed from the red data points. If such an image is to be analyzed by a neural network, the loss in the image has to be reconstructed first. Thereafter, the image is resized to the expected input size of the network. Reconstructing missing parts of images has mainly been investigated for humans as observers and thus, the algorithms are optimized mainly in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) [9], [10]. A high-quality reconstruction technique is Frequency-Selective Reconstruction (FSR) [9]. For the resizing step, a classical interpolation such as bilinear or bicubic interpolation can typically be incorporated.
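As a reminder of what an optimization for human observers measures, the following minimal sketch computes the PSNR between a reference image and a reconstruction using the standard definition. The function name `psnr` and the default peak value of 255 (8-bit images) are assumptions for this illustration; the paper does not specify an implementation.

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((reference - reconstruction) ** 2)  # mean squared error
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A high PSNR indicates a small pixel-wise error, but it does not necessarily indicate that a classifier will perform well on the reconstructed image.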

arXiv:2210.15444v3 [eess.IV] 27 Nov 2023

Fig. 2: Training setup as flow chart. Blocks shown: Training Set, Validation Set, Data Augmentation (Rotation & Zoom), Resample, Resize, Resample, Training of network.

However, this preprocessing pipeline consists of two processing steps that might both introduce errors into the final image. Hence, we propose to resize the image including its losses and combine reconstruction and resampling into one computation step. Thus, we aim at minimizing the error that is made in the preprocessing for a neural network and thereby increase the classification accuracy. We incorporate Frequency-Selective Mesh-to-Grid Resampling (FSMR) for this purpose. FSMR has already proven to be beneficial as a preprocessing method for the resizing of test images [11].

We will briefly introduce FSMR in the upcoming section. Thereafter, we present our proposed procedure in Sec. III. Our experimental setup follows in Sec. IV. Next, we show the results of our extensive experiments in Sec. V. Finally, the paper closes with a conclusion.

II. FREQUENCY-SELECTIVE MESH-TO-GRID RESAMPLING

A common approach in image signal processing is to view the image in the spectral domain. Images can be represented in the spatial domain as well as in the spectral domain. The spectral domain can be reached by transforming the image using the Discrete Fourier Transform or the Discrete Cosine Transform (DCT). This representation gives the frequency distribution of the image. Natural images can usually be represented in terms of only a few basis functions. If pixels are located on mesh positions due to, e.g., a geometric transform, the discrete spectral transforms cannot be carried out directly any more. Thus, the frequencies contained in the geometrically transformed image have to be estimated with a model. The main assumption of our model is that an image $f[m, n]$ can be represented in terms of a weighted superposition of basis functions $\varphi_{k,l}[m, n]$

$$f[m, n] = \sum_{k,l \in \mathcal{K}} c_{k,l}\, \varphi_{k,l}[m, n], \tag{1}$$

where the indexes $k$ and $l$ denote the frequency index in horizontal and vertical direction, respectively. They are chosen from the set of all possible basis functions $\mathcal{K}$ at every coordinate position $[m, n]$. The model itself is set to zero in the beginning, i.e., $g^{(0)}[m, n] \equiv 0$. During the model generation process, the best fitting basis function $\varphi_{u,v}[m, n]$ with its corresponding expansion coefficient $\hat{c}_{u,v}$ is added in the current iteration $\nu$ to the model from the previous iteration $g^{(\nu-1)}[m, n]$

$$g^{(\nu)}[m, n] = g^{(\nu-1)}[m, n] + \hat{c}_{u,v}\, \varphi_{u,v}[m, n]. \tag{2}$$

Fig. 3: Testing setups as flow charts. (a) Common sequential approach: Test Set, Add pattern, Reconstruct, Resize, Resample, Testing of network. (b) Proposed joint approach: Test Set, Add pattern, Resize, Reconstruct and Resample, Testing of network.

As the aim of the generated model is to estimate the original image signal as well as possible, the difference between the model and the original has to be minimized. Thus, the residual $r^{(\nu)}[m, n]$ is given as

$$r^{(\nu)}[m, n] = f[m, n] - g^{(\nu)}[m, n]. \tag{3}$$

The minimization of the residual is then formulated in terms of the residual energy

$$E^{(\nu)} = \sum_{(m,n)} w[m, n]\, \bigl(r^{(\nu)}[m, n]\bigr)^2, \tag{4}$$

where a spatial weighting function $w[m, n]$ is additionally incorporated. It is defined as an isotropically decaying window function centered in the middle of the currently processed block. This enables the model to prefer center regions and adapt to local characteristics of the image. The indexes $(u, v)$ of the best fitting basis function that is added in the current iteration can be determined using the residual energy according to

$$(u, v) = \operatorname*{argmax}_{(k,l)} \left( \Delta E^{(\nu)}_{k,l}\, w_f[k, l] \right). \tag{5}$$

In (5), $\Delta E^{(\nu)}_{k,l}$ denotes the reduction of the residual energy achieved by the basis function with indexes $(k, l)$, and a spectral weighting function $w_f[k, l]$ is incorporated. It favors low frequencies, as it has been shown that natural images are mainly composed of low-frequency components [12]. High frequencies tend to produce artifacts such as ringing. Nevertheless, if high frequencies are dominant in the image, they can still be included in the model. In every iteration, the basis function that reduces the residual energy the most is chosen, as the gap between the original signal and the model should be closed as fast as possible.
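The residual energy reduction is not written out in this section. As a hedged illustration, assuming real-valued basis functions and no additional compensation step, it follows from a weighted least-squares fit of a single candidate basis function to the residual of the previous iteration. Adding $c\,\varphi_{k,l}$ to the model $g^{(\nu-1)}$ yields the energy

$$E(c) = \sum_{(m,n)} w[m, n]\, \bigl(r^{(\nu-1)}[m, n] - c\, \varphi_{k,l}[m, n]\bigr)^2 .$$

Setting the derivative with respect to $c$ to zero gives the candidate coefficient

$$c_{k,l} = \frac{\sum_{(m,n)} w[m, n]\, r^{(\nu-1)}[m, n]\, \varphi_{k,l}[m, n]}{\sum_{(m,n)} w[m, n]\, \varphi_{k,l}[m, n]^2},$$

and inserting it into $E(c)$ gives the energy reduction

$$\Delta E^{(\nu)}_{k,l} = E^{(\nu-1)} - E(c_{k,l}) = \frac{\Bigl(\sum_{(m,n)} w[m, n]\, r^{(\nu-1)}[m, n]\, \varphi_{k,l}[m, n]\Bigr)^2}{\sum_{(m,n)} w[m, n]\, \varphi_{k,l}[m, n]^2} .$$

The sums run over the available sample positions. The authors' exact definition may differ from this simplified form.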

Finally, the model is evaluated for the goal coordinates $[o, p]$ on the grid

$$f[o, p] = \sum_{k,l \in \mathcal{K}} \hat{c}_{k,l}\, \varphi_{k,l}[o, p]. \tag{6}$$

For the final evaluation, the estimated expansion coefficient of every possible basis function is multiplied with the corresponding basis function and the products are summed up.

FSMR has been shown to be a high-performing method for various resampling scenarios such as affine transforms [13] and motion-compensated frame-rate up-conversion [14].
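The iteration in (1) to (6) can be sketched for a single block as follows. This is a simplified, matching-pursuit-style illustration and not the authors' FSMR implementation. It rests on several assumptions that this section does not confirm: the cosine basis in `basis`, the exponential window $\rho^{d}$ with the hypothetical values `rho=0.8` and `sigma=0.9`, the fixed iteration count, and the coefficient update by weighted projection. All function and parameter names are made up for the example.

```python
import numpy as np

def basis(k, l, m, n, size):
    """DCT-like cosine basis function, evaluated at (possibly non-integer) positions."""
    return (np.cos(np.pi * (2 * m + 1) * k / (2 * size))
            * np.cos(np.pi * (2 * n + 1) * l / (2 * size)))

def frequency_selective_resample(m, n, values, size, iterations=50, rho=0.8, sigma=0.9):
    """Estimate a size x size grid block from samples at mesh positions (m, n)."""
    m, n, values = (np.asarray(a, dtype=float) for a in (m, n, values))
    center = (size - 1) / 2
    w = rho ** np.hypot(m - center, n - center)              # assumed spatial weighting w[m, n]
    freqs = [(k, l) for k in range(size) for l in range(size)]
    phi = np.stack([basis(k, l, m, n, size) for k, l in freqs])
    wf = np.array([sigma ** np.hypot(k, l) for k, l in freqs])  # assumed spectral weighting
    norm = (w * phi ** 2).sum(axis=1)
    coeffs = np.zeros(len(freqs))
    residual = values.copy()                                  # g^(0) = 0, hence r^(0) = f
    for _ in range(iterations):
        proj = (w * residual * phi).sum(axis=1)
        # Weighted energy reduction of every candidate basis function.
        delta_e = np.where(norm > 1e-12, proj ** 2 / np.maximum(norm, 1e-12), 0.0)
        best = int(np.argmax(delta_e * wf))                   # selection as in (5)
        c = proj[best] / norm[best]
        coeffs[best] += c                                     # model update as in (2)
        residual -= c * phi[best]                             # residual as in (3)
    o, p = np.mgrid[0:size, 0:size].astype(float)
    grid = np.zeros((size, size))
    for c, (k, l) in zip(coeffs, freqs):                      # evaluation on the grid as in (6)
        if c != 0.0:
            grid += c * basis(k, l, o, p, size)
    return grid

rng = np.random.default_rng(0)
m = rng.uniform(0, 7, 40)   # mesh positions of the available samples
n = rng.uniform(0, 7, 40)
block = frequency_selective_resample(m, n, np.full(40, 3.0), size=8)
```

Because the model is defined everywhere in the block, the same call also fills grid positions for which no nearby samples exist, which is the property exploited for the joint reconstruction and resampling.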

III. PROPOSED APPROACH

In this study, we assume a transmission scenario where losses occur. The transmitted images should be classified into object categories by a neural network. The network itself was trained on non-corrupted images. Thus, the losses have to be reconstructed and the images
