fake detection is the visual indistinguishability be-
tween fake images created by GANs and real ones.
Moreover, the abuse of fake images potentially poses
threats to personal and national security. Therefore,
research on deepfake detection has become increas-
ingly important with the rapid iteration of GANs.
There are two kinds of tasks in the detection of
GAN-generated images. The easier one is identifying
an image as real or fake. The harder one consists of
attributing fake images to the corresponding GAN
that generated them. In this paper, we mainly focus
on the attribution task. Both tasks involve extract-
ing features from images and feeding them to classi-
fiers. For the classifiers, there are approaches based
on traditional machine learning methods, which are
relatively simple but often yield relatively poor results,
see [13,24]. Approaches based on deep learning,
especially convolutional neural networks (CNNs),
have proven powerful and are employed in many
recent papers, see [38,45,42,28,47,12,43]. For
feature extraction, the simplest method is just using
raw pixels as input. The results, however, are not
highly accurate, and classifiers fed with raw pixels
are not robust under common perturbations,
see [28,12]. Therefore, it is necessary to
develop methods to better extract features. One
stream is the learning-based method of Yu et al. in
[45,46,47], who found unique fingerprints for each
GAN. Another stream is based on the mismatches
between real and fake in the frequency domain, see
[48,10,12,9,28,43]. Specifically, multiresolution
methods, e.g., the wavelet packet transform, have
recently been employed for deepfake detection, see
Wolter et al. in [43]. Their work demonstrates the
capabilities of multiresolution analyses for the task at
hand and marks the starting point for our consider-
ations. In contrast to the isotropic transformations
considered there, we focus on anisotropic transforma-
tions, i.e., the fully separable wavelet transform ([40])
and samplets ([18]), which are a particular variant of
multiwavelets.
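To make the anisotropic construction concrete, the fully separable wavelet transform can be sketched in a few lines: a full 1D wavelet decomposition is applied along the rows first and then along the columns of every resulting band, which produces anisotropic sub-bands (e.g., fine in the horizontal direction but coarse in the vertical one) that the standard isotropic transform does not expose. The following is a minimal sketch using the Haar wavelet; the function names and the two-level setting are illustrative and not taken from the references above.

```python
import numpy as np

def haar_step(x, axis):
    """One orthonormal Haar analysis step along the given axis:
    returns (approximation, detail)."""
    even = np.take(x, range(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def decompose_axis(x, axis, levels):
    """Full 1D Haar decomposition along one axis: detail bands from
    finest to coarsest, followed by the final approximation."""
    bands, approx = [], x
    for _ in range(levels):
        approx, detail = haar_step(approx, axis)
        bands.append(detail)
    return bands + [approx]

def fully_separable_haar(img, levels):
    """Fully separable 2D transform: decompose all rows first, then all
    columns of every row band. The result is a (levels+1) x (levels+1)
    grid of sub-bands; the off-diagonal entries are the anisotropic ones."""
    row_bands = decompose_axis(img, axis=1, levels=levels)
    return [decompose_axis(b, axis=0, levels=levels) for b in row_bands]
```

Since every Haar step is orthonormal, the sub-bands conserve the total energy of the image, so their magnitudes are directly comparable across images.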
Because the generators in all GAN architectures
synthesize high-resolution images from low-resolution
ones using deconvolution layers with square sliding
windows, the anisotropic multiwavelet transforms of
fake images are highly likely to exhibit artifacts in
the anisotropic sub-bands. In this paper, we show
that features from anisotropic (multi-)wavelet
transforms are promising descriptors of images.
This is due to remarkable mismatches between the
anisotropic multiwavelet transforms of real and fake
images, see Figure 3. To evaluate the anisotropic
features, we set up a lightweight multi-class CNN
classifier as in [12,43] and compare our results on
datasets combining authentic images from one of
three commonly used image datasets, Large-scale
CelebFaces Attributes (CelebA [27]), LSUN bedrooms
([44]), and Flickr-Faces-HQ (FFHQ [22]), with
synthesized images generated by CramerGAN,
MMDGAN, ProGAN, and SN-DCGAN on CelebA
and LSUN bedrooms, or by the StyleGANs on FFHQ.
Finally, as in [12,43], we test the sensitivity to the
number of training samples and the robustness under
four common perturbations: Gaussian blurring, image
cropping, JPEG compression, and additive Gaussian
noise.
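For concreteness, three of these perturbations can be sketched with plain NumPy; JPEG compression is omitted here because it requires an image codec (e.g., Pillow). The kernel size, crop fraction, and noise level below are illustrative defaults, not the settings used in the experiments.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur: convolve a normalized 1D kernel
    along the rows and then along the columns."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def center_crop(img, frac=0.8):
    """Keep the central frac-by-frac region of the image (no resizing)."""
    h, w = img.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return img[dh:h - dh, dw:w - dw]

def gaussian_noise(img, sigma=5.0, seed=0):
    """Additive Gaussian pixel noise with standard deviation sigma."""
    rng = np.random.default_rng(seed)
    return img + rng.normal(0.0, sigma, img.shape)
```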
2 Related work
Deepfake detection: A comprehensive statistical
study of natural images shows that regularities
always exist in natural images due to the strong
correlations among pixels, see [29]. However, such
regularities are absent in synthesized images. Moreover,
it is well known that checkerboard artifacts exist in
CNN-generated images due to downsampling and
upsampling layers, see examples in [37,4]. These
artifacts make the identification of deepfakes possible.
In [31,38,42], the authors show that GAN-generated
fake images can be detected using CNNs fed with
conventional image forensics features, i.e., raw pixels.
In order to improve the accuracy and generalization
of the classifier, several methods have been proposed to address
the problem of finding more discriminative features
instead of raw pixels. Several non-learnable features
are proposed, for example hand-crafted cooccurrence
features in [35], color cues in [32], layer-wise neuron
behavior in [41], and global texture in [28]. In [45],
Yu et al.discover the possibility of uniquely finger-
printing each GAN model and characterize the corre-
sponding output during the training procedure. With