fake detection is the visual indistinguishability be-
tween fake images created by GANs and real ones.
Moreover, the abuse of fake images potentially poses
threats to personal and national security. Therefore,
research on deepfake detection has become increas-
ingly important with the rapid iteration of GANs.
There are two kinds of tasks in the detection of
GAN-generated images. The easier one is identifying
an image as real or fake. The harder one consists of
attributing fake images to the corresponding GAN
that generated them. In this paper, we mainly focus
on the attribution task. Both tasks involve extract-
ing features from images and feeding them to classi-
fiers. For the classifiers, there are approaches based
on traditional machine learning methods, which are
relatively simple but often yield relatively poor results,
see [13,24]. Approaches based on deep learning,
especially convolutional neural networks (CNNs),
have proven powerful and are employed in many
recent papers, see [38,45,42,28,47,12,43]. For
feature extraction, the simplest method is just using
raw pixels as input. The results, however, are not
highly accurate, and classifiers fed with raw pixels
are not robust under common perturbations,
see [28,12]. Therefore, it is necessary to
develop methods to better extract features. One
stream is the learning-based method of Yu et al. in
[45,46,47], who found unique fingerprints for each
GAN. Another stream is based on the mismatches
between real and fake in the frequency domain, see
[48,10,12,9,28,43]. Specifically, multiresolution
methods, e.g., the wavelet packet transform, have
recently been employed for deepfake detection, see
Wolter et al. in [43]. Their work demonstrates the
capabilities of multiresolution analyses for the task at
hand and marks the starting point for our consider-
ations. In contrast to the isotropic transformations
considered there, we focus on anisotropic transforma-
tions, i.e., the fully separable wavelet transform ([40])
and samplets ([18]), which are a particular variant of
multiwavelets.
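To make the anisotropic construction concrete, the fully separable wavelet transform can be sketched in a few lines: a full 1D wavelet decomposition is applied along the rows first and then along the columns of every resulting band, which produces anisotropic sub-bands (e.g., fine in the horizontal direction but coarse in the vertical one) that the standard isotropic transform does not expose. The following is a minimal sketch using the Haar wavelet; the function names and the two-level setting are illustrative and not taken from the references above.

```python
import numpy as np

def haar_step(x, axis):
    """One orthonormal Haar analysis step along the given axis:
    returns (approximation, detail)."""
    even = np.take(x, range(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, range(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def decompose_axis(x, axis, levels):
    """Full 1D Haar decomposition along one axis: detail bands from
    finest to coarsest, followed by the final approximation."""
    bands, approx = [], x
    for _ in range(levels):
        approx, detail = haar_step(approx, axis)
        bands.append(detail)
    return bands + [approx]

def fully_separable_haar(img, levels):
    """Fully separable 2D transform: decompose all rows first, then all
    columns of every row band. The result is a (levels+1) x (levels+1)
    grid of sub-bands; the off-diagonal entries are the anisotropic ones."""
    row_bands = decompose_axis(img, axis=1, levels=levels)
    return [decompose_axis(b, axis=0, levels=levels) for b in row_bands]
```

Since every Haar step is orthonormal, the sub-bands conserve the total energy of the image, so their magnitudes are directly comparable across images.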
Because the generators in all GAN architectures
synthesize high-resolution images from low-resolution
ones using deconvolution layers with square sliding
windows, the anisotropic multiwavelet transforms of
fake images are highly likely to exhibit artifacts in
the anisotropic sub-bands. In this paper, we show
that features from anisotropic (multi-)wavelet
transforms are promising descriptors of images.
This is due to remarkable mismatches between the
anisotropic multiwavelet transforms of real and fake
images, see Figure 3. To evaluate the anisotropic
features, we set up a lightweight multi-class CNN
classifier as in [12,43] and compare our results on
datasets combining authentic images from one of
three commonly used image datasets, Large-scale
CelebFaces Attributes (CelebA [27]), LSUN bedrooms
([44]), and Flickr-Faces-HQ (FFHQ [22]), with
synthesized images generated by CramerGAN,
MMDGAN, ProGAN, and SN-DCGAN on CelebA
and LSUN bedrooms, or by the StyleGANs on FFHQ.
Finally, as in [12,43], we test the sensitivity to the
number of training samples and the robustness under
four common perturbations: Gaussian blurring, image
cropping, JPEG compression, and additive Gaussian
noise.
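For concreteness, three of these perturbations can be sketched with plain NumPy; JPEG compression is omitted here because it requires an image codec (e.g., Pillow). The kernel size, crop fraction, and noise level below are illustrative defaults, not the settings used in the experiments.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur: convolve a normalized 1D kernel
    along the rows and then along the columns."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def center_crop(img, frac=0.8):
    """Keep the central frac-by-frac region of the image (no resizing)."""
    h, w = img.shape
    dh, dw = int(h * (1 - frac) / 2), int(w * (1 - frac) / 2)
    return img[dh:h - dh, dw:w - dw]

def gaussian_noise(img, sigma=5.0, seed=0):
    """Additive Gaussian pixel noise with standard deviation sigma."""
    rng = np.random.default_rng(seed)
    return img + rng.normal(0.0, sigma, img.shape)
```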
2 Related work
Deepfake detection: A comprehensive statistical
study of natural images shows that regularities
always exist in natural images due to the strong
correlations among pixels, see [29]. However, such
regularities are absent in synthesized images. Moreover,
it is well known that checkerboard artifacts exist in
CNN-generated images due to downsampling and
upsampling layers, see examples in [37,4]. These
artifacts make the identification of deepfakes possible.
In [31,38,42], the authors show that GAN-generated
fake images can be detected using CNNs fed with
conventional image forensics features, i.e., raw pixels.
In order to improve the accuracy and generalization
of the classifier, several methods have been proposed to address
the problem of finding more discriminative features
instead of raw pixels. Several non-learnable features
are proposed, for example hand-crafted cooccurrence
features in [35], color cues in [32], layer-wise neuron
behavior in [41], and global texture in [28]. In [45],
Yu et al.discover the possibility of uniquely finger-
printing each GAN model and characterize the corre-
sponding output during the training procedure. With