Multi-Scale Wavelet Transformer for Face
Forgery Detection
Jie Liu*, Jingjing Wang*, Peng Zhang, Chunmao Wang, Di Xie, and Shiliang Pu (✉)
Hikvision Research Institute
{liujie54, wangjingjing9, zhangpeng45, wangchunmao, xiedi,
pushiliang.hri}@hikvision.com
Abstract. Many recent face forgery detection methods aggregate spatial
and frequency features to improve generalization and achieve promising
performance in the cross-dataset scenario. However, these methods leverage
only single-level frequency information, which limits their expressive
ability. To overcome this limitation, we propose a multi-scale wavelet
transformer framework for face forgery detection. Specifically, to take full
advantage of the multi-scale and multi-frequency wavelet representation, we
gradually aggregate the multi-scale wavelet representation at different
stages of the backbone network. To better fuse the frequency features with
the spatial features, frequency-based spatial attention is designed to guide
the spatial feature extractor to concentrate more on forgery traces.
Meanwhile, cross-modality attention is proposed to fuse the frequency
features with the spatial features. These two attention modules are
calculated through a unified transformer block for efficiency. Extensive
experiments demonstrate that the proposed method is efficient and effective
in both within-dataset and cross-dataset settings.
1 Introduction
Owing to the wide availability of image-editing software and public deep
generative models, it is easy to manipulate existing faces and make forged
faces realistic and hard to distinguish from genuine ones. These
photo-realistic fake faces may be abused for malicious purposes, raising
severe security and privacy issues in our society. It is therefore necessary
to develop effective methods for face forgery detection, and various
detection methods have been proposed. Previous researchers [1,2] mainly
designed methods based on texture artifacts that face forgery techniques
leave in the spatial domain. With the fast evolution of face forgery
techniques, these artifacts are gradually being concealed. Therefore,
although these methods achieve high within-dataset detection accuracy, their
performance drops severely in the cross-dataset scenario, especially when
confronted with new face forgery methods.
*Equal contribution.
arXiv:2210.03899v1 [cs.CV] 8 Oct 2022
Level    Sub-band  Deepfakes (DF)  Face2Face (F2F)  FaceSwap (FS)  NeuralTextures (NT)
-        Ori-Img   1.301           1.092            1.307          1.296
Level-1  LL        1.281           1.008            1.265          1.208
         LH        2.688           2.709            2.970          2.959
         HL        2.716           2.778            2.720          2.857
         HH        2.582           2.914            3.258          2.758
Level-2  LL        1.208           0.958            1.165          1.162
         LH        2.817           2.840            2.882          3.106
         HL        2.598           2.686            2.549          2.871
         HH        3.184           2.929            3.162          3.127
Level-3  LL        1.189           1.055            1.136          1.246
         LH        2.473           2.493            2.510          2.826
         HL        2.135           2.409            2.166          2.837
         HH        2.774           2.917            2.936          2.985
Table 1: EMD of multi-level frequency components. We crop the face from the
first frame of every video in the FF++ dataset and then calculate the EMD
between each fake image and its corresponding real image, both for the
original images and for the sub-band frequency features. The sub-bands are
obtained by a three-level discrete wavelet transform.
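To make the measurement in Table 1 more concrete, the sketch below computes a one-dimensional EMD between two equally sized sets of values. It is only an illustration under stated assumptions: the text does not say how the distributions are built, and [6] defines EMD for general multidimensional distributions. Treating the flattened pixel or coefficient values as equally weighted one-dimensional samples, and the names `emd_1d` and `flatten`, are assumptions introduced here.

```python
def flatten(band):
    """Flatten a 2-D list (an image or a sub-band) into a list of values."""
    return [value for row in band for value in row]


def emd_1d(values_a, values_b):
    """1-D EMD between two equally sized, equally weighted samples.

    Assumption: each value carries the same mass. In that case the optimal
    transport plan matches the sorted values pairwise, so the EMD is the
    mean absolute difference between the two sorted sequences.
    """
    if len(values_a) != len(values_b) or len(values_a) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    sorted_a = sorted(values_a)
    sorted_b = sorted(values_b)
    total = sum(abs(x - y) for x, y in zip(sorted_a, sorted_b))
    return total / len(sorted_a)


# Hypothetical 2x2 "sub-bands" of a real and a fake face crop.
real_band = [[0.0, 1.0], [2.0, 3.0]]
fake_band = [[1.0, 2.0], [3.0, 4.0]]
distance = emd_1d(flatten(real_band), flatten(fake_band))
```

A larger value means the two value distributions differ more, which is how the entries of Table 1 are read.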
To make the algorithm generalize well to unseen forgery methods, many recent
face forgery detection methods aggregate information from the frequency
domain. Yu et al. [3] utilized channel difference images and the spectrum
obtained by DCT to detect fake faces. Other researchers leveraged the
Discrete Fourier Transform (DFT) [4], and the Discrete Cosine Transform (DCT)
and block DCT [5], for frequency information extraction. However, these
methods utilize only single-level frequency information. We found that
multi-level frequency features carry more discriminative details between real
and fake images. Using only one frequency level may be less effective at
extracting the abundant frequency information, which limits the expressive
ability of the obtained features.
The Discrete Wavelet Transform (DWT) is commonly used to obtain multi-level
frequency information, so we choose the Haar DWT to extract frequency features.
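The filters $f_{LL}$, $f_{LH}$, $f_{HL}$, and $f_{HH}$ of the DWT are
$$
f_{LL} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},\quad
f_{LH} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix},\quad
f_{HL} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix},\quad
f_{HH} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix},
$$
and they are used to calculate the frequency components (LL, LH, HL, HH) of
an image $I$, which are defined as $LL = f_{LL} \ast I$, $LH = f_{LH} \ast I$,
$HL = f_{HL} \ast I$, and $HH = f_{HH} \ast I$. DWT divides an image into four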
frequency components at half the resolution of the original image: a
low-frequency component (LL) and three high-frequency components (LH, HL,
HH). The LL component can be further decomposed into four frequency
components recursively. In this way, we obtain multi-level wavelet
representations. The Earth Mover's Distance (EMD) [6], whose formula is given
in [6], measures the dissimilarity between two multidimensional
distributions. The total EMD over the FF++
dataset is computed on the three-level frequency components of the real and
fake data, and the results are shown in Table 1. We observe that, at each
level, the distance between real and fake facial images is larger for the
high-frequency sub-bands than for the low-frequency one. This indicates that
the high frequencies at every level are useful, so fusing multi-level high
frequencies can make the representations more expressive for face forgery
detection.
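The recursive Haar decomposition described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the authors' implementation: the function names `haar_dwt2` and `multilevel_haar_dwt` are hypothetical, the filters are applied as 2x2 block sums with stride 2 (which yields the half-resolution sub-bands), and the sign convention of the high-pass filters is an assumption.

```python
def haar_dwt2(image):
    """One Haar DWT level on a 2-D list with even height and width.

    Each non-overlapping 2x2 block [[a, b], [c, d]] is combined with the
    four Haar filters, so every sub-band has half the input resolution.
    The signs of the high-pass filters are an assumed convention.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]          # top row of the block
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom row of the block
            ll_row.append((a + b + c + d) / 2)  # low-pass in both directions
            lh_row.append((a + b - c - d) / 2)  # difference between the two rows
            hl_row.append((a - b + c - d) / 2)  # difference between the two columns
            hh_row.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def multilevel_haar_dwt(image, levels=3):
    """Recursively decompose LL; return the last LL and per-level (LH, HL, HH)."""
    details = []
    current = image
    for _ in range(levels):
        current, lh, hl, hh = haar_dwt2(current)
        details.append((lh, hl, hh))
    return current, details


# An 8x8 constant image: all high-frequency sub-bands are zero at every level.
flat_image = [[1.0] * 8 for _ in range(8)]
final_ll, high_freq = multilevel_haar_dwt(flat_image, levels=3)
```

With the 1/2 scaling the transform preserves energy: for each 2x2 block, the squares of the four outputs sum to the squares of the four inputs. A three-level decomposition needs the height and width to be divisible by 8.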
[Figure 1 panels. Rows: fake, real, fake, real. Columns: Gray, LH, HL, HH, Gray, LH, HL, HH.]
Fig. 1: High-frequency sub-bands obtained by DWT. The images in the 1st and
3rd rows are fake, and those in the 2nd and 4th rows are the corresponding
real images. The 1st and 5th columns show the gray images; columns 2 to 4 and
6 to 8 show the high-frequency sub-bands of the region cropped by the red
box. The forged pixels have fewer high-frequency details (LH, HL, HH) than
the real ones.
We also visualize examples of the real and fake high-frequency sub-bands
obtained by DWT in Figures 1 and 2. In Figure 1, we enlarge a local region of
the first-level DWT, which shows that the low-level high-frequency sub-bands
contain more details. Figure 2 shows the whole high-frequency sub-bands of
the three-level DWT; the high-level sub-bands contain more global semantic
information. Therefore, both the low-level and the high-level high-frequency
features are important for facial forgery detection.
[Figure 2 panels. Columns: Gray, LH L1, HL L1, HH L1, LH L2, HL L2, HH L2, LH L3, HL L3, HH L3.]
Fig. 2: The images in the 1st and 2nd rows are the fake and the real facial
images, respectively. Columns 2 to 10 are the level 1, 2, and 3
high-frequency (LH, HL, and HH) sub-bands obtained by DWT. There are more
details at lower levels and more global semantic structure at higher levels.
Based on the above observations, we exploit the multi-scale analysis of
wavelet decomposition and propose a multi-scale wavelet transformer framework
for face forgery detection, named MSWT. Specifically, we gradually aggregate
the multi-scale wavelet features at different stages of the backbone network
to take full advantage of the multi-level high-frequency representation. To
better fuse the frequency features with the spatial features, frequency-based
spatial attention is designed to guide the spatial feature extractor to
concentrate more on forgery traces. Meanwhile, cross-modality attention is
proposed to fuse the RGB spatial features and the frequency features. For
efficiency, these two attention modules are calculated through a unified
transformer block, named the frequency and spatial feature fusion (FSF)
module. The main contributions are summarized as follows:
- To make full use of frequency features, we are the first to utilize the
  multi-scale properties of wavelet decomposition to improve the feature
  fusion of spatial and frequency domains, and propose a multi-scale wavelet
  transformer framework for face forgery detection.
- To better capture the manipulation trace, frequency-based spatial attention
  is designed to guide the spatial feature extractor to focus on forgery
  regions.
- To better fuse the frequency features with the RGB spatial features,
  cross-modality attention is introduced.
- Experiments demonstrate that the proposed method works well on both
  within-dataset and cross-dataset testing compared with other approaches.
2 Related Work
2.1 Forgery Detection
Forgery Detection based on Spatial Feature. To counter manipulated faces and
protect media security, many forgery detection algorithms have been proposed.
Because deep learning can learn good feature representations, some methods
extract RGB spatial features with deep networks. These approaches mainly
include consistency-based [7], attention-based [8], and domain generalization
methods [9]. Zhao et al. [8] proposed a method based on multi-attention and
textural feature enhancement that enlarges artifacts in shallow features and
captures discriminative details for face forgery detection, fusing the
low-level and high-level features by attention maps. Zhao et al. [7] proposed
patch-wise consistency learning between patches from the feature maps, which
uses a consistency loss to learn and optimize the consistency of patches from
real or fake regions. Wodajo et al. [10] proposed a convolutional vision
transformer for deepfake video detection, and the network consists