DEEPFAKE CLI: Accelerated Deepfake
Detection using FPGAs
Omkar Bhilare*1,2, Rahul Singh*1,2, Vedant Paranjape*1,2, Sravan
Chittupalli*1,2, Shraddha Suratkar1,2, and Faruk Kazi1,2
1 Department of Electrical Engineering, V.J.T.I, Mumbai, India.
2 {oabhilare_b19, rsingh_b18, vvparanjape_b18, schittupalli_b18}@el.vjti.ac.in,
sssuratkar@ce.vjti.ac.in, fskazi@el.vjti.ac.in
Abstract. Because of the availability of larger datasets and recent
improvements in generative models, more realistic deepfake videos are
being produced each day. People consume around one billion hours of
video on social media platforms every day, which makes it important to
stop the spread of fake videos, as they can be damaging, dangerous, and
malicious. Deepfake classification has improved significantly, but
deepfake detection and inference remain difficult. To address this
problem, we propose a novel DEEPFAKE C-L-I (Classification,
Localization, Inference), in which we explore accelerating quantized
deepfake detection models on FPGAs, chosen for their high parallelism
and energy efficiency compared to general-purpose GPUs. We use a light
MesoNet with the EFF-YNet structure and accelerate it on the VCK5000
FPGA, powered by the state-of-the-art VC1902 Versal architecture, which
uses AI, DSP, and Adaptable Engines for acceleration. We benchmark our
inference speed against other state-of-the-art inference nodes and
obtain 316.8 FPS on the VCK5000 while maintaining 93% accuracy.
Keywords: Generative Models · Deepfake Detection · Deepfake
Classification · Machine Learning · Quantized · FPGAs · MesoNet ·
EFF-YNet · VCK5000 · VC1902 · Versal Architecture · AI · DSP ·
Adaptable Engines
1 Introduction
Deepfakes are artificially created media in which a frame is synthesized
using someone else's features, such as face, structure, and lip movements.
They are usually created by leveraging a Generative Adversarial Network to
produce a picture or video that looks realistic enough to deceive a viewer.
While deepfakes were initially created to prank individuals, they gained
attention through their use in illegal activities such as celebrity
pornographic videos, fake news, and bullying. Hence, detecting deepfakes
has become a very important issue in recent years.
*These authors contributed equally to this work.
arXiv:2210.14743v1 [cs.AR] 26 Oct 2022
After reviewing the relevant literature and existing methods of deepfake
classification, we observed that it is essentially an image classification
problem. The catch is that the pathological differences between real and
fake images are quite small, so existing CNN models need to be modified
and tuned to detect these minute differences.
MesoNet is a CNN architecture used to detect Face2Face and Deepfakes
manipulations accurately [1]. It operates on cropped faces from videos and
analyses mid-level features. The model focuses on the right level of
detail by using an architecture with a small number of layers.
XceptionNet is a more complex model, originally designed for 2D images,
which uses depthwise separable convolutional layers with residual
connections [2]; as a result, it gives higher performance than
MesoNet [1]. However, this model takes a long time to train.
EfficientNets, a relatively newer family of CNN models, aim at efficient
resource use by balancing model dimensions such as width, depth, and
resolution, and outperform models with a similar number of parameters [3].
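The balancing of width, depth, and resolution mentioned above can be illustrated with the compound scaling rule from the EfficientNet work [3]. The following is a minimal sketch, not part of this paper's pipeline: the coefficients are the base values reported in [3], while the function name `compound_scale` and its default arguments are assumptions made for illustration.

```python
# Base coefficients reported for EfficientNet in [3]; used here only as
# illustrative constants, not as values taken from this paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width and input resolution together with one coefficient phi.

    The function name and defaults are hypothetical.
    """
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    # Compute cost grows roughly with depth * width^2 * resolution^2.
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops_factor


if __name__ == "__main__":
    for phi in range(4):
        d, w, r, f = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution {r}, FLOPs x{f:.2f}")
```

Because the product of ALPHA, BETA squared, and GAMMA squared is close to 2, each increment of phi roughly doubles the compute cost while growing all three dimensions together.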
Researchers have gone a step further and proposed classifying each pixel
of the image as real or fake. The U-Net architecture addresses this task
with an encoder-decoder network that uses skip connections [4]. To
improve classification accuracy, EFF-YNet describes a novel architecture
that combines an EfficientNet encoder with a classification branch and a
segmentation branch [5]. It is designed to classify an image and also to
find the regions where the image is real and where it is fake. The
segmentation task helps train the classifier, and at the same time it
produces useful segmentation masks.
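One common way to let a segmentation task help train a classifier is to optimise a weighted sum of the two losses. The sketch below illustrates that idea under stated assumptions: it is not taken from [5], and the function names and the `seg_weight` hyperparameter are hypothetical.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def joint_loss(cls_prob, cls_label, mask_probs, mask_labels, seg_weight=1.0):
    """Classification loss plus weighted mean per-pixel segmentation loss.

    seg_weight is a hypothetical hyperparameter, not a value from the paper.
    mask_probs and mask_labels are 2D lists of equal shape.
    """
    cls_loss = binary_cross_entropy(cls_prob, cls_label)
    pixel_losses = [
        binary_cross_entropy(p, y)
        for row_p, row_y in zip(mask_probs, mask_labels)
        for p, y in zip(row_p, row_y)
    ]
    seg_loss = sum(pixel_losses) / len(pixel_losses)
    return cls_loss + seg_weight * seg_loss


if __name__ == "__main__":
    # One fake image (label 1) with a 2x2 mask marking the altered pixels.
    loss = joint_loss(0.9, 1, [[0.8, 0.2], [0.7, 0.1]], [[1, 0], [1, 0]])
    print(f"joint loss: {loss:.4f}")
```

Since both terms are back-propagated through the shared encoder in such a setup, the per-pixel supervision shapes the same features that the classification branch consumes.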
The inference of neural networks is usually slow on general-purpose GPUs [6].
One way to accelerate inference is to use soft cores emulated on FPGAs,
which lets the user exploit task- and data-level parallelism to approach
the performance of ASIC implementations with reduced design time [7]. It
is also possible to build RNN- or CNN-specific hardware architectures. In
one paper, the authors accelerated a Deep Recurrent Neural Network (DRNN)
on a hardware accelerator running on a XILINX ZYNQ FPGA [8]. These ZYNQ
boards are heterogeneous, which means they have a hard-core CPU alongside
the FPGA fabric. In another paper, researchers mapped BNNs (Binarized
Neural Networks) onto the FPGA fabric while FP32 networks were mapped to
the CPU; this hybrid mapping increases overall neural network efficiency
while maintaining inference speed [9]. Researchers have found that these
ZYNQ boards perform better than CPUs and GPUs [10].
One paper transforms the model into FP8¹ format for a speedup [11].
After evaluating this approach, we decided to use INT8² in our
implementation and quantized our model to INT8 precision. This means that
our model, which was originally trained at FP32³ precision, was converted
to INT8 precision.
¹ FP8 means 8-bit floating-point representation.
² INT8 means 8-bit integer representation.
³ FP32 means 32-bit floating-point representation.
Reducing the precision reduces the number of bits required to store the
model parameters, so the model needs less memory and fewer clock cycles
are spent transferring data between memory and the accelerator over the
PCIe bus.
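The storage saving from FP32 to INT8 can be made concrete with a small sketch. It uses symmetric per-tensor quantization, a common scheme chosen here as an assumption; the paper does not state which quantization scheme or toolchain was used, and all function names below are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]


def storage_bytes(num_params, bits):
    """Bytes needed to store num_params values at the given bit width."""
    return num_params * bits // 8


if __name__ == "__main__":
    weights = [0.82, -0.31, 0.05, -1.27, 0.64]  # made-up example values
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 codes:", q)
    print(f"scale: {scale:.6f}, worst round-trip error: {worst:.6f}")
    n = 1_000_000  # hypothetical parameter count
    print("FP32 bytes:", storage_bytes(n, 32), "INT8 bytes:", storage_bytes(n, 8))
```

For the same parameter count, INT8 storage is one quarter of FP32 storage, which is also the factor by which the volume of data moved over the bus shrinks. The price is a rounding error of at most half a quantization step per weight.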
We propose U-YNet, a combined segmentation and classification model. It
consists of a U-Net encoder and decoder, which produce a segmentation map
showing the altered regions in the deepfake content, and a classification
branch at the end of the U-Net encoder, which classifies the media content
as real or deepfake. The segmentation map is a novel way to identify the
regions of a face that have been altered to create the deepfake. It gives
insight into how a deepfake is constructed and paves the way for models
that reverse a deepfake. This model runs on a Deep Learning Processing
Unit (DPU) present in the AMD-XILINX VCK5000 Versal FPGA device. The DPU
contains dedicated processing units, such as a hardware-accelerated
convolution engine, which enable convolution-based models like U-Net (and
U-YNet) to run at higher inference speeds, enabling real-time
classification of deepfake content.
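The two-branch layout described above can be traced with a short shape calculation. This is only a sketch of a generic U-Net style encoder and decoder with a classification head on the bottleneck: the input size, channel counts, number of stages, and the function name `uynet_shapes` are all hypothetical and not taken from the paper.

```python
def uynet_shapes(in_channels=3, size=256, base_channels=16, stages=4):
    """Trace (channels, height, width) through a U-Net style encoder/decoder
    with a classification branch on the bottleneck.

    All sizes are hypothetical defaults chosen for illustration.
    """
    shapes = {"input": (in_channels, size, size)}
    skips = []
    channels, res = base_channels, size
    # Encoder: each stage stores a skip tensor, then halves the resolution
    # and doubles the channel count.
    for i in range(stages):
        shapes[f"enc{i}"] = (channels, res, res)
        skips.append(channels)
        channels, res = channels * 2, res // 2
    shapes["bottleneck"] = (channels, res, res)
    # Classification branch: global average pooling, then a single logit.
    shapes["cls_pooled"] = (channels,)
    shapes["cls_logit"] = (1,)
    # Decoder: upsample (halving channels), concatenate the matching skip,
    # then convolve back down to the skip's channel count.
    for i in reversed(range(stages)):
        res *= 2
        up_channels = channels // 2
        shapes[f"dec{i}_concat"] = (up_channels + skips[i], res, res)
        channels = skips[i]
        shapes[f"dec{i}"] = (channels, res, res)
    # Segmentation branch: one-channel mask at the input resolution.
    shapes["seg_mask"] = (1, res, res)
    return shapes


if __name__ == "__main__":
    for name, shape in uynet_shapes().items():
        print(f"{name:12s} {shape}")
```

The classification logit is computed from the bottleneck alone, so it is available before the decoder runs. The segmentation mask returns to the input resolution through the skip connections.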
2 Motivation and Background
Improvements in generative models and an abundance of datasets have led
to models that can generate realistic-looking deepfake videos, deceiving
the human eye and machines alike (Fig. 1). Malicious actors have a huge
opportunity to spread deepfaked videos for their own gain, as more than
100 million hours of video content are watched every day on social
media.
Fig. 1: Completely believable deepfakes can be generated with ease nowadays.
Looking at this from the perspective of computation, faster computation
speeds and easily available compute resources mean that deepfake videos can