
DETECTION OF REAL-TIME DEEPFAKES IN VIDEO CONFERENCING WITH ACTIVE
PROBING AND CORNEAL REFLECTION
Hui Guo, Xin Wang, Siwei Lyu
Department of Computer Science and Engineering
University at Buffalo, State University of New York, USA.
{hguo8,xwang264,siweilyu}@buffalo.edu
ABSTRACT
The COVID pandemic has led to the wide adoption of online
video calls in recent years. However, the increasing reliance
on video calls provides opportunities for new impersonation
attacks by fraudsters using advanced real-time DeepFakes.
Real-time DeepFakes pose new challenges to detection meth-
ods, which have to run in real-time as a video call is ongoing.
In this paper, we describe a new active forensic method to de-
tect real-time DeepFakes. Specifically, we authenticate video
calls by displaying a distinct pattern on the screen and using
the corneal reflection extracted from the images of the call
participant’s face. This pattern can be induced by a call partic-
ipant displaying on a shared screen or directly integrated into
the video-call client. In either case, no specialized imaging or
lighting hardware is required. Through large-scale simulations,
we evaluate the reliability of this approach under a
variety of real-world imaging scenarios.
Index Terms— Real-time DeepFake, Corneal Reflection
1. INTRODUCTION
Video calls have increasingly replaced in-person meetings and
phone calls in recent years, mainly due to the high demand for
remote work during the COVID pandemic. For instance,
at the end of 2019, the Zoom video conferencing platform
had only about 10 million users. By late April of 2021, that
figure had surged to over 200 million, a 20-fold increase. How-
ever, the wide adoption of video calls as a means of meeting
and inter-person communication has also given rise to new
forms of deception. In particular, the lack of physical presence
opens the gate for digital impersonation in video calls using
DeepFakes (i.e., AI-synthesized human face videos). The most
recent tools (e.g., Avatarify [1] and DeepFaceLive [2]) have
enabled the synthesis of DeepFakes in real time, piped
through a virtual camera. The DeepFakes are either in the
form of face-swap or face puppetry [3]. Although there are
still artifacts in real-time DeepFakes [4], the continuing
improvement of the synthesis technology means that it will
become increasingly difficult to distinguish a real person from
an AI-synthesized person at the other end of a video call. Indeed,
Fig. 1: Left: A video call attendant is being actively authenticated
with the live patterns shown on the screen. Right: A real
person’s cornea will produce an image of the pattern shown
on the screen, while a real-time DeepFake cannot. The figures
are for demonstration; for actual results, see Fig. 5.
recent years have seen such frauds emerge at an alarming
speed and begin to cause real damage [5].
The real-time DeepFakes pose new challenges to existing
detection methods, which are mostly passive, in that they clas-
sify an input video into the category of authentic or DeepFake.
Most of these methods struggle to achieve the levels of accu-
racy that would be needed to be incorporated into a practical
video-conferencing application and run in real-time. On the
other hand, new approaches based on active forensics, which
interfere with the generation process to improve detection
efficiency and accuracy, e.g., [6, 7], are gaining momentum
recently. In particular, the work of [7] exploits the unique
constrained environment afforded by a video-conferencing call
to detect real-time DeepFakes by varying the lighting pattern
on screen and extracting the same lighting variation from the
attendant’s face. As the current real-time DeepFake synthesis
methods are not sufficiently adaptable to capture such subtle
changes, the lack of consistent lighting variation can be used
as a telltale sign of synthesis and impersonation. However,
controlling and estimating the subtle change of screen lighting
may not be reliable as it can be affected by other environmental
factors, such as the ambient light, room setting, and makeup.
arXiv:2210.14153v1 [cs.CV] 21 Oct 2022

In this work, we describe a new active forensic approach
to exposing real-time DeepFakes. The main idea is illustrated
in Fig. 1. This method can be initiated by a call participant
or directly integrated into the video-call client¹. First, we
briefly display a distinct pattern, which will be referred to as
the probing pattern, on the shared screen during an ongoing
video call. The image of the attendant’s face will be captured
by the camera, and we will focus on the cornea areas. As the
attendant sits before the camera in a video call and the human
cornea is mirror-like and highly reflective, the probing pattern
on the screen should leave a reflective image on the cornea
that can subsequently be extracted from the face image and
compared with the probing pattern. We provide an automatic
pipeline to display the probing pattern, capture the face image,
extract the corneal reflections, and compare them with the
original probing pattern. Our experiments with several
state-of-the-art real-time DeepFake synthesis models show that
they cannot recreate the probing pattern in the synthesized
cornea region in a variety of real-world settings. Compared with the
work in [7], our active detection method is less limited by the
lighting environment. In addition, our method does not rely on
complicated trained models, which allows it to be used easily
in a real-time video-conferencing environment. Moreover, our
method can reliably extract and compare probing patterns to
authenticate real persons under a range of imaging scenarios,
validating this approach.
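To make the first step concrete, the probing pattern can be generated as a simple high-contrast image. The following is a minimal NumPy sketch, not the authors' implementation: the paper only specifies a simple geometric shape on a white background, so the choice of a black square and its dimensions here are illustrative assumptions.

```python
import numpy as np

def make_probing_pattern(height=480, width=640, shape_size=240):
    """Create an illustrative probing pattern: a black square
    centered on a white background. The paper only requires a
    simple geometric shape with good contrast; the square and
    its size are assumptions made for this sketch."""
    # White background (8-bit grayscale).
    pattern = np.full((height, width), 255, dtype=np.uint8)
    # Black square centered in the frame for maximum contrast.
    top = (height - shape_size) // 2
    left = (width - shape_size) // 2
    pattern[top:top + shape_size, left:left + shape_size] = 0
    return pattern

pattern = make_probing_pattern()
```

In a deployment, this image would be briefly shown full-screen (shared by a participant or drawn by the video-call client) while the attendant's face is captured for analysis.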
2. RELATED WORKS
Real-time DeepFake Synthesis.
DeepFakes have been adapted for real-time synthesis in recent
years. DeepFaceLive [2] was proposed to create DeepFakes in
live video-conferencing scenarios. It achieves high visual
quality at real-time speed, making it usable in practice. Using
DeepFaceLive, users can swap their faces in a real webcam feed
using trained face-swapping models in real time. The fake video
generated by the DeepFaceLive software can be passed to the
video-conferencing software via virtual camera software (e.g.,
OBS-VirtualCam [2]). For example, in Zoom [8], the host can
select a virtual camera instead of the actual camera to display
the fake video from DeepFaceLive in a Zoom meeting.
Examples of running DeepFaceLive in a Zoom meeting are
shown in Fig. 2.
DeepFake Detection Using Eye Biometrics.
Biometric cues from the eyes have been used for the detection
of GAN-generated still images [9, 10, 11, 12, 13, 14]. The work
of [10] uses the inconsistency of corneal specular highlights
between the two eyes to identify AI-synthesized faces. More
recently, the work of [11] spots AI-synthesized faces by
detecting inconsistencies in pupil shapes. These methods are
further extended in [12] with an attention-based robust deep
network, where the inconsistent components and artifacts in the
iris region of GAN faces are clearly highlighted in the
attention maps.
Although effective in exposing GAN-generated faces in
high-resolution still images in a passive setting, these methods
may not work to catch real-time DeepFake videos used in
video conferences.

¹We assume that consent can be obtained from the attendants to use
their imagery for authentication purposes without privacy issues. This is
the same agreement required when a video call is recorded live.

Fig. 2: Examples of video-conferencing DeepFakes using Deep-
FaceLive [2]. For each pair, Left: the template face; Right:
the DeepFake.
Active Detection of DeepFakes.
Active detection of DeepFakes differs from existing detection
methods [15] in that it interferes with the generation process
to make detection easier. Early work in [6] obstructs DeepFake
generation by attacking a key step of the generation pipeline,
i.e., facial landmark extraction. The method generates
adversarial perturbations [16] to disrupt facial landmark
extraction, such that the DeepFake models cannot locate the
real face to swap. Active illumination has also been studied
for exposing DeepFakes. For example, the work of [17] shows
that the correspondence between the brightness of the facial
appearance and different active illuminations can be used as
a signal for active DeepFake detection. Motivated by this
work, [7] proposed a new active method for detecting
video-conferencing DeepFakes using active illumination.
3. METHOD
The overall process of our method is shown in Fig. 3. In a
standard video conference setting, a person sits in front of a
laptop computer, and her eyes are captured by the webcam,
Fig. 3(a). To verify if the attendant(s) is indeed a real person
instead of a synthesis from real-time DeepFake models, the
host will briefly put up the probing pattern on the shared
screen. The probing pattern is a simple geometric shape on a
white background to have good contrast. We expect the real
attendants’ eyes have reflections of the probing pattern, while
a real-time DeepFake will not. We first capture an image of
the attendant’s face and then run a face detector and extract
facial landmarks using Dlib [
18
], Fig. 3(b). From the facial
landmarks, we localize the eye region, Fig. 3(c), and then
segment out the iris part using an edge detector and the Hough
transform as in [
10
], Fig. 3(d). The segmented iris images
are then passed to the template matching steps for automatic
DeepFake detection, Fig. 3(e), where we compare the corneal
reflection with the probing pattern. The matching of the two
indicates a real person, and the lack of matching suggests a
possible real-time DeepFake impersonation.
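The iris-segmentation and template-matching steps above can be sketched as follows. This is a simplified, NumPy-only illustration, not the authors' implementation: it assumes the eye region has already been cropped (the paper uses Dlib for face and landmark detection), uses a basic Hough circle vote in place of the full edge-detector-plus-Hough pipeline of [10], and uses normalized cross-correlation as a plausible similarity measure, since the paper does not specify the exact matching metric or threshold.

```python
import numpy as np

def hough_circle(edges, radii):
    """Locate the most likely circle (e.g., the iris boundary) in a
    binary edge map by Hough voting: each edge pixel votes for every
    center that would place it on a circle of each candidate radius.
    Returns (radius, center_y, center_x) of the strongest circle."""
    h, w = edges.shape
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0, 2 * np.pi, 72, endpoint=False)
    for ri, r in enumerate(radii):
        dy = np.round(r * np.sin(thetas)).astype(int)
        dx = np.round(r * np.cos(thetas)).astype(int)
        for y, x in zip(ys, xs):
            cy, cx = y + dy, x + dx
            ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
            np.add.at(acc[ri], (cy[ok], cx[ok]), 1)
    ri, cy, cx = np.unravel_index(acc.argmax(), acc.shape)
    return radii[ri], cy, cx

def ncc(a, b):
    """Normalized cross-correlation between two equal-size patches
    (e.g., the extracted corneal reflection and the probing pattern,
    resized to a common resolution). Values near 1 indicate a match;
    values near 0 or below indicate a likely DeepFake."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

In the full pipeline, `hough_circle` would run on an edge map of the Dlib-localized eye crop to segment the iris, the reflection patch inside the detected circle would be extracted and resized, and `ncc` against the displayed probing pattern (with some decision threshold) would flag real attendants versus real-time DeepFakes.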