
or directly integrated into the video-call client¹. First, we
briefly display a distinct pattern, referred to as the probing
pattern, on the shared screen during an ongoing video call. The
attendant's face is captured by the camera, and we focus on the
cornea areas. Since the attendant sits in front of the camera
during a video call and the human cornea is mirror-like and
highly reflective, the probing pattern on the screen leaves a
reflected image on the cornea that can subsequently be extracted
from the face image and compared with the probing pattern. We
provide an automatic pipeline to display the probing pattern,
capture the face image, extract the corneal reflections, and
compare them with the original probing pattern. Our experiments
with several state-of-the-art real-time DeepFake synthesis
models show that they fail entirely to recreate the probing
pattern in the synthesized cornea region across a variety of
real-world settings. Compared with the work in [7], our active
detection method is less limited by the lighting environment. In
addition, our method does not rely on complicated trained
models, which makes it easy to use in a real-time
video-conferencing environment. Moreover, our method can
reliably extract and compare probing patterns to authenticate
real persons under a range of imaging scenarios, validating this
approach.
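For concreteness, the probing pattern can be rendered as a simple black geometric shape on a white background. The sketch below uses a centered cross; the shape, size, and line thickness are illustrative choices of ours, not values fixed by the method:

```python
import numpy as np

def make_probing_pattern(size=400, thickness=40):
    """Render an illustrative probing pattern: a black cross centered
    on a white background, giving the high contrast the method needs."""
    img = np.full((size, size), 255, dtype=np.uint8)   # white background
    c, t = size // 2, thickness // 2
    img[c - t:c + t, :] = 0                            # horizontal bar
    img[:, c - t:c + t] = 0                            # vertical bar
    return img

pattern = make_probing_pattern()
```

In practice, the host would display such an image full-screen on the shared screen for a brief interval during the call.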
2. RELATED WORKS
Real-time DeepFake Synthesis. In recent years, DeepFakes have
been adapted for real-time synthesis. DeepFaceLive [2] was
proposed to perform DeepFake face swapping in real
video-conferencing scenarios. It achieves high visual quality at
real-time speed, making it usable in practice. With
DeepFaceLive, users can swap their faces from a real webcam feed
using trained face-swapping models in real time. The generated
fake screen in the DeepFaceLive software can be passed to the
video-conferencing software via virtual camera software (e.g.,
OBS-VirtualCam [2]). For example, in the Zoom software [8], the
host can select a virtual camera instead of the actual camera to
display the fake screen from DeepFaceLive in the Zoom meeting.
Examples of running DeepFaceLive in a Zoom meeting are shown in
Fig. 2.
DeepFake Detection Using Eye Biometrics. Biometric cues from the
eyes have been used for the detection of GAN-generated still
images [9, 10, 11, 12, 13, 14]. The work in [10] uses the
inconsistency of corneal specular highlights between the two
eyes to identify AI-synthesised faces. More recently, the work
in [11] spots AI-synthesised faces by detecting inconsistencies
in pupil shapes. These methods are further extended in [12]
with an attention-based robust deep network, in which the
inconsistent components and artifacts in the iris region of a
GAN-generated face are clearly highlighted in the attention maps.
Although effective in exposing GAN-generated faces in
high-resolution still images in a passive setting, these methods
may not work for catching real-time DeepFake videos used in
video conferences.

¹ We assume that consent can be obtained from the attendants to
use their imagery for authentication purposes without privacy
issues. This is the same agreement required when the video call
is recorded live.

Fig. 2: Examples of video-conferencing DeepFakes using
DeepFaceLive [2]. For each pair, Left: the template face; Right:
the DeepFake.
Active Detection of DeepFakes. Active detection of DeepFakes
differs from existing detection methods [15] in that it
interferes with the generation process to make detection easier.
Early work in [6] obstructs DeepFake generation by attacking a
key step of the generation pipeline, i.e., facial landmark
extraction. The method generates adversarial perturbations [16]
to disrupt facial landmark extraction, such that DeepFake models
cannot locate the real face to swap. Active illumination has
also been studied for exposing DeepFakes. For example, the work
in [17] shows that the correspondence between the brightness of
the facial appearance and different active illumination can be
used as a signal for active DeepFake detection. Motivated by
this work, [7] proposed a new active method for
video-conferencing DeepFake detection using active illumination.
3. METHOD
The overall process of our method is shown in Fig. 3. In a
standard video-conference setting, a person sits in front of a
laptop computer, and her eyes are captured by the webcam,
Fig. 3(a). To verify whether an attendant is indeed a real
person rather than a synthesis from a real-time DeepFake model,
the host briefly puts up the probing pattern on the shared
screen. The probing pattern is a simple geometric shape on a
white background, chosen for good contrast. We expect a real
attendant's eyes to reflect the probing pattern, while a
real-time DeepFake's will not. We first capture an image of the
attendant's face, then run a face detector and extract facial
landmarks using Dlib [18], Fig. 3(b). From the facial landmarks,
we localize the eye region, Fig. 3(c), and then segment out the
iris using an edge detector and the Hough transform, as in [10],
Fig. 3(d). The segmented iris images are then passed to the
template matching step for automatic DeepFake detection,
Fig. 3(e), where we compare the corneal reflection with the
probing pattern. A match indicates a real person, while the lack
of a match suggests a possible real-time DeepFake impersonation.