DigiFace-1M: 1 Million Digital Face Images for Face Recognition
Gwangbin Bae
University of Cambridge
gb585@cam.ac.uk
Martin de La Gorce
Microsoft
madelago@microsoft.com
Tadas Baltrušaitis
Microsoft
tabaltru@microsoft.com
Charlie Hewitt
Microsoft
chewitt@microsoft.com
Dong Chen
Microsoft
doch@microsoft.com
Julien Valentin
Microsoft
juvalen@microsoft.com
Roberto Cipolla
University of Cambridge
rc10001@cam.ac.uk
Jingjing Shen
Microsoft
jinshen@microsoft.com
Abstract
State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on the Labeled Faces in the Wild (LFW) dataset. Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet. Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc.) and often contain label noise. More importantly, the face images are collected without explicit consent, raising ethical concerns. To avoid such problems, we introduce a large-scale synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline¹. We first demonstrate that aggressive data augmentation can significantly reduce the synthetic-to-real domain gap. Having full control over the rendering pipeline, we also study how each attribute (e.g., variation in facial pose, accessories and textures) affects the accuracy. Compared to SynFace, a recent method trained on GAN-generated synthetic faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). By fine-tuning the network on a smaller number of real face images that could reasonably be obtained with consent, we achieve accuracy that is comparable to methods trained on millions of real face images.
1. Introduction
Learning-based face recognition models [29, 23, 33, 35, 8, 15, 24, 18] use Deep Neural Networks (DNNs) to encode a given face image into an embedding vector of fixed dimension (e.g., 512). These embeddings can then be used for various tasks, such as face identification (who is this person?) and verification (are these the same person?). To learn diverse, discriminative embeddings, the training dataset should contain a large number of unique identities. To learn robust embeddings, i.e., embeddings that are not sensitive to changes in pose, expression, accessories, camera and lighting, the dataset should also contain a sufficient number of images per identity exhibiting these variations.

¹The DigiFace-1M dataset can be downloaded from https://github.com/microsoft/DigiFace1M
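The verification task described above reduces to comparing two embedding vectors under a similarity threshold. The sketch below illustrates this with cosine similarity; the embeddings here are random stand-ins (a real encoder would produce them from images), and the 0.5 threshold is an illustrative assumption, not a value from the paper.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(emb_a, emb_b, threshold=0.5):
    """Same-person decision: threshold the cosine similarity of two embeddings."""
    return cosine(emb_a, emb_b) >= threshold

# Hypothetical 512-dimensional embeddings from a face encoder.
rng = random.Random(0)
emb1 = [rng.gauss(0, 1) for _ in range(512)]
emb2 = [x + 0.1 * rng.gauss(0, 1) for x in emb1]   # same identity, slight perturbation
emb3 = [rng.gauss(0, 1) for _ in range(512)]       # unrelated identity

print(verify(emb1, emb2))  # near-duplicate embedding -> True
print(verify(emb1, emb3))  # independent embedding -> False
```

Identification (who is this person?) is the same operation run against a gallery of enrolled embeddings, returning the closest match.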
Publicly available face recognition datasets satisfy both requirements. MS1MV2 [8] contains 5.8M images of 85K identities (approx. 68 images per ID). The recently released WebFace260M [43] contains 260M images of 4M identities (approx. 65 images per ID). While such datasets have driven recent advances in face recognition models, there are several problems associated with them.
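The images-per-identity figures quoted above follow directly from the dataset sizes, and the same ratio can be computed for the dataset introduced here (its ~11 images per ID is derived from the stated totals, not a figure quoted by the paper):

```python
# Approximate images per identity, from the stated dataset sizes.
ms1mv2 = 5_800_000 / 85_000          # MS1MV2: ~68 images per ID
webface260m = 260_000_000 / 4_000_000  # WebFace260M: ~65 images per ID
digiface = 1_220_000 / 110_000       # this paper's dataset: ~11 images per ID

print(round(ms1mv2), round(webface260m), round(digiface))
```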
(1) Ethical issues. Large-scale face recognition datasets are often criticized for ethical issues, including privacy violation and the lack of informed consent. For example, datasets like [39, 12, 8, 43] are obtained by crawling web images of celebrities without consent. To increase the number of identities, some datasets exploited the term “celebrities” to include anyone with an online presence. Datasets like [17, 26] collected face images of the general public (including children) from Flickr [3]. Projects like MegaPixels [4] are exposing the ethical problems of such web-crawled face recognition datasets. Following severe criticism, public access to several datasets has been removed [2].
(2) Label noise. Web images collected by searching the names of celebrities often contain label errors. For example, the Labeled Faces in the Wild (LFW) dataset [14] contains several known errors, including: (1) mislabeled images; (2) distinct persons with the same name labeled as the same person; and (3) the same person who goes by different names labeled as different persons.

arXiv:2210.02579v1 [cs.CV] 5 Oct 2022

Figure 1. Examples of synthetic face images in our dataset. Our dataset captures a wide variety of facial geometry, pose, textures, expressions, accessories and environments.
(3) Data bias. Face recognition models are generally trained and tested on celebrity faces, many of which are taken with strong lighting and make-up. Celebrity faces also have an imbalanced racial distribution (e.g., 84.5% of the faces in CASIA-WebFace [39] are Caucasian [34]), leading to poor recognition accuracy for under-represented racial groups [34].
In order to circumvent all these issues affecting existing real face datasets, we introduce a new large-scale face recognition dataset consisting only of photo-realistic digital face images rendered using a computer graphics pipeline, and make this dataset available to the community. Specifically, we build upon the face generation pipeline introduced by Wood et al. [36], tailoring the amount of variability in each attribute (e.g., pose and accessories) to our recognition task, and generate 1.22M images of 110K unique identities. Each identity is generated by randomizing the facial geometry and texture as well as the hair style. The generated face is then rendered with different poses, expressions, hair colors, hair thicknesses and densities, accessories (including clothes, make-up, glasses, and head/face wear), cameras and environments, to encourage the network to learn a robust embedding. Figure 1 shows examples of synthetic face images in this new dataset. We generated 1.22M images, but in practice the number of identities and images that can be generated with a synthetics pipeline is limited only by the cost of generating and storing them.
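The recipe above separates identity-defining attributes (fixed across all renders of one identity) from per-image attributes (re-sampled for every render). A minimal sketch of that split follows; every attribute name and function here is a hypothetical placeholder, not the authors' actual pipeline API, and the random floats stand in for real sampler outputs.

```python
import random

# Attributes fixed per identity vs. re-sampled per image (to teach robustness).
IDENTITY_ATTRS = ["facial_geometry", "face_texture", "hair_style"]
PER_IMAGE_ATTRS = ["pose", "expression", "hair_color", "hair_density",
                   "accessories", "camera", "environment"]

def sample(attrs, rng):
    # Stand-in for the real samplers: each attribute becomes a random value
    # that a renderer would expand into concrete parameters.
    return {a: rng.random() for a in attrs}

def generate_dataset(num_identities, images_per_identity, seed=0):
    rng = random.Random(seed)
    dataset = []
    for identity_id in range(num_identities):
        identity = sample(IDENTITY_ATTRS, rng)  # sampled once, shared by all renders
        for _ in range(images_per_identity):
            render_params = {**identity, **sample(PER_IMAGE_ATTRS, rng)}
            dataset.append((identity_id, render_params))  # render(render_params) -> image
    return dataset

data = generate_dataset(num_identities=3, images_per_identity=4)
print(len(data))  # 12 labeled render configurations
```

Because the label is the identity index rather than a crawled name, every image is correctly labeled by construction, which is what makes the dataset free of label noise.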
Digital synthetic faces can solve the aforementioned problems associated with real face datasets. Firstly, the generated faces are free of label noise. Secondly, the bias in lighting, make-up and skin color can be reduced, as we have full control over those attributes. Most importantly, the face generation pipeline does not rely on any privacy-sensitive data obtained without consent.
This is a critical difference from GAN-generated synthetic faces: face GANs rely (either directly or indirectly) on large-scale real face datasets to train some components of their pipeline, leaving the ethical problems unresolved. For example, a recent method called SynFace [28] was trained on synthetic faces generated using DiscoFaceGAN [9]. While the generated face images are free of label noise, millions of real face images were used to train DiscoFaceGAN. The GANs may also inherit any bias present in the real face images used to train them. For our dataset, only 511 face scans, obtained with consent, were used to build a parametric model of face geometry and a texture library [36]. From this limited source data, we can generate an unlimited number of identities, making our approach easily scalable.
Our contributions can be summarized as follows:

• We release a new large-scale synthetic dataset for face recognition that is free from privacy violations and lack of consent. To the best of our knowledge, our dataset, containing 1.22M images of 110K identities, is the largest public synthetic dataset for face recognition.

• Compared to SynFace [28], which is trained on GAN-generated faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). For five popular benchmarks [14, 30, 41, 25, 42], the average error rate is reduced by 46.0% (accuracy from 74.75% to 86.37%).

• We demonstrate how the proposed synthetic dataset can be used in conjunction with a small number of real face images to substantially improve the accuracy. This simulates a scenario in which a small number of curated (i.e., label-noise-free and less biased) real face images are collected with consent. By fine-tuning our network with only 120K real face images (i.e., 2% of the commonly used MS1MV2 dataset [8]), we achieve 99.33% accuracy on LFW and 93.61% on average across the five benchmarks, which is comparable to methods trained on millions of real face images.

• Having full control over the rendering pipeline, we perform extensive experiments to study how each attribute (e.g., variation in facial pose, accessories and textures) affects the face recognition accuracy.
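The relative error-rate reductions quoted above follow directly from the accuracy figures: an accuracy gain from 91.93% to 96.17% shrinks the error from 8.07% to 3.83%, a 52.5% relative reduction.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative reduction in error rate (%) implied by two accuracies (%)."""
    err_before, err_after = 100 - acc_before, 100 - acc_after
    return 100 * (err_before - err_after) / err_before

print(round(relative_error_reduction(91.93, 96.17), 1))  # LFW: 52.5
print(round(relative_error_reduction(74.75, 86.37), 1))  # five-benchmark average: 46.0
```

Reporting relative error reduction rather than raw accuracy deltas is common in face recognition, since at high accuracies a small absolute gain corresponds to a large fraction of the remaining errors.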