DigiFace-1M: 1 Million Digital Face Images for Face Recognition
Gwangbin Bae
University of Cambridge
gb585@cam.ac.uk
Martin de La Gorce
Microsoft
madelago@microsoft.com
Tadas Baltrušaitis
Microsoft
tabaltru@microsoft.com
Charlie Hewitt
Microsoft
chewitt@microsoft.com
Dong Chen
Microsoft
doch@microsoft.com
Julien Valentin
Microsoft
juvalen@microsoft.com
Roberto Cipolla
University of Cambridge
rc10001@cam.ac.uk
Jingjing Shen
Microsoft
jinshen@microsoft.com
Abstract
State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on the Labeled Faces in the Wild (LFW) dataset. Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet. Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc.) and often contain label noise. More importantly, the face images are collected without explicit consent, raising ethical concerns. To avoid such problems, we introduce a large-scale synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline¹. We first demonstrate that aggressive data augmentation can significantly reduce the synthetic-to-real domain gap. Having full control over the rendering pipeline, we also study how each attribute (e.g., variation in facial pose, accessories and textures) affects the accuracy. Compared to SynFace, a recent method trained on GAN-generated synthetic faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). By fine-tuning the network on a smaller number of real face images that could reasonably be obtained with consent, we achieve accuracy that is comparable to methods trained on millions of real face images.
1. Introduction
Learning-based face recognition models [29, 23, 33, 35, 8, 15, 24, 18] use Deep Neural Networks (DNNs) to encode a given face image into an embedding vector of fixed dimension (e.g., 512). These embeddings can then be used for various tasks, such as face identification (who is this person?) and verification (are these the same person?). To learn diverse, discriminative embeddings, the training dataset should contain a large number of unique identities. To learn robust embeddings, i.e., embeddings that are not sensitive to changes in pose, expression, accessories, camera and lighting, the dataset should also contain a sufficient number of images per identity exhibiting these variations.

¹The DigiFace-1M dataset can be downloaded from https://github.com/microsoft/DigiFace1M
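The verification task described above reduces to comparing two embedding vectors under a similarity threshold. The sketch below illustrates this with cosine similarity; the embeddings here are random stand-ins (a real encoder would produce them from images), and the 0.5 threshold is an illustrative assumption, not a value from the paper.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(emb_a, emb_b, threshold=0.5):
    """Same-person decision: threshold the cosine similarity of two embeddings."""
    return cosine(emb_a, emb_b) >= threshold

# Hypothetical 512-dimensional embeddings from a face encoder.
rng = random.Random(0)
emb1 = [rng.gauss(0, 1) for _ in range(512)]
emb2 = [x + 0.1 * rng.gauss(0, 1) for x in emb1]   # same identity, slight perturbation
emb3 = [rng.gauss(0, 1) for _ in range(512)]       # unrelated identity

print(verify(emb1, emb2))  # near-duplicate embedding -> True
print(verify(emb1, emb3))  # independent embedding -> False
```

Identification (who is this person?) is the same operation run against a gallery of enrolled embeddings, returning the closest match.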
Publicly available face recognition datasets satisfy both requirements. MS1MV2 [8] contains 5.8M images of 85K identities (approx. 68 images per ID). The recently released WebFace260M [43] contains 260M images of 4M identities (approx. 65 images per ID). While such datasets have driven recent advances in face recognition models, there are several problems associated with them.
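The images-per-identity figures quoted above follow directly from the dataset sizes, and the same ratio can be computed for the dataset introduced here (its ~11 images per ID is derived from the stated totals, not a figure quoted by the paper):

```python
# Approximate images per identity, from the stated dataset sizes.
ms1mv2 = 5_800_000 / 85_000          # MS1MV2: ~68 images per ID
webface260m = 260_000_000 / 4_000_000  # WebFace260M: ~65 images per ID
digiface = 1_220_000 / 110_000       # this paper's dataset: ~11 images per ID

print(round(ms1mv2), round(webface260m), round(digiface))
```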
(1) Ethical issues. Large-scale face recognition datasets are often criticized for ethical issues, including privacy violation and the lack of informed consent. For example, datasets like [39, 12, 8, 43] are obtained by crawling web images of celebrities without consent. To increase the number of identities, some datasets exploited the term “celebrities” to include anyone with an online presence. Datasets like [17, 26] collected face images of the general public (including children) from Flickr [3]. Projects like MegaPixels [4] are exposing the ethical problems of such web-crawled face recognition datasets. Following severe criticism, public access to several datasets has been removed [2].
(2) Label noise. Web images collected by searching the names of celebrities often contain label errors. For example, the Labeled Faces in the Wild (LFW) dataset [14] contains several known errors, including: (1) mislabeled images; (2) distinct persons with the same name labeled as the same person; and (3) the same person who goes by different names labeled as different persons.

arXiv:2210.02579v1 [cs.CV] 5 Oct 2022

Figure 1. Examples of synthetic face images in our dataset. Our dataset captures a wide variety of facial geometry, pose, textures, expressions, accessories and environments.
(3) Data bias. Face recognition models are generally trained and tested on celebrity faces, many of which are taken with strong lighting and make-up. Celebrity faces also have an imbalanced racial distribution (e.g., 84.5% of the faces in CASIA-WebFace [39] are Caucasian [34]), leading to poor recognition accuracy for under-represented racial groups [34].
In order to circumvent all these issues affecting existing real face datasets, we introduce a new large-scale face recognition dataset consisting only of photo-realistic digital face images rendered using a computer graphics pipeline, and make this dataset available to the community. Specifically, we build upon the face generation pipeline introduced by Wood et al. [36], tailoring the amount of variability in each attribute (e.g., pose and accessories) to our recognition task, and generate 1.22M images of 110K unique identities. Each identity is generated by randomizing the facial geometry and texture as well as the hair style. The generated face is then rendered with different poses, expressions, hair colors, hair thicknesses and densities, accessories (including clothes, make-up, glasses, and head/face wear), cameras and environments, to encourage the network to learn a robust embedding. Figure 1 shows examples of synthetic face images in this new dataset. We generated 1.22M images, but in practice the number of identities and images that can be generated with a synthetics pipeline is limited only by the cost of generating and storing them.
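The recipe above separates identity-defining attributes (fixed across all renders of one identity) from per-image attributes (re-sampled for every render). A minimal sketch of that split follows; every attribute name and function here is a hypothetical placeholder, not the authors' actual pipeline API, and the random floats stand in for real sampler outputs.

```python
import random

# Attributes fixed per identity vs. re-sampled per image (to teach robustness).
IDENTITY_ATTRS = ["facial_geometry", "face_texture", "hair_style"]
PER_IMAGE_ATTRS = ["pose", "expression", "hair_color", "hair_density",
                   "accessories", "camera", "environment"]

def sample(attrs, rng):
    # Stand-in for the real samplers: each attribute becomes a random value
    # that a renderer would expand into concrete parameters.
    return {a: rng.random() for a in attrs}

def generate_dataset(num_identities, images_per_identity, seed=0):
    rng = random.Random(seed)
    dataset = []
    for identity_id in range(num_identities):
        identity = sample(IDENTITY_ATTRS, rng)  # sampled once, shared by all renders
        for _ in range(images_per_identity):
            render_params = {**identity, **sample(PER_IMAGE_ATTRS, rng)}
            dataset.append((identity_id, render_params))  # render(render_params) -> image
    return dataset

data = generate_dataset(num_identities=3, images_per_identity=4)
print(len(data))  # 12 labeled render configurations
```

Because the label is the identity index rather than a crawled name, every image is correctly labeled by construction, which is what makes the dataset free of label noise.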
Digital synthetic faces can solve the aforementioned problems associated with real face datasets. Firstly, the generated faces are free of label noise. Secondly, the bias in lighting, make-up and skin color can be reduced, as we have full control over those attributes. Most importantly, the face generation pipeline does not rely on any privacy-sensitive data obtained without consent.
This is a critical difference from GAN-generated synthetic faces: face GANs rely (either directly or indirectly) on large-scale real face datasets to train some components of their pipeline, leaving the ethical problems unresolved. For example, a recent method called SynFace [28] was trained on synthetic faces generated using DiscoFaceGAN [9]. While the generated face images are free of label noise, millions of real face images were used to train DiscoFaceGAN. The GANs may also inherit any bias present in the real face images used to train them. For our dataset, only 511 face scans, obtained with consent, were used to build a parametric model of face geometry and a texture library [36]. From this limited source data, we can generate an unlimited number of identities, making our approach easily scalable.
Our contributions can be summarized as follows:

• We release a new large-scale synthetic dataset for face recognition that is free from privacy violations and lack of consent. To the best of our knowledge, our dataset, containing 1.22M images of 110K identities, is the largest public synthetic dataset for face recognition.

• Compared to SynFace [28], which is trained on GAN-generated faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). For five popular benchmarks [14, 30, 41, 25, 42], the average error rate is reduced by 46.0% (accuracy from 74.75% to 86.37%).

• We demonstrate how the proposed synthetic dataset can be used in conjunction with a small number of real face images to substantially improve the accuracy. This simulates a scenario in which a small number of curated (i.e., label-noise-free and less biased) real face images are collected with consent. By fine-tuning our network with only 120K real face images (i.e., 2% of the commonly used MS1MV2 dataset [8]), we achieve 99.33% accuracy on LFW and 93.61% on average across the five benchmarks, which is comparable to methods trained on millions of real face images.

• Having full control over the rendering pipeline, we perform extensive experiments to study how each attribute (e.g., variation in facial pose, accessories and textures) affects the face recognition accuracy.
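The relative error-rate reductions quoted above follow directly from the accuracy figures: an accuracy gain from 91.93% to 96.17% shrinks the error from 8.07% to 3.83%, a 52.5% relative reduction.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative reduction in error rate (%) implied by two accuracies (%)."""
    err_before, err_after = 100 - acc_before, 100 - acc_after
    return 100 * (err_before - err_after) / err_before

print(round(relative_error_reduction(91.93, 96.17), 1))  # LFW: 52.5
print(round(relative_error_reduction(74.75, 86.37), 1))  # five-benchmark average: 46.0
```

Reporting relative error reduction rather than raw accuracy deltas is common in face recognition, since at high accuracies a small absolute gain corresponds to a large fraction of the remaining errors.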