Photo-realistic 360° Head Avatars in the Wild
Stanislaw Szymanowicz1,2⋆, Virginia Estellers1, Tadas Baltrušaitis1, and
Matthew Johnson1
1Microsoft
2University of Oxford
stan@robots.ox.ac.uk,
{virginia.estellers,tadas.baltrusaitis,matjoh}@microsoft.com
Abstract. Delivering immersive, 3D experiences for human communication requires a method to obtain 360° photo-realistic avatars of humans. To make these experiences accessible to all, only commodity hardware, like mobile phone cameras, should be necessary to capture the data needed for avatar creation. For avatars to be rendered realistically from any viewpoint, we require training images and camera poses from all angles. However, we cannot rely on there being trackable features in the foreground or background of all images for use in estimating poses, especially from the side or back of the head. To overcome this, we propose a novel landmark detector trained on synthetic data to estimate camera poses from 360° mobile phone videos of a human head for use in a multi-stage optimization process which creates a photo-realistic avatar. We perform validation experiments with synthetic data and showcase our method on 360° avatars trained from mobile phone videos.
1 Introduction
Immersive interaction scenarios on Mixed Reality devices require rendering human avatars from all angles. To avoid the uncanny valley effect, these avatars must have faces that are photo-realistic. It is likely that in the future virtual spaces will become a ubiquitous part of everyday life, impacting everything from a friendly gathering to obtaining a bank loan. For this reason we believe high-quality, 360° avatars should be affordable and accessible to all: created from images captured by commodity hardware, e.g., from a handheld mobile phone, without restrictions on the surrounding environment.
Obtaining data to train a 360° photo-realistic avatar ‘in the wild’ is challenging due to the potential difficulty of camera registration: traditional Structure-from-Motion pipelines rely on reliable feature matches of static objects across different images. Prior work limits the captures to a 120° frontal angle, which allows the use of textured planar objects that are amenable to traditional feature detectors and descriptors (e.g., a book, markers, detailed wall decoration). However, in many 360° captures from a mobile phone in unconstrained environments neither the background nor the foreground can be depended upon to provide a source of such matches.
⋆ Work done while at Microsoft.
arXiv:2210.11594v1 [cs.CV] 20 Oct 2022
[Figure 1 panels: 360° phone capture; dense landmarks; 360° NeRF avatar with optimized camera poses.]
Fig. 1. Our system creates photo-realistic 360° avatars from a mobile phone capture and without constraints on the environment. Cameras are registered from full 360° pose variation, and our multi-stage optimization pipeline allows for high-quality avatars.
There are several properties of 360° captures in the wild which pose serious challenges to camera registration and avatar model learning. First, the space being captured is likely to have plain backgrounds (e.g., white walls) and/or portions of the capture in which the background is an open space, leading to defocus blur and the inclusion of extraneous, potentially mobile objects (e.g., pets, cars, other people). Second, in order to obtain the needed details on the face and hair, the foreground subject will likely occupy much of the frame. While the face can provide some useful features for camera registration, its non-planar nature combined with changes in appearance due to lighting effects makes it less than ideal. Further, while the back of the head can produce many features for tracking, the matching can become highly ambiguous due to issues with hair, i.e., specular effects and repeated texture.
To address the challenges of 360° captures we propose a multi-stage pipeline to create 3D photo-realistic avatars from a mobile phone camera video. We propose using head landmarks to estimate the camera pose. However, as most facial landmark detectors are not reliable at oblique or backward-facing angles, we propose using synthetic data to train landmark detectors capable of working over the full 360° range. We use the predicted landmarks to initialize the 6DoF camera poses for a system which jointly optimizes a simplified Neural Radiance Field with the camera poses. Finally, we use the optimized camera poses to train a high-quality, photo-realistic NeRF of the subject.
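To make the pose-initialization step concrete: given 2D landmark detections and the corresponding landmarks on a canonical 3D head, a 6DoF camera pose can be recovered with a perspective-n-point solve. The sketch below uses a linear DLT formulation with synthetic data; all landmark coordinates, intrinsics, and names here are toy values chosen for illustration, not the paper's actual detector output or solver.

```python
import numpy as np

# Toy canonical 3D head landmarks (head coordinates, metres); hypothetical values.
X = np.array([
    [0.00,  0.00,  0.10],   # nose tip
    [-0.03, 0.03,  0.05],   # left eye
    [0.03,  0.03,  0.05],   # right eye
    [0.00, -0.05,  0.07],   # chin
    [-0.07, 0.00,  0.00],   # left ear
    [0.07,  0.00,  0.00],   # right ear
    [0.00,  0.08, -0.02],   # top of head
])

# Assumed pinhole intrinsics (focal length 800 px, principal point 320x240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])

# Ground-truth pose used to synthesise "detected" 2D landmarks.
a = np.deg2rad(30.0)
R_gt = np.array([[np.cos(a), 0.0, np.sin(a)],
                 [0.0,       1.0, 0.0],
                 [-np.sin(a), 0.0, np.cos(a)]])
t_gt = np.array([0.01, -0.02, 0.60])

def project(X, K, R, t):
    """Perspective projection of 3D points into pixel coordinates."""
    Xc = X @ R.T + t                 # world -> camera
    uv = Xc @ K.T                    # camera -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]

x2d = project(X, K, R_gt, t_gt)      # stand-in for landmark detections

def dlt_pnp(X, x2d, K):
    """Linear PnP: solve P = K [R | t] from >= 6 2D-3D correspondences."""
    n = len(X)
    A = np.zeros((2 * n, 12))
    for i in range(n):
        Xh = np.append(X[i], 1.0)
        u, v = x2d[i]
        A[2 * i, 0:4] = Xh
        A[2 * i, 8:12] = -u * Xh
        A[2 * i + 1, 4:8] = Xh
        A[2 * i + 1, 8:12] = -v * Xh
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)         # null vector, up to scale
    Rt = np.linalg.inv(K) @ P        # lambda * [R | t]
    Rt /= np.mean(np.linalg.norm(Rt[:, :3], axis=0))  # |lambda| from column norms
    if np.linalg.det(Rt[:, :3]) < 0: # fix the sign ambiguity of the null vector
        Rt = -Rt
    U, _, Vt2 = np.linalg.svd(Rt[:, :3])
    return U @ Vt2, Rt[:, 3]         # project onto SO(3); return (R, t)

R_est, t_est = dlt_pnp(X, x2d, K)
```

In a real pipeline the detections would be noisy, so this linear estimate would typically only seed a subsequent nonlinear refinement, here the joint optimization of camera poses with the simplified radiance field.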
The contributions of our work are three-fold: (1) a reliable system for camera registration which only requires the presence of a human head in each photo, (2) a demonstration of how to leverage synthetic data in a novel manner to obtain a DNN capable of predicting landmark locations from all angles, and (3) a multi-stage optimization pipeline which builds 360° photo-realistic avatars with high-frequency visual details using images obtained from a handheld mobile phone ‘in the wild’.