FIND: An Unsupervised Implicit 3D Model of
Articulated Human Feet
Oliver Boyne
ob312@cam.ac.uk
James Charles
http://www.jjcvision.com
Roberto Cipolla
cipolla@eng.cam.ac.uk
Machine Intelligence Lab
Department of Engineering
University of Cambridge
Cambridge, U.K.
Abstract
In this paper we present a high-fidelity, articulated 3D human foot model. The model is parameterised by a disentangled latent code in terms of shape, texture and articulated pose. While high-fidelity models are typically created with strong supervision such as 3D keypoint correspondences or pre-registration, we focus on the difficult case of little to no annotation. To this end, we make the following contributions: (i) we develop a Foot Implicit Neural Deformation field model, named FIND, capable of tailoring explicit meshes at any resolution, i.e. for low- or high-powered devices; (ii) an approach for training our model in various modes of weak supervision, with progressively better disentanglement as more labels, such as pose categories, are provided; (iii) a novel unsupervised part-based loss for fitting our model to 2D images, which outperforms traditional photometric or silhouette losses; (iv) finally, we release a new dataset of high-resolution 3D human foot scans, Foot3D. On this dataset, we show that our model outperforms a strong PCA implementation trained on the same data in terms of shape quality and part correspondences, and that our novel unsupervised part-based loss improves inference on images.
1 Introduction
Shape reconstruction from single-view or few-view images is a vital but difficult computer vision task. Current approaches depend on strong priors to help constrain this ill-posed problem, usually in the form of well-constructed, parameterised 3D models [22,41]. The process of model creation is also often time-consuming, requiring substantial supervision, e.g. registration and correspondence annotation. For human feet, the current state-of-the-art in reconstruction relies on expensive scanning equipment [1,3,4], unavailable to the average consumer at home. Thus, there is growing interest in easier methods of foot reconstruction, particularly for home health monitoring, custom orthotics and the growing online shoe retail industry. As such, these models must be capable of running on low-powered devices such as mobile phones, but also of being displayed at high resolution, e.g. for a doctor's inspection. To this end, we develop FIND, a Foot Implicit Neural Deformation field
model, which is easy to train as it requires few or no labels, and can produce explicit meshes at tailored resolutions, useful for targeting specific hardware or needs. We can improve the model further by introducing minimal extra supervision: (i) we add the weak label of foot identities to disentangle shape and pose; and (ii) pose descriptions to produce a more interpretable pose latent space.

Figure 1: We produce (1) a coordinate-based multilayer perceptron (MLP) architecture, FIND, which receives a 3D vertex position on a template mesh, together with shape, pose and texture embeddings, and predicts a corresponding vertex deformation and colour. (2) We train our model with 3D-based losses against ground-truth scans, and learn a sensible pose space using a contrastive loss. (3) At inference, we use differentiable rendering to enforce a silhouette-based loss, in addition to our novel unsupervised part-based loss.
We outline our key contributions as follows: (1) a novel implicit foot model, FIND, ca-
pable of producing high-fidelity human feet and fully parameterised by a disentangled latent
space of shape, texture and pose. The model can also be tuned to specific hardware consider-
ations or needs; (2) an approach to train the model under various levels of weak supervision,
with stronger supervision producing better disentanglement; (3) a novel, unsupervised part-
based loss for fitting FIND to images using unsupervised feature learning; and (4) we release
Foot3D, a dataset of high resolution foot scans, providing shape and texture information for
training of 3D foot models.
2 Related work
Generative shape models. These models endeavour to capture the distribution of an object category's shape in a set of controllable parameters, e.g. height, width and pose. Much of the work in generative shape modelling looks at human bodies, for which a number of parameterised
models have been built [22,40]. For construction, the models are typically trained with
strong supervision via 2D annotation [7,20] of rich 3D scans [13,35]. The building of
generative models of more niche object categories often uses similar principles, but usually
on much smaller, custom datasets, often due to the time and expense in collecting the data.
For examples of the difficulties involved, analogous generative quadruped models have been
constructed from scans of toys of real animals [41] (as real animals don’t keep still long
enough), and parametric hand models have been constructed from 3D scans [32] and even
at greater expense from MRI scans [19]. These models typically use Principal Component Analysis (PCA) to compose a small set of parameters which control linear offsets from a template mesh. We show that our implicit surface deformation field outperforms a strong PCA baseline of this type on feet.
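To make the linear baseline concrete, the following is a minimal sketch of such a PCA morphable model, assuming the training meshes are already in dense vertex correspondence; the function names and component count are illustrative, not those of any cited model.

```python
import numpy as np

def fit_pca_foot_model(meshes, n_components=10):
    """Fit a linear morphable model to registered scans.

    meshes: (N, V, 3) array of vertex positions in dense correspondence.
    Returns the mean template and a basis of linear vertex offsets.
    """
    n, v, _ = meshes.shape
    flat = meshes.reshape(n, v * 3)                    # one row per mesh
    template = flat.mean(axis=0)                       # mean shape
    _, _, vt = np.linalg.svd(flat - template, full_matrices=False)
    basis = vt[:n_components]                          # (n_components, V*3)
    return template, basis

def decode_pca(template, basis, coeffs):
    """Reconstruct a foot as the template plus a linear combination of offsets."""
    return (template + coeffs @ basis).reshape(-1, 3)
```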
Implicit shape representations. Recent interest has shifted towards implicit representa-
tions of shape: functions, often deep networks, that take the coordinates of a point in space and return spatial information about an object, such as whether the point lies inside the object [8] or how near it is to the object's surface [29]. These methods are naturally differentiable and can represent scenes at arbitrary scales,
so lend themselves to many reconstruction tasks - for example, reconstructing meshes from
point clouds [11] or single view images [36]. One particular method, Neural Radiance Fields
(NeRFs) [25], has become increasingly popular due to the very high fidelity with which it can synthesise novel views of a scene. Many extensions to NeRF have investigated how to param-
eterise shape and texture for generic object classes [14], decompose scenes into individual,
controllable components [27], and directly make edits to geometry and texture [21]. This
has also been extended to posed humans, with several works [34,37] showing promise in
modelling dynamic humans using NeRF architectures. We take inspiration from this
approach and, similar to [24], build our implicit foot model FIND by sampling points on the
surface of a manifold, which in our case are vertices of a high resolution template foot mesh,
and learn a parameterised deformation field for deforming vertices to different types of foot
shapes and poses.
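As an illustration of why implicit representations are resolution-independent, the sketch below queries a hypothetical occupancy network on a regular grid and extracts an explicit mesh with marching cubes; `occupancy_net`, the grid resolution and the bounds are assumptions rather than details of any cited method.

```python
import numpy as np
import torch
from skimage.measure import marching_cubes

def extract_mesh(occupancy_net, resolution=128, bound=1.0, level=0.5):
    """Mesh the level set of an implicit occupancy function.

    occupancy_net: callable mapping an (N, 3) tensor of points to (N,) occupancy
    probabilities. The grid resolution is a free choice, which is what makes
    implicit representations resolution-independent.
    """
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (R, R, R, 3)
    pts = torch.tensor(grid.reshape(-1, 3), dtype=torch.float32)
    with torch.no_grad():
        occ = occupancy_net(pts).cpu().numpy().reshape(resolution, resolution, resolution)
    verts, faces, _, _ = marching_cubes(occ, level=level)
    verts = verts / (resolution - 1) * 2 * bound - bound              # index -> world coords
    return verts, faces
```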
Unsupervised representation learning for 3D objects. Generative Adversarial Networks
(GANs) [10] have shown promise in learning unsupervised representations of 2D data, even
learning information about the underlying 3D geometry [28]. The intermediate feature rep-
resentations are sufficiently powerful that Zhang et al. [39] showed they can be trained to
solve downstream tasks, such as semantic segmentation, with very simple MLP classifiers and a remarkably small number of ground-truth labels. We leverage this for our novel unsupervised part-based loss, which allows us to learn foot part classes in 3D from 2D images and so aids fitting our model to images.
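A minimal sketch of the general recipe we build on here, assuming a frozen feature extractor that yields per-pixel features: a small MLP predicts part logits, and an image-fitting term compares the part probabilities of a rendered model against those of the target image. The feature dimension, number of parts and loss form are illustrative assumptions, not the exact formulation used in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartClassifier(nn.Module):
    """Tiny per-pixel MLP mapping frozen backbone features to part logits."""
    def __init__(self, feat_dim=256, n_parts=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, n_parts),
        )

    def forward(self, feats):          # feats: (B, H, W, feat_dim), from a frozen extractor
        return self.mlp(feats)         # (B, H, W, n_parts)

def part_consistency_loss(classifier, feats_rendered, feats_target):
    """Encourage the rendered model and the target image to agree on per-pixel
    part probabilities (a sketch of a part-based fitting objective)."""
    p_rendered = F.softmax(classifier(feats_rendered), dim=-1)
    p_target = F.softmax(classifier(feats_target), dim=-1)
    return F.mse_loss(p_rendered, p_target)
```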
Foot reconstruction. High-resolution, accurate 3D foot scanners exist [1,3,4], but they are expensive and difficult to use outside of specific environments, so they do not lend themselves to the average consumer. While PCA-parameterised foot models do exist [6], many existing foot reconstruction solutions instead directly predict an unconstrained mesh from point clouds or depth maps [23]. Large-scale proprietary datasets of feet scanned at low resolution [15,16] release only population measurement statistics to the public. For
our work, we build a new foot scan dataset, Foot3D, for model building and evaluation,
which we release to the community.
Figure 2: Five feet from our dataset and their corresponding pose descriptions (T-pose, toe flexion, medial rotation with dorsiflexion, toe abduction, eversion). The pose types are based on foot articulation described in the foot anatomy literature [12]; further details are given in the supplementary material.
Figure 3: Our implicit surface field is an MLP which takes as input a 3D point queried on a
template mesh’s surface, and texture, shape and pose embeddings, and provides as output a
vertex colour value and displacement.
3 Method
3.1 Foot3D Dataset
We produce Foot3D, a dataset of high-resolution, textured 3D human feet. For acquisition, we use an Artec Leo 3D scanner [1], which has a 3D point accuracy of up to 0.1 mm. A total of 61 scans of the left feet of 34 subjects, in a variety of poses, was collected. To capture the entire surface of the foot, subjects sat with their leg resting on a table for stability and their foot suspended over the edge, allowing the scanner to view the entire foot surface. Each subject then held a static pose for approximately two minutes while the scan took place. Details of the nature of the articulation requested can be found in the supplementary material.
The raw data are then processed in Artec Studio 16 [2] to produce 100K-polygon meshes. These meshes are cut off at an approximate position on the shin by a plane roughly perpendicular to the leg. Next, we leverage this slice plane as a basis for loosely registering all of the meshes in the dataset, such that these planes are parallel to the XY-plane and a vector from the slice plane to the foot's centroid lies in the X direction. We then slice each foot at a uniform height above the heel to provide a consistent ankle length. Figure 2 shows dataset samples, and further details about this process are provided in the supplementary material.
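For illustration, the following sketch (using trimesh) performs the kind of loose registration described above: rotating a scan so that its estimated shin slice plane is parallel to the XY-plane and the plane-to-centroid direction lies along X, then slicing at a fixed height above the heel. The plane estimate, axis conventions and the cut height are assumptions, not the exact pipeline used to build Foot3D.

```python
import numpy as np
import trimesh

def align_and_slice(mesh, plane_origin, plane_normal, ankle_height=0.12):
    """Loosely register a foot scan (sketch, not the exact Foot3D pipeline).

    plane_origin, plane_normal: estimated shin cut plane of the raw scan, with
    the normal assumed to point up the leg, away from the foot.
    ankle_height: cut height above the heel, in metres (an assumed value).
    """
    # Rotate so the slice-plane normal aligns with +Z, i.e. the plane becomes
    # parallel to the XY-plane.
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)
    R1 = trimesh.geometry.align_vectors(n, [0.0, 0.0, 1.0])   # 4x4 transform
    mesh.apply_transform(R1)
    origin = R1[:3, :3] @ np.asarray(plane_origin, dtype=float) + R1[:3, 3]

    # Rotate about Z so the plane-to-centroid direction lies along +X.
    d = mesh.centroid - origin
    theta = np.arctan2(d[1], d[0])
    mesh.apply_transform(trimesh.transformations.rotation_matrix(-theta, [0, 0, 1]))

    # Slice at a uniform height above the heel for a consistent ankle length,
    # keeping the part of the mesh below the cut plane.
    heel_z = mesh.vertices[:, 2].min()
    return trimesh.intersections.slice_mesh_plane(
        mesh, plane_normal=[0, 0, -1], plane_origin=[0, 0, heel_z + ankle_height])
```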
3.2 The FIND model
The FIND model is controlled using a shape, pose and texture embedding to produce a
surface deformation and colour field over a template mesh. This function is implemented as an implicit, coordinate-based neural network. The architecture is now explained in detail.
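Below is a minimal PyTorch sketch of an architecture of this kind, taking template vertices together with shape, pose and texture codes and returning per-vertex displacements and colours; the positional encoding, layer widths and code dimensions are illustrative assumptions rather than the paper's configuration.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=6):
    """Sin/cos encoding of 3D template coordinates (an assumed design choice)."""
    freqs = (2.0 ** torch.arange(n_freqs, device=x.device, dtype=x.dtype)) * math.pi
    angles = x[..., None] * freqs                      # (V, 3, n_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(start_dim=-2)

class FINDField(nn.Module):
    """Deformation and colour field over a template foot surface (sketch)."""
    def __init__(self, code_dim=64, hidden=256, n_freqs=6):
        super().__init__()
        in_dim = 3 * 2 * n_freqs + 3 * code_dim        # encoded point + three codes
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.delta_head = nn.Linear(hidden, 3)         # per-vertex displacement
        self.colour_head = nn.Linear(hidden, 3)        # per-vertex RGB

    def forward(self, template_verts, shape_code, pose_code, tex_code):
        # template_verts: (V, 3); each code: (code_dim,)
        v = template_verts.shape[0]
        codes = torch.cat([shape_code, pose_code, tex_code]).expand(v, -1)
        h = self.trunk(torch.cat([positional_encoding(template_verts), codes], dim=-1))
        deformed = template_verts + self.delta_head(h)
        colour = torch.sigmoid(self.colour_head(h))
        return deformed, colour
```

Because the field is queried at surface points of the template, the same trained network can be evaluated on a coarse or dense version of the template, which is what allows meshes to be produced at tailored resolutions.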