FIND: An Unsupervised Implicit 3D Model of
Articulated Human Feet
Oliver Boyne
ob312@cam.ac.uk
James Charles
http://www.jjcvision.com
Roberto Cipolla
cipolla@eng.cam.ac.uk
Machine Intelligence Lab
Department of Engineering
University of Cambridge
Cambridge, U.K.
Abstract
In this paper we present a high-fidelity, articulated 3D human foot model. The model is parameterised by a disentangled latent code in terms of shape, texture and articulated pose. While high-fidelity models are typically created with strong supervision such as 3D keypoint correspondences or pre-registration, we focus on the difficult case of little to no annotation. To this end, we make the following contributions: (i) we develop a Foot Implicit Neural Deformation field model, named FIND, capable of tailoring explicit meshes at any resolution, i.e. for low- or high-powered devices; (ii) an approach for training our model in various modes of weak supervision, with progressively better disentanglement as more labels, such as pose categories, are provided; (iii) a novel unsupervised part-based loss for fitting our model to 2D images, which outperforms traditional photometric or silhouette losses; (iv) finally, we release a new dataset of high-resolution 3D human foot scans, Foot3D. On this dataset, we show that our model outperforms a strong PCA implementation trained on the same data in terms of shape quality and part correspondences, and that our novel unsupervised part-based loss improves inference on images.
1 Introduction
Shape reconstruction from single-view or few-view images is a vital but difficult computer vision task. Current approaches depend on strong priors to help constrain this ill-posed problem, usually in the form of well-constructed, parameterised 3D models [22,41]. The process of model creation is also often time-consuming, requiring substantial supervision, e.g. registration and correspondence annotation. For human feet, the current state-of-the-art in reconstruction relies on expensive scanning equipment [1,3,4], unavailable to the average consumer at home. Thus, there is growing interest in easier methods of foot reconstruction, particularly for home health monitoring, custom orthotics and the growing online shoe retail industry. As such, these models must be capable of running on low-powered devices such as mobile phones, but also of being displayed at high resolution, e.g. for a doctor's inspection. To this end, we develop FIND, a Foot Implicit Neural Deformation field
model, which is easy to train as it requires few or no labels, and can produce explicit meshes at tailored resolutions, useful for targeting specific hardware or needs. We can improve the model further by introducing minimal extra supervision: (i) we add the weak label of foot identities to disentangle shape and pose; and (ii) pose descriptions to produce a more interpretable pose latent space.

Figure 1: We produce (1) a coordinate-based multilayer perceptron (MLP) architecture, FIND, which receives a 3D vertex position on a template mesh, together with shape, pose and texture embeddings, and predicts a corresponding vertex deformation and colour. (2) We train our model with 3D-based losses against ground-truth scans, and learn a sensible pose space using a contrastive loss. (3) At inference, we use differentiable rendering to enforce a silhouette-based loss, in addition to our novel unsupervised part-based loss.
We outline our key contributions as follows: (1) a novel implicit foot model, FIND, ca-
pable of producing high-fidelity human feet and fully parameterised by a disentangled latent
space of shape, texture and pose. The model can also be tuned to specific hardware consider-
ations or needs; (2) an approach to train the model under various levels of weak supervision,
with stronger supervision producing better disentanglement; (3) a novel, unsupervised part-
based loss for fitting FIND to images using unsupervised feature learning; and (4) we release
Foot3D, a dataset of high resolution foot scans, providing shape and texture information for
training of 3D foot models.
2 Related work
Generative shape models. These models endeavour to capture the distribution of an object category's shape in a set of controllable parameters, e.g. height, width and pose. Much of the work in generative shape modelling looks at human bodies, for which a number of parameterised
models have been built [22,40]. For construction, the models are typically trained with
strong supervision via 2D annotation [7,20] of rich 3D scans [13,35]. The building of
generative models of more niche object categories often uses similar principles, but usually
on much smaller, custom datasets, often due to the time and expense in collecting the data.
For examples of the difficulties involved, analogous generative quadruped models have been
constructed from scans of toys of real animals [41] (as real animals don’t keep still long
enough), and parametric hand models have been constructed from 3D scans [32] and even
at greater expense from MRI scans [19]. These models typically use Principal Component Analysis (PCA) to compose a small set of parameters which control linear offsets from a template mesh. We show that our implicit surface deformation field outperforms a strong PCA baseline of this type on feet.
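To make the linear baseline concrete, the following is a minimal sketch of such a PCA morphable model, assuming the training meshes are already in dense vertex correspondence; the function names and component count are illustrative, not those of any cited model.

```python
import numpy as np

def fit_pca_foot_model(meshes, n_components=10):
    """Fit a linear morphable model to registered scans.

    meshes: (N, V, 3) array of vertex positions in dense correspondence.
    Returns the mean template and a basis of linear vertex offsets.
    """
    n, v, _ = meshes.shape
    flat = meshes.reshape(n, v * 3)                    # one row per mesh
    template = flat.mean(axis=0)                       # mean shape
    _, _, vt = np.linalg.svd(flat - template, full_matrices=False)
    basis = vt[:n_components]                          # (n_components, V*3)
    return template, basis

def decode_pca(template, basis, coeffs):
    """Reconstruct a foot as the template plus a linear combination of offsets."""
    return (template + coeffs @ basis).reshape(-1, 3)
```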
Implicit shape representations. Recent interest has shifted towards implicit representa-
tions of shape: functions, often deep networks, that take the coordinates of a point in space and return spatial information about an object, such as whether the point lies inside the object [8] or how near it is to the object's surface [29]. These methods are naturally differentiable and can represent scenes at arbitrary scales,
so lend themselves to many reconstruction tasks - for example, reconstructing meshes from
point clouds [11] or single view images [36]. One particular method, Neural Radiance Fields
(NeRFs) [25], has become increasingly popular due to the very high fidelity with which it can synthesise novel views of a scene. Many extensions to NeRF have investigated how to param-
eterise shape and texture for generic object classes [14], decompose scenes into individual,
controllable components [27], and directly make edits to geometry and texture [21]. This
has also been extended to posed humans, with several works [34,37] showing promise in
modelling dynamic humans using NeRF architectures. We take inspiration from this
approach and, similar to [24], build our implicit foot model FIND by sampling points on the
surface of a manifold, which in our case are vertices of a high resolution template foot mesh,
and learn a parameterised deformation field for deforming vertices to different types of foot
shapes and poses.
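As an illustration of why implicit representations are resolution-independent, the sketch below queries a hypothetical occupancy network on a regular grid and extracts an explicit mesh with marching cubes; `occupancy_net`, the grid resolution and the bounds are assumptions rather than details of any cited method.

```python
import numpy as np
import torch
from skimage.measure import marching_cubes

def extract_mesh(occupancy_net, resolution=128, bound=1.0, level=0.5):
    """Mesh the level set of an implicit occupancy function.

    occupancy_net: callable mapping an (N, 3) tensor of points to (N,) occupancy
    probabilities. The grid resolution is a free choice, which is what makes
    implicit representations resolution-independent.
    """
    xs = np.linspace(-bound, bound, resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (R, R, R, 3)
    pts = torch.tensor(grid.reshape(-1, 3), dtype=torch.float32)
    with torch.no_grad():
        occ = occupancy_net(pts).cpu().numpy().reshape(resolution, resolution, resolution)
    verts, faces, _, _ = marching_cubes(occ, level=level)
    verts = verts / (resolution - 1) * 2 * bound - bound              # index -> world coords
    return verts, faces
```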
Unsupervised representation learning for 3D objects. Generative Adversarial Networks
(GANs) [10] have shown promise in learning unsupervised representations of 2D data, even
learning information about the underlying 3D geometry [28]. The intermediate feature rep-
resentations are sufficiently powerful that Zhang et al. [39] showed they can be trained to
solve downstream tasks, such as semantic segmentation, with very simple MLP classifiers and a remarkably small number of ground-truth labels. We leverage this for our novel unsupervised part-based loss, which allows us to learn foot part classes in 3D from 2D images and so aids fitting our model to images.
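A minimal sketch of the general recipe we build on here, assuming a frozen feature extractor that yields per-pixel features: a small MLP predicts part logits, and an image-fitting term compares the part probabilities of a rendered model against those of the target image. The feature dimension, number of parts and loss form are illustrative assumptions, not the exact formulation used in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartClassifier(nn.Module):
    """Tiny per-pixel MLP mapping frozen backbone features to part logits."""
    def __init__(self, feat_dim=256, n_parts=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, n_parts),
        )

    def forward(self, feats):          # feats: (B, H, W, feat_dim), from a frozen extractor
        return self.mlp(feats)         # (B, H, W, n_parts)

def part_consistency_loss(classifier, feats_rendered, feats_target):
    """Encourage the rendered model and the target image to agree on per-pixel
    part probabilities (a sketch of a part-based fitting objective)."""
    p_rendered = F.softmax(classifier(feats_rendered), dim=-1)
    p_target = F.softmax(classifier(feats_target), dim=-1)
    return F.mse_loss(p_rendered, p_target)
```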
Foot reconstruction. High-resolution, accurate 3D foot scanners exist [1,3,4], but they are expensive and difficult to use outside of specific environments, so they do not lend themselves to the average consumer. While PCA-parameterised foot models do exist [6], many existing foot reconstruction solutions instead directly predict an unconstrained mesh from point clouds or depth maps [23]. Large-scale proprietary datasets of feet scanned at low resolution [15,16] release only population measurement statistics to the public. For
our work, we build a new foot scan dataset, Foot3D, for model building and evaluation,
which we release to the community.
Figure 2: Five feet from our dataset and their corresponding pose descriptions (T-pose, toe flexion, medial rotation with dorsiflexion, toe abduction, eversion). The pose types are based on foot articulation described in the foot anatomy literature [12]; further details are given in the supplementary material.
Figure 3: Our implicit surface field is an MLP which takes as input a 3D point queried on a
template mesh’s surface, and texture, shape and pose embeddings, and provides as output a
vertex colour value and displacement.
3 Method
3.1 Foot3D Dataset
We produce Foot3D, a dataset of high-resolution, textured 3D human feet. For acquisition, we use an Artec Leo 3D scanner [1], which has a 3D point accuracy of up to 0.1 mm. A total of 61 scans of the left feet of 34 subjects, in a variety of poses, was collected. To capture the entire surface of the foot, subjects sat with their leg resting on a table for stability and their foot suspended over the edge, allowing the scanner to view the entire foot surface. Each subject then held a static pose for approximately two minutes while the scan took place. Details of the nature of the articulation requested can be found in the supplementary material.
The raw data are then processed in Artec Studio 16 [2] to produce 100K-polygon meshes. These meshes are cut off at an approximate position on the shin by a plane roughly perpendicular to the leg. Next, we leverage this slice plane as a basis for loosely registering all of the meshes in the dataset, such that these planes are parallel to the XY-plane and a vector from the slice plane to the foot's centroid lies in the X direction. We then slice each foot at a uniform height above the heel to provide a consistent ankle length. Figure 2 shows dataset samples, and further details about this process are provided in the supplementary material.
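For illustration, the following sketch (using trimesh) performs the kind of loose registration described above: rotating a scan so that its estimated shin slice plane is parallel to the XY-plane and the plane-to-centroid direction lies along X, then slicing at a fixed height above the heel. The plane estimate, axis conventions and the cut height are assumptions, not the exact pipeline used to build Foot3D.

```python
import numpy as np
import trimesh

def align_and_slice(mesh, plane_origin, plane_normal, ankle_height=0.12):
    """Loosely register a foot scan (sketch, not the exact Foot3D pipeline).

    plane_origin, plane_normal: estimated shin cut plane of the raw scan, with
    the normal assumed to point up the leg, away from the foot.
    ankle_height: cut height above the heel, in metres (an assumed value).
    """
    # Rotate so the slice-plane normal aligns with +Z, i.e. the plane becomes
    # parallel to the XY-plane.
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)
    R1 = trimesh.geometry.align_vectors(n, [0.0, 0.0, 1.0])   # 4x4 transform
    mesh.apply_transform(R1)
    origin = R1[:3, :3] @ np.asarray(plane_origin, dtype=float) + R1[:3, 3]

    # Rotate about Z so the plane-to-centroid direction lies along +X.
    d = mesh.centroid - origin
    theta = np.arctan2(d[1], d[0])
    mesh.apply_transform(trimesh.transformations.rotation_matrix(-theta, [0, 0, 1]))

    # Slice at a uniform height above the heel for a consistent ankle length,
    # keeping the part of the mesh below the cut plane.
    heel_z = mesh.vertices[:, 2].min()
    return trimesh.intersections.slice_mesh_plane(
        mesh, plane_normal=[0, 0, -1], plane_origin=[0, 0, heel_z + ankle_height])
```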
3.2 The FIND model
The FIND model is controlled using a shape, pose and texture embedding to produce a
surface deformation and colour field over a template mesh. This function is implemented as an implicit, coordinate-based neural network. The architecture is now explained in detail.
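Below is a minimal PyTorch sketch of an architecture of this kind, taking template vertices together with shape, pose and texture codes and returning per-vertex displacements and colours; the positional encoding, layer widths and code dimensions are illustrative assumptions rather than the paper's configuration.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=6):
    """Sin/cos encoding of 3D template coordinates (an assumed design choice)."""
    freqs = (2.0 ** torch.arange(n_freqs, device=x.device, dtype=x.dtype)) * math.pi
    angles = x[..., None] * freqs                      # (V, 3, n_freqs)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(start_dim=-2)

class FINDField(nn.Module):
    """Deformation and colour field over a template foot surface (sketch)."""
    def __init__(self, code_dim=64, hidden=256, n_freqs=6):
        super().__init__()
        in_dim = 3 * 2 * n_freqs + 3 * code_dim        # encoded point + three codes
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.delta_head = nn.Linear(hidden, 3)         # per-vertex displacement
        self.colour_head = nn.Linear(hidden, 3)        # per-vertex RGB

    def forward(self, template_verts, shape_code, pose_code, tex_code):
        # template_verts: (V, 3); each code: (code_dim,)
        v = template_verts.shape[0]
        codes = torch.cat([shape_code, pose_code, tex_code]).expand(v, -1)
        h = self.trunk(torch.cat([positional_encoding(template_verts), codes], dim=-1))
        deformed = template_verts + self.delta_head(h)
        colour = torch.sigmoid(self.colour_head(h))
        return deformed, colour
```

Because the field is queried at surface points of the template, the same trained network can be evaluated on a coarse or dense version of the template, which is what allows meshes to be produced at tailored resolutions.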