
BOYNE, CHARLES, CIPOLLA: FIND - 3D MODEL OF ARTICULATED HUMAN FEET 3
on much smaller, custom datasets, often due to the time and expense of collecting the data.
For examples of the difficulties involved, analogous generative quadruped models have been
constructed from scans of toy animals [41] (as real animals do not keep still long
enough), and parametric hand models have been constructed from 3D scans [32] and even
at larger expense from MRI scans [19]. These models typically use Principal Component
Analysis (PCA) to derive a small set of parameters that control linear offsets from a
template mesh. We show that our implicit surface deformation field outperforms this type
of strong PCA baseline on feet.
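The PCA baseline described above can be sketched as follows. This is a generic illustration of a linear morphable model, not the exact formulation of [6] or [32]; the dataset here is random synthetic data standing in for registered scans, and all sizes are invented.

```python
import numpy as np

# Toy stand-in for a scan dataset: N meshes, each with V vertices in 3D.
# Real morphable models are fitted to registered scans; random data is
# used here purely to illustrate the mechanics.
rng = np.random.default_rng(0)
N, V, k = 50, 200, 8                  # scans, vertices, PCA parameters
scans = rng.normal(size=(N, V * 3))   # each row: one mesh, flattened

# The template is the mean mesh; the PCA basis comes from the SVD of the
# centred data. Each column of `basis` is one linear offset direction.
template = scans.mean(axis=0)
_, _, vt = np.linalg.svd(scans - template, full_matrices=False)
basis = vt[:k].T                      # (V*3, k)

# A small parameter vector now controls linear offsets from the template.
params = rng.normal(size=k)
mesh = (template + basis @ params).reshape(V, 3)
```

Varying the k entries of `params` sweeps the mesh through the principal modes of variation in the training scans, which is exactly the expressive bottleneck the implicit deformation field is compared against.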
Implicit shape representations. Recent interest has shifted towards implicit representations
of shape - functions, often deep networks, that take the coordinates of a point in
space and return spatial information about an object - such as whether the point
is inside the object [8], or how near it is to the object's surface
[29]. These methods are naturally differentiable and can represent scenes at arbitrary scales,
so lend themselves to many reconstruction tasks - for example, reconstructing meshes from
point clouds [11] or single-view images [36]. One particular method, Neural Radiance Fields
(NeRFs) [25], has become increasingly popular due to the very high fidelity with which it can
synthesise novel views of a scene. Many extensions to NeRF have investigated how to param-
eterise shape and texture for generic object classes [14], decompose scenes into individual,
controllable components [27], and directly make edits to geometry and texture [21]. This
has also been extended to posed humans, with several works [34,37] showing promise in
modelling dynamic humans with NeRF architectures. We take inspiration from this
approach and, similar to [24], build our implicit foot model FIND by sampling points on the
surface of a manifold, which in our case are vertices of a high resolution template foot mesh,
and learn a parameterised deformation field for deforming vertices to different types of foot
shapes and poses.
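The template-plus-deformation-field design can be sketched as below. This is an illustrative simplification, not FIND's actual architecture: the MLP layer sizes, latent code dimension, and random weights are all invented, and a trained model would condition on learned shape and pose codes.

```python
import numpy as np

def mlp_deformation_field(x, shape_code, weights):
    """Tiny MLP mapping a template surface point plus a latent shape code
    to a 3D offset. Layer sizes and weights are illustrative only."""
    h = np.concatenate([x, shape_code])
    for w, b in weights[:-1]:
        h = np.maximum(w @ h + b, 0.0)   # ReLU hidden layers
    w, b = weights[-1]
    return w @ h + b                      # 3D displacement

rng = np.random.default_rng(1)
d_code = 16                               # invented latent dimension
sizes = [3 + d_code, 64, 64, 3]
weights = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(m))
           for n, m in zip(sizes[:-1], sizes[1:])]

# Sample points on the manifold: here, vertices of a template mesh
# (random positions standing in for a real high-resolution template).
template_verts = rng.normal(size=(100, 3))
code = rng.normal(size=d_code)            # one latent foot shape/pose
deformed = np.array([v + mlp_deformation_field(v, code, weights)
                     for v in template_verts])
```

Because the field is a continuous function of position, it can be queried at any point on the template surface, not only at the vertices used during training.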
Unsupervised representation learning for 3D objects. Generative Adversarial Networks
(GANs) [10] have shown promise in learning unsupervised representations of 2D data, even
learning information about the underlying 3D geometry [28]. The intermediate feature rep-
resentations are sufficiently powerful that Zhang et al. [39] showed they can be trained to
solve downstream tasks, such as semantic segmentation, with very simple MLP classifiers
and remarkably few ground-truth labels. We leverage this for our novel unsupervised
part-based loss, allowing us to learn foot part classes in 3D (from 2D images) to aid
fitting to 2D images.
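The few-label classifier setup can be sketched as follows. This is a generic NumPy illustration of training a simple MLP on per-pixel features with a handful of labels; the features here are synthetic stand-ins for real GAN activations, and all dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for intermediate GAN features: one D-dim feature
# vector per pixel, with only a handful of labelled pixels.
D, C, n = 32, 4, 64
X = rng.normal(size=(n, D))
y = rng.integers(0, C, size=n)
onehot = np.eye(C)[y]

# A very simple one-hidden-layer MLP classifier, trained by plain
# gradient descent on the few labelled feature vectors.
H, lr = 16, 0.1
w1 = rng.normal(scale=0.1, size=(D, H)); b1 = np.zeros(H)
w2 = rng.normal(scale=0.1, size=(H, C)); b2 = np.zeros(C)

losses = []
for _ in range(300):
    h = np.maximum(X @ w1 + b1, 0.0)              # ReLU hidden layer
    logits = h @ w2 + b2
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)                  # softmax
    losses.append(-np.log(p[np.arange(n), y]).mean())
    g = (p - onehot) / n                          # dLoss/dlogits
    gw2, gb2 = h.T @ g, g.sum(0)
    gh = (g @ w2.T) * (h > 0)                     # backprop through ReLU
    gw1, gb1 = X.T @ gh, gh.sum(0)
    for param, grad in ((w1, gw1), (b1, gb1), (w2, gw2), (b2, gb2)):
        param -= lr * grad
```

The key point is the small classifier and label count: the heavy lifting is done by the pretrained feature representation, not the classification head.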
Foot reconstruction. High-resolution, accurate 3D foot scanners exist [1,3,4], but they
are expensive and difficult to use outside of specific environments, so do not lend
themselves to the average consumer. While PCA-parameterised foot models do exist [6],
many existing foot reconstruction solutions instead directly predict an unconstrained mesh
from point clouds or depth maps [23]. Large-scale proprietary datasets of feet, scanned at
low resolution [15,16], release only population measurement statistics to the public. For
our work, we build a new foot scan dataset, Foot3D, for model building and evaluation,
which we release to the community.