Human Body Measurement Estimation with Adversarial Augmentation
Nataniel Ruiz²† Miriam Bellver¹ Timo Bolkart¹ Ambuj Arora¹
Ming C. Lin¹ Javier Romero³† Raja Bala¹
¹Amazon ²Boston University ³Reality Labs Research
nruiz9@bu.edu {mbellver, timbolka, ambarora, minglinz, rajabl}@amazon.com
Abstract
We present a Body Measurement network (BMnet) for estimating 3D anthropomorphic measurements of the human body shape from silhouette images. Training of BMnet is performed on data from real human subjects, and augmented with a novel adversarial body simulator (ABS) that finds and synthesizes challenging body shapes. ABS is based on the skinned multi-person linear (SMPL) body model, and aims to maximize BMnet measurement prediction error with respect to latent SMPL shape parameters. ABS is fully differentiable with respect to these parameters, and trained end-to-end via backpropagation with BMnet in the loop. Experiments show that ABS effectively discovers adversarial examples, such as bodies with extreme body mass indices (BMI), consistent with the rarity of extreme-BMI bodies in BMnet's training set. Thus ABS is able to reveal gaps in training data and potential failures in predicting under-represented body shapes. Results show that training BMnet with ABS improves measurement prediction accuracy on real bodies by up to 10%, when compared to no augmentation or random body shape sampling. Furthermore, our method significantly outperforms SOTA measurement estimation methods by as much as 3x. Finally, we release BodyM, the first challenging, large-scale dataset of photo silhouettes and body measurements of real human subjects, to further promote research in this area. Project website: https://adversarialbodysim.github.io.
1. Introduction
Reconstruction of the 3D human body shape from images is an important problem in computer vision which has received much attention in the last few years [9, 14–17, 22, 23, 27, 31–33, 35, 39, 40, 45, 50, 54, 61, 62, 68, 84–86, 92, 93, 96]. However, 3D shape is not directly usable for applications where anthropomorphic body measurements are required. In healthcare, for example, measurements such as waist girth are a key indicator of body fat; while in the fashion industry, metric body measurements enable size recommendations and made-to-measure garments. Surprisingly, much less work has been published on directly estimating body measurements from images. This is the problem that we address in this paper. Note that body measurements can be viewed as a compact yet rich descriptor for 3D body shape. Indeed, previous work has shown that it is possible to accurately map a few body measurements to a 3D body mesh in a reference pose [57, 71].

† This research was performed while NR and JR were at Amazon.
Most existing body reconstruction methods do not incorporate knowledge of camera intrinsics or scale, and thus cannot guarantee metric accuracy (i.e., the distance between two points on the recovered mesh may not correspond to physical distances on a person's body) [34, 72, 80]. Furthermore, since these approaches have only been trained to generate a posed 3D avatar of a human, the body measurements have to be derived from the predicted mesh, which can limit resolution and accuracy. Finally, acquiring physical body measurements at scale is costly and time-consuming; hence, there is a dearth of training datasets pairing images with measurements of real humans. To circumvent this challenge, previous efforts have used synthetic data for training [19, 78], and evaluated on very small numbers (2–4) of human subjects [11, 19].
We present a method to predict body measurements from images that alleviates these shortcomings. We train a convolutional body measurement network (BMnet) to directly predict measurements from two silhouette images of a person's body. Silhouettes effectively convey body shape information, while preserving user privacy. To resolve scale ambiguity, we include height and weight as additional inputs to BMnet. We introduce a novel adversarial body simulator (ABS) that automatically discovers and synthesizes body shapes for which BMnet produces large prediction errors. ABS is fully differentiable with BMnet in-the-loop. It uncovers weaknesses in the model and gaps in the training data. For example, body shapes returned by ABS tend to be of predominantly high body-mass-index (BMI), consistent with the fact that these shapes are under-represented
arXiv:2210.05667v1 [cs.CV] 11 Oct 2022
Figure 1: (a) Differentiable silhouette simulator: SMPL model M generates a body mesh from shape and pose parameters β and θ, which is passed to silhouette renderer R (parameterized by lighting ι and camera γ), and measurement extractor g. Regressor h generates height ξ and weight ω from β. (b) Adversarial shape optimization: The simulator renders silhouettes that are passed to BMnet (f) along with height ξ and weight ω to obtain measurement estimates, which are compared to ground-truth measurements (also generated by the simulator). The error is maximized with respect to shape β under fixed pose θ.
in training. Fine-tuning BMnet with samples generated by ABS improves accuracy (up to 10%) and robustness on real data, achieving state-of-the-art results. To train and evaluate BMnet, we introduce a new dataset, BodyM, comprising full-body silhouette images of 2,505 subjects in frontal and lateral poses, accompanied by height, weight, and 14 body measurements derived from 3D scans. To our knowledge, this is the first dataset that pairs photo silhouettes and body measurements for real humans at such a scale.
The main contributions of this work are:
- BMnet: a deep CNN to directly regress physical body measurements from two silhouettes, height, and weight;
- ABS: a novel differentiable simulator for generating adversarial body shapes with BMnet in-the-loop, uncovering training gaps and improving BMnet performance on real data (up to 3x);
- BodyM: a new dataset for body measurement estimation comprising silhouettes, height, weight, and 14 physical body measurements for 2,505 humans, publicly available for research purposes¹.
2. Related Work
Body reconstruction from RGB images: The literature on recovering 3D human representations from RGB images is vast; see [83] and [85] for excellent surveys. Techniques fall broadly into two categories. Parametric methods characterize the human body in terms of a parametric model such as SMPL(-X) [45, 53], Adam [30], SCAPE [3], STAR [51], or GHUM [88]. Model parameters defining body pose and shape are then estimated from images via direct optimization [10, 53, 87, 93], regression with deep networks [9, 16, 17, 27, 31–33, 40, 50, 61, 62, 94], or a combination of the two [34]. In contrast, non-parametric methods directly regress a 3D body representation from images using graph convolutional neural networks [14, 35], transformers [41], combinations of both [42], intermediate representations such as 1D heatmaps [49] or 2D depth maps [79], or implicit functions [18, 68]. Recently, there have been successful explorations of probabilistic approaches for shape and pose estimation [36, 69–71].

¹ https://adversarialbodysim.github.io
Body reconstruction from silhouettes: Methods have been proposed to predict 3D body model parameters from binary human silhouette images [4, 5, 20, 55, 72]. Our approach is similar in flavor, but addresses a different task: predicting physical body measurements from silhouettes. Our constrained pose setting, height and weight inputs, and adversarial training scheme enable measurement prediction with state-of-the-art metric accuracy.
Body measurement estimation: Dibra et al. [19] reported the first attempt at using a CNN to recover a 3D body mesh and anthropomorphic measurements from silhouettes. The silhouettes are generated synthetically by rendering 3D meshes from the CAESAR (Civilian American and European Surface Anthropometry Resource) dataset [60] onto frontal and side views, and body measurements are derived as geodesic distances on 3D meshes. In contrast, our approach is trained on data from both real and synthetic humans, directly regresses measurements, and employs adversarial training for improved performance. Our approach is most closely related to the works of [78] and [90]. Yan et al. [90] use their BodyFit dataset to train a CNN to predict measurements from silhouette pairs. Smith et al. [78] proposed a multitask CNN to estimate body measurements, body mesh, and 3D pose from height, weight, two silhouette images, and segmentation confidence maps. For training, they generate synthetic body shapes by sampling the SMPL shape space with multivariate Gaussian shape distributions and stochastic perturbations of body shapes from CAESAR. In contrast to both these methods, our approach seeks adversarial samples in the low-performance regime of BMnet, enabling automatic discovery and mitigation of weaknesses in dataset and network in a principled manner.
Synthesis for training: With advances in simulation quality and realism, it has become increasingly common to train deep neural networks using synthetic data [21, 24, 40, 59, 63]. Recently, there have been attempts at learning to adapt distributions of generated synthetic data to improve model training [2, 6–8, 25, 46, 66, 74, 91]. These approaches focus on approximating a distribution that is either similar to the natural test distribution or that minimizes prediction error. Another flavor of approaches probes the weaknesses of machine learning models using synthetic data [28, 37, 38, 48, 56, 67, 77]. The works of [1, 76, 95] generate robust synthetic training data for object recognition and visual question answering by varying scene parameters such as pose and lighting, while preserving object characteristics. Shen et al. [75] tackle vehicle self-driving by introducing adversarial camera corruptions in training. In our work, we explore the impact of varying interpretable parameters that directly control human body shape.
Adversarial techniques: We take inspiration from the literature on adversarial attacks on neural networks [12, 26, 52, 81] and draw from ideas for improving network robustness by training on images that have undergone white-box adversarial attacks [47]. The main difference lies in the search space: previous works search the image space, while we search the interpretable latent shape space of the body model. The works of [58, 65] find synthetic adversarial samples for faces using either a GAN or a face simulator. They are successful in finding interpretable attributes leading to false predictions; however, they do not incorporate this knowledge in training to improve predictions on real examples. In our work, we both discover adversarial samples and use them in training to improve body measurement estimation. Different from previous methods, we find adversarial bodies by searching the latent space of a body simulator comprising a pipeline of differentiable submodules, namely: a 3D body shape model, a body measurement estimation network, height and weight regressors, and a renderer based on a soft rasterizer [43].
Datasets: Widely used human body datasets such as CAESAR [60] contain high volumes of 3D scans and body measurements; however, these do not come with real images, which must therefore be simulated from the scans with a virtual camera. Recently, Yan et al. [90] published the BodyFit dataset comprising over 4K body scans from which body measurements are computed and silhouettes are simulated. They also present a small collection of photographs and tape measurements of 194 subjects. To resolve scale, they assume a fixed camera distance. Our BodyM is the first large-scale dataset comprising body measurements paired with silhouettes obtained by applying semantic segmentation to real photographs. To resolve scale, we store height and weight (easy to acquire) rather than assume a fixed camera distance (hard to enforce in practice).
3. Method
We use the SMPL model [45] as our basis for adversarial body simulation. SMPL characterizes the human form in terms of a finite number of shape parameters β and pose parameters θ. Shape is modeled as a linear weighted combination of basis shapes (with weights β) derived from the CAESAR dataset, while pose is modeled as local 3D rotation angles θ on 24 skeleton joints. SMPL learns a regressor M(β, θ) for generating an articulated body mesh of 6,890 vertices from the specified shape and pose using blend shapes.
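To make the shape model concrete, the following is a minimal NumPy sketch of SMPL's linear blend-shape step. The arrays are random stand-ins for the learned mean template and CAESAR-derived shape basis, and the pose-dependent blend shapes and linear blend skinning over the 24 joints are omitted; only the overall structure is faithful to the model described above.

```python
import numpy as np

NUM_VERTS, NUM_BETAS = 6890, 10  # SMPL mesh resolution; basic SMPL uses 10 shape coefficients

rng = np.random.default_rng(0)
v_template = rng.standard_normal((NUM_VERTS, 3))             # stand-in for the mean body mesh
shape_dirs = rng.standard_normal((NUM_VERTS, 3, NUM_BETAS))  # stand-in for the learned shape basis

def shaped_vertices(betas: np.ndarray) -> np.ndarray:
    """Shape step of M(beta, theta): the template plus a weighted sum of basis shapes.

    The full SMPL model additionally applies pose blend shapes and linear blend
    skinning over 24 skeleton joints, which this sketch leaves out.
    """
    return v_template + shape_dirs @ betas  # (6890, 3)

verts = shaped_vertices(np.zeros(NUM_BETAS))  # zero betas recover the template
```

Because the shape space is linear in β, gradients of any downstream quantity (silhouette, measurement) flow back to β unchanged in structure, which is what makes the adversarial search in Section 3.2 tractable.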
3.1. Body Measurement Estimation Network
BMnet takes as input either single- or multi-view silhouette masks. For single-view, only a frontal segmentation mask is used. For multi-view, the model also leverages the lateral silhouette, which provides crucial cues for accurate measurement in the chest and waist areas. Additionally, we use height and weight as input metadata. Height removes the ambiguity in scale when predicting measurements from subjects at variable distance to the camera, while weight provides important cues for body size and shape. Our multi-view measurement estimation network can be written as:

y = f_ψ(x_f, x_l, ξ, ω),   (1)

where x_f and x_l are respectively the frontal and lateral silhouettes, (ξ, ω) are the height and weight of the subject, and ψ represents the network weights.
The network architecture comprises an MNASNet backbone [82] with a depth multiplier of 1 to extract features from the silhouettes. Each silhouette is of size 640 × 480, and the two views are concatenated spatially to form a 640 × 960 image. Constant-valued images of the same size representing height and weight are then concatenated depth-wise to the silhouettes to produce an input tensor of dimension 3 × 640 × 960 for the network. The resulting feature maps from MNASNet are fed into an MLP comprising a hidden layer of 128 neurons and 14 outputs corresponding to body measurements. Unlike previous approaches that attempt the highly ambiguous problem of predicting a high-dimensional body mesh and subsequently computing the measurements from the mesh [19], we directly regress measurements, thus requiring a simpler architecture and obviating the need to store 3D body mesh ground truth.
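The input assembly described above can be sketched as follows. The concatenation layout (two views side by side, metadata as constant-valued planes) follows the text; the silhouette values, the units for height and weight, and the absence of any normalization are illustrative assumptions.

```python
import numpy as np

H, W = 640, 480  # per-silhouette resolution used by BMnet

def build_input(front: np.ndarray, lateral: np.ndarray,
                height: float, weight: float) -> np.ndarray:
    """Concatenate the two views spatially, then stack constant-valued height
    and weight planes depth-wise to form the 3 x 640 x 960 input tensor."""
    sil = np.concatenate([front, lateral], axis=1)          # (640, 960)
    h_plane = np.full(sil.shape, height, dtype=np.float32)  # constant height plane
    w_plane = np.full(sil.shape, weight, dtype=np.float32)  # constant weight plane
    return np.stack([sil.astype(np.float32), h_plane, w_plane])  # (3, 640, 960)

x = build_input(np.zeros((H, W)), np.ones((H, W)), 170.0, 65.0)
```

Encoding scalars as constant image planes lets a purely convolutional backbone consume them without a separate metadata branch.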
3.2. Adversarial Body Simulator
We present an adversarial body simulator (ABS) that searches the latent shape space of the SMPL model in order to maximize BMnet's measurement prediction error with respect to the shape parameters β.
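The core mechanism — gradient ascent on the prediction error with respect to β through a differentiable pipeline — can be sketched as follows. A random linear map stands in for the actual composition of mesh generator, silhouette renderer, and BMnet (Figure 1), and the step size, iteration count, and clipping range are arbitrary choices for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((14, 10))  # toy differentiable pipeline: shape beta -> 14 measurement errors

def error(beta: np.ndarray) -> float:
    """Squared measurement-prediction error as a function of shape beta."""
    r = A @ beta
    return float(r @ r)

def error_grad(beta: np.ndarray) -> np.ndarray:
    """Analytic gradient of the error w.r.t. beta (backpropagation in the real pipeline)."""
    return 2.0 * A.T @ (A @ beta)

beta = 0.01 * rng.standard_normal(10)   # start near the mean body shape
start_err = error(beta)
for _ in range(200):
    beta += 0.01 * error_grad(beta)     # gradient ASCENT: maximize the predictor's error
    beta = np.clip(beta, -3.0, 3.0)     # keep shapes within a plausible range
```

In the full ABS, the ascent runs through the renderer and BMnet jointly, and the resulting high-error body shapes are rendered and added to BMnet's training set.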