tributions and stochastic perturbations of body shapes from
CAESAR. In contrast to both these methods, our approach
seeks adversarial samples in the low-performance regime
of BMnet, enabling automatic discovery and mitigation of
weaknesses in both the dataset and the network in a principled manner.
Synthesis for training: With advances in simulation
quality and realism, it has become increasingly common
to train deep neural networks using synthetic data [21,24,
40,59,63]. Recently, there have been attempts at learn-
ing to adapt distributions of generated synthetic data to im-
prove model training [2,6–8,25,46,66,74,91]. These
approaches focus on approximating a distribution that is
either similar to the natural test distribution or that mini-
mizes prediction error. Another flavor of approaches probes
the weaknesses of machine learning models using synthetic
data [28,37,38,48,56,67,77]. The works of [1,76,95]
generate robust synthetic training data for object recogni-
tion and visual-question-answering by varying scene pa-
rameters such as pose and lighting, while preserving object
characteristics. Shen et al. [75] tackle vehicle self-driving
by introducing adversarial camera corruptions in training.
In our work, we explore the impact of varying interpretable
parameters that directly control human body shape.
Adversarial techniques: We take inspiration from the
literature on adversarial attacks of neural networks [12,26,
52,81] and draw from ideas for improving network robust-
ness by training on images that have undergone white-box
adversarial attacks [47]. The main difference lies in the
search space: previous works search the image space while
we search the interpretable latent shape space of the body
model. The works by [58,65] find synthetic adversarial
samples for faces using either a GAN or a face simula-
tor. They are successful in finding interpretable attributes
leading to false predictions; however, they do not incorpo-
rate this knowledge in training to improve predictions on
real examples. In our work, we both discover adversarial
samples and use them in training to improve body measure-
ment estimation. Different from previous methods, we find
adversarial bodies by searching the latent space of a body
simulator comprising a pipeline of differentiable submodules:
a 3D body shape model, a body measurement
estimation network, height and weight regressors, and a renderer
based on a soft rasterizer [43].
Datasets: Widely used human body datasets such as
CAESAR [60] contain large volumes of 3D scans and body
measurements; however, these do not include real images,
which must therefore be simulated from the scans
with a virtual camera. Recently, Yan et al. [90] published
the BodyFit dataset comprising over 4K body scans from
which body measurements are computed, and silhouettes
are simulated. They also present a small collection of pho-
tographs and tape measurements of 194 subjects. To resolve
scale, they assume a fixed camera distance. Our BodyM is
the first large-scale dataset comprising body measurements
paired with silhouettes obtained by applying semantic seg-
mentation on real photographs. To resolve scale, we store
height and weight (easy to acquire) rather than assume fixed
camera distance (hard to enforce in practice).
3. Method
We use the SMPL model [45] as our basis for adversarial
body simulation. SMPL characterizes the human form
in terms of a finite number of shape parameters β and pose
parameters θ. Shape is modeled as a linear weighted combination
of basis shapes (with weights β) derived from the
CAESAR dataset, while pose is modeled as local 3D rotation
angles θ on 24 skeleton joints. SMPL learns a regressor
M(β, θ) for generating an articulated body mesh of 6890
vertices from specified shape and pose using blend shapes.
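The shape component above can be sketched as a simple linear blend. The following is a minimal toy illustration, not the real SMPL model: the template and basis shapes here are random placeholders, whereas SMPL's are learned from CAESAR, and the full model additionally applies pose-dependent blend shapes and skinning.

```python
import numpy as np

# Toy sketch of SMPL-style linear shape blending:
#   vertices = T + sum_i beta_i * S_i
# where T is the mean template and S_i are basis shapes.
# Dimensions follow the text (6890 vertices); the data is random.
rng = np.random.default_rng(0)
n_vertices, n_betas = 6890, 10

template = rng.standard_normal((n_vertices, 3))            # mean shape T
shapedirs = rng.standard_normal((n_vertices, 3, n_betas))  # basis shapes S_i
beta = rng.standard_normal(n_betas)                        # shape weights β

# Linear weighted combination of basis shapes added to the template.
shaped_vertices = template + shapedirs @ beta              # (6890, 3)
print(shaped_vertices.shape)  # (6890, 3)
```

Because the blend is linear in β, the mapping from shape parameters to vertices is differentiable, which is what later enables gradient-based search over the latent shape space.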
3.1. Body Measurement Estimation Network
BMnet takes as input either single- or multi-view silhouette
masks. For single-view input, only a frontal segmentation
mask is used. For multi-view input, the model also leverages the
lateral silhouette, which provides crucial cues for accurate
measurement in the chest and waist areas. Additionally, we
use height and weight as input metadata. Height removes
the ambiguity in scale when predicting measurements from
subjects with variable distance to the camera, while weight
provides important cues for body size and shape. Our multi-
view measurement estimation network can be written as:
y = f_ψ(x_f, x_l, ξ, ω),    (1)

where x_f and x_l are respectively the frontal and lateral silhouettes,
(ξ, ω) are the height and weight of the subject,
and ψ represents the network weights.
The network architecture comprises an MNASNet backbone
[82] with a depth multiplier of 1 to extract features
from the silhouettes. Each silhouette is of size 640×480,
and the two views are concatenated spatially to form a
640×960 image. Constant-valued images of the same size
representing height and weight are then concatenated depth-wise
with the silhouettes to produce an input tensor of dimension
3×640×960 for the network. The resulting feature
maps from MNASNet are fed into an MLP comprising a
hidden layer of 128 neurons and 14 outputs corresponding
to body measurements. Unlike previous approaches that at-
tempt the highly ambiguous problem of predicting a high-
dimensional body mesh and then subsequently computing
the measurements from the mesh [19], we directly regress
measurements, thus requiring a simpler architecture and ob-
viating the need for storing 3D body mesh ground truth.
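The input assembly described above can be sketched as follows. This is a minimal illustration of the tensor layout only, assuming binary masks; the silhouettes here are synthetic placeholders rather than real segmentation outputs, and the metadata values are arbitrary examples.

```python
import numpy as np

# Sketch of BMnet's input tensor assembly (shapes taken from the text).
frontal = np.zeros((640, 480), dtype=np.float32)  # frontal silhouette x_f
lateral = np.zeros((640, 480), dtype=np.float32)  # lateral silhouette x_l
height_m, weight_kg = 1.75, 70.0                  # metadata (ξ, ω), example values

# Concatenate the two views spatially into a single 640x960 image.
silhouettes = np.concatenate([frontal, lateral], axis=1)   # (640, 960)

# Constant-valued planes for height and weight, stacked depth-wise
# with the silhouette image to form the 3-channel network input.
height_plane = np.full_like(silhouettes, height_m)
weight_plane = np.full_like(silhouettes, weight_kg)
x = np.stack([silhouettes, height_plane, weight_plane], axis=0)
print(x.shape)  # (3, 640, 960)
```

Encoding the scalar metadata as constant image planes lets a standard convolutional backbone consume height and weight alongside the silhouettes without architectural changes.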
3.2. Adversarial Body Simulator
We present an adversarial body simulator (ABS) that
searches the latent shape space of the SMPL model in order