BlanketGen - A synthetic blanket occlusion
augmentation pipeline for MoCap datasets
João Carmona∗†, Tamás Karácsony∗†‡, João Paulo Silva Cunha∗†, Senior Member, IEEE
∗Center for Biomedical Engineering Research, INESC TEC, Porto, Portugal
†Faculty of Engineering (FEUP), University of Porto, Porto, Portugal
‡Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Abstract—Human motion analysis has seen drastic improvements
recently; however, due to the lack of representative datasets,
it still lags behind in clinical in-bed scenarios.
To address this issue, we implemented BlanketGen, a
pipeline that augments videos with synthetic blanket occlu-
sions. With this pipeline, we generated an augmented ver-
sion of the pose estimation dataset 3DPW called BlanketGen-
3DPW. We then used this new dataset to fine-tune a Deep
Learning model to improve its performance in these sce-
narios with promising results. Code and further informa-
tion are available at https://gitlab.inesctec.pt/brain-lab/brain-
lab-public/blanket-gen-releases.
Index Terms—Human pose estimation, Motion capture, Syn-
thetic occlusions, Cloth simulation, Deep learning, Dataset
augmentation
I. INTRODUCTION
Human motion analysis is an active research area that
has recently advanced rapidly through the use of
Deep Learning (DL) [1]. As a result, it has been adopted in other
areas of research as a tool to help solve complex problems
related to the human body.
One such area is the semiology of epileptic seizures:
accurate and effective diagnosis and classification of epilepsy
requires visiting an Epilepsy Monitoring Unit (EMU), where
the patients are monitored with video-electroencephalogram
(video-EEG) systems; the outputs of these systems during
epileptic seizures are then subjectively analyzed by epileptologists.
Our group has been exploring automatic human
motion analysis to aid epileptologists with quantitative results
[2]–[4]; however, acquiring the clinical data this requires
is challenging [5].
Human Pose Estimation (HPE) is a task within the broader
concept of human motion analysis that focuses exclusively
on the position of the subjects. Different approaches have
been studied to tackle the task of HPE, but for the specific
use case of seizure semiology in EMUs, video-based systems
are the most promising solution.
However, most of the research effort invested in video-based
HPE has focused on the most common scenario
of subjects standing up and moving with few occlusions
[6], whereas the scenario considered in this paper has the
subjects lying down in beds, usually at least partially covered
by blankets, and recorded with a fixed camera. The
blanket occlusions are of particular concern, since accurately
estimating occluded joints is especially difficult. Current
state-of-the-art systems generally rely on DL, which
depends heavily on its training datasets, and
blanket occlusions are rarely present in those datasets [6].
This work is financed by National Funds through the Portuguese funding
agency, FCT - Fundação para a Ciência e a Tecnologia, within project
LA/P/0063/2020 as well as under the scope of the CMU Portugal (Ref
PRT/BD/152202/2021).
In order to allow DL HPE systems to make use of the infor-
mation hidden in blanket occlusions, we propose BlanketGen,
a pipeline to augment a dataset with computer-generated
(CG) blanket occlusions. We used this pipeline to generate
BlanketGen-3DPW, a version of 3DPW [7] augmented with
CG blanket occlusions. We then used this new dataset to fine-
tune a DL HPE model to improve its performance in these
scenarios with promising results.
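As a rough illustration of what overlaying a computer-generated occluder on a video frame involves, the following minimal sketch alpha-composites a pre-rendered occluder layer over a frame. It is not the BlanketGen implementation: the cloth simulation and rendering steps are omitted, and the function name `composite_occluder`, the array layouts, and the use of a per-pixel alpha mask are all assumptions made for this example.

```python
import numpy as np


def composite_occluder(frame, occluder_rgb, occluder_alpha):
    """Alpha-composite a rendered occluder layer over a video frame.

    Hypothetical helper for illustration only.
    frame, occluder_rgb: (H, W, 3) uint8 images.
    occluder_alpha: (H, W) float mask in [0, 1]; 1 means fully occluded.
    """
    # Add a channel axis so the mask broadcasts over the RGB channels.
    alpha = occluder_alpha[..., None].astype(np.float32)
    blended = (alpha * occluder_rgb.astype(np.float32)
               + (1.0 - alpha) * frame.astype(np.float32))
    # Round and clip back to the valid 8-bit range.
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

Applying such a function to every frame of a sequence, with an occluder layer rendered from the same camera viewpoint, would leave the original pose annotations valid while changing only the pixels the model sees.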
II. RELATED WORKS
Older approaches to video-based HPE used traditional
computer vision techniques, such as the system proposed
in [3] for clinical in-bed HPE, which uses optical flow to
automatically track manually selected masks that surround
the body parts of interest. However, more recently the focus
of general HPE research has shifted to improving DL HPE
systems.
In [8], the SMPL human body model was proposed, which uses
principal component analysis to describe variations in body
shape with as few parameters as possible. This approach
proved extremely robust and efficient, so SMPL
became a standard in HPE. Subsequently, [9] used 2D joint positions
acquired with the method proposed in [10] to optimize a
SMPL mesh; in [11] inverse kinematics were used to fuse
3D joint positions and SMPL parameters that were estimated
by separate systems; [12] employed online unsupervised
learning to adapt to data in new domains even without
ground truth annotations; [13] used DL to interpolate pose
estimations between keyframes which allowed for state-of-
the-art results even while only analyzing one-tenth of the
frames. Due to these and other developments, DL HPE
systems are now exceptional in scenarios where there are
few occlusions. However, their performance drops drastically
when blanket occlusions are present.
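The idea of describing body-shape variation with a few principal components can be sketched as follows. This is a toy illustration on arbitrary vertex arrays, not the actual SMPL model or its parameters; the function names `fit_shape_space`, `encode`, and `decode` are hypothetical, and pose-dependent deformation is ignored entirely.

```python
import numpy as np


def fit_shape_space(meshes, n_components):
    """Fit a linear shape space with PCA (illustrative sketch only).

    meshes: (n_subjects, n_vertices, 3) array of registered meshes.
    Returns the flattened mean shape and the top principal directions.
    """
    flat = meshes.reshape(len(meshes), -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(mesh, mean, basis):
    """Project one mesh onto the basis to get its shape coefficients."""
    return basis @ (mesh.reshape(-1) - mean)


def decode(betas, mean, basis):
    """Rebuild a mesh from shape coefficients: mean + sum_i betas_i * basis_i."""
    return (mean + betas @ basis).reshape(-1, 3)
```

With this kind of representation, a body shape is summarized by a short coefficient vector rather than by thousands of vertex coordinates, which is what makes such models convenient regression targets for DL systems.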
In order to improve DL systems in such scenarios, [14]
experimented with different approaches to improve HPE
systems in clinical scenarios using depth video, among them
arXiv:2210.12035v2 [cs.CV] 19 Mar 2023