Skeleton2Humanoid: Animating Simulated Characters for
Physically-plausible Motion In-betweening
Yunhao Li
Institute of Image Communication
and Network Engineering
Shanghai Jiao Tong University
Shanghai, China
lyhsjtu@sjtu.edu.cn
Zhenbo Yu
Shanghai Jiao Tong University
Shanghai, China
yuzhenbo@sjtu.edu.cn
Yucheng Zhu
Institute of Image Communication
and Network Engineering
Shanghai Jiao Tong University
Shanghai, China
zyc420@sjtu.edu.cn
Bingbing Ni
Shanghai Jiao Tong University
Shanghai, China
nibingbing@sjtu.edu.cn
Guangtao Zhai
Institute of Image Communication
and Network Engineering
Shanghai Jiao Tong University
Shanghai, China
zhaiguangtao@sjtu.edu.cn
Wei Shen
MoE Key Lab of Articial Intelligence,
AI Institute
Shanghai Jiao Tong University
Shanghai, China
wei.shen@sjtu.edu.cn
Figure 1: Our Skeleton2Humanoid system can directly synthesize a complete humanoid character transition motion in a
physics simulator (Bottom) given past keyframes and a future keyframe (Top). Our system can produce both accurate and
physically-plausible character motions.
ABSTRACT
Human motion synthesis is a long-standing problem with various applications in digital twins and the Metaverse. However, modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions, and consequently they usually produce unrealistic human motions. To solve this problem, we propose a system, “Skeleton2Humanoid”, which performs physics-oriented motion correction at test time by regularizing synthesized skeleton motions in a physics simulator. Concretely, our system consists of three sequential stages: (I) test time motion synthesis network adaptation, (II) skeleton to humanoid matching and (III) motion imitation based on reinforcement learning (RL). Stage I introduces a test time adaptation strategy, which improves the physical plausibility of synthesized human skeleton motions by optimizing skeleton joint locations. Stage II performs an analytical inverse kinematics strategy, which converts the optimized human skeleton motions to humanoid robot motions in a physics simulator; the converted humanoid robot motions then serve as reference motions for the RL policy to imitate. Stage III introduces a curriculum residual force control policy, which drives the humanoid robot to mimic complex converted reference motions in accordance with physical laws. We verify our system on a typical human motion synthesis task, motion in-betweening. Experiments on the challenging LaFAN1 dataset show that our system can significantly outperform prior methods in terms of both physical plausibility and accuracy. Code will be released for research purposes at: https://github.com/michaelliyunhao/Skeleton2Humanoid.
Equal contribution.
Corresponding author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MM ’22, October 10–14, 2022, Lisboa, Portugal
©2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9203-7/22/10. . . $15.00
https://doi.org/10.1145/3503161.3548093
arXiv:2210.04294v1 [cs.CV] 9 Oct 2022
CCS CONCEPTS
• Computing methodologies → Motion capture.
KEYWORDS
3D motion in-betweening; inverse kinematics; reinforcement learning; 3D animation
ACM Reference Format:
Yunhao Li, Zhenbo Yu, Yucheng Zhu, Bingbing Ni, Guangtao Zhai, and Wei
Shen. 2022. Skeleton2Humanoid: Animating Simulated Characters for Physically-
plausible Motion In-betweening. In Proceedings of the 30th ACM International
Conference on Multimedia (MM ’22), October 10–14, 2022, Lisboa, Portugal.
ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3503161.3548093
1 INTRODUCTION
Synthesizing both accurate and realistic virtual human motions has been a widely explored but challenging task in computer vision and graphics [48, 49], with various applications in digital twins and the Metaverse. Recently, deep learning has shown a way to generate accurate human motions and has been applied to various motion synthesis tasks, such as human motion prediction [2–4, 11, 16, 17, 53–55, 60], human motion completion [31, 58, 59] and human motion in-betweening [1, 47, 56, 57]. Although these methods synthesize accurate human body motions with small skeleton joint errors compared with ground truth motions, they do not model motion under physical laws. Consequently, the synthesized motions are usually physically implausible. For example, the synthesized feet often penetrate the ground, body joints rotate to impossible angles, whole-body motions are not smooth, and the feet slide back and forth when they should remain static on the ground. These artifacts significantly limit the application of motion synthesis to virtual human animation and the emerging Metaverse, because they easily make the motions look unrealistic to human viewers.
Utilizing humanoid characters in a physics simulator to optimize motions is a promising solution, because the physics simulator can guarantee the physical plausibility of the generated motions. Prior works [39, 40, 52] utilized reinforcement learning (RL) to actuate a humanoid character to imitate various reference mocap data for creating physical character animation. Inspired by them, recent works [8, 29] also attempted to utilize RL to imitate motions synthesized by deep neural networks, in the format of skeletons or SMPL [9] models, aiming to produce physically-plausible motions for 3D pose estimation. However, these methods are only validated on simple motions such as walking and talking in the Human3.6M dataset and do not generalize well to complex or irregular motions. In addition, RL-based imitation requires transferring synthesized human skeleton motions to humanoid motions, where the humanoid character must be carefully designed to exactly match the human skeleton in terms of both shape and kinematic tree. This prevents RL-based imitation from transferring motions between skeletons and humanoids with different shapes and kinematic trees.
To address these issues, we propose Skeleton2Humanoid, a novel system that improves the physical plausibility of the motions synthesized by motion synthesis networks through the transfer from human skeleton motions to humanoid character motions. Our Skeleton2Humanoid system consists of three sequential stages:
(I) Test Time Motion Synthesis Network Adaptation: We adapt the motion synthesis network with a few gradient steps on the test data using two new self-supervised losses, a foot contact consistency loss and a motion smoothness loss, which improve the physical plausibility of the predicted motions.
(II) Skeleton to Humanoid Matching: We match the synthesized human skeleton motions to humanoid character motions with a novel, general analytical inverse kinematics method. Inverse kinematics can convert human skeleton motions to humanoid motions even when the humanoid's body structure differs from the human skeleton.
(III) Motion Imitation based on RL: Finally, we animate the humanoid character to imitate various synthesized motions. Specifically, building on recent work [26, 29], we propose a curriculum residual force control policy (CRP) for the humanoid by introducing a curriculum learning paradigm that dynamically adjusts a residual force scale during RL training, which improves asymptotic RL performance when imitating various synthesized motions.
To verify the effectiveness of our Skeleton2Humanoid system, we select the motion in-betweening task for evaluation, as it is a recently proposed, challenging motion prediction task [1, 47]. Motion in-betweening aims at predicting the transition motions between the given past keyframes and a provided future keyframe. Experiments on the challenging LaFAN1 dataset show the superiority of our Skeleton2Humanoid system.
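The exact definitions of the two Stage I losses are not given in this part of the paper. As a minimal sketch, assuming joint positions are stored per frame as (x, y, z) tuples, the smoothness loss below is the mean squared second-order finite difference (joint acceleration), and the foot contact consistency loss penalizes foot velocity weighted by the predicted contact probability. Both function names and both formulations are hypothetical illustrations, not the authors' losses.

```python
# Hypothetical sketches of the two self-supervised Stage I losses (assumed formulations).

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _sqnorm(v):
    return sum(x * x for x in v)

def motion_smoothness_loss(positions):
    """positions[t][j] = (x, y, z) of joint j at frame t.
    Assumed form: mean squared second-order finite difference (joint acceleration)."""
    terms = []
    for t in range(1, len(positions) - 1):
        for j in range(len(positions[t])):
            acc = _sub(_sub(positions[t + 1][j], positions[t][j]),
                       _sub(positions[t][j], positions[t - 1][j]))
            terms.append(_sqnorm(acc))
    return sum(terms) / len(terms) if terms else 0.0

def foot_contact_consistency_loss(positions, contact, foot_joints):
    """contact[t][k] in [0, 1]: predicted contact probability of foot_joints[k] at frame t.
    Assumed form: a foot predicted to touch the ground should not move, so squared
    foot velocity is penalized in proportion to the predicted contact probability."""
    total, count = 0.0, 0
    for t in range(1, len(positions)):
        for k, j in enumerate(foot_joints):
            vel = _sub(positions[t][j], positions[t - 1][j])
            total += contact[t][k] * _sqnorm(vel)
            count += 1
    return total / count if count else 0.0
```

In a test time adaptation loop, a weighted sum of such losses would be minimized for a few gradient steps with respect to the network weights; this pure-Python version only evaluates the losses.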
The main contributions of this paper are as follows:
(1) We present Skeleton2Humanoid, a new system that converts human skeleton motions to humanoid character motions to produce physically plausible motions.
(2) Our proposed test time adaptation stage further improves the prediction accuracy and physical plausibility on the large mocap dataset LaFAN1 for the motion in-betweening task. With test time adaptation, we set a new accuracy benchmark on the motion in-betweening task.
(3) Our proposed curriculum residual force control policy enables finer character control and outperforms prior arts on motion imitation.
(4) Our whole Skeleton2Humanoid system significantly improves the physical plausibility of human in-betweening motions and achieves comparable motion prediction accuracy.
2 RELATED WORK
Human/character motion synthesis: Motion synthesis is a general term covering several tasks, including motion prediction, in-betweening and completion. Motion prediction aims at predicting future human motions given past motions. Deterministic motion prediction estimates a single accurate motion, and prior works used various network architectures, including recurrent neural networks [2–4], graph convolutional networks [61] and transformers [16], to model human motions. Stochastic motion prediction produces diverse future human motions by utilizing generative models such as VAEs [6, 17, 55, 66] and GANs [12, 14, 65]. Motion completion and in-betweening aim at filling gaps in motion under predefined keyframe constraints. Existing works utilized convolutional networks [31, 57, 59, 62], recurrent networks [1, 63] or transformers [47] to synthesize accurate and consistent results. For instance, Harvey et al. [1] proposed a transition generation technique based on recurrent neural networks for the motion in-betweening task. Duan et al. [47] utilized a transformer architecture to model human motions in a sequence-to-sequence manner for the motion in-betweening task.
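To make "filling gaps in motion under keyframe constraints" concrete, the sketch below is a naive interpolation baseline, not any of the cited learned methods: the root translation is linearly interpolated and each joint quaternion is slerped between the last past frame (t = 0) and the target keyframe (t = T). It assumes unit quaternions in (w, x, y, z) order; `slerp` and `interpolate_transition` are hypothetical names.

```python
import math

def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # q and -q are the same rotation: take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:  # nearly identical rotations: normalized lerp avoids division by ~0
        q = [a + u * (b - a) for a, b in zip(q0, q1)]
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
    s1 = math.sin(u * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

def interpolate_transition(last_q, last_r, key_q, key_r, T):
    """Hypothetical baseline: fill frames 1..T-1 between frame 0 and keyframe T.
    last_q / key_q: per-joint unit quaternions; last_r / key_r: root translations."""
    frames = []
    for t in range(1, T):
        u = t / T
        q_t = [slerp(a, b, u) for a, b in zip(last_q, key_q)]
        r_t = tuple(a + u * (b - a) for a, b in zip(last_r, key_r))
        frames.append((q_t, r_t))
    return frames
```

Such a baseline ignores all but the last of the past keyframes and has no notion of contacts or dynamics, which is why learned in-betweening networks are used instead.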
Figure 2: An overview of our Skeleton2Humanoid system on the motion in-betweening task. Given the test data containing past keyframes $m_{-9:0}$ and a future keyframe $m_T$, Stage I optimizes skeleton joint locations by test time adaptation and produces more plausible skeleton motions $\hat{m}_{1:T-1}$. Stage II converts the optimized skeleton motions $\hat{m}_{1:T-1}$ to humanoid motions $\tilde{m}_{1:T-1}$ in the physics simulator by analytical inverse kinematics. Stage III finally drives the humanoid to mimic the converted skeleton motions $\tilde{m}_{1:T-1}$ to produce physically-plausible humanoid motions $\bar{m}_{1:T-1}$.
Test Time Adaptation: Test time adaptation is a recently proposed method that utilizes self-supervised distribution information from the test data presented at test time to quickly adapt models with a few gradient steps [32, 33, 36], which can further improve model performance on the test data. The first work [32] introduced test time adaptation by proposing an auxiliary branch with a self-supervised rotation prediction loss to adapt the classification model. Wang et al. [36] minimized the entropy of the classification model's predictions on test data to improve performance. Recently, more works have started to apply test time adaptation to 2D/3D human pose related tasks [34, 35]. For instance, Guan et al. [35] proposed an online bilevel adaptation framework for 3D human mesh reconstruction which greatly improves model generalization. In contrast to other works, our approach is the first to study test time adaptation on the human motion in-betweening task.
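The entropy-minimization idea attributed to Wang et al. [36] can be illustrated with a deliberately simplified sketch: instead of adapting normalization parameters inside a network, it adapts only a hypothetical additive bias on precomputed logits, by gradient descent on the mean prediction entropy of an unlabeled test batch. It uses the identity dH/dz_j = -p_j (log p_j + H) for H = entropy(softmax(z)). All names, the step count and the learning rate are assumptions.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def entropy_grad_wrt_logits(z):
    """Gradient of H(softmax(z)) w.r.t. z: dH/dz_j = -p_j * (log p_j + H)."""
    p = softmax(z)
    h = entropy(p)
    return [-pj * (math.log(max(pj, 1e-300)) + h) for pj in p]

def mean_entropy(batch, bias):
    return sum(entropy(softmax([x + b for x, b in zip(z, bias)])) for z in batch) / len(batch)

def adapt_bias_by_entropy(batch, steps=20, lr=0.1):
    """Few-step test time adaptation (simplified, hypothetical): gradient descent on the
    mean entropy of an unlabeled test batch, updating only an additive logit bias."""
    k = len(batch[0])
    bias = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for z in batch:
            g = entropy_grad_wrt_logits([x + b for x, b in zip(z, bias)])
            grad = [acc + gj / len(batch) for acc, gj in zip(grad, g)]
        bias = [b - lr * gj for b, gj in zip(bias, grad)]
    return bias
```

No labels are used anywhere: the only signal is the model's own prediction confidence on the test batch, which is what distinguishes test time adaptation from fine-tuning.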
Reinforcement Learning for Humanoid Character Control: Deep RL is a promising approach for learning character control policies [37–41] that help characters perform various motions. Peng et al. [39] first used hand-crafted rewards to imitate a single sequence of human poses. Recently, some works [8, 29, 42, 43, 51] used RL to produce simple human motions from egocentric videos for ego-pose estimation or 3D human pose estimation. Yuan et al. [26] proposed adding external residual forces to help characters better imitate agile single reference motions. In addition, some works [44–46] utilized deep RL to learn interactive controllable policies from large motion capture data for character animation. However, prior works mostly focused on learning control policies on motion capture data, while we learn a policy to imitate synthesized motions. We propose a curriculum residual force control policy (CRP) that can better imitate diverse motions.
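The curriculum schedule of CRP is not specified in this excerpt. The sketch below assumes one plausible design: residual forces start at full scale, and the scale is multiplied by a decay factor (down to a floor) whenever the mean imitation reward over a window of recent episodes reaches a threshold, so the policy relies less on non-physical assistance as it improves. The class name, every hyperparameter and the direction of the schedule are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CurriculumResidualScale:
    """Hypothetical curriculum for the residual force scale (assumed schedule)."""
    scale: float = 1.0             # start with full residual-force assistance (assumption)
    min_scale: float = 0.1         # floor on the scale (assumption)
    decay: float = 0.9             # multiplicative decay per curriculum step (assumption)
    reward_threshold: float = 0.8  # mean imitation reward needed to advance (assumption)
    window: int = 5                # number of recent episodes to average
    _recent: deque = field(default_factory=deque, repr=False)

    def update(self, episode_reward: float) -> float:
        """Record one episode's imitation reward and return the (possibly reduced) scale."""
        self._recent.append(episode_reward)
        if len(self._recent) > self.window:
            self._recent.popleft()
        if (len(self._recent) == self.window
                and sum(self._recent) / self.window >= self.reward_threshold):
            self.scale = max(self.min_scale, self.scale * self.decay)
            self._recent.clear()  # require a fresh window at the new difficulty
        return self.scale

    def residual_force(self, raw_action):
        """Scale the policy's raw residual-force outputs, each clipped to [-1, 1]."""
        return [self.scale * max(-1.0, min(1.0, a)) for a in raw_action]
```

In a simulator, the scaled residual forces would be applied as external forces on the humanoid's bodies in addition to the joint actuation; how that is wired up depends on the simulator and is outside this sketch.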
3 APPROACH
3.1 System Overview
The human motion in-betweening task can be formulated as follows: given the past 10 human skeleton poses $m_{-9:0}$ and a future skeleton keyframe $m_T$ at time $T$, we want to recover the ground truth transition motions $m_{1:T-1}$. Given a pretrained typical motion in-betweening network [1], our Skeleton2Humanoid performs a physics-oriented motion correction consisting of 3 stages, as presented in Fig. 2, to optimize the synthesized in-betweening motions. Stage I optimizes the pretrained motion in-betweening network at test time to predict more physically-plausible skeleton transition motions $\hat{m}_{1:T-1}$. Then Stage II transfers the optimized skeleton motions $\hat{m}_{1:T-1}$ to humanoid motions $\tilde{m}_{1:T-1}$ through analytical inverse kinematics. Finally, Stage III learns a curriculum residual force control policy to imitate the transferred humanoid motions $\tilde{m}_{1:T-1}$ and produce physically-plausible humanoid motions $\bar{m}_{1:T-1}$.
Figure 3: Details of our test time adaptation method. $\hat{q}$, $\widehat{contact}$, FK and $\hat{p}$ represent the predicted root positions, the contact prediction of feet joints, the forward kinematic process and the 3D joint positions of the human skeleton, respectively.
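The analytical inverse kinematics of Stage II is not detailed in this excerpt. One common building block for position-based retargeting, shown here as a minimal sketch and not as the authors' solver, is the shortest-arc (swing-only) rotation that aligns a bone's rest direction with its target direction, followed by conversion to Euler angles for a humanoid that uses Euler-angle joints. The ZYX convention (R = Rz(yaw) Ry(pitch) Rx(roll)) and the function names are assumptions; a full solver would also express target directions in the parent frame, resolve twist about the bone axis and respect joint limits.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def shortest_arc_quat(rest_dir, target_dir):
    """Unit quaternion (w, x, y, z) rotating rest_dir onto target_dir (swing only, no twist)."""
    a, b = _normalize(rest_dir), _normalize(target_dir)
    dot = sum(x * y for x, y in zip(a, b))
    if dot < -0.999999:  # opposite directions: 180 degrees about any axis orthogonal to a
        axis = _cross(a, (1.0, 0.0, 0.0))
        if sum(c * c for c in axis) < 1e-12:
            axis = _cross(a, (0.0, 1.0, 0.0))
        return (0.0, *_normalize(axis))
    return _normalize((1.0 + dot, *_cross(a, b)))

def quat_to_euler_zyx(q):
    """(roll, pitch, yaw) in radians such that R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    w, x, y, z = q
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return roll, pitch, yaw
```

For example, aligning a bone that points along +x at rest with a target along +y yields a pure 90-degree yaw, roll and pitch being zero.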
In our Skeleton2Humanoid framework, $m_t$ and $\hat{m}_t$ are skeleton motions, and $m_t$ is represented by $m_t(q_t, r_t, p_t)$, where $q_t$ and $r_t$ denote the body joint angles in quaternions and the root translation, and $p_t$ denotes the 3D joint positions calculated by forward kinematics. Similarly, $\hat{m}_t$ is represented by $\hat{m}_t(\hat{q}_t, \hat{r}_t, \hat{p}_t)$. In addition, $\tilde{m}_t$ and $\bar{m}_t$ are humanoid motions: $\tilde{m}_t$ is represented by $\tilde{m}_t(\tilde{q}_t, \tilde{r}_t, \tilde{p}_t)$, where $\tilde{q}_t$, $\tilde{r}_t$ and $\tilde{p}_t$ denote the joint angles in Euler angles, the root translation and the 3D joint positions of the reference humanoid motions. Similarly, $\bar{m}_t$ is represented by $\bar{m}_t(\bar{q}_t, \bar{r}_t, \bar{p}_t)$.
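The text states only that $p_t$ is calculated from $q_t$ and $r_t$ by forward kinematics. A minimal sketch, assuming a parent-indexed skeleton whose parents are listed before their children, per-joint bone offsets expressed in the parent's frame, and unit quaternions in (w, x, y, z) order; `forward_kinematics` and its arguments are hypothetical names, not the authors' code.

```python
def quat_mul(a, b):
    """Hamilton product of quaternions a * b, both (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def quat_rotate(q, v):
    """Rotate 3D vector v by unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    p = quat_mul(quat_mul(q, (0.0, *v)), (w, -x, -y, -z))
    return p[1:]

def forward_kinematics(local_q, root_r, parents, offsets):
    """local_q[j]: rotation of joint j relative to its parent (q_t).
    root_r: root translation (r_t). parents[j]: parent index, -1 for the root,
    with parents[j] < j. offsets[j]: bone offset of joint j in its parent's frame.
    Returns the global joint positions (p_t)."""
    n = len(parents)
    glob_q, glob_p = [None] * n, [None] * n
    for j in range(n):
        par = parents[j]
        if par < 0:  # root joint: placed at the root translation
            glob_q[j], glob_p[j] = local_q[j], tuple(root_r)
        else:  # accumulate rotation down the chain, then offset from the parent
            glob_q[j] = quat_mul(glob_q[par], local_q[j])
            off = quat_rotate(glob_q[par], offsets[j])
            glob_p[j] = tuple(a + b for a, b in zip(glob_p[par], off))
    return glob_p
```

Because every operation here is differentiable, the same computation written in an autodiff framework lets losses on $\hat{p}_t$ be backpropagated to the predicted rotations, which is the role FK plays in Figure 3.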
3.2 Test Time Motion In-betweening Network Adaptation
3.2.1 Adaptation for Physically-plausible Skeleton Motion. The previous human motion in-betweening model [1] has shown great performance on synthesizing accurate human motions. However, it