Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening

Yunhao Li∗

Institute of Image Communication

and Network Engineering

Shanghai Jiao Tong University

Shanghai, China

lyhsjtu@sjtu.edu.cn

Zhenbo Yu∗

Shanghai Jiao Tong University

Shanghai, China

yuzhenbo@sjtu.edu.cn

Yucheng Zhu

Institute of Image Communication

and Network Engineering

Shanghai Jiao Tong University

Shanghai, China

zyc420@sjtu.edu.cn

Bingbing Ni

Shanghai Jiao Tong University

Shanghai, China

nibingbing@sjtu.edu.cn

Guangtao Zhai†

Institute of Image Communication

and Network Engineering

Shanghai Jiao Tong University

Shanghai, China

zhaiguangtao@sjtu.edu.cn

Wei Shen†

MoE Key Lab of Artificial Intelligence,

AI Institute

Shanghai Jiao Tong University

Shanghai, China

wei.shen@sjtu.edu.cn

Figure 1: Our Skeleton2Humanoid system can directly synthesize a complete humanoid character transition motion in a physics simulator (Bottom) given past keyframes and a future keyframe (Top). Our system can produce both accurate and physically-plausible character motions.

ABSTRACT

Human motion synthesis is a long-standing problem with various applications in digital twins and the Metaverse. However, modern deep learning based motion synthesis approaches barely consider the physical plausibility of synthesized motions and consequently they usually produce unrealistic human motions. To solve this problem, we propose a system, “Skeleton2Humanoid”, which performs physics-oriented motion correction at test time by regularizing synthesized skeleton motions in a physics simulator. Concretely, our system consists of three sequential stages: (I) test time motion synthesis network adaptation, (II) skeleton to humanoid matching and (III) motion imitation based on reinforcement learning (RL). Stage I introduces a test time adaptation strategy, which improves the physical plausibility of synthesized human skeleton motions by optimizing skeleton joint locations. Stage II performs an analytical inverse kinematics strategy, which converts the optimized human skeleton motions to humanoid robot motions in a physics simulator; the converted humanoid robot motions then serve as reference motions for the RL policy to imitate. Stage III introduces a curriculum residual force control policy, which drives the humanoid robot to mimic complex converted reference motions in accordance with physical laws. We verify our system on a typical human motion synthesis task, motion in-betweening. Experiments on the challenging LaFAN1 dataset show that our system outperforms prior methods significantly in terms of both physical plausibility and accuracy. Code will be released for research purposes at: https://github.com/michaelliyunhao/Skeleton2Humanoid.

∗Equal contribution.

†Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

MM ’22, October 10–14, 2022, Lisboa, Portugal

ACM ISBN 978-1-4503-9203-7/22/10. . . $15.00

https://doi.org/10.1145/3503161.3548093

arXiv:2210.04294v1 [cs.CV] 9 Oct 2022

CCS CONCEPTS

• Computing methodologies → Motion capture.

KEYWORDS

3D motion in-betweening; inverse kinematics; reinforcement learning; 3D animation

ACM Reference Format:

Yunhao Li, Zhenbo Yu, Yucheng Zhu, Bingbing Ni, Guangtao Zhai, and Wei Shen. 2022. Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), October 10–14, 2022, Lisboa, Portugal. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3503161.3548093

1 INTRODUCTION

Synthesizing both accurate and realistic virtual human motions has been a widely explored but challenging task in computer vision and graphics [ ] with various applications in digital twins and the Metaverse. Recently, deep learning has made it possible to generate accurate human motions and has been applied to various motion synthesis tasks, such as human motion prediction [ – ], human motion completion [ ] and human motion in-betweening [ ]. Although these methods synthesize accurate human body motions with small skeleton joint errors compared with ground truth motions, they fail to model the motions under the laws of physics. Consequently, the synthesized motions are usually physically implausible. For example, the synthesized feet often penetrate the ground, the body joints rotate to impossible angles, the whole-body motions are not smooth, and the synthesized feet slide back and forth when they should be static and touching the ground. These artifacts significantly limit the application of motion synthesis to virtual human animation and the upcoming Metaverse because they make the motions look unrealistic to human viewers.

Utilizing humanoid characters in a physics simulator to optimize motions is a promising solution because the physics simulator can guarantee the physical plausibility of the generated motions. Prior works [ ] utilized reinforcement learning (RL) to actuate a humanoid character to imitate various reference mocap data for creating physical character animation. Inspired by them, recent works [ ] also attempted to utilize RL to imitate motions synthesized by deep neural networks, in the format of skeletons or SMPL [ ] models, aiming at producing physically-plausible motions for 3D pose estimation. However, these methods are only validated on simple motions such as walking and talking in the Human3.6M dataset and cannot generalize well to complex or irregular motions. In addition, RL based imitation requires transferring synthesized human skeleton motions to humanoid motions, where a humanoid character must be carefully designed to exactly match the human skeleton in terms of both shape and kinematic tree. This prevents RL based imitation from transferring motions between a skeleton and a humanoid with different shapes and kinematic trees.

To address these issues, we propose Skeleton2Humanoid, a novel system that improves the physical plausibility of motions synthesized by motion synthesis networks through the transfer from human skeleton motions to humanoid character motions. Our Skeleton2Humanoid system consists of three sequential stages: (I) Test Time Motion Synthesis Network Adaptation: We adapt the motion synthesis network with a few gradient steps on the test data using two new self-supervised losses, a foot contact consistency loss and a motion smoothness loss, which improve the physical plausibility of the predicted motions. (II) Skeleton to Humanoid Matching: We match the synthesized human skeleton motions to humanoid character motions by a novel general analytical inverse kinematics method. Inverse kinematics is able to convert human skeleton motions to humanoid motions even when the humanoid body structure is different from the human skeleton. (III) Motion Imitation based on RL: Finally, we animate the humanoid character to imitate various synthesized motions. Specifically, based on recent work [ ], we propose a curriculum residual force control policy (CRP) for humanoid control by introducing a curriculum learning paradigm that dynamically adjusts a residual force scale during RL training, which improves asymptotic RL performance on imitating various synthesized motions. To verify the effectiveness of our Skeleton2Humanoid system, we select the “motion in-betweening” task for evaluation, as it is a recently proposed and challenging motion prediction task [ ]. Motion in-betweening aims at predicting the transition motions between the given past keyframes and a provided future keyframe. Experiments on the challenging LaFAN1 dataset show the superiority of our Skeleton2Humanoid system.

The main contributions of this paper are as follows: (1) We present Skeleton2Humanoid, a new system that converts human skeleton motions to humanoid character motions to produce physically plausible motions. (2) Our proposed test time adaptation stage further improves the prediction accuracy and physical plausibility on the large mocap dataset LaFAN1 for the motion in-betweening task. With test time adaptation, we achieve a new benchmark accuracy on the motion in-betweening task. (3) Our proposed curriculum residual force control policy enables finer character control and outperforms prior art on motion imitation. (4) Our whole Skeleton2Humanoid system significantly improves the physical plausibility of human in-betweening motions and achieves comparable motion prediction accuracy.

2 RELATED WORK

Human/character motion synthesis: Motion synthesis is a general term that covers several tasks including motion prediction, in-betweening and completion. Motion prediction aims at predicting future human motions given past motions. Deterministic motion prediction estimates a single accurate motion, and prior works used various network architectures including recurrent neural networks [ – ], graph convolution networks [ ] or transformers [ ] to model human motions. Stochastic motion prediction produces diverse future human motions by utilizing generative models such as VAEs [ ] and GANs [ ]. Motion completion and in-betweening aim at filling gaps in a motion under predefined keyframe constraints. Current works utilized convolution networks [ ], recurrent networks [ ] or transformers [ ] to synthesize accurate and consistent results. For instance, Harvey et al. [ ] proposed a transition generation technique based on recurrent neural networks for the motion in-betweening task. Duan et al. [ ] utilized a transformer architecture to model human motions in a sequence-to-sequence manner for the motion in-betweening task.

Figure 2: An overview of our Skeleton2Humanoid system on the motion in-betweening task. Given the test data containing past keyframes $m_{-9:0}$ and a future keyframe $m_T$, Stage I optimizes skeleton joint locations by test time adaptation and produces more plausible skeleton motions $\hat{m}_{1:T-1}$. Stage II converts the optimized skeleton motions $\hat{m}_{1:T-1}$ to humanoid motions $\tilde{m}_{1:T-1}$ in the physics simulator by analytical inverse kinematics. Stage III finally drives the humanoid to mimic the converted skeleton motions $\tilde{m}_{1:T-1}$ to produce physically-plausible humanoid motions $\overline{m}_{1:T-1}$.

Test Time Adaptation: Test time adaptation is a recently proposed method that utilizes the self-supervised distribution information from the test data presented at test time to quickly adapt models with a few gradient steps [ ], which can further improve the model performance on test data. The first work [ ] introduced test time adaptation by proposing an auxiliary branch with a self-supervised rotation prediction loss to adapt the classification model. Wang et al. [ ] minimized the prediction entropy of the classification model on test data to improve performance. Recently, more works have started to utilize test time adaptation on 2D/3D human pose related tasks [ ]. For instance, Guan et al. [ ] proposed an online bilevel adaptation framework for 3D human mesh reconstruction which greatly improves model generalization. In contrast to these works, our approach is the first to study test time adaptation on the human motion in-betweening task.
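To make the idea of adapting a model "with a few gradient steps" on unlabeled test data concrete, the following is a minimal sketch in the spirit of the entropy minimization mentioned above. It is an illustration under stated assumptions, not the procedure of any cited work: the helper `adapt_bias`, the choice to adapt only an additive logit bias, the learning rate and the toy logits are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_entropy(probs, eps=1e-12):
    # Average Shannon entropy of the predictions (no labels needed).
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

def adapt_bias(logits, steps=5, lr=0.5):
    """Hypothetical helper: adapt only an additive logit bias on test data."""
    bias = np.zeros(logits.shape[1])
    for _ in range(steps):
        p = softmax(logits + bias)
        log_p = np.log(p + 1e-12)
        h = -(p * log_p).sum(axis=1, keepdims=True)
        # Analytic gradient of the entropy w.r.t. the logits: -p * (log p + H).
        grad_logits = -p * (log_p + h)
        bias -= lr * grad_logits.mean(axis=0)
    return bias

# Toy, made-up test batch of fairly uncertain predictions.
test_logits = np.array([[1.0, 0.8, 0.1],
                        [0.9, 1.1, 0.2],
                        [1.2, 0.7, 0.3]])
entropy_before = mean_entropy(softmax(test_logits))
bias = adapt_bias(test_logits)
entropy_after = mean_entropy(softmax(test_logits + bias))
```

The self-supervised objective here is computed from the test predictions alone, which is the property that lets adaptation run at test time.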

Reinforcement Learning for Humanoid Character Control: Deep RL is a promising approach for learning character control policies [ – ] that help characters perform various motions. Peng et al. [ ] first utilized hand-crafted rewards to imitate a single sequence of human poses. Recently, some works [ ] used RL to produce simple human motions from egocentric videos for ego-pose estimation or 3D human pose estimation. Yuan et al. [ ] proposed adding external residual forces to help characters better imitate agile single reference motions. In addition, some works [ – ] utilized deep RL to learn interactive controllable policies from large motion capture datasets for character animation. However, prior works mostly focused on learning control policies on motion capture data, while we learn a policy to imitate synthesized motions. We propose a curriculum residual force control policy (CRP) that can better imitate diverse motions.
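As a rough illustration of what adjusting a residual force scale over the course of RL training could look like, the sketch below uses a simple linear schedule. The text above does not specify the schedule, so its shape, its direction (decreasing) and its endpoints are assumptions, and the function names `residual_force_scale` and `scaled_residual_force` are hypothetical.

```python
def residual_force_scale(iteration, total_iterations, start=1.0, end=0.1):
    """Hypothetical linear curriculum: interpolate the scale from start to end.

    The direction and shape of the schedule are assumptions for illustration.
    """
    if total_iterations <= 1:
        return end
    frac = min(max(iteration / (total_iterations - 1), 0.0), 1.0)
    return start + frac * (end - start)

def scaled_residual_force(policy_force, iteration, total_iterations):
    # Scale the residual force output by the policy before it is applied.
    scale = residual_force_scale(iteration, total_iterations)
    return [scale * f for f in policy_force]

# Scale at each of five training iterations, rounded for readability.
schedule = [round(residual_force_scale(i, 5), 3) for i in range(5)]
early_force = scaled_residual_force([10.0, 0.0, -5.0], 0, 5)
```

Under this assumed schedule the policy may rely on large residual forces early in training and progressively less later; other schedules would fit the same interface.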

3 APPROACH

3.1 System Overview

The human motion in-betweening task can be formulated as follows: given the past 10 human skeleton poses $m_{-9:0}$ and a future skeleton keyframe $m_T$ at time $T$, we want to recover the ground truth transition motions $m_{1:T-1}$. Given a pretrained typical motion in-betweening network [ ], our Skeleton2Humanoid performs a physics-oriented motion correction consisting of three stages, as presented in Fig. 2, to optimize synthesized in-betweening motions. Stage I optimizes the pretrained motion in-betweening network at test time to predict more physically-plausible skeleton transition motions $\hat{m}_{1:T-1}$. Then Stage II transfers the optimized skeleton motions $\hat{m}_{1:T-1}$ to humanoid motions $\tilde{m}_{1:T-1}$ through analytical inverse kinematics. Finally, Stage III learns a curriculum residual force control policy to imitate the transferred humanoid motions $\tilde{m}_{1:T-1}$ to produce physically-plausible humanoid motions $\overline{m}_{1:T-1}$.

Figure 3: Details of our test time adaptation method. $\hat{q}$, $contact$, FK and $\hat{p}$ represent the predicted root positions, the contact prediction of feet joints, the forward kinematic process and 3D joint positions for the human skeleton, respectively.
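The two self-supervised losses used in Stage I are named in the introduction (a foot contact consistency loss and a motion smoothness loss), but their exact form is not given in this part of the paper. The sketch below shows one plausible form of each, purely as an assumption: the contact loss penalizes foot velocity weighted by the predicted contact probability, and the smoothness loss penalizes the squared second finite difference of joint positions. Function names, array shapes and toy data are hypothetical.

```python
import numpy as np

def foot_contact_consistency_loss(foot_positions, contact_prob):
    """Assumed form: a foot predicted to be in contact should not move.

    foot_positions: (T, F, 3) positions of F foot joints over T frames.
    contact_prob:   (T, F) predicted contact probabilities in [0, 1].
    """
    velocity = foot_positions[1:] - foot_positions[:-1]   # (T-1, F, 3)
    speed_sq = (velocity ** 2).sum(axis=-1)               # (T-1, F)
    return float((contact_prob[1:] * speed_sq).mean())

def motion_smoothness_loss(joint_positions):
    """Assumed form: mean squared acceleration via second finite differences.

    joint_positions: (T, J, 3) positions of J joints over T frames.
    """
    accel = joint_positions[2:] - 2.0 * joint_positions[1:-1] + joint_positions[:-2]
    return float((accel ** 2).mean())

# Toy data: one foot joint over five frames, always predicted in contact.
T = 5
contact = np.ones((T, 1))
planted = np.zeros((T, 1, 3))                 # foot stays still
sliding = planted.copy()
sliding[:, 0, 0] = 0.1 * np.arange(T)         # foot slides 0.1 per frame along x
jitter = sliding.copy()
jitter[:, 0, 1] = np.where(np.arange(T) % 2 == 0, 0.05, -0.05)  # alternating y noise

loss_planted = foot_contact_consistency_loss(planted, contact)
loss_sliding = foot_contact_consistency_loss(sliding, contact)
smooth_linear = motion_smoothness_loss(sliding)
smooth_jitter = motion_smoothness_loss(jitter)
```

Both quantities depend only on the network's own predictions, so they can be minimized on test sequences without ground truth motions.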

In our Skeleton2Humanoid framework, $m_t$ and $\hat{m}_t$ are skeleton motions, and $m_t$ is represented by $m_t \triangleq (q_t, r_t, p_t)$, where $q_t$ and $r_t$ denote body joint angles in quaternions and root translation, and $p_t$ denotes 3D joint positions calculated by forward kinematics. Similarly, $\hat{m}_t \triangleq (\hat{q}_t, \hat{r}_t, \hat{p}_t)$. In addition, $\tilde{m}_t$ and $\overline{m}_t$ are humanoid motions; $\tilde{m}_t$ is represented by $\tilde{m}_t \triangleq (\tilde{q}_t, \tilde{r}_t, \tilde{p}_t)$, where $\tilde{q}_t$, $\tilde{r}_t$ and $\tilde{p}_t$ denote joint angles in Euler angles, root translation and 3D joint positions of the reference humanoid motions. Similarly, $\overline{m}_t \triangleq (\overline{q}_t, \overline{r}_t, \overline{p}_t)$.
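For readers less familiar with the skeleton representation, the sketch below shows how 3D joint positions can be computed from joint quaternions and a root translation by forward kinematics. It is a generic illustration on a toy three-joint chain, not the skeleton or code used in this work: the bone offsets, the parent list, the (w, x, y, z) quaternion convention and the function names are assumptions.

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of two quaternions in (w, x, y, z) order.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])

def quat_rotate(q, v):
    # Rotate vector v by unit quaternion q.
    w, u = q[0], q[1:]
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def forward_kinematics(local_quats, root_translation, offsets, parents):
    """Compute global joint positions; parents must precede their children."""
    num_joints = len(parents)
    global_quats = [None] * num_joints
    positions = np.zeros((num_joints, 3))
    for j in range(num_joints):
        if parents[j] < 0:
            global_quats[j] = local_quats[j]
            positions[j] = root_translation
        else:
            p = parents[j]
            global_quats[j] = quat_mul(global_quats[p], local_quats[j])
            positions[j] = positions[p] + quat_rotate(global_quats[p], offsets[j])
    return positions

# Toy chain: root -> joint 1 -> joint 2, each bone of length 1 along +y.
parents = [-1, 0, 1]
offsets = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
identity = np.array([1.0, 0.0, 0.0, 0.0])
half = np.sqrt(0.5)
root_turn = np.array([half, 0.0, 0.0, half])   # 90 degrees about the z axis

r_t = np.zeros(3)
q_rest = np.stack([identity, identity, identity])
q_turn = np.stack([root_turn, identity, identity])
p_rest = forward_kinematics(q_rest, r_t, offsets, parents)
p_turn = forward_kinematics(q_turn, r_t, offsets, parents)
```

In this toy setting a pose at one frame corresponds to the triple of joint quaternions, root translation and the positions returned by `forward_kinematics`.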

3.2 Test Time Motion In-betweening Network Adaptation

3.2.1 Adaptation for Physically-plausible Skeleton Motion. The previous human motion in-betweening model [ ] has shown great performance on synthesizing accurate human motions. However, it
