Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint

Haoming Li1, Xinzhuo Lin1, Yang Zhou2, Xiang Li2, Yuchi Huo3, Jiming Chen1 and Qi Ye1∗

1 Key Lab of CS&AUS, Zhejiang University, Hangzhou, China

2 OPPO US Research Center, Palo Alto, USA

3 State Key Lab of CAD&CG and Zhejiang Lab, Zhejiang University, Hangzhou, China

{haomingli, linxinzhuo, cjm, qi.ye}@zju.edu.cn, {yang.zhou, xiang.li}@oppo.com, huo.yuchi.sc@gmail.com

Abstract

3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because physical contact is sensitive to small changes in pose, the highly non-linear mapping from the 3D object representation to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle this challenge, we introduce an intermediate variable for grasp contact areas to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate the potential contact maps for grasps; 2) we then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods regarding grasp generation on various metrics.

1 Introduction

3D grasp synthesis studies the problem of generating grasping poses given an input object. It has wide applications ranging from animation and human-computer interaction to robotic grasping. Though it has been researched for many years, only a limited number of deep learning works on 3D grasp generation have been proposed, due to the lack of large grasping datasets [Corona et al., 2020; Taheri et al., 2020; Jiang et al., 2021; Karunratanakul et al., 2020; Zhang et al., 2021; Taheri et al., 2021]. Recently, a dataset of humans grasping objects, annotated with full-body meshes and object meshes, has been collected with a multi-view capture rig, and a coarse-to-fine hand pose generation network based on a conditional variational autoencoder (CVAE) has been proposed [Taheri et al., 2020]. In [Karunratanakul et al., 2020], a new implicit representation is proposed for hand-object interactions, and a similar CVAE method is used for static grasp generation. Taheri et al. [Taheri et al., 2021] take a step further and learn dynamic grasping sequences, including the whole-body motion given an object, instead of static grasping poses.

∗Corresponding author.

Figure 1: Interpolated contact maps and grasps between different generated contacts (TypeA and TypeB) by our method. Note that the grasping poses (e.g. finger positions denoted by the yellow circle and arrow) change with transitions between the two types of contacts, and a small change in a valid contact map produces another valid grasp. The intermediate contact maps reduce the non-smooth, highly non-linear pose generation problem to a map generation problem on a low-dimensional and smooth manifold, benefiting generation efficiency and generality.

Existing methods treat the generation as a black-box mapping from an object to its grasp pose distribution. However, this formulation has its defects. On the one hand, the mapping from the 3D object space to the pose space represented by rotations is highly non-linear. On the other hand, physical contact is sensitive to small changes in pose: for example, a change of less than a millimeter in the pose of a fingertip normal to the surface of an object can make the difference between the object being held and being dropped on the floor [Grady et al., 2021]. Therefore, the mapping from the 3D object representation to valid poses is non-smooth, as a small change in the pose can make a valid pose invalid. These defects make it challenging for the network to learn the sparse mapping and to generalize to unseen valid poses in the highly non-linear space.

arXiv:2210.09245v3 [cs.RO] 6 May 2023

Figure 2: The framework of our method. It consists of three stages: ContactCVAE, GraspNet and Penetration-aware Partial Optimization. ContactCVAE takes an object point cloud O as input and generates a contact map C′. GraspNet estimates a grasp parameterized by θ from the contact map C′. Finally, penetration-aware partial (PAP) optimization refines θ to get the final grasp.

In robotics, contact areas between agents and objects are found to be important [Deng et al., 2021; Roy and Todorovic, 2016; Zhu et al., 2015] because localizing the position of possible grasps can greatly help the planning of actions for robotic hands [Mo et al., 2021; Wu et al., 2021; Mandikal and Grauman, 2021; Mandikal and Grauman, 2022]. For example, [Mo et al., 2021] and [Wu et al., 2021] first estimate the contact points for parallel-jaw grippers and then plan paths to grasp the target objects. The common assumption in the literature is that the contact area is a point, and contact point generation is treated as a per-point (or pixel/voxel) detection problem, i.e. classifying each 3D object point as a contact or not. This formulation cannot be applied to dexterous hand grasps, which exhibit much more complex contact. For dexterous robotic hand grasping, recent work [Mandikal and Grauman, 2021] finds that leveraging contact areas from human grasps can improve the grasping success rate in a reinforcement learning framework. However, it assumes that an object affords only one grasp, which contradicts the real case and limits its application.

To tackle these limitations, we propose to leverage contact maps to constrain the grasp synthesis. Specifically, we factorize the learning task into two sequential stages, rather than using, as in previous work, a black-box hand pose generative network that directly maps an object to the possible grasping poses. In the first stage, we generate multiple hypotheses of the grasping contact areas, represented by binary 3D segmentation maps. In the second stage, we learn a mapping from the contact to the grasping pose by assuming that the grasping pose is fully constrained given a contact map.
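To make the notion of a binary 3D segmentation map over object points concrete, here is a minimal sketch of how such a map could be derived from a hand and an object point cloud. It assumes a simple distance-threshold rule; the function name `binary_contact_map`, the 5 mm threshold, and the toy coordinates are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def binary_contact_map(object_points, hand_points, threshold=0.005):
    """Label each object point 1 if some hand point lies within `threshold`.

    The distance-threshold rule and its default value are assumptions
    made for illustration only.
    """
    # Pairwise differences, shape (num_object_points, num_hand_points, 3).
    diff = object_points[:, None, :] - hand_points[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    # Distance from each object point to its nearest hand point.
    nearest = dists.min(axis=1)
    return (nearest <= threshold).astype(np.uint8)

# Toy data (hypothetical): three object points and two hand points, in meters.
object_points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0]])
hand_points = np.array([[0.0, 0.0, 0.004], [0.3, 0.0, 0.0]])
contact = binary_contact_map(object_points, hand_points)
```

Only the first object point has a hand point within the threshold, so it is the only one marked as contact. A map of this kind assigns one label per object point, which is what allows several distinct contact hypotheses to exist for the same object.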

The intermediate segmentation contact maps align with the smooth manifold of the object surface: for example, a small change in a valid contact map would likely produce another valid solution (as illustrated in Figure 1). The corresponding pose can then be deterministically established by the subsequent GraspNet and PAP optimization. This approach reduces the challenging pose generation problem to an easier map generation problem on a low-dimensional and smooth manifold, benefiting generation efficiency and generality.
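The idea that nearby contact maps remain valid can be illustrated with a toy interpolation. The sketch below assumes that contact maps are decoded from latent codes and that interpolation is linear in that latent space; the text does not state how the interpolation in Figure 1 is performed, and `toy_decoder`, the point features, and the latent codes are hypothetical stand-ins rather than the paper's ContactCVAE.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps):
    """Return `num_steps` latent codes evenly spaced from z_a to z_b (inclusive)."""
    weights = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - w) * z_a + w * z_b for w in weights]

def toy_decoder(z, point_features):
    """Hypothetical decoder: per-point logit -> sigmoid -> binary contact label."""
    logits = point_features @ z
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs > 0.5).astype(np.uint8)

# Three object points with made-up 2D features, and two made-up latent codes.
point_features = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
z_a = np.array([2.0, -1.0])   # plays the role of "TypeA"
z_b = np.array([-1.0, 2.0])   # plays the role of "TypeB"

maps = [toy_decoder(z, point_features)
        for z in interpolate_latents(z_a, z_b, 5)]
```

In this toy setting the contact region moves gradually from the first two points to the last two, passing through a map that covers all three, rather than jumping between unrelated configurations.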

The other benefit of the intermediate contact representation is that it enables optimization from the contacts. Unlike methods that optimize full grasps from scratch [Brahmbhatt et al., 2019b; Xing et al., 2022], we propose a penetration-aware partial (PAP) optimization with the intermediate contacts. It detects partial poses causing penetration and leverages the generated contact maps as a consistency constraint for the refinement of the partial poses. The PAP optimization restricts gradients from erroneous partial poses so that they affect only the poses requiring adjustment, which results in better grasp quality than a global optimization method.
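The gradient restriction described above can be sketched as a masked update. This is a minimal illustration under the assumption that each pose parameter belongs to one hand part and that penetrating parts have already been detected; `masked_gradient_step`, the part assignment, and the learning rate are hypothetical and do not reproduce the paper's actual losses or optimizer.

```python
import numpy as np

def masked_gradient_step(pose, grad, part_of_param, penetrating_parts, lr=0.1):
    """Gradient step that updates only parameters of penetrating parts.

    `part_of_param[i]` is the (assumed) hand-part index that parameter i
    belongs to; parameters of all other parts are left unchanged.
    """
    mask = np.isin(part_of_param, list(penetrating_parts)).astype(pose.dtype)
    return pose - lr * mask * grad

# Toy pose with four parameters split across two parts (hypothetical).
pose = np.array([0.0, 0.0, 0.0, 0.0])
grad = np.array([1.0, 1.0, 1.0, 1.0])
part_of_param = np.array([0, 0, 1, 1])  # params 0-1 -> part 0, params 2-3 -> part 1

# Suppose only part 1 was detected as penetrating the object.
updated = masked_gradient_step(pose, grad, part_of_param, penetrating_parts={1})
```

A global optimizer would move all four parameters here; the masked step leaves the parameters of part 0 untouched, so a part that is already well placed is not disturbed by errors elsewhere.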

In summary, our key contributions are: 1) we tackle the high non-linearity of 3D grasp generation by introducing the contact map constraint and factorizing the generation into two stages: contact map generation and mapping from contact maps to grasps; 2) we propose a PAP optimization with the intermediate contacts for grasp refinement; 3) benefiting from the two decomposed learning stages and partial optimization, our method outperforms existing methods both quantitatively and qualitatively.
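One way to write down the two-stage factorization in the first contribution is sketched below. The symbols O (object point cloud), C (contact map) and θ (grasp parameters) follow the framework description in Figure 2; the function f and the delta-function form are assumed notation expressing the stated assumption that the pose is fully constrained given a contact map, not equations taken from the paper.

```latex
% Marginalizing over the intermediate contact map C:
p(\theta \mid O) = \int p(\theta \mid C, O)\, p(C \mid O)\, \mathrm{d}C

% Assumption: the grasp is fully determined by the contact map,
% so the second factor collapses to a deterministic mapping f:
p(\theta \mid C, O) = \delta\bigl(\theta - f(C, O)\bigr)

% Sampling a grasp then reduces to sampling a contact map and mapping it:
C \sim p(C \mid O), \qquad \theta = f(C, O)
```

Under this reading, all of the multi-modality of grasping is carried by the contact map distribution, and the second stage is a regression problem rather than a generative one.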

2 Related Works

Human grasp generation is a challenging task due to the high number of degrees of freedom of human hands and the requirement that the generated hands interact with objects in a physically reasonable manner. Most methods use models such as MANO [Romero et al., 2017] to parameterize hand poses, aiming to directly learn a latent conditional distribution of the hand parameters given objects from large datasets. The distribution is usually learned by generative network models such as the Conditional Variational Auto-Encoder [Sohn et al., 2015] or Generative Adversarial Networks [Arjovsky et al., 2017]. To get finer poses, many existing works adopt a coarse-to-fine strategy by learning the residuals of the grasping poses in the refinement stage. [Corona et al., 2020] uses a generative adversarial network to obtain an initial grasp, and then an extra network to refine it. [Taheri et al., 2020] follows a similar strategy but uses a CVAE model to output an initial grasp.

In recent works, contact maps are exploited to improve robotic grasping, hand-object reconstruction, and 3D grasp synthesis. [Brahmbhatt et al., 2019b] introduces a loss for robotic grasping optimization that uses contact maps captured from thermal cameras [Brahmbhatt et al., 2019a; Brahmbhatt et al., 2020] to filter and rank random grasps sampled by GraspIt! [Miller and Allen, 2004]. It concludes that synthesized grasping poses optimized directly from the contact demonstrate superior quality to other approaches that kinematically re-target observed human grasps to the target hand model. In the reconstruction of hand-object interaction, [Grady et al., 2021] proposes a differentiable contact optimization to refine the hand pose reconstructed from an image. In 3D grasp synthesis, [Jiang et al., 2021] also exploits contact maps but uses them only to refine generated grasps during inference. Our work differs from these works using contact maps in three aspects: 1) these works use contact maps as a loss for the grasp optimization or as post-processing for further grasp refinement, while our work exploits the contact maps as an intermediate constraint for learning the grasp distribution; 2) in contrast to the learning-based works with contact maps, which treat objects-to-grasps as a black box, our work factorizes the grasp synthesis into objects-to-contact maps and contact maps-to-grasps; 3) moreover, these works refine the whole grasps with global optimization methods using contact maps, while our penetration-aware partial optimization detects the partial poses causing the penetration and leverages the contact map constraint to optimize the par-
