Contact2Grasp: 3D Grasp Synthesis via Hand-Object Contact Constraint
Haoming Li1, Xinzhuo Lin1, Yang Zhou2, Xiang Li2, Yuchi Huo3, Jiming Chen1 and Qi Ye1
1Key Lab of CS&AUS, Zhejiang University, Hangzhou, China
2OPPO US Research Center, Palo Alto, USA
3State Key Lab of CAD&CG and Zhejiang Lab, Zhejiang University, Hangzhou, China
{haomingli, linxinzhuo, cjm, qi.ye}@zju.edu.cn, {yang.zhou, xiang.li}@oppo.com,
huo.yuchi.sc@gmail.com
Abstract
3D grasp synthesis generates grasping poses given an input object. Existing works tackle the problem by learning a direct mapping from objects to the distributions of grasping poses. However, because physical contact is sensitive to small changes in pose, the highly non-linear mapping from 3D object representations to valid poses is considerably non-smooth, leading to poor generation efficiency and restricted generality. To tackle the challenge, we introduce grasp contact areas as an intermediate variable to constrain the grasp generation; in other words, we factorize the mapping into two sequential stages by assuming that grasping poses are fully constrained given contact maps: 1) we first learn contact map distributions to generate potential contact maps for grasps; 2) we then learn a mapping from the contact maps to the grasping poses. Further, we propose a penetration-aware optimization with the generated contacts as a consistency constraint for grasp refinement. Extensive validation on two public datasets shows that our method outperforms state-of-the-art methods on various grasp generation metrics.
1 Introduction
3D grasp synthesis studies the problem of generating grasping poses given an input object. It has wide applications ranging from animation and human-computer interaction to robotic grasping. Though it has been researched for many years, only a limited number of works on 3D grasp generation using deep learning have been proposed, due to the lack of large grasping datasets [Corona et al., 2020; Taheri et al., 2020; Jiang et al., 2021; Karunratanakul et al., 2020; Zhang et al., 2021; Taheri et al., 2021]. Recently, a dataset of human grasping of objects with annotations of full-body meshes and object meshes has been collected with a multi-view capture rig, and a coarse-to-fine hand pose generation network based on a conditional variational autoencoder (CVAE) has been proposed [Taheri et al., 2020].
Figure 1: Interpolated contact maps and grasps between different generated contacts (TypeA and TypeB) by our method. Note that the grasping poses (e.g., the finger positions marked by the yellow circle and arrow) change with the transition between the two types of contacts, and a small change in a valid contact map produces another valid grasp. The intermediate contact maps reduce the non-smooth, highly non-linear pose generation problem to a map generation problem on a low-dimensional, smooth manifold, benefiting generation efficiency and generality.
In [Karunratanakul et al., 2020], a new implicit representation is proposed for hand and object interactions, and a similar CVAE method is used for static grasp generation. Taheri et al. [Taheri et al., 2021] take a step further and learn dynamic grasping sequences, including whole-body motion, given an object, instead of static grasping poses.
Existing methods treat the generation as a black-box mapping from an object to its grasp pose distribution. However, this formulation has its defects. On one hand, the mapping from the 3D object space to the pose space represented by rotations is highly non-linear. On the other hand, physical contact is sensitive to small changes in pose; e.g., less than a millimeter of change in the pose of a fingertip along the normal to the surface of an object can make the difference between the object being held or dropped on the floor [Grady et al., 2021]. Therefore, the mapping from 3D object representations to valid poses is non-smooth, as a small change in the pose can make a valid pose invalid. These defects make it challenging for a network to learn the sparse mapping and generalize to unseen valid poses in the highly non-linear space.
In robotics, contact areas between agents and objects have been found to be important [Deng et al., 2021; Roy and Todorovic, 2016; Zhu et al., 2015], because localizing the positions of possible grasps can greatly help the planning of actions for robotic hands [Mo et al., 2021; Wu et al., 2021; Mandikal and Grauman, 2021; Mandikal and Grauman, 2022].
Figure 2: The framework of our method. It consists of three stages: ContactCVAE, GraspNet, and penetration-aware partial optimization. ContactCVAE takes an object point cloud O as input and generates a contact map C′. GraspNet estimates a grasp parameterized by θ from the contact map C′. Finally, the penetration-aware partial (PAP) optimization refines θ to obtain the final grasp.
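To make the data flow of Figure 2 concrete, the sketch below shows how the three stages could be composed at inference time. The module definitions, tensor shapes, and the 61-dimensional grasp parameterization are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the three-stage pipeline (assumed shapes and layer sizes).
import torch
import torch.nn as nn

class ContactCVAEDecoder(nn.Module):
    """Decodes a latent code plus object points into per-point contact probabilities."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )
    def forward(self, obj_points, z):
        # obj_points: (B, N, 3), z: (B, latent_dim)
        z_tiled = z.unsqueeze(1).expand(-1, obj_points.shape[1], -1)
        return self.mlp(torch.cat([obj_points, z_tiled], dim=-1)).squeeze(-1)  # (B, N)

class GraspNet(nn.Module):
    """Regresses grasp parameters theta from the object points and a contact map."""
    def __init__(self, theta_dim=61):  # assumed MANO-style parameter count
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, theta_dim))
    def forward(self, obj_points, contact_map):
        feat = torch.cat([obj_points, contact_map.unsqueeze(-1)], dim=-1)  # (B, N, 4)
        return self.mlp(feat).mean(dim=1)  # global average pooling -> (B, theta_dim)

def generate_grasp(obj_points, contact_decoder, grasp_net, refine_fn=None):
    """Sample a contact map, map it to grasp parameters, optionally refine (PAP)."""
    z = torch.randn(obj_points.shape[0], 64)           # sample from the CVAE prior
    contact_map = contact_decoder(obj_points, z)       # stage 1: contact hypothesis
    theta = grasp_net(obj_points, contact_map)         # stage 2: contact -> grasp
    if refine_fn is not None:
        theta = refine_fn(theta, obj_points, contact_map)  # stage 3: PAP refinement
    return theta, contact_map
```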
For example, [Mo et al., 2021] and [Wu et al., 2021] first estimate the contact points for parallel-jaw grippers and then plan paths to grasp the target objects. The common assumption in this literature is that the contact area is a point, and contact point generation is treated as a per-point (or per-pixel/voxel) detection problem, i.e., classifying each 3D object point as a contact or not; this cannot be applied to dexterous hand grasps, which involve much more complex contact. For dexterous robotic hand grasping, recent work [Mandikal and Grauman, 2021] finds that leveraging contact areas from human grasps can improve the grasping success rate in a reinforcement learning framework. However, it assumes that an object affords only one grasp, which contradicts the real situation and limits its applicability.
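The per-point detection formulation mentioned above amounts to a binary classifier over object points. The snippet below is a generic illustration of that idea, not the cited works' actual architectures; the point features and toy labels are arbitrary assumptions.

```python
# Generic per-point contact classification (illustrative only).
import torch
import torch.nn as nn

class PerPointContactClassifier(nn.Module):
    def __init__(self, feat_dim=3):
        super().__init__()
        # shared MLP applied to every point independently
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, points):                      # points: (B, N, 3)
        return self.head(points).squeeze(-1)        # per-point logits, (B, N)

model = PerPointContactClassifier()
points = torch.rand(2, 1024, 3)                     # toy object point clouds
labels = (points[..., 2] > 0.8).float()             # toy "contact" labels
loss = nn.BCEWithLogitsLoss()(model(points), labels)
loss.backward()
```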
To tackle these limitations, we propose to leverage contact maps to constrain the grasp synthesis. Specifically, we factorize the learning task into two sequential stages, rather than using a black-box hand pose generative network that directly maps an object to possible grasping poses as in previous work. In the first stage, we generate multiple hypotheses of the grasping contact areas, represented by binary 3D segmentation maps. In the second stage, we learn a mapping from the contact to the grasping pose by assuming that the grasping pose is fully constrained given a contact map.
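For concreteness, a binary contact map of this kind can be obtained from a hand-object pair by thresholding point-to-hand distances. The sketch below assumes a 4 mm threshold, which may differ from the paper's exact construction.

```python
# Minimal sketch: binary contact map from distance thresholding (assumed 4 mm).
import numpy as np

def binary_contact_map(obj_points, hand_points, threshold=0.004):
    """obj_points: (N, 3), hand_points: (M, 3); returns (N,) values in {0, 1}."""
    diff = obj_points[:, None, :] - hand_points[None, :, :]   # (N, M, 3)
    dists = np.linalg.norm(diff, axis=-1)                      # pairwise distances, (N, M)
    return (dists.min(axis=1) < threshold).astype(np.float32)

obj = np.random.rand(2048, 3)
hand = np.random.rand(778, 3)    # 778 vertices, as in the MANO hand mesh
contact = binary_contact_map(obj, hand)
print(int(contact.sum()), "object points labelled as contact")
```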
The intermediate segmentation contact maps align with the smooth manifold of the object surface: for example, a small change in a valid contact map is likely to produce another valid solution (as illustrated in Figure 1), and the corresponding pose can then be deterministically established by the following GraspNet and PAP optimization. This reduces the challenging pose generation problem to an easier map generation problem on a low-dimensional, smooth manifold, benefiting generation efficiency and generality.
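The smoothness argument can be pictured by interpolating between the latent codes of two generated contacts, as in Figure 1. The decoder below is only a toy stand-in so that the snippet runs on its own; it is not the ContactCVAE itself.

```python
# Toy latent-space interpolation between two contact hypotheses (illustrative only).
import torch

def decode_contact_map(z, obj_points):
    """Toy stand-in for a contact decoder: the latent code picks a contact region."""
    anchor = z[:3]                                     # pretend the code selects an anchor
    d = torch.norm(obj_points - anchor, dim=-1)
    return torch.sigmoid(5.0 * (0.5 - d))              # (N,) soft contact probabilities

obj_points = torch.rand(2048, 3)
z_a, z_b = torch.randn(64), torch.randn(64)            # codes of "TypeA" and "TypeB"
for t in torch.linspace(0.0, 1.0, steps=5):
    z_t = (1 - t) * z_a + t * z_b                      # linear interpolation in latent space
    c_t = decode_contact_map(z_t, obj_points)
    print(f"t={t.item():.2f}: {int((c_t > 0.5).sum())} contact points")
```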
The other benefit of the intermediate contact representation is that it enables optimization based on the contacts. Different from optimizing full grasps from scratch [Brahmbhatt et al., 2019b; Xing et al., 2022], we propose a penetration-aware partial (PAP) optimization with the intermediate contacts. It detects the partial poses causing penetration and leverages the generated contact maps as a consistency constraint for the refinement of these partial poses. The PAP optimization restricts the gradients from erroneous partial poses so that they affect only the poses requiring adjustment, which results in better grasp quality than global optimization methods.
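The partial-update idea can be sketched as a gradient-masked refinement loop. Everything below rests on stated assumptions: hand_vertices, signed_distance, and vert_to_param_mask are hypothetical callables standing in for a differentiable hand model (e.g., a MANO wrapper), an object signed-distance query, and a vertex-to-parameter mapping, and the losses and masking only convey the idea of restricting updates to the offending partial poses, not the authors' exact solver.

```python
# Sketch of penetration-aware partial refinement via gradient masking (assumptions above).
import torch

def pap_refine(theta, obj_points, contact_map, hand_vertices, signed_distance,
               vert_to_param_mask, steps=50, lr=1e-2, w_contact=1.0):
    """theta: flat grasp parameter vector for one grasp (an assumed layout)."""
    theta = theta.clone().requires_grad_(True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        verts = hand_vertices(theta)                          # (V, 3) hand surface points
        sdf = signed_distance(obj_points, verts)              # (V,), negative = inside object
        pen = sdf < 0                                         # penetrating hand vertices
        loss_pen = (-sdf[pen]).sum() if pen.any() else verts.sum() * 0.0
        # contact consistency: predicted contact points should stay close to the hand
        dists = torch.cdist(obj_points, verts).min(dim=1).values   # (N,)
        loss_con = (contact_map * dists).mean()
        opt.zero_grad()
        (loss_pen + w_contact * loss_con).backward()
        # "partial" update: keep gradients only for parameters tied to penetrating parts
        with torch.no_grad():
            theta.grad *= vert_to_param_mask(pen).float()     # boolean mask over theta
        opt.step()
    return theta.detach()
```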
In summary, our key contributions are: 1) we tackle the high non-linearity of the 3D grasp generation problem by introducing the contact map constraint and factorizing the generation into two stages, contact map generation and mapping from contact maps to grasps; 2) we propose a PAP optimization with the intermediate contacts for grasp refinement; 3) benefiting from the two decomposed learning stages and the partial optimization, our method outperforms existing methods both quantitatively and qualitatively.
2 Related Works
Human grasp generation is a challenging task due to the high degrees of freedom of human hands and the requirement that the generated hands interact with objects in a physically reasonable manner. Most methods use models such as MANO [Romero et al., 2017] to parameterize hand poses, aiming to directly learn a latent conditional distribution of the hand parameters given objects via large datasets. The distribution is usually learned by generative network models such as conditional variational auto-encoders [Sohn et al., 2015] or generative adversarial networks [Arjovsky et al., 2017]. To obtain finer poses, many existing works adopt a coarse-to-fine strategy by learning residuals of the grasping poses in a refinement stage. [Corona et al., 2020] uses a generative adversarial network to obtain an initial grasp and then an extra network to refine it. [Taheri et al., 2020] follows a similar strategy but uses a CVAE model to output the initial grasp.
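As a minimal illustration of the coarse-to-fine residual strategy mentioned above, a refinement network can predict a correction that is added to a coarse grasp. Layer sizes and the 61-dimensional parameterization are illustrative assumptions, not those of the cited works.

```python
# Generic coarse-to-fine residual refinement (illustrative sizes).
import torch
import torch.nn as nn

class RefineNet(nn.Module):
    def __init__(self, theta_dim=61, obj_feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(theta_dim + obj_feat_dim, 256), nn.ReLU(),
            nn.Linear(256, theta_dim),
        )
    def forward(self, theta_coarse, obj_feat):
        delta = self.mlp(torch.cat([theta_coarse, obj_feat], dim=-1))
        return theta_coarse + delta          # learn the residual of the grasping pose

refiner = RefineNet()
theta_coarse = torch.randn(4, 61)            # coarse grasps from a CVAE/GAN sampler
obj_feat = torch.randn(4, 128)               # per-object feature, e.g. from a point encoder
theta_fine = refiner(theta_coarse, obj_feat)
```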
In recent works, contact maps have been exploited to improve robotic grasping, hand-object reconstruction, and 3D grasp synthesis. [Brahmbhatt et al., 2019b] introduces a loss for robotic grasping optimization, using contact maps captured with thermal cameras [Brahmbhatt et al., 2019a; Brahmbhatt et al., 2020] to filter and rank random grasps sampled by GraspIt! [Miller and Allen, 2004]. It concludes that grasping poses synthesized by optimizing directly from the contact demonstrate superior quality to approaches that kinematically re-target observed human grasps to the target hand model. For the reconstruction of hand-object interaction, [Grady et al., 2021] proposes a differentiable contact optimization to refine the hand pose reconstructed from an image. For 3D grasp synthesis, [Jiang et al., 2021] also exploits contact maps, but only to refine generated grasps during inference. Our work differs from these contact-based works in three aspects: 1) these works use contact maps as a loss for grasp optimization or as post-processing for further grasp refinement, while our work exploits contact maps as an intermediate constraint for learning the grasp distribution; 2) in contrast to the learning-based works with contact maps, which treat objects-to-grasps as a black box, our work factorizes the grasp synthesis into objects-to-contact-maps and contact-maps-to-grasps; 3) moreover, these works refine whole grasps with global optimization methods using contact maps, while our penetration-aware partial optimization detects the partial poses causing penetration
and leverages the contact map constraint to optimize the partial poses.