
TABLE I: Dexterous Grasp Dataset Comparison

Dataset             Hand        Observations   Sim./Real   Grasps   Obj.(Cat.)   Grasps/Obj.   Method
ObMan [14]          MANO        -              Sim.        27k      2772(8)      10            GraspIt!
HO3D [15]           MANO        RGBD           Real        77k      10           >7k           Estimation
DexYCB [16]         MANO        RGBD           Real        582k     20           >29k          Human annotation
ContactDB [17]      MANO        RGBD+thermal   Real        3750     50           75            Capture
ContactPose [18]    MANO        RGBD           Real        2306     25           92            Capture
DDGdata [9]         ShadowHand  -              Sim.        6.9k     565          >100          GraspIt!
DexGraspNet (Ours)  ShadowHand  -              Sim.        1.32M    5355(133)    >200          Optimization
force closure and then uses it to synthesize diverse and
stable grasps via optimization. However, [19] suffers from
low yield, slow convergence, and strict requirements on object
meshes, making it impractical for synthesizing a large-scale
dataset.
To achieve our desired diversity, quality, and scale, we
propose several critical improvements to [19], making it much
more efficient and robust. First, we design a better hand pose
initialization strategy and carefully select contact candidates
to boost the yield, reducing the time to synthesize 10,000
valid grasps from 400 GPU hours to 7. Second, we propose
an alternative way to compute penetration energy and signed
distances, which enables us to handle object meshes of much
lower quality and greatly simplifies their preprocessing.
Third, we introduce energy terms that penalize
self-penetration and out-of-limit joint angles to further
improve grasp quality. Additionally, with simple modifications,
the entire pipeline can be applied to other dexterous hands,
such as MANO [20] and Allegro.
To verify the advantage of our dataset over the one from
DDG, we train two dexterous grasping algorithms on each
dataset. The cross-dataset experiments confirm that
training on our dataset yields better grasping quality and
higher diversity. Moreover, the great diversity of the hand
grasps in our dataset leaves substantial room for improvement
for future dexterous grasping algorithms.
II. RELATED WORK
Research in grasping can be broadly categorized by the
type of end effector involved. The most thoroughly studied
end effectors are suction cups and parallel-jaw grippers, whose
grasp pose can be defined by a vector of at most 7 dimensions:
3 for translation, 3 for rotation, and 1 for the width
between the two fingers. Dexterous robotic hands with three
or more fingers, such as ShadowHand [8], and humanoid
hands, such as MANO [20], require more complex descriptors,
with up to 24 DoF in the case of ShadowHand [8]. In this
paper, we focus on the latter type. To
bridge the gap between humanoid hands and robotic hands,
numerous studies have shown the efficacy of retargeting
humanoid hand poses to dexterous robotic hands [21–24].
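To make the dimensionality gap concrete, the two grasp descriptions above can be sketched as simple data structures. This is only an illustrative sketch: the class names, the axis-angle rotation parameterization, and the flat 24-angle joint vector are our assumptions, not an interface from any of the cited works.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ParallelJawGrasp:
    """Parallel-jaw grasp: at most 7 numbers fully describe the pose."""
    translation: np.ndarray  # (3,) gripper position
    rotation: np.ndarray     # (3,) gripper orientation (axis-angle)
    width: float             # 1 DoF: distance between the two fingers

    def as_vector(self) -> np.ndarray:
        return np.concatenate([self.translation, self.rotation, [self.width]])


@dataclass
class DexterousGrasp:
    """Dexterous grasp: wrist pose plus one angle per articulated joint,
    e.g. 24 joint angles for a ShadowHand-like kinematic model."""
    translation: np.ndarray  # (3,) wrist position
    rotation: np.ndarray     # (3,) wrist orientation (axis-angle)
    joint_angles: np.ndarray = field(default_factory=lambda: np.zeros(24))

    def as_vector(self) -> np.ndarray:
        return np.concatenate([self.translation, self.rotation, self.joint_angles])


jaw = ParallelJawGrasp(np.zeros(3), np.zeros(3), 0.05)
dex = DexterousGrasp(np.zeros(3), np.zeros(3))
print(jaw.as_vector().shape)  # (7,)
print(dex.as_vector().shape)  # (30,)
```

The roughly fourfold jump in pose dimensionality is what makes dexterous grasps so much harder to search for and to annotate than parallel-jaw grasps.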
A. Analytical Grasping
Early research in dexterous grasping focused on optimizing
grasping poses to form force closure that can resist external
forces and torques [25–28].
Due to the complexity of computing hand kinematics and
testing force closure, many works were devoted to simplifying
the search space [29–31]. As a result, these methods
were applicable only in restricted settings and could produce
only limited types of grasping poses. Another line of work
[32–34] seeks to simplify the optimization process with an
auxiliary function. [19] proposed to use a differentiable
estimator of the force closure metric to synthesize diverse
grasping poses for arbitrary hands.
B. Data-Driven Grasping
Recent works shift their focus to data-driven methods.
Given an object, the most straightforward approach is to
directly generate the pose vectors of the grasping hand
[35–39]. These methods usually include a refinement step
to remove inconsistencies such as penetration.
Other methods take an indirect approach that first generates
an intermediate representation. Existing methods
use contact points [40–42], contact maps [21, 22, 43–45], and
occupancy fields [46] as the intermediate representation.
The grasping poses are then obtained via optimization
[40, 41, 44, 46], planning [43], RL policies [22, 42], or
another generative model [45].
Compared to most analytical methods, data-driven methods
show improved inference speed and diversity of generated
grasping poses. However, the diversity is still limited
by the training data.
C. Dexterous Grasp Datasets
Dexterous grasping is prohibitively difficult to annotate
manually because of its overwhelming degrees of freedom.
Most existing works are trained on programmatically
synthesized grasping poses [9, 14, 38, 47] using the
GraspIt! [10] planner. The planner first searches the
eigengrasp space for pregrasp poses whose quality exceeds
a threshold. Then, the planner squeezes all fingers of the
selected pregrasp poses to construct a firm grasp. Since the
initial search is performed in the low-dimensional eigengrasp
space, the resulting data follows a narrow distribution and
cannot cover the full dexterity of multi-finger hands.
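The search-then-squeeze procedure above can be sketched as follows. Everything here is an illustrative assumption rather than the actual GraspIt! implementation: the random linear eigengrasp basis (in practice such a basis comes from PCA over recorded grasps), the stand-in quality score (a real planner scores contacts and wrenches against object geometry), and the uniform-squeeze step.

```python
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS = 24  # full hand DoF (ShadowHand-like, illustrative)
N_EIGEN = 2    # the eigengrasp search space is much lower-dimensional

# Hypothetical linear eigengrasp basis: maps 2 coefficients to 24 joint angles.
basis = rng.standard_normal((N_JOINTS, N_EIGEN))
mean_pose = np.zeros(N_JOINTS)


def to_joint_angles(eigen_coeffs: np.ndarray) -> np.ndarray:
    """Lift low-dimensional eigengrasp coordinates to the full joint space."""
    return mean_pose + basis @ eigen_coeffs


def quality(joint_angles: np.ndarray) -> float:
    """Stand-in grasp quality score (higher is better); a real planner
    would evaluate contacts and wrenches against the object mesh."""
    return float(-np.linalg.norm(joint_angles - 0.5))


def squeeze(joint_angles: np.ndarray, delta: float = 0.1) -> np.ndarray:
    """Close all fingers slightly from the pregrasp to form a firm grasp."""
    return joint_angles + delta


# Step 1: search the 2D eigengrasp space, keeping pregrasps above a threshold
# (the better half of the samples, purely for illustration).
samples = [to_joint_angles(rng.standard_normal(N_EIGEN)) for _ in range(200)]
scores = [quality(q) for q in samples]
threshold = float(np.median(scores))
pregrasps = [q for q, s in zip(samples, scores) if s > threshold]

# Step 2: squeeze each selected pregrasp into a final grasp.
grasps = [squeeze(q) for q in pregrasps]
print(len(grasps), grasps[0].shape)  # 100 (24,)
```

The key point the sketch makes visible: every candidate, however many are sampled, lies in the 2-dimensional image of the eigengrasp basis, which is why the resulting grasps follow a narrow distribution in the 24-dimensional joint space.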
More recent works leverage the increasing capability of
computer vision to collect human hand poses during
interaction with objects. HO3D [15, 48] computes the ground
truth 3D hand pose for images from 2D hand keypoint
annotations, resolving ambiguities by considering physical
constraints in hand-object and hand-hand interactions.
DexYCB [16] and ContactPose [18] recover the
3D hand shape from multi-view RGBD camera recordings.
The latest datasets [49–51] use optical motion capture systems
to track hand and object shapes during interactions. While these