DexGraspNet: A Large-Scale Robotic Dexterous
Grasp Dataset for General Objects Based on Simulation
Ruicheng Wang1, Jialiang Zhang1, Jiayi Chen1,2, Yinzhen Xu1,2, Puhao Li2,3, Tengyu Liu2, He Wang1
Abstract— Robotic dexterous grasping is the first step to
enable human-like dexterous object manipulation and thus
a crucial robotic technology. However, dexterous grasping is
much more under-explored than object grasping with parallel
grippers, partially due to the lack of a large-scale dataset. In this
work, we present a large-scale robotic dexterous grasp dataset,
DexGraspNet, generated by our proposed highly efficient syn-
thesis method that can be generally applied to any dexterous
hand. Our method leverages a deeply accelerated differentiable
force closure estimator and thus can efficiently and robustly
synthesize stable and diverse grasps on a large scale. We choose
ShadowHand and generate 1.32 million grasps for 5355 objects,
covering more than 133 object categories and containing more
than 200 diverse grasps for each object instance, with all grasps
having been validated by the Isaac Gym simulator. Compared to
the previous dataset from Liu et al. generated by GraspIt!, our
dataset has not only more objects and grasps, but also higher
diversity and quality. Through cross-dataset experiments, we show that several dexterous grasp synthesis algorithms trained on our dataset significantly outperform the same algorithms trained on the previous one. To access our data and code, including code
for human and Allegro grasp synthesis, please visit our project
page: https://pku-epic.github.io/DexGraspNet/.
I. INTRODUCTION
Robotic object grasping is an important technology for
many robotic systems. Recent years have witnessed great
success in developing vision-based grasping methods [1–
6] and large-scale datasets for parallel-jaw grippers, e.g.,
the synthetic object-centric dataset ACRONYM [7] and the real-world cluttered-scene dataset GraspNet [3].
Although simple and effective for pick-and-place, parallel-
jaw grippers show certain limitations in dexterous object
manipulation, e.g., using scissors, due to their low DoFs.
In contrast, multi-fingered robotic hands, e.g., ShadowHand [8], are human-like, designed with very high DoFs
(26 for ShadowHand), and can attain more diverse grasp
types. Those dexterous hands can support many complex and
diverse manipulations, e.g., solving Rubik’s cube [11], and
can be used in task-specific grasping [12].
Arguably, dexterous grasping is the first step to dexterous
manipulation. However, dexterous grasping is highly under-
explored, compared to parallel grasping. One major obstacle
is the lack of large-scale robotic dexterous grasping datasets
required by learning-based methods. Up to now, the only
dataset is provided by Liu et al. [9] (Deep Differentiable
Grasp, referred to as DDG), which contains only 6.9K grasps
1Peking University
2Beijing Institute for General Artificial Intelligence
3Tsinghua University
Equal contribution
Corresponding author: hewang@pku.edu.cn
Fig. 1: A visualization of DexGraspNet. DexGraspNet con-
tains 1.32M grasps of ShadowHand [8] on 5355 objects,
which is two orders of magnitude larger than the previous
dataset from DDG [9]. It features diverse types of grasping
that cannot be achieved using GraspIt! [10].
and 565 objects and is much smaller than the grasp datasets
for parallel grippers, e.g., GraspNet [3], ACRONYM [7].
Considering the high-DoF nature of the dexterous hand,
dexterous grasping datasets need to be significantly larger
and more diverse for the sake of generalization.
In this work, we propose DexGraspNet, a large-scale
simulated dataset for robotic dexterous grasping. This dataset
contains 1.32 million dexterous grasps for ShadowHand on
5355 objects, with more than 200 diverse grasps for each ob-
ject instance. The objects are from more than 133 hand-scale
object categories and collected from various synthetic and
scanned object datasets. In addition to the scale, our dataset
also features high diversity and high physical stability. All
grasps have been examined by force closure and validated
by the Isaac Gym [13] physics simulator, enabling further tasks
in both real-world and simulation environments.
Note that synthesizing diverse high-quality dexterous
grasps at scale is known to be very challenging. For dex-
terous grasping data synthesis, previous works, e.g., DDG,
mainly use GraspIt! [10], which lacks diversity in grasping
poses due to its naive search strategy.

arXiv:2210.02697v2 [cs.RO] 8 Mar 2023

TABLE I: Dexterous Grasp Dataset Comparison

| Dataset | Hand | Observations | Sim./Real | Grasps | Obj. (Cat.) | Grasps per Obj. | Method |
|---|---|---|---|---|---|---|---|
| ObMan [14] | MANO | - | Sim. | 27k | 2772 (8) | 10 | GraspIt! |
| HO3D [15] | MANO | RGBD | Real | 77k | 10 | >7k | Estimation |
| DexYCB [16] | MANO | RGBD | Real | 582k | 20 | >29k | Human annotation |
| ContactDB [17] | MANO | RGBD+thermal | Real | 3750 | 50 | 75 | Capture |
| ContactPose [18] | MANO | RGBD | Real | 2306 | 25 | 92 | Capture |
| DDGdata [9] | ShadowHand | - | Sim. | 6.9k | 565 | >100 | GraspIt! |
| DexGraspNet (Ours) | ShadowHand | - | Sim. | 1.32M | 5355 (133) | >200 | Optimization |

A recent work [19] proposes a novel method that addresses this diversity issue: it devises a differentiable energy term to approximate force closure and then uses it to synthesize diverse and stable grasps via optimization. However, [19] suffers from low yield, slow convergence, and strict constraints on object meshes, making it infeasible for synthesizing a large-scale dataset.
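The force-closure idea behind [19] can be illustrated with a toy surrogate: stack each contact's force and torque into a 6D wrench and measure how close a convex combination of contact wrenches can get to zero. The sketch below is a simplified stand-in, not the exact estimator of [19] (which additionally handles friction cones and differentiates through the hand model); it minimizes the wrench-combination norm over the simplex by projected gradient descent, so a value near zero indicates the contacts can balance one another.

```python
import numpy as np

def wrench(p, n):
    """6D contact wrench: unit force along contact normal n at point p."""
    p = np.asarray(p, dtype=float)
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    return np.concatenate([n, np.cross(p, n)])

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)

def force_closure_energy(points, normals, steps=500, lr=0.1):
    """min over the simplex of ||W lam||^2, a smooth surrogate for force closure.

    Zero means some convex combination of contact wrenches cancels out;
    large values mean the contacts cannot balance each other.
    """
    W = np.stack([wrench(p, n) for p, n in zip(points, normals)], axis=1)
    lam = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(steps):
        lam = project_simplex(lam - lr * 2.0 * W.T @ (W @ lam))
    return float(np.sum((W @ lam) ** 2))

# Two antipodal contacts on a unit sphere nearly cancel ...
good = force_closure_energy([(1, 0, 0), (-1, 0, 0)], [(-1, 0, 0), (1, 0, 0)])
# ... while a single contact cannot balance anything.
bad = force_closure_energy([(1, 0, 0)], [(-1, 0, 0)])
```

Note that this surrogate only tests wrench equilibrium with unit contact forces; the appeal of the formulation in [19] is that such an energy is differentiable with respect to the hand configuration, so gradient descent can drive a random pose toward a stable grasp.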
To achieve our desired diversity, quality, and scale, we pro-
pose several critical improvements to [19], making it much
more efficient and robust. First, we design a better hand pose
initialization strategy and carefully select contact candidates
to boost yield. This reduces the time to synthesize 10,000 valid grasps from 400 GPU hours to 7. Second, we propose
an alternative way to compute penetration energy and signed
distances, which enables us to handle object meshes of much
lower quality, and also highly simplifies their preprocessing
procedures. Third, we introduce energy terms that punish
self-penetration and out-of-limit joint angles to further im-
prove grasp quality. Additionally, with simple modifications,
the entire pipeline can be applied to other dexterous hands,
such as MANO [20] and Allegro.
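The exact energy terms are defined in the method section of the paper; as a flavor of the third improvement, a minimal joint-limit penalty could look like the sketch below (the function name and quadratic form are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def joint_limit_energy(q, q_min, q_max):
    """Quadratic penalty on joint angles that leave [q_min, q_max].

    Zero inside the limits and smoothly increasing outside, so it can be
    added to a gradient-based grasp optimization as a soft constraint.
    """
    q, q_min, q_max = map(np.asarray, (q, q_min, q_max))
    lower = np.minimum(q - q_min, 0.0)   # negative where q < q_min
    upper = np.maximum(q - q_max, 0.0)   # positive where q > q_max
    return float(np.sum(lower**2 + upper**2))

inside = joint_limit_energy([0.1, 0.2], [0.0, 0.0], [1.0, 1.0])    # within limits
outside = joint_limit_energy([-0.1, 1.3], [0.0, 0.0], [1.0, 1.0])  # both violated
```

A self-penetration term can follow the same pattern, penalizing the squared amount by which pairs of hand links overlap.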
To verify the advantage of our dataset over the one from
DDG, we train two dexterous grasping algorithms on our
dataset and DDG. The cross-dataset experiments confirm that
training on our dataset yields better grasping quality and
higher diversity. Also, the great diversity of the hand grasps
from our dataset leaves huge improvement space for future
dexterous grasping algorithms.
II. RELATED WORK
Research in grasping can be broadly categorized by the
types of end effectors involved. The most thoroughly studied
ones are the suction cup and parallel jaw grippers, whose
grasp pose can be defined by a 7D vector at most, including
3D for translation, 3D for rotation, and 1D for the width
between the two fingers. Dexterous robotic hands with three
or more fingers such as ShadowHand [8] and humanoid
hands such as MANO [20] require more complex descriptors,
sometimes with up to 24 DoFs as in ShadowHand [8]. In this paper, we focus on the latter type. To bridge the gap between humanoid hands and robotic hands, numerous studies have shown the efficacy of retargeting humanoid hand poses to dexterous robotic hands [21–24].
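The dimensionality gap described above can be made concrete: a parallel-jaw grasp needs at most a 7D vector, while a dexterous hand adds one value per articulated joint. A minimal sketch (the 24-joint count follows the ShadowHand figure used in this section; exact parametrizations vary by convention):

```python
import numpy as np

# Parallel-jaw grasp: 3D translation + 3D rotation (axis-angle) + jaw width.
parallel_grasp = np.zeros(3 + 3 + 1)                   # 7 values in total

# Dexterous grasp: the same 6D wrist pose plus one angle per hand joint.
SHADOWHAND_JOINTS = 24
dexterous_grasp = np.zeros(3 + 3 + SHADOWHAND_JOINTS)  # 30 values in total
```

The search space a grasp planner must cover grows exponentially with this vector length, which is why dexterous datasets need to be so much larger.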
A. Analytical Grasping
Early research in dexterous grasping focused on optimizing
grasping poses to form force closure that can resist external
forces and torques [25–28].
Due to the complexity of computing hand kinematics and
testing force closure, many works were devoted to simplifying the search space [29–31]. As a result, these methods were applicable only in restricted settings and could produce only limited types of grasping poses. Another stream of work [32–34] seeks to simplify the optimization process with an auxiliary function. [19] proposed to use a differentiable
estimator of the force closure metric to synthesize diverse
grasping poses for arbitrary hands.
B. Data-Driven Grasping
Recent works shift their focus to data-driven methods.
Given an object, the most straightforward approach is to
directly generate the pose vectors of the grasping hand [35–
39]. A refinement step is usually implemented in these
methods to remove inconsistencies such as penetration.
Other methods take an indirect approach that involves gen-
erating an intermediate representation first. Existing methods
use contact points [40–42], contact maps [21, 22, 43–45], and
occupancy fields [46] as the intermediate representations.
The methods then obtain the grasping poses via optimiza-
tion [40, 41, 44, 46], planning [43], RL policies [22, 42], or
another generative model [45].
Compared to most analytical methods, data-driven meth-
ods show improved inference speed and diversity of gener-
ated grasping poses. However, the diversity is still limited
by the training data.
C. Dexterous Grasp Datasets
Dexterous grasps are prohibitively difficult to annotate manually because of the hand's many degrees of freedom. Most existing
works are trained on programmatically synthesized grasping
poses [9, 14, 38, 47] using the GraspIt! [10] planner. The
planner first searches the eigengrasp space for pregrasp poses whose quality crosses a threshold. Then, the planner squeezes all fingers of the selected pregrasp poses to construct a firm grasp. Since the initial search is performed in the low-dimensional eigengrasp space, the resulting data follows a
narrow distribution and cannot cover the full dexterity of
multi-finger hands.
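The eigengrasp idea can be sketched numerically: PCA over a corpus of full hand configurations yields a few principal directions, and GraspIt!-style planners search only the low-dimensional coefficients. The sketch below uses random stand-in data rather than real hand poses; it shows why the resulting grasps follow a narrow distribution, since every sampled pose is confined to a k-dimensional affine subspace of the full joint space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a corpus of recorded hand configurations (24 joints each).
poses = rng.normal(size=(500, 24))

# PCA: the top-k principal directions of the pose corpus.
mean = poses.mean(axis=0)
_, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
k = 2                       # eigengrasp planners typically use very few
basis = vt[:k]              # shape (k, 24), rows are orthonormal directions

# The planner searches only the k coefficients z, not all 24 joints:
z = rng.normal(size=k)
q = mean + z @ basis        # a full 24-DoF pose from a 2D search space
```

Every pose q generated this way lies in the same 2D slice of joint space, which is the source of the limited diversity noted above.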
More recent works leverage advances in computer vision to capture human hand poses while interacting with objects. HO3D [15, 48] computes the ground
truth 3D hand pose for images from 2D hand keypoint
annotations. The method resolves ambiguities by considering
physics constraints in hand-object interactions and hand-hand
interactions. DexYCB [16] and ContactPose [18] recover the 3D hand shape from multi-view RGBD camera recordings.
The latest datasets [49–51] use optical motion capture systems to
track hand and object shapes during interactions. While these