GraspCaps: A Capsule Network Approach for Familiar 6DoF Object Grasping
Tomas van der Velde1, Hamed Ayoobi2, Hamidreza Kasaei1*
1Department of Artificial Intelligence, University of Groningen, The Netherlands
2Department of Computing, Imperial College London, United Kingdom
Abstract
As robots become more widely available outside indus-
trial settings, the need for reliable object grasping and ma-
nipulation is increasing. In such environments, robots must
be able to grasp and manipulate novel objects in various
situations. This paper presents GraspCaps, a novel archi-
tecture based on Capsule Networks for generating per-point
6D grasp configurations for familiar objects. GraspCaps
extracts a rich feature vector of the objects present in the
point cloud input, which is then used to generate per-point
grasp vectors. This approach allows the network to learn
specific grasping strategies for each object category. In ad-
dition to GraspCaps, the paper also presents a method for
generating a large object-grasping dataset using simulated
annealing. The obtained dataset is then used to train the
GraspCaps network. Through extensive experiments, we
evaluate the performance of the proposed approach, par-
ticularly in terms of the success rate of grasping familiar
objects in challenging real and simulated scenarios. The ex-
perimental results showed that the overall object-grasping
performance of the proposed approach is significantly bet-
ter than the selected baseline. This superior performance
highlights the effectiveness of GraspCaps in achieving
successful object grasping across various scenarios.
1. Introduction
Robots are becoming increasingly accessible to the pub-
lic, finding applications in various non-industrial settings
such as homes, hospitals, and shopping malls. This grow-
ing accessibility highlights the importance of developing
reliable object grasping and manipulation capabilities, as
robots must interact with a diverse range of novel objects
in dynamic and unforeseen environments (see Fig. 1). A
significant portion of recent research on object grasping has
concentrated on addressing 4 Degrees of Freedom (4DoF)
challenges, specifically in achieving object-agnostic grasp-
ing. In such approaches, the gripper is typically oriented
to approach objects from an overhead perspective (i.e., top-down grasp).

*Corresponding author: hamidreza.kasaei@rug.nl

Figure 1. In this illustrative scenario, our dual-arm robot is instructed to perform a clear-table task. The operational cycle involves processing input point cloud data, predicting a reliable grasp configuration for each point, and subsequently executing the grasping action to transfer the object into the basket.

However, these methods exhibit notable limi-
tations: (i) they do not take into account the semantic func-
tion or label of the object, and (ii) they inherently restrict the
range of interaction possibilities with the object. For exam-
ple, they cannot distinguish between different objects and
struggle to grasp horizontally positioned items like plates.
These limitations motivate the study of “learning to grasp
familiar objects”, where the robot can recognize the label of
the object, and its gripper is free to approach objects from
any arbitrary direction it can reach. This approach aims to
enhance the versatility and adaptability of robotic grasping
in real-world scenarios.
Towards this end, we formulate object grasping as a
supervised learning problem based on a capsule network
to grasp familiar objects. The primary assumption is that
new objects that are geometrically similar to known objects
can be grasped in similar ways using object-aware grasp-
ing [9,27]. Object-aware grasping allows the network to
specifically sample grasps based on the geometry of the ob-
ject, as opposed to object-agnostic grasping, which gener-
ates grasp configurations based on the input to the network
without any deeper knowledge of the features of the object
it is attempting to grasp. There are several reasons why
capsule networks are superior to Convolutional Neural Net-
works (CNNs) and Vision Transformers (ViT) for grasping
objects [5,32]. Their ability to capture spatial hierarchies
and part-whole relationships within objects is one of their
most notable advantages. Unlike CNNs and ViT, capsule
networks are designed to be geometrically invariant, which
means they are robust to changes in scale, orientation, and
position. Geometric invariance is a crucial property in real-
world robotic scenarios where objects are placed in diverse
poses. Moreover, capsule networks employ dynamic rout-
ing mechanisms, allowing for more flexible information
flow between layers. This dynamic routing enables capsules
to detect spatial relationships and contributes to a richer un-
derstanding of the input data [25]. In the context of object
grasping, capsule networks enable object-aware grasping
by generating grasp configurations based on object features
and geometry. Building upon this concept, we have devel-
oped a novel architecture, GraspCaps, that takes as input
a point cloud representation of an object and generates as
outputs a semantic category label and per-point grasp con-
figurations. Our approach utilizes the activation of a single
capsule in the capsule network and processes this activa-
tion to produce per-point grasp vectors and corresponding
fitness values. To the best of our knowledge, GraspCaps
represents the first instance of a grasp network architecture
that employs a capsule network for object-aware grasping.
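To make the routing mechanism referenced above concrete, the following is a minimal, illustrative sketch of routing-by-agreement as described in [25]; the capsule counts and dimensions are placeholders and do not reflect the exact GraspCaps configuration.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Non-linear squashing: preserves vector orientation, maps the norm into [0, 1)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (batch, num_in_capsules, num_out_capsules, out_dim), i.e. the
    'votes' of lower-level capsules for each higher-level capsule.
    Returns the higher-level capsule activations: (batch, num_out_capsules, out_dim).
    """
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(num_iterations):
        c = F.softmax(b, dim=2)                              # coupling coefficients per input capsule
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum of votes
        v = squash(s)                                        # (batch, num_out, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement update
    return v

# Illustrative shapes only: 32 lower-level capsules voting for 8 capsules of dimension 16.
votes = torch.randn(4, 32, 8, 16)
object_caps = dynamic_routing(votes)
print(object_caps.shape)  # torch.Size([4, 8, 16])
```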
The contributions of this paper can be summarized as:
• This paper presents a novel architecture for object-aware grasping that utilizes a capsule network to process a point cloud representation of an object and generate a corresponding semantic category label along with point-wise grasp synthesis. This marks the first instance of a grasping model using a capsule network.
• We propose an algorithm for generating 6D grasp vectors from point clouds and creating a synthetic grasp dataset consisting of 4,576 samples with corresponding object labels and target grasp vectors (a generic sketch of this idea is given after this list).
• To rigorously evaluate the effectiveness of the proposed approach, we conducted a comprehensive series of experiments in both simulation and real-robot setups.
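Although the full dataset-generation procedure appears later in the paper, the following is a generic simulated-annealing skeleton for refining a candidate 6D grasp pose against a fitness function. The pose parameterization, perturbation magnitudes, cooling schedule, and the dummy fitness function are illustrative assumptions rather than the exact method used to build our dataset.

```python
import math
import random

import numpy as np

def perturb(pose, translation_sigma=0.01, rotation_sigma=0.1):
    """Randomly perturb a 6D grasp pose given as (x, y, z, roll, pitch, yaw)."""
    noise = np.concatenate([
        np.random.normal(0.0, translation_sigma, 3),
        np.random.normal(0.0, rotation_sigma, 3),
    ])
    return pose + noise

def anneal_grasp(initial_pose, fitness_fn, iterations=1000, t_start=1.0, t_end=1e-3):
    """Generic simulated annealing over a 6D grasp pose.

    fitness_fn(pose) -> float in [0, 1]; higher is better. The fitness
    evaluation (e.g., collision or antipodal checks against the point cloud)
    is a stand-in here, not the paper's exact criterion.
    """
    pose = np.asarray(initial_pose, dtype=float)
    score = fitness_fn(pose)
    best_pose, best_score = pose.copy(), score
    for i in range(iterations):
        # Exponential cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / max(iterations - 1, 1))
        candidate = perturb(pose)
        cand_score = fitness_fn(candidate)
        # Always accept improvements; accept worse poses with a temperature-dependent probability.
        if cand_score > score or random.random() < math.exp((cand_score - score) / t):
            pose, score = candidate, cand_score
            if score > best_score:
                best_pose, best_score = pose.copy(), score
    return best_pose, best_score

# Hypothetical usage with a dummy fitness that prefers poses near a target pose.
target = np.array([0.4, 0.0, 0.2, 0.0, 0.0, 0.0])
dummy_fitness = lambda p: float(np.exp(-np.linalg.norm(np.asarray(p) - target)))
pose, score = anneal_grasp(np.zeros(6), dummy_fitness)
```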
2. Related Work
Deep learning-based object grasping methods provide
enhanced accuracy and adaptability, reduced dependency
on manual engineering, and improved robustness to vari-
ability in real-world scenarios [20]. Current approaches that
process point cloud data can be split up into two categories:
(i) approaches that first transform the point cloud into a dif-
ferent data structure [1,5,26], and (ii) approaches that directly
process the point cloud [14,21,22,28,33]. Our method
falls into the second category.
Processing the point set directly has several advantages: no overhead is added by transforming the point set, and no information can be lost in a conversion. However, point sets are by definition unordered,
which makes extracting local structures and identifying
similar regions non-trivial. PointNet [21] was one of the
first architectures to effectively use point set data for train-
ing a neural network in an object recognition task. By de-
sign, the PointNet architecture is mostly invariant to point
order, which benefits point sets since extracting a natural or-
der from these sets is non-trivial. However, this does limit
the performance of PointNet as it cannot recognize local
structures in point sets. Prior research has illustrated the importance of data order for the performance of neural networks [31]; hence, order should not be fully disregarded. PointNet++ [22] improves upon PointNet by recognizing local structures in the data. Our network architecture is based in part on the architecture used by [3], whose key insight is to split the PointNet architecture into several distinct modules.
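As an aside, the order invariance discussed above stems from applying a shared per-point network followed by a symmetric aggregation function. The sketch below illustrates this idea in the spirit of PointNet [21]; the layer sizes are arbitrary and unrelated to our architecture.

```python
import torch
import torch.nn as nn

class PointFeatureEncoder(nn.Module):
    """Minimal PointNet-style encoder: a shared per-point MLP followed by a
    symmetric max-pool, so the global feature is invariant to point order."""

    def __init__(self, in_dim=3, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, points):                   # points: (batch, num_points, 3)
        per_point = self.mlp(points)             # same weights applied to every point
        global_feat, _ = per_point.max(dim=1)    # symmetric aggregation over points
        return global_feat                       # (batch, feat_dim)

# Permuting the points leaves the global feature unchanged.
enc = PointFeatureEncoder()
cloud = torch.randn(2, 1024, 3)
perm = cloud[:, torch.randperm(1024), :]
print(torch.allclose(enc(cloud), enc(perm)))  # True
```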
Later research showed successful results on point sets by transforming them so that they can be processed by a convolutional neural network. PointCNN [14] processes the input data by applying a χ-transform to the point set.
DGCNN [33] and Point-GNN [28] employ layer architec-
tures that transform the point set into a graph representation
and apply convolution to the resulting graph edges. Sev-
eral approaches have been successful in processing point
sets using a CNN by first transforming the point set into
a more regular data structure, such as a 3D voxel grid [1],
top-down view [13,17,26], or multi-view 2D images [5].
The resulting data structures can be processed with existing
deep neural network architectures. These conversions come with significant limitations, however, as considerable information is lost when converting the point cloud to a different structure, whether that is the loss of natural point densities when converting to a voxel grid or the loss of spatial relations between points when converting to a top-down image. Additionally, the generated voxel grids might be more voluminous than the original point set, as it is likely that many of the voxels remain empty [21]. Due to these considerations, we designed the GraspCaps architecture to process the point cloud directly. Moreover, our method utilizes capsule activations to generate per-point grasp configurations. The intricate understanding of spatial hierarchies afforded by capsule networks distinguishes GraspCaps as a pioneering solution for object grasping. Unlike the reviewed approaches, our approach avoids the excessive pooling layers employed in CNN architectures, which can result in a loss of detailed spatial information.
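To give a rough sense of the sparsity issue noted above, the short sketch below voxelizes a synthetic object surface and reports the fraction of occupied voxels; the object shape, resolution, and point count are arbitrary choices for illustration only.

```python
import numpy as np

def voxel_occupancy(points, voxel_size=0.01):
    """Voxelize a point cloud and report the fraction of occupied voxels
    inside its bounding box, illustrating how sparse such grids tend to be."""
    mins = points.min(axis=0)
    idx = np.floor((points - mins) / voxel_size).astype(int)   # voxel index per point
    grid_dims = idx.max(axis=0) + 1                            # bounding-box grid size
    occupied = len(np.unique(idx, axis=0))                     # distinct occupied voxels
    total = int(np.prod(grid_dims))
    return occupied / total

# A synthetic spherical shell (a hollow object surface) sampled at 1 cm resolution.
dirs = np.random.randn(5000, 3)
surface = 0.1 * dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
print(f"occupied voxel fraction: {voxel_occupancy(surface):.3f}")  # most voxels stay empty
```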
In the field of grasp generation, S4G [23] extended the
PointNet architecture to generate 6D grasps based on the
input point set. Grasp pose detection (GPD) [29] was de-
veloped to generate and evaluate the fitness of grasps. It
takes a point cloud as its input and generates grasps, which are then filtered by fitness. The network then classifies each grasp candidate as either successful or unsuccessful. Point-
NetGPD [16] builds upon the idea of GPD and expands on
it by employing the PointNet architecture to evaluate the