networks are designed to be geometrically invariant, meaning they are robust to changes in scale, orientation, and position. Geometric invariance is a crucial property in real-world robotic scenarios, where objects appear in diverse poses. Moreover, capsule networks employ a dynamic routing mechanism that allows more flexible information flow between layers. This dynamic routing enables capsules to detect spatial relationships and contributes to a richer understanding of the input data [25] (a minimal sketch of the routing procedure is given below). In the context of object grasping, capsule networks enable object-aware grasping by generating grasp configurations based on object features and geometry. Building on this concept, we have developed a novel architecture, GraspCaps, which takes as input a point cloud representation of an object and outputs a semantic category label together with per-point grasp configurations. Our approach processes the activation of a single capsule in the capsule network to produce per-point grasp vectors and corresponding fitness values. To the best of our knowledge, GraspCaps is the first grasp network architecture that employs a capsule network for object-aware grasping.
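To make the dynamic routing mechanism concrete, the following minimal NumPy sketch implements the routing-by-agreement procedure of [25]. It assumes the prediction vectors from the lower capsule layer have already been computed (the learned transformation matrices are omitted), and all names and sizes are illustrative rather than taken from our implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity: shrinks vector length into [0, 1) while preserving direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def routing_by_agreement(u_hat, num_iters=3):
    """Routing-by-agreement over precomputed predictions.
    u_hat: predictions of lower capsules for each output capsule,
           shape (num_in, num_out, dim_out).
    Returns output capsule vectors of shape (num_out, dim_out)."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                            # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coefficients (softmax)
        s = (c[..., None] * u_hat).sum(axis=0)                 # weighted sum of predictions
        v = squash(s)                                          # candidate output capsules
        b += np.einsum('iod,od->io', u_hat, v)                 # raise logits where predictions agree
    return v

# toy usage: 8 input capsules routed to 3 output capsules of dimension 4
v = routing_by_agreement(np.random.randn(8, 3, 4))
print(v.shape)  # (3, 4)
```

The agreement term increases the routing logit of an output capsule whenever a lower capsule's prediction aligns with that capsule's current output, so information flows preferentially along mutually consistent paths.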
The contributions of this paper can be summarized as:
• We present a novel architecture for object-aware grasping that uses a capsule network to process a point cloud representation of an object and generate a corresponding semantic category label along with point-wise grasp synthesis. To the best of our knowledge, this is the first grasping model built on a capsule network.
• We propose an algorithm for generating 6D grasp vectors from point clouds and use it to create a synthetic grasp dataset of 4,576 samples with corresponding object labels and target grasp vectors.
• We conduct a comprehensive series of experiments in both simulation and real-robot setups to rigorously evaluate the effectiveness of the proposed approach.
2. Related Work
Deep learning-based object grasping methods provide enhanced accuracy and adaptability, reduced dependence on manual engineering, and improved robustness to variability in real-world scenarios [20]. Current approaches that process point cloud data can be divided into two categories: (i) approaches that first transform the point cloud into a different data structure [1, 5, 26], and (ii) approaches that process the point cloud directly [14, 21, 22, 28, 33]. Our method falls into the second category.
Processing the point set directly has several advantages: no overhead is added by transforming the point set, and no information is lost in the conversion. However, point sets are by definition unordered, which makes extracting local structures and identifying similar regions non-trivial. PointNet [21] was one of the first architectures to effectively use point set data for training a neural network on an object recognition task. By design, the PointNet architecture is largely invariant to point order, which suits point sets since extracting a natural order from them is non-trivial (a minimal sketch of this permutation-invariant design follows below). However, this also limits the performance of PointNet, as it cannot recognize local structures in point sets. Prior research has illustrated the importance of input order for neural network performance [31]; hence, order should not be disregarded entirely. PointNet++ [22] improves upon PointNet by recognizing local structures in the data. Our network architecture is based in part on the architecture of [3], whose key insight is to split the PointNet architecture into several distinct modules.
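The permutation invariance described above can be illustrated in a few lines: a function applied to each point independently, followed by a symmetric aggregation such as max pooling, produces the same global feature for every ordering of the input. The sketch below is a schematic simplification with random illustrative weights, not the actual PointNet model.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((3, 64)), rng.standard_normal((64, 128))

def global_feature(points):
    """Shared per-point MLP followed by max pooling (a symmetric function).
    points: (N, 3) array; returns a (128,) global feature independent of point order."""
    h = np.maximum(points @ W1, 0)   # shared layer 1 + ReLU, applied per point
    h = np.maximum(h @ W2, 0)        # shared layer 2 + ReLU
    return h.max(axis=0)             # max over points: order-invariant aggregation

cloud = rng.standard_normal((1024, 3))
shuffled = rng.permutation(cloud)    # same points, different order
assert np.allclose(global_feature(cloud), global_feature(shuffled))
```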
Later research showed successful results on point sets by transforming them into a form that can be processed by a convolutional neural network. PointCNN [14] processes the input data by applying a χ-transform to the point set. DGCNN [33] and Point-GNN [28] employ layer architectures that transform the point set into a graph representation and apply convolution to the resulting graph edges (sketched below).
Several approaches have succeeded in processing point sets with a CNN by first transforming the point set into a more regular data structure, such as a 3D voxel grid [1], a top-down view [13, 17, 26], or multi-view 2D images [5]. The resulting data structures can be processed with existing deep neural network architectures. These conversions come with significant limitations, however, as considerable information is lost when converting the point cloud to a different structure: natural point densities are lost when converting to a voxel grid, and spatial relations between points are lost when converting to a top-down image. Additionally, the generated voxel grid may be more voluminous than the original point set, as many of its voxels are likely to remain empty [21] (see the sketch below). Due to these considerations, we designed the GraspCaps architecture to process the point cloud directly. Moreover, our method utilizes capsule activations to generate per-point grasp configurations. The understanding of spatial hierarchies afforded by capsule networks distinguishes GraspCaps as a pioneering solution for object grasping. Unlike the reviewed approaches, our approach avoids the excessive pooling layers employed in CNN architectures, which can result in a loss of detailed spatial information.
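The sparsity argument can be made concrete with a small sketch: voxelizing a surface-like point cloud typically occupies only a few percent of the grid cells, and a binary occupancy grid additionally discards how many points fell into each cell. The grid resolution and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# illustrative cloud: 2048 points sampled on a unit sphere surface
pts = rng.standard_normal((2048, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

res = 32                                          # 32^3 voxel grid
idx = ((pts + 1) / 2 * (res - 1)).astype(int)     # map [-1, 1]^3 to voxel indices
grid = np.zeros((res, res, res), dtype=bool)
grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True      # binary occupancy

occupied = grid.sum()
print(f"{occupied} of {res**3} voxels occupied "
      f"({100 * occupied / res**3:.1f}%)")        # typically only a few percent
print(f"occupied cells hold {2048 / occupied:.1f} points on average; "
      "binary occupancy discards this density information")
```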
In the field of grasp generation, S4G [23] extended the
PointNet architecture to generate 6D grasps based on the
input point set. Grasp pose detection (GPD) [29] generates grasp candidates from an input point cloud and evaluates their fitness, classifying each candidate as either successful or unsuccessful. PointNetGPD [16] builds upon the idea of GPD and expands on it by employing the PointNet architecture to evaluate the