
Learning the Sequence of Packing Irregular Objects from Human
Demonstrations: Towards Autonomous Packing Robots
André Santos1, Nuno Ferreira Duarte1, Atabak Dehban1 and José Santos-Victor1
Abstract— We tackle the challenge of robotic bin packing with irregular objects, such as groceries. Given the diverse physical attributes of these objects and the complex constraints governing their placement and manipulation, pre-programmed strategies become unfeasible. Our approach is to learn directly from expert demonstrations in order to extract implicit task knowledge and strategies that ensure safe object positioning, efficient use of space, and the generation of human-like behaviors that enhance human-robot trust. We rely on human demonstrations to learn a Markov chain for predicting the object packing sequence for a given set of items, and then compare it with human performance. Our experimental results show that the model outperforms humans, generating sequence predictions that observers classify as human-like more frequently than human-generated sequences. The human demonstrations were collected with our proposed VR platform, “BoxED”, a box packaging environment that simulates real-world objects and scenarios for fast and streamlined data collection aimed at teaching robots. We collected data from 43 participants packing a total of 263 boxes with supermarket-like objects, yielding 4644 object manipulations. Our VR platform can be easily adapted to new scenarios and objects, and is publicly available, alongside our dataset, at https://github.com/andrejfsantos4/BoxED.
I. INTRODUCTION
The ceaseless digitalization of the modern world has steadily transformed many aspects of our daily lives, including the grocery shopping experience. In recent years, we have observed a pronounced migration from physical retail spaces to the online realm, with companies such as Ocado [1], a dedicated online grocery retailer, reporting 2.5 billion GBP of revenue in 2022. This trend brings with it the intricate logistical challenge of efficiently packing all orders into shipping containers.
Simultaneously, robots have been steadily integrating into both our workplaces and homes. The so-called “cobots” [2] have attracted considerable research and development attention, as illustrated by the introduction of Amazon's new Astro robot [3] and the Temi robot [4]. Besides the critical need to ensure safe operation while interacting with humans, robots should also behave in as human-like a manner as possible, so that their actions can be understood and anticipated by their interaction partners, providing users with an improved interaction experience.
*This work was supported by the Fundação para a Ciência e a Tecnologia (FCT) through the ISR/LARSyS Associated Laboratory UID/EEA/50009/2020, LA/P/0083/2020.
1All the authors are affiliated with the Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal (e-mail: andrejfsantos@tecnico.ulisboa.pt, {nferreiraduarte, adehban, jasv}@isr.tecnico.ulisboa.pt).
Fig. 1. The demonstrator performs the box packing task in virtual reality.
The task that motivated our work lies at the intersection of these two rapidly expanding fields: a robot that packs groceries alongside, and learns from, a human collaborator. Several hardware solutions have been developed specifically to enhance human-robot collaboration, featuring special adaptations such as flexible joints. However, there is very little literature on how to pack heterogeneous objects such as those found in a supermarket, and even fewer publicly available datasets related to this task. Existing methods that address packing objects inside a container mostly consider objects with simple shapes (such as cuboids and spheres) and often deploy search-based algorithms to determine the best placement pose, requiring multiple hours to find a solution [5], [6]. Methods that predict a packing sequence given a set of objects (for instance, the objects in an online order) are equally sparse [7]. Our contributions are twofold:
1) A model trained on our dataset that predicts the correct packing sequence, extracting implicit task knowledge directly from humans. The generated sequences lead to safe and efficient packing and were classified as “human-like” by users (a minimal illustrative sketch follows this list).
2) A publicly available VR platform (BoxED) for fast and streamlined data collection, see Fig. 1. BoxED can be adapted to new box packaging scenarios with any object dataset, and it collects 6-DOF pick-and-place grasp poses, object trajectories, packing sequences, and the objects' poses inside the box. In addition to BoxED, we created the first publicly available collection of human demonstrations of packing groceries into a box.
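To make contribution 1 concrete, the following is a minimal sketch (in Python) of how a first-order Markov chain over object categories could be fitted to demonstrated packing sequences and then used to greedily order a new set of items. The category names, the greedy decoding scheme, and the helper names fit_transitions and predict_sequence are illustrative assumptions for exposition only, not our exact implementation, which is detailed in Section IV.

# Illustrative sketch only: a first-order Markov chain over object
# categories, fitted to demonstrated packing sequences. Names and the
# greedy decoding are assumptions, not the paper's exact method.
from collections import defaultdict

START = "<start>"  # virtual state preceding the first packed object

def fit_transitions(sequences):
    """Count category-to-category transitions across all demonstrations."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        prev = START
        for category in seq:
            counts[prev][category] += 1
            prev = category
    # Normalize counts into transition probabilities.
    return {prev: {cat: c / sum(nxt.values()) for cat, c in nxt.items()}
            for prev, nxt in counts.items()}

def predict_sequence(transitions, items):
    """Greedily order a multiset of items by transition probability."""
    remaining = list(items)
    sequence, prev = [], START
    while remaining:
        probs = transitions.get(prev, {})
        # Pick the remaining item the chain deems most likely to come
        # next; transitions never seen in the data fall back to 0.
        nxt = max(remaining, key=lambda cat: probs.get(cat, 0.0))
        sequence.append(nxt)
        remaining.remove(nxt)
        prev = nxt
    return sequence

# Hypothetical demonstrations in which rigid items are packed first.
demos = [["bottle", "can", "bread", "chips"],
         ["can", "bottle", "chips", "bread"]]
model = fit_transitions(demos)
print(predict_sequence(model, ["chips", "bottle", "bread", "can"]))

In this scheme, each demonstration contributes transition counts between consecutively packed categories; prediction then repeatedly selects the not-yet-packed item with the highest transition probability from the previously packed one.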
Section II addresses related research, Section III describes
the dataset, Section IV introduces our packing sequence
prediction model, and Section VI presents final remarks.
II. RELATED WORK
Learning from Demonstrations (LfD). LfD consists of learning how to execute new tasks and their constraints by observing demonstrations performed by an expert.