Learning the Sequence of Packing Irregular Objects from Human
Demonstrations: Towards Autonomous Packing Robots
André Santos1, Nuno Ferreira Duarte1, Atabak Dehban1 and José Santos-Victor1
Abstract—We tackle the challenge of robotic bin packing with irregular objects, such as groceries. Given the diverse physical attributes of these objects and the complex constraints governing their placement and manipulation, employing pre-programmed strategies becomes unfeasible. Our approach is to learn directly from expert demonstrations in order to extract implicit task knowledge and strategies to ensure safe object positioning, efficient use of space, and the generation of human-like behaviors that enhance human-robot trust. We rely on human demonstrations to learn a Markov chain for predicting the object packing sequence for a given set of items and then compare it with human performance. Our experimental results show that the model outperforms human performance by generating sequence predictions that humans classify as human-like more frequently than human-generated sequences. The human demonstrations were collected using our proposed VR platform, “BoxED”, which is a box packaging environment for simulating real-world objects and scenarios for fast and streamlined data collection with the purpose of teaching robots. We collected data from 43 participants packing a total of 263 boxes with supermarket-like objects, yielding 4644 object manipulations. Our VR platform can be easily adapted to new scenarios and objects, and is publicly available, alongside our dataset, at https://github.com/andrejfsantos4/BoxED.
I. INTRODUCTION
The ceaseless digitalization of the modern world has steadily transformed many aspects of our daily lives, including the grocery shopping experience. In recent years, we have observed a pronounced migration from physical retail spaces to the online realm, with companies such as Ocado [1], a dedicated online grocery retailer, reporting 2.5 billion GBP of revenue in 2022. This growth brings with it the intricate logistical challenge of efficiently packing every order into shipping containers.
Simultaneously, robots have been steadily integrating into both our workplaces and homes. The so-called “cobots” [2] have attracted a considerable amount of research and development attention, with the introduction of Amazon’s new Astro robot [3] and Temi robot [4]. Besides the critical need to ensure safe operation while interacting with humans, robots must also act in as human-like a manner as possible, so that their actions can be understood and anticipated by their interaction partners, providing users with an improved interaction experience.
*This work was supported by the Fundação para a Ciência e a Tecnologia (FCT) through the ISR/LARSyS Associated Laboratory UID/EEA/50009/2020, LA/P/0083/2020.
1All the authors are affiliated with the Institute for Systems and Robotics, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal (e-mail: andrejfsantos@tecnico.ulisboa.pt, {nferreiraduarte, adehban, jasv}@isr.tecnico.ulisboa.pt).
Fig. 1. The demonstrator performs the box packing task in virtual reality.
The task that motivated our work lies at the intersection of these two rapidly expanding fields: a robot that packs groceries alongside, and learns from, a human collaborator. Several hardware solutions have been developed specifically to enhance human-robot collaboration, featuring special adaptations such as flexible joints. However, there is very little literature on how to pack heterogeneous objects such as those found in a supermarket, and there are even fewer publicly available datasets related to this task. Existing methods that address packing objects inside a container essentially consider simple-shaped objects (such as cuboids and spheres) and often deploy search-based algorithms to determine the best placement pose, requiring multiple hours to find a solution [5], [6]. Methods that predict a packing sequence given a set of objects (for instance, the objects in an online order) are equally scarce [7]. Our contributions are twofold:
1) A model trained on our dataset that predicts the correct packing sequence, extracting implicit task knowledge directly from humans. The generated sequences lead to safe and efficient packing and were classified as “human-like” by users.
2) A publicly available VR platform (BoxED) for fast and streamlined data collection, see Fig. 1. BoxED can be adapted to new box packaging scenarios with any object dataset, and it collects 6-DOF pick-and-place grasp poses, object trajectories, packing sequences, and the objects’ poses inside the box. In addition to BoxED, we created the first publicly available collection of human demonstrations of packing groceries into a box.
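As a concrete illustration of the first contribution, a Markov chain over packing transitions can be fitted to demonstrated sequences and then queried greedily. The sketch below is a minimal version of this idea with invented object categories and demonstrations, not the actual model trained on BoxED:

```python
# Minimal Markov-chain sequence predictor: estimate category-to-category
# transition counts from demonstrated packing sequences, then greedily
# pick the most probable next item among those still to be packed.
from collections import defaultdict

def fit_transitions(sequences):
    """Count transitions between consecutive items (with a START state)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        prev = "START"
        for item in seq:
            counts[prev][item] += 1
            prev = item
    return counts

def predict_sequence(counts, items):
    """Greedily order `items` by the most frequent observed transition."""
    remaining = list(items)
    order, prev = [], "START"
    while remaining:
        nxt = max(remaining, key=lambda it: counts[prev].get(it, 0))
        order.append(nxt)
        remaining.remove(nxt)
        prev = nxt
    return order

# Invented demonstrations: heavy/rigid items first, fragile items last.
demos = [["cereal", "can", "apple"],
         ["cereal", "can", "apple"],
         ["cereal", "apple", "can"]]
model = fit_transitions(demos)
print(predict_sequence(model, ["apple", "can", "cereal"]))
# → ['cereal', 'can', 'apple']
```

Ties and unseen transitions are resolved arbitrarily here; the actual model and its evaluation are described in Section IV.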
Section II addresses related research, Section III describes
the dataset, Section IV introduces our packing sequence
prediction model, and Section VI presents final remarks.
II. RELATED WORK
arXiv:2210.01645v2 [cs.RO] 8 Nov 2023

Learning from Demonstrations (LfD). LfD consists of
learning how to execute new tasks and their constraints by
observing an expert performing them [8]. This paradigm has
attracted increasing attention from the robotics community
because it circumvents predefined behaviors and introduces
more flexibility in robotic learning. These methods require
tracking the human’s actions, which is not a trivial task due to
rapid movements, occlusions, and observation noise [9], [10].
Munzer et al. [11] used a variation of Markov Decision
Processes along with first-order logic in a system for a
collaborative robot that learns tasks and human preferences
before and during execution. A major challenge in LfD is
how to encode the demonstration in a useful format for
training models. In some works, participants wear a motion
capture suit [12], or the demonstrations occur in simulation
[13], while others use complex pose estimators [14].
Bin Packing. Bin packing is the task of packing a set of
objects into a bin with maximum space efficiency. In general,
there are two categories of methods: those based on search
strategies and those that use unsupervised learning methods
such as reinforcement learning (RL). The first category offers
the advantage of rapid deployment due to the absence of a
learning phase, but the majority of cases suffer from slow
execution times (ranging from dozens of seconds to a few
hours [5], [6]). Conversely, methods based on RL or Deep RL
(DRL) [15], [16] require complex training schemes but have
faster execution times. Our approach aims to overcome a
strong limitation in both categories of methods, as they often
consider solely cuboid objects (otherwise the search and
action spaces would grow drastically) and ignore properties
such as fragility.
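To make the cuboid-only restriction concrete, here is a toy search-based placement heuristic in the spirit of this first category (it is not the algorithm of [5] or [6]): axis-aligned boxes are sorted by volume and scanned over a coarse grid for the first collision-free floor position. The simple rectangle-overlap test is precisely what stops working for irregular shapes:

```python
# Toy grid-search placement for axis-aligned cuboids on a container floor.
def overlaps(a, b):
    # a, b: (x, y, w, d) footprints; axis-aligned rectangle overlap test.
    return not (a[0] + a[2] <= b[0] or b[0] + b[2] <= a[0] or
                a[1] + a[3] <= b[1] or b[1] + b[3] <= a[1])

def place_boxes(boxes, bin_w, bin_d, step=1):
    """boxes: list of (w, d, h); returns (w, d, h, x, y) placements."""
    placed = []
    # Largest-volume-first ordering, a common search heuristic.
    for w, d, h in sorted(boxes, key=lambda b: b[0] * b[1] * b[2],
                          reverse=True):
        for y in range(0, bin_d - d + 1, step):
            for x in range(0, bin_w - w + 1, step):
                cand = (x, y, w, d)
                if all(not overlaps(cand, (px, py, pw, pd))
                       for pw, pd, ph, px, py in placed):
                    placed.append((w, d, h, x, y))
                    break
            else:
                continue  # no free x at this y; try the next row
            break
    return placed

print(place_boxes([(2, 2, 1), (2, 2, 1)], bin_w=4, bin_d=4))
# → [(2, 2, 1, 0, 0), (2, 2, 1, 2, 0)]
```

Even this crude variant scans O(grid cells × placed boxes) per object; allowing rotations or non-cuboid geometry multiplies the search space, which is the scaling problem noted above.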
Sequence Prediction. Various types of sequence prediction techniques exist, including sequence-to-sequence and sequence classification. However, our specific emphasis lies in predicting the subsequent element within a sequence, which encompasses tasks like time series forecasting and product recommendation. Algorithms designed to tackle this category of problems exhibit a broad spectrum of approaches, from explicit association rules [17] and pattern mining algorithms [18] to deep-learning-based approaches that learn implicit representations. These algorithms search for subsequences that appear often in the data and thus contain important sequential or associative information to predict the next element.
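A minimal sketch of the pattern-mining flavour of this idea: count the successors of each observed fixed-length context and predict the most frequent one. The training sequences below are invented purely for illustration:

```python
# Next-element prediction from frequent fixed-length contexts.
from collections import Counter, defaultdict

def mine_successors(sequences, context_len=2):
    """Map each length-`context_len` subsequence to its successor counts."""
    succ = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - context_len):
            ctx = tuple(seq[i:i + context_len])
            succ[ctx][seq[i + context_len]] += 1
    return succ

def predict_next(succ, history, context_len=2):
    """Predict the most frequent successor of the most recent context."""
    ctx = tuple(history[-context_len:])
    if not succ[ctx]:
        return None  # this context was never observed
    return succ[ctx].most_common(1)[0][0]

seqs = [list("abcd"), list("abce"), list("abcd")]
table = mine_successors(seqs)
print(predict_next(table, ["a", "b", "c"]))  # → d
```

Deep-learning approaches replace the explicit count table with a learned representation but serve the same next-element query.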
Related Datasets. Perhaps most similar to our work is the dataset proposed by Song et al. [19], consisting of videos of eight participants performing pick-and-place tasks in cluttered environments. The videos are annotated with the gripper trajectory (before the grasp and during the object manipulation), grasp pose, picking order, and object mask. Although more encompassing than most robotic grasping datasets, this dataset is not related to bin packing and as such does not include the packing order, the placement inside a container, or the object pose, since pose estimation in a real environment is not trivial.
III. THE BOXED PLATFORM
A. The Virtual Environment
A virtual reality (VR) environment was created in Unity [20], illustrated in Figure 2, consisting of a circular area where the user can move freely and a table at its center. The task takes place on the table: the user should pack the available objects into a box1. We use 24 different objects, most chosen from the YCB dataset [21] and some obtained from public object model platforms.
Fig. 2. The environment and robot gripper created for packing groceries.
The participants interact with the virtual world via the
physical controller. The pose of the controller is mimicked
by the virtual gripper and a pressure-sensing button controls
the closing and opening of the virtual gripper’s fingers. The
objects are configured as rigid bodies, which means that they
are affected by gravity and friction, and can collide with
other objects. Furthermore, we implemented haptic feedback on the physical controller to signal collisions between the gripper and objects to the participants. When an object is grasped, it is rigidly attached to the gripper so that its relative pose is maintained throughout the manipulation.
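This rigid attachment amounts to caching the object's pose in the gripper frame at grasp time and re-applying it every frame. Unity implements this via transform parenting; the underlying arithmetic is sketched below in Python with 4x4 homogeneous transforms (the poses are made-up values, not data from BoxED):

```python
# Rigid grasp attachment as homogeneous-transform bookkeeping.
import numpy as np

def pose(R=np.eye(3), t=(0.0, 0.0, 0.0)):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# At grasp time: cache the object's pose expressed in the gripper frame.
T_gripper = pose(t=(0.1, 0.0, 0.5))    # gripper pose in the world frame
T_object = pose(t=(0.1, 0.0, 0.45))    # object pose in the world frame
T_rel = np.linalg.inv(T_gripper) @ T_object  # computed once, at grasp

# On a later frame: the gripper has moved; the object follows rigidly.
T_gripper_new = pose(t=(0.3, 0.2, 0.6))
T_object_new = T_gripper_new @ T_rel
print(T_object_new[:3, 3])  # object translated with the gripper: ~[0.3, 0.2, 0.55]
```

Because T_rel is fixed while the grasp holds, the object keeps the same offset and orientation relative to the gripper, matching the behavior described above.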
B. Data Collecting Experiments
We chose to collect the BoxED dataset in VR as it simplifies much of the complexity associated with tracking object and gripper poses seen in [19]. Before starting the task (i.e., the scene), each participant was instructed to pack all the items into the box in a manner similar to how they would typically pack groceries in a supermarket. They were also instructed to pack the objects in an orderly manner, starting from one side of the container, for reasons that are clarified in Section IV-B. It is important to mention that the objects presented in each scene constitute a random subset of the complete object collection. Each subset is generated in a way that ensures that the total combined volume of the objects’ bounding boxes falls within 70% to 90% of the container’s volume. This ensures that the task is not trivial, forcing the participants to carefully reflect on how to place the objects. The initial poses of the objects are spread out as shown in Figure 2. The experiment consists of four different box-packing scenes. For each object manipulation, we record the 6-DOF pick-up grasp pose, the 6-DOF pose
1The supplementary material presents a participant completing the task.
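The scene-generation rule above can be sketched as rejection sampling over random object subsets. The object catalog and bounding-box volumes below are invented placeholders, not the actual BoxED objects:

```python
# Rejection sampling of object subsets whose total bounding-box volume
# lands within a target fraction of the container volume.
import random

def sample_scene(volumes, box_volume, lo=0.7, hi=0.9, seed=None):
    """volumes: {object_name: bounding-box volume}; returns a valid subset."""
    rng = random.Random(seed)
    names = list(volumes)
    while True:  # resample until the volume constraint is met
        subset = rng.sample(names, rng.randint(1, len(names)))
        total = sum(volumes[n] for n in subset)
        if lo * box_volume <= total <= hi * box_volume:
            return subset

# Placeholder catalog (volumes in litres, chosen arbitrarily).
catalog = {"cereal": 3.0, "can": 0.5, "milk": 1.0,
           "apple": 0.3, "pasta": 1.2, "soap": 0.4}
print(sample_scene(catalog, box_volume=6.0, seed=0))
```

The 70-90% band guarantees the box is nearly, but not impossibly, full, which is what forces participants to reason about placement.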