
Bag All You Need: Learning a Generalizable Bagging Strategy
for Heterogeneous Objects
Arpit Bahety∗1, Shreeya Jain∗1, Huy Ha1, Nathalie Hager1,
Benjamin Burchfiel2, Eric Cousineau2, Siyuan Feng2, Shuran Song1
bag-all-you-need.cs.columbia.edu
Abstract— We introduce a practical robotics solution for
the task of heterogeneous bagging, requiring the placement
of multiple rigid and deformable objects into a deformable
bag. This is a difficult task as it features complex interactions
between multiple highly deformable objects under limited
observability. To tackle these challenges, we propose a robotic
system consisting of two learned policies: a rearrangement policy
that learns to place multiple rigid objects and fold deformable
objects in order to achieve desirable pre-bagging conditions, and
a lifting policy to infer suitable grasp points for bi-manual bag
lifting. We evaluate these learned policies on a real-world three-
arm robot platform that achieves a 70% heterogeneous bagging
success rate with novel objects. To facilitate future research
and comparison, we also develop a novel heterogeneous bagging
simulation benchmark that will be made publicly available.
I. INTRODUCTION
Imagine packing a bag for a picnic; we might first put
several rigid objects (such as an apple and a water bottle)
into the bag, fold deformable objects (such as a picnic mat
and a T-shirt) and then place them on top of the bag opening.
We must then lift the bag (another deformable object) in
such a way that these objects fall inside without spilling. Successful
completion of this task requires both a comprehensive
understanding of the objects’ physical properties and the
capability to plan and integrate multiple manipulation skills.
For instance, the robot’s actions must take into account:
• Object geometry: objects must be placed and oriented
to fit into the bag opening.
• Object material: large deformable objects, such
as blankets, must be folded or crumpled into a
compact configuration prior to packing. This requires
manipulation strategies that are conditioned on object
material (i.e., rigid or deformable).
• Inter-object dynamics: the ultimate success of this task
is determined jointly by object configurations and the
robot’s grasp on the bag during lifting. Crucially, when
objects are partially inside a bag (for example, a mat on
top of the bag opening), different lifting positions will
result in different outcomes. Therefore, a successful
approach must decide when a desired pre-bagging
condition is achieved and, if so, determine good grasp
locations from which to lift the bag. Here, the pre-bagging
condition refers to the state in which all objects are
sufficiently inside the bag opening and will fall into the
bag with a proper lift.
∗indicates equal contribution
1Columbia University 2Toyota Research Institute
[Figure 1 panels: (a) Rearrange, (b) Lift, (c) Final]
Fig. 1. The Heterogeneous Bagging Task requires packing multiple rigid
(e.g., the apple) and deformable objects (e.g., the T-shirt) into a deformable
bag. The system must learn to (a) strategically manipulate these objects
to achieve a feasible pre-bagging configuration. It also needs to (b) infer
suitable grasp points from which to lift up the bag such that (c) the objects
fall inside the bag.
Due to these difficulties, prior work has either focused only on
the lifting step of the process [1] or considered a simplified
scenario of packing only rigid items [2], [3].
We seek to address these limitations and propose a system
that tackles the complete bagging process for a diverse set
of rigid and deformable objects — a task we refer to as
heterogeneous bagging. Our proposed approach consists of
two learnable policies: a rearrangement policy that uses
sequential pick-and-place actions to rearrange or fold items
(Fig. 1a) in order to achieve a suitable pre-bagging configuration,
and a lifting policy that determines where to grasp and
lift up the bag once pre-bagging conditions are met (Fig. 1b,c).
We show that estimating the satisfaction of these pre-bagging
conditions (required to decide when to stop rearranging and
begin lifting) can be jointly performed by the two policies.
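The interplay between the two policies can be illustrated with a minimal control-loop sketch. This is not the implementation used in this work: the names `run_episode`, `PickPlace`, `Lift`, `observe`, and `execute` are hypothetical, and the stopping rule (lift once the lifting policy's score is at least as high as the rearrangement policy's score) is an assumed stand-in for the joint pre-bagging estimate described above.

```python
from dataclasses import dataclass
from typing import Tuple

Pixel = Tuple[int, int]  # (row, col) in the observation image


@dataclass(frozen=True)
class PickPlace:
    """One rearrangement step: move whatever is at `pick` to `place`."""
    pick: Pixel
    place: Pixel


@dataclass(frozen=True)
class Lift:
    """Bi-manual lift defined by two grasp points on the bag."""
    left: Pixel
    right: Pixel


def run_episode(observe, rearrange_policy, lifting_policy, execute, max_steps=10):
    """Rearrange until lifting looks at least as promising, then lift once.

    Each policy is assumed to map an observation to (best_action, score).
    Comparing the two scores is a hypothetical stopping rule, not the
    criterion from the paper.
    """
    history = []
    for _ in range(max_steps):
        obs = observe()
        rearrange_action, rearrange_score = rearrange_policy(obs)
        lift_action, lift_score = lifting_policy(obs)
        if lift_score >= rearrange_score:
            # Pre-bagging condition judged satisfied: lift and stop.
            execute(lift_action)
            history.append(lift_action)
            return history
        execute(rearrange_action)
        history.append(rearrange_action)
    # Step budget exhausted: attempt a lift from the current state.
    lift_action, _ = lifting_policy(observe())
    execute(lift_action)
    history.append(lift_action)
    return history
```

In this sketch the episode always ends with exactly one lift, preceded by at most `max_steps` pick-and-place actions; the step budget is likewise an assumption added to guarantee termination.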
To accomplish this task on real hardware, we develop a
representative simulation environment and use it to train both
policies. Then, to better bridge the inevitable
sim2real gap, we train a self-supervised network that detects
the bag opening from real-world depth images. These
predictions are used as additional input to the rearrangement
and lifting policies, allowing them to transfer more robustly
from simulation, where they are trained, to the real world.
We evaluate the learned policies with a real-world three-arm
robot system with novel objects. The system is equipped with
two types of end-effectors: a suction gripper, responsible for
arXiv:2210.09997v2 [cs.RO] 1 Oct 2023