
Robotic Table Wiping via Reinforcement Learning and
Whole-body Trajectory Optimization
Thomas Lew1,3, Sumeet Singh1, Mario Prats2, Jeffrey Bingham2, Jonathan Weisz2, Benjie Holson2, Xiaohan
Zhang1,4, Vikas Sindhwani1, Yao Lu1, Fei Xia1, Peng Xu1, Tingnan Zhang1, Jie Tan1, Montserrat Gonzalez1
Abstract— We propose a framework to enable multipurpose
assistive mobile robots to autonomously wipe tables to clean
spills and crumbs. This problem is challenging, as it requires
planning wiping actions while reasoning over uncertain latent
dynamics of crumbs and spills captured via high-dimensional
visual observations. Simultaneously, we must guarantee constraint satisfaction to enable safe deployment in unstructured, cluttered environments. To tackle this problem, we first propose a stochastic differential equation to model crumb and spill dynamics and absorption by a robot wiper. Using this model,
we train a vision-based policy for planning wiping actions in
simulation using reinforcement learning (RL). To enable zero-
shot sim-to-real deployment, we dovetail the RL policy with
a whole-body trajectory optimization framework to compute
base and arm joint trajectories that execute the desired wiping
motions while guaranteeing constraint satisfaction. We extensively validate our approach in simulation and on hardware.
Video of experiments: https://youtu.be/inORKP4F3EI
I. INTRODUCTION
Multipurpose assistive robots will play an important role
in improving people’s lives in the spaces where we live and
work [1], [2]. Repetitive tasks such as cleaning surfaces are
well-suited for robots, but remain challenging for systems
that typically operate in structured environments. Operating
in the real world requires handling high-dimensional sensory
inputs and dealing with the stochasticity of the environment.
Learning-based techniques such as reinforcement learning
(RL) offer the promise of solving these complex visuo-motor
tasks from high-dimensional observations. However, applying end-to-end learning methods to mobile manipulation tasks remains challenging due to the increased dimensionality and the need for precise low-level control. Additionally, on-robot deployment requires collecting large amounts of data [3]–[5], using accurate but computationally expensive models [5], or on-hardware fine-tuning [6].
In this work, we focus on the task of cleaning tables with
a mobile robotic manipulator equipped with a wiping tool.
This problem is challenging for both high-level planning and
low-level control. Indeed, at a high level, deciding how to best wipe a spill perceived by a camera requires solving a challenging planning problem with stochastic dynamics. At a low level, executing a wiping motion requires simultaneously maintaining contact with the table while avoiding nearby obstacles such as chairs. Designing an effective real-time solution to this task remains an open problem [1].
1Robotics at Google 2Everyday Robots
3Department of Aeronautics and Astronautics, Stanford University
4Department of Computer Science, SUNY Binghamton
Fig. 1: We present a framework to autonomously clean tables with
a mobile manipulator. Our proposed approach combines reinforcement learning to select the best wiping strategy and trajectory
optimization to safely execute the wiping actions. We validate our
approach on the multipurpose assistive robot from Everyday Robots.
Our main contributions are as follows:
• We propose a framework for autonomous table wiping.
First, we use visual observations of the table state to plan
high-level wiping actions for the end-effector. Then, we
compute whole-body trajectories that we execute using
admittance control. This approach is key to achieving
reliable table wiping in new environments without the
need for real-world data collection or demonstrations.
• We describe the uncertain time evolution of dirt particles on the table using a stochastic differential equation (SDE) capable of modeling absorption by the wiper (see the simulation sketch after this list). Then, we formulate the problem of planning wiping actions as a stochastic optimal control problem. As this task requires planning over high-dimensional visual inputs, we solve the problem using RL, entirely in simulation.
• We design a whole-body trajectory optimization algorithm for navigation in cluttered environments and table wiping (a rough sketch of such a program follows the paragraph below). Our approach accounts for the kinematics of the manipulator, the nonholonomic constraints of the base, and collision avoidance constraints with the environment.
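As an illustration of the second contribution, the snippet below is a minimal Euler-Maruyama rollout of a particle SDE with a drift that pushes particles along the wiper motion, Brownian diffusion, and probabilistic absorption inside the wiper footprint. The footprint radius, noise scale, and absorption probability are placeholder assumptions for illustration, not the paper's actual model.

```python
import numpy as np

def simulate_particles(particles, wiper_pos, wiper_vel, dt=0.05, steps=20,
                       sigma=0.005, wiper_radius=0.1, absorb_prob=0.2, rng=None):
    """Euler-Maruyama rollout of an illustrative particle SDE.

    particles: (N, 2) array of dirt/spill particle positions on the table.
    wiper_pos, wiper_vel: (2,) wiper position and velocity in the table frame.
    Returns the positions of the particles that were not absorbed.
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.asarray(particles, dtype=float)
    wiper_pos = np.asarray(wiper_pos, dtype=float)
    wiper_vel = np.asarray(wiper_vel, dtype=float)
    for _ in range(steps):
        # Drift: particles inside the wiper footprint are pushed along its motion.
        dist = np.linalg.norm(particles - wiper_pos, axis=1)
        in_wiper = dist < wiper_radius
        drift = np.where(in_wiper[:, None], wiper_vel, 0.0)
        # Diffusion: small Brownian perturbation of every particle.
        noise = sigma * np.sqrt(dt) * rng.standard_normal(particles.shape)
        particles = particles + drift * dt + noise
        # Absorption: particles under the wiper are removed with some probability.
        absorbed = in_wiper & (rng.random(len(particles)) < absorb_prob)
        particles = particles[~absorbed]
        wiper_pos = wiper_pos + wiper_vel * dt
    return particles
```

Repeatedly rolling out candidate wiping strokes with such a model gives a rough sense of how a policy can be rewarded in simulation for reducing the number of remaining particles.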
This approach combines the strengths of reinforcement learning (planning in high-dimensional observation spaces with complex stochastic dynamics) and of trajectory optimization (guaranteeing constraint satisfaction while executing whole-body trajectories); it does not require collecting a task-specific dataset on the system, and it transfers zero-shot to hardware.
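To make the trajectory-optimization ingredient concrete, the following is a minimal CasADi sketch of a discrete-time program with unicycle (nonholonomic) base kinematics, velocity bounds, and a single circular keep-out region. The horizon, bounds, obstacle, and goal are made-up values; the paper's whole-body formulation additionally includes the arm kinematics, the wiping contact with the table, and the actual collision geometry.

```python
import casadi as ca

T, dt = 30, 0.1
opti = ca.Opti()
X = opti.variable(3, T + 1)   # base states: x, y, heading
U = opti.variable(2, T)       # controls: forward velocity, yaw rate

goal = ca.DM([2.0, 1.0])                  # hypothetical target position
obstacle, r_obs = ca.DM([1.0, 0.5]), 0.4  # hypothetical circular keep-out zone

opti.subject_to(X[:, 0] == ca.DM([0.0, 0.0, 0.0]))   # start pose
for k in range(T):
    x, y, th = X[0, k], X[1, k], X[2, k]
    v, w = U[0, k], U[1, k]
    # Nonholonomic kinematics: the base can only translate along its heading.
    opti.subject_to(X[:, k + 1] == X[:, k] + dt * ca.vertcat(
        v * ca.cos(th), v * ca.sin(th), w))
    # Collision avoidance: stay outside the keep-out radius.
    opti.subject_to((x - obstacle[0]) ** 2 + (y - obstacle[1]) ** 2 >= r_obs ** 2)
    # Velocity limits.
    opti.subject_to(opti.bounded(-0.5, v, 0.5))
    opti.subject_to(opti.bounded(-1.0, w, 1.0))

# Reach the goal with small control effort.
opti.minimize(ca.sumsqr(X[:2, T] - goal) + 1e-2 * ca.sumsqr(U))
opti.solver("ipopt")
sol = opti.solve()   # sol.value(X) gives the optimized base trajectory
```

In our framework, a program of this kind is what tracks the end-effector wiping motions selected by the RL policy while respecting the robot's whole-body constraints.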
II. RELATED WORK
Reinforcement learning allows tackling complex high-
dimensional planning problems with stochastic multimodal
dynamics that would be difficult to solve in real-time with