Robotic Table Wiping via Reinforcement Learning and
Whole-body Trajectory Optimization
Thomas Lew1,3, Sumeet Singh1, Mario Prats2, Jeffrey Bingham2, Jonathan Weisz2, Benjie Holson2, Xiaohan
Zhang1,4, Vikas Sindhwani1, Yao Lu1, Fei Xia1, Peng Xu1, Tingnan Zhang1, Jie Tan1, Montserrat Gonzalez1
Abstract: We propose a framework to enable multipurpose assistive mobile robots to autonomously wipe tables to clean spills and crumbs. This problem is challenging, as it requires planning wiping actions while reasoning over the uncertain latent dynamics of crumbs and spills captured via high-dimensional visual observations. Simultaneously, we must guarantee constraint satisfaction to enable safe deployment in unstructured, cluttered environments. To tackle this problem, we first propose a stochastic differential equation to model crumb and spill dynamics and absorption with a robot wiper. Using this model, we train a vision-based policy for planning wiping actions in simulation using reinforcement learning (RL). To enable zero-shot sim-to-real deployment, we dovetail the RL policy with a whole-body trajectory optimization framework to compute base and arm joint trajectories that execute the desired wiping motions while guaranteeing constraint satisfaction. We extensively validate our approach in simulation and on hardware.
Video of experiments: https://youtu.be/inORKP4F3EI
I. INTRODUCTION
Multipurpose assistive robots will play an important role
in improving people’s lives in the spaces where we live and
work [1], [2]. Repetitive tasks such as cleaning surfaces are
well-suited for robots, but remain challenging for systems
that typically operate in structured environments. Operating
in the real world requires handling high-dimensional sensory
inputs and dealing with the stochasticity of the environment.
Learning-based techniques such as reinforcement learning
(RL) offer the promise of solving these complex visuo-motor
tasks from high-dimensional observations. However, applying end-to-end learning methods to mobile manipulation tasks remains challenging due to the increased dimensionality and the need for precise low-level control. Additionally, on-robot deployment requires either collecting large amounts of data [3]–[5], using accurate but computationally expensive models [5], or fine-tuning on hardware [6].
In this work, we focus on the task of cleaning tables with
a mobile robotic manipulator equipped with a wiping tool.
This problem is challenging for both high-level planning and low-level control. Indeed, at a high level, deciding how to best wipe a spill perceived by a camera requires solving a challenging planning problem with stochastic dynamics. At a low level, executing a wiping motion requires simultaneously maintaining contact with the table while avoiding nearby obstacles such as chairs. Designing an effective real-time solution remains an open problem [1].
1Robotics at Google 2Everyday Robots
3Department of Aeronautics and Astronautics, Stanford University
4Department of Computer Science, SUNY Binghamton
Fig. 1: We present a framework to autonomously clean tables with
a mobile manipulator. Our proposed approach combines reinforce-
ment learning to select the best wiping strategy and trajectory
optimization to safely execute the wiping actions. We validate our
approach on the multipurpose assistive robot from Everyday Robots.
Our main contributions are as follows:
• We propose a framework for autonomous table wiping. First, we use visual observations of the table state to plan high-level wiping actions for the end-effector. Then, we compute whole-body trajectories that we execute using admittance control. This approach is key to achieving reliable table wiping in new environments without the need for real-world data collection or demonstrations.
• We describe the uncertain time evolution of dirty particles on the table using a stochastic differential equation (SDE) capable of modeling absorption with the wiper. Then, we formulate the problem of planning wiping actions as a stochastic optimal control problem. As this task requires planning over high-dimensional visual inputs, we solve the problem using RL, entirely in simulation.
• We design a whole-body trajectory optimization algorithm for navigation in cluttered environments and table wiping. Our approach accounts for the kinematics of the manipulator, the nonholonomic constraints of the base (see the kinematics sketch after this list), and collision avoidance constraints with the environment.
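As a concrete illustration of the nonholonomic base constraint handled by the trajectory optimizer, the sketch below integrates standard unicycle (differential-drive) kinematics; it is a generic textbook model for illustration, not the exact base model used on the robot.

```python
import numpy as np

def diff_drive_step(state, v, omega, dt):
    """One Euler step of unicycle kinematics.

    The base pose is state = (x, y, theta). The base can only translate along
    its current heading theta, which is the nonholonomic constraint the
    whole-body trajectory optimizer must respect when planning base motions.
    """
    x, y, theta = state
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + omega * dt])
```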
This approach combines the strengths of reinforcement learning (planning in high-dimensional observation spaces with complex stochastic dynamics) and of trajectory optimization (guaranteeing constraint satisfaction while executing whole-body trajectories); it does not require collecting a task-specific dataset on the system, and transfers zero-shot to hardware.
II. RELATED WORK
Fig. 2: System overview. (1) A perception module processes sensory inputs (camera images and LiDAR pointclouds) and senses obstacles, crumbs, and spills. (2) A high-level planning module selects wiping waypoints on the table. (3) A whole-body trajectory optimization module computes joint angles to perform the wipe while satisfying constraints. (4) An admittance controller executes the planned trajectory.

Reinforcement learning allows tackling complex high-dimensional planning problems with stochastic multimodal dynamics that would be difficult to solve in real time with model-based techniques [7]–[10]. The success of RL in
complex robotic tasks hinges on appropriately selecting the
observation and action spaces to simplify learning [11]–[14].
Indeed, end-to-end training is computationally costly and
requires expensive data collection [3]–[5]. Previous work
demonstrated that decomposing complex problems by planning high-level waypoints with RL and generating motion plans with model-based approaches improves performance [13], [15]. We use a similar decomposition in this work. However, while previous approaches were applied to navigation or manipulation tasks independently, we demonstrate that RL can be combined with trajectory optimization and admittance control to simultaneously move the base and arm of a mobile manipulator while avoiding obstacles to solve a complex task such as table wiping.
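To make this decomposition concrete, the following is a minimal schematic of such a high-level/low-level loop; all function names and interfaces here (perception, wiping_policy, optimize_whole_body_trajectory, admittance_execute) are hypothetical placeholders rather than the interfaces used in this work.

```python
def table_wiping_loop(perception, wiping_policy,
                      optimize_whole_body_trajectory, admittance_execute,
                      max_wipes=20):
    """Schematic decomposition of the wiping task (all callables are placeholders).

    perception() -> (table_mask, obstacles): cleanliness image mask and obstacle polyhedra.
    wiping_policy(table_mask) -> waypoints: RL policy proposing end-effector wipe waypoints.
    optimize_whole_body_trajectory(waypoints, obstacles) -> traj: joint-space trajectory
        for base and arm that reaches the waypoints while satisfying constraints.
    admittance_execute(traj): tracks the trajectory while regulating contact forces.
    """
    for _ in range(max_wipes):
        table_mask, obstacles = perception()
        if not table_mask.any():       # nothing left to wipe
            return
        waypoints = wiping_policy(table_mask)                         # high-level RL plan
        traj = optimize_whole_body_trajectory(waypoints, obstacles)   # low-level planning
        admittance_execute(traj)                                      # compliant execution
```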
Trajectory optimization allows computing dynamically feasible trajectories that guarantee reliable and safe execution, e.g., by accounting for obstacle avoidance constraints. Multiple works have demonstrated real-time trajectory optimization algorithms for mobile manipulators, e.g., for a
ball-balancing manipulator [16], [17] and a legged robot
equipped with a robotic arm [18]–[20]. In this work, we
demonstrate real-time whole-body collision-free trajectory
optimization on the mobile manipulator with an arm with
seven degrees of freedom shown in Figure 1. The robot
base has a nonholonomic constraint that makes real-time
trajectory optimization challenging. Navigating in a cluttered
environment such as a kitchen requires real-time collision
avoidance. As in [16], [20], [21], we enforce collision
avoidance constraints in the formulation. We assume that all
perceived obstacles are given as polyhedra, as is common in the literature [22]–[24]. The optimization problem is solved using the differentiable shooting sequential quadratic programming (ShootingSQP) method presented in [25].
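For intuition, one simple way to encode such polyhedral collision-avoidance constraints is sketched below, assuming each obstacle is given in half-space form {x : A x <= b} and the robot is approximated by a set of collision-check points; this is an illustrative surrogate, not the exact constraint formulation of [25].

```python
import numpy as np

def obstacle_margin(point, A, b):
    """Margin of a point to the convex obstacle {x : A x <= b}.

    With unit-normalized rows of A, max(A @ point - b) is positive iff the
    point lies outside the obstacle, and is a lower bound on its distance to it.
    """
    return np.max(A @ point - b)

def collision_constraints(robot_points, obstacles, safety_margin=0.05):
    """Stack one inequality g >= 0 per (robot point, obstacle) pair.

    robot_points : list of 3D points approximating the robot body.
    obstacles    : list of (A, b) polyhedra from perception.
    Under this approximation, a configuration is collision-free iff every
    entry of the returned array is nonnegative.
    """
    g = []
    for p in robot_points:
        for A, b in obstacles:
            g.append(obstacle_margin(p, A, b) - safety_margin)
    return np.array(g)
```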
Table wiping with a multipurpose mobile manipulator is
challenging, as this task requires simultaneously reasoning
about the cleanliness state of the table, planning optimal
wiping actions, and acting accordingly [1]. Table wiping
approaches can be divided into three categories. First, classical methods detect spills and subsequently apply pre-defined wiping patterns to clean them [26]. These methods work
well but are suboptimal as they do not explicitly reason
about the time evolution of the spills at planning time. A
second class of methods uses analytical or learned transition models for the cleanliness state of the table and subsequently applies classical planning methods to solve these problems [27]–[29]. These methods have also been applied to manipulation problems [30]. However, [27], [29] only consider dirt
cleaning tasks, and learning transition dynamics accounting
for absorption and the sticking behavior of certain liquids
(e.g., honey) may be challenging. A third class of imitation
learning methods uses demonstration data to learn a cleaning
policy [31]–[36]. These methods are successful since wiping
primitives are easier to learn than visual transition models.
Our proposed approach does not require training data from the system. Instead, we train an RL policy that selects high-level wiping waypoints entirely in simulation. The key idea consists of describing crumb and spill dynamics with a stochastic differential equation (SDE) [37]. In contrast to existing learned models [38] and the material point method [39], our SDE model does not require training data from the system (which may be expensive to collect), is efficient to simulate, and can model dry objects as well as sticking and absorption behavior. We then directly deploy the learned wiping policy to hardware. Wipes are executed using trajectory optimization, which guarantees reliable and safe execution.
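As a rough illustration of this idea, a particle-based SDE of this kind can be stepped forward with a simple Euler-Maruyama scheme; the drift, diffusion, and absorption terms below are illustrative placeholders and do not reproduce the exact model used in this work.

```python
import numpy as np

def simulate_wipe_step(particles, wiper_pos, wiper_dir, wiper_radius,
                       dt=0.01, push_gain=1.0, noise_scale=0.005,
                       absorb_prob=0.3, rng=None):
    """One Euler-Maruyama step of an illustrative particle SDE.

    particles : (N, 2) particle positions on the table plane.
    wiper_pos : (2,) wiper center; wiper_dir : (2,) unit wiping direction.
    Particles under the wiper are pushed along the wiping direction (drift),
    all particles spread slightly (diffusion), and particles under the wiper
    are absorbed with probability absorb_prob per step.
    Returns updated positions and a boolean mask of particles still on the table.
    """
    if rng is None:
        rng = np.random.default_rng()
    in_contact = np.linalg.norm(particles - wiper_pos, axis=1) < wiper_radius

    drift = np.zeros_like(particles)
    drift[in_contact] = push_gain * wiper_dir           # dragged along the wipe

    noise = noise_scale * np.sqrt(dt) * rng.standard_normal(particles.shape)
    particles = particles + drift * dt + noise          # Euler-Maruyama update

    absorbed = in_contact & (rng.random(len(particles)) < absorb_prob)
    return particles, ~absorbed
```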
III. COMBINING RL AND TRAJECTORY OPTIMIZATION
We consider two table wiping tasks: gathering crumbs and cleaning spills. A robot equipped with a wiper cannot immediately capture crumbs or clean dirt particles. Instead, one may first gather the crumbs together before using a different method to remove them (e.g., with a vacuum cleaner [27] or by pushing them into a bin, see Section VI). Thus, we formulate the objective of the crumbs-gathering task as moving all crumb particles to the center of the table. For spills-cleaning, the objective is to wipe all spill particles.
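These two objectives can be expressed as simple particle-based costs; the sketch below is one illustrative choice, and the exact reward terms and weights used for RL training in this work may differ.

```python
import numpy as np

def crumbs_gathering_cost(crumbs, table_center):
    """Mean distance of crumb particles to the table center; zero once gathered."""
    if len(crumbs) == 0:
        return 0.0
    return float(np.mean(np.linalg.norm(crumbs - table_center, axis=1)))

def spills_cleaning_cost(num_remaining, num_initial):
    """Fraction of spill particles not yet wiped; zero once the table is clean."""
    return 0.0 if num_initial == 0 else num_remaining / num_initial
```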
The problem of table wiping can be formulated as a POMDP mapping image inputs to control signals for the robot actuators. In this work, we decompose the problem and propose a framework (see Figure 2) consisting of four steps:
1) Perception: The system processes LiDAR pointclouds and camera depth and color images and returns bounding boxes for obstacles O (see Figure 5) and an image mask o for spills and crumbs on the table (see Section IV).