
Robotic Table Wiping via Reinforcement Learning and
Whole-body Trajectory Optimization
Thomas Lew1,3, Sumeet Singh1, Mario Prats2, Jeffrey Bingham2, Jonathan Weisz2, Benjie Holson2, Xiaohan
Zhang1,4, Vikas Sindhwani1, Yao Lu1, Fei Xia1, Peng Xu1, Tingnan Zhang1, Jie Tan1, Montserrat Gonzalez1
Abstract— We propose a framework to enable multipurpose
assistive mobile robots to autonomously wipe tables to clean
spills and crumbs. This problem is challenging, as it requires
planning wiping actions while reasoning over uncertain latent
dynamics of crumbs and spills captured via high-dimensional
visual observations. Simultaneously, we must guarantee constraint satisfaction to enable safe deployment in unstructured, cluttered environments. To tackle this problem, we first propose a stochastic differential equation to model crumb and spill dynamics and absorption by a robot wiper. Using this model,
we train a vision-based policy for planning wiping actions in
simulation using reinforcement learning (RL). To enable zero-
shot sim-to-real deployment, we dovetail the RL policy with
a whole-body trajectory optimization framework to compute
base and arm joint trajectories that execute the desired wiping
motions while guaranteeing constraint satisfaction. We extensively validate our approach in simulation and on hardware.
Video of experiments: https://youtu.be/inORKP4F3EI
I. INTRODUCTION
Multipurpose assistive robots will play an important role
in improving people’s lives in the spaces where we live and
work [1], [2]. Repetitive tasks such as cleaning surfaces are
well-suited for robots, but remain challenging for systems
that typically operate in structured environments. Operating
in the real world requires handling high-dimensional sensory
inputs and dealing with the stochasticity of the environment.
Learning-based techniques such as reinforcement learning
(RL) offer the promise of solving these complex visuo-motor
tasks from high-dimensional observations. However, applying end-to-end learning methods to mobile manipulation tasks remains challenging due to the increased dimensionality and the need for precise low-level control. Additionally, on-robot deployment requires collecting large amounts of data [3]–[5], using accurate but computationally expensive models [5], or on-hardware fine-tuning [6].
In this work, we focus on the task of cleaning tables with
a mobile robotic manipulator equipped with a wiping tool.
This problem is challenging for both high-level planning and
low-level control. Indeed, at a high level, deciding how to best wipe a spill perceived by a camera requires solving a challenging planning problem with stochastic dynamics. At a low level, executing a wiping motion requires simultaneously maintaining contact with the table while avoiding nearby obstacles such as chairs. Designing an effective real-time solution to this task remains an open problem [1].
1Robotics at Google 2Everyday Robots
3Department of Aeronautics and Astronautics, Stanford University
4Department of Computer Science, SUNY Binghamton
Fig. 1: We present a framework to autonomously clean tables with
a mobile manipulator. Our proposed approach combines reinforcement learning to select the best wiping strategy and trajectory
optimization to safely execute the wiping actions. We validate our
approach on the multipurpose assistive robot from Everyday Robots.
Our main contributions are as follows:
• We propose a framework for autonomous table wiping.
First, we use visual observations of the table state to plan
high-level wiping actions for the end-effector. Then, we
compute whole-body trajectories that we execute using
admittance control. This approach is key to achieving
reliable table wiping in new environments without the
need for real-world data collection or demonstrations.
• We describe the uncertain time evolution of dirt particles on the table using a stochastic differential equation (SDE) capable of modeling absorption by the wiper (see the simulation sketch after this list). Then, we formulate the problem of planning wiping actions as a stochastic optimal control problem. As this task requires planning over high-dimensional visual inputs, we solve the problem using RL, entirely in simulation.
• We design a whole-body trajectory optimization algorithm for navigation in cluttered environments and table wiping (a rough sketch of such a program follows the paragraph below). Our approach accounts for the kinematics of the manipulator, the nonholonomic constraints of the base, and collision avoidance constraints with the environment.
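As an illustration of the second contribution, the snippet below is a minimal Euler-Maruyama rollout of a particle SDE with a drift that pushes particles along the wiper motion, Brownian diffusion, and probabilistic absorption inside the wiper footprint. The footprint radius, noise scale, and absorption probability are placeholder assumptions for illustration, not the paper's actual model.

```python
import numpy as np

def simulate_particles(particles, wiper_pos, wiper_vel, dt=0.05, steps=20,
                       sigma=0.005, wiper_radius=0.1, absorb_prob=0.2, rng=None):
    """Euler-Maruyama rollout of an illustrative particle SDE.

    particles: (N, 2) array of dirt/spill particle positions on the table.
    wiper_pos, wiper_vel: (2,) wiper position and velocity in the table frame.
    Returns the positions of the particles that were not absorbed.
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.asarray(particles, dtype=float)
    wiper_pos = np.asarray(wiper_pos, dtype=float)
    wiper_vel = np.asarray(wiper_vel, dtype=float)
    for _ in range(steps):
        # Drift: particles inside the wiper footprint are pushed along its motion.
        dist = np.linalg.norm(particles - wiper_pos, axis=1)
        in_wiper = dist < wiper_radius
        drift = np.where(in_wiper[:, None], wiper_vel, 0.0)
        # Diffusion: small Brownian perturbation of every particle.
        noise = sigma * np.sqrt(dt) * rng.standard_normal(particles.shape)
        particles = particles + drift * dt + noise
        # Absorption: particles under the wiper are removed with some probability.
        absorbed = in_wiper & (rng.random(len(particles)) < absorb_prob)
        particles = particles[~absorbed]
        wiper_pos = wiper_pos + wiper_vel * dt
    return particles
```

Repeatedly rolling out candidate wiping strokes with such a model gives a rough sense of how a policy can be rewarded in simulation for reducing the number of remaining particles.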
This approach combines the strengths of reinforcement learning (planning in high-dimensional observation spaces with complex stochastic dynamics) and of trajectory optimization (guaranteeing constraint satisfaction while executing whole-body trajectories); it does not require collecting a task-specific dataset on the system, and it transfers zero-shot to hardware.
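To make the trajectory-optimization ingredient concrete, the following is a minimal CasADi sketch of a discrete-time program with unicycle (nonholonomic) base kinematics, velocity bounds, and a single circular keep-out region. The horizon, bounds, obstacle, and goal are made-up values; the paper's whole-body formulation additionally includes the arm kinematics, the wiping contact with the table, and the actual collision geometry.

```python
import casadi as ca

T, dt = 30, 0.1
opti = ca.Opti()
X = opti.variable(3, T + 1)   # base states: x, y, heading
U = opti.variable(2, T)       # controls: forward velocity, yaw rate

goal = ca.DM([2.0, 1.0])                  # hypothetical target position
obstacle, r_obs = ca.DM([1.0, 0.5]), 0.4  # hypothetical circular keep-out zone

opti.subject_to(X[:, 0] == ca.DM([0.0, 0.0, 0.0]))   # start pose
for k in range(T):
    x, y, th = X[0, k], X[1, k], X[2, k]
    v, w = U[0, k], U[1, k]
    # Nonholonomic kinematics: the base can only translate along its heading.
    opti.subject_to(X[:, k + 1] == X[:, k] + dt * ca.vertcat(
        v * ca.cos(th), v * ca.sin(th), w))
    # Collision avoidance: stay outside the keep-out radius.
    opti.subject_to((x - obstacle[0]) ** 2 + (y - obstacle[1]) ** 2 >= r_obs ** 2)
    # Velocity limits.
    opti.subject_to(opti.bounded(-0.5, v, 0.5))
    opti.subject_to(opti.bounded(-1.0, w, 1.0))

# Reach the goal with small control effort.
opti.minimize(ca.sumsqr(X[:2, T] - goal) + 1e-2 * ca.sumsqr(U))
opti.solver("ipopt")
sol = opti.solve()   # sol.value(X) gives the optimized base trajectory
```

In our framework, a program of this kind is what tracks the end-effector wiping motions selected by the RL policy while respecting the robot's whole-body constraints.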
II. RELATED WORK
Reinforcement learning allows tackling complex high-
dimensional planning problems with stochastic multimodal
dynamics that would be difficult to solve in real-time with