difficult to adapt to various tasks. Object colors and geometric
details are not included in the model, limiting its representation
capability [23].
By integrating differentiable physics-based simulation and
rendering, we propose a sensing-aware model-based reinforce-
ment learning system called SAM-RL. As shown in Fig. 1, we
apply SAM-RL on a robot system with two 7-DoF robotic
arms (Flexiv Rizon [25] and Franka Emika Panda), where the former carries an RGB-D camera and the latter performs the manipulation tasks. Our framework is sensing-aware, which
allows the robot to automatically select an informative camera
view to effectively monitor the manipulation process, provid-
ing the following benefits. First, the system no longer requires
obtaining a sequence of camera poses at each step, which
is extremely time-consuming. Second, compared with using
a fixed view, SAM-RL leverages varying camera views with
potentially fewer occlusions and offers better estimations of
environment states and object status (especially for deformable
bodies). The improved object-state estimation leads to more effective robotic actions for completing various tasks. Third, by comparing rendered and measured (i.e., real-world) images, discrepancies between simulation and reality are better revealed and then reduced automatically through gradient-based optimization and differentiable rendering.
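To make this concrete, the following is a minimal, self-contained sketch of the real-to-sim correction step described above. It is not the SAM-RL implementation: the differentiable renderer is replaced by a toy stand-in (toy_render) that draws a Gaussian blob at a 2-D object position, and the names used here (toy_render, est_xy, captured) are illustrative placeholders; SAM-RL instead differentiates through a full physics-based simulator and renderer.

import torch

def toy_render(obj_xy, size=64, sigma=5.0):
    """Toy differentiable 'renderer': a Gaussian blob centered at obj_xy (pixels)."""
    ys, xs = torch.meshgrid(
        torch.arange(size, dtype=torch.float32),
        torch.arange(size, dtype=torch.float32),
        indexing="ij",
    )
    d2 = (xs - obj_xy[0]) ** 2 + (ys - obj_xy[1]) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))

# "Captured" image: the object is actually at (40, 25); the simulator's
# current estimate starts at (30, 20) and is corrected by gradient descent.
with torch.no_grad():
    captured = toy_render(torch.tensor([40.0, 25.0]))

est_xy = torch.tensor([30.0, 20.0], requires_grad=True)
opt = torch.optim.Adam([est_xy], lr=0.5)

for step in range(300):
    opt.zero_grad()
    rendered = toy_render(est_xy)
    loss = torch.mean((rendered - captured) ** 2)  # pixel-wise discrepancy
    loss.backward()                                # gradients flow through the renderer
    opt.step()

print(est_xy.detach())  # approaches the true position (40, 25)

The point of the sketch is only that the pixel-wise discrepancy between rendered and captured images is differentiable with respect to the simulated object state, so the state can be corrected by gradient-based optimization.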
In practice, we train the robot to learn three challeng-
ing manipulation skills: Peg-Insertion, Spatula-Flipping, and
Needle-Threading. Our experiments indicate that SAM-RL can
significantly reduce training time and improve success rates by
large margins compared to common model-free and model-
based deep reinforcement learning algorithms.
Our primary contributions include:
• proposing an active-sensing framework named SAM-RL that enables robots to select informative views for various manipulation tasks;
• introducing a model-based reinforcement learning algorithm to produce efficient policies;
• conducting extensive quantitative and qualitative evaluations to demonstrate the effectiveness of our approach;
• applying our framework to robotic assembly, tool manipulation, and deformable-object manipulation tasks in both simulation and real-world experiments.
II. RELATED WORK
We review related literature on key components in our
approach, including model-based reinforcement learning, next
best view, integration of differentiable physics-based simula-
tion and rendering, and robotic manipulation. We also describe how our approach differs from previous work.
A. Model-based Reinforcement Learning
MBRL is considered potentially more sample-efficient than model-free RL [4]. However, automatically and efficiently building an accurate model from raw sensory data remains challenging, which has prevented MBRL from being widely applied in the real world. For a broader review of MBRL, we refer readers to [26]. One line of work [5, 6, 7, 8] uses representation learning to obtain low-dimensional latent state and action representations from high-dimensional input data. However, the learned models may violate the underlying physical dynamics, and their quality can degrade significantly outside the training data distribution. Recently, Lv et al. [23] leveraged differentiable physics simulation and developed
a system to produce a URDF file to model the surrounding
environment based on an RGB-D camera. However, the RGB-
D camera poses used in [23] are predefined and cannot adapt
to different tasks. Our approach allows the robot to select the
most informative camera view to monitor the manipulation
process and update the environment model automatically.
B. Next Best View in Active Sensing
Next Best View (NBV) has been one of the core problems
in active sensing. It addresses how to obtain a sequence of sensor poses that increases information gain. The
information gain is explicitly defined to reflect the improved
perception for 3D reconstruction [27, 28, 29, 30], object
recognition [31, 32, 33, 34], 3D model completion [35], and
3D exploration [36, 37]. Unlike perception-related tasks, we
explore the NBV over a wide range of robotic manipulation
tasks. Information gain in robotic manipulation tasks is difficult to define explicitly and is only implicitly related to task performance. Moreover, in our system the environment changes after each robot interaction. We therefore integrate the information gain into the Q function so that it reflects how informative a viewpoint is for manipulation.
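As a minimal illustration of this idea (with hypothetical interfaces and dimensions, not the paper's exact formulation), the sketch below scores candidate camera poses with a learned Q network conditioned on the current latent state and selects the highest-scoring pose, so that how informative a view is gets measured by predicted task value rather than by an explicitly defined information gain.

import torch
import torch.nn as nn

class ViewQNet(nn.Module):
    """Q(state, view): predicted return of continuing the task after observing
    from `view` (a 6-D camera pose) given the latent environment state."""
    def __init__(self, state_dim=32, view_dim=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + view_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, view):
        return self.mlp(torch.cat([state, view], dim=-1)).squeeze(-1)

def select_next_view(q_net, state, candidate_views):
    """Pick the candidate camera pose with the highest predicted Q value."""
    with torch.no_grad():
        scores = q_net(state.expand(candidate_views.shape[0], -1), candidate_views)
    return candidate_views[torch.argmax(scores)]

# Usage with random placeholders for the latent state and the view candidates.
q_net = ViewQNet()
state = torch.randn(1, 32)        # latent state from the learned model
candidates = torch.randn(16, 6)   # sampled reachable camera poses
best_view = select_next_view(q_net, state, candidates)

Coupling view selection to the Q function in this way ties sensing directly to task performance instead of to a hand-crafted perception objective.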
C. Integration of Differentiable Physics-Based Simulation and
Rendering
Recently, great progress has been made in the field of
differentiable physics-based simulation and rendering. For a
broader review, please refer to [9, 10, 11, 12, 13, 14, 15, 16]
and [38, 39]. With the development of these techniques,
Jatavallabhula et al. [21] first proposed a pipeline to leverage
differentiable simulation and rendering for system identifi-
cation and visuomotor control. Ma et al. [22] introduced a
rendering-invariant state predictor network that maps images
into states that are agnostic to rendering parameters. By
comparing the state predictions obtained using rendered and
ground-truth images, the pipeline can backpropagate the gradi-
ent to update system parameters and actions. Sundaresan et al.
[40] proposed a real-to-sim parameter estimation approach
from point clouds for deformable objects. In contrast to these works, we use differentiable simulation and rendering to
find the next best view for various manipulation tasks and
update the object status in the model by comparing rendered
and captured images.
D. Manipulation
Our framework can be applied to improve performance on a range of manipulation tasks. We review the related work in these domains. 1) Peg-insertion. Peg insertion is a classic
robotic assembly task with rich literature [41, 42, 43]. For a