SAM-RL: Sensing-Aware Model-Based Reinforcement Learning via
Differentiable Physics-Based Simulation and Rendering
Jun Lv1, Yunhai Feng2, Cheng Zhang3, Shuang Zhao3, Lin Shao4∗ and Cewu Lu5∗
Abstract—Model-based reinforcement learning (MBRL) is recognized as having the potential to be significantly more sample efficient than model-free RL. How an accurate model can be developed
automatically and efficiently from raw sensory inputs (such
as images), especially for complex environments and tasks, is
a challenging problem that hinders the broad application of
MBRL in the real world. In this work, we propose a sensing-
aware model-based reinforcement learning system called SAM-
RL. Leveraging the differentiable physics-based simulation and
rendering, SAM-RL automatically updates the model by com-
paring rendered images with real raw images and produces
the policy efficiently. With the sensing-aware learning pipeline,
SAM-RL allows a robot to select an informative viewpoint to
monitor the task process. We apply our framework to real
world experiments for accomplishing three manipulation tasks:
robotic assembly, tool manipulation, and deformable object
manipulation. We demonstrate the effectiveness of SAM-RL
via extensive experiments. Videos are available on our project
webpage at https://sites.google.com/view/rss-sam-rl.
I. INTRODUCTION
Over the past decade, deep reinforcement learning (RL) has
resulted in impressive successes, including mastering Atari
games [1], winning the game of Go [2], and solving the Rubik's cube with a human-like robot hand [3]. However, deep RL
algorithms adopt the paradigm of model-free RL and require
vast amounts of training data, significantly limiting their
practicality for real-world robotic tasks. Model-based reinforcement learning (MBRL) is recognized as having the potential to be significantly more sample efficient than model-free RL [4].
How to automatically and efficiently develop an accurate
model from raw sensory inputs, especially for complex envi-
ronments and tasks, is a challenging problem that hinders the
wide application of MBRL in the physical world.
One line of work [5, 6, 7, 8] adopts representation learning approaches to learn the model from raw input data, aiming to learn low-dimensional latent state and action representations from high-dimensional inputs such as images. But the learned
*Equal advising.
1Jun Lv is with the Department of Electronic Engineering, Shanghai Jiao
Tong University, China. [lyujune sjtu@sjtu.edu.cn]
2Yunhai Feng is with the Department of Computer Science and Engineer-
ing, University of California San Diego, USA. [yuf020@ucsd.edu]
3Cheng Zhang and Shuang Zhao are with the Department of Com-
puter Science, University of California Irvine, USA. [chengz20@uci.edu,
shz@ics.uci.edu]
4Lin Shao is with the Department of Computer Science, National University
of Singapore, Singapore. [linshao@nus.edu.sg]
5Cewu Lu is the corresponding author, a member of the Qing Yuan Research Institute and the MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China, and the Shanghai Qi Zhi Institute. [lucewu@sjtu.edu.cn]
Fig. 1. Our proposed SAM-RL enables robots to autonomously select an informative camera viewpoint to better monitor the manipulation task (for example, the Needle-Threading task). We leverage differentiable rendering to update the model by comparing raw observations between simulation and the real world, and differentiable physics simulation to produce policies efficiently.
deep network might not respect physical dynamics, and its quality may degrade significantly beyond the training-data distribution when tested in the wild. Recent developments in differentiable physics-based simulation [9, 10, 11, 12, 13, 14, 15, 16] and rendering [17, 18, 19, 20] provide an alternative direction for modeling the environment [21, 22]. Lv
et al. [23] use differentiable physics-based simulation as the backbone of the model and train robots to perform articulated object manipulation in the real world. Their pipeline produces a Unified Robot Description Format (URDF) [24] file of the environment, which is loaded into the differentiable simulation from raw point clouds gathered by an RGB-D camera mounted
on its wrist. However, a sequence of camera poses is needed to scan the 3D environment at every time step, and these camera poses are manually predefined, which is time-consuming and difficult to adapt to various tasks. Object colors and geometric details are not included in the model, limiting its representational capability [23].

arXiv:2210.15185v3 [cs.RO] 23 May 2023
By integrating differentiable physics-based simulation and
rendering, we propose a sensing-aware model-based reinforce-
ment learning system called SAM-RL. As shown in Fig. 1, we
apply SAM-RL on a robot system with two 7-DoF robotic
arms (Flexiv Rizon [25] and Franka Emika Panda), where
the former mounts an RGB-D camera, and the latter handles
manipulation tasks. Our framework is sensing-aware: it allows the robot to automatically select an informative camera view to effectively monitor the manipulation process, providing the following benefits. First, the system no longer requires obtaining a sequence of camera poses at each step, which is extremely time-consuming. Second, compared with using
a fixed view, SAM-RL leverages varying camera views with
potentially fewer occlusions and offers better estimations of
environment states and object status (especially for deformable
bodies). The improved quality of object-status estimation contributes to more effective robotic actions for completing various tasks. Third, by comparing rendered and measured (i.e., real-world) images, discrepancies between simulation and reality are better revealed and then reduced automatically using gradient-based optimization and differentiable rendering.
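As a concrete (and deliberately simplified) illustration of this third benefit, the sketch below fits a single scalar object attribute so that a toy rendered image matches a "measured" one by gradient descent on the image difference. The one-dimensional renderer and the brightness attribute are illustrative assumptions, not the actual simulator used in this paper.

```python
def render_object(brightness, width=8):
    # Toy "renderer": each pixel sees the object's brightness attenuated
    # by its distance from the image center.
    return [brightness / (1 + abs(i - width // 2)) for i in range(width)]

def fit_attribute(real_image, steps=200, lr=0.05):
    """Reduce the sim-vs-real image discrepancy by gradient descent on the attribute."""
    b = 0.0  # initial guess for the object attribute
    width = len(real_image)
    for _ in range(steps):
        sim = render_object(b, width)
        # dL/db of the L2 image loss, using d(pixel_i)/db = 1 / (1 + |i - center|).
        grad = sum(2 * (s - r) / (1 + abs(i - width // 2))
                   for i, (s, r) in enumerate(zip(sim, real_image)))
        b -= lr * grad
    return b

real = render_object(3.0)             # stand-in for an image from the RGB-D camera
print(round(fit_attribute(real), 3))  # recovers the true attribute, 3.0
```

Because the renderer is differentiable in the attribute, the discrepancy is driven to zero without any manual re-measurement of the object.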
In practice, we train the robot to learn three challenging manipulation skills: Peg-Insertion, Spatula-Flipping, and Needle-Threading. Our experiments indicate that SAM-RL can significantly reduce training time and improve success rates by large margins compared to common model-free and model-based deep reinforcement learning algorithms.
Our primary contributions include:
• proposing an active-sensing framework named SAM-RL that enables robots to select informative views for various manipulation tasks;
• introducing a model-based reinforcement learning algorithm to produce efficient policies;
• conducting extensive quantitative and qualitative evaluations to demonstrate the effectiveness of our approach;
• applying our framework to robotic assembly, tool manipulation, and deformable object manipulation tasks in both simulation and real-world experiments.
II. RELATED WORK
We review related literature on key components of our approach, including model-based reinforcement learning, next-best-view selection, the integration of differentiable physics-based simulation and rendering, and robotic manipulation, and describe how our work differs from previous efforts.
A. Model-based Reinforcement Learning
MBRL is considered potentially more sample efficient than model-free RL [4]. However, automatically and efficiently developing an accurate model from raw sensory data is a challenging problem that prevents MBRL from being widely applied in the real world. For a broader review of MBRL, we refer to [26]. One line of work [5, 6, 7, 8] uses representation learning methods to learn low-dimensional latent state and action representations from high-dimensional input data. But the learned models might not respect physical dynamics, and their quality may drop significantly beyond the training-data distribution. Recently, Lv et al. [23]
leveraged the differentiable physics simulation and developed
a system to produce a URDF file to model the surrounding
environment based on an RGB-D camera. However, the RGB-D camera poses used in [23] are predefined and cannot adjust to different tasks. Our approach allows the robot to select the
most informative camera view to monitor the manipulation
process and update the environment model automatically.
B. Next Best View in Active Sensing
Next Best View (NBV) is one of the core problems in active sensing: how to obtain a series of sensor poses that increase information gain. The information gain is explicitly defined to reflect improved perception for 3D reconstruction [27, 28, 29, 30], object recognition [31, 32, 33, 34], 3D model completion [35], and 3D exploration [36, 37]. Unlike these perception-centric tasks, we explore NBV over a wide range of robotic manipulation tasks, where information gain is difficult to define explicitly and is only implicitly related to task performance. In our system, the environment changes after each robot interaction; we therefore integrate the information gain into the Q function, which reflects how informative a viewpoint is for manipulation.
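The idea of folding viewpoint informativeness into the Q function can be sketched as follows: each candidate camera pose is scored by the best action value achievable from the observation it yields. The observation model and the Q function below are hypothetical, deterministic stand-ins for the learned components, kept minimal so that the selection logic is visible.

```python
# Candidate discrete actions the manipulation policy could take.
ACTIONS = [-1.0, 0.0, 1.0]

def observe(view):
    # Stand-in for rendering an observation from camera pose `view`:
    # views far from 0.5 are treated as occluded, giving a weaker signal.
    return 1.0 - abs(view - 0.5)

def q_value(obs, action):
    # Stand-in Q: richer observations support higher achievable action values.
    return obs - 0.1 * action ** 2

def select_view(candidate_views):
    # Sensing-aware selection: pick the view whose observation
    # maximizes the best achievable Q value over actions.
    return max(candidate_views,
               key=lambda v: max(q_value(observe(v), a) for a in ACTIONS))

best = select_view([0.0, 0.25, 0.5, 0.75, 1.0])
print(best)  # 0.5: the least-occluded viewpoint wins
```

The key design point mirrored here is that no explicit information-gain objective is needed: the view is chosen through the same value function that drives manipulation.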
C. Integration of Differentiable Physics-Based Simulation and
Rendering
Recently, great progress has been made in differentiable physics-based simulation and rendering. For a broader review, please refer to [9, 10, 11, 12, 13, 14, 15, 16] and [38, 39]. Building on these techniques,
Jatavallabhula et al. [21] first proposed a pipeline to leverage
differentiable simulation and rendering for system identifi-
cation and visuomotor control. Ma et al. [22] introduced a
rendering-invariant state predictor network that maps images
into states that are agnostic to rendering parameters. By
comparing the state predictions obtained using rendered and
ground-truth images, the pipeline can backpropagate the gradi-
ent to update system parameters and actions. Sundaresan et al.
[40] proposed a real-to-sim parameter estimation approach
from point clouds for deformable objects. Unlike these works, we use differentiable simulation and rendering to find the next best view for various manipulation tasks and to update object status in the model by comparing rendered and captured images.
D. Manipulation
Our framework can be adopted to improve performance on a range of manipulation tasks; we review the related work in these domains. 1) Peg-Insertion. Peg insertion is a classic robotic assembly task with rich literature [41, 42, 43]. For a
Fig. 2. The overall approach of SAM-RL comprises the Real2Sim, Learn@Sim, and Sim2Real stages. SAM-RL automatically develops and updates the model during the Real2Sim stage. During the Learn@Sim stage, it learns the sensing-aware Q function and actor $\pi_{\text{sim}}$ in the model: the differentiable physics simulation generates training data (rendered image, action, and associated return) to learn the Q and actor functions, which allows the robot to select an informative view. In the Sim2Real stage, SAM-RL learns a residual policy to reduce the sim-to-real gap.
broad review of peg insertion, we refer to [44]. 2) Spatula-Flipping. Chebotar et al. [45] used tactile sensing to train a robot to perform a scraping task with a spatula. Tsuji et al. [46] studied dynamic object manipulation with a spatula, clarifying the conditions for achieving dynamic movements and presenting a unified algorithm for generating a variety of movements. 3) Needle-Threading. The needle-threading task requires the robot to adapt its actions to the thread's deformation. Silvério et al. [47] relied on a high-resolution laser scanner to perceive the thread and needle. Huang et al. [48] used a high-speed camera to monitor the process and provide high-speed visual feedback. Kim et al. [49] proposed a deep imitation learning algorithm for the needle-threading task. Unlike the approaches above, we develop a sensing-aware model-based reinforcement learning approach to learn these skills.
III. TECHNICAL APPROACH
Given a manipulation task denoted as $T$, our pipeline takes as input images gathered from an RGB-D camera and outputs a policy that selects a camera pose $P_c$ and then produces an action $a$. An overview of our proposed method is shown in Fig. 2. In what follows, we first briefly introduce the model $\mathcal{M}$ that integrates differentiable physics-based simulation and rendering in Sec. III-A. Then, we describe developing and updating the model $\mathcal{M}$ (Real2Sim) in Sec. III-B, training robots to learn perception and action with the model (Learn@Sim) in Sec. III-D, and applying the learned model to the real world (Sim2Real) in Sec. III-C.
A. Model with Differentiable Simulation and Rendering
In this work, we combine differentiable physics-based simulation and rendering. The resulting differentiable system serves as the backbone of the model, which we denote as $\mathcal{M}$, for model-based reinforcement learning. The model can load robots, cameras, and objects denoted as $\{O^{\text{sim}}_j\}$, along with their visual/geometric (e.g., shape, pose, and texture) and physical attributes (e.g., mass and inertia). We denote the attributes of all objects loaded in the simulation as one type of model parameters $\psi_{\mathcal{M}}$:

$$\psi_{\mathcal{M}} = \sum_j O^{\text{sim}}_j. \tag{1}$$
The model can render an image $I^{\text{sim}}$ under the camera pose $P_c$ and model parameters $\psi_{\mathcal{M}}$ through

$$I^{\text{sim}} = \mathcal{M}(\psi_{\mathcal{M}}, P_c; \text{render}). \tag{2}$$

We can get the gradients $\partial I^{\text{sim}}/\partial P_c$ and $\partial I^{\text{sim}}/\partial O^{\text{sim}}_j$ using differentiable rendering [18]. Note that $\partial I^{\text{sim}}/\partial O^{\text{sim}}_j$ contains only the gradient with respect to object visual and geometric attributes.
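To make the role of $\partial I^{\text{sim}}/\partial P_c$ concrete, the following sketch uses a toy one-dimensional "renderer" with a hand-written analytic gradient (standing in for the automatic differentiation a real differentiable renderer such as [18] would provide) and recovers a camera pose by gradient descent on an image-space loss. The rendering model and all names are illustrative assumptions, not this paper's implementation.

```python
import math

def render(pose, width=8):
    # Toy 1-D differentiable renderer: a Gaussian blob whose image
    # position shifts with the (scalar) camera pose.
    return [math.exp(-(i - pose) ** 2) for i in range(width)]

def d_render_d_pose(pose, width=8):
    # Analytic dI/dpose, playing the role of autodiff through the renderer.
    return [2 * (i - pose) * math.exp(-(i - pose) ** 2) for i in range(width)]

def align_pose(target_image, pose=2.0, lr=0.1, steps=300):
    # Gradient descent on the L2 image loss to recover the camera pose.
    for _ in range(steps):
        sim, grad_img = render(pose), d_render_d_pose(pose)
        grad = sum(2 * (s - t) * g
                   for s, t, g in zip(sim, target_image, grad_img))
        pose -= lr * grad
    return pose

recovered = align_pose(render(4.0))  # "real" image taken at pose 4.0
print(round(recovered, 3))
```

The same loop structure, with $\psi_{\mathcal{M}}$ in place of the pose, is what allows object attributes to be updated from image discrepancies.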
Additionally, given the state $s^{\text{sim}}_t$, which includes the object attributes $\psi_{\mathcal{M}}$ and the robots' status, when an action denoted as $a^{\text{sim}}_t$ is executed (for example, an external force is exerted on an object), the model simulates the next state via

$$s^{\text{sim}}_{t+1} = \mathcal{M}(s^{\text{sim}}_t, a^{\text{sim}}_t; \text{forward}) \tag{3}$$

in a differentiable fashion [10], providing the gradients $\partial s^{\text{sim}}_{t+1}/\partial a^{\text{sim}}_t$ and $\partial s^{\text{sim}}_{t+1}/\partial s^{\text{sim}}_t$.