Motion Policy Networks
Adam Fishman
University of Washington
afishman@cs.washington.edu
Adithyavairavan Murali
NVIDIA
admurali@nvidia.com
Clemens Eppner
NVIDIA
ceppner@nvidia.com
Bryan Peele
NVIDIA
bpeele@nvidia.com
Byron Boots
University of Washington
bboots@cs.washington.edu
Dieter Fox
University of Washington
fox@cs.washington.edu
Abstract: Collision-free motion generation in unknown environments is a core
building block for robot manipulation. Generating such motions is challenging
due to multiple objectives; not only should the solutions be optimal, but the motion generator itself must also be fast enough for real-time performance and reliable
enough for practical deployment. A wide variety of methods have been proposed
ranging from local controllers to global planners, often being combined to offset
their shortcomings. We present an end-to-end neural model called Motion Policy
Networks (MπNets) to generate collision-free, smooth motion from just a single
depth camera observation. MπNets are trained on over 3 million motion planning problems in 500,000 environments. Our experiments show that MπNets are
significantly faster than global planners while exhibiting the reactivity needed to
deal with dynamic scenes. They are 46% better than prior neural planners and
more robust than local control policies. Despite being only trained in simulation,
MπNets transfer well to the real robot with noisy partial point clouds. Videos and
code are available at https://mpinets.github.io.
Keywords: Motion Control, Imitation Learning, End-to-end Learning
Figure 1: MπNets are trained on a large dataset of synthetic demonstrations (left) and can solve
complex motion planning problems using raw point cloud observations (right).
1 Introduction
Generating fast and legible motions for a robotic manipulator in unknown environments is still an
open problem. Decades of research have established many well-studied algorithms, but there are
two practical issues that prevent motion planning methods from being widely adopted in industrial
applications and home environments that require real-time control. First, it is challenging for any
single approach to satisfy multiple planning considerations: speed, completeness, optimality, ease-
of-use, legibility (from the perspective of a human operator), determinism, and smoothness. Second,
existing approaches enforce strong assumptions about the visual obstacle representations—such as accurate collision checking in configuration space [1] or the availability of a gradient [2,3,4]—and hence require intermediate processing to operate in novel scenes directly using raw sensor observations.
*Author is also affiliated with NVIDIA.
Global planners such as RRT [5] are useful to quickly find a feasible path but say nothing about
optimality. Other sampling-based approaches iteratively refine their paths to reduce cost and asymptotically approach the optimal solution [6,7,8,9]. Optimization-based approaches [2,3,10] embrace locally optimal behavior in exchange for completeness. Recent methods such as Geometric
Fabrics [4] and STORM [11] deploy reactive local policies and assume that local decisions will lead
to globally acceptable paths. Unfortunately, as we show in our experiments, the performance of
these local approaches degrades in more geometrically complex environments as they get stuck in
local minima. Motivated by the success of deep learning, neural motion planning approaches such
as Motion Planning Networks [12] have been proposed to greatly improve the sampling of an RRT
planner with imitation learning. However, they still require a planner and a collision checker with
known models at test time.
Planners have traditionally been evaluated with known environment models and perfect state estimation. When deploying them in practice, however, one would have to create one of several scene representations: a static or dynamic mesh, occupancy grids [13,14], signed distance fields, etc. Reconstruction systems such as SLAM and KinectFusion [15] have a large system start-up time, require a moving camera to aggregate many viewpoints, and ultimately require costly updates in the presence of dynamic objects. Recent implicit deep learning methods like DeepSDF [16] and NeRF [17] are slow or do not yet generalize to novel scenes. Methods such as SceneCollisionNet [18] provide fast collision checks but require expensive MPC rollouts at test time. It also draws samples from a straight-line path in configuration space, which may not generalize to challenging environments beyond a tabletop. Other RL-based methods learn a latent representation from observations but have only been applied to simple 2D [19,20] or 3D [21] environments in simulation.
We present Motion Policy Networks (MπNets), a novel method for learning an end-to-end policy for
motion planning. Our approach circumvents the challenges of traditional motion planning and is
flexible enough to be applied in unknown environments. Our contributions are as follows:
• We present a large-scale effort in neural motion planning for manipulation. Specifically, we learn from over 3 million motion planning problems across over 500,000 instances of three types of environments, nearly 300x larger than prior work [12].
• We train a reactive, end-to-end neural policy that operates on point clouds of the environment and moves to task space targets while avoiding obstacles. Our policy is significantly faster than other baseline configuration space planners and succeeds more often than local task space controllers.
• On our challenging dataset benchmarks, we show that MπNets are nearly 46% more successful at finding collision-free paths than prior work [12], without even needing the full scene collision model.
• Finally, we demonstrate sim2real transfer to real robot partial point cloud observations.
2 Related Work
Global Planning: Robotic motion planning typically splits into three camps: search, sampling, and
optimization-based planning. Search-based planning algorithms, such as A* [22,23,24], discretize
the state space and perform a graph search to find an optimal path. While the graph search can
be fast, complete, and guaranteed optimal, the requirement to construct a discrete graph hinders
these algorithms in continuous spaces and for novel problems not well covered by the current graph.
Sampling-based planners [5] function in a continuous state space by drawing samples and building
a tree. When the tree has sufficient coverage of the planning problem, the algorithm traverses the
tree to produce the final plan. Sampling-based planners are continuous, probabilistically complete,
i.e. find a solution with probability 1, and some are even asymptotically optimal [6,7,8], but under
practical time limitations, their random nature can produce erratic—though valid—paths.
Both of the aforementioned planner types are designed to optimize for path length in the given state space (e.g. configuration space) while avoiding collisions. An optimal path in configuration space is not necessarily optimal for the end effector in Cartesian space. Human motion tends to minimize hand distance traveled [25], so what appears optimal for the algorithm may be unintuitive for a human partner or operator. In the manipulation domain, goals are typically represented in end effector task space [26,27]. In a closed-loop setting with a moving target, the traditional process of using IK to map task space to configuration space can produce highly variable configurations, especially around obstacles. Motion Optimization [2,3,28], on the other hand, generates paths with non-linear optimization and can consider multiple objectives such as smoothness of the motion, obstacle avoidance, and convergence to an end effector pose. These algorithms require careful tuning of the respective cost functions to ensure convergence to a desirable path and are prone to local minima. Furthermore, non-linear optimization is computationally complex and can be slow for difficult planning problems.
Local Control: In contrast to global planners, local controllers have long been applied to create
collision-free motions [29,30,4,11]. While they prioritize speed and smoothness, they are highly
local and may fail to find a valid path in complex environments. We demonstrate in our experiments
that MπNets are more effective at producing convergent motions in these types of environments,
including dynamic and partially observed settings.
Imitation Learning: Imitation Learning [31] can train a policy from expert demonstrations with limited knowledge of the expert's internal model. For motion planning problems, we can apply imitation learning and leverage a traditional planner as the expert demonstrator—with a perfect model of the scene during training—and learn a policy that forgoes the need for an explicit scene model at test time. Popular imitation learning methods include Inverse Reinforcement Learning [32,33,34] and Behavior Cloning [35,36]. The former typically assumes expert optimality and learns a cost function accordingly, whereas the latter directly learns the state-action mapping from demonstrations, regardless of the expert's optimality. We thus employ behavior cloning because producing optimal plans for continuous manipulation problems is challenging. Recent work demonstrates behavior cloning's efficacy for fine-grained manipulation tasks, such as chopstick use [37] and pick-and-place [38]. For long-horizon tasks like ours, however, distributional shift and data variance can hinder behavior cloning performance. Distribution shift during execution can lead to states unseen in the training data [37]. Complex tasks often have a long tail of possible action states that are underrepresented in the data, leading to high data variance [39]. There are many techniques to address these challenges through randomization, noise injection, regret optimization, and expert correction [37,40,41,42,43]. These techniques, however, have not been demonstrated on a problem of our scale and complexity (see Appendix D for details on the range of data). Our proposal seeks to overcome these issues by specifically designing a learnable expert, increasing the scale and variation of the data, and using a sufficiently expressive policy model.
Neural Motion Planning: Many deep planning methods [13,44,45,46] seek to learn efficient samplers to speed up traditional planners. Motion Planning Networks (MPNets) [12] learn to directly plan through imitation of a standard sampling-based RRT* planner [6] and are used in conjunction with a traditional planner for stronger guarantees. While these works greatly improve the speed of the planning search, they have the same requirements as a standard planning system: targets in configuration space and an explicit collision checker to connect the path. Our work operates based on task-space targets and perceptual observations from a depth sensor without explicit state estimation. Novel architectures have been proposed, such as differentiable planning modules in Value Iteration Networks [20], transformers by Chaplot et al. [47], and goal-conditioned RL policies [48]. These methods are challenging to generalize to unknown environments or have only been shown in simple 2D [19] or 3D settings [21]. In contrast, we illustrate our approach in the challenging domain of controlling a 7 degree-of-freedom (DOF) manipulator in unknown, dynamic environments.
3 Learning from Motion Planning
3.1 Problem Formulation
MπNets expect two inputs: a robot configuration $q_t$ and a segmented, calibrated point cloud $z_t$. Before passing $q_t$ through the network, we normalize each element to be within $[-1, 1]$ according to the limits for the corresponding joint. We call this $q_t^{\|\cdot\|}$. The point cloud is always assumed to be calibrated in the robot's base frame, and it encodes three segmentation classes: the robot's current geometry, the scene geometry, and the target pose. Targets are inserted into the point cloud via points sampled from the mesh of a floating end effector placed at the target pose.
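As a concrete illustration, the segmented point cloud can be assembled by stacking the three classes of points with an extra label column. This is a minimal sketch; the class ids, point counts, and helper inputs are assumptions for illustration, not details taken from the paper or its released code.

```python
import numpy as np

# Segmentation class ids are an assumption for illustration.
ROBOT, OBSTACLE, TARGET = 0, 1, 2

def build_policy_pointcloud(robot_pts, scene_pts, target_pts):
    """Stack the three segmentation classes into a single (N, 4) array.

    robot_pts:  (Nr, 3) points sampled on the robot's current geometry
    scene_pts:  (Ns, 3) depth-camera points, calibrated to the robot base frame
    target_pts: (Nt, 3) points sampled from a floating end-effector mesh
                placed at the target pose
    """
    def with_label(points, cls):
        labels = np.full((len(points), 1), cls, dtype=points.dtype)
        return np.concatenate([points, labels], axis=1)

    return np.concatenate([
        with_label(robot_pts, ROBOT),
        with_label(scene_pts, OBSTACLE),
        with_label(target_pts, TARGET),
    ], axis=0)
```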
The network produces a displacement within normalized configuration space, $\dot{q}_t^{\|\cdot\|}$. To get the next predicted state $\hat{q}_{t+1}$, we take $q_t^{\|\cdot\|} + \dot{q}_t^{\|\cdot\|}$, clamp the result to $[-1, 1]$, and unnormalize. During training, we use $\hat{q}_{t+1}$ to compute the loss, and when executing, we use $\hat{q}_{t+1}$ as the next position target for the robot's low-level controller.
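The following is a minimal sketch of one closed-loop step, assuming `policy` maps the normalized configuration and point cloud to a normalized displacement; the exact function signature is an illustration, not the released API.

```python
import numpy as np

def normalize(q, low, high):
    """Map each joint angle into [-1, 1] using the joint limits low/high."""
    return 2.0 * (q - low) / (high - low) - 1.0

def unnormalize(q_norm, low, high):
    """Inverse of normalize: map [-1, 1] back to joint angles."""
    return (q_norm + 1.0) / 2.0 * (high - low) + low

def policy_step(policy, q_t, point_cloud, low, high):
    q_norm = normalize(q_t, low, high)
    dq_norm = policy(q_norm, point_cloud)                # predicted normalized displacement
    q_next_norm = np.clip(q_norm + dq_norm, -1.0, 1.0)  # clamp to stay within joint limits
    return unnormalize(q_next_norm, low, high)           # next position target for the controller
```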
3.2 Model Architecture
Figure 2: MπNets encodes state as a normalized robot configuration and segmented point cloud with three classes for the robot, the obstacles, and the target. The policy outputs a displacement in normalized joint space, which can then be applied to the input before unnormalizing to get $q_{t+1}$.
The network consists of two separate encoders, one for the point cloud and one for the robot’s
current configuration, as well as a decoder, totaling 19M parameters. Our neural policy architecture
is visualized in Fig. 2. We use PointNet++ [49] for our point cloud encoder. PointNet++ learns
a hierarchical point cloud representation and can encode a point cloud’s 3D geometry, even with
high variation in sampling density. PointNet++ architectures have been shown to be effective for
a variety of point cloud processing tasks, such as segmentation [49], collision checking [18], and
robotic grasping [50,51]. Additionally, PointNet++ includes PointNet as a subcomponent. PointNet
is effective at processing partially observed pointclouds, even when trained exclusively with fully-
observed scenes [52]. The robot configuration encoder and the displacement decoder are both fully
connected multilayer perceptrons. We discuss the architecture in detail in Appendix C.
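The paper defers the exact architecture to Appendix C. The sketch below only illustrates the overall encoder/decoder layout with assumed layer sizes, and substitutes a simple PointNet-style encoder (per-point MLP plus max pooling) for the actual hierarchical PointNet++ backbone.

```python
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    """Stand-in for the PointNet++ encoder: a per-point MLP with max pooling.

    The actual model uses a hierarchical PointNet++ backbone [49]; this
    simplified encoder only illustrates the interface (xyz plus a
    segmentation label in, a fixed-size feature out).
    """
    def __init__(self, in_dim=4, feat_dim=1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, pc):                       # pc: (B, N, in_dim)
        return self.mlp(pc).max(dim=1).values    # global max pool over points

class MPiNetsSketch(nn.Module):
    """Rough layout of the two encoders and the decoder; layer sizes are assumptions."""
    def __init__(self, dof=7, feat_dim=1024):
        super().__init__()
        self.pc_encoder = PointCloudEncoder(feat_dim=feat_dim)
        self.q_encoder = nn.Sequential(nn.Linear(dof, 64), nn.ReLU(), nn.Linear(64, 64))
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 64, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, dof), nn.Tanh(),      # bounded displacement; Tanh is an assumption
        )

    def forward(self, point_cloud, q_norm):
        features = torch.cat([self.pc_encoder(point_cloud),
                              self.q_encoder(q_norm)], dim=-1)
        return self.decoder(features)            # displacement in normalized joint space
```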
3.3 Loss Function
The network is trained with a compound loss function with two constituent parts: a behavior cloning
loss to enforce accurate predictions and a collision loss to safeguard against catastrophic behavior.
Geometric Loss for Behavior Cloning To encourage alignment between the prediction and the
expert, we compute a geometric loss across a set of 1,024 fixed points along the surface of the robot.
$$\mathcal{L}_{\mathrm{BC}}(\hat{\dot{q}}_t) = \sum_i \left\| \hat{x}^i_{t+1} - x^i_{t+1} \right\|_2 + \left\| \hat{x}^i_{t+1} - x^i_{t+1} \right\|_1, \quad \text{where } \hat{x}^i_{t+1} = \phi_i(q_t + \hat{\dot{q}}_t), \ x^i_{t+1} = \phi_i(q_{t+1}) \tag{1}$$
$\phi_i(\cdot)$ represents a forward kinematics mapping from the joint angles of the robot to point $i$ defined on the robot's surface. The loss is computed as the sum of the $L_1$ and $L_2$ distances between corresponding points on the expert and the prediction after applying the predicted displacement. By using both $L_1$ and $L_2$, we are able to penalize both large and small deviations.
We use a geometric, task-space loss because our goal is to ensure task-space consistency of our
policy. Configuration space loss appears in prior work [12], but does not capture the accumulated
error of the kinematic chain as effectively (see Appendix J).
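A sketch of Eq. 1 in PyTorch is shown below. It assumes a differentiable forward-kinematics routine `fk_points(q)` that maps a batch of configurations (B, DOF) to the 1,024 fixed surface points (B, 1024, 3); that routine is part of the training pipeline and is not shown here.

```python
import torch

def bc_geometric_loss(fk_points, q_t, dq_pred, q_next_expert):
    """L2 + L1 loss between predicted and expert robot surface points (Eq. 1).

    fk_points(q) is an assumed differentiable forward-kinematics helper that
    returns the fixed points on the robot's surface for each configuration.
    """
    x_pred = fk_points(q_t + dq_pred)              # predicted surface points
    x_expert = fk_points(q_next_expert)            # expert surface points
    diff = x_pred - x_expert                       # (B, 1024, 3)
    l2 = torch.linalg.norm(diff, dim=-1)           # per-point Euclidean distance
    l1 = diff.abs().sum(dim=-1)                    # per-point L1 distance
    return (l2 + l1).sum(dim=-1).mean()            # sum over points, mean over batch
```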
Collision Loss In order to avoid collisions–a catastrophic failure–we apply an additional hinge loss inspired by motion optimization [53].
$$\mathcal{L}_{\mathrm{collision}} = \sum_i \sum_j \left\| h_j(\hat{x}^i_{t+1}) \right\|_2, \quad \text{where } h_j(\hat{x}^i_{t+1}) = \begin{cases} -D_j(\hat{x}^i_{t+1}), & \text{if } D_j(\hat{x}^i_{t+1}) \le 0 \\ 0, & \text{if } D_j(\hat{x}^i_{t+1}) > 0 \end{cases} \tag{2}$$
The synthetic environments are fully observable during training, giving us access to the signed distance functions (SDFs), $\{D_j(\cdot)\}_j$, of the obstacles in each scene. For a given closed surface, its SDF maps a point in Euclidean space to the minimum distance from the point to the surface. If the point is inside the surface, the function returns a negative value.
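A sketch of Eq. 2 follows, assuming the SDF values for every predicted surface point and every obstacle have already been evaluated elsewhere in the training pipeline.

```python
import torch

def collision_loss(sdf_values):
    """Hinge penalty on predicted surface points that penetrate obstacles (Eq. 2).

    sdf_values: (B, num_obstacles, num_points) tensor holding D_j for each
    obstacle SDF and each predicted surface point.
    """
    penetration = torch.clamp(-sdf_values, min=0.0)  # positive only inside an obstacle
    return penetration.sum(dim=(1, 2)).mean()        # sum over obstacles and points, mean over batch
```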
3.4 Training Implementation Details
MπNets is trained for single-step prediction, but during inference, we use it recursively for closed-loop rollouts. The noise that compounds over subsequent inputs amounts to covariate shift [41,43]. To promote robustness, we augment our training data with random perturbations in two ways. We apply Gaussian noise to the joint angles of each input configuration, which in turn affects the corresponding robot points in the point cloud passed as input to the network [37,54]. Additionally, we generate a unique point cloud for each training example during training, i.e. during each epoch, the network sees 163.5M unique point clouds. We train our network with a single set of weights across our entire dataset.
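A minimal sketch of the joint-angle perturbation is shown below; the noise scale is an assumed value, since the paper does not state it in this section.

```python
import numpy as np

def perturb_configuration(q, joint_low, joint_high, sigma=0.01, rng=None):
    """Add Gaussian noise to a training configuration (sigma is an assumed value).

    The perturbed configuration is also used to re-render the robot's points in
    the input point cloud, so the observation stays consistent with the state.
    """
    rng = np.random.default_rng() if rng is None else rng
    q_noisy = q + rng.normal(0.0, sigma, size=q.shape)
    return np.clip(q_noisy, joint_low, joint_high)   # keep the noisy state within joint limits
```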
4 Procedural Data Generation
Figure 3: MπNets is trained with a dataset consisting of solutions to 3.27 million unique planning
problems across over 575,000 unique, procedurally generated environments.
4.1 Large-scale Motion Planning Problems
Each planning problem is defined by three components: the scene geometry, the start configuration,
and the goal pose. Our dataset consists of randomly generated problems across all three components,
totaling 3.27 million problems in over 575,000 environments. We have three classes of problems of
increasing difficulty: a cluttered tabletop with randomly placed objects, cubbies, and dressers. Representative examples of these environments are shown in Fig. 1. Once we build these environments, we generate a set of potential end-effector targets and corresponding inverse kinematics solutions. We then randomly choose pairs of these configurations and verify whether a plan exists between them using our expert pipeline, as detailed further in Sec. 4.2 and in Appendix D.
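A schematic of this generation loop is sketched below; `env.sample_end_effector_targets`, `env.ik`, and `expert_plan` are placeholder interfaces standing in for the components described in Sec. 4.1 and 4.2, not functions from the released code.

```python
import random

def generate_problems(env, expert_plan, num_problems=10, num_targets=20, seed=0):
    """Schematic problem-generation loop for one procedurally generated scene."""
    rng = random.Random(seed)
    targets = env.sample_end_effector_targets(num_targets)    # candidate goal poses
    configs = [q for pose in targets for q in env.ik(pose)]   # IK solutions per pose
    problems = []
    for _ in range(10 * num_problems):                        # bounded number of attempts
        if len(problems) >= num_problems or len(configs) < 2:
            break
        start, goal = rng.sample(configs, 2)
        trajectory = expert_plan(env, start, goal)            # None if no valid plan found
        if trajectory is not None:
            problems.append((start, goal, trajectory))
    return problems
```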
4.2 Expert Pipeline
Our expert pipeline is designed to produce the high-quality demonstrations we want to mimic, i.e. trajectories with smooth, consistent motion and short path lengths. Here, consistency describes the quality and repeatability of an expert planner—see Appendix B for further discussion. We considered two candidates for the expert: the Global Planner, a typical state-of-the-art configuration space planning pipeline [9], and a Hybrid Planner that we engineered specifically to generate consistent motion in task space. For both planners, we reject any trajectories that produce collisions, exceed the joint limits, exhibit erratic behavior (i.e. high jerk), or have divergent motion (i.e. the final task space pose is more than 5 cm from the target).
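The rejection test can be sketched as follows; the collision checker, end-effector positions, and jerk threshold are placeholders, and only the 5 cm convergence tolerance comes from the text.

```python
import numpy as np

def accept_trajectory(q_traj, ee_xyz_traj, target_xyz, in_collision,
                      joint_low, joint_high, dt, max_jerk=100.0):
    """Reject expert trajectories that collide, exceed limits, are jerky, or diverge.

    in_collision(q) is a placeholder collision checker and max_jerk a placeholder
    threshold. q_traj: (T, DOF) joint trajectory; ee_xyz_traj: (T, 3) end-effector
    positions; target_xyz: (3,) target position.
    """
    if any(in_collision(q) for q in q_traj):
        return False                                            # produces a collision
    if np.any(q_traj < joint_low) or np.any(q_traj > joint_high):
        return False                                            # exceeds joint limits
    jerk = np.diff(q_traj, n=3, axis=0) / dt**3                 # finite-difference jerk
    if jerk.size and np.abs(jerk).max() > max_jerk:
        return False                                            # erratic (high jerk)
    return np.linalg.norm(ee_xyz_traj[-1] - target_xyz) <= 0.05  # within 5 cm of target
```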
Global Planner consists of off-the-shelf components of a standard motion planning pipeline–inverse
kinematics (IK) [55], configuration-space AIT* [9], and spline-based, collision-aware trajectory
smoothing [56]. For a solvable problem, as the planning time approaches infinity, IK will find a valid
solution and AIT* will produce an optimal collision-free path, both with probability 1. Likewise,
with continuous collision checking, the smoother will produce a smooth, collision-free path. In
practice, our dataset size goal—we generated 6.54M trajectories across over 773K environments—
dictated our computation budget and we tuned the algorithms according to this limit. We attempted