task space [26,27]. In a closed-loop setting with a moving target, the traditional process of using
IK to map from task space to configuration space can produce highly variable configurations, especially around
obstacles. Motion Optimization [2,3,28], on the other hand, generates paths via non-linear optimization
and can consider multiple objectives such as smoothness of the motion, obstacle avoidance,
and convergence to an end-effector pose. These algorithms require careful tuning of the respective
cost functions to ensure convergence to a desirable path and are prone to local minima. Furthermore,
non-linear optimization is computationally complex and can be slow for difficult planning problems.
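As a rough illustration, such optimizers minimize a weighted combination of per-objective costs over a discretized trajectory. The sketch below is schematic only; the weights, clearance margin, and the helper functions `obstacle_dist` and `goal_pose_err` are placeholders rather than any specific method's formulation.

```python
import numpy as np

def trajectory_cost(xi, obstacle_dist, goal_pose_err,
                    w_smooth=1.0, w_obs=10.0, w_goal=5.0, margin=0.05):
    """Schematic weighted trajectory objective.

    xi: (T, d) array of configuration-space waypoints.
    obstacle_dist(q): hypothetical clearance function (distance to nearest obstacle).
    goal_pose_err(q): hypothetical end-effector pose error at configuration q.
    """
    smoothness = np.sum(np.diff(xi, axis=0) ** 2)                     # penalize jerky motion
    collision = sum(max(0.0, margin - obstacle_dist(q)) for q in xi)  # penalize low clearance
    goal = goal_pose_err(xi[-1])                                      # converge to the target pose
    return w_smooth * smoothness + w_obs * collision + w_goal * goal
```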
Local Control: In contrast to global planners, local controllers have long been applied to create
collision-free motions [29,30,4,11]. While they prioritize speed and smoothness, they are highly
local and may fail to find a valid path in complex environments. We demonstrate in our experiments
that MπNets are more effective at producing convergent motions in these types of environments,
including in dynamic and partially observed settings.
Imitation Learning: Imitation Learning [31] can train a policy from expert demonstrations with
limited knowledge of the expert’s internal model. For motion planning problems, we can apply im-
itation learning and leverage a traditional planner as the expert demonstrator, one with a perfect model
of the scene during training, and learn a policy that forgoes the need for an explicit scene model at
test time. Popular imitation learning methods include Inverse Reinforcement Learning [32,33,34]
and Behavior Cloning [35,36]. The former typically assumes expert optimality and learns a cost
function accordingly, whereas the latter directly learns the state-action mapping from demonstra-
tions, regardless of the expert’s optimality. We thus employ behavior cloning because producing
optimal plans for continuous manipulation problems is challenging. Recent work demonstrates
behavior cloning’s efficacy for fine-grained manipulation tasks, such as chopstick use [37] and pick-
and-place [38]. For long-horizon tasks like ours, however, distributional shift and data variance can
hinder behavior cloning performance. Distribution shift during execution can lead to states unseen
in training data [37]. Complex tasks often have a long tail of possible action states that are under-
represented in the data, leading to high data variance [39]. There are many techniques to address
these challenges through randomization, noise injection, regret optimization, and expert correction
[37,40,41,42,43]. These techniques, however, have not been demonstrated on a problem of our
scale and complexity (see Appendix D for details on the range of data). Our proposal seeks to over-
come these issues by specifically designing a learnable expert, increasing the scale and variation of
the data, and using a sufficiently expressive policy model.
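In its simplest form, behavior cloning reduces to supervised regression of the expert's action at each state visited in the demonstrations. The following is a minimal sketch of one such update, assuming a generic feed-forward policy and a mean-squared-error loss; the network sizes and state encoding are illustrative and are not the MπNets architecture or training objective.

```python
import torch
import torch.nn as nn

# Hypothetical policy: maps a state encoding (e.g., a configuration plus an
# observation embedding) to a configuration-space action.
policy = nn.Sequential(
    nn.Linear(7 + 1024, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 7),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_update(state, expert_action):
    """One behavior-cloning step: regress the expert's action for the given state."""
    loss = nn.functional.mse_loss(policy(state), expert_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```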
Neural Motion Planning: Many deep planning methods [13,44,45,46] seek to learn efficient sam-
plers to speed up traditional planners. Motion Planning Networks (MPNets) [12] learns to directly
plan through imitation of a standard sampling-based RRT* planner [6] and is used in conjunction
with a traditional planner for stronger guarantees. While these works greatly improve the speed of
the planning search, they have the same requirements as a standard planning system: targets in con-
figuration space and an explicit collision checker to connect the path. Our work operates based on
task-space targets and perceptual observations from a depth sensor without explicit state estimation.
Novel architectures have been proposed, such as differentiable planning modules in Value Iteration
Networks [20], transformers by Chaplot et al. [47], and goal-conditioned RL policies [48]. These
methods struggle to generalize to unknown environments or have only been shown in simple
2D [19] or 3D [21] settings. In contrast, we illustrate our approach in the challenging domain of
controlling a 7 degree-of-freedom (DOF) manipulator in unknown, dynamic environments.
3 Learning from Motion Planning
3.1 Problem Formulation
MπNets expect two inputs, a robot configuration $q_t$ and a segmented, calibrated point cloud $z_t$.
Before passing $q_t$ through the network, we normalize each element to be within $[-1, 1]$ according
to the limits for the corresponding joint. We call this $q_t^{\|\cdot\|}$. The point cloud is always assumed to be
calibrated in the robot’s base frame, and it encodes three segmentation classes: the robot’s current
geometry, the scene geometry, and the target pose. Targets are inserted into the point cloud via
points sampled from the mesh of a floating end effector placed at the target pose.
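To make the input representation concrete, a sketch of how these two inputs might be assembled is shown below. The joint limits and the per-class one-hot encoding are illustrative assumptions, not the paper's exact values or point-cloud format; the target points are assumed to come from sampling a floating end-effector mesh posed at the goal.

```python
import numpy as np

# Placeholder joint limits for a 7-DOF arm (illustrative values only).
JOINT_LOWER = np.array([-2.9, -1.8, -2.9, -3.1, -2.9, -0.1, -2.9])
JOINT_UPPER = np.array([ 2.9,  1.8,  2.9,  0.0,  2.9,  3.8,  2.9])

def normalize_config(q):
    """Map each joint value from [lower, upper] to [-1, 1]."""
    return 2.0 * (q - JOINT_LOWER) / (JOINT_UPPER - JOINT_LOWER) - 1.0

def build_segmented_cloud(robot_pts, scene_pts, target_pts):
    """Stack xyz points (in the robot base frame) with one-hot segmentation labels:
    class 0 = robot geometry, class 1 = scene geometry, class 2 = target points
    sampled from a floating end-effector mesh placed at the goal pose."""
    def with_label(pts, cls):
        onehot = np.zeros((len(pts), 3))
        onehot[:, cls] = 1.0
        return np.hstack([pts, onehot])
    return np.vstack([
        with_label(robot_pts, 0),
        with_label(scene_pts, 1),
        with_label(target_pts, 2),
    ])
```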