model fit to a large dataset of feasible robot poses, and are
thus approximately drawn from this data distribution, there is
a very high likelihood that every state in the planned path
is valid in terms of self-collisions and kinematic feasibil-
ity; (b) by modelling joint states and end-effector positions
jointly, we avoid the need to explicitly calculate inverse or
forward kinematics at any stage during planning, even when
the goal configuration is given in R^3 Cartesian space; (c)
by leveraging activation maximisation (AM) via gradients
through performance predictors, we can enforce arbitrarily
complex, potentially non-differentiable constraints that would
be hard to express in direct optimisation-based planners, and
might be intractably restrictive for sampling-based planners;
(d) by taking a pre-trained, data-driven approach to collision
avoidance, we do not need any geometric analysis or accurate
3D models at planning time, nor indeed do we need to perform
any kind of explicit collision checking, which is generally
the main computational bottleneck in sampling-based planners
[19].
In addition to the advantages in path planning that this
method offers over related works, we introduce an
additional loss on the run-time AM optimisation
process which encourages the planning process to remain in
areas of high likelihood according to our prior belief under
the generative model. In our experiments we find that this
contribution is critical in enabling successful planning that
stays in feasible state-space.
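The idea above can be sketched as gradient-based optimisation of a latent code in which a task gradient is combined with the gradient of the negative log-likelihood under the prior. The sketch below is illustrative only and assumes a standard normal prior; `task_loss_grad` stands in for the gradients of the goal and predictor losses, and all names are hypothetical:

```python
import numpy as np

def prior_nll_grad(z):
    # For a standard normal prior, -log p(z) = 0.5 * ||z||^2 + const,
    # so the gradient of the negative log-likelihood is simply z.
    return z

def optimise_latent(z, task_loss_grad, lam=0.1, lr=0.05, steps=200):
    # Gradient descent on task loss + lam * prior negative log-likelihood;
    # the second term keeps z in high-likelihood regions of the prior.
    for _ in range(steps):
        g = task_loss_grad(z) + lam * prior_nll_grad(z)
        z = z - lr * g
    return z

# Toy usage: the task gradient pulls z towards a target, while the prior
# term shrinks z towards the origin; the two balance at target / (1 + lam).
target = np.array([2.0, -1.0])
z_star = optimise_latent(np.zeros(2), lambda z: z - target)
```

In the full method the latent code would be decoded by the generative model into robot states, so staying in high-likelihood regions of the prior corresponds to staying near the training distribution of feasible poses.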
II. RELATED WORK
Successful path planning for a robotic manipulator generally
consists of producing a kinematic sequence of joint states
through which the robot can actuate in order to move from
a start to a goal configuration, while moving only through
viable configurations. While goal positions may be specified
in the same joint space as the plan, in a manipulator context
it is more common for the goal position to be specified in R^3
Cartesian end-effector space, or R^6 if the SO(3) rotation group
is included as well. Viable configurations are the intersection
of feasible states for the robot, i.e. those that are within
joint limits and do not result in self-collision, and collision-
free states with respect to obstacles in the environment; this
intersection defines the configuration space for the
robot in a given environment.
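As a minimal illustration (not from the paper), a configuration is viable only when it passes every one of these checks; the collision predicates below are placeholders for the geometric tests a real planner would implement:

```python
def within_joint_limits(q, lo, hi):
    # True iff every joint angle lies inside its [lo, hi] interval.
    return all(l <= qi <= h for qi, l, h in zip(q, lo, hi))

def is_viable(q, lo, hi, self_collides, collides_with_env):
    # Viability = intersection of robot-feasible states (joint limits,
    # no self-collision) and environment collision-free states.
    return (within_joint_limits(q, lo, hi)
            and not self_collides(q)
            and not collides_with_env(q))

# Toy usage with trivial stand-in predicates:
ok = is_viable([0.1, -0.2], [-1, -1], [1, 1],
               self_collides=lambda q: False,
               collides_with_env=lambda q: abs(q[0]) > 0.5)
```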
As analytically describing the valid configuration space is
generally intractable, sampling-based methods for planning
provide the ability to quickly find connected paths through
valid space, by checking for the validity of individual sampled
nodes. Variants of the Probabilistic Roadmap (PRM) and
Rapidly-exploring Random Tree (RRT) sampling-based algorithms
are widely used [1], [2], and provably asymptotically
optimal variants exist in PRM* and RRT* [3]. These methods
suffer from a trade-off between runtime and optimality: while
often relatively quick to find a feasible collision-free path, they
tend to require a second, slower stage of path optimisation
to shorten the path through the application of heuristics. In
the presence of restrictive constraints, both sampling- and
optimisation-based planners can be very slow to find an initial
feasible path [7].
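The core mechanism of such planners, growing a tree of sampled nodes and retaining only those that pass a validity check, can be sketched in 2-D as below. This is an illustrative toy RRT, not any of the cited implementations; the per-node `is_valid` call is exactly the collision-checking step that dominates runtime in practice:

```python
import math
import random

def rrt(start, goal, is_valid, step=0.1, iters=5000, goal_tol=0.15, seed=0):
    # Grow a tree from start: sample a point (with 10% goal bias), steer
    # from the nearest tree node towards it, and keep the new node only
    # if it passes the validity check.
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 1),
                                                  rng.uniform(0, 1))
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if not is_valid(new):
            continue  # per-node collision check: the main cost in practice
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            # Walk back up the tree to extract the path.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

# Toy usage: a vertical slab obstacle with a passage above y = 0.7.
path = rrt((0.05, 0.05), (0.9, 0.9),
           is_valid=lambda p: not (0.4 < p[0] < 0.6 and p[1] < 0.7))
```

Note that nothing here optimises path length, which is why a second shortening stage is typically applied afterwards.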
Optimisation-based planners start from an initial path or
trajectory guess and then refine it until certain costs, such
as path length, are minimised, and differentiable constraints
satisfied. Techniques such as CHOMP [4] bridge the gap
between planning and optimal control by enabling planning
over path and dynamics. TrajOpt [20] differs from CHOMP
in the numerical optimisation method used and the method of
collision checking. The Gaussian Process Motion Planner [21]
leverages Gaussian process models to represent trajectories
and updates them through interpolation. Stochastic Trajectory
Optimization for Motion Planning (STOMP) [5] is notable in
this context as it is able to produce plans while optimising for
non-differentiable constraints, a capability our work achieves
with gradients through trained performance predictors.
Planning with Deep Neural Networks. A recent line of
work has explored using deep networks to augment some or all
components of conventional planning methods. Qureshi et al.
[10], [11] train a pair of neural networks to embed point-cloud
environment representations and perform single timestep plan-
ning. Iterative application of the planning network produces a
path plan. Ichter and Pavone [9] learn an embedding space of
observations, and use RRT in this space with a learnt collision
checker to produce path plans, but need data to learn a forward
dynamics model in order to roll out the plan.
Another family of learning-based approaches to planning
learn embedding spaces from high dimensional data, and learn
forward dynamics models that operate on this learnt latent
space. Universal Planning Networks [12] learn deterministic
representations of high-dimensional data such that update
steps by gradient descent correspond to the unrolling of a
learned forward model. The Embed-to-Control works [13],
[22] employ variational inference in deep generative models
in which latent-space dynamics is locally linear, a property
that enables locally optimal control in these spaces. DVBFs
[23] improve on these models by relaxing the assumption
that the observation space is Markovian. PlaNet [24] uses
a latent-space dynamics model for planning in model-based
RL. However, planning in all these models tends to consist of
rolling out trajectories in time, finding a promising trajectory,
and executing the given actions. As such, these techniques tend
to become intractable for longer time horizons, and cannot be
thought of as path planning frameworks.
A different approach is that of encoding movement prim-
itives under the learning from demonstrations framework.
Conditional Neural Movement Primitives [25] extracts prior
knowledge from demonstrations and infers distributions over
trajectories conditioned on the current observation. Our ap-
proach differs in that we do not encode trajectories directly, but
rather learn a probabilistic model of robot states and generate
trajectories as we optimise in latent space.
Learning Inverse Kinematics. In this work, by learning a
joint embedding of joint angles q and end-effector positions
e, we are able to optimise for achieving an end-effector target
e_target while planning state sequences in joint space. Note that
we do not care about the orientation in which the goal is
reached; end-effector orientation is therefore omitted from the
formulation of e. Learning the statistical model of kinematics
means we do not need to solve inverse kinematics (IK) at
any point. Prior work has sought to learn solutions to IK that
can cope with its ill-posed one-to-many nature for redundant
manipulators [26], [27], and to overcome the problems with
analytic and numerical approaches [28]–[30]. Ren et al. [27]
train a generative adversarial network to generate joint angles
from end-effector positions, with the discriminator acting on
the concatenation of both the input position and generated
joints. This method implicitly maximises p(q|e), but does not
address the multimodality of the true distribution of IK solutions p(q|e).
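This one-to-many structure appears already for a planar 2-link arm, where a generic reachable target admits distinct elbow-up and elbow-down solutions. The sketch below is a standard textbook construction, purely for illustration (link lengths and function names are assumptions):

```python
import math

def two_link_ik(x, y, l1=1.0, l2=1.0):
    # Closed-form IK for a planar 2-link arm: the elbow angle's cosine is
    # fixed by the law of cosines, but its sine has two sign choices,
    # giving elbow-down and elbow-up joint configurations.
    r2 = x * x + y * y
    c2 = (r2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    s2 = math.sqrt(max(0.0, 1 - c2 * c2))
    sols = []
    for sign in (+1, -1):
        q2 = math.atan2(sign * s2, c2)
        q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2),
                                           l1 + l2 * math.cos(q2))
        sols.append((q1, q2))
    return sols

def fk(q1, q2, l1=1.0, l2=1.0):
    # Forward kinematics, used to verify both branches reach the target.
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))

solutions = two_link_ik(1.0, 0.5)  # two distinct joint configurations
```

A model that maximises p(q|e) without capturing this multimodality risks averaging such branches into an invalid configuration.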
Bocsi et al. [26] employed structured output learning to learn
a generative model for the joint distribution of joint angles
and end-effector positions. By modelling the joint instead of