
VP-STO: Via-point-based Stochastic Trajectory Optimization for
Reactive Robot Behavior
Julius Jankowski*1,2, Lara Brudermüller*3, Nick Hawes3 and Sylvain Calinon1,2
Abstract— Achieving reactive robot behavior in complex
dynamic environments remains challenging, as it requires
solving trajectory optimization problems fast enough to
replan the future motion at frequencies that are
sufficiently high for the task at hand. We argue that
current limitations in Model Predictive Control (MPC) for robot
manipulators arise from inefficient, high-dimensional trajectory
representations and the neglect of time-optimality in the
trajectory optimization process. Therefore, we propose a motion
optimization framework that optimizes jointly over space and
time, generating smooth and timing-optimal robot trajectories
in joint-space. Although our formulation is task-agnostic, it
can incorporate additional task-specific requirements, such as
collision avoidance, and still maintain real-time control rates,
as demonstrated in simulation and real-world robot experiments
on closed-loop manipulation. For additional material, please
visit https://sites.google.com/oxfordrobotics.institute/vp-sto.
I. INTRODUCTION
In this paper we consider the problem of generating
continuous, timing-optimal and smooth trajectories for robots
operating in dynamic environments. Such task settings
require the robot to be reactive to unforeseen changes in
the environment, e.g., due to dynamic obstacles, as well
as to be robust and compliant when operating alongside
or together with humans. However, generating this kind
of reactive and yet efficient robot behavior within a
high-dimensional configuration space is particularly challenging.
This is especially the case in robot manipulation scenarios
with many degrees of freedom (DoFs), as the resulting
high-dimensional and multi-objective optimization problems
are difficult to solve on-the-fly. A widespread approach in
robotics is to formulate the task of motion generation as
an optimization problem. Such trajectory-optimization-based
methods aim to find a trajectory that minimizes a cost
function, e.g., motion smoothness, subject to constraints,
e.g., collision avoidance. Solution strategies can either be
gradient-based or sampling-based. Approaches falling in the
former category, e.g., CHOMP [1] and TrajOpt [2], typically
employ second-order iterative methods to find locally optimal
solutions. However, they require the cost function to be
once or even twice differentiable, which constitutes a major
limitation for manipulation tasks, as these usually involve
many complex, discontinuous cost terms and constraints.
*Authors contributed equally.
JJ and SC were supported by the Swiss National Science Foundation
(SNSF) through the CODIMAN project. LB was supported by an Amazon
Web Services Lighthouse scholarship. NH received EPSRC funding via the
“From Sensing to Collaboration” programme grant [EP/V000748/1].
1 Idiap Research Institute, Martigny, CH; name.surname@idiap.ch
2 Ecole Polytechnique Fédérale de Lausanne (EPFL), CH
3 Oxford Robotics Institute, University of Oxford, UK; {larab, nickh}@robots.ox.ac.uk.
Fig. 1. Experiment settings. Left: Pick-and-place scenario, where the task
is to grasp a bowling pin that is arbitrarily handed over to the robot and to
place it upright in the middle of the table. Right: Pushing scenario, where
the robot has to push the center of the green coffee packet to a moving
target location indicated by the tip of the metal stick.
In contrast, sampling-based methods [3], [4] can operate on
discontinuous costs by sampling candidate trajectories from a
proposal distribution, evaluating them on the objective, and
updating the proposal distribution according to their
relative performance. Compared to gradient-based optimization,
stochastic approaches typically also achieve higher
robustness to difficult reward landscapes due to their exploratory
properties [5]. Yet, achieving reactive robot behavior is
challenging, as it requires solving trajectory optimization
problems at frequencies that are sufficiently high for the
task at hand. This issue can be alleviated in Model Predictive
Control (MPC) settings by optimizing over a shorter
receding time-horizon. Stochastic, gradient-free trajectory
optimization, such as Model-Predictive Path Integral (MPPI)
control [6] and the Cross-Entropy Method (CEM) [4],
combined with MPC, also known as sampling-based MPC, has
demonstrated state-of-the-art real-time performance on real robotic
systems in challenging and dynamic environments [7], [8],
[9]. However, these works still suffer from limited long-term
anticipation, e.g., getting stuck in front of obstacles, due to
the optimization over a short receding horizon.
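To make the sample-evaluate-update loop described above concrete, the following is a minimal sketch of the Cross-Entropy Method with a diagonal Gaussian proposal. It is applied to a hypothetical toy cost on a 2D point rather than to robot trajectories; the names (`cem_minimize`, `toy_cost`, `GOAL`, `OBSTACLE`), the obstacle geometry, and all hyperparameters are illustrative assumptions, not taken from VP-STO or from [4]. The step penalty inside the obstacle is discontinuous, so it provides no useful gradient, whereas CEM only needs to rank candidates by cost.

```python
import numpy as np


def cem_minimize(cost_fn, mean, std, n_samples=64, n_elite=8, n_iters=50, seed=0):
    """Minimize cost_fn with the Cross-Entropy Method (diagonal Gaussian proposal)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        # 1) Sample candidates from the current proposal distribution.
        samples = mean + std * rng.standard_normal((n_samples, mean.size))
        # 2) Evaluate them; the cost only needs to be computable, not differentiable.
        costs = np.array([cost_fn(x) for x in samples])
        # 3) Keep the best candidates: only their relative ranking matters.
        elite = samples[np.argsort(costs)[:n_elite]]
        # 4) Refit the proposal to the elite set (small floor keeps std strictly positive).
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean


# Hypothetical toy problem: reach a goal point while avoiding a disc-shaped obstacle.
GOAL = np.array([1.0, 1.0])
OBSTACLE, RADIUS = np.array([0.5, 0.5]), 0.2


def toy_cost(x):
    """Distance to GOAL plus a step penalty inside the obstacle (discontinuous)."""
    in_collision = np.linalg.norm(x - OBSTACLE) < RADIUS
    return np.linalg.norm(x - GOAL) + (10.0 if in_collision else 0.0)


best = cem_minimize(toy_cost, mean=[0.0, 0.0], std=[1.0, 1.0])
print("CEM solution:", best, "cost:", toy_cost(best))
```

In a sampling-based MPC setting, a routine of this kind would be re-run at every control cycle on the current problem, for instance warm-started from the previous mean; the span of time covered by the decision variables is what bounds how far ahead the controller can anticipate.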
Motivated by the above, we propose Via-Point-based
Stochastic Trajectory Optimization (VP-STO), a framework
that introduces the following contributions:
1) A low-dimensional, time-continuous representation of
trajectories in joint-space based on via-points that, by
design, respect the kinodynamic constraints of the robot.
2) Stochastic via-point optimization, based on an
evolutionary strategy, aiming at minimizing movement
duration and task-related cost terms.
3) An MPC algorithm optimizing over the full horizon
for real-time application in complex high-dimensional
task settings, such as closed-loop object manipulation.
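Contribution 1 can be illustrated with a small sketch under assumptions that are ours rather than the paper's: via-points are equally spaced in a normalized phase s in [0, 1], the motion is rest-to-rest, and the path is a clamped cubic spline, which among twice-differentiable curves through the same points with the same end velocities minimizes the integrated squared acceleration. Because q(t) = S(t/T) gives dq/dt = S'/T and d²q/dt² = S''/T², the shortest duration T that respects per-joint velocity and acceleration limits follows directly, so any set of via-points maps to a limit-respecting trajectory. The helper names and limit values are hypothetical, and this is not necessarily the exact formulation used in VP-STO.

```python
import numpy as np


def fit_via_point_spline(via_points):
    """Clamped cubic spline (zero end velocities) through equally spaced via-points.

    via_points: array (N+1, D) in joint space, first row = start, last row = goal.
    Returns the second derivatives M (N+1, D) at the knots w.r.t. phase s in [0, 1].
    """
    y = np.asarray(via_points, dtype=float)
    n = len(y) - 1
    h = 1.0 / n
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros_like(y)
    # Boundary rows: S'(0) = S'(1) = 0 (rest-to-rest motion, an assumption).
    A[0, :2] = [2.0, 1.0]
    rhs[0] = 6.0 / h**2 * (y[1] - y[0])
    A[n, n - 1:] = [1.0, 2.0]
    rhs[n] = -6.0 / h**2 * (y[n] - y[n - 1])
    # Interior rows: continuity of S' at intermediate via-points (S'' is continuous by construction).
    for i in range(1, n):
        A[i, i - 1:i + 2] = [1.0, 4.0, 1.0]
        rhs[i] = 6.0 / h**2 * (y[i + 1] - 2.0 * y[i] + y[i - 1])
    return np.linalg.solve(A, rhs)


def eval_spline(via_points, M, s):
    """Position, velocity and acceleration w.r.t. phase for a 1-D array of phases s."""
    y = np.asarray(via_points, dtype=float)
    n = len(y) - 1
    h = 1.0 / n
    s = np.atleast_1d(np.asarray(s, dtype=float))
    i = np.clip((s / h).astype(int), 0, n - 1)  # segment index of each query
    a = ((i + 1) * h - s)[:, None]              # phase distance to segment end
    b = (s - i * h)[:, None]                    # phase distance to segment start
    Mi, Mj, yi, yj = M[i], M[i + 1], y[i], y[i + 1]
    pos = ((Mi * a**3 + Mj * b**3) / (6 * h)
           + (yi / h - Mi * h / 6) * a + (yj / h - Mj * h / 6) * b)
    vel = (Mj * b**2 - Mi * a**2) / (2 * h) + (yj - yi) / h - (Mj - Mi) * h / 6
    acc = (Mi * a + Mj * b) / h
    return pos, vel, acc


def min_duration(via_points, M, v_max, a_max, n_grid=2001):
    """Shortest T such that q(t) = S(t / T) keeps |dq/dt| <= v_max and |d2q/dt2| <= a_max."""
    _, vel, _ = eval_spline(via_points, M, np.linspace(0.0, 1.0, n_grid))
    peak_vel = np.abs(vel).max(axis=0)  # max |S'| per joint, approximated on a grid
    peak_acc = np.abs(M).max(axis=0)    # S'' is piecewise linear, so its max is at a knot
    return float(max(np.max(peak_vel / v_max), np.max(np.sqrt(peak_acc / a_max))))


# Hypothetical 2-DoF example: start, two intermediate via-points, goal.
via = np.array([[0.0, 0.0], [0.3, 0.4], [0.8, 0.1], [1.0, -0.5]])
M = fit_via_point_spline(via)
# Limits assumed in rad/s and rad/s^2, so T is in seconds.
T = min_duration(via, M, v_max=np.array([1.0, 1.0]), a_max=np.array([2.0, 2.0]))
print(f"minimum feasible duration: {T:.3f} s")
```

With this parameterization, a stochastic optimizer in the spirit of contribution 2 would only need to sample the intermediate via-points, while the duration is derived from the limits rather than searched over.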
arXiv:2210.04067v2 [cs.RO] 14 Mar 2023