
optimizer is a likely trajectory of human motion and a planned
trajectory for the robot motion.
The main contributions of our work are:
• Formulation of space-sharing human-robot motion planning
problems as trajectory optimization with shooting in the
joint human and robot state space.
• Introduction of latent space modifiers that can be used to
change the human prediction.
• Efficient computational models for the gradients of the
objective and constraints, using monolithic computational graphs.
The rest of the paper is organized as follows: In Section II,
we discuss relevant prior work. In Section III, we introduce our
framework theoretically and explain implementation details.
In Section IV we evaluate our prediction framework on real
motion data. We further discuss the framework in Section V.
Conclusions are drawn in Section VI.
II. RELATED WORK
A. Human-Aware Motion Planning
The rapidly growing research field of HRC is focusing on
robotic systems that are able to perform joint actions with
humans in order to fulfill a common task. A main challenge
in close-proximity interaction is balancing safety and comfort
with time- and energy-efficient execution [8], [9]. Proactivity
has also been investigated in many scenarios [10], [11].
In order to achieve this, the human partner needs to be
taken into account explicitly when planning the robot’s motion,
leading to human-aware motion planning systems. Human-
aware motion planning has been shown to improve human-
robot team fluency and human worker satisfaction [12]. One
way to introduce human-awareness is to incorporate a cost
function that evaluates the safety of a robot path [13], or to
predict which part of the workspace will be occupied by
the human and avoid this area [14], [15]. To ensure
human comfort, it is possible to reason explicitly about the
human’s kinematics, field of view, posture, and preferences [16].
For robot navigation, Proxemics, which considers public, personal,
and private spaces, is important to ensure human comfort [17].
In contrast to prior work in human-aware motion planning,
we co-optimize robot and human motion, using a predictive
human behavior model. We incorporate interaction paradigms,
such as Proxemics, as constraints in trajectory optimization.
B. Human Behavior Prediction in Robotics
In close proximity interactions with humans, the ability to
anticipate the actions of the human partner is key. Hence,
intent prediction, which often consists of predicting a discrete
action or a goal position, has been investigated in [18], [19].
Object affordances can be used to improve the prediction of
human intent [20], [21]. It is often required to know the full
trajectory of the human. For example, it might be important
to know which part of the workspace the human will occupy.
This is often done in a second step in the aforementioned
work, for example, by using social forces [19]. Our method
can be combined with intent prediction similarly, as we have
demonstrated in our prior work [22], [23].
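[Figure 2 diagram: a chain of RNN cells with hidden states h1 to h5. The encoder takes the observed human states at time steps 1 to 3 as input; the decoder outputs the predicted states for time steps 4 to 6.]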
Fig. 2. Human Motion Prediction with an RNN. The observed trajectory (blue)
is fed into RNN cells (red) and future states can be predicted (green).
Many works on predictive behavior models focus on directly
forecasting human motion. While 2D human motion prediction
is especially important for robot navigation [24]–[26], we are
interested not only in modeling navigation but also in
pick-and-place tasks and handovers, and thus require a full-body
motion prediction model.
C. Human Full-Body Prediction Models
Early methods for full-body or arm prediction, for example,
use inverse optimal control [27], [28]. However, the availability
of larger human motion datasets and recent advances in
neural networks have made deep learning techniques the state
of the art. Due to the sequential structure of motion data, RNNs
are suitable for full-body motion prediction [29]. A typical
encoder-decoder structure can be seen in Figure 2. The architecture
can be further improved by adding a residual connection
in the loop function [2]. It has also been shown that the rotation
representation and the loss are important; for instance, using a
quaternion representation improved over prior work [3], [30].
Including a velocity connection can make predictions more
stable for longer time horizons [4]. Recently, motion prediction
using graph neural networks [31] or transformers [32] has been
shown to slightly improve prediction performance.
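As a minimal illustration of the encoder-decoder structure with a residual connection in the decoder loop, the following NumPy sketch uses a plain tanh cell with random, untrained weights. The function names (`rnn_cell`, `init_params`, `predict`), the cell type, and all dimensions are assumptions made for this example; they are not taken from the cited works, which typically use gated cells and trained parameters.

```python
import numpy as np


def rnn_cell(x, h, params):
    """Plain tanh (Elman) cell, used here only as a stand-in for a gated cell."""
    return np.tanh(params["W_x"] @ x + params["W_h"] @ h + params["b"])


def init_params(pose_dim, hidden_dim, rng):
    """Random, untrained weights (illustrative values only)."""
    return {
        "W_x": rng.normal(0.0, 0.1, (hidden_dim, pose_dim)),
        "W_h": rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)),
        "b": np.zeros(hidden_dim),
        "W_out": rng.normal(0.0, 0.01, (pose_dim, hidden_dim)),
    }


def predict(observed, horizon, params):
    """Encode observed poses of shape (T_obs, pose_dim), then decode `horizon` poses."""
    h = np.zeros(params["b"].shape[0])
    # Encoder: consume all observed poses except the last one.
    for x in observed[:-1]:
        h = rnn_cell(x, h, params)
    # Decoder: start from the last observed pose and feed predictions back in.
    x = observed[-1]
    future = []
    for _ in range(horizon):
        h = rnn_cell(x, h, params)
        # Residual connection: the network outputs a pose offset, not a pose.
        x = x + params["W_out"] @ h
        future.append(x)
    return np.stack(future)


rng = np.random.default_rng(0)
params = init_params(pose_dim=6, hidden_dim=16, rng=rng)
observed = rng.normal(size=(5, 6))  # hypothetical observed trajectory
prediction = predict(observed, horizon=3, params=params)
print(prediction.shape)
```

With the output weights set to zero, this residual decoder simply repeats the last observed pose, which gives some intuition for why predicting offsets can yield a smooth transition from the observed to the predicted motion.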
Motion prediction based on neural networks promises good
results for predicting short-term motion. However, these models
have two issues: 1) they purely forecast human motion and
do not incorporate workspace geometry, and 2) they are not
controllable and thus cannot be changed during motion planning.
In our work, we tackle these issues by adding modifiers to
the network architecture, which allows for optimization-based
motion prediction.
D. Motion Optimization
Gradient-based optimization algorithms are widely used
in the field of robotics and optimal control for optimizing
trajectories [33]–[39]. These techniques have been shown to
successfully generate motions with a variety of kinematic
and dynamic objectives and constraints, such as obstacle
constraints [34].
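To make the idea concrete, the following NumPy sketch runs plain gradient descent on a 2D waypoint trajectory with a smoothness objective and a single circular obstacle. It is a simplified illustration under stated assumptions, not the method of any cited work: the obstacle constraint is replaced by a quadratic penalty, and the names `cost_and_grad` and `optimize` as well as the margin, weight, and step size are hypothetical choices.

```python
import numpy as np


def cost_and_grad(traj, obstacle, margin, weight):
    """Return the cost and its gradient for waypoints `traj` of shape (T, 2)."""
    # Smoothness term: sum of squared differences between consecutive waypoints.
    diff = traj[1:] - traj[:-1]
    cost = np.sum(diff ** 2)
    grad = np.zeros_like(traj)
    grad[1:] += 2.0 * diff
    grad[:-1] -= 2.0 * diff
    # Obstacle term: quadratic penalty on waypoints closer than `margin`.
    offset = traj - obstacle
    dist = np.linalg.norm(offset, axis=1)
    violation = np.maximum(0.0, margin - dist)
    cost += weight * np.sum(violation ** 2)
    safe_dist = np.maximum(dist, 1e-9)  # avoid division by zero
    grad += weight * (-2.0 * violation / safe_dist)[:, None] * offset
    return cost, grad


def optimize(traj, obstacle, margin=0.5, weight=10.0, step=0.01, iters=2000):
    """Gradient descent with fixed start and goal waypoints."""
    traj = traj.copy()
    for _ in range(iters):
        _, grad = cost_and_grad(traj, obstacle, margin, weight)
        grad[0] = 0.0   # keep the start fixed
        grad[-1] = 0.0  # keep the goal fixed
        traj -= step * grad
    return traj


# Straight-line initialization from (0, 0) to (2, 0); obstacle slightly off the line.
initial = np.stack([np.linspace(0.0, 2.0, 21), np.zeros(21)], axis=1)
obstacle = np.array([1.0, 0.1])
result = optimize(initial, obstacle)

initial_cost, _ = cost_and_grad(initial, obstacle, 0.5, 10.0)
final_cost, _ = cost_and_grad(result, obstacle, 0.5, 10.0)
print(initial_cost, final_cost)
```

Because the obstacle enters only as a penalty, the optimized waypoints may still lie slightly inside the margin; a constrained solver would be needed to enforce the clearance exactly.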
Motion optimization has also been used to synthesize human
behavior for animating characters and is able to generate
realistic motions [40], [41]. In contrast, our work focuses on