
Learning Depth Vision-Based Personalized Robot Navigation
From Dynamic Demonstrations in Virtual Reality
Jorge de Heuvel Nathan Corral Benedikt Kreis Jacobus Conradi Anne Driemel Maren Bennewitz
Abstract— For the best human-robot interaction experience, the robot’s navigation policy should take into account the personal preferences of the user. In this paper, we present a learning framework complemented by a perception pipeline to train a depth vision-based, personalized navigation controller from user demonstrations. Our virtual reality interface enables the demonstration of robot navigation trajectories while the user is in motion, covering dynamic interaction scenarios. The novel perception pipeline employs a variational autoencoder in combination with a motion predictor. It compresses the perceived depth images to a latent state representation that allows the learning agent to reason efficiently about the robot’s dynamic environment. In a detailed analysis and ablation study, we evaluate different configurations of the perception pipeline. To further quantify the navigation controller’s quality of personalization, we develop and apply a novel metric to measure preference reflection based on the Fréchet Distance. We discuss the robot’s navigation performance in various virtual scenes and demonstrate the first personalized robot navigation controller that relies solely on depth images. A supplemental video highlighting our approach is available online1.
I. INTRODUCTION
The personalization of robots will be a key factor for comfortable and satisfying human-robot interactions. As the integration of robots at home and at work will inevitably increase, the primary goal should be a naturally collaborative experience between users and the robot. However, users might have personal preferences about specific aspects of the robot’s behavior that define their personal gold standard of interaction. Falling short of the user’s preferences could lead to negative interaction experiences and, consequently, frustration [1].
When humans share the same environment with a mobile robot, the robot’s navigation behavior significantly influences the comfort of interaction [2], [3]. While basic obstacle avoidance is unquestionably a key component of successful navigation, it is insufficient to address individual preferences regarding proxemics, trajectory shape, or the area of navigation in a given environment. Instead, a robot’s navigation policy should be aware of humans [4] and reflect the users’ personal preferences.
J. de Heuvel, N. Corral, B. Kreis, and M. Bennewitz are with the Humanoid Robots Lab, J. Conradi and A. Driemel are with the Group for Algorithms and Complexity, University of Bonn, Germany. M. Bennewitz and A. Driemel are additionally with the Lamarr Institute for Machine Learning and Artificial Intelligence, Germany. This work has partially been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under the grant number BE 4420/2-2 (FOR 2535 Anticipating Human Behavior).
1Full video: hrl.uni-bonn.de/publications/deheuvel23iros learning.mp4
Fig. 1. Our virtual reality (VR) interface allows the demonstration of robot navigation preferences by drawing trajectories intuitively onto the floor. By applying a learning-based framework, we achieve personalized navigation using a depth vision-based perception pipeline.
In our previous work [2], we demonstrated that pairing a virtual reality (VR) interface with a reinforcement learning (RL) framework enables the demonstration and training of highly customizable navigation behaviors. The resulting navigation controller outperformed non-personalized controllers in terms of perceived comfort and interaction experience.
However, a key assumption in the previous work is an always-present, static human of known pose in a predefined environment with pose-encoded obstacles, which benefits the learning process with a low-dimensional state space. To overcome these assumptions, employing a depth vision sensor to sense both the human and obstacles is a possible solution [5]. However, depth vision cameras come at the cost of high-dimensional, complex, and redundant output, and learning from such high-dimensional data in dynamic scenes is a challenging task [6]. The question that crystallizes is: how do we teach preferences of moving users in realistic environments while relying on state-of-the-art sensor modalities?
To address these challenges, we introduce a depth vision-based perception pipeline that is lightweight, human-aware, and, most importantly, provides the robot with a low-dimensional representation of the dynamic scene. This pipeline i) detects the human and obstacles, ii) compresses the perceived depth information, and iii) enables the learning framework to reason efficiently about the robot’s dynamic environment. Building on our VR interface and learning framework, the new system is able to learn personalized navigation preferences for dynamic scenes in which both robot and human move.
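To make the role of the pipeline concrete, the following minimal sketch illustrates the general idea of such a perception module: a convolutional, VAE-style encoder compresses a depth image into a compact latent code, which is concatenated with a predicted human motion state to form the low-dimensional observation of the learning agent. The image resolution, latent dimension, network widths, and the four-dimensional human state are illustrative assumptions and do not reproduce the actual architecture used in this work.

import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """Compresses a 1x64x64 depth image into a low-dimensional latent code."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
        )
        # VAE-style heads; only the mean is used here as a deterministic state code.
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)

    def forward(self, depth: torch.Tensor):
        h = self.conv(depth)
        return self.fc_mu(h), self.fc_logvar(h)

def build_observation(depth: torch.Tensor, predicted_human_state: torch.Tensor,
                      encoder: DepthEncoder) -> torch.Tensor:
    """Concatenates the latent depth code with a predicted human motion state."""
    mu, _ = encoder(depth)
    return torch.cat([mu, predicted_human_state], dim=-1)

if __name__ == "__main__":
    encoder = DepthEncoder(latent_dim=32)
    depth_image = torch.rand(1, 1, 64, 64)   # placeholder depth frame
    human_state = torch.rand(1, 4)            # hypothetical relative (x, y, vx, vy)
    observation = build_observation(depth_image, human_state, encoder)
    print(observation.shape)                  # torch.Size([1, 36])

In practice, such an encoder would be pre-trained with a VAE reconstruction objective and paired with a motion predictor; the sketch only shows how the compressed observation replaces the raw, high-dimensional depth input.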
In summary, the main contributions of our work are:
• Learning a preference-reflecting navigation controller that relies solely on depth vision.
• A VR demonstration framework to record navigation