Learning Depth Vision-Based Personalized Robot Navigation From Dynamic Demonstrations in Virtual Reality
Jorge de Heuvel, Nathan Corral, Benedikt Kreis, Jacobus Conradi, Anne Driemel, Maren Bennewitz

Abstract— For the best human-robot interaction experience,
the robot’s navigation policy should take into account personal
preferences of the user. In this paper, we present a learning
framework complemented by a perception pipeline to train
a depth vision-based, personalized navigation controller from
user demonstrations. Our virtual reality interface enables the
demonstration of robot navigation trajectories under motion of
the user for dynamic interaction scenarios. The novel perception
pipeline employs a variational autoencoder in combination with
a motion predictor. It compresses the perceived depth images
to a latent state representation to enable efficient reasoning of
the learning agent about the robot's dynamic environment. In
a detailed analysis and ablation study, we evaluate different
configurations of the perception pipeline. To further quantify
the navigation controller’s quality of personalization, we de-
velop and apply a novel metric to measure preference reflection
based on the Fr´
echet Distance. We discuss the robot’s navigation
performance in various virtual scenes and demonstrate the first
personalized robot navigation controller that solely relies on
depth images. A supplemental video highlighting our approach
is available online1.
I. INTRODUCTION
The personalization of robots will be a key factor for
comfortable and satisfying human-robot interactions. As the
integration of robots at home and at work inevitably increases,
the primary goal should be a naturally collaborative experience
between users and the robot. However, users might have
personal preferences about specific aspects of the robot's
behavior that define their personal gold standard of interaction.
Falling short of users' preferences could lead to negative
interaction experiences and, consequently, frustration [1].
When humans share the same environment with a mobile
robot, the robot's navigation behavior significantly influences
the comfort of the interaction [2], [3]. While obstacle
avoidance is unquestionably a key component of successful
navigation, basic avoidance approaches are insufficient to
address individual preferences regarding proxemics, trajectory
shape, or the area of navigation in a given environment.
Instead, a robot's navigation policy should be aware of
humans [4] and reflect the users' personal preferences.
In our previous work [2] we demonstrated that pairing a
virtual reality (VR) interface with a reinforcement learning
J. de Heuvel, N. Corral, B. Kreis, and M. Bennewitz are with the
Humanoid Robots Lab, J. Conradi and A. Driemel are with the Group for
Algorithms and Complexity, University of Bonn, Germany. M. Bennewitz
and A. Driemel are additionally with the Lamarr Institute for Machine
Learning and Artificial Intelligence, Germany. This work has partially been
funded by the Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation) under the grant number BE 4420/2-2 (FOR 2535 Anticipating
Human Behavior).
1Full video: hrl.uni-bonn.de/publications/deheuvel23iros learning.mp4
Fig. 1. Our virtual reality (VR) interface allows the demonstration of robot
navigation preferences by drawing trajectories intuitively onto the floor. By
applying a learning-based framework, we achieve personalized navigation
using a depth vision-based perception pipeline.
(RL) framework enables the demonstration and training of
highly customizable navigation behaviors. The resulting nav-
igation controller outperformed non-personalized controllers
in terms of perceived comfort and interaction experience.
However, a key assumption in the previous work is an
always-present, static human of known pose in a predefined
environment with pose-encoded obstacles. This benefits the
learning process with a low-dimensional state space. To
overcome these assumptions, employing a depth vision sensor
to sense both the human and obstacles is a possible solution [5].
However, depth vision cameras come at the cost of high-dimensional,
complex, and redundant output, and learning from
such high-dimensional data in dynamic scenes is a challenging
task [6]. The question thus becomes: how do we teach the
preferences of moving users in realistic environments while
relying on state-of-the-art sensor modalities?
To address the challenges above, we introduce a depth
vision-based perception pipeline that is lightweight and
human-aware and, most importantly, provides the robot with
a low-dimensional representation of the dynamic scene. This
pipeline i) detects the human and obstacles, ii) compresses
the perceived depth information, and iii) enables the learning
framework to reason efficiently about the robot's dynamic
environment. Our new system learns personalized navigation
preferences through a VR interface and learning framework
for dynamic scenes in which both the robot and the human move.
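To make the compression step ii) concrete, the following minimal sketch shows how a VAE-style encoder maps a depth image to a low-dimensional latent state via the reparameterization trick. The image size, latent dimension, and random linear weights are illustrative stand-ins; the actual pipeline uses a trained convolutional VAE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a downsampled depth frame and a small latent space.
IMG_H, IMG_W = 64, 64
LATENT_DIM = 32

# Stand-in encoder weights; in the real pipeline a trained
# convolutional VAE produces the mean and log-variance heads.
W_mu = rng.normal(0, 0.01, size=(LATENT_DIM, IMG_H * IMG_W))
W_logvar = rng.normal(0, 0.01, size=(LATENT_DIM, IMG_H * IMG_W))

def encode_depth(depth_img: np.ndarray) -> np.ndarray:
    """Compress a depth image to a latent state sample z = mu + sigma * eps."""
    x = depth_img.reshape(-1)
    mu = W_mu @ x
    logvar = W_logvar @ x
    eps = rng.standard_normal(LATENT_DIM)
    return mu + np.exp(0.5 * logvar) * eps

depth = rng.uniform(0.3, 5.0, size=(IMG_H, IMG_W))  # simulated depth values in meters
z = encode_depth(depth)
print(z.shape)  # (32,)
```

The learning agent then reasons over `z` (a 32-dimensional vector here) instead of the raw 64x64 depth frame, which is what makes training on dynamic scenes tractable.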
In summary, the main contributions of our work are:
• Learning a preference-reflecting navigation controller that relies solely on depth vision.
• A VR demonstration framework to record navigation preferences for a dynamic human-robot scenario.
• The introduction and application of a novel metric to quantify the quality of navigation preference reflection.
• An extensive qualitative and quantitative analysis of different perception configurations for personalized navigation.

arXiv:2210.01683v3 [cs.RO] 31 Jul 2023

Fig. 2. Schematic representation of our architecture. a) Demonstration trajectories are drawn by the user in VR onto the floor using the handheld controller. Subsequently, the trajectories are fed into the demonstration buffer. b) Our TD3 reinforcement learning architecture with an additional behavioral cloning (BC) loss on the actor trains a personalized navigation policy that outputs linear and angular velocities. c) The robot-centric state space relies on a depth vision perception pipeline, capturing the vicinity of the human and obstacles in the environment, as well as the relative goal position. A variational autoencoder (VAE) compresses the raw images to a latent state representation, while a predictor (LSTM) provides subsequent state predictions.
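Since the preference-reflection metric builds on the Fréchet distance, a minimal sketch of its discrete variant (after Eiter and Mannila) may help make the underlying quantity concrete. This is only the base distance between two polygonal trajectories; the paper's actual metric may process and compare trajectories beyond this.

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polylines P and Q.

    Intuitively: the shortest 'leash' needed for two agents walking
    monotonically along P and Q, allowing either to pause.
    """
    def d(i, j):
        return math.dist(P[i], Q[j])

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 and j == 0:
            return d(0, 0)
        if i == 0:
            return max(c(0, j - 1), d(0, j))
        if j == 0:
            return max(c(i - 1, 0), d(i, 0))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d(i, j))

    return c(len(P) - 1, len(Q) - 1)

# Two parallel trajectories one meter apart:
P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet(P, Q))  # 1.0
```

Comparing a driven robot trajectory against a demonstrated one with such a distance yields a scalar that is small when the controller reproduces the demonstrated shape closely.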
II. RELATED WORK
Adjusting or learning the navigation behavior of a robot
based on feedback or demonstration has been the focus of
various studies [7], [8], [9]. In particular, deep learning-based
approaches stand out for their ability to learn from subtle and
implicit features in their environment [10], [11], [12]. This
motivates the use of a deep RL architecture for our
personalized navigation controller.
Fusing user demonstrations with a learning architecture has
led to promising results in robotic manipulation tasks [13]
and has successfully been applied to robot navigation [2];
it is therefore a key concept of our learning architecture.
Vision-based sensor modalities for navigation appeal due
to their cost-efficiency. For human-aware navigation, the
detection and explicit localization of pedestrians enabled
socially conforming navigation controllers [5], [14].
Recent advances in the field of depth vision-based naviga-
tion in combination with RL have been made by Hoeller et
al. [15], who study a state representation of depth images
to efficiently learn navigation in dynamic environments. Our
proposed perception pipeline is built upon their successful
architecture.
Furthermore, a navigating agent benefits from dynamic
scene understanding. Predicting the movement of surround-
ing pedestrians and obstacles with Long Short-Term Mem-
ory (LSTM) models has led to promising results [16], [17],
[15]. Therefore, we will integrate an LSTM architecture into
our perception pipeline.
While in our previous work [2] we presented one of the
first approaches at the intersection of navigation and robot
personalization, we now enhance the system by allowing the
user to demonstrate navigation trajectories under dynamic
motions and using only depth vision as controller input.
III. OUR APPROACH
In this work, we consider a robot navigating in the same
room as a single human user. The user has personal preferences
about the way the robot circumnavigates him/her while
pursuing a local goal in the same room. Such preferences
could concern the approaching behavior or the robot's trajectory.
We assume the robot is provided with a local goal by a
global planner. The local goal could be a door on the opposite
side of the current room to be traversed, or a location of
interest in the same room. Using such sparse local goals
several meters apart, we provide the controller with the
spatial and temporal freedom to navigate towards the goal
in a user-preferred, personalized manner. The human shares
the navigation space with the robot, whether moving dynamically
through the room or resting statically. To achieve
preference-aligned and collision-free navigation behavior, the
robot relies only on a depth vision camera to sense the
distance to the human as well as obstacles. We formulate
personalized navigation as a learning task, where the robot
learns a personalized controller outputting linear and angular
velocity from VR demonstrations of the user.
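As a concrete illustration of this action space, the sketch below integrates one linear and angular velocity command with a simple unicycle model. The kinematic model and time step are illustrative assumptions for a differential-drive base, not the paper's simulator.

```python
import math

def step_unicycle(x, y, theta, v, omega, dt=0.1):
    """Advance the robot pose by one control step of the (v, omega)
    action the controller outputs, using a unicycle model."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

pose = (0.0, 0.0, 0.0)
pose = step_unicycle(*pose, v=0.5, omega=0.0)  # drives straight along x
```

Because the policy commands velocities rather than waypoints, the shape of the resulting trajectory between sparse goals is entirely up to the learned, preference-reflecting behavior.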
A. Learning Architecture
The learning approach presented in this section is a hybrid
of reinforcement learning and behavioral cloning.
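Such a hybrid objective can be sketched as a weighted sum of the RL actor term (maximizing the critic's Q-value) and a behavioral cloning term penalizing deviation from demonstrated actions. The weighting coefficient and function shape below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hybrid_actor_loss(q_values, policy_actions, demo_actions, bc_weight=0.5):
    """Sketch of a TD3+BC-style actor loss.

    q_values:       critic estimates for the policy's actions
    policy_actions: actions the current policy outputs on demo states
    demo_actions:   actions recorded in the demonstration buffer
    bc_weight:      illustrative trade-off between RL and imitation
    """
    rl_term = -np.mean(q_values)  # maximize Q  <=>  minimize -Q
    bc_term = np.mean((np.asarray(policy_actions) - np.asarray(demo_actions)) ** 2)
    return rl_term + bc_weight * bc_term
```

Minimizing this loss pushes the actor towards high-value actions while keeping it close to the user's demonstrated trajectories, which is what injects the personal preferences into the policy.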
RL refers to the optimization of environment interactions,
leading from state $s_t$ to $s_{t+1}$, that obey a Markov
decision process. The interacting agent receives a reward
$r_t = r(s_t, a_t)$ for taking an action $a_t = \pi_\phi(s_t)$ at time step
$t$ with respect to a policy $\pi_\phi$. The tuples $(s_t, a_t, r_t, s_{t+1})$
are referred to as state-action pairs. The optimization goal
is to maximize the overall return $R = \sum_{i=t}^{T} \gamma^{(i-t)} r_i$ of the
$\gamma$-discounted rewards, onward from time step $t$.
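The discounted return can be computed directly from a reward sequence, as in this short sketch:

```python
def discounted_return(rewards, gamma=0.99, t=0):
    """Return R = sum over i from t to T of gamma^(i - t) * r_i."""
    return sum(gamma ** (i - t) * rewards[i] for i in range(t, len(rewards)))

# Example: rewards over three steps, discounted from t = 0:
# R = 1.0 + 0.99 * 0.0 + 0.99**2 * 0.5 = 1.49005
R = discounted_return([1.0, 0.0, 0.5])
```

The discount factor gamma < 1 weights near-term rewards more heavily, which keeps the return finite and encourages the agent to reach the goal promptly.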
Fig. 2 depicts a schematic overview of our approach. We
employ an off-policy twin-delayed deep deterministic policy
gradient (TD3) reinforcement learning architecture [18]. In