Learning Depth Vision-Based Personalized Robot Navigation From Dynamic Demonstrations in Virtual Reality
Jorge de Heuvel, Nathan Corral, Benedikt Kreis, Jacobus Conradi, Anne Driemel, Maren Bennewitz

Abstract— For the best human-robot interaction experience,
the robot’s navigation policy should take into account personal
preferences of the user. In this paper, we present a learning
framework complemented by a perception pipeline to train
a depth vision-based, personalized navigation controller from
user demonstrations. Our virtual reality interface enables the
demonstration of robot navigation trajectories under motion of
the user for dynamic interaction scenarios. The novel perception
pipeline employs a variational autoencoder in combination with
a motion predictor. It compresses the perceived depth images
to a latent state representation to enable efficient reasoning of
the learning agent about the robot's dynamic environment. In
a detailed analysis and ablation study, we evaluate different
configurations of the perception pipeline. To further quantify
the navigation controller’s quality of personalization, we de-
velop and apply a novel metric to measure preference reflection
based on the Fr´
echet Distance. We discuss the robot’s navigation
performance in various virtual scenes and demonstrate the first
personalized robot navigation controller that solely relies on
depth images. A supplemental video highlighting our approach
is available online1.
I. INTRODUCTION
The personalization of robots will be a key factor for
comfortable and satisfying human-robot interactions. As the
integration of robots at home and at work inevitably increases,
the primary goal should be a naturally collaborative experience
between users and the robot. However, users might have
personal preferences about specific aspects of the robot's
behavior that define their personal gold standard of interaction.
Falling short of users' preferences could lead to negative
interaction experiences and, consequently, frustration [1].
When humans share the same environment with a mobile
robot, the robot's navigation behavior significantly influences
the comfort of the interaction [2], [3]. While obstacle
avoidance is unquestionably a key component of successful
navigation, basic avoidance approaches are insufficient to
address individual preferences regarding proxemics, trajectory
shape, or the area of navigation in a given environment.
Instead, a robot's navigation policy should be aware of
humans [4] and reflect the users' personal preferences.
In our previous work [2] we demonstrated that pairing a
virtual reality (VR) interface with a reinforcement learning
J. de Heuvel, N. Corral, B. Kreis, and M. Bennewitz are with the
Humanoid Robots Lab, J. Conradi and A. Driemel are with the Group for
Algorithms and Complexity, University of Bonn, Germany. M. Bennewitz
and A. Driemel are additionally with the Lamarr Institute for Machine
Learning and Artificial Intelligence, Germany. This work has partially been
funded by the Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation) under the grant number BE 4420/2-2 (FOR 2535 Anticipating
Human Behavior).
1Full video: hrl.uni-bonn.de/publications/deheuvel23iros learning.mp4
Fig. 1. Our virtual reality (VR) interface allows the demonstration of robot
navigation preferences by drawing trajectories intuitively onto the floor. By
applying a learning-based framework, we achieve personalized navigation
using a depth vision-based perception pipeline.
(RL) framework enables the demonstration and training of
highly customizable navigation behaviors. The resulting nav-
igation controller outperformed non-personalized controllers
in terms of perceived comfort and interaction experience.
However, a key assumption in the previous work is an
always-present, static human of known pose in a predefined
environment with pose-encoded obstacles. This benefits the
learning process with a low-dimensional state space. To
overcome these assumptions, employing a depth vision sensor
to sense both the human and obstacles is a possible solution [5].
However, depth vision cameras come at the cost of high-dimensional,
complex, and redundant output, and learning from
such high-dimensional data in dynamic scenes is a challenging
task [6]. The question thus becomes: how do we teach the
preferences of moving users in realistic environments while
relying on state-of-the-art sensor modalities?
To address the challenges above, we introduce a depth
vision-based perception pipeline that is lightweight and
human-aware and, most importantly, provides the robot with
a low-dimensional representation of the dynamic scene. This
pipeline i) detects the human and obstacles, ii) compresses
the perceived depth information, and iii) enables the learning
framework to reason efficiently about the robot's dynamic
environment. Our new system learns personalized navigation
preferences through a VR interface and learning framework
for dynamic scenes in which both the robot and the human move.
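To make the compression step ii) concrete, the following minimal sketch shows how a VAE-style encoder maps a depth image to a low-dimensional latent state via the reparameterization trick. The image size, latent dimension, and random linear weights are illustrative stand-ins; the actual pipeline uses a trained convolutional VAE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a downsampled depth frame and a small latent space.
IMG_H, IMG_W = 64, 64
LATENT_DIM = 32

# Stand-in encoder weights; in the real pipeline a trained
# convolutional VAE produces the mean and log-variance heads.
W_mu = rng.normal(0, 0.01, size=(LATENT_DIM, IMG_H * IMG_W))
W_logvar = rng.normal(0, 0.01, size=(LATENT_DIM, IMG_H * IMG_W))

def encode_depth(depth_img: np.ndarray) -> np.ndarray:
    """Compress a depth image to a latent state sample z = mu + sigma * eps."""
    x = depth_img.reshape(-1)
    mu = W_mu @ x
    logvar = W_logvar @ x
    eps = rng.standard_normal(LATENT_DIM)
    return mu + np.exp(0.5 * logvar) * eps

depth = rng.uniform(0.3, 5.0, size=(IMG_H, IMG_W))  # simulated depth values in meters
z = encode_depth(depth)
print(z.shape)  # (32,)
```

The learning agent then reasons over `z` (a 32-dimensional vector here) instead of the raw 64x64 depth frame, which is what makes training on dynamic scenes tractable.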
In summary, the main contributions of our work are:
• Learning a preference-reflecting navigation controller that relies solely on depth vision.
• A VR demonstration framework to record navigation preferences for a dynamic human-robot scenario.
• The introduction and application of a novel metric to quantify the quality of navigation preference reflection.
• An extensive qualitative and quantitative analysis of different perception configurations for personalized navigation.

arXiv:2210.01683v3 [cs.RO] 31 Jul 2023

Fig. 2. Schematic representation of our architecture. a) Demonstration trajectories are drawn by the user in VR onto the floor using the handheld controller. Subsequently, the trajectories are fed into the demonstration buffer. b) Our TD3 reinforcement learning architecture with an additional behavioral cloning (BC) loss on the actor trains a personalized navigation policy that outputs linear and angular velocities. c) The robot-centric state space relies on a depth vision perception pipeline, capturing the vicinity of the human and obstacles in the environment, as well as the relative goal position. A variational autoencoder (VAE) compresses the raw images to a latent state representation, while a predictor (LSTM) provides subsequent state predictions.
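Since the preference-reflection metric builds on the Fréchet distance, a minimal sketch of its discrete variant (after Eiter and Mannila) may help make the underlying quantity concrete. This is only the base distance between two polygonal trajectories; the paper's actual metric may process and compare trajectories beyond this.

```python
import math
from functools import lru_cache

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between polylines P and Q.

    Intuitively: the shortest 'leash' needed for two agents walking
    monotonically along P and Q, allowing either to pause.
    """
    def d(i, j):
        return math.dist(P[i], Q[j])

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 and j == 0:
            return d(0, 0)
        if i == 0:
            return max(c(0, j - 1), d(0, j))
        if j == 0:
            return max(c(i - 1, 0), d(i, 0))
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d(i, j))

    return c(len(P) - 1, len(Q) - 1)

# Two parallel trajectories one meter apart:
P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
print(discrete_frechet(P, Q))  # 1.0
```

Comparing a driven robot trajectory against a demonstrated one with such a distance yields a scalar that is small when the controller reproduces the demonstrated shape closely.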
II. RELATED WORK
Adjusting or learning the navigation behavior of a robot
based on feedback or demonstration has been the focus of
various studies [7], [8], [9]. In particular, deep learning-based
approaches stand out for their ability to learn from subtle and
implicit features in their environment [10], [11], [12]. This
motivates the use of a deep RL architecture for our
personalized navigation controller.
Fusing user demonstrations with a learning architecture has
led to promising results in robotic manipulation tasks [13]
and has successfully been applied to robot navigation [2];
it is therefore a key concept of our learning architecture.
Vision-based sensor modalities for navigation appeal due
to their cost-efficiency. For human-aware navigation, the
detection and explicit localization of pedestrians enabled
socially conforming navigation controllers [5], [14].
Recent advances in the field of depth vision-based naviga-
tion in combination with RL have been made by Hoeller et
al. [15], who study a state representation of depth images
to efficiently learn navigation in dynamic environments. Our
proposed perception pipeline is built upon their successful
architecture.
Furthermore, a navigating agent benefits from dynamic
scene understanding. Predicting the movement of surround-
ing pedestrians and obstacles with Long Short-Term Mem-
ory (LSTM) models has led to promising results [16], [17],
[15]. Therefore, we will integrate an LSTM architecture into
our perception pipeline.
While in our previous work [2] we presented one of the
first approaches at the intersection of navigation and robot
personalization, we now enhance the system by allowing the
user to demonstrate navigation trajectories under dynamic
motions and using only depth vision as controller input.
III. OUR APPROACH
In this work, we consider a robot navigating in the same
room as a single human user. The user has personal preferences
about the way the robot circumnavigates him/her while
pursuing a local goal in the same room. Such preferences
could concern the approaching behavior or the robot's trajectory.
We assume the robot is provided with a local goal by a
global planner. The local goal could be a door on the opposite
side of the current room to be traversed, or a location of
interest in the same room. Using such sparse local goals
several meters apart, we provide the controller with the
spatial and temporal freedom to navigate towards the goal
in a user-preferred, personalized manner. The human shares
the navigation space with the robot, whether moving dynamically
through the room or resting statically. To achieve
preference-aligned and collision-free navigation behavior, the
robot relies only on a depth vision camera to sense the
distance to the human as well as obstacles. We formulate
personalized navigation as a learning task, where the robot
learns a personalized controller outputting linear and angular
velocity from VR demonstrations of the user.
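As a concrete illustration of this action space, the sketch below integrates one linear and angular velocity command with a simple unicycle model. The kinematic model and time step are illustrative assumptions for a differential-drive base, not the paper's simulator.

```python
import math

def step_unicycle(x, y, theta, v, omega, dt=0.1):
    """Advance the robot pose by one control step of the (v, omega)
    action the controller outputs, using a unicycle model."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

pose = (0.0, 0.0, 0.0)
pose = step_unicycle(*pose, v=0.5, omega=0.0)  # drives straight along x
```

Because the policy commands velocities rather than waypoints, the shape of the resulting trajectory between sparse goals is entirely up to the learned, preference-reflecting behavior.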
A. Learning Architecture
The learning approach presented in this section is a hybrid
of reinforcement learning and behavioral cloning.
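Such a hybrid objective can be sketched as a weighted sum of the RL actor term (maximizing the critic's Q-value) and a behavioral cloning term penalizing deviation from demonstrated actions. The weighting coefficient and function shape below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hybrid_actor_loss(q_values, policy_actions, demo_actions, bc_weight=0.5):
    """Sketch of a TD3+BC-style actor loss.

    q_values:       critic estimates for the policy's actions
    policy_actions: actions the current policy outputs on demo states
    demo_actions:   actions recorded in the demonstration buffer
    bc_weight:      illustrative trade-off between RL and imitation
    """
    rl_term = -np.mean(q_values)  # maximize Q  <=>  minimize -Q
    bc_term = np.mean((np.asarray(policy_actions) - np.asarray(demo_actions)) ** 2)
    return rl_term + bc_weight * bc_term
```

Minimizing this loss pushes the actor towards high-value actions while keeping it close to the user's demonstrated trajectories, which is what injects the personal preferences into the policy.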
RL refers to the optimization of environment interactions,
leading from state $s_t$ to $s_{t+1}$, that obey a Markov
decision process. The interacting agent receives a reward
$r_t = r(s_t, a_t)$ for taking an action $a_t = \pi_\phi(s_t)$ at time step
$t$ with respect to a policy $\pi_\phi$. The tuples $(s_t, a_t, r_t, s_{t+1})$
are referred to as state-action pairs. The optimization goal
is to maximize the overall return $R = \sum_{i=t}^{T} \gamma^{(i-t)} r_i$ of the
$\gamma$-discounted rewards, onward from time step $t$.
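The discounted return can be computed directly from a reward sequence, as in this short sketch:

```python
def discounted_return(rewards, gamma=0.99, t=0):
    """Return R = sum over i from t to T of gamma^(i - t) * r_i."""
    return sum(gamma ** (i - t) * rewards[i] for i in range(t, len(rewards)))

# Example: rewards over three steps, discounted from t = 0:
# R = 1.0 + 0.99 * 0.0 + 0.99**2 * 0.5 = 1.49005
R = discounted_return([1.0, 0.0, 0.5])
```

The discount factor gamma < 1 weights near-term rewards more heavily, which keeps the return finite and encourages the agent to reach the goal promptly.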
Fig. 2 depicts a schematic overview of our approach. We
employ an off-policy twin-delayed deep deterministic policy
gradient (TD3) reinforcement learning architecture [18]. In