observer model is usually described as either a limitation or
a subject for future work but, to our knowledge, still lacks
a satisfactory answer. Hence, we propose SLOT-V (fig. 1),
a novel supervised learning approach to extract the observer
model from labeled robot motion trajectories and make the
following contributions:
•We present a novel framework (SLOT-V) that takes labeled robot trajectories as input and learns the observer model - a mathematical representation of the user’s preferences regarding how the robot should move.
•We provide empirical evidence that (1) SLOT-V can learn a wide range of observer models, that (2) SLOT-V can generalize to unseen environments with different goal counts and/or configurations, and that (3) SLOT-V is more sample efficient than an alternative inverse reinforcement learning (IRL) approach.
II. RELATED WORK
A. Legibility
One of the first major approaches to modelling legibility was proposed by Dragan et al. [6], [7] and has sparked extensive follow-up work [10], [11], [12], [13], [14], [15], [16]. The work assumes that humans expect robots
to move efficiently and that we can model this expectation
using a cost function over trajectories (the observer model).
Using this function, we can then not only compute the
most expected trajectory (called predictability), but can also
compute a trajectory that maximizes the cost of moving
towards alternative goals while still reaching the original
target, i.e., a trajectory that minimizes the expectation that
the robot moves to any alternative goal (called legibility).
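To make this concrete, the following sketch (ours, not the original implementation) instantiates a Dragan-style observer model for a 2-D point robot. It uses path length as a simple stand-in for the efficiency cost and scores how strongly a trajectory prefix signals the intended goal; exaggerating the motion away from the distractor goal raises that score earlier:

```python
import numpy as np

def path_cost(points):
    """Trajectory cost as total path length; a simple stand-in for the
    hand-crafted efficiency expectation of the original formulation."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

def goal_probability(prefix, start, goals, target_idx):
    """P(goal | trajectory prefix): a goal is likely if the observed prefix
    plus the optimal (straight-line) remainder is nearly as cheap as moving
    to that goal directly from the start."""
    q = prefix[-1]
    scores = []
    for g in goals:
        c_prefix = path_cost(prefix)
        c_remain = np.linalg.norm(g - q)      # optimal remainder to goal
        c_direct = np.linalg.norm(g - start)  # optimal start-to-goal cost
        scores.append(np.exp(-(c_prefix + c_remain) + c_direct))
    scores = np.array(scores)
    return scores[target_idx] / scores.sum()

# Two candidate goals; the robot actually heads for goals[0].
start = np.array([0.0, 0.0])
goals = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]

# A predictable straight-line path versus one arced away from the distractor.
direct = np.linspace(start, goals[0], 5)
arced = direct + np.array([0.0, 0.4]) * np.sin(np.linspace(0, np.pi, 5))[:, None]

# After observing only the first half, the arced (legible) motion signals
# the true goal more strongly than the predictable motion does.
p_direct = goal_probability(direct[:3], start, goals, 0)
p_arced = goal_probability(arced[:3], start, goals, 0)
```

A legible planner in this framework would optimize the trajectory so that this probability grows as early as possible along the motion, rather than only at the end.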
A follow-up to this idea has been proposed by Nikolaidis et al. [10] under the title viewpoint-based legibility. Here, legibility is not computed in world space. Instead, the trajectory and any potential goals are first projected into a plane that is aligned with the observer's point of view, and Dragan legibility is then computed in the resulting space. This allows the
robot to take into account the human’s perspective and also
allows it to account for occlusion from the perspective of
the observer. Recently, this has been extended to multi-party
interactions [17]. Further, a similar line of thinking has been
used by Bodden et al. [8], where the authors project the
trajectory into a goal-space: a (manually) designed latent space wherein it is easy to measure a trajectory's expected goal. The authors then compute the score of a trajectory as a sum of several terms, one of which is the observer model, which takes the form of an integral over the distance (measured in goal-space) between each point along the trajectory and the target goal.
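In our notation (the symbols below are ours, chosen for illustration, not those of [8]), this observer-model term can be written as

```latex
C_{\text{obs}}(\xi) \;=\; \int_{0}^{T} d\!\left(\phi(\xi(t)),\, \phi(g^{*})\right) \mathrm{d}t ,
```

where $\xi$ is the trajectory, $\phi$ the hand-designed projection into goal-space, $g^{*}$ the target goal, and $d$ a distance measure in goal-space; lower values indicate motion that stays close to the intended goal throughout.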
What is interesting about the aforementioned approaches
is that they all use an explicit observer model (a function that
rates/scores the legibility of a given trajectory) and that they
all engineer this observer model by hand.
Considering that the focus of the aforementioned papers
is primarily on trajectory generation and motion-planning,
hand-crafting the observer model is adequate, as the goal
is to show the capabilities of the new planner. However,
a hand-crafted model of human expectations might not be
sophisticated enough to scale to complex environments, a challenge that is also recognized in the aforementioned works as early as [7]. Our contribution addresses this gap and
suggests a new way to build such a (more sophisticated)
observer model.
Looking at recent works, Guletta et al. proposed HUMP
[18], a novel motion planner that generates human-like
upper-limb movement, He et al. [19] solved an inverse
kinematics problem to create legible motion, Gabert et al.
[20] used a sampling-based motion planner to create obstacle-avoidant yet legible trajectories, and Miura et al. extended the
idea of legibility to stochastic environments [21]. Looking at
legibility more generally, the idea has also been explored
in high-level task planning [22] and other discrete domains
[12], [23], [24], [25]. Other research explores legibility in
the domain of navigation, which is gaining more and more traction, likely due to a general rise of interest in autonomous
driving. Here, a common framework is HAMP [26], although
several other methods exist [27], [28], [29]. For more details,
we recommend one of the several excellent reviews on the
topic [30], [31], [32].
B. Machine Learning in Legibility
Shifting our attention towards data-driven approaches to
legibility, we must first mention the system developed by
Zhao et al. [9]. The authors use learning from demonstra-
tion and inverse reinforcement learning to train a neural
network that, from a partial manipulator trajectory, predicts
the observer’s expected goal. This model is then used to formulate a reward function from which policies for legible movement are learned via reinforcement learning. Lamb
et al. [33], [34], [35], [36] assume that legible motion is
human-like and use motion capture combined with model-
identification to construct a controller that produces legible
motion. Busch et al. [37] use direct policy search on a novel
reward function that uses user feedback and asks the observer
to guess the goal. Zhou et al. [38] use a Bayesian model
to learn the timing with which a robot should launch a movement.
Zhang et al. [22] and Beetz et al. [39] independently explore
learning strategies for high-level task planning, and, finally,
Angelov et al. [40] propose the interesting idea of using a
causal model on the latent space of a deep auto-encoder to
learn the task specifications that make movements legible.
Most of the above approaches have in common that they
learn a policy from data and that this policy is later exploited
to produce legible behavior. This is useful, because it allows
us to learn important aspects of legibility directly from the
observer, which is potentially more accurate than building
such a policy by hand. A downside to learning a policy this way is that we lose the ability to easily transfer to unseen environments. Such a transfer works trivially for the planning methods introduced above, but requires retraining (and often new data) for policy-based methods. Our method does not
suffer from this limitation (which we demonstrate), because
we only learn the observer model and not a general model of