SLOT-V: Supervised Learning of Observer Models for Legible Robot Motion Planning in Manipulation

Sebastian Wallkötter¹,² and Mohamed Chetouani³ and Ginevra Castellano²

arXiv:2210.01412v1 [cs.RO] 4 Oct 2022
Abstract: We present SLOT-V, a novel supervised learning framework that learns observer models (human preferences) from robot motion trajectories in a legibility context. Legibility measures how easily a (human) observer can infer the robot's goal from a robot motion trajectory. When generating such trajectories, existing planners often rely on an observer model that estimates the quality of trajectory candidates. These observer models are frequently hand-crafted or, occasionally, learned from demonstrations. Here, we propose to learn them in a supervised manner using the same data format that is frequently used during the evaluation of the aforementioned approaches. We then demonstrate the generality of SLOT-V using a Franka Emika robot in a simulated manipulation environment. For this, we show that it can learn to closely predict various hand-crafted observer models, i.e., that SLOT-V's hypothesis space encompasses existing hand-crafted models. Next, we showcase SLOT-V's ability to generalize by showing that a trained model continues to perform well in environments with unseen goal configurations and/or goal counts. Finally, we benchmark SLOT-V's sample efficiency (and performance) against an existing IRL approach and show that SLOT-V learns better observer models with less data. Combined, these results suggest that SLOT-V can learn viable observer models. Better observer models imply more legible trajectories, which may, in turn, lead to better and more transparent human-robot interaction.
I. INTRODUCTION
Transparency, a robot’s ability to communicate any hidden
internal state, is an element of artificial intelligence and
robotics that is currently gaining in importance. For example,
the European Union (EU) stated that transparency is an
important factor to achieve trustworthy AI in its 2019 ethics
guidelines [1]. The IEEE, too, has recognized the need for
transparency in autonomous systems and made a proposal
towards its standardization [2]. Further, several other ethical
standards on the topic have stated the need for transparency
[3].
At the same time, achieving transparency on a technical/implementation level is still a very active research topic.
In artificial intelligence and machine learning, one proposed
answer is to use explainable AI (XAI) techniques, and several
promising XAI approaches have been developed in recent
years [4]. Beyond XAI, robotics complements these techniques with domain-specific approaches that use a robot's
embodiment or the situatedness of human-robot interaction
scenarios [5].
¹ sebastian.wallkotter@it.uu.se
² Department of Information Technology, Uppsala University, Uppsala, Sweden
³ Institute for Intelligent Systems and Robotics, Sorbonne University, CNRS UMR 7222, Paris, France
Fig. 1. Schematic overview of SLOT-V. (bottom:) SLOT-V computes
the legibility of a trajectory by calling an observer model individually for
each potential goal in the environment. (middle:) Assuming a (given) target
goal, the observer model computes a trajectory score by applying a value
function individually to each control point of a trajectory. (top:) The value
function (here: a feed-forward neural network) estimates how legible it is to
move through a given position to reach a given target goal. This is similar
to the idea of a value function in RL (hence the name); however, SLOT-V
is purely supervised.
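The three-stage structure described in the caption can be sketched in code. Everything concrete below (the tiny two-layer network, the softmax normalization over goals, and all names) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the learned value function (top of Fig. 1):
# a tiny feed-forward network mapping (position, goal) -> legibility value.
# Here, positions and goals are 3D, so the input has 6 features.
W1 = rng.normal(size=(6, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)

def value(position, goal):
    """How legible is it to move through `position` to reach `goal`?"""
    x = np.concatenate([position, goal])
    h = np.tanh(x @ W1 + b1)
    return float(h @ W2 + b2)

def trajectory_score(traj, goal):
    """Middle of Fig. 1: apply the value function to each control point
    of the trajectory and aggregate (a plain sum here)."""
    return sum(value(p, goal) for p in traj)

def legibility_distribution(traj, goals):
    """Bottom of Fig. 1: one observer-model call per candidate goal,
    normalized into a distribution over goals (softmax is an assumption)."""
    scores = np.array([trajectory_score(traj, g) for g in goals])
    e = np.exp(scores - scores.max())
    return e / e.sum()
```

Training would fit the value function's weights to labeled trajectories; this sketch only shows the forward pass that turns per-point values into a distribution over goals.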
One such approach to achieving transparency that is
unique to robots goes by the name legibility [6], [7]. Legibility considers the scenario where a robot performs a goal-
oriented movement under human supervision (referred to
as the observer). In this scenario, which is typically a manipulation scenario, the robot ought to alter its movement to communicate the intended goal and disambiguate it from alternative goals, so that the observer, who is uncertain about the true goal, can quickly assess the situation. Computing such a legible trajectory requires specialized motion
planning and several authors have suggested frameworks
to accomplish this [8], [9], [10]. A common trend among
these frameworks is that they construct a mathematical
model of the observer’s expectations (the observer model)
to vet candidate trajectories, which makes choosing a good observer model crucial to the framework's success.
Here, we take a closer look at the observer models used in legibility and address a commonly faced limitation, namely that they are frequently based on a researcher's intuition (hand-crafted). This hand-crafted nature of the observer model is usually described as either a limitation or a subject for future work but, to our knowledge, still lacks a satisfactory answer. Hence, we propose SLOT-V (fig. 1), a novel supervised learning approach that extracts the observer model from labeled robot motion trajectories, and make the following contributions:
• We present a novel framework (SLOT-V) that takes labeled robot trajectories as input and learns the observer model, a mathematical representation of the user's preferences regarding how the robot should move.
• We provide empirical evidence that (1) SLOT-V can learn a wide range of observer models, (2) SLOT-V can generalize to unseen environments with different goal counts and/or configurations, and (3) SLOT-V is more sample efficient than an alternative inverse reinforcement learning (IRL) approach.
II. RELATED WORK
A. Legibility
One of the first major approaches to modelling legibility was proposed by Dragan et al. [6], [7] and has sparked extensive follow-up work [10], [11], [12], [13], [14], [15], [16]. The work assumes that humans expect robots
to move efficiently and that we can model this expectation
using a cost function over trajectories (the observer model).
Using this function, we can then not only compute the
most expected trajectory (called predictability), but can also
compute a trajectory that maximizes the cost of moving
towards alternative goals while still reaching the original
target, i.e., a trajectory that minimizes the expectation that
the robot moves to any alternative goal (called legibility).
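This cost-based observer model can be made concrete with a minimal sketch. We assume Euclidean path length as the trajectory cost and the common exponential cost-to-probability mapping; the function names and the linearly decaying time weighting are our choices, not Dragan et al.'s exact formulation:

```python
import numpy as np

def goal_posterior(start, point, goals, beta=1.0):
    """Observer model: humans expect efficient (shortest-path) motion.
    Each goal's probability falls off with the extra cost of reaching it
    via the current point rather than directly from the start."""
    extra_costs = np.array([
        np.linalg.norm(point - start) + np.linalg.norm(g - point)  # via point
        - np.linalg.norm(g - start)                                # direct
        for g in goals
    ])
    scores = np.exp(-beta * extra_costs)
    return scores / scores.sum()

def legibility(traj, goals, target_idx, beta=1.0):
    """Average probability of the true goal along the trajectory,
    weighted so that early disambiguation counts more."""
    start = traj[0]
    T = len(traj)
    weights = np.array([T - t for t in range(T)], dtype=float)
    probs = np.array([
        goal_posterior(start, p, goals, beta)[target_idx] for p in traj
    ])
    return float((weights * probs).sum() / weights.sum())
```

Under this score, a trajectory that exaggerates its motion away from the alternative goal rates higher than the straight shortest path, which is precisely the predictability/legibility distinction described above.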
A follow-up to this idea has been proposed in Nikolaidis
et al. [10] under the title viewpoint-based legibility. Here, legibility is not computed in world space. Instead, the trajectory and any potential goals are first projected into a plane aligned with the observer's point of view, and Dragan legibility is then computed in the resulting space. This allows the
robot to take into account the human’s perspective and also
allows it to account for occlusion from the perspective of
the observer. Recently, this has been extended to multi-party
interactions [17]. Further, a similar line of thinking has been
used by Bodden et al. [8], where the authors project the
trajectory into a goal-space, a (manually) designed latent
space wherein it is easy to measure a trajectory's expected goal. The authors then compute the score of a trajectory as a sum of several terms, one of which is the observer model; it takes the form of an integral over the distance (measured in goal-space) between each point along the trajectory and the target goal.
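Read this way, the observer-model term admits a short sketch as a path integral in goal-space. The identity projection (standing in for the manually designed goal-space mapping) and the trapezoidal quadrature are our simplifying assumptions:

```python
import numpy as np

def observer_cost(traj, target, project=lambda p: p):
    """Integrate, along the trajectory, the goal-space distance between
    each point and the target goal; lower cost means the trajectory
    stays 'closer' to the target throughout."""
    pts = np.asarray([project(p) for p in traj], dtype=float)
    g = np.asarray(project(target), dtype=float)
    dists = np.linalg.norm(pts - g, axis=1)
    # approximate the path integral with the trapezoidal rule,
    # using arc length in the original space as the measure
    seg = np.linalg.norm(np.diff(np.asarray(traj, dtype=float), axis=0), axis=1)
    return float(np.sum(0.5 * (dists[:-1] + dists[1:]) * seg))
```

A trajectory that heads straight for the target accumulates less of this cost than one that detours away from it, so minimizing the term favors motion that reads as target-directed.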
What is interesting about the aforementioned approaches
is that they all use an explicit observer model (a function that
rates/scores the legibility of a given trajectory) and that they
all engineer this observer model by hand (it is hand-crafted).
Considering that the focus of the aforementioned papers is primarily on trajectory generation and motion planning, hand-crafting the observer model is adequate, as the goal is to show the capabilities of the new planner. However, a hand-crafted model of human expectations might not be sophisticated enough to scale to complex environments, a challenge that is also recognized in the aforementioned works
as early as [7]. Our contribution addresses this gap and
suggests a new way to build such a (more sophisticated)
observer model.
Looking at recent works, Guletta et al. proposed HUMP
[18], a novel motion planner that generates human-like
upper-limb movement, He et al. [19] solved an inverse
kinematics problem to create legible motion, Gabert et al.
[20] used a sampling-based motion planner to create obstacle-avoidant yet legible trajectories, and Miura et al. extended the
idea of legibility to stochastic environments [21]. Looking at
legibility more generally, the idea has also been explored
in high-level task planning [22] and other discrete domains
[12], [23], [24], [25]. Other research explores legibility in the domain of navigation, an area gaining increasing traction, likely due to a general rise of interest in autonomous driving. Here, a common framework is HAMP [26], although
several other methods exist [27], [28], [29]. For more details,
we recommend one of the several excellent reviews on the
topic [30], [31], [32].
B. Machine Learning in Legibility
Shifting our attention towards data-driven approaches to
legibility, we must first mention the system developed by
Zhao et al. [9]. The authors use learning from demonstration and inverse reinforcement learning to train a neural
network that, from a partial manipulator trajectory, predicts
the observer’s expected goal. This model is then used to
formulate a reward function that is used to create policies
for legible movement using reinforcement learning. Lamb
et al. [33], [34], [35], [36] assume that legible motion is
human-like and use motion capture combined with model-
identification to construct a controller that produces legible
motion. Busch et al. [37] use direct policy search on a novel
reward function that uses user feedback and asks the observer
to guess the goal. Zhou et al. [38] use a Bayesian model
to learn timings when a robot should launch a movement.
Zhang et al. [22] and Beetz et al. [39] independently explore
learning strategies for high-level task planning, and, finally,
Angelov et al. [40] propose the interesting idea of using a
causal model on the latent space of a deep auto-encoder to
learn the task specifications that make movements legible.
Most of the above approaches have in common that they
learn a policy from data and that this policy is later exploited
to produce legible behavior. This is useful, because it allows
us to learn important aspects of legibility directly from the
observer, which is potentially more accurate than building
such a policy by hand. A downside to learning a policy
this way is that we lose the ability to easily transfer to unseen environments. This works trivially for the planning
methods introduced above, but requires retraining (and often
new data) for policy-based methods. Our method does not
suffer from this limitation (which we demonstrate), because
we only learn the observer model and not a general model of