observer model is usually described as either a limitation or
a subject for future work but, to our knowledge, still lacks
a satisfactory answer. Hence, we propose SLOT-V (fig. 1),
a novel supervised learning approach to extract the observer
model from labeled robot motion trajectories and make the
following contributions:
•We present a novel framework (SLOT-V) that takes labeled robot trajectories as input and learns the observer model - a mathematical representation of the user’s preferences regarding how the robot should move.
•We provide empirical evidence that (1) SLOT-V can learn a wide range of observer models, that (2) SLOT-V can generalize to unseen environments with different goal counts and/or configurations, and that (3) SLOT-V is more sample efficient than an alternative inverse reinforcement learning (IRL) approach.
II. RELATED WORK
A. Legibility
One of the first major approaches to modelling legibility was proposed by Dragan et al. [6], [7] and has sparked extensive follow-up work [10], [11], [12], [13], [14], [15], [16]. The work assumes that humans expect robots
to move efficiently and that we can model this expectation
using a cost function over trajectories (the observer model).
Using this function, we can then not only compute the
most expected trajectory (called predictability), but can also
compute a trajectory that maximizes the cost of moving
towards alternative goals while still reaching the original
target, i.e., a trajectory that minimizes the expectation that
the robot moves to any alternative goal (called legibility).
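To make this concrete, the following sketch (ours, not the original implementation) instantiates a Dragan-style observer model for a 2-D point robot. It uses path length as a simple stand-in for the efficiency cost and scores how strongly a trajectory prefix signals the intended goal; exaggerating the motion away from the distractor goal raises that score earlier:

```python
import numpy as np

def path_cost(points):
    """Trajectory cost as total path length; a simple stand-in for the
    hand-crafted efficiency expectation of the original formulation."""
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

def goal_probability(prefix, start, goals, target_idx):
    """P(goal | trajectory prefix): a goal is likely if the observed prefix
    plus the optimal (straight-line) remainder is nearly as cheap as moving
    to that goal directly from the start."""
    q = prefix[-1]
    scores = []
    for g in goals:
        c_prefix = path_cost(prefix)
        c_remain = np.linalg.norm(g - q)      # optimal remainder to goal
        c_direct = np.linalg.norm(g - start)  # optimal start-to-goal cost
        scores.append(np.exp(-(c_prefix + c_remain) + c_direct))
    scores = np.array(scores)
    return scores[target_idx] / scores.sum()

# Two candidate goals; the robot actually heads for goals[0].
start = np.array([0.0, 0.0])
goals = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]

# A predictable straight-line path versus one arced away from the distractor.
direct = np.linspace(start, goals[0], 5)
arced = direct + np.array([0.0, 0.4]) * np.sin(np.linspace(0, np.pi, 5))[:, None]

# After observing only the first half, the arced (legible) motion signals
# the true goal more strongly than the predictable motion does.
p_direct = goal_probability(direct[:3], start, goals, 0)
p_arced = goal_probability(arced[:3], start, goals, 0)
```

A legible planner in this framework would optimize the trajectory so that this probability grows as early as possible along the motion, rather than only at the end.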
A follow-up to this idea has been proposed by Nikolaidis et al. [10] under the title viewpoint-based legibility. Here, legibility is not computed in world space. Instead, the trajectory and any potential goals are first projected into a plane that is aligned with the observer's point of view, and Dragan legibility is then computed in the resulting space. This allows the
robot to take into account the human’s perspective and also
allows it to account for occlusion from the perspective of
the observer. Recently, this has been extended to multi-party
interactions [17]. Further, a similar line of thinking has been
used by Bodden et al. [8], where the authors project the
trajectory into a goal-space: a (manually) designed latent space wherein it is easy to measure a trajectory's expected goal. The authors then compute the score of a trajectory as a sum of several terms, one of which is the observer model, which takes the form of an integral over the distance (measured in goal-space) between each point along the trajectory and the target goal.
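In our notation (the symbols below are ours, chosen for illustration, not those of [8]), this observer-model term can be written as

```latex
C_{\text{obs}}(\xi) \;=\; \int_{0}^{T} d\!\left(\phi(\xi(t)),\, \phi(g^{*})\right) \mathrm{d}t ,
```

where $\xi$ is the trajectory, $\phi$ the hand-designed projection into goal-space, $g^{*}$ the target goal, and $d$ a distance measure in goal-space; lower values indicate motion that stays close to the intended goal throughout.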
What is interesting about the aforementioned approaches
is that they all use an explicit observer model (a function that
rates/scores the legibility of a given trajectory) and that they
all engineer this observer model by hand.
Considering that the focus of the aforementioned papers
is primarily on trajectory generation and motion-planning,
hand-crafting the observer model is adequate, as the goal
is to show the capabilities of the new planner. However,
a hand-crafted model of human expectations might not be
sophisticated enough to scale to complex environments, a challenge that is also recognized in the aforementioned works as early as [7]. Our contribution addresses this gap and
suggests a new way to build such a (more sophisticated)
observer model.
Looking at recent works, Guletta et al. proposed HUMP
[18], a novel motion planner that generates human-like
upper-limb movement, He et al. [19] solved an inverse
kinematics problem to create legible motion, Gabert et al.
[20] used a sampling-based motion planner to create obstacle-avoidant yet legible trajectories, and Miura et al. extended the
idea of legibility to stochastic environments [21]. Looking at
legibility more generally, the idea has also been explored
in high-level task planning [22] and other discrete domains
[12], [23], [24], [25]. Other research explores legibility in
the domain of navigation, which is gaining more and more traction, likely due to a general rise of interest in autonomous
driving. Here, a common framework is HAMP [26], although
several other methods exist [27], [28], [29]. For more details,
we recommend one of the several excellent reviews on the
topic [30], [31], [32].
B. Machine Learning in Legibility
Shifting our attention towards data-driven approaches to
legibility, we must first mention the system developed by
Zhao et al. [9]. The authors use learning from demonstra-
tion and inverse reinforcement learning to train a neural
network that, from a partial manipulator trajectory, predicts
the observer’s expected goal. This model is then used to formulate a reward function from which policies for legible movement are learned via reinforcement learning. Lamb
et al. [33], [34], [35], [36] assume that legible motion is
human-like and use motion capture combined with model-
identification to construct a controller that produces legible
motion. Busch et al. [37] use direct policy search on a novel
reward function that uses user feedback and asks the observer
to guess the goal. Zhou et al. [38] use a Bayesian model
to learn the timing with which a robot should launch a movement.
Zhang et al. [22] and Beetz et al. [39] independently explore
learning strategies for high-level task planning, and, finally,
Angelov et al. [40] propose the interesting idea of using a
causal model on the latent space of a deep auto-encoder to
learn the task specifications that make movements legible.
Most of the above approaches have in common that they
learn a policy from data and that this policy is later exploited
to produce legible behavior. This is useful, because it allows
us to learn important aspects of legibility directly from the
observer, which is potentially more accurate than building
such a policy by hand. A downside to learning a policy this way is that we lose the ability to easily transfer to unseen environments. Such a transfer works trivially for the planning methods introduced above, but requires retraining (and often new data) for policy-based methods. Our method does not
suffer from this limitation (which we demonstrate), because
we only learn the observer model and not a general model of