Robot Learning Theory of Mind through Self-Observation: Exploiting the Intentions-Beliefs Synergy

Francesca Bianco¹ and Dimitri Ognibene²,¹

*This work was not supported by any organization.
¹ University of Essex, Colchester, UK
² Università degli Studi di Milano-Bicocca, Milano, Italy
email: dimitri.ognibene@unimib.it
Abstract— In complex environments, where the human sensory system reaches its limits, our behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' beliefs, intentions, or mental states in general could thus allow for more effective social interactions in natural contexts. Yet these variables are not directly observable. Theory of Mind (TOM), the ability to attribute beliefs, intentions, or mental states in general to other agents, is a crucial feature of human social interaction and has become of interest to the robotics community. Recently, new models able to learn TOM have been introduced. In this paper, we show the synergy between learning to predict low-level mental states, such as intentions and goals, and attributing high-level ones, such as beliefs. Assuming that learning of beliefs can take place by observing one's own decision and belief-estimation processes in partially observable environments, and using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, faster and more accurate predictions can be acquired if belief attribution is learnt simultaneously with action and intention prediction. We show that the learning performance improves even when observing agents with a different decision process, and is higher when observing belief-driven chunks of behaviour. We propose that our architectural approach can be relevant for the design of future adaptive social robots that should be able to autonomously understand and assist human partners in novel natural environments and tasks.
I. INTRODUCTION
Due to recent technological developments, the interactions between AI and humans have become pervasive and heterogeneous, extending from voice assistants and recommender systems supporting the online experience of millions of users to self-driving cars. Principled models to represent human collaborators' needs are being adopted [1], robotic perception in complex environments is becoming more flexible and adaptive [2]–[5], and, even in social contexts, robot sensory limits are starting to be actively managed [6], [7]. However, robots and intelligent systems still have a limited understanding of how sensory limits affect human partners' behaviour and lead them to rely on internal beliefs about the state of the world. This strongly impacts human-robot mutual understanding [8] and calls for an effort to transfer the advances in robot perception management to methods that better cope with human collaborators' perceptual limits [9]–[11].
The possibility of introducing in robots and AI systems a Theory of Mind (TOM) [12], the ability to attribute beliefs, intentions, or mental states in general to other agents, has recently raised hopes to further improve robots' social skills [13]–[16]. While some studies have explored human partners' tendency to attribute mental states to robots [17]–[20], the expected practical impact of TOM has led to a diverse set of TOM implementations on robots. Several implementations relied on hardwired agent and task models that could be applied to infer mental states in settings known at design time [21]–[25]. A step forward is presented in [26], with an algorithm that understands unknown agents by relying upon Belief-Desire-Intention models of previously met agents.
Recently, following the seminal work in [27], several models have introduced deep-learning-based TOM implementations [28]–[33]. This novel approach, learning both belief and intention attribution, should allow improved and more adaptive human-robot collaboration in complex environments through a better understanding of humans' mental states. In this paper, we explore whether the data-driven approach proposed in [27] and related works leads to improved predictions of the partner's intentions, which is often the mental state with the highest impact on interaction performance. The prediction of partners' intentions, even within a system producing predictions of several other unobservable mental states, such as beliefs, will still rely only on the processing of observable behavioural inputs, i.e., state-action trajectories. In a purely supervised learning setting, such as that proposed in [27], it is not obvious why performing an additional set of predictions, which increases the demands on the social perception system, should result in higher accuracy for the prediction of others' intentions. This approach introduces additional complexity and noise that may hinder performance (see [34]). Moreover, deep learning models such as those proposed in [27] are usually data-hungry, which may further limit the value of the approach. These factors may be some of the reasons for the long time required for the full development of TOM in infants [12], [35].
While all these considerations sound technically valid, our results with simplified versions of the architecture proposed in [27] show that the original hypothesis may be true: learning is faster and more accurate if the prediction of intentions and the prediction of beliefs are learnt simultaneously. Our results also show that the impact of learning belief attribution on intention prediction is stronger under conditions of strong partial observability, e.g., when the observed agent does not yet know where its target is. We found that when the system learns to predict intentions and beliefs at the same time, it can better disambiguate and discard unrelated objects that are, or have been, in the sensory field of the observed agent. This can be particularly relevant for assistive applications, especially those based on ego-vision, which can monitor the sensory state of the partners [36]. Our hypothesis is that this is due to the regularization effect of multitask learning when tasks do not conflict but act in synergy [37]–[40]. Indeed, the observed accuracy gain decreases when the dataset size exceeds a certain threshold, making regularization less helpful. On the other hand, when very limited experience is available, performance is slightly worse for joint prediction, possibly explaining the necessity of the complex, multi-system developmental process that seems to characterize humans [12].
Another issue with the approach proposed in [27] is the scarcity of training samples, as others' mental states are usually not available through direct behaviour observation or in datasets for training or prediction. We propose that self-observation, i.e., the observation of the agent's own behaviours and of its internal decision and belief-estimation processes, can be used as the training signal. We verified that mental-state prediction can then be generalized to different agents, albeit within certain limits.
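To make the self-observation training signal concrete, the sketch below shows one way an agent could log its own internal quantities while acting, producing fully labelled examples for later supervised training. The environment interface, the `belief_estimator` and `policy` objects, and the tuple layout are all illustrative assumptions of ours, not the paper's implementation.

```python
# Hypothetical sketch: harvesting training data via self-observation.
# env, belief_estimator, and policy are placeholder stand-ins; the paper
# does not specify interfaces at this level of detail.

def collect_self_observation_data(env, belief_estimator, policy, n_episodes=100):
    """Roll out the agent's own control system and log every internal
    quantity (beliefs, intention, action) alongside the observable state,
    yielding fully supervised training tuples for the perception system."""
    dataset = []
    for _ in range(n_episodes):
        obs = env.reset()
        belief = belief_estimator.init_belief()
        done = False
        while not done:
            # During self-observation, internal quantities are directly readable...
            belief = belief_estimator.update(belief, obs)
            intention = policy.current_goal(belief)
            action = policy.act(belief, intention)
            # ...so each step yields a fully labelled example. When observing
            # *another* agent, only (state, action) would be available.
            dataset.append({
                "state": obs,
                "action": action,
                "intention": intention,  # label for the intention/target heads
                "belief": belief,        # label for the beliefs head (self only)
            })
            obs, done = env.step(action)
    return dataset
```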
From a cognitive modelling perspective, the proposed architecture departs from the motor simulation tradition common in robotics [6], [7], [41]–[43], where the motor control system is used for both action execution and perception. In our model, the motor control system is only the source of the reference signal for training the social perception system. Among other advantages, this strategy reduces the strong interference between action and observation that a motor simulation account of social perception could present [44]–[46]. It could also reduce the computational demands and delays involved in simulating others' actions, while still exploiting the observer's (motor or action) knowledge for social perception [47]. The presented architecture can be related to the associative hypothesis of social perception [48] and, in particular, to Hommel et al.'s [49] interpretation, as described in [50], which has extended the associative view to intention interpretation and proposes that associations between behaviours (actions) and underlying intentions (effects) develop from an early age and can be used later in life as a means to infer and predict others' intentions and behaviours. The associative account has also previously been related to the understanding of others through proposals of its involvement in the development of mirror neurons (e.g., [51]), as well as in sensorimotor matching for imitation [52]. The architecture proposed in the present paper differs from these accounts as it focuses on beliefs [53]. Specifically, it relies on associations between explicit belief representations learnt through self-experience and the consequent behaviours to improve the prediction of others' beliefs-driven behaviours.
II. METHODS
A. Architecture
The agent is composed of two components (see Fig. 1). The first is the Social Perception System (SPS), which interprets and allows predictions of observed agents' behaviour, in terms of next actions, goals, and even beliefs. The other component is the control system, which defines the agent's own behaviour when performing tasks similar to those the SPS interprets. The quantities produced during task performance, e.g. intentions, beliefs, and actions, are also used to fully train the SPS, while others' behaviours are used only for partial training, as others' beliefs cannot be observed.
Fig. 1. The architecture utilised in the studies reported here, formed of a shared prediction-net torso followed by separate prediction heads. For the NoBeliefs architecture, the following prediction heads are considered: 1. target position, 2. actor's next action, and 3. actor's next state. For the Beliefs architecture, the 4. beliefs prediction head (in red) is also considered.
1) Social Perception System (SPS): The SPS's goal is to make predictions about the observed actor's future behaviour, with a specific interest in the actor's target position. Two types of SPS were trained: the NoBeliefs SPS, the baseline, which predicts an actor's target position, target class, next action, and next resulting state, but not beliefs; and the Beliefs SPS, which was asked to predict, in addition to the above, the actor's beliefs.
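The sketch below illustrates one plausible realisation of the shared-torso, multi-head layout of Fig. 1 as a simple feed-forward network; the layer sizes and the belief-head dimension are our assumptions for an 11 × 11 grid world, not the paper's reported hyperparameters.

```python
import torch.nn as nn

class SPS(nn.Module):
    """Feed-forward Social Perception System: a shared torso over the
    flattened (11 x 11 x 20) input, plus one head per predicted quantity.
    Setting predict_beliefs=False yields the NoBeliefs baseline."""

    def __init__(self, n_actions=9, n_classes=4, grid=11, channels=20,
                 belief_dim=121, predict_beliefs=True):
        super().__init__()
        self.predict_beliefs = predict_beliefs
        self.torso = nn.Sequential(
            nn.Flatten(),
            nn.Linear(grid * grid * channels, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.target_pos = nn.Linear(128, 2)            # (x, y) of the target
        self.target_class = nn.Linear(128, n_classes)  # which object is sought
        self.next_action = nn.Linear(128, n_actions)
        self.next_state = nn.Linear(128, grid * grid)  # coarse next-state map
        if predict_beliefs:
            self.beliefs = nn.Linear(128, belief_dim)  # actor's belief map

    def forward(self, x):
        h = self.torso(x)
        out = {
            "target_pos": self.target_pos(h),
            "target_class": self.target_class(h),
            "action": self.next_action(h),
            "next_state": self.next_state(h),
        }
        if self.predict_beliefs:
            out["beliefs"] = self.beliefs(h)
        return out
```

Under this layout, comparing the two SPS variants amounts to toggling `predict_beliefs` while keeping the torso and the remaining heads identical.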
a) Input sensing and routing: The SPS can observe the agent itself or others; its input vector can thus be provided in a common reference frame representing either the self-localisation state of the observer, during self-observation learning, or the physical state of another actor, both for learning about and for predicting the other actor's behaviour. Several architectures have studied the problem of how to switch between the processing of one's own and others' data, and of how to acquire others' physical states [7], [41]. Note that in this case this function, while important, poses fewer constraints on behaviour performance, as it feeds not the execution process but the social learning one [54].
b) Input encoding and pre-processing: The input is formed by a number (max 5 in the reported experiments) of past steps of a trajectory on a single grid map. Observed action-state pairs are combined through a spatialisation-concatenation operation, whereby actions are tiled over space into a tensor and concatenated to form a single tensor of shape (11 × 11 × 20). While 11 × 11 represents the size of the grid-world environments, the 20 vectors provided as input consist of information regarding (a) actions (9 possible actions in the experiments, thus 9 vectors); (b) object coordinates, including the target position (4 objects, thus 4 vectors); …
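To illustrate the spatialisation-concatenation step, the sketch below tiles a one-hot action code over the grid and stacks it channel-wise with spatial feature planes. Only the action channels (9) and object channels (4) are described above; the composition of the remaining 7 planes is left unspecified here, as the source text is truncated at this point.

```python
import numpy as np

def spatialise_and_concat(action_id, object_maps, extra_maps,
                          grid=11, n_actions=9):
    """Builds one (11, 11, 20) input slice: the discrete action is one-hot
    encoded and tiled over every grid cell, then concatenated channel-wise
    with the spatial maps (object/target positions plus remaining planes)."""
    # One-hot encode the action and broadcast it across the whole grid:
    # every cell carries the same 9-dim action code.
    action_onehot = np.zeros(n_actions, dtype=np.float32)
    action_onehot[action_id] = 1.0
    action_planes = np.tile(action_onehot, (grid, grid, 1))  # (11, 11, 9)

    # object_maps: (11, 11, 4) masks, one channel per object (including the
    # target); extra_maps: (11, 11, 7) remaining planes, content unspecified.
    return np.concatenate([action_planes, object_maps, extra_maps], axis=-1)

# Example shape check:
x = spatialise_and_concat(3, np.zeros((11, 11, 4), np.float32),
                          np.zeros((11, 11, 7), np.float32))
assert x.shape == (11, 11, 20)
```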