I. INTRODUCTION
A central challenge of multi-agent systems (MAS) is coordinating the actions of multiple
autonomous agents in time and space, to accomplish cooperative tasks and achieve joint goals
[1], [2]. Developing successful multi-agent systems requires addressing controllability
challenges [3], [4] and dealing with synchronization control [5], formation control [6], task
allocation [7] and consensus formation [8]–[10].
Research in cognitive science may provide guiding principles to address the above
challenges, by identifying the cognitive strategies that groups of individuals use to
successfully interact with each other and to make collective decisions [11]–[14]. An extensive
body of research has studied how two or more people coordinate their actions in time and space
during cooperative (human-human) joint actions, such as when performing team sports,
dancing or lifting something together [15], [16]. These studies have shown that successful
joint actions engage various cognitive mechanisms, whose level of sophistication plausibly
depends on task complexity. The simplest forms of coordination and imitation in pairs or
groups of individuals, such as the joint execution of rhythmic patterns, might not require
sophisticated cognitive processing, but could use simple mechanisms of behavioral
synchronization – perhaps based on coupled dynamical systems, analogous to the
synchronization of coupled pendulums [17]. However, more sophisticated types of joint
actions go beyond the mere alignment of behavior. For example, some joint actions require
making decisions together, e.g., the decision about where to place a table that we are lifting
together. These sophisticated forms of joint actions and joint decisions might benefit from
cognitive mechanisms for mutual prediction, mental state inference, sensorimotor
communication and shared task representations [16], [18]. The cognitive mechanisms
supporting joint action have been probed by numerous experiments [19]–[29], sometimes
with the aid of conceptual [30], computational [31]–[39], and robotic [40]–[43] models.
However, there is still a paucity of models that implement advanced cognitive abilities, such
as the inference of others' plans and the alignment of task knowledge across group members,
which have been identified in empirical studies of joint action. Furthermore, it is unclear
whether and how it is possible to develop joint action models from first principles; for
example, from the perspective of a generic inference or optimization scheme that unifies
multiple cognitive mechanisms required for joint action.
We advance an innovative framework for cooperative joint action and consensus in multi-agent systems, inspired by the cognitive theory of active inference. Active inference is a
normative theory that describes the brain as a prediction machine, which learns an internal
(generative) model of the statistical regularities of the environment – including the statistics
of social interactions – and uses it to generate predictions that guide perceptual processing
and action planning [44]. Here, we use the predictive and inferential mechanisms of active
inference to implement sophisticated forms of joint action in dyads of interacting agents. The
model formalizes joint action as a process of interactive inference based on
shared task knowledge between the agents [2], [45]. We exemplify the functioning of the
model in a “joint maze” task. In this task, two agents have to navigate a maze to reach and press together either a red or a blue button. Each agent has probabilistic beliefs about the joint task that the dyad is performing, which covers both his own and the other agent's contributions (e.g., should we both press a red or a blue button?). Each agent continuously infers what the joint task is, based on his (stronger or weaker) prior belief and the observation of the other agent's movements towards one of the two buttons. Then, he selects an action (red or blue button press) in a way that simultaneously fulfills two objectives. The