
the emotion recognition system that uses voice features in addition to human facial expressions for robot assistant functionality.
Companion robots can play with children and teach them, as proposed by Leite et al. [2], whose robots responded empathetically to several of the children's affective states. In addition to voiced indication of emotions and body language, several papers focus on robots that convey their emotional state through their eyes, e.g., the eyeball robot developed by Shimizu et al. [3].
Mobile robots are actively used as agents in teleoperation and telepresence tasks for affective communication. Most of these robots are designed to resemble the human body and to perform various operations in a human-like manner [4]. However, telecommunication through a robotic avatar requires delivering the robot to the working area, which often proves challenging due to the bulkiness of the robot or the hazards of the environment. Operator stations have been developed to support the work of stationary robots, as suggested by Lenz and Behnke [5] for telemanipulation with anthropomorphic avatar arms.
The scenarios mentioned above propose highly capable robotic systems. However, their mobility is strictly limited by the workspace of the robot's upper body and the physical dimensions of the mobile platform. Moreover, deploying them may be challenging for the user due to their high mass and relatively slow operation. Meanwhile, a swarm of drones can serve as an effective remote-control tool. Several researchers have explored applications of robotic swarms in teleoperation, for example, Serpiva et al. [6] with the SwarmPaint system, which utilizes a swarm of gesture-controlled drones to change formations and paint with light in the air. Recently, owing to fast developments in telepresence alongside virtual and mixed reality technologies, the teleoperation of drone avatars was suggested by Cordar et al. [7] for human telepresence and for fostering empathy with virtual agents and robots.
In this paper, we propose a novel approach to telepresence in which a swarm of drones broadcasts emotions from the operator to the user.
II. RELATED WORKS
Anthropomorphic robot avatars have been extensively investigated and improved in recent years. Systems such as TELESAR VI, developed by Tachi et al. [8], allow dexterous remote manipulation and communication through an avatar designed to resemble the upper body of a human.
Several researchers investigated effective communication
through robot avatars. For example, Tsetserukou et al. [9]
explored remote affective communications and proposed the
robotic haptic device iFeel IM to augment the emotional
experience during online conversations. Bartneck et al. [10]
explored the dependence of human emotion perception on the
character’s embodiment, showing that there is no significant
difference in the perceived intensity and recognition accuracy
between robotic and screen characters. Chao-gang et al.
[11] proposed a facial emotion generation model based on
random graphs for virtual robots. A fuzzy emotion system
that controls the face and the voice modules was developed
by Vasquez et al. [12] for a tour-guide mobile robot.
Though facial expressions play a major role in emotion recognition, dynamic body postures can also be recognized with relatively high precision. Matsui et al. [13] proposed a motion mapping approach to generate natural behavior for humanoid robots by copying human gestures. Cohen et al. [14] explored children's reactions to the iCat and NAO robots and designed well-recognized body postures for NAO. An end-to-end neural network model was developed by Yoon et al. [15] to generate sequences of human-like gestures that enhance NAO speech content. A variational auto-encoder framework was implemented by Marmpena et al. [16] to generate numerous emotional body language expressions for the anthropomorphic Pepper robot.
III. SYSTEM ARCHITECTURE
In the developed architecture shown in Fig. 2, the user interacts with the avatar swarm by visually interpreting the displayed emotion, while the avatar operator performs various body pose gestures to operate the swarm of drones. Tracking and localization of the drones are performed by the VICON mocap system, which consists of 12 infrared (IR) cameras.
Fig. 2. Layout of the SwarMan system.
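The resulting data flow can be summarized in the following minimal sketch; the mocap query, the gesture recognizer, and the drone command interface are replaced by stand-in functions, since the concrete interfaces (the VICON SDK, the DL model, and the flight controllers) are outside the scope of this overview:

```python
# Illustrative data flow of the architecture in Fig. 2 (stand-in functions only;
# the real system uses the VICON mocap system, a DL-based pose estimator,
# and the drone flight controllers in place of these stubs).

import random

def read_drone_positions(n_drones=9):
    """Stand-in for VICON-based localization of the swarm (12 IR cameras)."""
    return [(random.uniform(-2.0, 2.0), random.uniform(-2.0, 2.0), 1.5)
            for _ in range(n_drones)]

def read_operator_landmarks():
    """Stand-in for the DL-based upper-body landmark detector."""
    return [(random.uniform(-0.5, 0.5), 0.0, random.uniform(1.0, 1.8))
            for _ in range(9)]

def allocate(landmarks, drone_positions):
    """Stand-in for decision-making / agent allocation (one drone per joint)."""
    return {i: p for i, p in enumerate(landmarks)}

def send_setpoints(targets):
    """Stand-in for commanding the swarm of drones."""
    for drone_id, position in targets.items():
        print(f"drone {drone_id} -> {position}")

def control_step():
    drone_positions = read_drone_positions()
    landmarks = read_operator_landmarks()
    send_setpoints(allocate(landmarks, drone_positions))

if __name__ == "__main__":
    control_step()
```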
In the remote environment, the operator performs various gestures that showcase different emotions. These poses are captured by a DL-based gesture recognition algorithm that extracts the nine major upper-body landmarks, including the head, neck, left shoulder, right shoulder, left elbow, right elbow, right hand, and left hand. These landmarks are then passed to the decision-making and agent allocation algorithm, where, together with the localized positions of the swarm, the designated positions of the drones are calculated according to the relative positions of the nine major joints of the human upper-body pose.
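A minimal sketch of this allocation step is given below. It assumes the joint positions are already available in metres in the operator's frame and pairs drones with designated positions via the Hungarian algorithm, which is an illustrative choice rather than the paper's actual method; the ninth landmark (a torso point) is likewise an assumption, since only eight landmarks are named above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def designated_positions(joints, scale=2.0, arena_center=(0.0, 0.0, 1.5)):
    """Scale the operator's joint coordinates (metres) into the flight arena frame."""
    pts = np.asarray(list(joints.values()), dtype=float)
    centered = pts - pts.mean(axis=0)          # keep only the shape of the pose
    return centered * scale + np.asarray(arena_center)

def assign_drones(drone_positions, targets):
    """Pair each drone with one designated position, minimizing total travel distance."""
    drones = np.asarray(drone_positions, dtype=float)
    cost = np.linalg.norm(drones[:, None, :] - targets[None, :, :], axis=-1)
    drone_idx, target_idx = linear_sum_assignment(cost)
    return {int(d): targets[t] for d, t in zip(drone_idx, target_idx)}

# Nine landmarks of a neutral upper-body pose (hypothetical coordinates, metres).
# Only eight landmarks are named in the text; "torso" is an assumed ninth point.
joints = {
    "head": (0.0, 0.0, 1.70), "neck": (0.0, 0.0, 1.55),
    "left_shoulder": (-0.20, 0.0, 1.50), "right_shoulder": (0.20, 0.0, 1.50),
    "left_elbow": (-0.35, 0.0, 1.25), "right_elbow": (0.35, 0.0, 1.25),
    "left_hand": (-0.40, 0.0, 1.00), "right_hand": (0.40, 0.0, 1.00),
    "torso": (0.0, 0.0, 1.30),
}
current = np.random.uniform(-1.0, 1.0, size=(9, 3)) + np.array([0.0, 0.0, 1.5])
setpoints = assign_drones(current, designated_positions(joints))
```

Scaling the centered skeleton preserves the relative geometry of the joints, so the formation reproduces the operator's pose at arena scale.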
The user interacts with the swarm of drones visually to understand the emotions the operator intended to convey. In addition to the emotional poses, the light rings on the drones help the user interpret the type of emotion and reinforce its psychological effect: green for happy, red for angry, white for neutral, yellow for confusion, and blue for sad.
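This emotion-to-color mapping can be expressed as a simple lookup table; the RGB values below are assumptions for illustration, since the text specifies only the color names.

```python
# Emotion-to-LED mapping used to reinforce the pose (RGB values assumed).
EMOTION_LED_RGB = {
    "happy":     (0, 255, 0),      # green
    "angry":     (255, 0, 0),      # red
    "neutral":   (255, 255, 255),  # white
    "confusion": (255, 255, 0),    # yellow
    "sad":       (0, 0, 255),      # blue
}

def led_color(emotion: str) -> tuple:
    """Return the ring color for a recognized emotion (white if unknown)."""
    return EMOTION_LED_RGB.get(emotion, EMOTION_LED_RGB["neutral"])
```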
IV. TRAJECTORY GENERATION AND SWARM CONTROL
For a more immersive experience and intuitive control,
the operator of the avatar is controlling the swarm of drones