SwarMan: Anthropomorphic Swarm of Drones
Avatar with Body Tracking and Deep
Learning-Based Gesture Recognition
Ahmed Baza
Digital Engineering Center
Skoltech
Moscow, Russia
ahmed.baza@skoltech.ru
Ayush Gupta
Digital Engineering Center
Skoltech
Moscow, Russia
ayush.gupta@skoltech.ru
Ekaterina Dorzhieva
Digital Engineering Center
Skoltech
Moscow, Russia
ekaterina.dorzhieva@skoltech.ru
Aleksey Fedoseev
Digital Engineering Center
Skoltech
Moscow, Russia
aleksey.fedoseev@skoltech.ru
Dzmitry Tsetserukou
Digital Engineering Center
Skoltech
Moscow, Russia
d.tsetserukou@skoltech.ru
Abstract—Anthropomorphic robot avatars present a conceptually novel approach to remote affective communication, offering people across the world a wider spectrum of emotional and social exchanges than traditional 2D and 3D image data. However, current telepresence robots have several limitations, such as high weight, system complexity that prevents fast deployment, and the limited workspace of avatars mounted on either static or wheeled mobile platforms.
In this paper, we present a novel concept of telecommunication through a robot avatar based on an anthropomorphic swarm of drones: SwarMan. The developed system consists of nine nanocopters controlled remotely by the operator through a gesture recognition interface. SwarMan allows operators to communicate both by having the swarm directly follow their motions and by recognizing one of the prerecorded emotional patterns, rendering the captured emotion as illumination on the drones. The LSTM MediaPipe network was trained on a collected dataset of 600 short videos with five emotional gestures. The achieved emotion recognition accuracy was 97% on the test dataset.
As communication through the swarm avatar significantly changes the visual appearance of the operator, we investigated the ability of users to recognize and respond to emotions performed by the swarm of drones. The experimental results revealed high consistency between users in rating emotions. Additionally, users indicated low physical demand (2.25 on the Likert scale) and were satisfied with their performance (1.38 on the Likert scale) when communicating through the SwarMan interface.
Index Terms—human-robot interaction, telecommunication systems, long short-term memory (LSTM) networks, multi-agent systems, affective communication
I. INTRODUCTION
With the latest developments in robotics and telepresence technology, along with the production of robots for manufacturing purposes, more attention is being paid to the use of robots in everyday life. Novel research topics are emerging in the field of service robots, suggesting their application
as companions to improve the mental state of humans. To achieve the necessary behavior complexity, robots have to accurately determine the state of the user to establish natural communication. For example, Muhammad Abdullah et al. [1] developed an emotion recognition system that uses voice features in addition to the facial expressions of a human for robot assistant functionality.

The reported study was funded by RFBR and CNRS according to the research project No. 21-58-15006.

Fig. 1. (a) User interaction with SwarMan avatar. (b) The remote avatar performs gestures in front of the camera. (c) Point landmarks of the recognized "Happy" gesture.

arXiv:2210.01487v1 [cs.RO] 4 Oct 2022
Companion robots can play with children and teach them,
as proposed in the research by Leite et al. [2] in which the
developed robots responded empathetically to several of the
children's affective states. In addition to voiced indication of emotions and body language, several papers focus on robots that can broadcast an emotional state through their eyes, e.g., the eyeball robot developed by Shimizu et al. [3].
Mobile robots are actively used as agents in teleoperation
and telepresence tasks for affective communication. Most of
these robots are designed to resemble the human body and
to perform various operations similar to a human [4]. How-
ever, telecommunication through the robotic avatar requires
delivering the robot to the working area, which often proves
challenging either due to the bulkiness of the robot or due
to the dangerous environment. Dedicated operator stations have been developed to support the work of stationary robots, as in the research of Christian Lenz and Sven Behnke [5] on telemanipulation with anthropomorphic avatar arms.
The scenarios mentioned above propose highly capable robotic systems. However, the mobility of these robots is strictly limited by the workspace of the robot's upper body and the physical dimensions of the mobile platform. Moreover, their implementation may be challenging for the user due to the high mass and relatively slow operation of these systems. Meanwhile, a swarm of drones can serve as an effective remote-control tool. Several researchers have explored applications of robotic swarms in teleoperation, for example, Serpiva et al. [6] with the SwarmPaint system, which utilizes a swarm of gesture-controlled drones to change formations and paint with light in the air. Recently, owing to fast developments in telepresence technologies alongside virtual and mixed reality, the teleoperation of drone avatars was suggested by Cordar et al. [7] for human telepresence and to foster empathy with virtual agents and robots.
In this paper, we propose a novel approach to the task of
telepresence, involving a swarm of drones in broadcasting
emotions from the operator to the user.
II. RELATED WORKS
Anthropomorphic robot avatars were extensively inves-
tigated and improved in recent years. Such systems as
TELESAR VI developed by Tachi et al. [8] allow dexterous
remote manipulation and communication through an avatar
designed to resemble the upper body of the human.
Several researchers investigated effective communication
through robot avatars. For example, Tsetserukou et al. [9]
explored remote affective communications and proposed the
robotic haptic device iFeel IM to augment the emotional
experience during online conversations. Bartneck et al. [10]
explored the dependence of human emotion perception on the
character’s embodiment, showing that there is no significant
difference in the perceived intensity and recognition accuracy
between robotic and screen characters. Chao-gang et al.
[11] proposed a facial emotion generation model based on
random graphs for virtual robots. A fuzzy emotion system
that controls the face and the voice modules was developed
by Vasquez et al. [12] for a tour-guide mobile robot.
Though facial expressions play a major role in emotion recognition, dynamic body postures can also be recognized with relatively high precision. Matsui et al. [13] proposed a motion mapping approach to generate natural behavior for humanoid robots by copying human gestures. Cohen et al. [14] explored children's reactions to the iCat and NAO robots and designed well-recognized body postures for NAO. An end-to-end neural network model was developed by Yoon et al. [15] to generate sequences of human-like gestures enhancing NAO speech content. A variational auto-encoder framework was implemented by Marmpena et al. [16] for generating varied emotional body language for the anthropomorphic Pepper robot.
III. SYSTEM ARCHITECTURE
In the developed architecture shown in Fig. 2 the user
interacts with the avatar swarm by visual interpretation of
the emotion while the avatar operator performs the various
body pose gestures to operate the avatar swarm of drones.
The tracking and localization of the drones are done through
the VICON mocap system which consists of 12 infra-red
(IR) cameras.
Fig. 2. Layout of the SwarMan system.
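According to the abstract, the gesture recognizer is an LSTM network operating on MediaPipe body landmarks. The following is only a minimal NumPy sketch of such a sequence classifier, not the authors' implementation: the hidden size, the encoding of nine landmarks as (x, y) pairs per frame, and the random weights are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMClassifier:
    """Minimal LSTM sequence classifier in the spirit of the paper's
    gesture recognizer: each input frame is a flattened vector of pose
    landmarks; the output is a distribution over five emotion classes."""

    def __init__(self, n_landmarks=9, hidden=32, n_classes=5, seed=0):
        rng = np.random.default_rng(seed)
        d = 2 * n_landmarks  # (x, y) per landmark (an assumption here)
        # Stacked gate weights: input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden, d + hidden))
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_classes, hidden))
        self.hidden = hidden

    def forward(self, frames):
        """frames: array of shape (T, 2 * n_landmarks)."""
        H = self.hidden
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for x in frames:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        logits = self.W_out @ h
        e = np.exp(logits - logits.max())
        return e / e.sum()  # softmax over the five emotion classes

# Classify a dummy 30-frame gesture clip (9 landmarks, x and y each).
clf = TinyLSTMClassifier()
probs = clf.forward(np.zeros((30, 18)))
```

A trained version would learn `W`, `b`, and `W_out` from the paper's 600-video dataset; with random weights, only the tensor shapes and the softmax output are meaningful here.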
In the remote environment, the operator performs gestures that express different emotions. These poses are captured by a DL-based gesture recognition algorithm operating on nine major upper-body landmarks: head, neck, left shoulder, right shoulder, left elbow, right elbow, right hand, and left hand. These landmarks are then passed to the decision-making and agent allocation algorithm, which, together with the localized positions of the swarm, calculates the designated drone positions according to the relative positions of the nine major joints of the operator's upper body. The user interacts with the swarm of drones visually to interpret the emotions the operator is performing. Along with the emotional poses, light rings on the drones reinforce the user's interpretation of each emotion: green for happy, red for angry, white for neutral, yellow for confusion, and blue for sad.
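The color coding and the landmark-to-drone allocation described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the coordinate frames and the `scale`/`offset` parameters of the operator-to-arena mapping are hypothetical, and the real system additionally resolves collision-free trajectories for the nanocopters from mocap data.

```python
# Emotion -> RGB color of the drones' light rings, as listed in the text.
EMOTION_COLORS = {
    "happy": (0, 255, 0),        # green
    "angry": (255, 0, 0),        # red
    "neutral": (255, 255, 255),  # white
    "confusion": (255, 255, 0),  # yellow
    "sad": (0, 0, 255),          # blue
}

def allocate_drones(landmarks, scale=1.0, offset=(0.0, 0.0, 0.0)):
    """Map operator landmark positions (x, y, z in metres, operator frame)
    to drone target positions in the arena frame using a uniform scale and
    a translation. Both parameters are assumptions for illustration."""
    return {
        name: tuple(scale * c + o for c, o in zip(xyz, offset))
        for name, xyz in landmarks.items()
    }

# Example: enlarge the operator's pose and shift it into the flight arena.
pose = {"head": (0.0, 0.0, 1.5), "left_hand": (-0.5, 0.25, 1.25)}
targets = allocate_drones(pose, scale=2.0, offset=(1.0, 0.0, 0.5))
```

Scaling the pose up lets the swarm render the figure larger than life, which is natural for an avatar made of spatially separated drones.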
IV. TRAJECTORY GENERATION AND SWARM CONTROL
For a more immersive experience and intuitive control,
the operator of the avatar is controlling the swarm of drones