
the emotion recognition system that uses voice features in addition to human facial expressions for robot assistant functionality.
Companion robots can play with children and teach them, as proposed by Leite et al. [2], whose robots responded empathetically to several of the children's affective states. In addition to voiced indication of emotions and body language, several papers focus on robots that convey their emotional state through their eyes, e.g., the eyeball robot developed by Shimizu et al. [3].
Mobile robots are actively used as agents in teleoperation and telepresence tasks for affective communication. Most of these robots are designed to resemble the human body and to perform various operations in a human-like manner [4]. However, telecommunication through a robotic avatar requires delivering the robot to the working area, which often proves challenging due to the bulkiness of the robot or the hazards of the environment. Operator stations have been developed to support the work of stationary robots, as suggested by Lenz and Behnke [5] for telemanipulation with anthropomorphic avatar arms.
The scenarios mentioned above propose highly capable robotic systems. However, their mobility is strictly limited by the workspace of the robot's upper body and the physical dimensions of the mobile platform. Moreover, deploying them may be challenging for the user due to their high mass and relatively slow operation. Meanwhile, a swarm of drones can serve as an effective remote-control tool. Several researchers have explored applications of robotic swarms in teleoperation, for example, Serpiva et al. [6] with the SwarmPaint system, which utilizes a swarm of gesture-controlled drones to change formations and paint with light in the air. Recently, owing to fast developments in telepresence alongside virtual and mixed reality technologies, the teleoperation of drone avatars was suggested by Cordar et al. [7] for human telepresence and for fostering empathy with virtual agents and robots.
In this paper, we propose a novel approach to telepresence in which a swarm of drones broadcasts emotions from the operator to the user.
II. RELATED WORKS
Anthropomorphic robot avatars have been extensively investigated and improved in recent years. Systems such as TELESAR VI, developed by Tachi et al. [8], allow dexterous remote manipulation and communication through an avatar designed to resemble the upper body of a human.
Several researchers investigated effective communication
through robot avatars. For example, Tsetserukou et al. [9]
explored remote affective communications and proposed the
robotic haptic device iFeel IM to augment the emotional
experience during online conversations. Bartneck et al. [10]
explored the dependence of human emotion perception on the
character’s embodiment, showing that there is no significant
difference in the perceived intensity and recognition accuracy
between robotic and screen characters. Chao-gang et al.
[11] proposed a facial emotion generation model based on
random graphs for virtual robots. A fuzzy emotion system
that controls the face and the voice modules was developed
by Vasquez et al. [12] for a tour-guide mobile robot.
Though facial expressions play a major role in emotion recognition, dynamic body postures can also be recognized with relatively high precision. Matsui et al. [13] proposed a motion mapping approach to generate natural behavior for humanoid robots by copying human gestures. Cohen et al. [14] explored children's reactions to the iCat and NAO robots and designed well-recognized body postures for NAO. An end-to-end neural network model was developed by Yoon et al. [15] to generate sequences of human-like gestures that enhance NAO speech content. A variational auto-encoder framework was implemented by Marmpena et al. [16] to generate numerous emotional body language expressions for the anthropomorphic Pepper robot.
III. SYSTEM ARCHITECTURE
In the developed architecture shown in Fig. 2, the user interacts with the avatar swarm by visually interpreting the displayed emotion, while the avatar operator performs various body pose gestures to operate the swarm of drones. Tracking and localization of the drones are performed by the VICON mocap system, which consists of 12 infrared (IR) cameras.
Fig. 2. Layout of the SwarMan system.
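The resulting data flow can be summarized in the following minimal sketch; the mocap query, the gesture recognizer, and the drone command interface are replaced by stand-in functions, since the concrete interfaces (the VICON SDK, the DL model, and the flight controllers) are outside the scope of this overview:

```python
# Illustrative data flow of the architecture in Fig. 2 (stand-in functions only;
# the real system uses the VICON mocap system, a DL-based pose estimator,
# and the drone flight controllers in place of these stubs).

import random

def read_drone_positions(n_drones=9):
    """Stand-in for VICON-based localization of the swarm (12 IR cameras)."""
    return [(random.uniform(-2.0, 2.0), random.uniform(-2.0, 2.0), 1.5)
            for _ in range(n_drones)]

def read_operator_landmarks():
    """Stand-in for the DL-based upper-body landmark detector."""
    return [(random.uniform(-0.5, 0.5), 0.0, random.uniform(1.0, 1.8))
            for _ in range(9)]

def allocate(landmarks, drone_positions):
    """Stand-in for decision-making / agent allocation (one drone per joint)."""
    return {i: p for i, p in enumerate(landmarks)}

def send_setpoints(targets):
    """Stand-in for commanding the swarm of drones."""
    for drone_id, position in targets.items():
        print(f"drone {drone_id} -> {position}")

def control_step():
    drone_positions = read_drone_positions()
    landmarks = read_operator_landmarks()
    send_setpoints(allocate(landmarks, drone_positions))

if __name__ == "__main__":
    control_step()
```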
In the remote environment, the operator performs various gestures that showcase different emotions. These poses are captured by a DL-based gesture recognition algorithm that extracts the nine major upper-body landmarks, including the head, neck, left shoulder, right shoulder, left elbow, right elbow, right hand, and left hand. These landmarks are then passed to the decision-making and agent allocation algorithm, where, together with the localized positions of the swarm, the designated positions of the drones are calculated according to the relative positions of the nine major joints of the human upper-body pose.
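A minimal sketch of this allocation step is given below. It assumes the joint positions are already available in metres in the operator's frame and pairs drones with designated positions via the Hungarian algorithm, which is an illustrative choice rather than the paper's actual method; the ninth landmark (a torso point) is likewise an assumption, since only eight landmarks are named above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def designated_positions(joints, scale=2.0, arena_center=(0.0, 0.0, 1.5)):
    """Scale the operator's joint coordinates (metres) into the flight arena frame."""
    pts = np.asarray(list(joints.values()), dtype=float)
    centered = pts - pts.mean(axis=0)          # keep only the shape of the pose
    return centered * scale + np.asarray(arena_center)

def assign_drones(drone_positions, targets):
    """Pair each drone with one designated position, minimizing total travel distance."""
    drones = np.asarray(drone_positions, dtype=float)
    cost = np.linalg.norm(drones[:, None, :] - targets[None, :, :], axis=-1)
    drone_idx, target_idx = linear_sum_assignment(cost)
    return {int(d): targets[t] for d, t in zip(drone_idx, target_idx)}

# Nine landmarks of a neutral upper-body pose (hypothetical coordinates, metres).
# Only eight landmarks are named in the text; "torso" is an assumed ninth point.
joints = {
    "head": (0.0, 0.0, 1.70), "neck": (0.0, 0.0, 1.55),
    "left_shoulder": (-0.20, 0.0, 1.50), "right_shoulder": (0.20, 0.0, 1.50),
    "left_elbow": (-0.35, 0.0, 1.25), "right_elbow": (0.35, 0.0, 1.25),
    "left_hand": (-0.40, 0.0, 1.00), "right_hand": (0.40, 0.0, 1.00),
    "torso": (0.0, 0.0, 1.30),
}
current = np.random.uniform(-1.0, 1.0, size=(9, 3)) + np.array([0.0, 0.0, 1.5])
setpoints = assign_drones(current, designated_positions(joints))
```

Scaling the centered skeleton preserves the relative geometry of the joints, so the formation reproduces the operator's pose at arena scale.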
The user interacts with the swarm of drones visually to understand the emotions the operator intended to convey. In addition to the emotional poses, the light rings on the drones help the user interpret the type of emotion and reinforce its psychological effect: green for happy, red for angry, white for neutral, yellow for confusion, and blue for sad.
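This emotion-to-color mapping can be expressed as a simple lookup table; the RGB values below are assumptions for illustration, since the text specifies only the color names.

```python
# Emotion-to-LED mapping used to reinforce the pose (RGB values assumed).
EMOTION_LED_RGB = {
    "happy":     (0, 255, 0),      # green
    "angry":     (255, 0, 0),      # red
    "neutral":   (255, 255, 255),  # white
    "confusion": (255, 255, 0),    # yellow
    "sad":       (0, 0, 255),      # blue
}

def led_color(emotion: str) -> tuple:
    """Return the ring color for a recognized emotion (white if unknown)."""
    return EMOTION_LED_RGB.get(emotion, EMOTION_LED_RGB["neutral"])
```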
IV. TRAJECTORY GENERATION AND SWARM CONTROL
For a more immersive experience and intuitive control,
the operator of the avatar is controlling the swarm of drones