A Perception-Driven Approach To Immersive
Remote Telerobotics.
Y. T. Tefera1,2, D. Mazzanti1, S. Anastasi3, D. G. Caldwell1, P. Fiorini2, and N. Deshpande1
1Istituto Italiano di Tecnologia (IIT), Via Morego 30, 16163 Genova, Italy
2Department of Computer Science, University of Verona, Via Strada Le Grazie 15, 37134 Verona, Italy
3Istituto Nazionale per l’Assicurazione contro gli Infortuni sul Lavoro (INAIL), P.le Pastore 6, 00144 Rome, Italy
Abstract—Virtual Reality (VR) interfaces are increasingly used
as remote visualization media in telerobotics. Remote environ-
ments captured through RGB-D cameras and visualized using
VR interfaces can enhance operators’ situational awareness and
sense of presence. However, this approach has strict requirements
for the speed, throughput, and quality of the visualized 3D data.
Further, telerobotics requires operators to focus fully on their
tasks, demanding high perceptual and cognitive skills. This paper
presents a work-in-progress framework that addresses these challenges
by taking the human visual system (HVS) as inspiration.
The human eye uses attentional mechanisms to select and focus
engagement on specific regions of a dynamic environment.
Inspired by this, the framework implements functionalities to
draw users' attention to specific regions while simultaneously
reducing latency and bandwidth requirements.
Index Terms—PointCloud, Virtual Reality, Object Detection,
Foveated Rendering, Telerobotics
I. INTRODUCTION
Immersive remote telerobotics (IRT) allows real-time im-
mersive visualization and interaction by the user: perceiving
the color and 3D profile of remote environments, while
simultaneously interacting with the robotic agents [6, 4].
However, such systems face several challenges: rendering high
resolution across a wide field-of-view (FOV), network latency and
bandwidth constraints, and the perceptual and cognitive load placed on
the operator. Effective solutions to these challenges would
substantially improve usability and performance for users.
The human visual system (HVS) has a unique characteristic:
it does not need every pixel in the FOV to be rendered at
uniformly high quality. It has the highest visual acuity at the
center of the FOV, which then drops off towards the periphery.
This characteristic can help address the above challenges to
some degree. In our earlier work in this domain, the user's gaze
was exploited to divide the acquired 3D data (point cloud) into
concentric conical regions of progressively decreasing resolution
away from the center of gaze, an approach termed foveated rendering
[7]. It was shown to improve latency and throughput in data
communication, and preliminary user trials showed that such
foveated rendering in VR had minimal impact on the quality
of user experience [7]. Despite these improvements, limitations
remain when dealing with dynamic and unstructured visual
information: foveated rendering may prevent users from noticing
significant visual changes in the peripheral regions of the FOV.
This paper builds on the previous work by emphasizing users'
attentional mechanisms: it adds a real-time scene understanding
system and uses it to draw the user's attention to desired places
in the FOV.
This research was conducted in collaboration with the Italian National
Workers' Compensation Authority (INAIL).
II. SYSTEM OVERVIEW
The proposed framework, shown in Figure 1, comprises remote-site
and user-site systems that integrate foveated rendering (from our
previous work [7]) with semantic scene understanding.
The remote site consists of modules implemented for RGB-
D data acquisition, semantic scene segmentation, partition-
ing, foveated sampling, encoding, and parallel streaming.
The object detection and segmentation module extracts object
categories and position information from the environment using the
state-of-the-art neural network architecture YOLACT [1] in real time
(>30 fps). The pipeline takes the input RGB image C and
outputs N object masks m ⊂ C. For each mask it gives a confidence
value p ∈ [0, 100], a bounding box b ∈ ℕ⁴, and a class ID
I ∈ {0, …, 100}, as shown in Figure 2. For all N detected objects,
their colour and depth masks are projected as a point cloud using an
unordered list of surfels, where each surfel has a position p ∈ ℝ³,
a normal n ∈ ℝ³, a colour c ∈ ℝ³, a weight w ∈ ℝ, a radius r ∈ ℝ,
an initialization timestamp t₀, and a current timestamp t.
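The surfel attributes listed above can be sketched as a simple data structure, together with a pinhole back-projection of a masked depth image into 3D points. All names and the camera model here are illustrative assumptions; the actual implementation may differ:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Surfel:
    """Surfel attributes as listed in the text (units illustrative)."""
    position: np.ndarray  # p in R^3
    normal: np.ndarray    # n in R^3
    colour: np.ndarray    # c in R^3
    weight: float         # w in R
    radius: float         # r in R
    t0: float             # initialization timestamp
    t: float              # current timestamp

def backproject_mask(depth, mask, fx, fy, cx, cy):
    """Back-project the masked pixels of a depth image into 3D camera
    coordinates using a standard pinhole model with intrinsics (fx, fy, cx, cy)."""
    vs, us = np.nonzero(mask)          # pixel coordinates inside the object mask
    z = depth[vs, us]                  # metric depth at those pixels
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (K, 3) array of points
```

The resulting points, together with per-pixel colour, would populate the per-object surfel lists described above.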
The foveated map partitioning and sampling concept introduced in
our work [7] is extended here. Whereas previously the whole map was
considered a single entity for partitioning and sampling, here each
independent object map Mn is divided into concentric conical regions.
Each conical region can then be approximated based on the
monotonically decreasing visual acuity in the foveation model, shown
in Fig. 2 (bottom-left) [3]. The ℝ³ space of each Mn region is
further partitioned into axis-aligned voxels. In the previous work
[7], foveated sampling down-sampled all the voxels in the peripheral
regions uniformly. This meant, however, that objects in the
peripheral regions would be rendered at low quality and could go
unnoticed by users. Instead, in this paper, with the object-level
map, desired objects detected in the periphery can be kept at high
resolution, i.e., not down-sampled, thus highlighting them to draw
users' attention. A conceptual implementation is shown in Fig. 2
(bottom-right).
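The object-aware sampling idea can be sketched as follows: points belonging to desired objects are always retained at full resolution, while the remaining points are thinned according to their conical region. The per-region strides stand in for the voxel-grid down-sampling described in the text, and all names and values are illustrative assumptions:

```python
import numpy as np

def foveated_sample(points, labels, region, keep_labels,
                    stride_per_region=(1, 4, 16)):
    """Down-sample a point cloud by conical region, keeping desired objects intact.

    points: (N, 3) array; labels: per-point object ID (0 = background);
    region: per-point conical-region index (0 = fovea); keep_labels: object
    IDs to highlight. Strides are illustrative surrogates for voxel sampling.
    """
    keep = np.zeros(len(points), dtype=bool)
    highlighted = np.isin(labels, list(keep_labels))
    keep |= highlighted                      # desired objects: never down-sampled
    for r, stride in enumerate(stride_per_region):
        idx = np.nonzero((region == r) & ~highlighted)[0]
        keep[idx[::stride]] = True           # uniform thinning per region
    return points[keep]
```

With this scheme, a highlighted object in the far periphery survives sampling at full density, which is precisely what makes it salient against its down-sampled surroundings.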
arXiv:2210.05417v1 [cs.HC] 11 Oct 2022