Virtual Reality via Object Pose Estimation and Active Learning Realizing Telepresence Robots with Aerial Manipulation Capabilities

2025-05-06 0 0 4.47MB 43 页 10玖币

侵权投诉

Virtual Reality via Object Pose Estimation and

Active Learning: Realizing Telepresence Robots

with Aerial Manipulation Capabilities

Jongseok Lee1,2,*,Ribin Balachandran1,Konstantin Kondak1,Andre Coelho1,3,Marco De Stefano1,

Matthias Humt1,Jianxiang Feng1,Tamim Asfour2and Rudolph Triebel1,4

1Institute of Robotics and Mechatronics, German Aerospace Center (DLR)

2Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology (KIT)

3Robotics and Mechatronics Laboratory, University of Twente (UT)

4Chair of Computer Vision and Artiﬁcial Intelligence, Technical University of Munich (TUM)

*Correspondence to jongseok.lee@dlr.de

Abstract

This article presents a novel telepresence system for advancing aerial manipulation in dynamic and

unstructured environments. The proposed system not only features a haptic device, but also a virtual

reality (VR) interface that provides real-time 3D displays of the robot’s workspace as well as a

haptic guidance to its remotely located operator. To realize this, multiple sensors namely a LiDAR,

cameras and IMUs are utilized. For processing of the acquired sensory data, pose estimation pipelines

are devised for industrial objects of both known and unknown geometries. We further propose an

active learning pipeline in order to increase the sample eﬃciency of a pipeline component that relies

on Deep Neural Networks (DNNs) based object detection. All these algorithms jointly address

various challenges encountered during the execution of perception tasks in industrial scenarios.

In the experiments, exhaustive ablation studies are provided to validate the proposed pipelines.

Methodologically, these results commonly suggest how an awareness of the algorithms’ own failures

and uncertainty (‘introspection’) can be used tackle the encountered problems. Moreover, outdoor

experiments are conducted to evaluate the eﬀectiveness of the overall system in enhancing aerial

manipulation capabilities. In particular, with ﬂight campaigns over days and nights, from spring to

winter, and with diﬀerent users and locations, we demonstrate over 70 robust executions of pick-and-

place, force application and peg-in-hole tasks with the DLR cable-Suspended Aerial Manipulator

(SAM). As a result, we show the viability of the proposed system in future industrial applications1.

Keywords Pose Estimation, Active Learning, Virtual Reality, Telepresence, Aerial Manipulation.

1 Introduction

The global market for robotic inspection and maintenance is growing fast with an expected annual turnover of up to

4.37 billion dollars by 2025

. Recently, international corporations and organizations, such as General Electric, Sprint

Robotics, Baker Hughes and Boston Dynamics, have started initiatives to generate and evaluate robotic technologies

for inspection and maintenance applications. One of the most prominent directions for these real world industrial

1A video material accompanying this paper can be found at https://www.youtube.com/watch?v=JRnPIARW8xY

BIS Research, Global Inspection and Maintenance Robot Market: Focus on Type, Component, and End User - Analysis and Forecast, 2020-2025;

March 2020

arXiv:2210.09678v2 [cs.RO] 10 Feb 2023

Figure 1: Left: the cable-Suspended Aerial Manipulator, dubbed SAM (Sarkisov et al.,2019) during ﬁeld experiment.

Right: a ground station where an operator remotely controls the robotic arm through a haptic interface. In real world

applications of bilateral teleoperation, the operator is often remotely located without visual contact to the robot.

applications is aerial manipulation (Ollero et al.,2022). An aerial manipulation system is composed of robotic

manipulators and a controlled ﬂying platform (Fishman et al.,2021;Bodie et al.,2020;Kondak et al.,2014;Kim et al.,

2013). The platform enables coarse positioning while the manipulator enables dexterous grasping and manipulation for

complex tasks. Hence, these aerial platforms extend the mobility of robotic manipulators, which can be deployed at high

altitudes above ground, increasing safety for human workers while reducing costs. Examples of aerial manipulation

applications range from load transportation (Bernard and Kondak,2009), contact based inspection and maintenance in

chemical plants (Trujillo et al.,2019), bridges (Sanchez-Cuevas et al.,2019), power-line maintenance (Cacace et al.,

2021), to sensor installations in forests for ﬁre prevention (Hamaza et al.,2019).

In this article, the real world applications of aerial manipulators are envisioned for several industrial scenarios in

dynamic and unstructured environments. For these industrial applications of aerial manipulators, our current interests

are in the bilateral teleoperation concepts, i.e., a human operator remotely controls the robotic manipulator from a

safe area on ground and receives visual and haptic feedback from the robot. This increases human operator safety

while the robots execute their tasks in dangerous environments (Hulin et al.,2021;Hirzinger et al.,2003). Such a

concept is motivated by having a robotic system with a human-in-the-loop, where the system can leverage human

intelligence to reliably accomplish its missions. To realize this, existing works have focused on relevant components of

the system, namely force feedback teleoperation under time delays (Balachandran et al.,2021b;Artigas et al.,2016),

shared autonomy (Masone et al.,2018), human-machine interfaces (Kim and Oh,2021;Yashin et al.,2019;Wu et al.,

2018), and robotic perception for aerial manipulators (Karrer et al.,2016;Pumarola et al.,2019).

Building upon the aforementioned developments, we propose a novel virtual reality (VR)-based telepresence system for

an aerial manipulation system operating in industrial scenarios. Figures 1and 2illustrate the main idea. The proposed

system is intended for real world scenarios, where the remotely located robot performs aerial manipulation tasks, while

its human operator is inside a ground station without having direct visual contact with the robot (Figure 1). To this

end, we propose a system which does not only involve a haptic device to enable the sense of touch for the operator, but

also a VR to increase the sense of vision (Figure 2). While the live video streams can also provide a certain level of

situation awareness to the operator, several studies conﬁrm that adding a virtual environment where one can change its

sight-of-view, zoom in and out, and further provide haptic guidance, supports the operator in accomplishing the tasks

(Pace et al.,2021;Whitney et al.,2020;Huang et al.,2019). Our own ﬁeld studies also conﬁrm that augmenting live

video streams with 3D visual feedback and haptic guidance can enhance manipulation capabilities of aerial robots.

The main novelty of our VR based concept is its realization with a fully on-board perception system for a ﬂoating-base

robot, which does not rely on any external sensors like Vicon, or any pre-generated maps in outdoor environments.

Instead, multiple sensors, namely LiDAR, a monocular camera, a pair of stereo cameras and inertial measurements

VR creation

from robot perception

3D visual feedback and haptic guidance

Force feedback

teleoperation

Figure 2: The proposed telepresence system with VR from robot perception and active learning. In the proposed system,

the robot creates VR of its workspaces as a 3D visual feedback to the human operator, and further provides a haptic

guidance. The main novelty of this work is the realization of such a system for real world scenarios.

units (IMUs) are jointly utilized (Table 1). To achieve this, we propose object pose estimation and active learning

pipelines. First, in order to virtually display industrial objects with known geometry, we provide a simple extension of a

marker tracking algorithm (Wagner and Schmalstieg,2007) by combining with on-board Simultaneous Localization

And Mapping (SLAM). Second, if the objects of interests are geometrically unknown, we devise a LiDAR based pose

estimation pipeline that combines LiDAR Odometry And Mapping (LOAM Zhang and Singh (2017)) with a pose graph,

a point cloud registration algorithm (Besl and McKay,1992), and a Deep Neural Network (DNN) based object detector

(Lin et al.,2017). For both the cases, the combinations are facilitated by an introspection (Grimmett et al.,2016) module

that identiﬁes the reliability of the pose estimation. Finally, we present a pool based active learning pipeline, which uses

an explicit representation of DNN’s uncertainty, to generate the most informative samples for a DNN to learn from.

This enhances the sample eﬃciency of deploying DNN based algorithms in outdoor environments. We identify certain

real world challenges and describe in detail how these introspective approaches can mitigate these challenges.

With the DLR’s SAM platform (Sarkisov et al.,2019), the feasibility and beneﬁts of the proposed idea are examined.

To this end, we ﬁrst present ablation studies on the designed pipelines with indoor and outdoor data-sets from the robot

sensors. Here, the inﬂuence of each component is examined with regard to mitigating the identiﬁed challenges, and we

show the feasibility of creating the real-time VR, which can closely match the real workspaces of the robot. Moreover,

the eﬀectiveness of the proposed method is shown through outdoor experiments within the considered industrial scenario.

This scenario, which was designed under the scope of EU project AEROARMS (Ollero et al.,2018), is relevant to

inspection and maintenance applications for gas and oil industry. It involves pick-and-place and force-exertion tasks

during the mission, which is to deploy a robotic crawler for automating pipe inspection routines. Moreover, the SAM

platform executing peg-in-hole tasks with a margin of error less than 2.5 mm is further considered, which is one of the

standard manipulation tasks in industrial settings. By executing over 70 executions of the aforementioned tasks over

days and nights, from spring to winter, and with diﬀerent users and locations, the beneﬁts of our VR based telepresence

concept are illustrated for enhancing aerial manipulation capabilities in real world industrial applications.

In summary, the key contributions of this work are:

•

We propose an advanced VR based telepresence system for aerial manipulation, which provides a 3D visual

feedback and a haptic guidance. The system neither requires any external sensors nor pre-generated maps,

has been evaluated outside laboratory settings, and can cope with the challenges of a ﬂoating-base system.

Moreover, multiple sensors are fused to exploit their respective strengths for the given perception tasks.

•

We devise object pose estimation and active learning pipelines to realize the proposed system in dynamic

and unstructured environments. Challenges to existing methods are reported, and several ablation studies are

provided to validate the proposed approaches. Methodologically, this work suggests the relevance of robotic

introspection in realizing VR based telepresence robots with aerial manipulation capabilities.

•

We perform exhaustive ﬂight experiments over extended durations including 40 task executions in outdoor

environments, 27 task executions within a user validation study, and the operation of the system at night. Thus,

we establish the proposed concept as a viable future option for real world industrial applications.

Outside No external Floating-base Multiple

the laboratory sensors or pre manipulation exteroceptive

settings? generated map? system? sensors?

AeroVR (Yashin et al.,2019)7 7 3 7

ARMAR-6 (Pohl et al.,2020)7 3 7 7

ModelSegmentation(Kohn et al.,2018)7 3 7 7

AvatarDrone (Kim and Oh,2021)7 7 3 7

PaintCopter (Vempati et al.,2019)7 7 3 7

AR (Liu and Shen,2020)7 3 7 7

AR (Puljiz et al.,2020)7 3 7 7

GraspLook (Ponomareva et al.,2021)7 3 7 7

The proposed system 3 3 3 3

Table 1: Comparisons between the existing VR based robotic systems and the proposed system.

The paper starts with a survey of related work (Section 2) and provide the system description of SAM robot hardware,

human-machine interfaces, sensor choices, and integration (Section 3.1). We formulate the problem of the VR creation,

and identify challenges in realizing the system (Section 3.2). Then, the designed pipelines are presented, which are to

address these challenges (Section 4). In Section 5.1, we provide ablation studies to validate the designed framework,

while Section 5.2 contains the results of our ﬂight experiments. We report the lessons learned in Section 5.4 and

conclude the work with some future extensions in Section 6.

Relation to Previous Publications

This paper extends the author’s previous publications, namely Lee et al. (2020a)

and Lee et al. (2020b). In terms of methodology, we provide a LiDAR based pose estimation pipeline (Section 4.2). This

extension enables the creation of VR without relying on markers, which is required in industrial scenarios. The devised

active learning pipeline for object detection (Section 4.3) extends and brings the previous theoretical framework (Lee

et al.,2020b) to practical applications. Furthermore, with respect to experimental contributions, this article provides

new ablation studies that are associated with the new methods. Most importantly, exhaustive outdoor experiments for

manipulation tasks are further performed to examine the beneﬁts of the proposed VR based concept over extended

durations and characterize its technical readiness for industrial applications.

2 Related Work

The proposed VR based concept advances the area of VR interfaces for robotics. The comparison of this work to

existing works is summarized in Table 1. The current literature from diﬀerent domains of robotic research is discussed,

which are, pose estimation (Section 4.1 and 4.2), and active learning with DNNs (Section 4.3). Importantly, we stress

that this work is not to advance the state-of-the-art methods in these two areas. Rather, the aim is to apply and extend

them to realize a working system for the given industrial scenarios. For example, the provided extension of a marker

tracking algorithm with visual-inertial SLAM is not the main contribution of this paper. Lastly, we further locate our

work within the literature of aerial robotic perception in ﬁeld applications.

Virtual Reality Interfaces

In the past, several VR interfaces have been widely utilized in robotics including aerial

systems (Wonsick and Padir,2020). So far, the presented approaches often create the VR either by using external

sensors such as Vicon and a-priori generated maps. Notably, Vempati et al. (2019) utilizes a-priori generated maps for

the applications of VR in aerial painting. For aerial manipulation, Yashin et al. (2019) uses Vicon system to create the

VR while Kim and Oh (2021) renders the environment with a portable sensor kit (Oh et al.,2017). Recently, many VR

techniques have gained interest in the robotic manipulation community. Therein, many works (Haidu and Beetz,2021;

Zhang et al.,2020b) let a human perform demonstration in VR, and transfer the demonstrated manipulation skills to

real robots. These works greatly show the synergy between VR and robotics. As this paper demonstrates the feasibility

of creating VR with on-board sensors only, the work can contribute to many of these works in showing how one can

create a VR for robotics.

On the contrary, many researchers aimed to provide VR of the remote scene by applying 3D reconstruction techniques

(Ni et al.,2017;Kohn et al.,2018). For example, Kohn et al. (2018) presents an approach using RGB-D camera. As the

main challenge of reconstruction based methods is the limited bandwidth in communication, Kohn et al. (2018) proposes

an object recognition pipeline, i.e., replace the detected object with sparse virtual meshes and discard the dense sensor

data. Pohl et al. (2020) uses RGB-D sensor to construct a VR for aﬀordance based manipulation with a humanoid,

while Liu and Shen (2020) and Puljiz et al. (2020) create augmented reality for a drone and a manipulator, respectively.

Pace et al. (2021) conducts a user study, and argues that the point clouds of RGB-D sensors are noisy and inaccurate

(with artifacts), which motivates for point cloud pre-processing methods for telepresence applications (Pace et al.,2021).

In contrast, our approach is based on scene graphs (Section 3.2) with pose estimation, which is an alternative to 3D

reconstruction methods. Finally, the main novelties are illustrated in Table 1, which are the realizations of a VR based

telepresence system for outdoor environments using multiple sensors jointly. No external sensors or pre-generated maps

are used, while dealing with speciﬁc challenges of a ﬂoating-base manipulation system, i.e., the surface that holds a

robotic arm is constantly changing over time, thereby inducing motions for the attached sensors.

Object Pose Estimation

One of the crucial components in the proposed framework is object pose estimation algorithms.

This is because we utilize a scene graph representation, which requires 6D pose of the objects for creating a 3D display,

as opposed to a 3D reconstruction of the remote site. As the literature is vast, we refer to the survey (He et al.,2021) for

a comprehensive review. In this work, the main novelty is the working solutions for the considered application, which is

tailored towards realizing the proposed VR system. For this, the two scenarios are discussed below. These are visual

object pose estimation for objects of known geometry, and LiDAR based method for unknown geometry.

If the object is known and accessible a-priori, one of the robust solutions is to use ﬁdicual marker systems. Fidicual

markers, which create artiﬁcial features on the scene for pose estimation, are widely used in robotics. The use-

cases are for creating the ground truths (Wang and Olson,2016), where environments are known (Malyuta et al.,

2020), for simplifying the problem in lieu of sophisticated perception (Laiacker et al.,2016), and also calibration

and mapping (Nissler et al.,2018). However, as the herein aim is on real-time VR creation, this use-case demands

stringent requirements on their limitations in run-time, inherent time-delays and robustness. Therefore, an extension of

ARToolKitPlus is provided (Wagner and Schmalstieg,2007) with an on-board visual-inertial SLAM system.

For LiDAR, point cloud registration is often used for pose estimation. By ﬁnding the transformation between the

current scans and a CAD model of an object, we can obtain 6D pose of an object. Broadly, point cloud registration

algorithms can be classiﬁed as local (Park et al.,2017;Rusinkiewicz and Levoy,2001;Besl and McKay,1992) or

global (Zhou et al.,2016), and model based (Pomerleau et al.,2015) or learning based (Wang and Solomon,2019;

Zhang et al.,2020a). As CAD models of objects are often not available in the given industrial scenario, a DNN based

detector and the idea of LOAM with pose graphs are combined, in order to obtain robust object pose estimates that cope

with occlusions, moving parts and view point variations in the scene.

Active Learning for Neural Networks

The motivations are the considerations of ﬁeld robotic applications of DNN

based object detectors. Here, the need for labeled data can cause overhead in development processes, especially while

considering a long-term deployment of learning systems in outdoor environments. For example, weather conditions can

change depending on seasons, and we need to eﬃciently create labeled data. Active learning provides a principled way

to reduce manual annotations by explicitly picking data that are worth being labeled. One way to autonomously generate

the ”worth” of an unlabeled sample is to use uncertainty of DNNs. In the past, for robot perception, we ﬁnd active

learning frameworks using random forests, Gaussian processes, etc (Narr et al.,2016;Mund et al.,2015) while for

DNNs, MacKay (1992) pioneered an active learning approach based on Bayesian Neural Networks, i.e., a probabilistic

or stochastic DNN (Gawlikowski et al.,2021), which oﬀers a principled method for uncertainty quantiﬁcation. Recent

works can also be found on active learning for DNN based object detectors (Choi et al.,2021;Aghdam et al.,2019),

where the focus is on adaptations of active learning to existing object detection frameworks. These include new

acquisition functions (or selection criteria) and how uncertainty estimates are generated.

For uncertainty quantiﬁcation in DNNs, so-called Monte-Carlo dropout (MC-dropout Gal and Ghahramani (2016)) has

gained popularity recently. The main advantage of MC-dropout is that it is relatively easy to use and scale to large

data-set. However, MC-dropout requires a speciﬁc stochastic regularization called dropout (Srivastava et al.,2014).

This limits its use on already well trained architectures, because the current DNN based object detectors are often

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

VirtualRealityviaObjectPoseEstimationandActiveLearning:RealizingTelepresenceRobotswithAerialManipulationCapabilitiesJongseokLee1,2,*,RibinBalachandran1,KonstantinKondak1,AndreCoelho1,3,MarcoDeStefano1,MatthiasHumt1,JianxiangFeng1,TamimAsfour2andRudolphTriebel1,41InstituteofRoboticsandMechatronics,Ge...

展开>> 收起<<

Virtual Reality via Object Pose Estimation and Active Learning Realizing Telepresence Robots with Aerial Manipulation Capabilities.pdf

共43页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Virtual Reality via Object Pose Estimation and Active Learning Realizing Telepresence Robots with Aerial Manipulation Capabilities

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: