Virtual Reality via Object Pose Estimation and
Active Learning: Realizing Telepresence Robots
with Aerial Manipulation Capabilities
Jongseok Lee1,2,*, Ribin Balachandran1, Konstantin Kondak1, Andre Coelho1,3, Marco De Stefano1,
Matthias Humt1, Jianxiang Feng1, Tamim Asfour2 and Rudolph Triebel1,4
1Institute of Robotics and Mechatronics, German Aerospace Center (DLR)
2Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology (KIT)
3Robotics and Mechatronics Laboratory, University of Twente (UT)
4Chair of Computer Vision and Artificial Intelligence, Technical University of Munich (TUM)
*Correspondence to jongseok.lee@dlr.de
Abstract
This article presents a novel telepresence system for advancing aerial manipulation in dynamic and
unstructured environments. The proposed system not only features a haptic device, but also a virtual
reality (VR) interface that provides real-time 3D displays of the robot's workspace as well as
haptic guidance to its remotely located operator. To realize this, multiple sensors, namely a LiDAR,
cameras and IMUs, are utilized. To process the acquired sensory data, pose estimation pipelines
are devised for industrial objects of both known and unknown geometries. We further propose an
active learning pipeline in order to increase the sample efficiency of a pipeline component that relies
on Deep Neural Network (DNN) based object detection. All these algorithms jointly address
various challenges encountered during the execution of perception tasks in industrial scenarios.
In the experiments, exhaustive ablation studies are provided to validate the proposed pipelines.
Methodologically, these results commonly suggest how an awareness of the algorithms' own failures
and uncertainty ('introspection') can be used to tackle the encountered problems. Moreover, outdoor
experiments are conducted to evaluate the effectiveness of the overall system in enhancing aerial
manipulation capabilities. In particular, with flight campaigns over days and nights, from spring to
winter, and with different users and locations, we demonstrate over 70 robust executions of pick-and-
place, force application and peg-in-hole tasks with the DLR cable-Suspended Aerial Manipulator
(SAM). As a result, we show the viability of the proposed system in future industrial applications¹.
Keywords: Pose Estimation, Active Learning, Virtual Reality, Telepresence, Aerial Manipulation.
1 Introduction
The global market for robotic inspection and maintenance is growing fast, with an expected annual turnover of up to
4.37 billion dollars by 2025². Recently, international corporations and organizations, such as General Electric, Sprint
Robotics, Baker Hughes and Boston Dynamics, have started initiatives to generate and evaluate robotic technologies
for inspection and maintenance applications. One of the most prominent directions for these real world industrial
¹ A video accompanying this paper can be found at https://www.youtube.com/watch?v=JRnPIARW8xY
² BIS Research, Global Inspection and Maintenance Robot Market: Focus on Type, Component, and End User - Analysis and Forecast, 2020-2025; March 2020
Figure 1: Left: the cable-Suspended Aerial Manipulator, dubbed SAM (Sarkisov et al., 2019), during a field experiment.
Right: a ground station where an operator remotely controls the robotic arm through a haptic interface. In real world
applications of bilateral teleoperation, the operator is often remotely located without visual contact with the robot.
applications is aerial manipulation (Ollero et al., 2022). An aerial manipulation system is composed of robotic
manipulators and a controlled flying platform (Fishman et al., 2021; Bodie et al., 2020; Kondak et al., 2014; Kim et al.,
2013). The platform enables coarse positioning, while the manipulator enables dexterous grasping and manipulation for
complex tasks. Hence, these aerial platforms extend the mobility of robotic manipulators, which can be deployed at high
altitudes above ground, increasing safety for human workers while reducing costs. Examples of aerial manipulation
applications range from load transportation (Bernard and Kondak, 2009), contact-based inspection and maintenance in
chemical plants (Trujillo et al., 2019) and on bridges (Sanchez-Cuevas et al., 2019), and power-line maintenance (Cacace et al.,
2021), to sensor installation in forests for fire prevention (Hamaza et al., 2019).
In this article, real world applications of aerial manipulators are envisioned for several industrial scenarios in
dynamic and unstructured environments. For these applications, our current interest is in bilateral teleoperation
concepts, i.e., a human operator remotely controls the robotic manipulator from a safe area on the ground and
receives visual and haptic feedback from the robot. This increases human operator safety
while the robots execute their tasks in dangerous environments (Hulin et al., 2021; Hirzinger et al., 2003). Such a
concept is motivated by having a robotic system with a human-in-the-loop, where the system can leverage human
intelligence to reliably accomplish its missions. To realize this, existing works have focused on relevant components of
the system, namely force feedback teleoperation under time delays (Balachandran et al., 2021b; Artigas et al., 2016),
shared autonomy (Masone et al., 2018), human-machine interfaces (Kim and Oh, 2021; Yashin et al., 2019; Wu et al.,
2018), and robotic perception for aerial manipulators (Karrer et al., 2016; Pumarola et al., 2019).
Building upon the aforementioned developments, we propose a novel virtual reality (VR) based telepresence system for
an aerial manipulation system operating in industrial scenarios. Figures 1 and 2 illustrate the main idea. The proposed
system is intended for real world scenarios, where the remotely located robot performs aerial manipulation tasks, while
its human operator is inside a ground station without direct visual contact with the robot (Figure 1). To this
end, we propose a system which involves not only a haptic device to enable the sense of touch for the operator, but
also a VR interface to enhance the sense of vision (Figure 2). While live video streams can also provide a certain level of
situation awareness to the operator, several studies confirm that adding a virtual environment, in which the operator can
change the point of view, zoom in and out, and receive haptic guidance, supports the operator in accomplishing the tasks
(Pace et al., 2021; Whitney et al., 2020; Huang et al., 2019). Our own field studies also confirm that augmenting live
video streams with 3D visual feedback and haptic guidance can enhance the manipulation capabilities of aerial robots.
The main novelty of our VR based concept is its realization with a fully on-board perception system for a floating-base
robot, which does not rely on any external sensors like Vicon, or any pre-generated maps in outdoor environments.
Instead, multiple sensors, namely a LiDAR, a monocular camera, a pair of stereo cameras and inertial measurement
units (IMUs), are jointly utilized (Table 1).
Figure 2: The proposed telepresence system with VR from robot perception and active learning. In the proposed system,
the robot creates a VR of its workspace as 3D visual feedback to the human operator, and further provides haptic
guidance. The main novelty of this work is the realization of such a system for real world scenarios.
To achieve this, we propose object pose estimation and active learning pipelines. First, in order to virtually display
industrial objects of known geometry, we provide a simple extension of a marker tracking algorithm (Wagner and
Schmalstieg, 2007) by combining it with on-board Simultaneous Localization And Mapping (SLAM). Second, if the
objects of interest are geometrically unknown, we devise a LiDAR based pose estimation pipeline that combines LiDAR
Odometry And Mapping (LOAM; Zhang and Singh, 2017) with a pose graph, a point cloud registration algorithm
(Besl and McKay, 1992), and a Deep Neural Network (DNN) based object detector (Lin et al., 2017). For both cases,
the combinations are facilitated by an introspection module (Grimmett et al., 2016) that assesses the reliability of the
pose estimates. Finally, we present a pool based active learning pipeline, which uses an explicit representation of the
DNN's uncertainty to select the most informative samples for the DNN to learn from. This enhances the sample
efficiency of deploying DNN based algorithms in outdoor environments. We identify certain real world challenges and
describe in detail how these introspective approaches can mitigate them.
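To make the role of the introspection module concrete, the following minimal sketch shows one way such a reliability gate can be wired between a pose estimation front-end and SLAM. This is our illustration under stated assumptions, not the on-board implementation: the class name, the scalar residual test and the threshold are hypothetical, and the actual module (Section 4) uses richer failure indicators.

```python
# Minimal sketch (illustrative, not the paper's implementation) of an
# introspection-gated object pose tracker: a front-end measurement (marker
# tracking or point cloud registration) is accepted only when its
# self-reported residual is small; otherwise the last accepted world-frame
# pose is kept and re-expressed via the SLAM camera pose.
import numpy as np

def se3_inverse(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 homogeneous transform analytically."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3], Ti[:3, 3] = R.T, -R.T @ t
    return Ti

class IntrospectivePoseTracker:
    def __init__(self, residual_threshold: float):
        self.residual_threshold = residual_threshold  # introspection gate
        self.T_world_obj = None                       # last accepted pose

    def update(self, T_world_cam, T_cam_obj=None, residual=np.inf):
        """T_world_cam: 4x4 camera pose from on-board SLAM.
        T_cam_obj: object pose from the front-end, or None if not detected.
        residual: the front-end's own error measure, e.g. reprojection error."""
        if T_cam_obj is not None and residual < self.residual_threshold:
            self.T_world_obj = T_world_cam @ T_cam_obj  # accept measurement
        if self.T_world_obj is None:
            return None  # no reliable estimate yet
        # The object is assumed static in the world frame, so during
        # front-end dropouts the VR display stays consistent through SLAM.
        return se3_inverse(T_world_cam) @ self.T_world_obj
```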
With the DLR's SAM platform (Sarkisov et al., 2019), the feasibility and benefits of the proposed idea are examined.
To this end, we first present ablation studies on the designed pipelines with indoor and outdoor datasets from the robot
sensors. Here, the influence of each component is examined with regard to mitigating the identified challenges, and we
show the feasibility of creating a real-time VR that closely matches the real workspace of the robot. Moreover,
the effectiveness of the proposed method is shown through outdoor experiments within the considered industrial scenario.
This scenario, which was designed within the scope of the EU project AEROARMS (Ollero et al., 2018), is relevant to
inspection and maintenance applications in the gas and oil industry. It involves pick-and-place and force-exertion tasks
during the mission, which is to deploy a robotic crawler for automating pipe inspection routines. Moreover, the SAM
platform executing peg-in-hole tasks with a margin of error of less than 2.5 mm is further considered, which is one of the
standard manipulation tasks in industrial settings. With over 70 executions of the aforementioned tasks over
days and nights, from spring to winter, and with different users and locations, the benefits of our VR based telepresence
concept are illustrated for enhancing aerial manipulation capabilities in real world industrial applications.
In summary, the key contributions of this work are:
• We propose an advanced VR based telepresence system for aerial manipulation, which provides 3D visual
feedback and haptic guidance. The system neither requires any external sensors nor pre-generated maps,
has been evaluated outside laboratory settings, and can cope with the challenges of a floating-base system.
Moreover, multiple sensors are fused to exploit their respective strengths for the given perception tasks.
• We devise object pose estimation and active learning pipelines to realize the proposed system in dynamic
and unstructured environments. Challenges to existing methods are reported, and several ablation studies are
provided to validate the proposed approaches. Methodologically, this work suggests the relevance of robotic
introspection in realizing VR based telepresence robots with aerial manipulation capabilities.
• We perform exhaustive flight experiments over extended durations, including 40 task executions in outdoor
environments, 27 task executions within a user validation study, and the operation of the system at night. Thus,
we establish the proposed concept as a viable future option for real world industrial applications.
System | Outside the laboratory settings? | No external sensors or pre-generated map? | Floating-base manipulation system? | Multiple exteroceptive sensors?
AeroVR (Yashin et al., 2019) | ✗ | ✗ | ✓ | ✗
ARMAR-6 (Pohl et al., 2020) | ✗ | ✓ | ✗ | ✗
Model Segmentation (Kohn et al., 2018) | ✗ | ✓ | ✗ | ✗
AvatarDrone (Kim and Oh, 2021) | ✗ | ✗ | ✓ | ✗
PaintCopter (Vempati et al., 2019) | ✗ | ✗ | ✓ | ✗
AR (Liu and Shen, 2020) | ✗ | ✓ | ✗ | ✗
AR (Puljiz et al., 2020) | ✗ | ✓ | ✗ | ✗
GraspLook (Ponomareva et al., 2021) | ✗ | ✓ | ✗ | ✗
The proposed system | ✓ | ✓ | ✓ | ✓
Table 1: Comparisons between the existing VR based robotic systems and the proposed system (✓ = yes, ✗ = no).
The paper starts with a survey of related work (Section 2) and provides a system description of the SAM robot hardware,
human-machine interfaces, sensor choices, and integration (Section 3.1). We formulate the problem of VR creation
and identify challenges in realizing the system (Section 3.2). Then, the pipelines designed to address these challenges
are presented (Section 4). In Section 5.1, we provide ablation studies to validate the designed framework,
while Section 5.2 contains the results of our flight experiments. We report the lessons learned in Section 5.4 and
conclude the work with some future extensions in Section 6.
Relation to Previous Publications
This paper extends the authors' previous publications, namely Lee et al. (2020a) and Lee et al. (2020b). In terms of
methodology, we provide a LiDAR based pose estimation pipeline (Section 4.2). This extension enables the creation of
VR without relying on markers, which is required in industrial scenarios. The devised active learning pipeline for object
detection (Section 4.3) extends the previous theoretical framework (Lee et al., 2020b) and brings it to practical
applications. Furthermore, with respect to experimental contributions, this article provides new ablation studies that are
associated with the new methods. Most importantly, exhaustive outdoor experiments for manipulation tasks are further
performed to examine the benefits of the proposed VR based concept over extended durations and to characterize its
technical readiness for industrial applications.
2 Related Work
The proposed VR based concept advances the area of VR interfaces for robotics. A comparison of this work to
existing works is summarized in Table 1. We then discuss the current literature from the domains of robotic research
that this work draws upon, namely pose estimation (Sections 4.1 and 4.2) and active learning with DNNs (Section 4.3).
Importantly, we stress that the goal of this work is not to advance the state of the art in these two areas. Rather, the aim
is to apply and extend existing methods to realize a working system for the given industrial scenarios. For example, the
provided extension of a marker tracking algorithm with visual-inertial SLAM is not the main contribution of this paper.
Lastly, we further locate our work within the literature on aerial robotic perception in field applications.
Virtual Reality Interfaces
In the past, several VR interfaces have been widely utilized in robotics, including for aerial
systems (Wonsick and Padir, 2020). So far, the presented approaches often create the VR either by using external
sensors such as Vicon or by relying on a-priori generated maps. Notably, Vempati et al. (2019) utilizes a-priori generated
maps for the application of VR in aerial painting. For aerial manipulation, Yashin et al. (2019) uses a Vicon system to
create the VR, while Kim and Oh (2021) renders the environment with a portable sensor kit (Oh et al., 2017). Recently,
VR techniques have gained interest in the robotic manipulation community. Therein, many works (Haidu and Beetz, 2021;
Zhang et al., 2020b) let a human perform demonstrations in VR and transfer the demonstrated manipulation skills to
real robots. These works clearly show the synergy between VR and robotics. As this paper demonstrates the feasibility
of creating VR with on-board sensors only, it can contribute to many of these works by showing how one can
create a VR for robotics.
In a different line of work, many researchers aim to provide VR of the remote scene by applying 3D reconstruction
techniques (Ni et al., 2017; Kohn et al., 2018). For example, Kohn et al. (2018) presents an approach using an RGB-D
camera. As the main challenge of reconstruction based methods is the limited communication bandwidth, Kohn et al.
(2018) proposes an object recognition pipeline that replaces detected objects with sparse virtual meshes and discards the
dense sensor data. Pohl et al. (2020) uses an RGB-D sensor to construct a VR for affordance based manipulation with a
humanoid, while Liu and Shen (2020) and Puljiz et al. (2020) create augmented reality for a drone and a manipulator,
respectively. Pace et al. (2021) conducts a user study and argues that the point clouds of RGB-D sensors are noisy and
inaccurate (with artifacts), which motivates point cloud pre-processing methods for telepresence applications.
In contrast, our approach is based on scene graphs (Section 3.2) with pose estimation, which is an alternative to 3D
reconstruction methods; a minimal illustration of this representation follows below. Finally, as illustrated in Table 1, the
main novelty is the realization of a VR based telepresence system for outdoor environments using multiple sensors
jointly. No external sensors or pre-generated maps are used, while dealing with the specific challenges of a floating-base
manipulation system, i.e., the surface that holds the robotic arm is constantly moving over time, thereby inducing
motions of the attached sensors.
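To make the scene-graph alternative concrete, the sketch below shows the kind of representation we mean: only object identities and 6D poses cross the bandwidth-limited link, and the ground station renders locally stored meshes at those poses. The class and field names are illustrative assumptions, not the paper's data structures.

```python
# Illustrative sketch of a scene graph for telepresence: the robot transmits
# object ids and 6D poses instead of dense point clouds; the corresponding
# meshes are stored at the ground station. Names/fields are assumptions.
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class SceneNode:
    object_id: str                # key into the ground station's mesh library
    T_parent_obj: np.ndarray      # 4x4 pose relative to the parent node
    children: List["SceneNode"] = field(default_factory=list)

def serialize(node: SceneNode):
    """Flatten the graph into (id, pose) pairs; only this crosses the link."""
    out = [(node.object_id, node.T_parent_obj.tolist())]
    for child in node.children:
        out.extend(serialize(child))
    return out

# Example: a pipe in the world frame with an inspection crawler attached.
world = SceneNode("world", np.eye(4),
                  [SceneNode("pipe", np.eye(4),
                             [SceneNode("crawler", np.eye(4))])])
print(serialize(world))
```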
Object Pose Estimation
One of the crucial components in the proposed framework is object pose estimation.
This is because we utilize a scene graph representation, which requires the 6D poses of the objects for creating a 3D
display, as opposed to a 3D reconstruction of the remote site. As the literature is vast, we refer to the survey by He et al.
(2021) for a comprehensive review. In this work, the main novelty is a working solution for the considered application,
tailored towards realizing the proposed VR system. For this, two scenarios are discussed below: visual object pose
estimation for objects of known geometry, and a LiDAR based method for objects of unknown geometry.
If the object is known and accessible a-priori, one of the robust solutions is to use fiducial marker systems. Fiducial
markers, which create artificial features in the scene for pose estimation, are widely used in robotics. Their use
cases include creating ground truth (Wang and Olson, 2016), operating in known environments (Malyuta et al.,
2020), simplifying the problem in lieu of sophisticated perception (Laiacker et al., 2016), and calibration
and mapping (Nissler et al., 2018). However, as the aim here is real-time VR creation, this use case places
stringent requirements on run-time, inherent time delays and robustness, which marker tracking alone does not meet.
Therefore, an extension of ARToolKitPlus (Wagner and Schmalstieg, 2007) with an on-board visual-inertial SLAM
system is provided.
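As an illustration of this kind of marker front-end, the sketch below uses OpenCV's ArUco module (API of OpenCV 4.7 and later) as a stand-in for ARToolKitPlus, which is the library actually used in this work. The marker size, dictionary choice and calibration inputs are assumed placeholders.

```python
# Illustrative sketch of fiducial-marker pose estimation with OpenCV's ArUco
# module, used here as a stand-in for ARToolKitPlus. MARKER_SIZE,
# camera_matrix and dist_coeffs are assumed calibration inputs.
import cv2
import numpy as np

MARKER_SIZE = 0.10  # marker edge length in meters (assumed)
# Marker corners in the marker frame, in ArUco's corner order:
# top-left, top-right, bottom-right, bottom-left.
OBJ_PTS = np.array([[-1,  1, 0], [ 1,  1, 0],
                    [ 1, -1, 0], [-1, -1, 0]], np.float32) * (MARKER_SIZE / 2)

def estimate_marker_pose(gray, camera_matrix, dist_coeffs):
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary,
                                       cv2.aruco.DetectorParameters())
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return None  # introspection hook: no marker, fall back to SLAM pose
    ok, rvec, tvec = cv2.solvePnP(OBJ_PTS, corners[0].reshape(4, 2),
                                  camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None  # object pose in the camera frame
```

The reprojection error of the solvePnP solution can then serve as the residual for an introspection gate like the one sketched in Section 1.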
For LiDAR, point cloud registration is often used for pose estimation. By finding the transformation between the
current scans and a CAD model of an object, we can obtain the 6D pose of the object. Broadly, point cloud registration
algorithms can be classified as local (Park et al., 2017; Rusinkiewicz and Levoy, 2001; Besl and McKay, 1992) or
global (Zhou et al., 2016), and as model based (Pomerleau et al., 2015) or learning based (Wang and Solomon, 2019;
Zhang et al., 2020a). As CAD models of objects are often not available in the given industrial scenario, a DNN based
detector and the idea of LOAM with pose graphs are combined, in order to obtain robust object pose estimates that cope
with occlusions, moving parts and viewpoint variations in the scene.
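As a concrete example of the local registration step, the following sketch runs classic point-to-point ICP (Besl and McKay, 1992) with the Open3D library. The file names, voxel size and distance threshold are illustrative assumptions; in our pipeline the initial guess and the object cloud come from the DNN detection and the LOAM map rather than from files.

```python
# Sketch of point-to-point ICP (Besl and McKay, 1992) with Open3D; file names
# and parameters are illustrative, not the pipeline's actual configuration.
import numpy as np
import open3d as o3d

scan = o3d.io.read_point_cloud("current_scan.pcd")    # accumulated LiDAR scan
model = o3d.io.read_point_cloud("object_model.pcd")   # object point cloud

# ICP is a local method: downsample for speed and seed it with a rough
# initial guess (e.g., from the DNN detection projected into the map).
scan_ds = scan.voxel_down_sample(voxel_size=0.05)
model_ds = model.voxel_down_sample(voxel_size=0.05)
T_init = np.eye(4)

reg = o3d.pipelines.registration
result = reg.registration_icp(
    model_ds, scan_ds, max_correspondence_distance=0.2, init=T_init,
    estimation_method=reg.TransformationEstimationPointToPoint())

# Fitness and inlier RMSE are natural introspection signals: a low fitness
# suggests the resulting 6D pose should not be trusted for the VR display.
print(result.transformation, result.fitness, result.inlier_rmse)
```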
Active Learning for Neural Networks
Our motivation comes from field robotic applications of DNN
based object detectors. Here, the need for labeled data can cause overhead in the development process, especially when
considering a long-term deployment of learning systems in outdoor environments. For example, weather conditions
change with the seasons, and we need to efficiently create labeled data. Active learning provides a principled way
to reduce manual annotation by explicitly picking data that are worth being labeled. One way to autonomously quantify
the 'worth' of an unlabeled sample is to use the uncertainty of DNNs. In the past, for robot perception, we find active
learning frameworks using random forests, Gaussian processes, etc. (Narr et al., 2016; Mund et al., 2015), while for
DNNs, MacKay (1992) pioneered an active learning approach based on Bayesian Neural Networks, i.e., probabilistic
or stochastic DNNs (Gawlikowski et al., 2021), which offer a principled method for uncertainty quantification. Recent
works can also be found on active learning for DNN based object detectors (Choi et al., 2021; Aghdam et al., 2019),
where the focus is on adapting active learning to existing object detection frameworks. These adaptations include new
acquisition functions (or selection criteria) and how uncertainty estimates are generated.
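As a simple, self-contained illustration of an uncertainty based acquisition function (not the specific criterion developed in Section 4.3), the sketch below scores each pool sample by the predictive entropy of its class probabilities, e.g. averaged over several stochastic forward passes, and queries the most uncertain samples for labeling.

```python
# Illustrative pool-based acquisition by predictive entropy; a generic sketch,
# not the acquisition function proposed in this work (Section 4.3).
import numpy as np

def predictive_entropy(probs: np.ndarray) -> float:
    """Entropy of a class-probability vector; high entropy = high uncertainty."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def select_queries(pool_probs, budget: int):
    """pool_probs: one class-probability vector per unlabeled sample, e.g.
    averaged over MC-dropout forward passes. Returns indices of the `budget`
    most uncertain samples to send to the human annotator."""
    scores = np.array([predictive_entropy(p) for p in pool_probs])
    return list(np.argsort(-scores)[:budget])

# Example: three pool samples; the second is most uncertain and is queried.
pool = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.8, 0.2])]
print(select_queries(pool, budget=1))  # -> [1]
```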
For uncertainty quantification in DNNs, the so-called Monte-Carlo dropout (MC-dropout; Gal and Ghahramani, 2016) has
recently gained popularity. The main advantage of MC-dropout is that it is relatively easy to use and scales to large
datasets. However, MC-dropout requires a specific stochastic regularization called dropout (Srivastava et al., 2014).
This limits its use on already well trained architectures, because the current DNN based object detectors are often