

Robot to Human Object Handover using Vision and Joint Torque Sensor Modalities

Mohammadhadi Mohandes¹, Behnam Moradi², Kamal Gupta¹, and Mehran Mehrandezh²

¹ School of Engineering Science, Simon Fraser University, Canada, mmohande,Kamal@sfu.edu

² Faculty of Engineering and Applied Science, University of Regina, Canada, bmn891,mehran.mehrandezh@uregina.ca

Abstract. We present a robot-to-human object handover algorithm and implement it on a 7-DoF arm equipped with a 3-finger mechanical hand. The system performs a fully autonomous and robust object handover to a human receiver in real time. Our algorithm relies on two complementary sensor modalities for feedback: joint torque sensors on the arm and an eye-in-hand RGB-D camera. Our approach is entirely implicit, i.e., there is no explicit communication between the robot and the human receiver. The information obtained from each sensor modality is used as input to a corresponding deep neural network. The torque sensor network detects the human receiver’s “intention”, such as pull, hold, or bump, while the vision sensor network detects whether the receiver’s fingers have wrapped around the object. The networks’ outputs are then fused, and a decision is made on whether or not to release the object. Despite substantial challenges in sensor feedback synchronization and in object and human hand detection, our system achieves robust robot-to-human handover with 98% accuracy in our preliminary real-world experiments with human receivers.

Keywords: Robot-to-human object handover, object detection, Human-Robot Interaction

arXiv:2210.15085v1 [cs.RO] 27 Oct 2022

1 Introduction

Human-Robot Interaction (HRI) is a wide and diverse area of research. A robot-to-human (R2H) handover task, a sub-topic of HRI, is defined as the transfer of an object from a giver (an autonomous robotic system) to a receiver (a human operator). Our focus is on direct R2H tasks, where the robot delivers the object directly into the human receiver’s hand. A successful object handover happens when the giver makes sure that the receiver has fully taken possession of the object and therefore considers it safe to let go. Failure in R2H object handover often occurs when the giver misinterprets the actions applied to the object by the receiver. Hence, failure detection plays an important role in a successful handover and requires detecting human “intention” accurately [1]. The safety, reliability, and robustness of the object handover therefore depend directly on the sensor modalities, such as vision, force/torque, and tactile sensing, and on their respective interpretation. In particular, the physical contact phase between the receiver and the object poses a serious challenge: releasing too early is a safety concern, while releasing too late can cause higher interaction forces [2]. In this paper, we use two key sensor modalities, the joint torque sensors of the arm and an eye-in-hand RGB-D camera, to accurately determine when the robotic hand should release the object, resulting in a robust R2H handover. More specifically, the joint torque time series data is used to train a CNN that predicts the receiver’s action/intention (pull, pull-up, push, bump, hold, or no action) during the contact phase, and a second network, a Single Shot MultiBox Detector (SSD) [3], detects fingertips and objects in real time in order to robustly determine physical contact between the human receiver’s hand and the object. The outputs of the two CNNs are then fed to a finite state machine that, in essence, issues a release command only if the vision pipeline detects contact between the receiver’s fingers and the object and the torque pipeline detects a pull, pull-up, or hold. Our initial experimental results with human receivers are very positive, showing a 98% success rate in R2H tasks.
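The release rule stated above can be illustrated with a minimal sketch. It assumes a two-state machine and per-update inputs from the two pipelines; the names `Intent`, `State`, `RELEASE_INTENTS`, and `step` are hypothetical and are not taken from the paper, whose actual state machine may contain more states.

```python
from enum import Enum


class Intent(Enum):
    """Receiver actions named in the text for the torque classifier."""
    PULL = "pull"
    PULL_UP = "pull-up"
    PUSH = "push"
    BUMP = "bump"
    HOLD = "hold"
    NO_ACTION = "no action"


class State(Enum):
    HOLDING = "holding"    # robot hand still grasps the object
    RELEASED = "released"  # release command has been issued


# Intents that permit a release according to the rule in the text.
RELEASE_INTENTS = {Intent.PULL, Intent.PULL_UP, Intent.HOLD}


def step(state, fingers_on_object, intent):
    """Advance the (hypothetical) two-state machine by one sensor update."""
    if state is State.HOLDING and fingers_on_object and intent in RELEASE_INTENTS:
        return State.RELEASED
    return state  # otherwise keep the current state


if __name__ == "__main__":
    s = State.HOLDING
    # Illustrative sequence of (vision contact flag, torque intent) pairs.
    updates = [(False, Intent.PULL), (True, Intent.BUMP), (True, Intent.PULL)]
    for contact, intent in updates:
        s = step(s, contact, intent)
        print(contact, intent.value, "->", s.value)
```

In this sketch a pull without visual contact, or visual contact accompanied by a bump, leaves the hand closed; only the conjunction of both conditions triggers the release.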

While joint force/torque sensors have been used in previous works on R2H handover tasks [4, 5], and have been combined with a specialized simple optical sensor designed specifically to detect object motion [6, 7]³, the key contributions of our work are: i) we use the joint torque sensors’ data in a novel way, i.e., we use a time series of joint torques to detect the human receiver’s action/intention in R2H handover tasks; ii) we use an eye-in-hand RGB-D camera to detect finger contacts with the object in real time (30 fps); and iii) we combine i) and ii) in an algorithmic fusion approach to make a robust RELEASE decision. Our preliminary real-world experiments with human receivers show a 98% success rate. We also compare our method’s success rate with some existing R2H systems [8–12] that have used success rate as an evaluation metric. Note that some other works, e.g., [7, 13], report human satisfaction surveys to evaluate R2H systems, which is different from the success rate metric that we report.
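To make contribution i) more concrete, the sketch below shows one way a stream of joint torque readings could be buffered into fixed-length windows before being passed to a classifier. It is only an illustration: the window length of 50 samples and the names `TorqueWindow`, `push`, `ready`, and `as_matrix` are assumptions, and the paper's actual preprocessing is not described in this section.

```python
from collections import deque

NUM_JOINTS = 7     # the arm described in the text has 7 degrees of freedom
WINDOW_LEN = 50    # assumed number of samples per window (not from the paper)


class TorqueWindow:
    """Keeps the most recent window_len joint-torque samples."""

    def __init__(self, window_len=WINDOW_LEN, num_joints=NUM_JOINTS):
        self.num_joints = num_joints
        # A bounded deque silently drops the oldest sample when full.
        self.samples = deque(maxlen=window_len)

    def push(self, torques):
        """Append one reading containing a torque value for every joint."""
        if len(torques) != self.num_joints:
            raise ValueError("expected one torque value per joint")
        self.samples.append(tuple(torques))

    def ready(self):
        """True once enough samples have arrived to fill the window."""
        return len(self.samples) == self.samples.maxlen

    def as_matrix(self):
        """Return the window as a list of rows, oldest sample first."""
        return [list(row) for row in self.samples]
```

Each call to `as_matrix()` then yields a window of shape (window length × number of joints) that a time-series classifier could consume once `ready()` returns true.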

The rest of this paper is organized as follows: Section 2 presents a comprehensive literature review on vision-based and force/torque-based object handover. Section 3 presents the methodology and the algorithmic foundations of our work. Section 4 shows the experimental results. Finally, Section 5 presents conclusions and future work.

2 Related Work

A key challenge in robot-to-human (R2H) object handover, unlike robot-to-robot (R2R) handover, is that, as mentioned in the introduction, the robot has no real-time sensor data exchange with the human receiver and must rely solely on its onboard sensors. Our robot, a 7-DoF Kinova Gen3 arm with a 3-fingered mechanical hand (Schunk SDH), has two sensor modalities: i) joint torque sensors and ii) an eye-in-hand RGB-D camera. We therefore focus on these two modalities in our current work, and the literature review below covers R2H handover works that use one or both of them to understand the intention of the human receiver. The research community has attacked this challenge using two main approaches: vision-based and force/torque-based. We first outline some general vision-based approaches from the machine vision community and then present the R2H literature.

2.1 Vision-based computations in general

In the machine vision community, vision data has been used in a variety of ways: detecting human gaze, human body configuration, human hands, and objects. A key requirement in R2H tasks is to accomplish this in real time. Real-time detection of the human’s hand and the object is investigated in [14], [15], and [16]. Human body tracking and the body’s pose with respect to the object are investigated in [17] and [18]. From the perception perspective, the Single Shot MultiBox Detector (SSD), a CNN-based network architecture introduced by Liu et al. [3], is particularly appealing for object detection in R2H tasks, and we adapt it for our application along with a bounding box regression algorithm taken from Google’s Inception network. The SSD network combined with bounding box regression is able to outperform Faster R-CNN (a competing neural network-based architecture for object detection) in both accuracy and speed, reaching 59+ fps. SSD is capable of detecting multiple objects. Unlike R-CNN methods, it propagates the feature map through the network in a single forward pass. This is the main reason that SSD is able to operate in real time and handle object overlap in the data points. SSD uses a base network that is pre-trained on the ImageNet dataset. There are multiple convolutional layers, and each one of them is individually and directly connected to the fully connected layer. The combination of SSD and bounding box regression allows our network to detect multiple fingertips at different scales [3].

³ In fact, that work used a specialized simple optical sensor precisely because, as its authors mention, processing RGB images would take an unacceptable amount of computation time, a problem that we solve by using the SSD network.

2.2 Vision-Based Object Handover

Real-time object detection and pose estimation is a challenging problem in vision-based object handover [19]. To address this problem, a minimum-jerk trajectory algorithm has been used to predict the receiver’s intention [20]. Gaussian processes have also been used to estimate human motion in object handover scenarios. A Gaussian mixture regressor was proposed by Lue et al. to predict human body motion while the human is trying to receive the object from the robot [21]. The receiver’s hand pose estimate can also be used to estimate the approaching speed of the hand [22].
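For readers unfamiliar with the minimum-jerk model mentioned above, the sketch below evaluates the standard closed-form minimum-jerk profile for a one-dimensional point-to-point motion that starts and ends at rest. This is the textbook formulation and not necessarily the exact algorithm of [20]; the function name `min_jerk` and its parameters are illustrative, and `duration` is assumed to be positive.

```python
def min_jerk(x0, xf, duration, t):
    """Position at time t on a minimum-jerk move from x0 to xf.

    Assumes zero velocity and acceleration at both ends and duration > 0.
    """
    s = min(max(t / duration, 0.0), 1.0)       # normalized time in [0, 1]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5   # smooth profile rising from 0 to 1
    return x0 + (xf - x0) * blend
```

Fitting such a profile to the first part of an observed hand motion is one way a predictor could extrapolate where and when the hand will arrive.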

Vision-based approaches fundamentally rely on classical computer vision and, more recently, deep learning techniques to detect the object using an RGB-D camera, as mentioned in Section 2.1. The output of vision-based techniques mainly consists of detected bounding boxes around the object in the RGB image and the estimated pose of the object in the point cloud.
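As an illustration of how detected bounding boxes might be turned into a contact cue, the sketch below computes the intersection over union (IoU) of two axis-aligned boxes and counts how many fingertip boxes overlap the object box. The overlap rule, the `min_fingers` threshold, and the function names are assumptions made for this example; this section does not state how contact is inferred from the detections.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fingers_touching(object_box, fingertip_boxes, min_fingers=2):
    """Hypothetical contact test: enough fingertip boxes overlap the object box."""
    overlapping = sum(1 for f in fingertip_boxes if iou(object_box, f) > 0.0)
    return overlapping >= min_fingers
```

A purely image-space overlap test like this cannot distinguish a finger in front of the object from one touching it, which is one reason depth data from the RGB-D camera and the torque modality are useful complements.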

Strabala et al. [23] reported a comprehensive investigation of robot-to-robot and robot-to-human object handover scenarios. The first scenario was to understand the way humans exchange physical objects, followed by recording the physical behavior of both the giver and the receiver. The next scenario was to codify human-to-human object handover in order to implement it as a robot-to-human object handover algorithm. In this scenario, where the robot is the giver and the human operator is the receiver, the robot should go through three crucial steps, detecting the human’s body, eye gaze, and hands, to confirm that the human operator is ready to receive the object. In our case, since the camera is wrist-mounted (eye-in-hand), the fingertips and the object are the natural choices for detection.

Grigore et al. [10] focused on adding eye gaze detection and head orientation as a user intention model to an HMM-based R2H object handover. Their results demonstrate significant improvements in the object handover success rate from integrating this vision-sensor-based feedback into the robot’s control system. The use of eye gaze has also been promoted in [8], where R2H handover was tested on robots distributing flyers to uncooperative passing pedestrians in order to improve the success ratio. Koene et al. [11] and Prada et al. [12] developed a color segmentation method with a Kinect sensor to estimate the human’s hand location during the handover process. The work in [9] compared the success rate of direct and indirect delivery, implemented on an EL-E robot equipped with a force/torque sensor and a laser-pointer interface with a camera detecting the 3D location of the handover. Sileo et al. [4] developed a vision-based robot-to-robot object handover algorithm that goes through predetermined steps to deliver a known object from one robot to another without explicit communication between the two agents. In addition to visual data, they incorporated force/torque data from both end effectors to increase the robustness of the handover process. For object detection, Faster R-CNN and YOLO deep learning networks were utilized to detect objects in real time. A key distinction of our work is that we use data from all joint-level torque sensors to classify the receiver’s intention while performing the handover. In addition, it is conjectured that better results can be obtained by fusing vision and torque data in order to perform a robust and safe object handover. However, when it comes to object detection, sensor data synchronization poses a serious challenge because image processing is normally slow. Our SSD fingertip detection runs at 59+ fps and is more accurate than YOLO and Faster R-CNN, thereby meeting the sensor synchronization challenge.

2.3 Force/torque sensor based Object Handover

In [6], a handover controller is proposed and implemented on a 2-finger gripper installed on a Baxter robot. A key novelty of the system is that it re-grasps a fast slipping object via
