Robot to Human Object Handover using Vision and
Joint Torque Sensor Modalities
Mohammadhadi Mohandes1, Behnam Moradi2, Kamal Gupta1, and Mehran
Mehrandezh2
1School of Engineering Science, Simon Fraser University, Canada,
mmohande,Kamal@sfu.edu
2Faculty of Engineering and Applied Science, University of Regina, Canada,
bmn891,mehran.mehrandezh@uregina.ca
Abstract. We present a robot-to-human object handover algorithm and imple-
ment it on a 7-DOF arm equipped with a 3-finger mechanical hand. The system
performs a fully autonomous and robust object handover to a human receiver
in real-time. Our algorithm relies on two complementary sensor modalities: joint
torque sensors on the arm and an eye-in-hand RGB-D camera for sensor feedback.
Our approach is entirely implicit, i.e., there is no explicit communication between
the robot and the human receiver. Information obtained via each sensor modality is used
as input to its respective deep neural network. While
the torque sensor network detects the human receiver’s “intention”, such as pull,
hold, or bump, the vision sensor network detects if the receiver’s fingers have
wrapped around the object. The networks’ outputs are then fused, and based on the fused
output a decision is made to either release or keep holding the object. Despite substantive
challenges in sensor feedback synchronization and in object and human hand detection, our
system achieves robust robot-to-human handover with 98% accuracy in our preliminary
real-world experiments with human receivers.
Keywords: Robot-to-human object handover, object detection, Human-Robot
Interaction
1 Introduction
Human-Robot Interaction (HRI) is a wide and diverse area of research. A robot-to-
human (R2H) handover task, as a sub-topic of HRI, is defined as a mission of trans-
ferring an object from a giver (an autonomous robotic system) to a receiver (a human
operator). Our focus is on direct R2H tasks, where the robot delivers the object directly
into the human receiver’s hand. A successful object handover mission happens when the giver
makes sure that the receiver has fully taken possession of the object and feels safe to
let go of the object. Failure in R2H object handover often occurs when there is a wrong
interpretation (by the giver) of the actions applied to the object by the receiver. Hence,
the problem of failure detection plays an important role in a successful handover and
requires detecting human “intention” accurately [1]. Safety, reliability, and robustness
of the object handover therefore depend directly on the sensor modalities such as vision,
force/torque, tactile, and their respective interpretation. In particular, the physical con-
tact phase between the receiver and the object poses a serious challenge. Early releasing
is considered to be a safety challenge while late releasing can cause higher interaction
forces [2]. In this paper, we use two key sensor modalities, joint torque sensors of the
arm and an eye-in-hand RGB-D camera for accurate determination of when the robotic
hand should release the object, hence resulting in a robust R2H handover. More specifically,
the joint torque time-series data is used to train a CNN that predicts the receiver’s
action/intention (pull, pull-up, push, bump, hold, and no action) during the contact phase,
and a second network, a Single Shot MultiBox Detector (SSD) [3], detects fingertips and
objects in real-time in order to robustly determine the physical contact between the human
receiver’s hand and the object. The outputs of the two CNNs are then fed to a finite state
machine that, in essence, issues a release command only
if the vision pipeline detects contact between the receiver’s fingers and the object, and
the torque pipeline detects a pull, pull-up, or hold. Our initial experimental results
with human receivers are extremely positive, showing a 98% success rate in R2H tasks.
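For illustration, the following is a minimal Python sketch of the fusion logic described above; the class and function names are illustrative placeholders rather than our actual implementation, and details such as temporal filtering of the network outputs are omitted.

```python
from enum import Enum

class Intention(Enum):
    """Receiver intentions predicted by the torque CNN (see list above)."""
    PULL = 0
    PULL_UP = 1
    PUSH = 2
    BUMP = 3
    HOLD = 4
    NO_ACTION = 5

# Intentions for which a release is permitted.
RELEASE_INTENTIONS = {Intention.PULL, Intention.PULL_UP, Intention.HOLD}

def release_decision(intention: Intention, fingers_on_object: bool) -> bool:
    """Fuse the two pipelines: release only if the vision network confirms
    finger contact with the object AND the torque network predicts pull,
    pull-up, or hold."""
    return fingers_on_object and intention in RELEASE_INTENTIONS

# Example: vision confirms contact and the torque CNN predicts PULL.
if release_decision(Intention.PULL, fingers_on_object=True):
    print("RELEASE")       # command the hand to open
else:
    print("KEEP HOLDING")  # maintain the grasp
```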
While joint force/torque sensors have been used in previous works on R2H handover
tasks [4, 5], and have been combined with a specialized simple optical sensor designed
specifically to detect object motion [6, 7]3, the key contributions of our work are:
(i) we use joint torque sensors’ data in a novel way, i.e., we use a time series of joint
torques to detect the human receiver’s action/intention for R2H handover tasks; (ii) we
use an eye-in-hand RGB-D camera to detect finger contacts with the object in real-time
(30 fps); and (iii) we combine (i) and (ii) in an algorithmic fusion approach to
make a robust RELEASE decision. Our preliminary real experiments with human re-
ceivers show a 98% success rate. We also compare our method’s success rate with some
existing R2H systems [8–12] that have used success rate as an evaluation metric. Please
note that some other works report human satisfaction surveys to evaluate R2H systems,
e.g., [7, 13], which is different from the success rate metric that we report.
The rest of this paper is organized as follows: Section 2 presents a comprehensive
literature review on vision-based and force/torque-based object handover. Section 3
presents the methodology and the algorithmic foundations of our work. Section 4 presents
the experimental results. Finally, Section 5 presents conclusions and future work.
2 Related Work
As we mentioned in the introduction, a key challenge in robot-to-human (R2H) object
handover, unlike robot-to-robot (R2R) handover, is that there is no real-time sensor data
exchange between the human receiver and the robot other than through the robot’s onboard
sensors. Our robot, a 7-DoF Kinova Gen 3 arm with a 3-fingered mechanical hand
(Schunk SDH), has two sensor modalities: (i) joint torque sensors and
(ii) an eye-in-hand RGB-D camera. Therefore, we have focused on these two modalities
in our current work and our literature review below focuses on R2H handover works
that use one or both of these two modalities to understand the intention of the human
receiver in R2H handover tasks. The research community has attacked this challenge
using two main approaches: vision-based and force/torque-based. We first outline some
general vision-based approaches from the machine vision community and then present
the R2H literature.
2.1 Vision-based computations in general
In the machine vision community, vision data has been used in a variety of ways:
detecting the human’s gaze, body configuration, and hand, as well as detecting the object.
A key requirement in R2H tasks is to accomplish this in real-time. Detecting the human’s
hand and the object in real-time is investigated in [14], [15], and [16]. Human body
tracking and the estimation of its pose with respect to the object are investigated in [17] and [18].
From the perception perspective, the Single Shot MultiBox Detector (SSD), a CNN-based
network architecture introduced by Liu et al. [3], is particularly appealing for
object detection in R2H tasks, and we adapt it for our application along with a bounding-box
regression algorithm taken from Google’s Inception network. The SSD
network combined with bounding box regression is able to outperform Faster R-CNN
(another competing neural network-based architecture for object detection) in accuracy
and in speed, achieving 59+ fps. SSD is capable of detecting multiple objects. Unlike
R-CNN methods, it propagates the feature map in one forward pass throughout the
network. This is the main reason that SSD is able to operate in real-time and handle
overlapping objects. SSD uses a pre-trained network as a base net, which is trained on
the ImageNet dataset. There are multiple convolutional layers, and each one of them is
individually and directly connected to the fully connected layer. A combination of SSD
and bounding-box regression allows our network to detect multiple fingertips at different
scales [3].

3 In fact, that work used a specialized simple optical sensor precisely because of the
unacceptable amount of computation time that would be needed for processing RGB images,
a problem that we solve via the use of the SSD network.
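As a concrete, hypothetical illustration of how such a single-stage detector can be run on RGB frames, the sketch below uses the stock SSD300-VGG16 model from torchvision, pre-trained on COCO, as a stand-in; our actual network is fine-tuned on fingertip and object data and is not shown here.

```python
# Minimal sketch: running a pre-trained SSD detector on an RGB frame.
# torchvision's SSD300-VGG16 (COCO weights) stands in for our fine-tuned model.
import torch
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights

weights = SSD300_VGG16_Weights.DEFAULT
model = ssd300_vgg16(weights=weights).eval()

@torch.no_grad()
def detect(frame_rgb: torch.Tensor, score_thresh: float = 0.5):
    """frame_rgb: float tensor of shape (3, H, W), values in [0, 1].
    Returns bounding boxes, class labels, and confidence scores."""
    out = model([frame_rgb])[0]          # single forward pass (one-stage detector)
    keep = out["scores"] > score_thresh  # discard low-confidence detections
    return out["boxes"][keep], out["labels"][keep], out["scores"][keep]

# Example with a random 300x300 frame; a real system feeds camera frames.
boxes, labels, scores = detect(torch.rand(3, 300, 300))
print(boxes.shape, labels, scores)
```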
2.2 Vision-Based Object Handover
Real-time object detection and pose estimation are challenging problems associated with
vision-based object handover [19]. To address this problem, a minimum-jerk trajectory
model has been used to predict the receiver’s intention [20]. Gaussian processes have
also been used to estimate human motion in object handover scenarios. A Gaussian mixture
regressor is proposed by Lue et al. to predict the human body’s motion while trying to
receive the object from the robot [21]. The receiver’s hand pose estimate can also be
used to estimate the approaching speed of the hand [22].
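For reference, the standard minimum-jerk position profile that underlies such reaching-motion models is sketched below; the start/goal positions and duration are illustrative placeholders, and the specific formulation in [20] may differ.

```python
# Sketch of the standard 1-D minimum-jerk position profile,
# x(t) = x0 + (xf - x0) * (10*tau^3 - 15*tau^4 + 6*tau^5), tau = t / T,
# commonly used to model human reaching motion.
import numpy as np

def minimum_jerk(x0: float, xf: float, T: float, t: np.ndarray) -> np.ndarray:
    """Position along a minimum-jerk trajectory from x0 to xf over duration T."""
    tau = np.clip(t / T, 0.0, 1.0)
    return x0 + (xf - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

# Example: a 0.3 m reach over 1 s, sampled at five instants.
t = np.linspace(0.0, 1.0, 5)
print(minimum_jerk(0.0, 0.3, 1.0, t))
```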
Vision-based approaches fundamentally utilize classical computer vision and, more
recently, deep-learning techniques to detect the object using an RGB-D camera, as
mentioned above in Section 2.1. The outputs of vision-based techniques are mainly bounding
boxes detected around the object in the RGB image and the estimated pose of the object in
the point cloud.
Strabala et al. [23] reported a comprehensive investigation of human-to-human and
robot-to-human object handover scenarios. The first scenario was to understand the way
humans exchange physical objects by recording the physical behavior of both the giver
and the receiver. The next scenario was to codify the human-to-human object handover in
order to implement it as a robot-to-human object handover algorithm. In this scenario,
with the robot as the giver and the human operator as the receiver, the robot should go
through three crucial steps of detecting the human’s body, eye gaze, and hands to confirm
that the human operator is ready to receive the object. In our case, since the camera is
wrist-mounted (eye-in-hand), the fingertips and object are the
natural choices to be detected.
Grigore et al. [10] focused on adding eye gaze detection and head orientation as
a user intention model to an HMM-based R2H object handover. Their results demon-
strate significant improvements in object handover success rate by integrating this vision
sensor-based feedback into the robot’s control system. The use of eye gaze has also been
promoted in [8], where R2H handover was tested on robots distributing flyers to
uncooperative passing pedestrians to improve the success rate. Koene et al. [11]
and Prada et al. [12] developed a color segmentation method with a Kinect sensor to
estimate the human’s hand location during the handover process. [9] compared the suc-
cess rate between direct delivery and indirect delivery, implemented on an EL-E robot
equipped with a force/torque sensor and a laser-pointer interface with a camera detecting
the 3D location of the handover. Sileo et al. [4] developed a vision-based robot-to-robot
object handover algorithm that was able to go through predetermined steps to deliver a
known object from one robot to another without using explicit communication between
the two agents. In addition to visual data, they also incorporated force/torque data from both
end effectors to increase the robustness of the handover process. For object detection,
Faster-RCNN and YOLO deep learning networks were utilized to detect objects in real-
time. A key distinction in our work is that we utilized data from all joint-level torque
sensors to classify the receiver’s intention while performing the handover mission. In ad-
dition, it is conjectured that better results can be obtained by fusing vision and torque
data in order to perform a robust and safe object handover mission. However, when it
comes to object detection, sensor data synchronization poses a serious challenge, be-
cause image processing is normally slow. Our SSD-based fingertip detection runs at 59+
fps and is more accurate than YOLO and Faster-RCNN, thereby meeting the sensor
synchronization challenge.
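To make the torque pipeline concrete, a minimal PyTorch sketch of a 1-D CNN over a window of joint torque readings is given below; the window length, layer sizes, and other hyperparameters are assumptions for illustration, not our exact architecture.

```python
# Minimal PyTorch sketch of a 1-D CNN that classifies a window of joint torque
# readings into the six receiver intentions. The window length and layer sizes
# are illustrative assumptions, not the paper's exact model.
import torch
import torch.nn as nn

NUM_JOINTS = 7      # Kinova Gen 3 joint torque channels
WINDOW = 64         # assumed number of time steps per sample
NUM_CLASSES = 6     # pull, pull-up, push, bump, hold, no action

class TorqueIntentionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(NUM_JOINTS, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, NUM_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, NUM_JOINTS, WINDOW) torque time series
        return self.classifier(self.features(x).squeeze(-1))

model = TorqueIntentionCNN()
logits = model(torch.randn(1, NUM_JOINTS, WINDOW))
print(logits.argmax(dim=1))  # predicted intention index
```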
2.3 Force/Torque Sensor-Based Object Handover
In [6], a handover controller is proposed and implemented on a 2-finger gripper installed
on a Baxter robot. A key novelty of the system is that it re-grasps a fast-slipping object via