Trixi the Librarian
Fabian Wieczorek
dept. informatik, TAMS
University of Hamburg
Hamburg, Germany
fabian.wieczorek@uni-hamburg.de
Björn Sygo
dept. informatik, TAMS
University of Hamburg
Hamburg, Germany
bjoern.sygo@uni-hamburg.de
Shang-Ching Liu
dept. informatik, TAMS
University of Hamburg
Hamburg, Germany
shang-ching.liu@studium.uni-hamburg.de
Mykhailo Koshil
dept. informatik, TAMS
University of Hamburg
Hamburg, Germany
mykhailo.koshil@studium.uni-hamburg.de
Abstract—(Fabian) In this work, we present a three-part
system that automatically sorts books on a shelf using the PR-2
platform. The paper describes a methodology to detect and
recognize books sufficiently well using a multistep vision
pipeline based on deep learning models as well as conventional
computer vision. Furthermore, the difficulties of relocating
books with a bi-manual robot are addressed, along with solutions
based on MoveIt and BioIK. Experiments show that the performance
is good enough overall to repeatedly sort three books on a shelf.
Further improvements are discussed that could lead to more robust
book recognition and more versatile manipulation techniques.
Index Terms—Librarian, Robot, Book grasping, Object detection,
Implementation
I. INTRODUCTION (MYKHAILO)
While the use of industrial robots is already widespread, the
use of service robots remains limited [32]. There are a few
reasons for this. First, deploying a service robot in a business
requires not only acquiring the robot itself but also changes to
the organization, such as training personnel and adapting the
environment to the robot. Second, while an industrial robot
operates in a highly structured environment, a service robot
works alongside humans and therefore faces less structure and
more unseen situations. This makes the design highly complex,
and as a result service robots often fail and require human
intervention, which calls their commercial use into question.
Therefore, the main purpose of this work is to showcase and test
the feasibility of automating a librarian's task of manipulating
books on a shelf, rather than to create a commercially viable
product. As the original plan included interaction with visitors,
our system falls into the category of ‘Professional Social Service
Robots’ according to [16]. We aim to create a service robot that
works in the library and can be deployed on site without major
changes to accommodate the robot.
The project is based on the PR-2 platform available in our
department (Fig. 2), which is suitable for bi-manual manipulation.
The task of book manipulation was divided into two sub-tasks:
manipulation and perception. While both sub-tasks were solved to
some extent, the main contribution of this work is combining them
into a librarian robot that can operate in the library
environment with as few modifications as possible.
II. RELATED WORKS
A. Book manipulation (Björn / Mykhailo)
Several concepts for librarian robots already exist. One example
is the UJI librarian robot [26], [25]. It uses a single Mitsubishi
PA-10 arm mounted on a mobile base. It can move around the
library, locate the desired shelf and book, and retrieve the book
using a specially customized two-finger gripper. Experiments have
also been conducted with the UJI robot equipped with a
three-finger gripper and tactile sensors to grasp books [18]. The
motion planning for book tilting is investigated further in [24].
There, the authors develop a probabilistic motion planning
algorithm that allows planning in low-dimensional sub-manifolds
created by the constraints in the planning space. The planner was
then tested in a scenario similar to ours using the MoveIt [5]
framework.
Many approaches rely on modifying the environment to facilitate
the robot's operation, mostly to solve navigation and object
detection. For example, the robot in [15] was designed to pick
and arrange books in a highly structured environment. It uses a
two-finger gripper to pick the books and relies heavily on
landmarks in the environment: a floor with radio-frequency
identification (RFID) tags for navigation and an intelligent
bookshelf for locating the books. Recent work [37] explores the
librarian scenario using a robot similar to [25], but focuses on
navigation and positioning using QR codes and binocular vision
rather than on manipulation. Book manipulation is therefore done
with a special parallel gripper and a predefined position for
placing.
arXiv:2210.10110v2 [cs.RO] 20 Oct 2022
Some works focus primarily on manipulation. In [19], the authors
investigate a bi-manual setup with a suction gripper and train a
fully connected network to predict which object to support with
the non-suction gripper so that the selected object can be
extracted safely. The network is trained for an environment
similar to ours, i.e. a bookshelf. Other works address book
grasping outside of the library environment. One example is [20],
which uses a combination of suction and a two-finger gripper to
grasp books in different configurations. To our knowledge, there
is no readily available solution that automates book manipulation
in a library environment similar to ours.
B. Perception (Shang-Ching Liu / Fabian)
In the perception-related tasks, we model the books currently
present in the scene and match each individual book to a database
of known books. We therefore review previous work for each of
these parts.
For detection, YOLO [27] and Fast R-CNN [9] represent the two
main directions in the state of the art: YOLO is the more
efficient of the two and outputs bounding boxes, while Fast
R-CNN, a region-based detector, aims at more precise results.
YOLO-v5 [14], a later version of YOLO, is well documented and
comes with a robust pipeline and utilities such as Roboflow [28]
for fine-tuning the model to extract book spines. We therefore
chose it as the approach for book spine detection.
For book matching, several techniques are available: SIFT [23]
finds keypoints in an image, an HSV (hue, saturation, value) [36]
histogram captures the color distribution, and the fuzzywuzzy
library measures the text similarity between the detected text
and the book titles in the database.
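The text-similarity step can be illustrated with a minimal sketch. It uses Python's standard-library difflib as a stand-in for a dedicated fuzzy string matching library; the title list, the function names, and the 0.6 threshold are assumptions chosen for illustration and are not taken from the system described here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1], ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(detected_text, titles, threshold=0.6):
    """Return (title, score) for the most similar title.

    If the best score is below the (assumed) threshold, the title is None.
    """
    score, title = max((similarity(detected_text, t), t) for t in titles)
    if score < threshold:
        return None, score
    return title, score


# Hypothetical book database and a noisy text reading from a spine.
TITLES = ["Probabilistic Robotics", "Computer Vision", "Deep Learning"]
match, score = best_match("Probabilstic Robotcs", TITLES)
print(match, round(score, 2))
```

A threshold of this kind lets the matcher reject spines whose text reading is too noisy, so that another cue such as color or keypoints can decide instead.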
Inventory management in a library is a tedious task, and there
have been attempts to automate it over the past decade. Different
computer vision methods have been used to detect and recognize
book spines standing on a shelf without the aid of special
markers. A frequently seen approach to detecting book spines is
edge detection followed by line segment processing [3], [6],
[22], [34]. Often, an orthographic representation of the book
spines is required. To detect the spines independently of the
viewpoint, Talker et al. used a constrained active contour model
that allows the spines to be non-parallel to the image axis [35].
Detection is crucial for finding book spine candidates, but
recognizing them correctly is equally important in inventory
management. While many approaches focus on text recognition to
identify the book spines [3], [6], [22], [34], Fowers et al. used
a difference of Gaussians (DoG) over the YCbCr color space to
extract features [7]. In combination with SIFT, this approach
does not depend on an OCR engine (e.g. Tesseract [31]) while
yielding robust performance.
Comparing the results of the mentioned works meaningfully is
hard, as no standardized evaluation procedure exists in the field
of book spine recognition. However, in the domain of scene text
recognition (STR), which can be used for text-based book spine
recognition, Baek et al. developed a framework that allows
different model architectures to be compared [1]. Since deep
learning methods have not been widely applied in book spine
recognition, this work incorporates such an STR model to perform
text matching.
III. SYSTEM OVERVIEW (SHANG-CHING LIU)
The system can be separated into three parts: manipulation,
vision pipeline, and task planning, as shown in Fig. 1. The task
planning module controls the vision pipeline module and the
manipulation module. The vision pipeline takes the scene from the
RGB-D camera (Azure Kinect) as input, matches the books in the
scene to the book database, and finally creates a MoveIt Planning
Scene in the visualization. The manipulation module has a
controller for the two hands of the robot (PR-2). One is a Shadow
hand, and the other is a two-finger gripper, as shown in figure
2j.
Fig. 1: System Overview
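The division of responsibilities between the three modules can be sketched as follows. This is a purely illustrative outline, not the implementation used in this work: the names plan_moves, run_task, and FakeShelf, the alphabetical target order, and the move representation are all assumptions. The perception and manipulation modules are reduced to two callables so that the task planning logic can run on its own.

```python
from typing import Callable, List, Tuple

Move = Tuple[str, int, int]  # (book, from_index, to_index), hypothetical format


def plan_moves(current: List[str], target: List[str]) -> List[Move]:
    """Return moves that turn the order `current` into the order `target`."""
    shelf = list(current)
    moves = []
    for to_idx, book in enumerate(target):
        from_idx = shelf.index(book)
        if from_idx != to_idx:
            # Take the book out and re-insert it at its target slot.
            shelf.insert(to_idx, shelf.pop(from_idx))
            moves.append((book, from_idx, to_idx))
    return moves


def run_task(perceive: Callable[[], List[str]],
             move_book: Callable[[str, int, int], None]) -> List[Move]:
    """Task planning: query perception, plan, then command manipulation."""
    observed = perceive()                          # vision pipeline (stub)
    target = sorted(observed, key=str.lower)       # assumed sorting criterion
    moves = plan_moves(observed, target)
    for book, src, dst in moves:
        move_book(book, src, dst)                  # manipulation module (stub)
    return moves


class FakeShelf:
    """Stand-in for the real robot and camera, used only for this sketch."""

    def __init__(self, books):
        self.books = list(books)

    def perceive(self):
        return list(self.books)

    def move_book(self, book, src, dst):
        assert self.books[src] == book
        self.books.insert(dst, self.books.pop(src))


shelf = FakeShelf(["Robotics", "Algorithms", "Vision"])
executed = run_task(shelf.perceive, shelf.move_book)
print(executed, shelf.books)
```

Keeping perception and manipulation behind plain callables means the planning logic can be exercised without the robot, which is the main point of the sketch.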
IV. PERCEPTION (FABIAN / SHANG-CHING LIU)
A. Preprocessing (Fabian)
The camera of the PR-2 is located on top of its head. This
creates a perspective projection of the shelf (fig. 3, left;
fig. 5, right), which makes book spine detection more challenging
because the spine edges tend not to be aligned with the image
axes. To mitigate this problem automatically without rearranging
the hardware, a perspective transformation is applied to the
image twice, once per shelf level, as shown in fig. 3. For each
shelf level, the corners (red/blue dots) are determined from the
known AprilTag pose along with offsets matching the shelf's
dimensions. The 3D points are then projected onto the image and
used as anchor points for the transformation. The result is two
images in which the book spine edges are aligned with the image
axes (fig. 3, right).
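How 3D shelf corners become image anchor points can be sketched with a plain pinhole camera model. The tag pose, the corner offsets, and the camera intrinsics below are made-up values for illustration only, and lens distortion is ignored. The sketch assumes the tag pose is given in the camera frame as a rotation matrix and a translation.

```python
def tag_to_camera(R, t, p):
    """Map point p from the tag frame into the camera frame: R * p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (z forward) to pixels."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical tag pose: 2 m in front of the camera, axes aligned with it.
R_TAG = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T_TAG = (0.0, 0.0, 2.0)

# Hypothetical corner offsets of one shelf level relative to the tag (metres).
CORNER_OFFSETS = [(-0.5, -0.15, 0.0), (0.5, -0.15, 0.0),
                  (0.5, 0.15, 0.0), (-0.5, 0.15, 0.0)]

FX, FY, CX, CY = 600.0, 600.0, 320.0, 240.0  # assumed camera intrinsics

anchors = [project(tag_to_camera(R_TAG, T_TAG, p), FX, FY, CX, CY)
           for p in CORNER_OFFSETS]
print(anchors)
```

With the fronto-parallel pose used here the anchors form a rectangle; a tilted pose, as seen from a head-mounted camera, would yield a general quadrilateral.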
To project back from the corrected images to the original image,
the inverse projection matrix is also computed; it is used later
on.
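The forward and inverse mappings can be illustrated with a small, dependency-free sketch that estimates a 3x3 perspective transformation from four anchor points and inverts it. The pixel coordinates and the output size are invented for illustration. In practice a library routine such as OpenCV's cv2.getPerspectiveTransform together with cv2.warpPerspective would typically be used; whether the system described here does so is not stated in the text.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points to dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def invert3(H):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def apply(H, p):
    """Apply H to a 2D point using homogeneous coordinates."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical anchor points of one shelf level in the camera image ...
SRC = [(100.0, 80.0), (540.0, 120.0), (520.0, 300.0), (120.0, 340.0)]
# ... and the axis-aligned rectangle they should map to.
DST = [(0.0, 0.0), (400.0, 0.0), (400.0, 200.0), (0.0, 200.0)]

H = homography(SRC, DST)   # original image -> corrected image
H_inv = invert3(H)         # corrected image -> original image
print(apply(H, SRC[2]), apply(H_inv, DST[2]))
```

Any detection made in a corrected image, for example the corners of a book spine, can be mapped back into the original camera image by passing it through H_inv.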