
Some works focus primarily on manipulation. In [19], the authors investigate a bi-manual setup with a suction gripper in a setting similar to ours and train a fully connected network to predict which object to support with the non-suction gripper for safe extraction of the selected object. The network is trained in an environment similar to ours, i.e. a bookshelf. Other works address book grasping outside of the library environment. One example is [20], which combines suction with a two-finger gripper to grasp books in different configurations. To our knowledge, there is no readily available solution that automates book manipulation in a library environment similar to ours.
B. Perception (Shang-Ching Liu / Fabian)
The perception-related tasks aim to model the books currently present in the scene and, furthermore, to match each individual book against a known book database. We therefore review previous work for each of these parts.
For detection, YOLO [27] and Fast R-CNN [9] represent two main directions in the state of the art: YOLO is more efficient and outputs bounding boxes, while Fast R-CNN yields more precise detection results. A later evolution of YOLO, YOLOv5 [14], offers good documentation, a robust pipeline, and utilities such as Roboflow [28] for fine-tuning, which is why we choose it as our approach for book spine detection.
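As an illustrative sketch, a fine-tuned YOLOv5 model can be queried through the PyTorch Hub interface as shown below; the weight file name and confidence threshold are assumptions for the example, not our exact configuration:

    import torch

    # Load a YOLOv5 model fine-tuned on book spine images (hypothetical weight file).
    model = torch.hub.load("ultralytics/yolov5", "custom", path="book_spines.pt")
    model.conf = 0.4  # assumed confidence threshold for spine candidates

    # Run detection on a single shelf image.
    results = model("shelf_view.jpg")

    # Each detection row: x_min, y_min, x_max, y_max, confidence, class index.
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        print(f"spine candidate at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}), conf={conf:.2f}")

The resulting bounding boxes serve as spine candidates that are passed on to the matching stage.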
For book matching, SIFT [23] can be used to find keypoints in the image, an HSV (hue, saturation, value) [36] histogram captures the color distribution, and fuzzywuzzy measures the text similarity between the detected text and the book titles in the database.
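A minimal sketch of how these three cues could be computed for a single detected spine crop is given below, assuming OpenCV and the fuzzywuzzy package; the function names and histogram bin sizes are illustrative, not our exact matching scheme:

    import cv2
    from fuzzywuzzy import fuzz

    def spine_features(bgr_crop):
        """Compute SIFT keypoints/descriptors and an HSV color histogram for a spine crop."""
        gray = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2GRAY)
        keypoints, descriptors = cv2.SIFT_create().detectAndCompute(gray, None)

        hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        return keypoints, descriptors, hist

    def title_similarity(detected_text, database_title):
        """Fuzzy text similarity (0-100) between recognized text and a known book title."""
        return fuzz.token_set_ratio(detected_text, database_title)

The SIFT descriptors and color histograms can then be compared against database entries, while the fuzzy score ranks candidate titles.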
Inventory management in a library is a tedious task that researchers have tried to automate over the past decade. Book spines standing on a shelf have been detected and recognized using different computer vision methods without the aid of special markers. A frequently used approach to detect book spines is edge detection followed by further line segment processing [3], [6], [22], [34]. Often, an orthographic representation of the book spines is required. To detect the spines independently of the viewpoint, Talker et al. used a constrained active contour model allowing the spines to be non-parallel to the image axes [35].
The detection part is crucial for finding book spine candidates, but recognizing them correctly plays an equally important role in inventory management. While many approaches focus on text recognition to identify the book spines [3], [6], [22], [34], Fowers et al. made use of a difference of Gaussians (DoG) over the YCbCr color space to extract features [7]. In combination with SIFT, this approach does not depend on an OCR engine (e.g. Tesseract [31]) while yielding robust performance.
Comparing the results of the mentioned works meaningfully is hard, as no standardized benchmark exists in the field of book spine recognition. However, in the domain of scene text recognition (STR), which can be utilized for text-based book spine recognition, Baek et al. developed a framework that allows the comparison of different model architectures [1]. Since deep learning methods have generally not been widely applied to book spine recognition, this work tries to incorporate such an STR model to perform text matching.
III. SYSTEM OVERVIEW (SHANG-CHING LIU)
The overall system can be separated into three parts: manipulation, the vision pipeline, and task planning, as shown in figure 1. The task planning module controls both the vision pipeline module and the manipulation module. The vision pipeline takes the scene from the RGBD camera (Azure Kinect) as input, matches the books in the scene to the book database, and finally creates a MoveIt planning scene for visualization. The manipulation part has a controller for the two arms of the robot (PR-2): one is equipped with a Shadow hand and the other with a two-finger gripper, as shown in figure 2j.
Fig. 1: System Overview
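To make the interface between the vision pipeline and manipulation concrete, the sketch below shows how a single detected book might be added to the MoveIt planning scene as a box-shaped collision object; the node name, frame, pose, and book dimensions are placeholder assumptions:

    import sys
    import rospy
    import moveit_commander
    from geometry_msgs.msg import PoseStamped

    # Standard MoveIt / ROS initialization boilerplate.
    moveit_commander.roscpp_initialize(sys.argv)
    rospy.init_node("book_scene_demo")
    scene = moveit_commander.PlanningSceneInterface()
    rospy.sleep(1.0)  # give the scene interface time to connect

    # Pose of one detected book in the robot's base frame (frame name and values are assumed).
    book_pose = PoseStamped()
    book_pose.header.frame_id = "base_link"
    book_pose.pose.position.x = 0.8
    book_pose.pose.position.y = 0.1
    book_pose.pose.position.z = 1.0
    book_pose.pose.orientation.w = 1.0

    # Insert the book as a box collision object (width x depth x height in meters, assumed size).
    scene.add_box("book_0", book_pose, size=(0.04, 0.15, 0.22))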
IV. PERCEPTION (FABIAN / SHANG-CHING LIU)
A. Preprocessing (Fabian)
The camera of the PR-2 is located on top of its head. This creates a perspective projection of the shelf (fig. 3, left; fig. 5, right), making book spine detection more challenging since the edges tend not to be aligned with the image axes. To automatically mitigate this problem without rearranging the hardware, a perspective transformation is applied to the image twice, as shown in fig. 3. For each shelf level, the corners (red/blue dots) are determined using the AprilTag's known pose along with offsets matching the shelf's dimensions. The 3D points are then projected onto the image and used as anchor points for the transformation. The result is two images in which the book spine edges are aligned with the image axes (fig. 3, right).
To project points from the corrected images back to the original image, the inverse of the transformation matrix is also computed and will be used later on.
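A minimal sketch of this rectification step, assuming the four projected shelf corners are already available in pixel coordinates, could look as follows; the corner ordering and output size are placeholder assumptions:

    import cv2
    import numpy as np

    def rectify_shelf_level(image, corners_px, out_size=(1200, 300)):
        """Warp one shelf level so that the book spine edges align with the image axes.

        corners_px: four projected shelf corners in pixel coordinates,
        ordered top-left, top-right, bottom-right, bottom-left (assumed order).
        """
        w, h = out_size
        src = np.float32(corners_px)
        dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

        # Forward transform: original image -> axis-aligned shelf-level image.
        M = cv2.getPerspectiveTransform(src, dst)
        rectified = cv2.warpPerspective(image, M, (w, h))

        # Inverse transform: map detections in the rectified image back to the original.
        M_inv = np.linalg.inv(M)
        return rectified, M_inv

Detections obtained in the rectified image can then be mapped back to the original view, e.g. with cv2.perspectiveTransform and M_inv.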