
Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics

Alexander Naumann¹,², Felix Hertlein¹, Benchun Zhou², Laura Dörr¹,² and Kai Furmans²

Abstract—State-of-the-art approaches in computer vision heavily rely on sufficiently large training datasets. For real-world applications, obtaining such a dataset is usually a tedious task. In this paper, we present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps. In contrast to existing work, our pipeline covers every step from data acquisition to the final dataset. We first scrape images for the objects of interest from popular image search engines. Since we rely only on text-based queries, the resulting data comprises a wide variety of images. Hence, image selection is necessary as a second step. This approach of image scraping and selection relaxes the need for a real-world domain-specific dataset that must be either publicly available or created for this purpose. We employ an object-agnostic background removal model and compare three different methods for image selection: object-agnostic pre-processing, manual image selection and CNN-based image selection. In the third step, we generate random arrangements of the object of interest and distractors on arbitrary backgrounds. Finally, the composition of the images is done by pasting the objects using four different blending methods. We present a case study for our dataset generation approach by considering parcel segmentation. For the evaluation, we created a dataset of parcel photos that were annotated automatically. We find that (1) our dataset generation pipeline allows a successful transfer to real test images (Mask AP 86.2), (2) a very accurate image selection process, in contrast to human intuition, is not crucial and a broader category definition can help to bridge the domain gap, and (3) the usage of blending methods is beneficial compared to simple copy-and-paste. We made our full code for scraping, image composition and training publicly available at https://a-nau.github.io/parcel2d.

I. INTRODUCTION

Common computer vision tasks, such as instance detection or segmentation, have tremendous potential to help automate processes in many industries. For instance, those techniques can be applied for process monitoring or quality control [1]. However, since the object of interest can vary widely depending on the underlying use-case, the availability of a ready-to-use dataset of sufficient size is a common problem in practice. Manual data acquisition and annotation is a time-consuming and costly task, which is why synthetic datasets have become more and more popular [2]. When training on a synthetic dataset with the goal of employing the trained Convolutional Neural Network (CNN) for the real use-case application, the domain gap between the synthetic and the real images has to be taken into account [3].

¹ The authors are with the FZI Research Center for Information Technology, Karlsruhe, Germany {anaumann, hertlein, doerr}@fzi.de

² The authors are with the Institute for Material Handling and Logistics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany {benchun.zhou, kai.furmans}@kit.edu

Fig. 1: Overview of our dataset generation pipeline: (1) We scrape images from popular image search engines. (2) We use and compare three different methods for image selection, i.e. basic pre-processing, manual selection and CNN-based selection. (3) The objects of interest and the distractors are pasted onto a randomly selected background. (4) We use four different blending methods to ensure invariance to local pasting artifacts as suggested by Dwibedi et al. [2].

A promising approach for quick and efficient synthetic dataset generation was presented by Dwibedi et al. [2]. They randomly paste objects and distractors onto background images, while using different blending methods to reduce the influence of local pasting artifacts. We extend this approach by adding an automated image selection pipeline as visualized in Fig. 1. Thus, our pipeline is fully automated and the creation of synthetic datasets is facilitated further. Note that not only dataset creation, but also a dataset update, e.g. after a domain shift, is feasible.
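The paste-and-blend idea can be illustrated with a minimal sketch. It is not the authors' implementation: images are assumed to be small grayscale grids stored as nested lists, and the two blend modes shown (a hard paste and a box-blurred mask) are stand-ins chosen for illustration, not the four blending methods used in the paper. The function names `box_blur`, `paste` and `synthesize` are hypothetical. One random placement is drawn and then rendered once per blend mode, so the resulting images differ only in how the object boundary is treated.

```python
import random


def box_blur(mask, radius=1):
    """Soften a 2D mask by averaging each pixel with its in-bounds neighbours."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def paste(background, obj, mask, top, left, blend="none"):
    """Alpha-composite `obj` onto a copy of `background` at (top, left)."""
    # Assumption: "none" uses the hard mask, "box_blur" a softened one.
    alpha = box_blur(mask) if blend == "box_blur" else mask
    result = [row[:] for row in background]
    for y, row in enumerate(obj):
        for x, value in enumerate(row):
            a = alpha[y][x]
            old = result[top + y][left + x]
            result[top + y][left + x] = a * value + (1 - a) * old
    return result


def synthesize(background, obj, mask, seed, blends=("none", "box_blur")):
    """Draw one random placement and render it once per blend mode."""
    rng = random.Random(seed)
    top = rng.randint(0, len(background) - len(obj))
    left = rng.randint(0, len(background[0]) - len(obj[0]))
    images = {b: paste(background, obj, mask, top, left, b) for b in blends}
    return images, (top, left)
```

Calling `synthesize(background, obj, mask, seed=0)` returns one image per blend mode for the same placement, which mirrors the idea of varying only the blending while keeping the arrangement fixed.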

arXiv:2210.09814v1 [cs.CV] 18 Oct 2022

We present our results on a case study of parcel detection. Parcel detection and segmentation is highly relevant in industry since it can help to automate and monitor supply chains [4]. A smoothly running supply chain is crucial for manufacturing industries, pharmaceutical companies and far beyond. Furthermore, having a pre-trained backbone that is a strong feature extractor for the application use-case facilitates the development of solutions for downstream tasks, such as keypoint detection or 3D reconstruction. Our main contributions are:

• we extend [2] by adding image scraping and different image selection methods,

• we analyze the influence of the image selection method on the capacity for transfer learning,

• we present a real-world dataset of parcel images that is used for evaluation, and

• we make our code publicly available to facilitate the generation of tailored datasets for custom domains.

The paper is organized as follows. We present related literature in Sec. II. Subsequently, we describe our dataset generation approach in Sec. III. The evaluation is presented in Sec. IV and the paper concludes with Sec. V.

II. RELATED WORK

The idea of generating an artificial training dataset is widespread, due to the high cost incurred in capturing and annotating a tailor-made dataset for a use-case. We first present relevant literature regarding the creation of artificial datasets and subsequently delve into the application area of logistics.

Artificial Dataset Generation: Artificial datasets can either be rendered or composed. When rendering images, we can carefully choose a desired image layout and easily generate a multiplicity of annotations, even ones that are very costly to obtain, such as 3D annotations. BlenderProc [5] is a procedural Blender¹ pipeline that enables photorealistic renderings to create synthetic datasets. Examples of popular rendered datasets include [6], [7].

In contrast to that, image datasets can also be generated by composition. Image composition is the task of seamlessly combining two images by cutting a foreground object from one image and pasting it onto another image. This is an important task in computer vision with a wide range of applications. Niu et al. [8] present a comprehensive survey on the topic, and we refer to them for details on applications and subtasks included in image composition. For our work, we focus on simple image composition and neglect effects that might make images look unrealistic to humans, as this has proven to be sufficient for training the backbone of a neural network [9]. More explicitly, inconsistencies introduced by incompatible colors, unreasonable illumination, mismatching size of objects, or their location are not considered.

Dwibedi et al. [2] present a procedure to generate a targeted dataset for instance segmentation. As input, a set of images for each category, picturing solely the object of interest with a modest background, is needed. They recommend capturing diverse viewpoints, in order to enable detection from diverse viewpoints as well. A foreground-background segmentation network is trained to obtain segmentation masks for the foreground objects. In addition, suitable background images need to be chosen. Afterwards, objects are cut out with their mask from the images and pasted onto a background image. Dwibedi et al. ensure invariance to local artifacts from pasting by applying a set of blending methods. The exact same images are synthesized multiple times, where only the blending method varies. They show that this method enables training a neural network for instance segmentation and that combining the synthetic data with only 10 % of the real training data surpasses the performance of training on all real data. Ghiasi et al. [9] present a similar technique; however, they use existing annotated datasets as their source for both the foreground and the background and find scale jittering to be very efficient. First, two images within a dataset are randomly chosen and their scale is jittered. Subsequently, objects from one image are cut out by using their given annotated mask and pasted randomly onto the second image. During this process, annotations within the second image are adjusted accordingly, i.e. adjusted for occlusion. They do not use geometric transformations such as rotation and find Gaussian blurring not to be beneficial. Ghiasi et al. conclude that their method is highly effective and robust. Mensink et al. [3] present a study on the influence of several factors on the performance of transfer learning. They find that the image domain is the most important factor and that the target dataset should be contained in the source dataset to achieve the best results.

¹ See https://www.blender.org/.
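The copy-paste step with occlusion handling described for Ghiasi et al. [9] can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not their implementation: instance masks are assumed to be binary nested lists, rescaling uses nearest-neighbour sampling, and the helper names `rescale_nearest` and `paste_with_occlusion` are hypothetical. The sketch resizes a mask by a scale factor, places it on the target canvas, removes the covered pixels from the existing annotations and drops instances that end up fully hidden.

```python
def rescale_nearest(grid, scale):
    """Resize a 2D grid by `scale` using nearest-neighbour sampling."""
    h, w = len(grid), len(grid[0])
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    return [[grid[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
             for x in range(new_w)]
            for y in range(new_h)]


def paste_with_occlusion(canvas_size, target_masks, pasted_mask, top, left):
    """Paste a mask onto the canvas and adjust existing masks for occlusion.

    Returns the still-visible target masks followed by the new instance mask.
    """
    height, width = canvas_size
    new_mask = [[0] * width for _ in range(height)]
    for y, row in enumerate(pasted_mask):
        for x, value in enumerate(row):
            # Pixels that fall outside the canvas are simply cropped.
            if value and 0 <= top + y < height and 0 <= left + x < width:
                new_mask[top + y][left + x] = 1

    adjusted = []
    for mask in target_masks:
        # Clear every pixel that the pasted object now covers.
        visible = [[m if not n else 0 for m, n in zip(mask_row, new_row)]
                   for mask_row, new_row in zip(mask, new_mask)]
        if any(any(row) for row in visible):  # drop fully occluded instances
            adjusted.append(visible)
    return adjusted + [new_mask]
```

In a full pipeline the scale factor would be drawn at random for each image, for example with `random.uniform`, where the sampling range is a free parameter not specified in this section.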

In our work, we follow an approach similar to Dwibedi et al. [2]; however, we fully automate the foreground object image retrieval by using web scraping and a pre-processing pipeline.

Applications in Logistics: Work on the plane-wise segmentation of parcels without the need for a custom training dataset was presented by Naumann et al. [10]. Plane segmentation information is combined with contour detection to generate plane-level segmentations. Small load carriers have been targeted using synthetic training data [11]. Furthermore, the problem of packaging structure recognition has been tackled [12]–[14]. Packaging structure recognition aims at localizing and counting small load carriers that are stacked onto a pallet.

III. DATASET GENERATION

Our dataset generation approach is based on Dwibedi et al. [2]. We follow a similar procedure, apart from the data acquisition approach. This section is organized as follows: In Sec. III-A, we explain the data acquisition through web scraping. Subsequently, we present three different image selection methods, which yield three different datasets, in Sec. III-B. The image generation is explained in Sec. III-C and finally we present our real dataset in Sec. III-D.

A. Image Scraping

In order to generate a synthetic dataset, it is crucial to have a sufficiently large set of images picturing the object of interest. We approach this problem by scraping images from popular image search engines. We use four different search engines:

• Google Images: images.google.com,

• Bing Images: bing.com/images,

• Yahoo Images: images.search.yahoo.com and

• Baidu Images: image.baidu.com.
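As a rough illustration of text-based querying across the four engines listed above, the following sketch only builds one request URL per engine and performs no network access. The hosts are taken from the list above; the query-parameter names, the helper `build_search_urls` and the example query are assumptions made for illustration and may not match the live services or the authors' scraper.

```python
from urllib.parse import urlencode

# Hosts come from the list above. The query-parameter names are
# assumptions for illustration and may not match the live services.
SEARCH_ENDPOINTS = {
    "google": ("https://images.google.com/", "q"),
    "bing": ("https://bing.com/images", "q"),
    "yahoo": ("https://images.search.yahoo.com/", "p"),
    "baidu": ("https://image.baidu.com/", "word"),
}


def build_search_urls(query):
    """Return one URL per search engine for a text-based query."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in SEARCH_ENDPOINTS.items()
    }
```

For a hypothetical query such as `"cardboard parcel"`, `build_search_urls` yields four URLs with the query percent-encoded, which a downloader could then fetch and parse.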
