Scrape, Cut, Paste and Learn: Automated Dataset
Generation Applied to Parcel Logistics
Alexander Naumann1,2, Felix Hertlein1, Benchun Zhou2, Laura Dörr1,2 and Kai Furmans2
Abstract—State-of-the-art approaches in computer vision heav-
ily rely on sufficiently large training datasets. For real-world
applications, obtaining such a dataset is usually a tedious task.
In this paper, we present a fully automated pipeline to generate
a synthetic dataset for instance segmentation in four steps. In
contrast to existing work, our pipeline covers every step from
data acquisition to the final dataset. We first scrape images for the
objects of interest from popular image search engines, and since
we rely only on text-based queries, the resulting data comprises
a wide variety of images. Hence, image selection is necessary as
a second step. This approach of image scraping and selection
relaxes the need for a real-world domain-specific dataset that
must be either publicly available or created for this purpose. We
employ an object-agnostic background removal model and com-
pare three different methods for image selection: Object-agnostic
pre-processing, manual image selection and CNN-based image
selection. In the third step, we generate random arrangements of
the object of interest and distractors on arbitrary backgrounds.
Finally, the composition of the images is done by pasting the
objects using four different blending methods. We present a case
study for our dataset generation approach by considering parcel
segmentation. For the evaluation, we created a dataset of parcel
photos that were annotated automatically. We find that (1) our
dataset generation pipeline allows a successful transfer to real
test images (Mask AP 86.2), (2) a very accurate image selection
process - in contrast to human intuition - is not crucial and
a broader category definition can help to bridge the domain
gap, (3) the usage of blending methods is beneficial compared
to simple copy-and-paste. We made our full code for scraping,
image composition and training publicly available at https://a-nau.github.io/parcel2d.
I. INTRODUCTION
Common computer vision tasks, such as instance detection
or segmentation have a tremendous potential to help the
automation of processes in many industries. For instance,
those techniques can be applied for process monitoring or
quality control [1]. However, since the object of interest
can vary widely depending on the underlying use-case, the
availability of a ready-to-use dataset of sufficient size is a
common problem in practice. Manual data acquisition and
annotation is a time-consuming and costly task, which is
why synthetic datasets have become more and more popular
[2]. When training on a synthetic dataset, with the goal of
employing the trained Convolutional Neural Network (CNN)
for the real use-case application, the domain gap between the
synthetic and the real images has to be taken into account [3].
1The authors are with the FZI Research Center for Information Technology,
Karlsruhe, Germany {anaumann, hertlein, doerr}@fzi.de
2The authors are with the Institute for Material Handling and Lo-
gistics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
{benchun.zhou, kai.furmans}@kit.edu
Fig. 1: Overview of our dataset generation pipeline: (1) We
scrape images from popular image search engines. (2) We
use and compare three different methods for image selection,
i.e. basic pre-processing, manual selection and CNN-based
selection. (3) The objects of interest and the distractors are
pasted onto a randomly selected background. (4) We use
four different blending methods to ensure invariance to local
pasting artifacts as suggested by Dwibedi et al. [2].
A promising approach for quick and efficient synthetic dataset
generation was presented by Dwibedi et al. [2]. They randomly
paste objects and distractors onto background images, while
using different blending methods to reduce the influence of
local pasting artifacts. We extend this approach by adding an
automated image selection pipeline as visualized in Fig. 1.
Thus, our pipeline is fully automated and the creation of
synthetic datasets is facilitated further. Note that not only
dataset creation, but also a dataset update, e.g. after a domain
shift, is feasible.
We present our results on a case study of parcel detec-
tion. Parcel detection and segmentation are highly relevant in
industry since it can help to automate and monitor supply
chains [4]. A smoothly running supply chain is crucial for
manufacturing industries, pharmaceutical companies and far
beyond. Furthermore, having a pre-trained backbone that is a
strong feature extractor for the application use-case facilitates
the development of solutions for downstream tasks, such as
keypoint detection or 3D reconstruction. Our main contribu-
tions are:
• we extend [2] by adding image scraping and different image selection methods,
• we analyze the influence of the image selection method on the capacity for transfer learning,
• we present a real-world dataset of parcel images that is used for evaluation, and
• we make our code publicly available to facilitate the generation of tailored datasets for custom domains.
The paper is organized as follows. We present related
literature in Sec. II. Subsequently, we describe our dataset
generation approach in Sec. III. The evaluation is presented
in Sec. IV and the paper concludes with Sec. V.
II. RELATED WORK
The idea of generating an artificial training dataset is
widespread, due to the high costs incurred when capturing and
annotating a tailor-made dataset for a use-case. We first present
relevant literature regarding the creation of artificial datasets
and subsequently delve into the application area of logistics.
Artificial Dataset Generation: Artificial datasets can ei-
ther be rendered or composed. When rendering images, we can
carefully choose a desired image layout and easily generate
a multiplicity of annotations - even the ones that are very
costly to obtain, such as 3D annotations. BlenderProc [5]
is a procedural Blender (https://www.blender.org/) pipeline that enables photorealistic
renderings to create synthetic datasets. Examples of popular
rendered datasets include [6], [7].
In contrast to that, image datasets can also be generated
by composition. Image composition is the task of seamlessly
combining two images by cutting a foreground object from one
image and pasting it onto another image. This is an important
task in computer vision with a wide range of applications.
Niu et al. [8] present a comprehensive survey on the topic,
and we refer to them for details on applications and subtasks
included in image composition. For our work, we focus on
simple image composition and neglect effects that might make
images look unrealistic to humans, as this has proven to be
sufficient for training the backbone of a neural network [9].
More explicitly, inconsistencies introduced by incompatible
colors, unreasonable illumination, mismatching size of objects,
or their location are not considered.
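To make this composition step concrete, the following is a minimal sketch of a hard-mask paste with a bounding box derived from the mask; the function name, the use of PIL/NumPy and all parameters are our own illustrative assumptions and not the implementation used in this work.

```python
# Minimal cut-and-paste composition sketch (illustrative assumption, not the
# pipeline's actual code): cut a foreground object via its binary mask, paste
# it onto a background, and derive a bounding-box annotation from the mask.
import numpy as np
from PIL import Image

def paste_object(background, foreground, mask, top_left):
    """Paste `foreground` onto `background` at `top_left` using a binary mask.

    background, foreground: PIL RGB images; mask: PIL "L" image (0/255) with
    the same size as `foreground` and at least one foreground pixel.
    Returns the composite and a box (x0, y0, x1, y1) in background coordinates.
    """
    composite = background.copy()
    # PIL interprets `mask` as per-pixel alpha for the pasted region
    composite.paste(foreground, box=top_left, mask=mask)

    # Bounding box from the mask footprint, shifted by the paste offset
    ys, xs = np.nonzero(np.array(mask) > 0)
    x_off, y_off = top_left
    bbox = (x_off + int(xs.min()), y_off + int(ys.min()),
            x_off + int(xs.max()), y_off + int(ys.max()))
    return composite, bbox
```

A segmentation annotation can be obtained in the same way by shifting the mask itself into background coordinates.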
Dwibedi et al. [2] present a procedure to generate a targeted
dataset for instance segmentation. As input, a set of images
for each category, picturing solely the object of interest with a
modest background, is needed. They recommend diverse view-
points, in order to enable detection from diverse viewpoints
as well. A foreground-background segmentation network is
trained to obtain segmentation masks for the foreground
objects. In addition, suitable background images need to be
chosen. Afterwards, objects are cut out with their mask from
the images and pasted onto a background image. Dwibedi et
al. ensure invariance to local artifacts from pasting by applying
a set of blending methods. The exact same images are synthe-
sized multiple times, where only the blending method varies.
They show that this method enables training a neural network
for instance segmentation and that combining the synthetic
data with only 10 % of the real training data surpasses the
performance of training on all real data. Ghiasi et
al. [9] present a similar technique, however, they use existing
annotated datasets as their source for both the foreground and
the background, and found scale jittering to be very effective.
First, two images within a dataset are randomly chosen and
their scale is jittered. Subsequently, objects from one image
are cut out by using their given annotated mask and pasted ran-
domly onto the second image. During this process, annotations
within the second image are adjusted accordingly, i.e. adjusted
for occlusion. They do not use geometric transformations such
as rotation and find Gaussian blurring not to be beneficial.
Ghiasi et al. conclude that their method is highly effective
and robust. Mensink et al. [3] present a study on the influence
of several factors on the performance for transfer learning.
They find that the image domain is the most important factor
and that the target dataset should be contained in the source
dataset to achieve best results.
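The occlusion adjustment described above for Ghiasi et al. [9] can be sketched as follows; this is a simplified illustration under our own assumptions (boolean mask arrays, a hypothetical minimum-area threshold) rather than their actual implementation.

```python
# Hedged sketch of occlusion adjustment after copy-paste augmentation:
# pixels of existing target-image instances that are now covered by the
# pasted objects are removed from those instances' masks.
import numpy as np

def adjust_masks_for_occlusion(existing_masks, pasted_mask, min_area=10):
    """existing_masks: list of HxW boolean arrays (target annotations);
    pasted_mask: HxW boolean array covering all newly pasted objects.
    Instances whose visible area drops below `min_area` pixels are dropped."""
    updated = []
    for mask in existing_masks:
        visible = np.logical_and(mask, np.logical_not(pasted_mask))
        if visible.sum() >= min_area:  # keep only sufficiently visible instances
            updated.append(visible)
    return updated
```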
In our work, we follow an approach similar to Dwibedi et
al. [2], however, fully automate the foreground object image
retrieval by using web scraping and a pre-processing pipeline.
Applications in Logistics: Work on the plane-wise seg-
mentation of parcels without the need for a custom training
dataset was presented by Naumann et al. [10]. Plane segmen-
tation information is combined with contour detection to gen-
erate plane-level segmentations. Small load carriers have been
targeted using synthetic training data [11]. Furthermore, the
problem of packaging structure recognition has been tackled
[12]–[14]. Packaging structure recognition aims at localizing
and counting small load carriers that are stacked onto a pallet.
III. DATASET GENERATION
Our dataset generation approach is based on Dwibedi et
al. [2]. We follow a similar procedure, apart from the data
acquisition approach. This section is organized as follows: In
Sec. III-A, we explain the data acquisition through web scrap-
ing. Subsequently, we present three different image selection
methods which yield three different datasets in Sec. III-B. The
image generation is explained in Sec. III-C and finally we
present our real dataset in Sec. III-D.
A. Image Scraping
In order to generate a synthetic dataset, it is crucial to have a
sufficiently large set of images picturing the object of interest.
We approach this problem by scraping images from popular
image search engines. We use four different search engines:
• Google Images: images.google.com,
• Bing Images: bing.com/images,
• Yahoo Images: images.search.yahoo.com and
• Baidu Images: image.baidu.com.
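As an illustration of this step, the sketch below uses the open-source icrawler package to download images for text queries; the package choice, the query terms, the result counts and the output directories are our own assumptions and not the exact implementation of our pipeline (icrawler ships built-in crawlers for Google, Bing and Baidu, but to our knowledge not for Yahoo).

```python
# Illustrative scraping sketch using icrawler (assumption: this package is a
# stand-in for the actual scraping code; queries and limits are hypothetical).
from icrawler.builtin import BaiduImageCrawler, BingImageCrawler, GoogleImageCrawler

QUERIES = ["cardboard box", "parcel"]          # hypothetical text-based queries
ENGINES = {
    "google": GoogleImageCrawler,
    "bing": BingImageCrawler,
    "baidu": BaiduImageCrawler,
}

for engine_name, crawler_cls in ENGINES.items():
    for query in QUERIES:
        out_dir = f"scraped/{engine_name}/{query.replace(' ', '_')}"
        crawler = crawler_cls(storage={"root_dir": out_dir})
        crawler.crawl(keyword=query, max_num=100)   # up to 100 results per query
```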