
INTEGRATING PRE-PROCESSING PIPELINES IN ODC BASED FRAMEWORK
U.Otamendi, I.Azpiroz, M.Quartulli, I.Olaizola
Vicomtech Foundation
Basque Research and Technology Alliance (BRTA)
Donostia-San Sebastián, 20009, Spain
ABSTRACT
Using on-demand processing pipelines to generate virtual
geospatial products helps optimize resource management and
reduces processing requirements and data storage space.
Additionally, pre-processed products improve data quality for
data-driven analytical algorithms, such as machine learning or
deep learning models. This paper proposes a method to generate
virtual products by integrating open-source processing
pipelines. To validate and evaluate the approach, we have
integrated it into a geo-imagery management framework based on
Open Data Cube (ODC) and performed three experiments that
develop on-demand processing pipelines using multi-sensor
remote sensing data, for instance, Sentinel-1 and Sentinel-2.
These pipelines are integrated using open-source processing
frameworks.
Index Terms—SNAP, Sentinel, data exploitation, man-
agement, optimization, machine learning, data processing
1. INTRODUCTION
Geospatial imagery is widely used in environmental management
approaches based on modern computing, such as deep learning
[1]. For instance, periodic data provided by satellites are
useful for time-series analysis and pattern extraction, which
offer a more accurate understanding of the evolution of the
explored area. However, these high-spatial-resolution data
require a large storage capacity. In addition, processing
these data is computationally demanding [2, 3].
Productive geo-imagery processing for rapid mapping is highly
dependent on the efficiency of local statistics generation
from remote sensing images. Automating this computation
represents a substantial advance for agronomists, scientists,
and users of satellite-derived data.
In a previous paper [4] we proposed a methodology to address
the limitations of non-expert users in managing and processing
remote sensing and geo-imagery data. This system automatically
ingests geospatial data and allows non-expert users to manage
geospatial data in data-driven algorithms without requiring
knowledge of remote sensing or geo-imagery exploitation.
However, this system considerably limits the ability to
explore modified products. Consequently, a non-expert user is
limited to analyzing only those products that the satellite
imagery distributors have previously defined.
Fig. 1: Overview of the proposed methodology to generate
on-demand geospatial virtual products via processing
pipelines. The non-expert user declares a virtual product. The
framework then uses the available processing operations to
create a processing pipeline that converts the source
geospatial data to the desired format. Finally, the resulting
product is ingested by the Open Data Cube architecture,
allowing the non-expert user to use the data in analytical
processes.
This restriction hinders the optimal use of the data in
algorithmic processes. The main goal of the current
contribution is therefore to describe the integration of
on-demand processing pipelines in an ODC-based infrastructure
(see Fig. 1). This approach provides several benefits in terms
of resource optimization and data quality improvement.
Additionally, users acquire the ability to create virtual
geospatial data based on processing pipelines, automatically
generating data suitable for training and using data-driven
models [5].
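
The workflow summarized in Fig. 1 (declare a virtual product, compose a pipeline from the available operations, ingest the result) can be illustrated with a minimal Python sketch. It is not the implementation described in this paper. Every name in it (VirtualProduct, OPERATIONS, build_pipeline, materialize) is hypothetical, the two operations (a reflectance scaling and an NDVI computation) are assumed examples, and a plain dictionary stands in for the ingestion step; none of this is the Open Data Cube API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Band = List[float]        # toy stand-in for a raster band
Scene = Dict[str, Band]   # band name -> pixel values


@dataclass
class VirtualProduct:
    """Hypothetical declaration a non-expert user would write."""
    name: str          # name under which the result is stored
    source: str        # source dataset identifier (informational here)
    steps: List[str]   # ordered names of processing operations


# Registry of available processing operations (assumed, illustrative).
OPERATIONS: Dict[str, Callable[[Scene], Scene]] = {}


def operation(name: str):
    """Register a function as a named processing operation."""
    def register(fn: Callable[[Scene], Scene]):
        OPERATIONS[name] = fn
        return fn
    return register


@operation("scale_reflectance")
def scale_reflectance(scene: Scene) -> Scene:
    # Illustrative scaling of integer digital numbers to reflectance.
    return {band: [v / 10000.0 for v in px] for band, px in scene.items()}


@operation("ndvi")
def ndvi(scene: Scene) -> Scene:
    # NDVI = (NIR - red) / (NIR + red); 0.0 where the denominator is zero.
    red, nir = scene["red"], scene["nir"]
    return {"ndvi": [(n - r) / (n + r) if (n + r) else 0.0
                     for r, n in zip(red, nir)]}


def build_pipeline(product: VirtualProduct) -> Callable[[Scene], Scene]:
    """Compose the declared steps into a single callable."""
    missing = [s for s in product.steps if s not in OPERATIONS]
    if missing:
        raise KeyError(f"unknown operations: {missing}")
    fns = [OPERATIONS[s] for s in product.steps]

    def pipeline(scene: Scene) -> Scene:
        for fn in fns:
            scene = fn(scene)
        return scene

    return pipeline


def materialize(product: VirtualProduct, scene: Scene,
                catalog: Dict[str, Scene]) -> Scene:
    """Run the pipeline on demand and store the result in the catalog."""
    catalog[product.name] = build_pipeline(product)(scene)
    return catalog[product.name]


if __name__ == "__main__":
    product = VirtualProduct(name="s2_ndvi", source="sentinel2",
                             steps=["scale_reflectance", "ndvi"])
    catalog: Dict[str, Scene] = {}
    result = materialize(product, {"red": [1000.0, 2000.0],
                                   "nir": [3000.0, 2000.0]}, catalog)
    print(result)
```

In this sketch the derived product is computed only when materialize is called, so only the source bands need to be stored permanently, which is the resource benefit the on-demand approach aims for.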
The implementation of this methodology has been inte-
grated with the Open Data Cube (ODC) based architecture
proposed in the previous data management paper [4]. In or-
der to validate this approach, we have performed three experi-
arXiv:2210.01528v1 [cs.CV] 4 Oct 2022