quantify their utilisation, we introduce a taxonomy of data
interpretations in Section V-A.
Large labelled datasets are critical for training robust deep
learning models. But, while there is an abundance of images,
the corresponding labels can be much more difficult to come
by. For some tasks, like crop segmentation, the labels can be
discerned directly from the image, but for most agricultural
tasks, the target quantity is not so directly visible. This can
be because the relationship between reflectance and the target
value is complicated by various soil and biochemical attributes,
as in the case of predicting Leaf Area Index (LAI), or because
the target quantity is only knowable by analysing a time
sequence, as in yield prediction. For such tasks, collecting data from ground level is necessary, but such data are more expensive to obtain. So, this review analyses the data sources for each task and highlights the publicly available data.
The ultimate goal of most agricultural research is to help
improve the yield and quality of our crops. But, the processes
which turn sunlight, water, carbon dioxide, nutrients and
minerals into the food we eat are varied and complex. They
can manifest as broad visible changes, or as subtle chemical
changes. Agricultural research can target any one of those
pathways. By using a systematic search, this review identifies
which agricultural quantities researchers are attempting to
measure from satellite images using deep learning in practice
(see Section VI). However, open questions remain.
Which quantities could or should be measured from space?
Are there subtle signals in satellite images that truly provide
information about the plants? Can deep learning uncover them
if there are? While there have been some successes using deep
learning on satellite images in crop segmentation and yield
prediction, difficult challenges remain for other tasks.
In summary, the contributions of this review are:
1) A gentle introduction to the use of satellite images, and
how this differs from generic computer vision tasks.
2) A taxonomy of data shapes and interpretations, and a
quantification of how often each is used for each task.
3) A tabulated list of references which includes this taxonomy, identifying which methods were used and which worked best in each study (see Supplementary materials).
4) Quantitative analysis of the performance of various deep
learning approaches on agricultural tasks.
5) An investigation of what datasets and data sources are
available.
6) Identification of the breadth of agricultural tasks using
satellite images, including general information, specific
challenges and suggestions to help adopt/improve deep
learning for each task.
II. SEARCH STRATEGY
To create an initial list of papers, we used a search query
for Clarivate’s Web of Science. To broadly find papers at the
intersection of deep learning, satellite images and agriculture,
we used both generic and specific terms for each (see Table
I). For deep learning, the specific terms were algorithm names; for agriculture, they were crop names from the Cropland Data Layer [123]. The resultant tagged library of studies is
available as supplementary materials.
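To make the shape of the query concrete, the following minimal Python sketch assembles a Web of Science topic search (TS=) as an AND of three OR-groups, one per area. The terms shown here are illustrative placeholders only; the actual terms appear in Table I.

# Illustrative sketch only: the real search terms are listed in Table I.
# TS= is the Web of Science "topic" field tag (title/abstract/keywords).
deep_learning = ["deep learning", "convolutional neural network", "LSTM"]
satellite = ["satellite", "Sentinel-2", "Landsat"]
agriculture = ["agriculture", "crop", "maize", "wheat"]  # crop names from the CDL

def or_group(terms):
    # Quote each term and join them into a parenthesised OR-group.
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = "TS=(" + " AND ".join(map(or_group, [deep_learning, satellite, agriculture])) + ")"
print(query)
# TS=(("deep learning" OR "convolutional neural network" OR "LSTM") AND ...)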
The initial search yielded 770 studies. We performed a rapid first pass through the collection of studies to filter out
studies that were not at the intersection of deep learning, satellite imagery and agriculture, ultimately yielding 193 studies.
The majority of these studies were for crop segmentation and
yield prediction; thus, the studies for those tasks were further
filtered as follows:
• 2020 and earlier: a study is included if it has at least x citations on Google Scholar (x = 50 for crop segmentation; x = 25 for yield prediction); this rule is sketched below.
• Jan 2021 - October 2022: all were included.
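A minimal Python sketch of this inclusion rule (the function and field names are hypothetical, not part of any released tooling):

from datetime import date

# Citation thresholds from the filtering rule above.
THRESHOLDS = {"crop_segmentation": 50, "yield_prediction": 25}

def include_study(task: str, published: date, citations: int) -> bool:
    # Apply the year-dependent citation filter described above.
    if published.year <= 2020:
        return citations >= THRESHOLDS[task]
    return True  # Jan 2021 - October 2022: all studies included

# e.g. include_study("yield_prediction", date(2019, 6, 1), citations=30) -> True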
We did not include methods using UAV imagery because we
were interested in methods for resolving the tension between
object size and pixel size in satellite imagery. For crop segmentation studies (Section VI-A), we only include studies which
used multiple agricultural classes. Soil monitoring studies (Section VI-B) often only implied an agricultural significance, but, since soil has such a strong influence on agriculture and relatively few such studies were found, we include all soil monitoring studies, even those without an explicit agricultural motivation.
Although this review is systematic, it is not exhaustive,
and not just because of the above filtering. By limiting the
review to studies indexed by Clarivate’s Web of Science, we
are deliberately selecting for higher-profile works than if we
included searches across all published literature. We rely on
the manual filtering stage to ensure that we only include
relevant works. And while the search terms may not reveal all
possible relevant studies, we believe that they are sufficient to
return a representative sample of all relevant studies.
There was also some inconsistency in terminology in the
reviewed studies. In the interest of clarity, and to assist anyone
unfamiliar with these terms, the variations are summarised in
Table II.
III. SATELLITE IMAGES
Objects imaged by satellites are typically significantly
smaller than the ground sample distance (GSD) covered by
each pixel. For example, the colour of each pixel in a satellite image of farmland might be aggregated from hundreds,
thousands or even millions of individual plants. This massive
difference in scale between object and pixel sizes has encouraged researchers to focus on understanding the contents of
individual pixels as a combination of various surface types.
This naturally encouraged per-pixel algorithms [7, 8], rather
than the typical computer vision approaches which primarily
use the structured pattern of multiple spatially-related pixels
to understand an image [33, 132].
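The contrast between the two paradigms can be illustrated with a minimal Python/NumPy sketch (the model calls are placeholders, not taken from any reviewed study): a per-pixel method discards spatial arrangement and treats the image as a set of independent spectral vectors, while a typical computer vision model consumes spatially structured patches.

import numpy as np

# A hypothetical 10-band satellite tile: height x width x channels.
H, W, C = 256, 256, 10
tile = np.random.rand(H, W, C).astype(np.float32)

# Per-pixel paradigm: flatten away the spatial structure and classify
# each pixel's spectrum independently.
spectra = tile.reshape(-1, C)        # (H*W, C) independent spectral vectors
# labels = per_pixel_model(spectra)  # placeholder: e.g. a random forest or MLP

# Computer vision paradigm: keep spatially related pixels together so a
# convolutional model can exploit their structured pattern.
patch = tile[:64, :64, :]                             # (64, 64, C) with spatial context
# pred = cnn_model(patch[None].transpose(0, 3, 1, 2)) # placeholder CNN, NCHW layout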
While the spatial resolution relative to the imaged objects
is much worse for satellite imagery, the spectral resolution is
often significantly better. Almost all satellite imagery has at
least 4 colour channels (red, green, blue and near-infrared),
many have more than 10 colour channels (e.g. Sentinel-2),
and some have over 100 different colour channels [112],
providing significantly more information per pixel than typical