images.
The paper is organized into six sections, as follows. Section II
provides a brief overview of the related works on which the
current paper is based. Section III explains the motivation
and applicability of the implemented vision-based algorithm,
while Section IV presents the approach used for determining
the absolute geographical coordinates of a UAV. Section V
discusses the experimental localization results. Finally, Section VI
concludes the work and outlines future research directions.
II. BACKGROUND
Previous studies directly related to our paper include semantic-segmentation-based
path planning [13], localization using
open-source Google Earth (GE) aerial images [14], and pose
estimation with neural networks trained on georeferenced
satellite photographs [15]. In addition, an open-source
image segmentation model originally used for building virtual
worlds [16] can provide useful input for both localization
and navigation purposes, as proposed in [13].
Additionally, a graph neural network that performs feature
computation and matching for outdoor images, SuperGlue [11],
became part of our proposed localization algorithm due to its
remarkable performance in matching features between photographs
that differ significantly in perspective and lighting conditions.
The model proved effective not only for matching features, but
also for estimating the perspective transformation (namely, a
homography) used to compute geographical coordinates when running
the implemented vision-based localization algorithm. Before
selecting SuperGlue as the feature matcher between drone camera
photographs and satellite images, template matching [17] and
SIFT features [18] were also considered. The latter two options
were dropped due to their relatively high computation time and
lower accuracy compared to [11].
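To make the role of the homography concrete, the following is a
minimal sketch of how matched keypoints could be turned into a
drone-to-satellite perspective transform using OpenCV. The function
names and the assumed matching output format are illustrative
placeholders, not the paper's actual implementation; only the
OpenCV calls (cv2.findHomography, cv2.perspectiveTransform) are
standard API.

```python
import cv2
import numpy as np

def estimate_homography(kpts_drone, kpts_sat, matches, ransac_thresh=5.0):
    # kpts_drone, kpts_sat: (N, 2) / (M, 2) float arrays of keypoint pixel
    # coordinates; matches: (K, 2) integer index pairs, as a SuperGlue-style
    # matcher would output after discarding unmatched keypoints.
    src = kpts_drone[matches[:, 0]].astype(np.float32)
    dst = kpts_sat[matches[:, 1]].astype(np.float32)
    # RANSAC rejects outlier correspondences that survive the matching stage.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    return H, inlier_mask

def project_point(H, x, y):
    # Map a drone-image pixel (e.g., the image center) into the pixel
    # frame of the georeferenced satellite tile.
    pt = np.array([[[x, y]]], dtype=np.float32)
    return cv2.perspectiveTransform(pt, H)[0, 0]
```

The satellite-tile pixel returned by such a projection can then be
converted to geographical coordinates through the linear relation
described in Section IV.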
Another notable paper is [12], which uses an advanced
template-based matching algorithm to compute the pose estimate
of a UAV using only images. Because of insufficient
computational resources at the time, no implementation
was provided on the onboard computer of a drone. Another
major difference compared to our approach is that the drone
navigates specifically urban environments, where computing
and matching visual features is much less of a challenge
than in natural, in-the-wild flying areas.
Currently, the state of the art in navigation through
a priori unknown environments is represented by Simultaneous
Localization and Mapping (SLAM) algorithms, which have arguably
reached a high degree of robustness and reliability in the past
decade [19]. Important building blocks of SLAM algorithms
are VO methods [20], which allow UAVs to accurately
determine their position while navigating new environments.
However, one major limitation of these approaches is
that they assume the UAV flies at a low enough altitude
that an RGB monocular or stereo camera can
easily track position shifts of detected features from frame to
frame. Our work focuses on high-altitude flight (120 meters), a
situation where a different approach to localization is needed,
as presented in Section III.
III. PROVIDING GNSS-FREE VISION-BASED
LOCALIZATION
Our main goal is to develop a localization algorithm that
does not rely on GNSS for long-distance flights, but only
on a monocular wide-angle camera. Such an approach is
useful in situations where the GNSS signal cannot be
reliably used, and serves as a failsafe alternative that enables
the drone to reach its goal position, or at least land safely in a
pre-established location. New commercial applications such
as autonomous drone delivery could use this localization
method to improve the reliability of navigation.
The core attribute of the project is the environment in which
vision-based localization is provided: in the wild, denoting
natural (non-urban) environments where artificial structures
such as buildings and roads are sparse. This characteristic
transforms what would be a trivial feature matching and ho-
mography computation problem inside a city into a challeng-
ing process, due to the difficulty of finding salient features in
natural environments. Nevertheless, the final implementation
is able to provide accurate localization results using a neural
network for feature matching between drone photographs and
satellite images.
Another significant feature of the implemented localization
algorithm is that the pre-mapping process does not require the
UAV to fly. The map is built exclusively from open-source
satellite images, with the objective of enabling autonomous
localization in any area where the drone can fly, provided that
flying there is legally permitted and physically possible (i.e., not
prevented by harsh environmental conditions).
IV. METHODOLOGY
Accurate feature matching of images is only useful if
the drone camera photographs can be precisely linked to
geographical coordinates. An important assumption of our
proposed localization method is that the flight area of the UAV
is known a priori, so a map of that specific zone can be built
and uploaded to the onboard computer for offline use. The
map is composed of rectangular sections representing RGB
satellite images with an approximate resolution of 1400×1200
pixels, collected from GE. Each of these sections is
collected from the same perspective, with the camera view
perpendicular to the ground surface and from an altitude that
offers a field of view similar to that of a wide-angle camera. One
example of these sections can be observed in Fig. 2, where
the transparent white rectangle represents the georeferenced
map tile. There is a linear relation (see Equations 1 and 2)
between the pixel coordinates of the image file and absolute
geographical coordinates.
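As an illustration of this linear relation, the sketch below
interpolates between the georeferenced corners of a map section.
The MapSection structure and its field names are hypothetical
placeholders; the exact form of the mapping is the one defined by
Equations 1 and 2.

```python
from dataclasses import dataclass

@dataclass
class MapSection:
    # Hypothetical georeference of one rectangular GE tile: the
    # geographic coordinates of its top-left and bottom-right
    # corners, plus its pixel dimensions.
    lat_tl: float
    lon_tl: float
    lat_br: float
    lon_br: float
    width: int   # approx. 1400 px
    height: int  # approx. 1200 px

def pixel_to_geo(section: MapSection, px: float, py: float):
    # Linear interpolation between the tile corners
    # (cf. Equations 1 and 2 in the paper).
    lat = section.lat_tl + (py / section.height) * (section.lat_br - section.lat_tl)
    lon = section.lon_tl + (px / section.width) * (section.lon_br - section.lon_tl)
    return lat, lon
```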
A. Georeferencing map sections
Because the flight area is generally too large to be repre-
sented as a single image file, the map is split into several
sections, each of them with two distinct geographical