
vices such as drones [8]. In addition, the use of event-driven
learning rules such as Spike-Timing-Dependent Plasticity
(STDP) [9] enables unsupervised learning at the edge using
emerging sub-milliwatt neuromorphic chips [10], [11], as
opposed to power-hungry GPUs. Furthermore, SNN-STDP systems are well suited to continual, on-line learning [12], removing the need for offline SNN training. Hence, this
work studies the use of SNNs equipped with STDP learning
for fusing DVS and radar data on power-constrained drones.
The use of radar sensing for drone navigation has recently
been investigated in a growing number of works [13],
[14], [15], [16], [17]. Indeed, fusing radar sensing with
vision-based sensors such as DVS is attractive since radars
intrinsically provide complementary information to camera
vision, such as target velocity and position in a range-azimuth
map (a bird's-eye view, as opposed to the projective plane of cameras) [17]. In addition, radars are robust to harsh environmental conditions: they are insensitive to occlusion by dirt and can sense in the dark [14].
This work deviates from most fusion-based SLAMs by
proposing a first-of-its-kind, bio-inspired SLAM system fusing an event camera with a radar (see Figs. 2 and 4), using SNNs that continuously learn via STDP, as observed in the brain [9].
It also deviates from most learning-based SLAM systems
which typically require the offline training of a Deep Neural
Network (DNN) on a dataset of the working environment
captured beforehand [2]. In contrast, our DVS-Radar fusion
SNN learns on the fly and keeps adapting its weights via
unsupervised STDP as the drone explores the environment.
At the same time, the SNN outputs are fed to a bio-inspired
RatSLAM back-end [18] for loop closure detection and map
correction. Crucially, our continual STDP learning approach
enables the deployment of our system in environments not
captured by datasets and therefore not known a priori (vs.
offline training in state-of-the-art DNN-based SLAMs [2]).
We use our sensor fusion drone shown in Fig. 2 to jointly
acquire DVS and radar data during multiple drone flights in a
challenging indoor environment, in order to perform SLAM (see Fig. 2b). We assess the performance of our proposed
DVS-Radar SNN-STDP SLAM system against ground truth
positioning, recorded via Ultra Wide Band (UWB) beacons
[6]. The main contributions of this paper are the following:
1) We propose what is, to the best of our knowledge, the
first continual-learning SLAM system which fuses an
event-based camera and an FMCW radar using SNNs.
2) We propose a method for radar-gyroscope odometry, where radar sensing provides the drone's velocity (a simplified sketch follows this list), and a method for obstacle modelling via radar detections.
3) We experimentally assess the performance of our
SLAM system on three different flight sequences in a
challenging warehouse environment and we show the
robustness of our system to strong lighting variations.
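As a rough illustration of the odometry idea in contribution 2, the sketch below shows a generic planar dead-reckoning update in which the radar supplies the drone's ego-velocity and the gyroscope supplies the yaw rate; the function name, state layout and single-velocity assumption are illustrative choices and do not reproduce the exact formulation given in Section IV.

```python
import numpy as np

def dead_reckon(x, y, theta, v_radar, yaw_rate, dt):
    """One generic dead-reckoning step: the radar provides the ego-velocity
    magnitude v_radar (m/s) and the gyroscope the yaw rate (rad/s).
    Illustrative sketch only, not the exact odometry of Section IV."""
    theta = theta + yaw_rate * dt           # integrate gyroscope yaw rate
    x = x + v_radar * np.cos(theta) * dt    # project radar velocity on heading
    y = y + v_radar * np.sin(theta) * dt
    return x, y, theta

# Example: integrate a short trajectory from per-frame (velocity, yaw rate) pairs.
x, y, theta = 0.0, 0.0, 0.0
for v, w in [(1.0, 0.0), (1.0, 0.1), (1.2, 0.1)]:
    x, y, theta = dead_reckon(x, y, theta, v, w, dt=0.1)
```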
This paper is organized as follows. Related works are
discussed in Section II, followed by background theory
covered in Section III. Our proposed methods are presented
in Section IV. Experimental results are shown in Section V.
Conclusions are provided in Section VI.
II. RELATED WORKS
A growing number of bio-inspired [18], [19], [20] and
sensor fusion [1], [2], [4] SLAM systems have been proposed
in recent years. Among the works most closely related to ours, a DVS-RGB SLAM system has been proposed in [1], providing robust state estimates by fusing event-based cameras, RGB and raw IMU odometry. In addition, the system of [1] has
been implemented on a drone for indoor navigation and was
shown to be robust to drastic changes in lighting conditions
and in low-light scenarios. In contrast to [1], which makes
use of hand-crafted features, our system uses an SNN with
STDP learning. In addition, we do not fuse the richer information obtained from RGB as in [1], but rather fuse DVS with sparser radar detections, which makes reliable template matching of the SNN outputs challenging.
Recently, the LatentSLAM system has been proposed in
[2] as a learning-based pipeline using a DNN encoder, trained offline on a dataset capturing the environment in which the robot must navigate, to provide latent codes for template matching. The inferred latent codes are fed to the loop
closure detection and map correction back-end of the popular
RatSLAM system [18] to correct the drift in raw odometry.
In contrast, our proposed continual learning system does not
require any offline training phase, enabling its deployment in
unseen environments without the requirement of capturing a
dataset of the working environment beforehand.
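To illustrate how such latent codes can drive template matching in a RatSLAM-style back-end, the sketch below compares the current code against previously stored templates with a cosine similarity and either reports a match (a loop-closure candidate) or stores a new template; the similarity metric, threshold value and bookkeeping are assumptions made for illustration and do not reproduce the LatentSLAM implementation.

```python
import numpy as np

def match_template(latent, templates, threshold=0.9):
    """Return (index, revisit_flag) for the best-matching stored template,
    or store `latent` as a new template when nothing is similar enough.
    Illustrative sketch only; the threshold and metric are assumed values."""
    latent = latent / (np.linalg.norm(latent) + 1e-12)   # unit-normalize
    if templates:
        sims = [float(np.dot(latent, t)) for t in templates]  # cosine similarity
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            return best, True          # re-visited place: loop-closure candidate
    templates.append(latent)
    return len(templates) - 1, False   # new place: new template stored

# Example usage with a random vector standing in for an SNN/DNN latent code.
templates = []
idx, revisit = match_template(np.random.rand(64), templates)
```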
As stated earlier, we make use of the RatSLAM loop
closure detection and map correction back-end in this work
[18], [21]. RatSLAM has been proposed as a bio-inspired
system following the navigational processes of the rat’s
hippocampus [18]. Even though the original RatSLAM uses
raw RGB images for template matching, further evolutions
of RatSLAM, such as LatentSLAM, replace the raw RGB
input by the stream of associated latent codes obtained
through learned feature extraction [2]. In this work, we feed both our proposed radar-gyroscope odometry and the latent codes inferred by our continual-learning SNN-STDP fusion system to the RatSLAM back-end.
Since RGB-based SLAMs constitute today’s state of the
art, we will benchmark our DVS-Radar SLAM against both
LatentSLAM and RatSLAM, and against ORB features [22], which are extensively used in state-of-the-art SLAM systems [3], [23].
III. SNN BACKGROUND THEORY
Unlike frame-based DNNs, SNNs make use of event-driven spiking neurons as their activation function, often modelled as Leaky Integrate-and-Fire (LIF) neurons [24]:
\[
\begin{cases}
\dfrac{dV}{dt} = \dfrac{1}{\tau_m}\,(J_{in} - V) & \text{with } J_{in} = \bar{w}_{syn}^{T}\,\bar{s}(t)\\[4pt]
\sigma = 1,\ V \leftarrow 0 \ \text{if } V \geq \mu, \ \text{else } \sigma = 0
\end{cases}
\tag{1}
\]
where $\sigma$ is the spiking output, $V$ the membrane potential, $\tau_m$ the membrane time constant, $\mu$ the neuron threshold and $J_{in} = \bar{w}_{syn}^{T}\,\bar{s}(t)$ the input to the neuron, resulting from the inner product between the neuron weights $\bar{w}_{syn}$ and the spiking input vector $\bar{s}(t)$. The LIF continuously integrates its input $J_{in}$ in $V$ following (1). When $V$ crosses the firing