Fusing Event-based Camera and Radar for SLAM Using Spiking Neural Networks with Continual STDP Learning

Ali Safa1,4, Tim Verbelen2,4, Ilja Ocket4, André Bourdoux4, Hichem Sahli3,4, Francky Catthoor1,4, Georges Gielen1,4

1 Faculty of Electrical Engineering (ESAT), KU Leuven, 3001 Leuven, Belgium
2 IDLab, Ghent University, B-9052 Gent, Belgium
3 ETRO, VUB, 1050 Brussels, Belgium
4 imec, Kapeldreef 75, 3001 Leuven, Belgium
{Ali.Safa, Tim.Verbelen, Ilja.Ocket, Andre.Bourdoux, Hichem.Sahli, Francky.Catthoor}@imec.be, Georges.Gielen@kuleuven.be

This research has received funding from the Flemish Government (AI Research Program) and the European Union's ECSEL Joint Undertaking under grant agreement n° 826655 - project TEMPO.
Abstract: This work proposes a first-of-its-kind SLAM architecture fusing an event-based camera and a Frequency Modulated Continuous Wave (FMCW) radar for drone navigation. Each sensor is processed by a bio-inspired Spiking Neural Network (SNN) with continual Spike-Timing-Dependent Plasticity (STDP) learning, as observed in the brain. In contrast to most learning-based SLAM systems, our method does not require any offline training phase; rather, the SNN continuously learns features from the input data on the fly via STDP. At the same time, the SNN outputs are used as feature descriptors for loop closure detection and map correction. We conduct numerous experiments to benchmark our system against state-of-the-art RGB methods, and we demonstrate the robustness of our DVS-Radar SLAM approach under strong lighting variations.
MULTIMEDIA MATERIAL
Please watch a demo video of our SNN-based DVS-Radar fusion SLAM at https://youtu.be/a7gvZWNHGoI
I. INTRODUCTION
Simultaneous Localisation and Mapping (SLAM) is an important problem for autonomous agents such as drones [1], [2]. Most state-of-the-art SLAM systems integrate raw odometry data with feature matching using standard RGB cameras in order to detect loop closures (i.e., places already visited by the agent) and to correct the drift in the raw odometry accordingly [3]. However, RGB cameras alone do not provide the utmost robustness, as they remain sensitive to lighting variations and environmental conditions [1]. Therefore, multi-sensor SLAM systems fusing RGB with e.g., lidar, radar and event-based cameras have emerged, with increased robustness compared to RGB alone [2], [4].
Event-based cameras (also called Dynamic Vision Sensors, or DVS) are a novel type of imaging sensor composed of independent pixels x_ij that asynchronously emit spikes whenever the change in light log-intensity |L_ij| sensed by the pixel crosses a certain threshold C [5] (see Fig. 1). In contrast to RGB cameras, DVS cameras can still perform well in low-light conditions [1] and produce a spatio-temporal stream that contains patterns in both the spatial and spike-timing dimensions, making them a natural choice as input for Spiking Neural Networks (SNNs). Indeed, SNNs are bio-plausible neural networks that make use of spiking neurons, communicating via binary activations in an event-driven manner, matching the DVS principle [7].

Fig. 1: Conceptual illustration of event-based vision. a) The camera outputs a stream of spikes in both space and time. b) Each pixel x_ij fires a spike when its change in light log-intensity L_ij crosses a threshold. The spike is positive when L_ij > 0 (red dots) and negative (blue dots) otherwise.

Fig. 2: a) Our sensor fusion drone setup used for jointly acquiring event camera and radar data. The warehouse environment (detailed in [6]) in which the drone navigates is also shown. The warehouse is equipped with Ultra Wide Band (UWB) localisation for ground truth acquisition [6]. b) Event and radar data are fed to SNNs that continually learn on the fly via STDP (see Fig. 4). The SNN output is used for loop closure detection to perform SLAM. Radar detections are also used to model the obstacles (black dots in b).
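To make the event-generation principle of Fig. 1 concrete, the following Python sketch emulates a DVS pixel array from a stack of intensity frames. The frame-based emulation, the threshold value C = 0.2 and the function name dvs_events_from_frames are illustrative assumptions, not the model of the actual sensor used in this work.

    import numpy as np

    def dvs_events_from_frames(frames, C=0.2, eps=1e-6):
        """Emulate DVS events from a stack of intensity frames of shape (T, H, W).

        A pixel (i, j) emits an event at time t when the change in light
        log-intensity since its last event crosses the threshold C.
        Returns a list of (t, i, j, polarity) tuples.
        """
        log_I = np.log(frames.astype(np.float64) + eps)
        ref = log_I[0].copy()              # log-intensity at each pixel's last event
        events = []
        for t in range(1, log_I.shape[0]):
            delta = log_I[t] - ref
            fired = np.abs(delta) >= C
            for i, j in zip(*np.nonzero(fired)):
                events.append((t, i, j, 1 if delta[i, j] > 0 else -1))
            ref[fired] = log_I[t][fired]   # reset the reference where events fired
        return events

    # Example with random frames standing in for real camera data
    events = dvs_events_from_frames(np.random.rand(10, 64, 64) * 255 + 1.0)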
SNNs have recently gained huge attention for ultra-low-energy and -area processing in power-constrained edge devices such as drones [8]. In addition, the use of event-driven learning rules such as Spike-Timing-Dependent Plasticity (STDP) [9] enables unsupervised learning at the edge using emerging sub-milliwatt neuromorphic chips [10], [11], as opposed to power-hungry GPUs. Furthermore, SNN-STDP systems are also a good choice for continual, online learning [12], dropping the need for offline SNN training. Hence, this work studies the use of SNNs equipped with STDP learning for fusing DVS and radar data on power-constrained drones.
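As a rough illustration of the kind of event-driven learning rule mentioned above, the Python sketch below implements a generic pair-based STDP update with exponential traces. The trace time constant, learning rates and weight clipping are assumed values for illustration; they do not reproduce the exact SNN-STDP rule used in our system.

    import numpy as np

    def stdp_step(w, pre_spikes, post_spikes, x_pre, x_post,
                  a_plus=0.01, a_minus=0.012, tau=20.0, dt=1.0):
        """One pair-based STDP update for a fully connected layer.

        w:            (N_post, N_pre) weight matrix
        pre_spikes:   (N_pre,)  binary spike vector at the current time step
        post_spikes:  (N_post,) binary spike vector at the current time step
        x_pre/x_post: exponentially decaying pre- and post-synaptic traces
        """
        # Decay the traces, then add the new spikes
        x_pre = x_pre * np.exp(-dt / tau) + pre_spikes
        x_post = x_post * np.exp(-dt / tau) + post_spikes
        # Potentiate when a post-spike follows recent pre-activity,
        # depress when a pre-spike follows recent post-activity
        dw = a_plus * np.outer(post_spikes, x_pre) - a_minus * np.outer(x_post, pre_spikes)
        w = np.clip(w + dw, 0.0, 1.0)
        return w, x_pre, x_post

    # Example usage for a 16-input, 4-output layer with random spikes
    w, x_pre, x_post = np.random.rand(4, 16) * 0.5, np.zeros(16), np.zeros(4)
    w, x_pre, x_post = stdp_step(w, (np.random.rand(16) < 0.1).astype(float),
                                 (np.random.rand(4) < 0.1).astype(float), x_pre, x_post)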
The use of radar sensing for drone navigation has recently been investigated in a growing number of works [13], [14], [15], [16], [17]. Indeed, fusing radar sensing with vision-based sensors such as DVS is attractive, since radars intrinsically provide information complementary to camera vision, such as target velocity and position in a range-azimuth map (i.e., a bird's eye view, as opposed to the projective plane of cameras) [17]. In addition, radars are robust to environmental conditions, as they are not sensitive to occlusion by dirt and can sense in the dark [14].
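To give a concrete feel for the bird's-eye-view information a radar provides, the helper below converts a list of (range, azimuth) detections into Cartesian points, as one would do to obtain obstacle dots like those in Fig. 2b. The detection format and the function name are assumptions for illustration, not the output of our specific FMCW radar pipeline.

    import numpy as np

    def radar_detections_to_bev(detections):
        """Convert (range [m], azimuth [rad]) radar detections to 2D points.

        Returns an (N, 2) array of (x, y) obstacle positions in the radar
        frame, with x pointing forward and y to the left.
        """
        det = np.asarray(detections, dtype=np.float64)       # shape (N, 2)
        r, az = det[:, 0], det[:, 1]
        return np.stack([r * np.cos(az), r * np.sin(az)], axis=1)

    # Example: detections at 5 m straight ahead and 10 m at +30 degrees
    points = radar_detections_to_bev([(5.0, 0.0), (10.0, np.deg2rad(30.0))])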
This work deviates from most fusion-based SLAM systems by proposing a first-of-its-kind, bio-inspired SLAM system fusing an event-based camera with a radar (see Figs. 2 and 4), using SNNs that continuously learn via STDP, as observed in the brain [9]. It also deviates from most learning-based SLAM systems, which typically require the offline training of a Deep Neural Network (DNN) on a dataset of the working environment captured beforehand [2]. In contrast, our DVS-Radar fusion SNN learns on the fly and keeps adapting its weights via unsupervised STDP as the drone explores the environment. At the same time, the SNN outputs are fed to a bio-inspired RatSLAM back-end [18] for loop closure detection and map correction. Crucially, our continual STDP learning approach enables the deployment of our system in environments not captured by datasets and therefore not known a priori (vs. offline training in state-of-the-art DNN-based SLAM systems [2]).
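As a simplified picture of how SNN outputs can act as feature descriptors for loop closure detection, the sketch below matches the current descriptor against previously stored place templates using cosine similarity. The similarity measure, the threshold and the function name are illustrative assumptions standing in for the RatSLAM template-matching machinery, not our exact back-end.

    import numpy as np

    def detect_loop_closure(descriptor, templates, threshold=0.9):
        """Return the index of the best-matching stored template, or None.

        descriptor: 1D array of spike counts/rates output by the fusion SNN
        templates:  list of previously stored descriptors (visited places)
        """
        if not templates:
            return None
        d = descriptor / (np.linalg.norm(descriptor) + 1e-9)
        T = np.stack([t / (np.linalg.norm(t) + 1e-9) for t in templates])
        sims = T @ d                       # cosine similarity to every template
        best = int(np.argmax(sims))
        return best if sims[best] >= threshold else None

    # Example with random descriptors of dimension 128
    idx = detect_loop_closure(np.random.rand(128), [np.random.rand(128) for _ in range(5)])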
We use our sensor fusion drone shown in Fig. 2 to jointly acquire DVS and radar data during multiple drone flights in a challenging indoor environment, in order to perform SLAM (see Fig. 2b). We assess the performance of our proposed DVS-Radar SNN-STDP SLAM system against ground truth positioning, recorded via Ultra Wide Band (UWB) beacons [6]. The main contributions of this paper are the following:
1) We propose what is, to the best of our knowledge, the first continual-learning SLAM system which fuses an event-based camera and an FMCW radar using SNNs.
2) We propose a method for radar-gyroscope odometry, where radar sensing provides the drone's velocity, and a method for obstacle modelling via radar detections (see the sketch after this list).
3) We experimentally assess the performance of our SLAM system on three different flight sequences in a challenging warehouse environment and we show the robustness of our system to strong lighting variations.
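The following sketch illustrates the generic dead-reckoning step underlying such a radar-gyroscope odometry: a forward speed (here assumed to come from the radar) and a yaw rate from the gyroscope are integrated into a 2D pose. This is a minimal assumed formulation for illustration; the actual odometry method is described in Section IV.

    import numpy as np

    def dead_reckoning_step(pose, v_forward, yaw_rate, dt):
        """Integrate one odometry step.

        pose:      (x, y, theta) current 2D pose estimate
        v_forward: forward speed [m/s], e.g. estimated from radar returns
        yaw_rate:  angular velocity [rad/s] from the gyroscope
        dt:        time step [s]
        """
        x, y, theta = pose
        theta_new = theta + yaw_rate * dt
        x_new = x + v_forward * np.cos(theta_new) * dt
        y_new = y + v_forward * np.sin(theta_new) * dt
        return (x_new, y_new, theta_new)

    # Example: one 50 ms step at 1.2 m/s while turning slowly
    pose = dead_reckoning_step((0.0, 0.0, 0.0), v_forward=1.2, yaw_rate=0.1, dt=0.05)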
This paper is organized as follows. Related works are discussed in Section II, followed by background theory covered in Section III. Our proposed methods are presented in Section IV. Experimental results are shown in Section V. Conclusions are provided in Section VI.
II. RELATED WORKS
A growing number of bio-inspired [18], [19], [20] and sensor fusion [1], [2], [4] SLAM systems have been proposed in recent years. Among the most related to this work, a DVS-RGB SLAM system has been proposed in [1], providing robust state estimation by fusing event-based cameras, RGB and raw IMU odometry. In addition, the system of [1] has been implemented on a drone for indoor navigation and was shown to be robust to drastic changes in lighting conditions and to low-light scenarios. In contrast to [1], which makes use of hand-crafted features, our system uses an SNN with STDP learning. In addition, we do not fuse the richer information obtained from RGB as in [1], but rather fuse DVS with sparser radar detections, which makes the reliable template matching of the SNN outputs challenging.
Recently, the LatentSLAM system has been proposed in [2] as a learning-based pipeline using a DNN encoder which provides latent codes for template matching, trained offline on a dataset capturing the environment in which the robot must navigate. The inferred latent codes are fed to the loop closure detection and map correction back-end of the popular RatSLAM system [18] to correct the drift in raw odometry. In contrast, our proposed continual learning system does not require any offline training phase, enabling its deployment in unseen environments without the requirement of capturing a dataset of the working environment beforehand.
As stated earlier, we make use of the RatSLAM loop closure detection and map correction back-end in this work [18], [21]. RatSLAM has been proposed as a bio-inspired system following the navigational processes of the rat's hippocampus [18]. Even though the original RatSLAM uses raw RGB images for template matching, further evolutions of RatSLAM, such as LatentSLAM, replace the raw RGB input by the stream of associated latent codes obtained through learned feature extraction [2]. In this work, we feed both our proposed radar-gyroscope odometry and the latent codes inferred by our proposed continual learning SNN-STDP fusion system to the RatSLAM back-end.
Since RGB-based SLAM systems constitute today's state of the art, we will benchmark our DVS-Radar SLAM against both LatentSLAM and RatSLAM, as well as against ORB features [22], which are extensively used in state-of-the-art SLAM systems [3], [23].
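For reference, an ORB-based place-matching baseline of the kind used in [3], [22], [23] can be sketched with OpenCV as follows. The detector settings, the distance threshold and the use of brute-force Hamming matching are our own illustrative choices, not necessarily those of the cited systems.

    import cv2

    def orb_match_score(img_a, img_b, n_features=500):
        """Return the number of good ORB matches between two grayscale images,
        usable as a simple place-similarity score."""
        orb = cv2.ORB_create(nfeatures=n_features)
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        if des_a is None or des_b is None:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_a, des_b)
        # Keep only reasonably close descriptor pairs
        return sum(1 for m in matches if m.distance < 40)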
III. SNN BACKGROUND THEORY
Unlike frame-based DNNs, SNNs make use of event-driven spiking neurons as activation functions, often modelled as Leaky Integrate and Fire (LIF) neurons [24]:

    \frac{dV}{dt} = \frac{1}{\tau_m}\,(J_{in} - V), \quad \text{with } J_{in} = \bar{w}_{syn}^{T}\,\bar{s}(t); \qquad \sigma = 1,\ V \leftarrow 0 \ \text{ if } V \geq \mu, \ \text{else } \sigma = 0 \qquad (1)

where σ is the spiking output, V the membrane potential, τ_m the membrane time constant, µ the neuron threshold and J_in = w_syn^T s(t) the input to the neuron, resulting from the inner product between the neuron weight vector w_syn and the spiking input vector s(t). The LIF neuron continuously integrates its input J_in into V following (1). When V crosses the firing threshold µ, the neuron emits an output spike (σ = 1) and V is reset to zero.
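A minimal discrete-time simulation of the LIF model in (1), using forward-Euler integration, is sketched below; the parameter values and input spike statistics are placeholders, not those used in our SNN.

    import numpy as np

    def simulate_lif(w_syn, s, tau_m=20.0, mu=1.0, dt=1.0):
        """Forward-Euler simulation of a single LIF neuron, following (1).

        w_syn: (N,) synaptic weight vector
        s:     (T, N) binary input spike trains
        Returns the (T,) output spike train sigma.
        """
        V = 0.0
        sigma = np.zeros(s.shape[0])
        for t in range(s.shape[0]):
            J_in = w_syn @ s[t]                  # input current from the spike vector
            V += (dt / tau_m) * (J_in - V)       # leaky membrane integration
            if V >= mu:                          # threshold crossing
                sigma[t] = 1.0
                V = 0.0                          # reset after the spike
        return sigma

    # Example: 100 time steps of random input spikes to a 64-input neuron
    out = simulate_lif(np.random.rand(64) * 0.1, (np.random.rand(100, 64) < 0.05).astype(float))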