Fusing Event-based Camera and Radar for SLAM Using Spiking Neural Networks with Continual STDP Learning

Ali Safa1,4, Tim Verbelen2,4, Ilja Ocket4, André Bourdoux4, Hichem Sahli3,4, Francky Catthoor1,4, Georges Gielen1,4

1 Faculty of Electrical Engineering (ESAT), KU Leuven, 3001 Leuven, Belgium
2 IDLab, Ghent University, B-9052 Gent, Belgium
3 ETRO, VUB, 1050 Brussels, Belgium
4 imec, Kapeldreef 75, 3001 Leuven, Belgium
{Ali.Safa, Tim.Verbelen, Ilja.Ocket, Andre.Bourdoux, Hichem.Sahli, Francky.Catthoor}@imec.be, Georges.Gielen@kuleuven.be

This research has received funding from the Flemish Government (AI Research Program) and the European Union's ECSEL Joint Undertaking under grant agreement n° 826655 - project TEMPO.
Abstract: This work proposes a first-of-its-kind SLAM architecture fusing an event-based camera and a Frequency Modulated Continuous Wave (FMCW) radar for drone navigation. Each sensor is processed by a bio-inspired Spiking Neural Network (SNN) with continual Spike-Timing-Dependent Plasticity (STDP) learning, as observed in the brain. In contrast to most learning-based SLAM systems, our method does not require any offline training phase; rather, the SNN continuously learns features from the input data on the fly via STDP. At the same time, the SNN outputs are used as feature descriptors for loop closure detection and map correction. We conduct numerous experiments to benchmark our system against state-of-the-art RGB methods, and we demonstrate the robustness of our DVS-Radar SLAM approach under strong lighting variations.
MULTIMEDIA MATERIAL
Please watch a demo video of our SNN-based DVS-Radar fusion SLAM at https://youtu.be/a7gvZWNHGoI
I. INTRODUCTION
Simultaneous Localisation and Mapping (SLAM) is an important problem for autonomous agents such as drones [1], [2]. Most state-of-the-art SLAM systems integrate raw odometry data with feature matching using standard RGB cameras in order to detect loop closures (i.e., places already visited by the agent) and to correct the drift in the raw odometry accordingly [3]. However, RGB cameras alone do not provide the utmost robustness, as they remain sensitive to lighting variations and environmental conditions [1]. Therefore, multi-sensor SLAM systems fusing RGB with e.g., lidar, radar and event-based cameras have emerged, with increased robustness compared to RGB alone [2], [4].
Event-based cameras (also called Dynamic Vision Sensors, or DVS) are a novel type of imaging sensor composed of independent pixels x_ij that asynchronously emit spikes whenever the change in light log-intensity |L_ij| sensed by the pixel crosses a certain threshold C [5] (see Fig. 1). In contrast to RGB cameras, DVS cameras can still perform well in low-light conditions [1] and produce a spatio-temporal stream that contains patterns in both the spatial and spike-timing dimensions, making them a natural choice as input for Spiking Neural Networks (SNNs). Indeed, SNNs are bio-plausible neural networks that make use of spiking neurons, communicating via binary activations in an event-driven manner, matching the DVS principle [7].

Fig. 1: Conceptual illustration of event-based vision. a) The camera outputs a stream of spikes in both space and time. b) Each pixel x_ij fires a spike when its change in light log-intensity L_ij crosses a threshold. The spike is positive when L_ij > 0 (red dots) and negative (blue dots) otherwise.

Fig. 2: a) Our sensor fusion drone setup used for jointly acquiring event camera and radar data. The warehouse environment (detailed in [6]) in which the drone navigates is also shown. The warehouse is equipped with Ultra Wide Band (UWB) localisation for ground truth acquisition [6]. b) Event and radar data are fed to SNNs that continually learn on the fly via STDP (see Fig. 4). The SNN output is used for loop closure detection to perform SLAM. Radar detections are also used to model the obstacles (black dots in b).
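To make the event-generation principle of Fig. 1 concrete, the following Python sketch emulates a DVS pixel array from a stack of intensity frames. The frame-based emulation, the threshold value C = 0.2 and the function name dvs_events_from_frames are illustrative assumptions, not the model of the actual sensor used in this work.

    import numpy as np

    def dvs_events_from_frames(frames, C=0.2, eps=1e-6):
        """Emulate DVS events from a stack of intensity frames of shape (T, H, W).

        A pixel (i, j) emits an event at time t when the change in light
        log-intensity since its last event crosses the threshold C.
        Returns a list of (t, i, j, polarity) tuples.
        """
        log_I = np.log(frames.astype(np.float64) + eps)
        ref = log_I[0].copy()              # log-intensity at each pixel's last event
        events = []
        for t in range(1, log_I.shape[0]):
            delta = log_I[t] - ref
            fired = np.abs(delta) >= C
            for i, j in zip(*np.nonzero(fired)):
                events.append((t, i, j, 1 if delta[i, j] > 0 else -1))
            ref[fired] = log_I[t][fired]   # reset the reference where events fired
        return events

    # Example with random frames standing in for real camera data
    events = dvs_events_from_frames(np.random.rand(10, 64, 64) * 255 + 1.0)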
SNNs have recently gained huge attention for ultra-low-energy and -area processing in power-constrained edge devices such as drones [8]. In addition, the use of event-driven learning rules such as Spike-Timing-Dependent Plasticity (STDP) [9] enables unsupervised learning at the edge using emerging sub-milliwatt neuromorphic chips [10], [11], as opposed to power-hungry GPUs. Furthermore, SNN-STDP systems are also a good choice for continual, online learning [12], dropping the need for offline SNN training. Hence, this work studies the use of SNNs equipped with STDP learning for fusing DVS and radar data on power-constrained drones.
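As a rough illustration of the kind of event-driven learning rule mentioned above, the Python sketch below implements a generic pair-based STDP update with exponential traces. The trace time constant, learning rates and weight clipping are assumed values for illustration; they do not reproduce the exact SNN-STDP rule used in our system.

    import numpy as np

    def stdp_step(w, pre_spikes, post_spikes, x_pre, x_post,
                  a_plus=0.01, a_minus=0.012, tau=20.0, dt=1.0):
        """One pair-based STDP update for a fully connected layer.

        w:            (N_post, N_pre) weight matrix
        pre_spikes:   (N_pre,)  binary spike vector at the current time step
        post_spikes:  (N_post,) binary spike vector at the current time step
        x_pre/x_post: exponentially decaying pre- and post-synaptic traces
        """
        # Decay the traces, then add the new spikes
        x_pre = x_pre * np.exp(-dt / tau) + pre_spikes
        x_post = x_post * np.exp(-dt / tau) + post_spikes
        # Potentiate when a post-spike follows recent pre-activity,
        # depress when a pre-spike follows recent post-activity
        dw = a_plus * np.outer(post_spikes, x_pre) - a_minus * np.outer(x_post, pre_spikes)
        w = np.clip(w + dw, 0.0, 1.0)
        return w, x_pre, x_post

    # Example usage for a 16-input, 4-output layer with random spikes
    w, x_pre, x_post = np.random.rand(4, 16) * 0.5, np.zeros(16), np.zeros(4)
    w, x_pre, x_post = stdp_step(w, (np.random.rand(16) < 0.1).astype(float),
                                 (np.random.rand(4) < 0.1).astype(float), x_pre, x_post)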
The use of radar sensing for drone navigation has recently been investigated in a growing number of works [13], [14], [15], [16], [17]. Indeed, fusing radar sensing with vision-based sensors such as DVS is attractive, since radars intrinsically provide information complementary to camera vision, such as target velocity and position in a range-azimuth map (i.e., a bird's eye view, as opposed to the projective plane of cameras) [17]. In addition, radars are robust to environmental conditions, as they are not sensitive to occlusion by dirt and can sense in the dark [14].
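To give a concrete feel for the bird's-eye-view information a radar provides, the helper below converts a list of (range, azimuth) detections into Cartesian points, as one would do to obtain obstacle dots like those in Fig. 2b. The detection format and the function name are assumptions for illustration, not the output of our specific FMCW radar pipeline.

    import numpy as np

    def radar_detections_to_bev(detections):
        """Convert (range [m], azimuth [rad]) radar detections to 2D points.

        Returns an (N, 2) array of (x, y) obstacle positions in the radar
        frame, with x pointing forward and y to the left.
        """
        det = np.asarray(detections, dtype=np.float64)       # shape (N, 2)
        r, az = det[:, 0], det[:, 1]
        return np.stack([r * np.cos(az), r * np.sin(az)], axis=1)

    # Example: detections at 5 m straight ahead and 10 m at +30 degrees
    points = radar_detections_to_bev([(5.0, 0.0), (10.0, np.deg2rad(30.0))])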
This work deviates from most fusion-based SLAM systems by proposing a first-of-its-kind, bio-inspired SLAM system fusing an event-based camera with a radar (see Figs. 2 and 4), using SNNs that continuously learn via STDP, as observed in the brain [9]. It also deviates from most learning-based SLAM systems, which typically require the offline training of a Deep Neural Network (DNN) on a dataset of the working environment captured beforehand [2]. In contrast, our DVS-Radar fusion SNN learns on the fly and keeps adapting its weights via unsupervised STDP as the drone explores the environment. At the same time, the SNN outputs are fed to a bio-inspired RatSLAM back-end [18] for loop closure detection and map correction. Crucially, our continual STDP learning approach enables the deployment of our system in environments not captured by datasets and therefore not known a priori (vs. offline training in state-of-the-art DNN-based SLAM systems [2]).
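As a simplified picture of how SNN outputs can act as feature descriptors for loop closure detection, the sketch below matches the current descriptor against previously stored place templates using cosine similarity. The similarity measure, the threshold and the function name are illustrative assumptions standing in for the RatSLAM template-matching machinery, not our exact back-end.

    import numpy as np

    def detect_loop_closure(descriptor, templates, threshold=0.9):
        """Return the index of the best-matching stored template, or None.

        descriptor: 1D array of spike counts/rates output by the fusion SNN
        templates:  list of previously stored descriptors (visited places)
        """
        if not templates:
            return None
        d = descriptor / (np.linalg.norm(descriptor) + 1e-9)
        T = np.stack([t / (np.linalg.norm(t) + 1e-9) for t in templates])
        sims = T @ d                       # cosine similarity to every template
        best = int(np.argmax(sims))
        return best if sims[best] >= threshold else None

    # Example with random descriptors of dimension 128
    idx = detect_loop_closure(np.random.rand(128), [np.random.rand(128) for _ in range(5)])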
We use our sensor fusion drone shown in Fig. 2 to jointly acquire DVS and radar data during multiple drone flights in a challenging indoor environment, in order to perform SLAM (see Fig. 2b). We assess the performance of our proposed DVS-Radar SNN-STDP SLAM system against ground truth positioning, recorded via Ultra Wide Band (UWB) beacons [6]. The main contributions of this paper are the following:
1) We propose what is, to the best of our knowledge, the first continual-learning SLAM system which fuses an event-based camera and an FMCW radar using SNNs.
2) We propose a method for radar-gyroscope odometry, where radar sensing provides the drone's velocity, and a method for obstacle modelling via radar detections (see the sketch after this list).
3) We experimentally assess the performance of our SLAM system on three different flight sequences in a challenging warehouse environment and we show the robustness of our system to strong lighting variations.
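The following sketch illustrates the generic dead-reckoning step underlying such a radar-gyroscope odometry: a forward speed (here assumed to come from the radar) and a yaw rate from the gyroscope are integrated into a 2D pose. This is a minimal assumed formulation for illustration; the actual odometry method is described in Section IV.

    import numpy as np

    def dead_reckoning_step(pose, v_forward, yaw_rate, dt):
        """Integrate one odometry step.

        pose:      (x, y, theta) current 2D pose estimate
        v_forward: forward speed [m/s], e.g. estimated from radar returns
        yaw_rate:  angular velocity [rad/s] from the gyroscope
        dt:        time step [s]
        """
        x, y, theta = pose
        theta_new = theta + yaw_rate * dt
        x_new = x + v_forward * np.cos(theta_new) * dt
        y_new = y + v_forward * np.sin(theta_new) * dt
        return (x_new, y_new, theta_new)

    # Example: one 50 ms step at 1.2 m/s while turning slowly
    pose = dead_reckoning_step((0.0, 0.0, 0.0), v_forward=1.2, yaw_rate=0.1, dt=0.05)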
This paper is organized as follows. Related works are discussed in Section II, followed by background theory covered in Section III. Our proposed methods are presented in Section IV. Experimental results are shown in Section V. Conclusions are provided in Section VI.
II. RELATED WORKS
A growing number of bio-inspired [18], [19], [20] and sensor fusion [1], [2], [4] SLAM systems have been proposed in recent years. Among the most related to this work, a DVS-RGB SLAM system has been proposed in [1], providing robust state estimation by fusing event-based cameras, RGB and raw IMU odometry. In addition, the system of [1] has been implemented on a drone for indoor navigation and was shown to be robust to drastic changes in lighting conditions and to low-light scenarios. In contrast to [1], which makes use of hand-crafted features, our system uses an SNN with STDP learning. In addition, we do not fuse the richer information obtained from RGB as in [1], but rather fuse DVS with sparser radar detections, which makes the reliable template matching of the SNN outputs challenging.
Recently, the LatentSLAM system has been proposed in [2] as a learning-based pipeline using a DNN encoder which provides latent codes for template matching, trained offline on a dataset capturing the environment in which the robot must navigate. The inferred latent codes are fed to the loop closure detection and map correction back-end of the popular RatSLAM system [18] to correct the drift in raw odometry. In contrast, our proposed continual learning system does not require any offline training phase, enabling its deployment in unseen environments without the requirement of capturing a dataset of the working environment beforehand.
As stated earlier, we make use of the RatSLAM loop closure detection and map correction back-end in this work [18], [21]. RatSLAM has been proposed as a bio-inspired system following the navigational processes of the rat's hippocampus [18]. Even though the original RatSLAM uses raw RGB images for template matching, further evolutions of RatSLAM, such as LatentSLAM, replace the raw RGB input by the stream of associated latent codes obtained through learned feature extraction [2]. In this work, we feed both our proposed radar-gyroscope odometry and the latent codes inferred by our proposed continual learning SNN-STDP fusion system to the RatSLAM back-end.
Since RGB-based SLAM systems constitute today's state of the art, we will benchmark our DVS-Radar SLAM against both LatentSLAM and RatSLAM, as well as against ORB features [22], which are extensively used in state-of-the-art SLAM systems [3], [23].
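For reference, an ORB-based place-matching baseline of the kind used in [3], [22], [23] can be sketched with OpenCV as follows. The detector settings, the distance threshold and the use of brute-force Hamming matching are our own illustrative choices, not necessarily those of the cited systems.

    import cv2

    def orb_match_score(img_a, img_b, n_features=500):
        """Return the number of good ORB matches between two grayscale images,
        usable as a simple place-similarity score."""
        orb = cv2.ORB_create(nfeatures=n_features)
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        if des_a is None or des_b is None:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_a, des_b)
        # Keep only reasonably close descriptor pairs
        return sum(1 for m in matches if m.distance < 40)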
III. SNN BACKGROUND THEORY
Unlike frame-based DNNs, SNNs make use of event-driven spiking neurons as activation functions, often modelled as Leaky Integrate and Fire (LIF) neurons [24]:

    \frac{dV}{dt} = \frac{1}{\tau_m}\,(J_{in} - V), \quad \text{with } J_{in} = \bar{w}_{syn}^{T}\,\bar{s}(t); \qquad \sigma = 1,\ V \leftarrow 0 \ \text{ if } V \geq \mu, \ \text{else } \sigma = 0 \qquad (1)

where σ is the spiking output, V the membrane potential, τ_m the membrane time constant, µ the neuron threshold and J_in = w_syn^T s(t) the input to the neuron, resulting from the inner product between the neuron weight vector w_syn and the spiking input vector s(t). The LIF neuron continuously integrates its input J_in into V following (1). When V crosses the firing threshold µ, the neuron emits an output spike (σ = 1) and V is reset to zero.
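A minimal discrete-time simulation of the LIF model in (1), using forward-Euler integration, is sketched below; the parameter values and input spike statistics are placeholders, not those used in our SNN.

    import numpy as np

    def simulate_lif(w_syn, s, tau_m=20.0, mu=1.0, dt=1.0):
        """Forward-Euler simulation of a single LIF neuron, following (1).

        w_syn: (N,) synaptic weight vector
        s:     (T, N) binary input spike trains
        Returns the (T,) output spike train sigma.
        """
        V = 0.0
        sigma = np.zeros(s.shape[0])
        for t in range(s.shape[0]):
            J_in = w_syn @ s[t]                  # input current from the spike vector
            V += (dt / tau_m) * (J_in - V)       # leaky membrane integration
            if V >= mu:                          # threshold crossing
                sigma[t] = 1.0
                V = 0.0                          # reset after the spike
        return sigma

    # Example: 100 time steps of random input spikes to a 64-input neuron
    out = simulate_lif(np.random.rand(64) * 0.1, (np.random.rand(100, 64) < 0.05).astype(float))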