(ii) Events generated by the same object are spatially close
to each other.
Based on these two properties, we aim to group events
along two dimensions, spatial and temporal, to isolate events
by their source objects and thus identify object boundaries.
Biologically inspired spiking neurons, specifically Leaky
Integrate and Fire (LIF) neurons [9], can leverage spatio-temporal
information and are hence well suited for detecting boundaries
along both of these dimensions. These neurons mimic biological
neural activity and respond to the temporal properties of their
inputs: a neuron generates an output spike only if its input
events arrive at a rate above a certain frequency. A neural
architecture built from such spiking neurons is therefore
sensitive to the temporal structure of the input events. The
connectivity between the neurons in a network can also be
arranged so that spatially close events support each other.
Further, spiking neurons operate asynchronously, making them
compatible with the asynchronous outputs of event cameras.
These properties, together with the fact that spiking neural
architectures can be more energy efficient than their artificial
neural counterparts (as shown in [10]–[12]), make them an ideal
candidate for object detection with low latency and energy
overhead in autonomous navigation systems.
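To make the temporal behavior concrete, the following is a minimal sketch of an event-driven LIF neuron. It is illustrative only: the exact model details and all parameter values here are assumptions, not those of our architecture. The membrane potential leaks between events, so only a sufficiently high input event rate pushes it over the firing threshold.

```python
import math

class LIFNeuron:
    """Event-driven leaky integrate-and-fire neuron (illustrative sketch).
    The membrane potential decays between input events and is incremented
    on each event; a spike is emitted when it crosses a threshold."""

    def __init__(self, tau=10e-3, threshold=1.0, weight=0.3):
        self.tau = tau              # leak time constant in seconds (assumed value)
        self.threshold = threshold  # firing threshold (assumed value)
        self.weight = weight        # potential increment per event (assumed value)
        self.v = 0.0                # membrane potential
        self.t_last = 0.0           # timestamp of the previous input event

    def receive(self, t):
        """Integrate one input event at time t; return True on a spike."""
        # Leak: exponential decay of the potential since the last event.
        self.v *= math.exp(-(t - self.t_last) / self.tau)
        self.t_last = t
        # Integrate: accumulate the incoming event's contribution.
        self.v += self.weight
        # Fire: only events arriving faster than the leak can dissipate
        # them push the potential over the threshold.
        if self.v >= self.threshold:
            self.v = 0.0  # reset after the spike
            return True
        return False
```

With these illustrative values (tau = 10 ms, weight = 0.3, threshold = 1.0), the neuron spikes only when successive events arrive within a few milliseconds of each other; slower event streams decay away before reaching the threshold.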
In this work, we develop a novel, energy-efficient object
detection technique that first isolates objects based on their
speed of movement, using inputs from an event camera and a
lightweight, single-layer spiking neural network. Once objects
are separated by speed, their corresponding events are grouped
based on their spatial characteristics. For this, we utilize
existing clustering techniques to further separate events
belonging to different objects based on their spatial proximity.
Here, we take advantage of the fact that events from the same
object share similar temporal characteristics, such as speed of
movement, and spatial characteristics, such as originating from
pixels that are close together. Our experiments show that this
approach detects objects more efficiently in terms of both
latency and energy consumption. By isolating events based on
the speed of their corresponding objects, we also eliminate
non-relevant information caused by noise and static background
objects. (If static objects must also be detected, this can be
done by clustering the residual events that do not propagate
through the spiking architecture.) Further, this benefits the
clustering techniques, whose operational complexity is lowered
by the reduced number of samples (events) that need to be
clustered.
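The sketch below shows how the two stages could compose, reusing the hypothetical LIFNeuron class above. DBSCAN stands in for any clustering method that needs no prior knowledge of the cluster count, and the parameter values are placeholders; the sketch also batches events for readability, whereas the actual system operates asynchronously as events arrive.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def detect_objects(events, lif_grid, eps=5.0, min_samples=10):
    """Illustrative two-stage pipeline. `events` is an iterable of
    (x, y, t) tuples; `lif_grid` is a 2D list of LIFNeuron objects,
    one per pixel. eps and min_samples are assumed DBSCAN settings."""
    # Stage 1 (temporal): route each event to the LIF neuron at its
    # pixel and keep only events whose neuron spikes, i.e., events
    # arriving fast enough to beat the leak. Noise and slow background
    # events are filtered out here.
    kept = [(x, y) for (x, y, t) in events if lif_grid[y][x].receive(t)]
    if not kept:
        return []

    # Stage 2 (spatial): group the surviving events by pixel proximity.
    # DBSCAN requires no prior knowledge of the number or size of clusters.
    pts = np.array(kept)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)

    # Report one bounding box per cluster (label -1 is DBSCAN's noise label).
    boxes = []
    for lbl in set(labels) - {-1}:
        cluster = pts[labels == lbl]
        boxes.append((cluster.min(axis=0), cluster.max(axis=0)))
    return boxes
```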
We summarize the main contributions as follows:
1) We develop an object detection algorithm that solely
relies on the event camera outputs and does not need any
additional information from traditional frame cameras.
2) Unlike many existing works on object detection, which
accumulate events over a time duration to create a frame,
we perform object detection asynchronously as the events
are generated by the event camera.
3) The proposed spike-based network for separating ob-
jects based on their motion consists of a single layer,
resulting in a detection algorithm with lower latency and
energy overhead.
4) The outputs of the proposed spiking architecture (which
isolates objects based on their speed), can be used with
any spatial clustering technique that does not require
prior knowledge of the number or the size of clusters.
5) The spiking architecture is scene-independent: its
parameters do not need to be trained for the scene of
deployment. They directly correspond to the speed of
the objects and can be fine-tuned prior to deployment.
II. RELATED WORK
A. Object Detection in the Event Camera Domain
Object detection is a topic that has been extensively
studied in the computer vision community. There have been
works ranging from simple feature detectors [6], [13], to
more complex learning-based methods [14]. There has also
been significant interest in the learning community in
neural networks that not only detect objects but also
classify them [7]. However, in autonomous navigation, where
object classification is not a priority, the latency and
energy efficiency of the underlying algorithms take precedence.
As discussed earlier, traditional frame-based algorithms
fail to operate on event camera outputs due to the absence of
photometric characteristics such as texture and light intensity.
However, owing to the numerous advantages of event cam-
eras, including higher operating speed, wider dynamic range,
and lower power consumption, there has been a substantial
interest in the community to develop algorithms that are more
suited towards this domain.
Initial event-based detection algorithms, such as [15], [16],
focused on detecting patterns present in the event camera
output. The authors in [17] used a simple blob detector
to detect the inherent patterns present in the event data, and
[18] used a plane fitting method on the distribution of events
to identify corners. A recent work adapted Gaussian mixture
modelling to detect patterns in the event data [19]. These
methods, however, fail in scenarios where there are events
generated by the background. As a solution, [20] proposed a
motion compensation technique to eliminate events generated
by the background, by estimating the system’s ego-motion.
The optimization involved, however, adds significant latency
and computational overhead to the system.
To improve detection accuracy, several recent research
efforts have focused on utilizing information from both
frame and event cameras [21], [22]. These hybrid methods
detect features on the frames and track the objects through
events. Since their detection relies on frame inputs, they
cannot operate in scenarios with a wide dynamic range and
are computationally expensive.