Event-based Temporally Dense Optical Flow Estimation with Sequential Learning
Wachirawit Ponghiran, Chamika Mihiranga Liyanagedera, Kaushik Roy
Purdue University
West Lafayette, IN 47907, USA
{wponghir,cliyanag,kaushik}@purdue.edu
Abstract
Event cameras provide an advantage over traditional frame-based cameras when capturing fast-moving objects without motion blur. They achieve this by recording changes in light intensity (known as events), allowing them to operate at a much higher frequency and making them suitable for capturing motion in highly dynamic scenes. Many recent studies have proposed methods to train neural networks (NNs) to predict optical flow from events. However, they often rely on a spatio-temporal representation constructed from events over a fixed interval, such as the 100 ms interval (i.e., 10 Hz) used when training on the DSEC dataset. This limitation restricts the flow prediction to the same rate (10 Hz), whereas the high speed of event cameras, which can operate at up to 3 kHz, has not been effectively utilized. In this work, we show that temporally dense flow estimation at 100 Hz can be achieved by treating flow estimation as a sequential problem using two variants of recurrent networks: long short-term memory (LSTM) and spiking neural networks (SNNs). First, we construct an NN model similar to the popular EV-FlowNet but with LSTM layers to demonstrate the efficiency of our training method. The model not only produces optical flow 10× more frequently than existing models, but the estimated flows also have 13% lower error than predictions from the baseline EV-FlowNet. Second, we construct an EV-FlowNet-style SNN with leaky integrate-and-fire neurons to efficiently capture the temporal dynamics. We find that the simple inherent recurrent dynamics of the SNN lead to a significant parameter reduction compared to the LSTM model. In addition, because of its event-driven computation, the spiking model is estimated to consume only 1.5% of the energy of the LSTM model, highlighting the efficiency of SNNs in processing events and the potential for achieving temporally dense flow.

This work was supported in part by the Center for Brain-inspired Computing (C-BRIC), a DARPA-sponsored JUMP center, the Semiconductor Research Corporation (SRC), the National Science Foundation, the DoD Vannevar Bush Fellowship, and IARPA MicroE4AI. Code is available at https://github.com/wponghiran/temporally_dense_flow
1. Introduction
Optical flow estimation is a core problem in computer vision that evaluates the motion of each pixel between two consecutive images captured by a frame-based camera. Optical flow information enables an observer to visualize a motion field, which is useful for numerous applications such as object trajectory prediction [21], robotic control [25], and autonomous driving [16]. The problem has traditionally been addressed using classical computer vision techniques like correlation-based [27], block-matching [1], and energy-minimization-based [14] methods, but their computational costs have proven prohibitively expensive for real-time applications. Neural network (NN) based techniques for optical flow prediction [6,22,28] have been proposed and remain a popular low-cost computing method. Generally, NN models receive two consecutive images taken by a frame-based camera as input and predict the optical flow that best warps pixels from one image to the other. However, due to the limited dynamic range of such frame-based cameras, the performance of the aforementioned techniques may be affected by motion blur or temporal aliasing.
Methods to estimate optical flow from event camera outputs offer a promising alternative to the frame-based approaches [12,18,19,31,33,34]. An event camera logs light intensity changes at each pixel (so-called events) rather than measuring actual light intensity over a fixed duration. Thus, an event camera can generate a stream of events at high temporal resolution, as illustrated in Fig. 1(a). The resolution may be as fine as 300 µs [7], making event-based optical flow estimation less susceptible to motion blur and more suitable for highly dynamic scenes.
Figure 1. (a) Comparison between the outputs of a traditional frame-based camera and an event camera. (b) Existing NN models typically rely on a collection of events for optical flow prediction. (c) We train NN models with memory elements to process each event count so that they can perform more frequent optical flow estimation. Red arrows indicate information flow from a past to a future time-step.
Nonetheless, effectively extracting information from a high-frequency event stream is a challenging task. An event camera outputs events at a fast rate but in an asynchronous and noisy manner. To ensure high fidelity of the inputs to the NN models, existing works collect events over a fixed period (often the duration between two consecutive optical flow ground truths) and construct a spatio-temporal representation for optical flow estimation. Hence, optical flow is evaluated at a rate slower than the rate at which events are produced by the event camera, as illustrated in Fig. 1(b). Evaluating optical flow at a faster rate can be crucial for certain applications, such as dodging obstacles during navigation [24], where fast reaction time is essential.
To predict temporally dense optical flow, we cast event-based optical flow estimation as a sequential learning problem. We consider the event stream as a long, correlated sequence over time rather than as multiple independent input sequences, as in existing works [9,18,19,31,33,34]. This approach allows us to reduce the time needed to collect events, as depicted in Fig. 1(c). We train the NN models to learn the trajectory from each event count and use the collected information to estimate optical flow. The NN models are hence required to have internal states capable of retaining history. To demonstrate the efficiency of our training method, we first construct an NN model similar to the commonly used model in event-based optical flow estimation, EV-FlowNet [33], but replace each convolutional layer with a layer of convolutional long short-term memory (LSTM) [26]. The use of LSTM allows previous event information to be stored and evolved through time, as sketched below.
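As a rough illustration of how a convolutional LSTM carries event history across time-steps, the following is a minimal sketch of a ConvLSTM cell in PyTorch. It follows the standard ConvLSTM formulation of [26]; the class name, channel sizes, and kernel size are our own illustrative assumptions, not necessarily the exact configuration of LSTM-FlowNet.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell (standard formulation, for illustration).

    The hidden state h and cell state c persist across time-steps, which is
    what lets the network accumulate information from past event counts.
    """
    def __init__(self, in_ch, hidden_ch, kernel_size=3):
        super().__init__()
        # One convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

# Example: process a sequence of per-pixel event-count frames one at a time.
cell = ConvLSTMCell(in_ch=2, hidden_ch=16)       # 2 channels: +/- event counts
h = torch.zeros(1, 16, 64, 64)
c = torch.zeros(1, 16, 64, 64)
for t in range(10):
    event_counts = torch.rand(1, 2, 64, 64)      # placeholder input frame
    out, (h, c) = cell(event_counts, (h, c))
```

Processing the stream one event-count frame at a time, while the state (h, c) persists, is what enables a flow prediction at every step rather than once per fixed collection window.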
porally dense optical flow estimation for real-time applica-
tion, we construct another NN model similar to EV-FlowNet
but replace stateless neurons (like ReLU) with stateful spik-
ing neurons [10]. Spiking neural networks (SNNs) have
been previously proposed to address the inefficiency of typ-
ical neural networks in handling events which are sparse
in nature [18,19]. Note that neurons communicate with
other neurons through binary values, and hence, SNNs of-
fer power savings on event-driven hardware by process-
ing only non-zero inputs. In addition, SNNs have internal
states (membrane potentials) which enable them to retain
information over time. This inherent recurrence in SNNs
can be advantageous for sequential learning tasks such as
temporally dense optical flow estimation. We demonstrate
that our training methodology can be applied to the spik-
ing models, resulting in a model with significantly fewer
parameters than the corresponding LSTM model. Our esti-
mation reveals that the spiking model consumes only 58%
energy compared to the baseline EV-FlowNet while predict-
ing 10×more frequent optical flow. Successful training of
the spiking model serves as the first step to realize tempo-
rally dense flow estimation on a neuromorphic chip like In-
tel Loihi [3] which recently achieved a throughput of 1000+
fps for multi-layer convolutional SNN computation [30].
Throughout this work, we refer to the two proposed models as LSTM-FlowNet and EfficientSpike-FlowNet, respectively. Steps to train both models for temporally dense optical flow estimation are, nonetheless, not straightforward. A proper encoding scheme must be adopted to deliver event information to the models at every small time interval. For this purpose, we use per-pixel event counts obtained through simple aggregation over a short time period. Temporal information of the events is implicitly encoded in the order in which the event counts are fed to the models. Despite its simplicity, we show that the event count is sufficient for optical flow estimation and, in fact, leads to better predictions with a sequential learning methodology.
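As one plausible realization of this encoding, the sketch below bins a raw event stream into per-pixel count frames at 100 Hz using NumPy. The function name, argument layout, and 10 ms bin width are our hypothetical choices for illustration; the text above specifies only that events are aggregated into per-pixel counts over short intervals.

```python
import numpy as np

def events_to_count_frames(xs, ys, ts, ps, H, W, bin_us=10_000):
    """Aggregate events into per-pixel count frames (one plausible encoding).

    xs, ys : pixel coordinates of each event
    ts     : timestamps in microseconds
    ps     : polarities (0 = negative, 1 = positive)
    bin_us : bin width; 10,000 us = 10 ms, i.e. 100 Hz frames
    Returns an array of shape (num_bins, 2, H, W) with separate channels
    for positive and negative event counts.
    """
    bins = (ts - ts.min()) // bin_us
    num_bins = int(bins.max()) + 1
    frames = np.zeros((num_bins, 2, H, W), dtype=np.float32)
    # Accumulate one count per event at its (bin, polarity, y, x) location.
    np.add.at(frames, (bins, ps, ys, xs), 1.0)
    return frames

# Example with random events over 100 ms on a 64x64 sensor:
n = 5000
xs = np.random.randint(0, 64, n)
ys = np.random.randint(0, 64, n)
ts = np.random.randint(0, 100_000, n)
ps = np.random.randint(0, 2, n)
frames = events_to_count_frames(xs, ys, ts, ps, H=64, W=64)  # (10, 2, 64, 64)
```

Feeding frames[0], frames[1], ... to the recurrent model in order is what implicitly encodes the events' temporal information.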
Another challenge comes from a typical assumption in sequential learning that an input has a limited length. However, an input in