arXiv:2210.12613v1 [cs.NE] 23 Oct 2022
Towards Accurate, Energy-Efficient, & Low-Latency Spiking LSTMs
Gourav Datta1, Haoqin Deng2*, Robert Aviles1, Peter A. Beerel1
1University of Southern California, USA 2University of Washington, Seattle, USA
*Work done at University of Southern California
Abstract
Spiking Neural Networks (SNNs) have emerged as an attrac-
tive spatio-temporal computing paradigm for complex vision
tasks. However, most existing works yield models that re-
quire many time steps and do not leverage the inherent tempo-
ral dynamics of spiking neural networks, even for sequential
tasks. Motivated by this observation, we propose an optimized
spiking long short-term memory (LSTM) network training
framework that involves a novel ANN-to-SNN conversion
framework, followed by SNN training. In particular, we pro-
pose novel activation functions in the source LSTM architec-
ture and judiciously select a subset of them for conversion
to integrate-and-fire (IF) activations with optimal bias shifts.
Additionally, we derive the leaky-integrate-and-fire (LIF) ac-
tivation functions converted from their non-spiking LSTM
counterparts which justifies the need to jointly optimize the
weights, threshold, and leak parameter. We also propose a
pipelined parallel processing scheme which hides the SNN
time steps, significantly improving system latency, especially
for long sequences. The resulting SNNs have high activation
sparsity and require only accumulate operations (AC), in con-
trast to expensive multiply-and-accumulates (MAC) needed
for ANNs, except for the input layer when using direct encod-
ing, yielding significant improvements in energy efficiency.
We evaluate our framework on sequential learning tasks in-
cluding temporal MNIST, Google Speech Commands (GSC),
and UCI Smartphone datasets on different LSTM architec-
tures. We obtain test accuracy of
94.75
% with only
2
time
steps with direct encoding on the GSC dataset with
4.1×
lower energy than an iso-architecture standard LSTM.
Introduction & Related Work
In contrast to the neurons in ANNs, the neurons in Spiking
Neural Networks (SNNs) are biologically inspired, receiv-
ing and transmitting information via spikes. SNNs promise
higher energy-efficiency than ANNs due to their high ac-
tivation sparsity and event-driven spike-based computation
(Diehl et al. 2016b) which helps avoid the costly multipli-
cation operations that dominate ANNs. To handle multi-bit
inputs, as is typical in traditional datasets and real-life sensor-based applications, however, the inputs are often spike
encoded in the temporal domain using rate coding (Diehl et al.
2016b), temporal coding (Comsa et al. 2020), or rank-order
coding (Kheradpisheh et al. 2020). Alternatively, instead of
spike encoding the inputs, some researchers explored directly
feeding the analog pixel values in the first convolutional layer,
and thereby, emitting spikes only in the subsequent layers
(Rathi et al. 2020b). This can dramatically reduce the number of time steps needed to achieve state-of-the-art accuracy,
but comes at the cost that the first layer now requires MACs
(Rathi et al. 2020b; Datta et al. 2022; Kundu et al. 2021).
However, all these encoding techniques increase the end-
to-end latency (proportional to the number of time steps)
compared to their non-spiking counterparts.
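The encoding schemes discussed above can be contrasted with a small sketch. The function names and toy inputs below are hypothetical, and this is a minimal NumPy illustration of rate coding versus direct encoding, not code from any of the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

def rate_encode(x, num_steps):
    """Poisson-style rate coding: each input intensity in [0, 1] becomes a
    binary spike train whose average firing rate equals the intensity."""
    return (rng.random((num_steps, *x.shape)) < x).astype(np.float32)

def direct_encode(x, num_steps):
    """Direct encoding: the analog input is fed unchanged at every time step;
    only the layers after the first emit binary spikes."""
    return np.repeat(x[None, ...], num_steps, axis=0)

x = np.array([0.0, 0.25, 0.9])            # toy "pixel" intensities
spikes = rate_encode(x, num_steps=1000)
print(spikes.mean(axis=0))                 # firing rates approximate x
print(direct_encode(x, num_steps=2).shape) # analog values repeated per step
```

Rate coding needs many time steps for the firing rates to approximate the input accurately, which is exactly the latency cost noted above; direct encoding avoids this at the price of MACs in the first layer.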
In addition to accommodating various forms of spike en-
coding, supervised learning algorithms for SNNs, such as
surrogate gradient learning (SGL) have overcome various
roadblocks associated with the discontinuous derivative of
the spike activation function (Lee et al. 2016; Kim and Panda
2021b; Neftci, Mostafa, and Zenke 2019; Panda et al. 2020).
It is also commonly agreed that SNNs following the integrate-
and-fire (IF) compute model can be converted from ANNs
with low error by approximating the activation value of ReLU
neurons with the firing rate of spiking neurons (Sengupta et al.
2019; Rathi et al. 2020a; Diehl et al. 2016b). SNNs trained
using ANN-to-SNN conversion, coupled with SGL, have
been able to perform similar to SOTA CNNs in terms of test
accuracy in traditional image recognition tasks (Rathi et al.
2020b,a) with significant advantages in compute efficiency.
Previous works (Rathi et al. 2020b; Datta et al. 2021; Kundu
et al. 2021) have adopted SGL to jointly train the threshold
and leak values to improve the accuracy-latency tradeoff but
without any analytical justification.
Despite numerous innovations in SNN training algorithms for static (Panda and Roy 2016; Panda et al. 2020; Rathi et al. 2020b,a; Kim and Panda 2021b) and dynamic vision tasks (Kim and Panda 2021a; Li et al. 2022), there has been relatively little research targeting SNNs for sequence learning tasks. Among the existing works, some are limited to the use of spiking inputs (Rezaabad and Vishwanath 2020; Ponghiran and Roy 2021b), which might not represent many real-world use cases. Furthermore, some (Deng and Gu 2021; Moritz, Hori, and Roux 2019; Diehl et al. 2016a) propose to derive SNNs from vanilla RNNs, an approach that has been shown to incur a large accuracy drop on large-scale sequence learning tasks, as vanilla RNNs are unable to model temporal dependencies over long sequences. Others (Ponghiran and Roy 2021a) use the same input expansion approach for spike encoding and yield SNNs that require serial processing for each input in the sequence, severely increasing total latency. A more recent work (Ponghiran and Roy 2021b) proposed a neuron model more complex than the popular IF or leaky-integrate-and-fire (LIF) models to improve the recurrence dynamics for sequential learning. Additionally, it lets the hidden activation maps be multi-bit (as opposed to binary spikes), which improves training but requires multiplications that reduce energy efficiency compared to the multiplier-less SNN models we develop. In particular, our work leverages
both the temporal and sparse dynamics of SNNs to reduce
the inference latency and energy consumption of large-scale
streaming ML workloads while achieving close to SOTA
accuracy.
The key contributions of our work are summarized below.
• We propose a training framework that involves the conversion from a pre-trained non-spiking LSTM to a spiking LSTM model that minimizes conversion error. Our framework involves three novel techniques: i) converting the traditional sigmoid and tanh activation functions in the source LSTM to clipped versions, ii) judiciously selecting a subset of these functions for conversion to IF activation functions such that the SNN does not require the expensive MAC operations, and iii) finding the optimal shifts of the IF activation functions.
• To the best of our knowledge, we are the first to obtain a closed-form expression of the LIF activation function which, in particular, captures the impact of the leak term. This function helps us analyze the post-conversion error between the non-spiking LSTM and LIF activation outputs under non-uniform and non-identical input distributions, and motivates its reduction by jointly training the threshold and leak term.
• We propose a high-level parallel and pipelined implementation of the resulting SNN-based computations which, coupled with our training algorithm, results in negligible latency overheads compared to the baseline LSTM and improves hardware utilization.
• We demonstrate the energy-latency-accuracy trade-off benefits of our proposed framework through FPGA synthesis and place-and-route, extensive ML experiments with different LSTM architectures on sequential tasks from computer vision (temporal MNIST), spoken term classification (Google Speech Commands), and human activity recognition (UCI Smartphone) applications, and comparisons with existing spiking and non-spiking LSTMs.
Preliminaries
SNN IF/LIF Models
In this work, we adopt the popular IF and LIF models (Lee
et al. 2020b) to capture the computation dynamics of an
SNN. In both these models, a neuron transmits binary spike trains (except in the input layer under direct encoding) over a pre-defined number of time steps. To incorporate the temporal input dimension, each neuron has an internal state called its membrane potential $U_i(t)$, which captures the integration of the weight-modulated (weights denoted $W_{ij}$) incoming spikes (denoted $S_j(t)$). In the LIF model, $U_i(t)$ leaks with a fixed time constant, denoted $\lambda$ ($\lambda = 1$ for the IF model). With the spiking threshold represented as $V_{th}$, the LIF neuron dynamics are expressed as

$$U_i^{temp}(t) = \lambda\, U_i(t-1) + \sum_j W_{ij} S_j(t) \tag{1}$$

$$S_i(t) = \begin{cases} V_{th}, & \text{if } U_i^{temp}(t) > V_{th} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$U_i(t) = U_i^{temp}(t) - S_i(t) \tag{3}$$
Surrogate Gradient Learning
Since the spiking neuron functionality is discontinuous and
non-differentiable, it is difficult to implement gradient de-
scent based backpropagation in SNNs. Hence, previous works
(Lee et al. 2020a; Neftci, Mostafa, and Zenke 2019) approx-
imate the spiking function with a continuous differentiable
function, which helps back-propagate non-zero gradients
known as surrogate gradients. The resulting weight update in
the lth hidden layer in the SNN is calculated as
$$\Delta W_l = \sum_t \frac{\partial L}{\partial W_l} = \sum_t \frac{\partial L}{\partial O_l^t}\frac{\partial O_l^t}{\partial U_l^t}\frac{\partial U_l^t}{\partial W_l} = \sum_t \frac{\partial L}{\partial O_l^t}\frac{\partial O_l^t}{\partial U_l^t}\, O_{l-1}^t$$

where $O_l^t$ and $U_l^t$ denote the spike output and membrane potential tensors of the $l^{th}$ layer, respectively, at time step $t$. $\frac{\partial O_l^t}{\partial U_l^t}$ is the non-differentiable gradient, which can be approximated with the surrogate gradient $\frac{\partial O_l^t}{\partial U_l^t} = \frac{\gamma}{V_l^{th}} \cdot \max\!\left(0,\, 1 - \left|\frac{U_l^t}{V_l^{th}} - 1\right|\right)$, where $V_l^{th}$ is the $l^{th}$ layer threshold and $\gamma$ is a hyperparameter denoting the maximum gradient value (Bellec et al. 2018a).
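As a sketch of how SGL is typically realized in practice, the following uses a PyTorch-style custom autograd function implementing the triangular surrogate gradient above. The values gamma = 0.3 and v_th = 1.0 are illustrative assumptions, and this is not the authors' implementation:

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass; triangular surrogate gradient
    (gamma / v_th) * max(0, 1 - |u / v_th - 1|) in the backward pass."""
    gamma, v_th = 0.3, 1.0  # assumed hyperparameters for illustration

    @staticmethod
    def forward(ctx, u):
        ctx.save_for_backward(u)
        return (u > SpikeFn.v_th).float()  # binary spikes, zero true gradient

    @staticmethod
    def backward(ctx, grad_out):
        (u,) = ctx.saved_tensors
        sg = (SpikeFn.gamma / SpikeFn.v_th) * torch.clamp(
            1.0 - torch.abs(u / SpikeFn.v_th - 1.0), min=0.0)
        return grad_out * sg  # nonzero gradient only near the threshold

u = torch.tensor([0.5, 1.2, 3.0], requires_grad=True)
out = SpikeFn.apply(u)
out.sum().backward()
print(out)     # spikes where u exceeds the threshold
print(u.grad)  # surrogate gradients flow through despite the step function
```

Because the surrogate vanishes far from the threshold, gradients concentrate on neurons whose membrane potential is near $V_l^{th}$, which is what makes joint training of weights, thresholds, and leaks feasible.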
Proposed Training Framework
Non-spiking LSTM
In order to yield accurate LSTM-based SNN models, we first
replace the traditional tanh and sigmoid activation functions
in the baseline LSTM model with their hard (clipped) ver-
sions, as illustrated in Fig. 1(a-b). Unlike previous works
(Ponghiran and Roy 2021a), we decouple the hard tanh func-
tion into two hard sigmoid functions. Hence, we have a single
threshold value, denoted $V_{sig}^{th}$, for the hard sigmoid function, whose outputs are always positive, but two threshold values for the hard tanh function (one positive, denoted $V_{tanh+}^{th}$, for output values ranging from 0 to +1, and one negative, denoted $V_{tanh-}^{th}$, for output values ranging from 0 to -1). This approach enables both the hard sigmoid and tanh functions to be implemented with threshold ReLU functions, which have been shown to improve the accuracy of ANN-to-SNN conversion (Sengupta et al. 2019; Deng and Gu 2021).
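A minimal sketch of the clipped activations described above, assuming illustrative threshold values and unit saturation; the exact slopes and thresholds used in the paper may differ:

```python
import numpy as np

def hard_sigmoid(x, v_th):
    """Clipped (hard) sigmoid as a threshold ReLU: linear in [0, v_th],
    saturating at 1, output always non-negative."""
    return np.clip(x / v_th, 0.0, 1.0)

def hard_tanh(x, v_th_pos, v_th_neg):
    """Hard tanh decomposed into two hard-sigmoid-like branches: a positive
    branch saturating at +1 (threshold v_th_pos) and a negative branch
    saturating at -1 (threshold v_th_neg)."""
    pos = np.clip(x / v_th_pos, 0.0, 1.0)     # contributes output in [0, +1]
    neg = -np.clip(-x / v_th_neg, 0.0, 1.0)   # contributes output in [-1, 0]
    return pos + neg

x = np.linspace(-3, 3, 7)
print(hard_sigmoid(x, v_th=2.0))
print(hard_tanh(x, v_th_pos=2.0, v_th_neg=2.0))
```

Splitting hard tanh this way leaves each branch a one-sided threshold ReLU, which is the form that maps cleanly onto IF neurons with a single firing threshold per branch.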
Conversion to SNN
The LIF outputs $S_{sig}(t)$ and $S_{tanh}(t)$ at time step $t$, converted from the sigmoid and tanh functions respectively, are