arXiv:2210.12613v1 [cs.NE] 23 Oct 2022
Towards Accurate, Energy-Efficient, & Low-Latency Spiking LSTMs
Gourav Datta1, Haoqin Deng2*, Robert Aviles1, Peter A. Beerel1
1University of Southern California, USA 2University of Washington, Seattle, USA
*Work done at University of Southern California
Abstract
Spiking Neural Networks (SNNs) have emerged as an attrac-
tive spatio-temporal computing paradigm for complex vision
tasks. However, most existing works yield models that re-
quire many time steps and do not leverage the inherent tempo-
ral dynamics of spiking neural networks, even for sequential
tasks. Motivated by this observation, we propose an optimized
spiking long short-term memory (LSTM) network training
framework that involves a novel ANN-to-SNN conversion
framework, followed by SNN training. In particular, we pro-
pose novel activation functions in the source LSTM architec-
ture and judiciously select a subset of them for conversion
to integrate-and-fire (IF) activations with optimal bias shifts.
Additionally, we derive the leaky-integrate-and-fire (LIF) ac-
tivation functions converted from their non-spiking LSTM
counterparts which justifies the need to jointly optimize the
weights, threshold, and leak parameter. We also propose a
pipelined parallel processing scheme which hides the SNN
time steps, significantly improving system latency, especially
for long sequences. The resulting SNNs have high activation
sparsity and require only accumulate operations (AC), in con-
trast to expensive multiply-and-accumulates (MAC) needed
for ANNs, except for the input layer when using direct encod-
ing, yielding significant improvements in energy efficiency.
We evaluate our framework on sequential learning tasks in-
cluding temporal MNIST, Google Speech Commands (GSC),
and UCI Smartphone datasets on different LSTM architec-
tures. We obtain test accuracy of
94.75
% with only
2
time
steps with direct encoding on the GSC dataset with
4.1×
lower energy than an iso-architecture standard LSTM.
Introduction & Related Work
In contrast to the neurons in ANNs, the neurons in Spiking
Neural Networks (SNNs) are biologically inspired, receiv-
ing and transmitting information via spikes. SNNs promise
higher energy-efficiency than ANNs due to their high ac-
tivation sparsity and event-driven spike-based computation
(Diehl et al. 2016b) which helps avoid the costly multipli-
cation operations that dominate ANNs. To handle multi-bit
inputs, as is typical in traditional datasets and real-life sensor-based applications, however, the inputs are often spike
encoded in the temporal domain using rate coding (Diehl et al.
2016b), temporal coding (Comsa et al. 2020), or rank-order
coding (Kheradpisheh et al. 2020). Alternatively, instead of
spike encoding the inputs, some researchers explored directly
feeding the analog pixel values in the first convolutional layer,
and thereby, emitting spikes only in the subsequent layers
(Rathi et al. 2020b). This can dramatically reduce the number of time steps needed to achieve state-of-the-art accuracy,
but comes at the cost that the first layer now requires MACs
(Rathi et al. 2020b; Datta et al. 2022; Kundu et al. 2021).
However, all these encoding techniques increase the end-
to-end latency (proportional to the number of time steps)
compared to their non-spiking counterparts.
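The encoding schemes discussed above can be contrasted with a small sketch. The function names and toy inputs below are hypothetical, and this is a minimal NumPy illustration of rate coding versus direct encoding, not code from any of the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

def rate_encode(x, num_steps):
    """Poisson-style rate coding: each input intensity in [0, 1] becomes a
    binary spike train whose average firing rate equals the intensity."""
    return (rng.random((num_steps, *x.shape)) < x).astype(np.float32)

def direct_encode(x, num_steps):
    """Direct encoding: the analog input is fed unchanged at every time step;
    only the layers after the first emit binary spikes."""
    return np.repeat(x[None, ...], num_steps, axis=0)

x = np.array([0.0, 0.25, 0.9])            # toy "pixel" intensities
spikes = rate_encode(x, num_steps=1000)
print(spikes.mean(axis=0))                 # firing rates approximate x
print(direct_encode(x, num_steps=2).shape) # analog values repeated per step
```

Rate coding needs many time steps for the firing rates to approximate the input accurately, which is exactly the latency cost noted above; direct encoding avoids this at the price of MACs in the first layer.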
In addition to accommodating various forms of spike en-
coding, supervised learning algorithms for SNNs, such as
surrogate gradient learning (SGL) have overcome various
roadblocks associated with the discontinuous derivative of
the spike activation function (Lee et al. 2016; Kim and Panda
2021b; Neftci, Mostafa, and Zenke 2019; Panda et al. 2020).
It is also commonly agreed that SNNs following the integrate-
and-fire (IF) compute model can be converted from ANNs
with low error by approximating the activation value of ReLU
neurons with the firing rate of spiking neurons (Sengupta et al.
2019; Rathi et al. 2020a; Diehl et al. 2016b). SNNs trained
using ANN-to-SNN conversion, coupled with SGL, have
been able to perform similar to SOTA CNNs in terms of test
accuracy in traditional image recognition tasks (Rathi et al.
2020b,a) with significant advantages in compute efficiency.
Previous works (Rathi et al. 2020b; Datta et al. 2021; Kundu
et al. 2021) have adopted SGL to jointly train the threshold
and leak values to improve the accuracy-latency tradeoff but
without any analytical justification.
Despite numerous innovations in SNN training algorithms for static (Panda and Roy 2016; Panda et al. 2020; Rathi et al. 2020b,a; Kim and Panda 2021b) and dynamic vision tasks (Kim and Panda 2021a; Li et al. 2022), there has been relatively little research targeting SNNs for sequence learning tasks. Among the existing works, some are limited to the use of spiking inputs (Rezaabad and Vishwanath 2020; Ponghiran and Roy 2021b), which might not represent many real-world use cases. Furthermore, some (Deng and Gu 2021; Moritz, Hori, and Roux 2019; Diehl et al. 2016a) propose to derive SNNs from vanilla RNNs, an approach that has been shown to incur a large accuracy drop on large-scale sequence learning tasks, as vanilla RNNs are unable to model temporal dependencies over long sequences. Others (Ponghiran and Roy 2021a) use the same input expansion approach for spike encoding and yield SNNs that require serial processing for each input in the sequence, severely increasing total latency. A more recent work (Ponghiran and Roy 2021b) proposed a neuron model more complex than the popular IF or leaky-integrate-and-fire (LIF) models to improve the recurrence dynamics for sequential learning. Additionally, it lets the hidden activation maps be multi-bit (as opposed to binary spikes), which improves training but requires multiplications that reduce energy efficiency compared to the multiplier-less SNN models we develop. In particular, our work leverages
both the temporal and sparse dynamics of SNNs to reduce
the inference latency and energy consumption of large-scale
streaming ML workloads while achieving close to SOTA
accuracy.
The key contributions of our work are summarized below.
• We propose a training framework that involves the conversion from a pre-trained non-spiking LSTM to a spiking LSTM model that minimizes conversion error. Our framework involves three novel techniques: i) converting the traditional sigmoid and tanh activation functions in the source LSTM to clipped versions, ii) judiciously selecting a subset of these functions for conversion to IF activation functions such that the SNN does not require the expensive MAC operations, and iii) finding the optimal shifts of the IF activation functions.
• To the best of our knowledge, we are the first to obtain a closed-form expression of the LIF activation function which, in particular, captures the impact of the leak term. This function helps us analyze the post-conversion error between the non-spiking LSTM and LIF activation outputs under non-uniform and non-identical input distributions, and motivates its reduction by jointly training the threshold and leak term.
• We propose a high-level parallel and pipelined implementation of the resulting SNN-based computations which, coupled with our training algorithm, results in negligible latency overheads compared to the baseline LSTM and improves hardware utilization.
• We demonstrate the energy-latency-accuracy trade-off benefits of our proposed framework through FPGA synthesis and place-and-route, extensive ML experiments with different LSTM architectures on sequential tasks from computer vision (temporal MNIST), spoken term classification (Google Speech Commands), and human activity recognition (UCI Smartphone) applications, and comparisons with existing spiking and non-spiking LSTMs.
Preliminaries
SNN IF/LIF Models
In this work, we adopt the popular IF and LIF models (Lee
et al. 2020b) to capture the computation dynamics of an
SNN. In both these models, a neuron transmits binary spike trains (except in the input layer under direct encoding) over a pre-defined number of time steps. To incorporate the temporal input dimension, each neuron has an internal state called its membrane potential $U_i(t)$, which captures the integration of the weight-modulated (weights denoted $W_{ij}$) incoming spikes (denoted $S_j(t)$). In the LIF model, $U_i(t)$ leaks with a fixed time constant, denoted $\lambda$ ($\lambda = 1$ for the IF model). With the spiking threshold represented as $V_{th}$, the LIF neuron dynamics are expressed as

$$U_i^{temp}(t) = \lambda\, U_i(t-1) + \sum_j W_{ij} S_j(t) \tag{1}$$

$$S_i(t) = \begin{cases} V_{th}, & \text{if } U_i^{temp}(t) > V_{th} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$U_i(t) = U_i^{temp}(t) - S_i(t) \tag{3}$$
Surrogate Gradient Learning
Since the spiking neuron functionality is discontinuous and
non-differentiable, it is difficult to implement gradient de-
scent based backpropagation in SNNs. Hence, previous works
(Lee et al. 2020a; Neftci, Mostafa, and Zenke 2019) approx-
imate the spiking function with a continuous differentiable
function, which helps back-propagate non-zero gradients
known as surrogate gradients. The resulting weight update in
the lth hidden layer in the SNN is calculated as
$$\Delta W_l = \sum_t \frac{\partial L}{\partial W_l} = \sum_t \frac{\partial L}{\partial O_l^t}\frac{\partial O_l^t}{\partial U_l^t}\frac{\partial U_l^t}{\partial W_l} = \sum_t \frac{\partial L}{\partial O_l^t}\frac{\partial O_l^t}{\partial U_l^t}\, O_{l-1}^t$$

where $O_l^t$ and $U_l^t$ denote the spike output and membrane potential tensors of the $l^{th}$ layer, respectively, at time step $t$. $\frac{\partial O_l^t}{\partial U_l^t}$ is the non-differentiable gradient, which can be approximated with the surrogate gradient $\frac{\partial O_l^t}{\partial U_l^t} = \frac{\gamma}{V_l^{th}} \cdot \max\!\left(0,\, 1 - \left|\frac{U_l^t}{V_l^{th}} - 1\right|\right)$, where $V_l^{th}$ is the $l^{th}$ layer threshold and $\gamma$ is a hyperparameter denoting the maximum gradient value (Bellec et al. 2018a).
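As a sketch of how SGL is typically realized in practice, the following uses a PyTorch-style custom autograd function implementing the triangular surrogate gradient above. The values gamma = 0.3 and v_th = 1.0 are illustrative assumptions, and this is not the authors' implementation:

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike in the forward pass; triangular surrogate gradient
    (gamma / v_th) * max(0, 1 - |u / v_th - 1|) in the backward pass."""
    gamma, v_th = 0.3, 1.0  # assumed hyperparameters for illustration

    @staticmethod
    def forward(ctx, u):
        ctx.save_for_backward(u)
        return (u > SpikeFn.v_th).float()  # binary spikes, zero true gradient

    @staticmethod
    def backward(ctx, grad_out):
        (u,) = ctx.saved_tensors
        sg = (SpikeFn.gamma / SpikeFn.v_th) * torch.clamp(
            1.0 - torch.abs(u / SpikeFn.v_th - 1.0), min=0.0)
        return grad_out * sg  # nonzero gradient only near the threshold

u = torch.tensor([0.5, 1.2, 3.0], requires_grad=True)
out = SpikeFn.apply(u)
out.sum().backward()
print(out)     # spikes where u exceeds the threshold
print(u.grad)  # surrogate gradients flow through despite the step function
```

Because the surrogate vanishes far from the threshold, gradients concentrate on neurons whose membrane potential is near $V_l^{th}$, which is what makes joint training of weights, thresholds, and leaks feasible.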
Proposed Training Framework
Non-spiking LSTM
In order to yield accurate LSTM-based SNN models, we first
replace the traditional tanh and sigmoid activation functions
in the baseline LSTM model with their hard (clipped) ver-
sions, as illustrated in Fig. 1(a-b). Unlike previous works
(Ponghiran and Roy 2021a), we decouple the hard tanh func-
tion into two hard sigmoid functions. Hence, we have a single
threshold value, denoted $V_{sig}^{th}$, for the hard sigmoid function, whose outputs are always positive, but two threshold values for the hard tanh function (one positive, denoted $V_{tanh+}^{th}$, for output values ranging from 0 to +1, and one negative, denoted $V_{tanh-}^{th}$, for output values ranging from 0 to -1). This approach enables both the hard sigmoid and tanh functions to be implemented with threshold ReLU functions, which have been shown to improve the accuracy of ANN-to-SNN conversion (Sengupta et al. 2019; Deng and Gu 2021).
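A minimal sketch of the clipped activations described above, assuming illustrative threshold values and unit saturation; the exact slopes and thresholds used in the paper may differ:

```python
import numpy as np

def hard_sigmoid(x, v_th):
    """Clipped (hard) sigmoid as a threshold ReLU: linear in [0, v_th],
    saturating at 1, output always non-negative."""
    return np.clip(x / v_th, 0.0, 1.0)

def hard_tanh(x, v_th_pos, v_th_neg):
    """Hard tanh decomposed into two hard-sigmoid-like branches: a positive
    branch saturating at +1 (threshold v_th_pos) and a negative branch
    saturating at -1 (threshold v_th_neg)."""
    pos = np.clip(x / v_th_pos, 0.0, 1.0)     # contributes output in [0, +1]
    neg = -np.clip(-x / v_th_neg, 0.0, 1.0)   # contributes output in [-1, 0]
    return pos + neg

x = np.linspace(-3, 3, 7)
print(hard_sigmoid(x, v_th=2.0))
print(hard_tanh(x, v_th_pos=2.0, v_th_neg=2.0))
```

Splitting hard tanh this way leaves each branch a one-sided threshold ReLU, which is the form that maps cleanly onto IF neurons with a single firing threshold per branch.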
Conversion to SNN
The LIF outputs $S_{sig}(t)$ and $S_{tanh}(t)$ at time step $t$, converted from the sigmoid and tanh functions respectively, are