SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks
Abhishek Moitra, Student Member, IEEE, Abhiroop Bhattacharjee, Student Member, IEEE, Runcong Kuang, Gokul Krishnan, Member, IEEE, Yu Cao, Fellow, IEEE, and Priyadarshini Panda, Member, IEEE

These authors have contributed equally to this work.
Abhishek Moitra, Abhiroop Bhattacharjee, and Priyadarshini Panda are with the Department of Electrical Engineering, Yale University, New Haven, CT, USA.
Runcong Kuang, Gokul Krishnan, and Yu Cao are with the School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA.
Abstract—Spiking Neural Networks (SNNs) are an active research domain towards energy-efficient machine intelligence. Compared to conventional artificial neural networks (ANNs), SNNs use temporal spike data and bio-plausible neuronal activation functions such as Leaky-Integrate Fire/Integrate Fire (LIF/IF) for data processing. However, SNNs incur significant dot-product operations causing high memory and computation overhead in standard von-Neumann computing platforms. To this end, In-Memory Computing (IMC) architectures have been proposed to alleviate the "memory-wall bottleneck" prevalent in von-Neumann architectures. Although recent works have proposed IMC-based SNN hardware accelerators, the following key implementation aspects have been overlooked: 1) the adverse effects of crossbar non-ideality on SNN performance due to repeated analog dot-product operations over multiple time-steps; 2) hardware overheads of essential SNN-specific components such as the LIF/IF and data communication modules. To this end, we propose SpikeSim, a tool that can perform realistic performance, energy, latency and area evaluation of IMC-mapped SNNs. SpikeSim consists of a practical monolithic IMC architecture called SpikeFlow for mapping SNNs. Additionally, the Non-Ideality Computation Engine (NICE) and Energy-Latency-Area (ELA) engine perform hardware-realistic evaluation of SpikeFlow-mapped SNNs. Based on a 65nm CMOS implementation and experiments on the CIFAR10, CIFAR100 and TinyImagenet datasets, we find that the LIF/IF neuronal module has a significant area contribution (>11% of the total hardware area). To this end, we propose SNN topological modifications that lead to 1.24× and 10× reductions in the neuronal module's area and the overall energy-delay product, respectively. Furthermore, in this work, we perform a holistic comparison between IMC-implemented ANNs and SNNs and conclude that a lower number of time-steps is key to achieving higher throughput and energy-efficiency for SNNs compared to 4-bit ANNs. The code repository for the SpikeSim tool will be made available in this Github link.
Index Terms—Spiking Neural Networks (SNNs), In-Memory Computing, Emerging Devices, Analog Crossbars
I. INTRODUCTION
In the last decade, Spiking Neural Networks (SNNs) have gained significant attention in the context of energy-efficient machine intelligence [1].
TABLE I: Qualitative comparison of SpikeSim with related works. I- Inference, T- Training, VN- von-Neumann, IMC- In-Memory Computing, ELA- Energy, Latency & Area, M- Monolithic and C- Chiplet Architecture.

Work                         Platform   I/T   Non-Ideality   ELA Evaluation
ANN
Eyeriss [13]                 VN-M       I     ✗              ✓
Neurosim [14]                IMC-M      I     ✗              ✓
CrossSim [15]                IMC-M      I     ✗              ✓
RxNN [16]                    IMC-M      I     ✓              ✗
SIAM [17]                    IMC-C      I     ✓              ✓
SNN
Loihi [4], TrueNorth [5]     VN-M       I     ✗              ✗
SpinalFlow [6], PTB [7]      VN-M       I     ✗              ✗
H2Learn [18], SATA [19]      VN-M       T     ✗              ✓
RESPARC [9]                  IMC-M      I     ✗              ✗
SpikeSim (ours)              IMC-M      I     ✓              ✓
SNNs encode input data with discrete binary spikes over multiple time-steps, making them highly suitable for asynchronous event-driven input-processing applications [2], [3]. Recent works have proposed full-scale general-purpose von-Neumann architectures leveraging the temporal processing property of SNNs [4], [5]. Other works such as [6], [7] have proposed novel dataflows to minimize the hardware overhead of von-Neumann implementations of SNNs. However, SNNs, like conventional Artificial Neural Networks (ANNs), entail significant dot-product operations, leading to high memory and energy overhead when implemented on traditional von-Neumann architectures (due to the "memory wall bottleneck") [8], [9]. To this end, analog In-Memory Computing (IMC) architectures [10]–[12] have been proposed to perform analog dot-product or Multiply-and-Accumulate (MAC) operations, achieving high memory bandwidth and compute parallelism and thereby overcoming the "memory wall bottleneck".
Being an emerging and heavily researched computing paradigm, IMC architectures require hardware evaluation platforms for fast and accurate algorithm benchmarking. To this effect, many state-of-the-art hardware evaluation frameworks [14]–[17] have been proposed for realistic evaluation of IMC-mapped ANNs. However, they are unsuitable for hardware-realistic SNN evaluations as they lack key architectural modifications required for temporal spike processing and non-linear activation functions, such as Leaky Integrate Fire or Integrate Fire (LIF/IF). In the context of hardware evaluation platforms for SNNs, works such as [18], [19] have been proposed for benchmarking SNN training on digital CMOS platforms. Additionally, works such as [9] propose IMC architectures for SNN inference. However, they lack several practical architectural considerations, such as non-idealities incurred during analog MAC computations [20]–[22] and data communication overhead, among others, rendering them unsuitable for a holistic hardware evaluation of IMC-mapped SNNs. All of these works are qualitatively compared in Table I. Therefore, in the current literature, there is an evident gap between SNN algorithm design and a holistic evaluation platform for hardware-realistic benchmarking of these algorithms.
To this end, we propose SpikeSim, an end-to-end hardware evaluation tool for benchmarking SNN inference algorithms. SpikeSim consists of a monolithic IMC-based tiled hardware architecture called SpikeFlow that maps a given SNN on non-ideal analog crossbars. In SpikeFlow, we incorporate SNN-specific non-linear activation functions such as the LIF/IF neuron, and leverage the binary spike input data to propose a lightweight module (the DIFF module) that facilitates signed MAC operations without the need for the traditional dual-crossbar approach [14], [23]. For hardware-realistic SNN inference performance benchmarking, we develop a Non-Ideality Computation Engine (NICE). NICE incorporates a non-ideality-aware weight encoding to improve the robustness of SNNs when mapped on analog crossbars [24], and uses circuit analysis methods to realize non-ideal MAC operations and provide hardware-realistic SNN inference performance. Furthermore, we design an Energy-Latency-Area (ELA) engine to benchmark the hardware-realistic energy, latency and area of the SpikeFlow-mapped SNN.
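For context, below is a minimal NumPy sketch of the conventional dual-crossbar signed-MAC scheme that the DIFF module aims to eliminate; the splitting of signed weights into positive and negative conductance matrices follows the standard approach described in [14], [23], while the array sizes and variable names are illustrative assumptions.

```python
import numpy as np

# Conventional dual-crossbar signed MAC (the scheme DIFF replaces):
# signed weights are split into two non-negative conductance matrices,
# each mapped to its own crossbar, and the column outputs are subtracted.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))          # signed weight matrix (illustrative size)
spikes = rng.integers(0, 2, size=64)   # binary spike inputs at one time-step

G_pos = np.where(W > 0, W, 0.0)        # crossbar 1: positive weights only
G_neg = np.where(W < 0, -W, 0.0)       # crossbar 2: magnitudes of negative weights

# Each crossbar computes an unsigned analog dot-product (Ohm's + Kirchhoff's laws);
# the signed result is recovered by subtracting the two column currents.
mac_out = spikes @ G_pos - spikes @ G_neg
assert np.allclose(mac_out, spikes @ W)
```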
The key contributions of our work can be summarized as
follows:
1) We propose SpikeSim, an end-to-end hardware benchmarking tool for SNN inference. SpikeSim consists of SpikeFlow, a tiled memristive crossbar architecture that incorporates Leaky-Integrate-Fire/Integrate-Fire (LIF/IF) functionality and a novel fully-digital DIFF module that eliminates the dual-crossbar approach for signed MAC computations [14], [23]. Additionally, SpikeSim contains the NICE and ELA engines for crossbar-realistic hardware evaluations.
2) We develop NICE to perform fast and realistic modelling of resistive and device conductance variation non-idealities for crossbar-aware performance evaluations of SNNs. NICE incorporates a non-ideality-aware weight encoding scheme that improves the inference accuracy of pretrained SNNs implemented on analog crossbars.
3) We perform extensive hardware evaluations on benchmark datasets (CIFAR10, CIFAR100 [25] and TinyImagenet [26]) and unravel that the neuronal module consumes a significant portion of the total chip area (11-30%), owing to the requirement to store a large number of membrane potentials between time-steps.
4) Through extensive experiments, we show that simple SNN topological modifications, such as reducing the number of output channels in the first convolutional layer, can ameliorate the area overhead of the neuronal module by 1.24× and improve the Energy-Delay Product (EDP) by 10×. Furthermore, we show that the non-ideality-aware weight encoding improves the crossbar-mapped SNN accuracy by more than 70% (for the CIFAR10 dataset) compared to vanilla weight encoding onto the SpikeFlow architecture.
5) Finally, we compare the performance as well as the area and energy distributions of crossbar-mapped VGG9 ANNs and SNNs trained on the CIFAR10 dataset. We find that SNNs exhibit 1000× higher neuronal module area compared to ANNs, yet can achieve iso-performance with higher energy-efficiency and throughput benefits at small numbers of time-steps (T = 3, 4, 5) compared to 4-bit ANNs.
To the best of our knowledge, SpikeSim is the first hardware-realistic evaluation platform for SNNs mapped on IMC architectures. Through SpikeSim, we bring out some of the key parameters in SNN algorithm and IMC architecture design that can potentially lead to IMC-aware SNN research directions in the future.
II. RELATED WORKS
A. Hardware Evaluation Platforms for ANN Inference
Eyeriss [13] proposed a reconfigurable digital systolic-array architecture for energy-efficient ANN acceleration. The authors show that data transfer from DRAM memory to the computation unit contributes significantly to the energy consumption in von-Neumann ANN accelerators, and hence propose a row-stationary dataflow to mitigate the memory overhead. More recent works such as ISAAC [27] used in-memory computing architectures such as analog crossbars to perform fast and energy-efficient computation of ANNs, with extensive hardware evaluations over different crossbar sizes and analog-to-digital converter (ADC) precisions, among others. PUMA [12] proposes a memristive crossbar-based ANN accelerator that uses graph partitioning and a custom instruction set architecture to schedule MAC operations in a multi-crossbar architecture. Neurosim, by Chen et al. [14], proposes an end-to-end hardware evaluation platform for monolithic analog crossbar-based ANN accelerators. SIAM, a recent work by Krishnan et al. [17], proposes an end-to-end hardware evaluation platform for chiplet-based analog crossbar ANN accelerators. While the above works provide state-of-the-art evaluation platforms for ANN accelerators, they are insufficient for accurate SNN evaluation as they lack critical architectural modifications required for temporal spike data processing and LIF/IF activation functionalities.
B. Hardware Evaluation Platforms for SNN Inference
In a recent work, SpinalFlow [6], Narayanan et al. showed that naive hardware implementation of SNNs on a spiking Eyeriss-like architecture lowers the energy-efficiency claimed by SNNs. To this end, the work proposed architectural changes and used a tick-batched dataflow to achieve higher energy efficiency and lower hardware overheads. Another work, RESPARC [9], proposed analog crossbar-based hardware accelerators for energy-efficient implementation of SNNs, where the energy efficiency is achieved through event-driven communication and computation of spikes. However, the work overlooks the underlying hardware overheads of event-driven communication and the effect of analog crossbar non-idealities on SNN performance.
Given the current literature gap in IMC-based hardware evaluation platforms for SNNs, we propose SpikeSim, an end-to-end platform for hardware-realistic benchmarking of SNNs implemented on IMC architectures. SpikeSim contains the SpikeFlow crossbar architecture that incorporates SNN-specific spike data processing and LIF/IF neuron functionality. SpikeSim also incorporates the NICE and ELA engines for hardware-realistic performance, energy, latency and area evaluation of IMC-mapped SNNs.
III. BACKGROUND
A. Spiking Neural Networks
SNNs [1], [28] have gained attention due to their potential energy-efficiency compared to standard ANNs. The main feature of SNNs is the type of neural activation function used for temporal signal processing, which is different from the ReLU activation of ANNs. A Leaky-Integrate-and-Fire (LIF) neuron is commonly used as the activation function for SNNs. The LIF neuron $i$ has a membrane potential $u_i^t$ which accumulates the weighted summation of asynchronous spike inputs $S_j^t$, which can be formulated as follows:

$$u_i^t = \lambda u_i^{t-1} + \sum_j w_{ij} S_j^t. \tag{1}$$

Here, $t$ stands for the time-step, and $w_{ij}$ is the weight connection between neuron $i$ and neuron $j$. Also, $\lambda$ is a leak factor. The LIF neuron $i$ accumulates membrane potential and generates a spike output $o_i^t$ whenever the membrane potential exceeds the threshold $\theta$:

$$o_i^t = \begin{cases} 1, & \text{if } u_i^t > \theta, \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$

The membrane potential is reset to zero after firing. This integrate-and-fire behavior of an LIF neuron yields a non-differentiable function, which is difficult to use with standard backpropagation.
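As an illustration of Eqs. (1) and (2), below is a minimal NumPy sketch of one LIF time-step with a hard reset; the leak factor, threshold and layer sizes are illustrative assumptions.

```python
import numpy as np

def lif_step(u, spikes_in, W, leak=0.9, theta=1.0):
    """One LIF time-step per Eqs. (1)-(2): leak, integrate, fire, hard reset."""
    u = leak * u + W @ spikes_in            # Eq. (1): u_i^t = λ u_i^{t-1} + Σ_j w_ij S_j^t
    out = (u > theta).astype(np.float32)    # Eq. (2): spike if membrane potential > θ
    u = np.where(out == 1.0, 0.0, u)        # reset membrane potential to zero after firing
    return u, out

# Illustrative usage over T = 5 time-steps for a 128-neuron layer with 64 inputs.
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(128, 64))
u = np.zeros(128)
for t in range(5):
    spikes_in = (rng.random(64) < 0.2).astype(np.float32)  # random binary spikes
    u, out = lif_step(u, spikes_in, W)
```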
To address the non-differentiability, various training algorithms for SNNs have been studied in the past decade. ANN-SNN conversion methods [29]–[33] convert pretrained ANNs to SNNs using weight (or threshold) scaling in order to approximate the ReLU activation with LIF/IF activation. They can leverage well-established ANN training methods, resulting in high accuracy on complex datasets. On the other hand, surrogate gradient learning addresses the non-differentiability problem of an LIF/IF neuron by approximating the backward gradient function [34]. Surrogate gradient learning can directly learn from the spikes in a smaller number of time-steps.
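To sketch the idea, surrogate gradient methods keep the hard threshold in the forward pass but substitute a smooth derivative in the backward pass. The piecewise-linear (triangular) surrogate below is one common choice, used here purely for illustration rather than as the specific function of any cited work.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; piecewise-linear surrogate
    gradient (a common illustrative choice) in the backward pass."""
    @staticmethod
    def forward(ctx, u_minus_theta):
        ctx.save_for_backward(u_minus_theta)
        return (u_minus_theta > 0).float()      # exact, non-differentiable spike

    @staticmethod
    def backward(ctx, grad_output):
        (u_minus_theta,) = ctx.saved_tensors
        # Triangle-shaped surrogate: gradient is nonzero only near the threshold.
        surrogate = torch.clamp(1.0 - u_minus_theta.abs(), min=0.0)
        return grad_output * surrogate

# Usage: gradients flow to u through the surrogate despite the hard threshold.
u = torch.randn(8, requires_grad=True)
spikes = SurrogateSpike.apply(u - 1.0)          # θ = 1.0 (illustrative)
spikes.sum().backward()
```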
Building on surrogate gradient learning, several input data encoding schemes have been compared. A recent work [35] compares two state-of-the-art input data encoding techniques: direct encoding and rate encoding. Rate encoding converts input data to stochastically distributed temporal spikes using a Poisson coding technique [36]. In contrast, direct encoding leverages features directly extracted from the inputs over multiple time-steps. It has been shown that direct encoding schemes can achieve higher performance at a lower number of time-steps.
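Below is a minimal sketch of Poisson rate encoding, under the simplifying assumption that pixel intensities are normalized to [0, 1] and each intensity is treated as a per-time-step spike probability.

```python
import numpy as np

def poisson_rate_encode(image, num_steps, rng=None):
    """Convert normalized pixel intensities in [0, 1] into a binary spike train:
    at each time-step, each pixel spikes with probability equal to its intensity."""
    if rng is None:
        rng = np.random.default_rng(0)
    return (rng.random((num_steps, *image.shape)) < image).astype(np.float32)

# Illustrative usage: a 32x32 grayscale image over T = 5 time-steps.
image = np.random.default_rng(1).random((32, 32))
spike_train = poisson_rate_encode(image, num_steps=5)   # shape (5, 32, 32)
```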
B. Analog Crossbar Arrays and their Non-idealities
Analog crossbars consist of 2D arrays of In-Memory-Computing (IMC) devices, Digital-to-Analog Converters (DACs), Analog-to-Digital Converters (ADCs), and write circuits for programming the IMC devices. The activations of a neural network are fed in as analog voltages $V_i$ to each row of the crossbar, and the weights are programmed as synaptic device conductances ($G_{ij}$) at the cross-points, as shown in Fig. 1. For an ideal N×M crossbar during inference, the voltages interact with the device conductances and produce a current (governed by Ohm's Law).
Consequently, by Kirchhoff's current law, the net output current sensed at each column $j$ is the sum of the currents through each device, i.e. $I_{j(ideal)} = \sum_{i=1}^{N} G_{ij} V_i$.

Fig. 1: An IMC crossbar array with input voltages $V_i$, IMC devices bearing synaptic conductances $G_{ij}$, and output currents $I_j$.

We term the matrix $G_{ideal}$ as the collection of all $G_{ij}$s for a crossbar. However, in reality, the analog nature of the computation leads to various hardware noise or non-idealities, such as interconnect parasitic resistances and synaptic device-level variations [16], [20], [24], [37], [38]. This results in a $G_{nonideal}$ matrix, with each element $G'_{ij}$ incorporating the impact of the non-idealities. Consequently, the net output current sensed at each column $j$ in a non-ideal scenario becomes $I_{j(nonideal)} = \sum_{i=1}^{N} G'_{ij} V_i$, which deviates from its ideal value. This manifests as huge accuracy losses for neural networks mapped onto crossbars. Larger crossbars entail greater non-idealities, resulting in higher accuracy losses [16], [20], [24], [39].
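To make the deviation concrete, below is a minimal NumPy sketch of the ideal versus non-ideal column currents; the multiplicative Gaussian perturbation of the conductances is an illustrative stand-in for the device variations and parasitics described above, not NICE's actual circuit model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 64                               # crossbar rows x columns
V = rng.random(N)                           # input voltages V_i on the rows
G_ideal = rng.uniform(1e-6, 1e-4, (N, M))   # programmed conductances G_ij (Siemens)

# Illustrative non-ideality: multiplicative Gaussian perturbation of each G_ij,
# standing in for device variations and interconnect parasitics.
G_nonideal = G_ideal * (1.0 + 0.1 * rng.normal(size=(N, M)))

I_ideal = V @ G_ideal        # I_j(ideal)    = Σ_i G_ij  V_i  per column j
I_nonideal = V @ G_nonideal  # I_j(nonideal) = Σ_i G'_ij V_i, deviating from ideal
relative_error = np.abs(I_nonideal - I_ideal) / I_ideal
```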
IV. SPIKESIM
The SpikeSim platform, as shown in Fig. 2, requires various SNN, circuit and device parameter inputs (details provided in Table II) for the hardware evaluation. It consists of three different stages; a minimal sketch of this flow follows the list below:
1) SpikeFlow Mapping: A pre-trained SNN is partitioned
and mapped on a realistic analog crossbar architecture
called SpikeFlow (See Section IV-A for details).
2) Non-Ideality Computation Engine (NICE): Incorpo-
rates circuit analysis and ADC quantization to evaluate
hardware-realistic inference performance of SpikeFlow
mapped SNNs (See Section IV-B).
3) ELA Engine: Computes the energy, latency and area of
the SpikeFlow-mapped SNN (see Section IV-C).
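The sketch below outlines this three-stage flow as plain Python; all function names, arguments and return values are illustrative assumptions for exposition, not SpikeSim's actual API.

```python
# Illustrative three-stage SpikeSim flow (names are assumptions, not the real API).

def spikeflow_map(snn_layers, crossbar_size):
    """Stage 1: partition each SNN layer and map it onto SpikeFlow crossbar tiles."""
    return [{"layer": name, "crossbar_size": crossbar_size} for name in snn_layers]

def nice_inference(mapping):
    """Stage 2 (placeholder): NICE's circuit analysis and ADC quantization yield
    the hardware-realistic inference accuracy of the mapped SNN."""
    return {"top1_accuracy": None}

def ela_engine(mapping):
    """Stage 3 (placeholder): the ELA engine computes energy, latency and area."""
    return {"energy": None, "latency": None, "area": None}

# Usage: evaluate a pretrained SNN end-to-end.
mapping = spikeflow_map(["conv1", "conv2", "fc"], crossbar_size=64)
performance = nice_inference(mapping)
hardware_costs = ela_engine(mapping)
```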