SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks
Abhishek Moitra, Student Member, IEEE, Abhiroop Bhattacharjee, Student Member, IEEE, Runcong Kuang, Gokul Krishnan, Member, IEEE, Yu Cao, Fellow, IEEE, and Priyadarshini Panda, Member, IEEE

These authors have contributed equally to this work.
Abhishek Moitra, Abhiroop Bhattacharjee, and Priyadarshini Panda are with the Department of Electrical Engineering, Yale University, New Haven, CT, USA.
Runcong Kuang, Gokul Krishnan, and Yu Cao are with the School of Electrical, Computer, and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA.
Abstract—Spiking Neural Networks (SNNs) are an active research domain towards energy-efficient machine intelligence. Compared to conventional artificial neural networks (ANNs), SNNs use temporal spike data and bio-plausible neuronal activation functions such as Leaky-Integrate Fire/Integrate Fire (LIF/IF) for data processing. However, SNNs incur significant dot-product operations causing high memory and computation overhead in standard von-Neumann computing platforms. To this end, In-Memory Computing (IMC) architectures have been proposed to alleviate the "memory-wall bottleneck" prevalent in von-Neumann architectures. Although recent works have proposed IMC-based SNN hardware accelerators, the following key implementation aspects have been overlooked: 1) the adverse effects of crossbar non-ideality on SNN performance due to repeated analog dot-product operations over multiple time-steps; 2) hardware overheads of essential SNN-specific components such as the LIF/IF and data communication modules. To this end, we propose SpikeSim, a tool that can perform realistic performance, energy, latency and area evaluation of IMC-mapped SNNs. SpikeSim consists of a practical monolithic IMC architecture called SpikeFlow for mapping SNNs. Additionally, the Non-Ideality Computation Engine (NICE) and Energy-Latency-Area (ELA) engine perform hardware-realistic evaluation of SpikeFlow-mapped SNNs. Based on a 65nm CMOS implementation and experiments on the CIFAR10, CIFAR100 and TinyImagenet datasets, we find that the LIF/IF neuronal module has a significant area contribution (>11% of the total hardware area). To this end, we propose SNN topological modifications that lead to 1.24× and 10× reductions in the neuronal module's area and the overall energy-delay product, respectively. Furthermore, in this work, we perform a holistic comparison between IMC-implemented ANNs and SNNs and conclude that a lower number of time-steps is key to achieving higher throughput and energy-efficiency for SNNs compared to 4-bit ANNs. The code repository for the SpikeSim tool will be made available in this Github link.
Index Terms—Spiking Neural Networks (SNNs), In-Memory Computing, Emerging Devices, Analog Crossbars
I. INTRODUCTION
In the last decade, Spiking Neural Networks (SNNs) have gained significant attention in the context of energy-efficient machine intelligence [1].
TABLE I: Qualitative comparison of SpikeSim with related works. I- Inference, T- Training, VN- von-Neumann, IMC- In-Memory Computing, ELA- Energy, Latency & Area, M- Monolithic and C- Chiplet Architecture.

Work                         Platform   I/T   Non-Ideality   ELA Evaluation
ANN
Eyeriss [13]                 VN-M       I     ✗              ✓
Neurosim [14]                IMC-M      I     ✗              ✓
CrossSim [15]                IMC-M      I     ✗              ✓
RxNN [16]                    IMC-M      I     ✓              ✗
SIAM [17]                    IMC-C      I     ✓              ✓
SNN
Loihi [4], TrueNorth [5]     VN-M       I     ✗              ✗
SpinalFlow [6], PTB [7]      VN-M       I     ✗              ✗
H2Learn [18], SATA [19]      VN-M       T     ✗              ✓
RESPARC [9]                  IMC-M      I     ✗              ✗
SpikeSim (ours)              IMC-M      I     ✓              ✓
SNNs encode input data with discrete binary spikes over multiple time-steps, making them highly suitable for asynchronous event-driven input-processing applications [2], [3]. Recent works have proposed full-scale general-purpose von-Neumann architectures leveraging the temporal processing property of SNNs [4], [5]. Other works such as [6], [7] have proposed novel dataflows to minimize the hardware overhead of von-Neumann implementations of SNNs. However, SNNs, like conventional Artificial Neural Networks (ANNs), entail significant dot-product operations, leading to high memory and energy overhead when implemented on traditional von-Neumann architectures (due to the "memory wall bottleneck") [8], [9]. To this end, analog In-Memory Computing (IMC) architectures [10]–[12] have been proposed to perform analog dot-product or Multiply-and-Accumulate (MAC) operations, achieving high memory bandwidth and compute parallelism and thereby overcoming the "memory wall bottleneck".
Being an emerging and heavily researched computing paradigm, IMC architectures require hardware evaluation platforms for fast and accurate algorithm benchmarking. To this effect, many state-of-the-art hardware evaluation frameworks [14]–[17] have been proposed for realistic evaluation of IMC-mapped ANNs. However, they are unsuitable for hardware-realistic SNN evaluations as they lack key architectural modifications required for temporal spike processing and non-linear activation functions, such as Leaky Integrate Fire or Integrate Fire (LIF/IF). In the context of hardware evaluation platforms for SNNs, works such as [18], [19] have been proposed for benchmarking SNN training on digital CMOS platforms. Additionally, works such as [9] propose IMC architectures for SNN inference. However, they lack several practical architectural considerations, such as non-idealities incurred during analog MAC computations [20]–[22] and data communication overhead, among others, rendering them unsuitable for a holistic hardware evaluation of IMC-mapped SNNs. All of these works are qualitatively compared in Table I. Therefore, in the current literature, there is an evident gap between SNN algorithm design and a holistic evaluation platform for hardware-realistic benchmarking of these algorithms.
To this end, we propose SpikeSim, an end-to-end hardware evaluation tool for benchmarking SNN inference algorithms. SpikeSim consists of a monolithic IMC-based tiled hardware architecture called SpikeFlow that maps a given SNN on non-ideal analog crossbars. In SpikeFlow, we incorporate SNN-specific non-linear activation functions such as the LIF/IF neuron, and leverage the binary spike input data to propose a lightweight module (the DIFF module) that facilitates signed MAC operations without the need for the traditional dual-crossbar approach [14], [23]. For hardware-realistic SNN inference performance benchmarking, we develop a Non-Ideality Computation Engine (NICE). NICE incorporates a non-ideality-aware weight encoding to improve the robustness of SNNs when mapped on analog crossbars [24], and uses circuit analysis methods to realize non-ideal MAC operations and provide hardware-realistic SNN inference performance. Furthermore, we design an Energy-Latency-Area (ELA) engine to benchmark the hardware-realistic energy, latency and area of the SpikeFlow-mapped SNN.
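For context, below is a minimal NumPy sketch of the conventional dual-crossbar signed-MAC scheme that the DIFF module aims to eliminate; the splitting of signed weights into positive and negative conductance matrices follows the standard approach described in [14], [23], while the array sizes and variable names are illustrative assumptions.

```python
import numpy as np

# Conventional dual-crossbar signed MAC (the scheme DIFF replaces):
# signed weights are split into two non-negative conductance matrices,
# each mapped to its own crossbar, and the column outputs are subtracted.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))          # signed weight matrix (illustrative size)
spikes = rng.integers(0, 2, size=64)   # binary spike inputs at one time-step

G_pos = np.where(W > 0, W, 0.0)        # crossbar 1: positive weights only
G_neg = np.where(W < 0, -W, 0.0)       # crossbar 2: magnitudes of negative weights

# Each crossbar computes an unsigned analog dot-product (Ohm's + Kirchhoff's laws);
# the signed result is recovered by subtracting the two column currents.
mac_out = spikes @ G_pos - spikes @ G_neg
assert np.allclose(mac_out, spikes @ W)
```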
The key contributions of our work can be summarized as
follows:
1) We propose SpikeSim, an end-to-end hardware benchmarking tool for SNN inference. SpikeSim consists of SpikeFlow, a tiled memristive crossbar architecture that incorporates Leaky-Integrate-Fire/Integrate-Fire (LIF/IF) functionality and a novel fully-digital DIFF module that eliminates the dual-crossbar approach for signed MAC computations [14], [23]. Additionally, SpikeSim contains the NICE and ELA engines for crossbar-realistic hardware evaluations.
2) We develop NICE to perform fast and realistic modelling of resistive and device conductance variation non-idealities for crossbar-aware performance evaluations of SNNs. NICE incorporates a non-ideality-aware weight encoding scheme that improves the inference accuracy of pretrained SNNs implemented on analog crossbars.
3) We perform extensive hardware evaluations on benchmark datasets (CIFAR10, CIFAR100 [25] and TinyImagenet [26]) and unravel that the neuronal module consumes a significant portion of the total chip area (11-30%), owing to the requirement to store a large number of membrane potentials between time-steps.
4) Through extensive experiments, we show that simple SNN topological modifications, such as reducing the number of output channels in the first convolutional layer, can ameliorate the area overhead of the neuronal module by 1.24× and improve the Energy-Delay Product (EDP) by 10×. Furthermore, we show that the non-ideality-aware weight encoding improves the crossbar-mapped SNN accuracy by more than 70% (for the CIFAR10 dataset) compared to vanilla weight encoding onto the SpikeFlow architecture.
5) Finally, we compare the performance as well as the area and energy distributions of crossbar-mapped VGG9 ANNs and SNNs trained on the CIFAR10 dataset. We find that SNNs exhibit 1000× higher neuronal module area compared to ANNs, yet can achieve iso-performance with higher energy-efficiency and throughput benefits at small numbers of time-steps (T = 3, 4, 5) compared to 4-bit ANNs.
To the best of our knowledge, SpikeSim is the first hardware-realistic evaluation platform for SNNs mapped on IMC architectures. Through SpikeSim, we bring out some of the key parameters in SNN algorithm and IMC architecture design that can potentially lead to IMC-aware SNN research directions in the future.
II. RELATED WORKS
A. Hardware Evaluation Platforms for ANN Inference
Eyeriss [13] proposed a reconfigurable digital systolic-array architecture for energy-efficient ANN acceleration. The authors show that data transfer from DRAM memory to the computation unit contributes significantly to the energy consumption in von-Neumann ANN accelerators, and hence propose a row-stationary dataflow to mitigate the memory overhead. More recent works such as ISAAC [27] used in-memory computing architectures such as analog crossbars to perform fast and energy-efficient computation of ANNs, with extensive hardware evaluations over different crossbar sizes and analog-to-digital converter (ADC) precisions, among others. PUMA [12] proposes a memristive crossbar-based ANN accelerator that uses graph partitioning and a custom instruction set architecture to schedule MAC operations in a multi-crossbar architecture. Neurosim, by Chen et al. [14], proposes an end-to-end hardware evaluation platform for monolithic analog crossbar-based ANN accelerators. SIAM, a recent work by Krishnan et al. [17], proposes an end-to-end hardware evaluation platform for chiplet-based analog crossbar ANN accelerators. While the above works provide state-of-the-art evaluation platforms for ANN accelerators, they are insufficient for accurate SNN evaluation as they lack critical architectural modifications required for temporal spike data processing and LIF/IF activation functionalities.
B. Hardware Evaluation Platforms for SNN Inference
In a recent work, SpinalFlow [6], Narayanan et al. showed that naive hardware implementation of SNNs on a spiking Eyeriss-like architecture lowers the energy-efficiency claimed by SNNs. To this end, the work proposed architectural changes and used a tick-batched dataflow to achieve higher energy efficiency and lower hardware overheads. Another work, RESPARC [9], proposed analog crossbar-based hardware accelerators for energy-efficient implementation of SNNs, where the energy efficiency is achieved through event-driven communication and computation of spikes. However, the work overlooks the underlying hardware overheads of event-driven communication and the effect of analog crossbar non-idealities on SNN performance.
Given the current literature gap in IMC-based hardware evaluation platforms for SNNs, we propose SpikeSim, an end-to-end platform for hardware-realistic benchmarking of SNNs implemented on IMC architectures. SpikeSim contains the SpikeFlow crossbar architecture that incorporates SNN-specific spike data processing and LIF/IF neuron functionality. SpikeSim also incorporates the NICE and ELA engines for hardware-realistic performance, energy, latency and area evaluation of IMC-mapped SNNs.
III. BACKGROUND
A. Spiking Neural Networks
SNNs [1], [28] have gained attention due to their potential energy-efficiency compared to standard ANNs. The main feature of SNNs is the type of neural activation function used for temporal signal processing, which is different from the ReLU activation of ANNs. A Leaky-Integrate-and-Fire (LIF) neuron is commonly used as the activation function for SNNs. The LIF neuron $i$ has a membrane potential $u_i^t$ which accumulates the weighted summation of asynchronous spike inputs $S_j^t$, which can be formulated as follows:

$$u_i^t = \lambda u_i^{t-1} + \sum_j w_{ij} S_j^t. \tag{1}$$

Here, $t$ stands for the time-step, and $w_{ij}$ is the weight connection between neuron $i$ and neuron $j$. Also, $\lambda$ is a leak factor. The LIF neuron $i$ accumulates membrane potential and generates a spike output $o_i^t$ whenever the membrane potential exceeds the threshold $\theta$:

$$o_i^t = \begin{cases} 1, & \text{if } u_i^t > \theta, \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$

The membrane potential is reset to zero after firing. This integrate-and-fire behavior of an LIF neuron yields a non-differentiable function, which is difficult to use with standard backpropagation.
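As an illustration of Eqs. (1) and (2), below is a minimal NumPy sketch of one LIF time-step with a hard reset; the leak factor, threshold and layer sizes are illustrative assumptions.

```python
import numpy as np

def lif_step(u, spikes_in, W, leak=0.9, theta=1.0):
    """One LIF time-step per Eqs. (1)-(2): leak, integrate, fire, hard reset."""
    u = leak * u + W @ spikes_in            # Eq. (1): u_i^t = λ u_i^{t-1} + Σ_j w_ij S_j^t
    out = (u > theta).astype(np.float32)    # Eq. (2): spike if membrane potential > θ
    u = np.where(out == 1.0, 0.0, u)        # reset membrane potential to zero after firing
    return u, out

# Illustrative usage over T = 5 time-steps for a 128-neuron layer with 64 inputs.
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(128, 64))
u = np.zeros(128)
for t in range(5):
    spikes_in = (rng.random(64) < 0.2).astype(np.float32)  # random binary spikes
    u, out = lif_step(u, spikes_in, W)
```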
To address the non-differentiability, various training algorithms for SNNs have been studied in the past decade. ANN-SNN conversion methods [29]–[33] convert pretrained ANNs to SNNs using weight (or threshold) scaling in order to approximate the ReLU activation with LIF/IF activation. They can leverage well-established ANN training methods, resulting in high accuracy on complex datasets. On the other hand, surrogate gradient learning addresses the non-differentiability problem of an LIF/IF neuron by approximating the backward gradient function [34]. Surrogate gradient learning can directly learn from the spikes in a smaller number of time-steps.
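To sketch the idea, surrogate gradient methods keep the hard threshold in the forward pass but substitute a smooth derivative in the backward pass. The piecewise-linear (triangular) surrogate below is one common choice, used here purely for illustration rather than as the specific function of any cited work.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; piecewise-linear surrogate
    gradient (a common illustrative choice) in the backward pass."""
    @staticmethod
    def forward(ctx, u_minus_theta):
        ctx.save_for_backward(u_minus_theta)
        return (u_minus_theta > 0).float()      # exact, non-differentiable spike

    @staticmethod
    def backward(ctx, grad_output):
        (u_minus_theta,) = ctx.saved_tensors
        # Triangle-shaped surrogate: gradient is nonzero only near the threshold.
        surrogate = torch.clamp(1.0 - u_minus_theta.abs(), min=0.0)
        return grad_output * surrogate

# Usage: gradients flow to u through the surrogate despite the hard threshold.
u = torch.randn(8, requires_grad=True)
spikes = SurrogateSpike.apply(u - 1.0)          # θ = 1.0 (illustrative)
spikes.sum().backward()
```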
Building on surrogate gradient learning, several input data encoding schemes have been compared. A recent work [35] compares two state-of-the-art input data encoding techniques: direct encoding and rate encoding. Rate encoding converts input data to stochastically distributed temporal spikes using a Poisson coding technique [36]. In contrast, direct encoding leverages features directly extracted from the inputs over multiple time-steps. It has been shown that direct encoding schemes can achieve higher performance at a lower number of time-steps.
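Below is a minimal sketch of Poisson rate encoding, under the simplifying assumption that pixel intensities are normalized to [0, 1] and each intensity is treated as a per-time-step spike probability.

```python
import numpy as np

def poisson_rate_encode(image, num_steps, rng=None):
    """Convert normalized pixel intensities in [0, 1] into a binary spike train:
    at each time-step, each pixel spikes with probability equal to its intensity."""
    if rng is None:
        rng = np.random.default_rng(0)
    return (rng.random((num_steps, *image.shape)) < image).astype(np.float32)

# Illustrative usage: a 32x32 grayscale image over T = 5 time-steps.
image = np.random.default_rng(1).random((32, 32))
spike_train = poisson_rate_encode(image, num_steps=5)   # shape (5, 32, 32)
```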
B. Analog Crossbar Arrays and their Non-idealities
Analog crossbars consist of 2D arrays of In-Memory-Computing (IMC) devices, Digital-to-Analog Converters (DACs), Analog-to-Digital Converters (ADCs), and write circuits for programming the IMC devices. The activations of a neural network are fed in as analog voltages $V_i$ to each row of the crossbar, and the weights are programmed as synaptic device conductances ($G_{ij}$) at the cross-points, as shown in Fig. 1. For an ideal N×M crossbar during inference, the voltages interact with the device conductances and produce a current (governed by Ohm's Law).
Consequently, by Kirchhoff's current law, the net output current sensed at each column $j$ is the sum of the currents through each device, i.e. $I_{j(ideal)} = \sum_{i=1}^{N} G_{ij} V_i$.

Fig. 1: An IMC crossbar array with input voltages $V_i$, IMC devices bearing synaptic conductances $G_{ij}$, and output currents $I_j$.

We term the matrix $G_{ideal}$ as the collection of all $G_{ij}$s for a crossbar. However, in reality, the analog nature of the computation leads to various hardware noise or non-idealities, such as interconnect parasitic resistances and synaptic device-level variations [16], [20], [24], [37], [38]. This results in a $G_{nonideal}$ matrix, with each element $G'_{ij}$ incorporating the impact of the non-idealities. Consequently, the net output current sensed at each column $j$ in a non-ideal scenario becomes $I_{j(nonideal)} = \sum_{i=1}^{N} G'_{ij} V_i$, which deviates from its ideal value. This manifests as huge accuracy losses for neural networks mapped onto crossbars. Larger crossbars entail greater non-idealities, resulting in higher accuracy losses [16], [20], [24], [39].
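To make the deviation concrete, below is a minimal NumPy sketch of the ideal versus non-ideal column currents; the multiplicative Gaussian perturbation of the conductances is an illustrative stand-in for the device variations and parasitics described above, not NICE's actual circuit model.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 64                               # crossbar rows x columns
V = rng.random(N)                           # input voltages V_i on the rows
G_ideal = rng.uniform(1e-6, 1e-4, (N, M))   # programmed conductances G_ij (Siemens)

# Illustrative non-ideality: multiplicative Gaussian perturbation of each G_ij,
# standing in for device variations and interconnect parasitics.
G_nonideal = G_ideal * (1.0 + 0.1 * rng.normal(size=(N, M)))

I_ideal = V @ G_ideal        # I_j(ideal)    = Σ_i G_ij  V_i  per column j
I_nonideal = V @ G_nonideal  # I_j(nonideal) = Σ_i G'_ij V_i, deviating from ideal
relative_error = np.abs(I_nonideal - I_ideal) / I_ideal
```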
IV. SPIKESIM
The SpikeSim platform, as shown in Fig. 2, requires various SNN, circuit and device parameter inputs (details provided in Table II) for the hardware evaluation. It consists of three different stages; a minimal sketch of this flow follows the list below:
1) SpikeFlow Mapping: A pre-trained SNN is partitioned
and mapped on a realistic analog crossbar architecture
called SpikeFlow (See Section IV-A for details).
2) Non-Ideality Computation Engine (NICE): Incorpo-
rates circuit analysis and ADC quantization to evaluate
hardware-realistic inference performance of SpikeFlow
mapped SNNs (See Section IV-B).
3) ELA Engine: Computes the energy, latency and area of
the SpikeFlow-mapped SNN (see Section IV-C).
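The sketch below outlines this three-stage flow as plain Python; all function names, arguments and return values are illustrative assumptions for exposition, not SpikeSim's actual API.

```python
# Illustrative three-stage SpikeSim flow (names are assumptions, not the real API).

def spikeflow_map(snn_layers, crossbar_size):
    """Stage 1: partition each SNN layer and map it onto SpikeFlow crossbar tiles."""
    return [{"layer": name, "crossbar_size": crossbar_size} for name in snn_layers]

def nice_inference(mapping):
    """Stage 2 (placeholder): NICE's circuit analysis and ADC quantization yield
    the hardware-realistic inference accuracy of the mapped SNN."""
    return {"top1_accuracy": None}

def ela_engine(mapping):
    """Stage 3 (placeholder): the ELA engine computes energy, latency and area."""
    return {"energy": None, "latency": None, "area": None}

# Usage: evaluate a pretrained SNN end-to-end.
mapping = spikeflow_map(["conv1", "conv2", "fc"], crossbar_size=64)
performance = nice_inference(mapping)
hardware_costs = ela_engine(mapping)
```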