Despite remarkable progress in neuromorphic hardware development, how to efficiently and effectively train the core computational model, the spiking neural network (SNN), remains a challenging research topic. This, in turn, impedes the development of efficient neuromorphic training chips as well as the wide adoption of neuromorphic solutions in mainstream AI applications. Existing training algorithms for deep SNNs fall into two categories: ANN-to-SNN conversion and gradient-based direct training.
ANN-to-SNN conversion methods reuse network weights from more easily trainable ANNs. This can be viewed as a specific instance of Teacher-Student (T-S) learning that transfers knowledge from a teacher ANN to a student SNN in the form of network weights. By properly determining the neuronal firing thresholds and initial membrane potentials of SNNs, recent studies show that the activation values of ANNs can be well approximated by the firing rates of spiking neurons, achieving near-lossless network conversion on a number of challenging AI benchmarks [4, 5, 7, 12, 14, 21, 22, 34, 46, 49, 63]. Nevertheless, these network conversion methods are developed solely for the non-leaky integrate-and-fire (IF) neuron model and typically require a large time window to reach a reliable firing rate approximation. It is, therefore, neither straightforward nor efficient to deploy the converted SNNs onto existing neuromorphic chips.
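To make the firing-rate approximation concrete, the following minimal sketch simulates a single non-leaky IF neuron with subtraction reset driven by a constant input current; the time window, threshold, and input values are illustrative choices, not parameters from any specific conversion method. Over a long enough window, the firing rate converges to the clipped ReLU activation of the input:

```python
def if_neuron_rate(x, T=1000, threshold=1.0):
    """Simulate a non-leaky integrate-and-fire (IF) neuron driven by a
    constant input current x for T time steps; return its firing rate."""
    v = 0.0          # membrane potential
    spikes = 0
    for _ in range(T):
        v += x                 # integrate the input current
        if v >= threshold:     # fire, then reset by subtraction
            spikes += 1
            v -= threshold
    return spikes / T

# The firing rate approximates the clipped ReLU max(0, min(x, 1)).
for x in (-0.3, 0.25, 0.5, 0.9):
    print(x, if_neuron_rate(x), max(0.0, min(x, 1.0)))
```

The approximation error shrinks as 1/T, which is why conversion methods need large time windows: halving the rate-quantization error doubles the inference latency.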
In another vein of research, gradient-based direct training methods explicitly model each spiking neuron as a self-recurrent neural network and leverage the canonical Backpropagation Through Time (BPTT) algorithm to optimize the network parameters. The non-differentiable spiking activation function is typically circumvented with continuous surrogate gradient (SG) functions during error backpropagation [8, 15, 38, 45, 50, 58, 59, 61]. Despite their compatibility with event-based inputs and different spiking neuron models, these methods are computationally and memory inefficient in practice. Moreover, the gradient approximation error introduced by the SG functions tends to accumulate over layers, causing significant performance degradation for deep network structures and short time windows [57].
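The surrogate-gradient trick can be sketched as follows: the forward pass keeps the exact Heaviside spike function, while the backward pass swaps its derivative (a Dirac delta at the threshold) for a smooth stand-in. The triangular surrogate and its width below are one common but illustrative choice; published SG methods differ in the exact shape:

```python
import numpy as np

def spike(v, threshold=1.0):
    """Forward pass: exact Heaviside step, non-differentiable at threshold."""
    return (v >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, width=0.5):
    """Backward pass: a triangular surrogate derivative centred on the
    threshold. It integrates to 1, mimicking the Dirac delta it replaces."""
    return np.maximum(0.0, 1.0 - np.abs(v - threshold) / width) / width

v = np.array([0.2, 0.9, 1.0, 1.4])
print(spike(v))           # exact spikes: [0. 0. 1. 1.]
print(surrogate_grad(v))  # nonzero only near the threshold
```

The mismatch between the true (zero-almost-everywhere) derivative and the surrogate is precisely the per-layer approximation error that compounds under BPTT in deep networks.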
In general, SNN learning algorithms can be categorized into off-chip learning [16, 64] and on-chip learning [9, 41, 43]. Almost all of the direct SNN training methods discussed above belong to the off-chip learning category. Due to the lack of effective ways to exploit the high level of sparsity in spiking activities and the requirement to store non-local information for credit assignment, these off-chip methods exhibit very low training efficiency. Moreover, due to notorious device non-ideality problems [6], the actual network dynamics deviate from the off-chip simulated ones, causing the accuracy of off-chip trained SNNs to degrade significantly when deployed onto analog computing substrates [1, 24, 37, 44]. To address these problems, recent work proposes on-chip learning algorithms in the form of local Hebbian learning [11, 28, 41] and approximations of gradient-based learning [10, 19, 32, 40], but the effectiveness of these algorithms has only been demonstrated on simple benchmarks, such as the MNIST and N-MNIST datasets.
To address the aforementioned problems in SNN training and hardware deployment, we put forward a generalized SNN learning rule in this paper, which we refer to as the Local Tandem Learning (LTL) rule. The LTL rule combines the best of ANN-to-SNN conversion and gradient-based training methods. On the one hand, it makes good use of the highly effective intermediate feature representations of ANNs to supervise the training of SNNs. By doing so, we show that it can achieve rapid network convergence within five training epochs on the CIFAR-10 dataset with low computational complexity. On the other hand, the LTL rule adopts a gradient-based approach to perform knowledge transfer, which can support different neuron models and achieve rapid pattern recognition. By propagating gradient information locally within a layer, it also alleviates the compounding gradient approximation errors of the SG method and leads to near-lossless knowledge transfer on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Moreover, the LTL rule is designed to be hardware friendly: it can perform efficient on-chip learning using only local information. Under this on-chip setting, we demonstrate that the LTL rule is capable of addressing the notorious device non-ideality issues of analog computing substrates, including device mismatch, quantization noise, thermal noise, and neuron silencing.
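The layer-local knowledge-transfer idea can be illustrated with a toy single-layer sketch: a frozen ANN layer acts as the teacher, and a perturbed SNN layer is trained to match the teacher's ReLU activations with its IF firing rates, using a layer-wise MSE loss whose gradient never crosses layer boundaries. The loss, the unit surrogate derivative, and all dimensions and learning rates here are simplified illustrations of the general principle, not the paper's exact LTL update:

```python
import numpy as np

rng = np.random.default_rng(0)

def snn_rate(x, W, T=100):
    """Firing rates of a layer of IF neurons (subtraction reset) over T steps."""
    I = x @ W.T                     # constant input current per neuron
    v = np.zeros_like(I)
    s = np.zeros_like(I)
    for _ in range(T):
        v += I
        fired = (v >= 1.0).astype(float)
        s += fired
        v -= fired                  # subtraction reset
    return s / T

W_ann = 0.5 * rng.normal(size=(4, 8))           # frozen teacher weights
W_snn = W_ann + 0.3 * rng.normal(size=(4, 8))   # perturbed student weights

def local_loss(W, x):
    target = np.maximum(0.0, x @ W_ann.T)       # teacher ReLU activations
    return np.mean((snn_rate(x, W) - target) ** 2)

x_eval = rng.uniform(0.0, 0.3, size=(32, 8))
before = local_loss(W_snn, x_eval)

lr = 0.1
for _ in range(300):
    x = rng.uniform(0.0, 0.3, size=(16, 8))
    err = snn_rate(x, W_snn) - np.maximum(0.0, x @ W_ann.T)
    # Approximate d(rate)/d(pre-activation) by 1 (a crude surrogate), so the
    # layer-local MSE gradient is err^T x — no credit assignment across layers.
    W_snn -= lr * err.T @ x / len(x)

after = local_loss(W_snn, x_eval)
print(before, after)   # the layer-local loss should shrink markedly
```

Because the supervisory signal and the weight update both live inside one layer, the SG approximation error cannot compound across depth, and only local quantities need to be stored, which is what makes the scheme amenable to on-chip learning.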