eters. The above non-differentiable spike activity is one of
the reasons for network degradation, and the weak spatial ex-
pression ability of binary spike signals is another significant
factor. For widely used spiking neuron models such as the leaky
integrate-and-fire (LIF) model, a sharp feature with a larger
value and a non-sharp feature with a smaller value produce
the same output in the forward process as long as both of the
corresponding membrane potentials exceed the firing threshold. As a
result, the information loss caused by discrete spikes
makes it hard for residual structures to perform identity mapping.
We take steps to address these two challenges to enable
better and deeper directly-trained SNNs. First, we propose
the multi-level firing (MLF) method. MLF expands
the non-zero area of the rectangular approximate derivatives
by allocating a different coverage to the approximate derivative of each
level. In this way, the membrane potentials of neurons are
more likely to fall into the area where the derivative is non-zero,
which alleviates gradient vanishing. Moreover, since
the activation function of the neurons in MLF generates spikes
against different thresholds when activating the input, the expression
ability of the neurons is improved. Second, we
propose spiking dormant-suppressed residual network (spik-
ing DS-ResNet). Spiking DS-ResNet can efficiently per-
form identity mapping of discrete spikes as well as reduce
the probability of dormant unit generation, making it more
suitable for gradient propagation. To demonstrate the ef-
fectiveness of our work, we perform experiments on a non-
neuromorphic dataset (CIFAR10) and neuromorphic datasets
(DVS-Gesture, CIFAR10-DVS). Our model achieves state-
of-the-art performance on all datasets with far fewer train-
able parameters. Experimental analysis indicates that MLF
effectively reduces the proportion of dormant units and im-
proves performance, and that MLF combined with spiking DS-ResNet
allows SNNs to go very deep without degradation.
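As a rough, non-authoritative illustration of the MLF idea described above, the PyTorch-style sketch below implements a multi-level spiking activation in which each level fires against its own threshold and carries a rectangular surrogate derivative centered on that threshold, so the union of the per-level windows widens the overall non-zero gradient area. The level count, threshold values, function names, and window width a are illustrative assumptions rather than the exact formulation used in this paper.

```python
import torch

class LevelSpike(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate derivative of width a."""

    @staticmethod
    def forward(ctx, u, v_th, a):
        ctx.save_for_backward(u)
        ctx.v_th, ctx.a = v_th, a
        return (u >= v_th).float()

    @staticmethod
    def backward(ctx, grad_out):
        u, = ctx.saved_tensors
        # gradient is non-zero only inside a window of width a around this level's threshold
        inside = (torch.abs(u - ctx.v_th) < ctx.a / 2).float() / ctx.a
        return grad_out * inside, None, None

def mlf_activation(u, thresholds=(0.5, 1.5, 2.5), a=1.0):
    # Each level fires against its own threshold; the per-level surrogate windows
    # cover different membrane-potential ranges, so their union enlarges the
    # region where gradients can flow.
    return sum(LevelSpike.apply(u, v_th, a) for v_th in thresholds)
```

In this sketch the unit output is the sum of the per-level spikes, so a sharper input that crosses more thresholds yields a larger response, which is one way to read the improved expression ability mentioned above.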
2 Related Work
Learning algorithms of deep SNNs. For deep SNNs, there
are two main classes of learning algorithms that achieve competitive per-
formance: (1) indirect supervised learning, such as ANN-
SNN conversion learning; (2) direct supervised learning, i.e.,
gradient descent-based backpropagation.
The purpose of ANN-SNN conversion learning is to make
the SNN have the same input-output mapping as the ANN.
Conversion learning circumvents the weak ex-
pression ability of binary spike signals by approximating
the real-valued output of ReLU with the spike sequence, which
inevitably introduces conversion loss. Many works
focus on reducing this conversion loss [Han et al., 2020;
Yan et al., 2021] and achieve competitive performance.
However, conversion learning ignores the effective TD in-
formation and needs a large number of timesteps to ensure
accuracy. As a result, it is often limited to non-neuromorphic
datasets and suffers from severe inference latency.
In recent years, direct supervised learning of SNNs has de-
veloped rapidly. From spatial backpropagation [Lee et al.,
2016] to spatio-temporal backpropagation [Wu et al., 2018;
Gu et al., 2019; Fang et al., 2020], training has come to exploit
both spatial and temporal information.
On this basis, [Zheng et al., 2021] realized the direct train-
ing of large-scale networks and achieved state-of-the-art per-
formance on neuromorphic datasets. However, existing
methods do not solve the problems of the limited width of the ap-
proximate derivative and the weak expression ability of binary
spike signals, which make the direct training of deep SNNs
inefficient. Gradient vanishing and network degradation se-
riously restrict directly-trained SNNs from going very deep,
which is what we aim to overcome.
Gradient vanishing or explosion. Gradient vanishing or
explosion is the shared challenge of deep ANNs and deep
SNNs. For deep ANNs, there are quite a few successful meth-
ods to address this problem. Batch normalization (BN) [Ioffe
and Szegedy, 2015] reduces internal covariate shift to avoid
gradient vanishing or explosion. The residual structure [He
et al., 2016] allows the gradient to propagate across layers by
introducing shortcut connections, and is one of the most
widely used basic blocks in deep learning.
For directly-trained deep SNNs, existing research on the
gradient vanishing or explosion problem is limited. It is
worth noting that the threshold-dependent batch normaliza-
tion (tdBN) method proposed by [Zheng et al., 2021] can ad-
just the firing rate and avoid gradient vanishing or explosion
to some extent, which is helpful for our further research on
gradient vanishing. On this basis, we will combat the gradi-
ent vanishing problem in SD caused by the limited width of
the approximate derivative.
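For concreteness, the rectangular surrogate commonly used in direct training (e.g., one of the approximate derivatives in STBP [Wu et al., 2018]) can be written as

\frac{\partial o}{\partial u} \approx \frac{1}{a}\, \mathrm{sign}\!\left( |u - V_{th}| < \frac{a}{2} \right),

where a is a width hyperparameter: the gradient is non-zero only when the membrane potential u falls within a window of width a around the threshold V_{th}, which is exactly the limitation that MLF is designed to relax.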
Deep network degradation. Network degradation causes
deeper networks to perform worse than shallower ones.
For deep ANNs, one of the most
successful methods to solve the degradation problem is the residual
structure [He et al., 2016]. It introduces a shortcut connec-
tion to increase the identity mapping ability of the network
and enables networks to reach hundreds of layers without
degradation, greatly expanding the achievable depth.
For directly-trained deep SNNs, there are few efforts on the
degradation problem. Although tdBN has explored directly-
trained deep SNNs with residual structure and enabled SNNs to
go deeper, the degradation of deep SNNs is still serious. Our
work aims to fill this gap in the field of SNNs.
3 Preliminaries
In this section, we review spatio-temporal backprop-
agation (STBP) [Wu et al., 2018] and the iterative LIF
model [Wu et al., 2019] to introduce the foundation of our
work.
STBP realizes error backpropagation in both TD and SD
for the direct training of SNNs. On this basis, [Wu et
al., 2019] develops the iterative LIF model into an easy-to-
program version and accelerates the direct training of SNNs.
For a fully connected network, the forward process
of the iterative LIF model can be described as
x_i^{t+1,n} = \sum_{j=1}^{l(n-1)} w_{ij}^{n} o_j^{t+1,n-1} + b_i^{n}, \qquad (1)

u_i^{t+1,n} = k_{\tau} u_i^{t,n} \left(1 - o_i^{t,n}\right) + x_i^{t+1,n}, \qquad (2)

o_i^{t+1,n} = f\left(u_i^{t+1,n} - V_{th}\right) =
\begin{cases}
1, & u_i^{t+1,n} \ge V_{th} \\
0, & u_i^{t+1,n} < V_{th}
\end{cases}, \qquad (3)