Multi-Level Firing with Spiking DS-ResNet: Enabling Better and Deeper
Directly-Trained Spiking Neural Networks
Lang Feng1, Qianhui Liu1, Huajin Tang1,2, De Ma1 and Gang Pan1,2
1College of Computer Science and Technology, Zhejiang University, Hangzhou, China
2Zhejiang Lab, Hangzhou, China
{langfeng, qianhuiliu, htang, made, gpan}@zju.edu.cn
Abstract
Spiking neural networks (SNNs) are bio-inspired
neural networks with asynchronous discrete and
sparse characteristics, which have increasingly
manifested their superiority in low energy con-
sumption. Recent research is devoted to utilizing
spatio-temporal information to directly train SNNs
by backpropagation. However, the binary and non-
differentiable properties of spike activities force
directly trained SNNs to suffer from serious gra-
dient vanishing and network degradation, which
greatly limits the performance of directly trained
SNNs and prevents them from going deeper. In
this paper, we propose a multi-level firing (MLF)
method based on the existing spatio-temporal back
propagation (STBP) method, and spiking dormant-
suppressed residual network (spiking DS-ResNet).
MLF enables more efficient gradient propagation
and the incremental expression ability of the neu-
rons. Spiking DS-ResNet can efficiently perform
identity mapping of discrete spikes, as well as pro-
vide a more suitable connection for gradient propa-
gation in deep SNNs. With the proposed method,
our model achieves superior performances on a
non-neuromorphic dataset and two neuromorphic
datasets with far fewer trainable parameters and
demonstrates the great ability to combat the gra-
dient vanishing and degradation problem in deep
SNNs.
1 Introduction
Spiking neural networks (SNNs) are developed to realize
brain-like information processing [Maass, 1997], which use
asynchronous binary spike signals to transmit information
and have the ability to process information in both spatial
domain (SD) and temporal domain (TD). Besides, the spar-
sity and event-driven properties of SNNs position them as potential candidates for low-energy implementations on dedicated neuromorphic hardware. As an example, the energy consumed by SNNs to transmit a spike on neuromorphic hardware is only nJ or pJ [Diehl and Cook, 2015].
Corresponding author.
In terms of learning algorithms, existing unsupervised learning algorithms [Qi et al., 2018; Liu et al., 2020] have difficulty training deep SNNs. Currently, there are two main learn-
ing algorithms for deep SNNs training. One is ANN-SNN
conversion learning [Sengupta et al., 2019; Yan et al., 2021;
Hu et al., 2021], which converts the pre-trained ANN model
to the SNN model. Conversion learning can achieve deep
SNNs training with competitive results, but it has to consume
a large number of timesteps to ensure the coding resolution.
Moreover, conversion learning cannot utilize the TD information, making it difficult to train on neuromorphic datasets.
The other is direct supervised learning [Wu et al., 2018;
Gu et al., 2019; Liu et al., 2022; Zheng et al., 2021], which is
the approach taken by this paper. Direct supervised learning
has great potential to make full use of spatio-temporal infor-
mation to train the network and can reduce the demand for
timesteps. However, to achieve more efficient direct super-
vised learning for better and deeper directly-trained SNNs,
there are still two challenging issues to overcome.
The first is gradient vanishing. Due to the non-differentiable spike activities, an approximate derivative [Neftci et al., 2019] has to be adopted to make the gradient available, such as the rectangular function and the Gaussian cumulative distribution function [Wu et al., 2018]. However, this raises a problem:
the limited width of the approximate derivative causes mem-
brane potentials of a multitude of neurons to fall into the satu-
ration area, where the approximate derivative is zero or a tiny
value. Furthermore, the sharp features with larger values in the feature map cannot be further enhanced, because their excessive membrane potentials push them into the saturation area to the right of the approximate derivative. This greatly limits the performance of deep SNNs; the neurons located in this saturation area due to excessive membrane potential are termed dormant units in this paper. In the above cases, the gradient propagation will be blocked and unstable, thus resulting in gradient vanishing and increasing the difficulty of training deep SNNs.
The second is network degradation, which is especially serious in deep directly-trained SNNs, even if a residual structure [He et al., 2016] is adopted. Therefore, existing train-
ing methods mainly expand SNNs in width to get improved
performance, resulting in a large number of trainable param-
eters. The above non-differentiable spike activity is one of
the reasons for network degradation, and the weak spatial ex-
pression ability of binary spike signals is another significant
factor. For widely used spiking neuron models like the leaky integrate-and-fire (LIF) model, the sharp feature with a larger
value and the non-sharp feature with a smaller value will have
the same output in the forward process if the corresponding
membrane potentials both exceed the firing threshold. As a
result, the loss of information caused by discrete spikes will
make residual structures hard to perform identity mapping.
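To make this information-loss point concrete, the toy Python sketch below (our own illustration; the membrane-potential and threshold values are placeholders, not settings from this paper) shows that a single-threshold step activation maps a sharp and a non-sharp feature to the same binary spike, whereas firing against several level thresholds, as the MLF method introduced later does, keeps them apart.

```python
# Toy illustration (hypothetical values): one threshold loses magnitude
# information; multiple level thresholds preserve some of it.

def lif_output(u, v_th=0.5):
    """Single-threshold step activation: fire iff the membrane potential reaches v_th."""
    return 1 if u >= v_th else 0

def mlf_output(u, thresholds=(0.5, 1.0, 1.5)):
    """Multi-level firing: one spike per level whose threshold is reached."""
    return sum(1 for v_th in thresholds if u >= v_th)

sharp, non_sharp = 1.6, 0.6                         # hypothetical membrane potentials
print(lif_output(sharp), lif_output(non_sharp))     # -> 1 1 (indistinguishable)
print(mlf_output(sharp), mlf_output(non_sharp))     # -> 3 1 (distinguishable)
```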
We take steps to address these two challenges for enabling
better and deeper directly-trained deep SNNs. We first pro-
pose the multi-level firing (MLF) method. MLF expands
the non-zero area of the rectangular approximate derivatives by allocating the coverage of the approximate derivative of each level. In this way, the membrane potentials of neurons are more likely to fall into the area where the derivative is not zero, which alleviates gradient vanishing. Besides, since the activation functions of the neurons in MLF generate spikes at different thresholds when activating the input, the expression ability of the neurons can be improved. Second, we
propose spiking dormant-suppressed residual network (spik-
ing DS-ResNet). Spiking DS-ResNet can efficiently per-
form identity mapping of discrete spikes as well as reduce
the probability of dormant unit generation, making it more
suitable for gradient propagation. To demonstrate the ef-
fectiveness of our work, we perform experiments on a non-
neuromorphic dataset (CIFAR10) and neuromorphic datasets
(DVS-Gesture, CIFAR10-DVS). Our model achieves state-
of-the-art performances on all datasets with far fewer train-
able parameters. Experimental analysis indicates that MLF
effectively reduces the proportion of dormant units and im-
proves the performances, and MLF with spiking DS-ResNet
allows SNNs to go very deep without degradation.
2 Related Work
Learning algorithm of deep SNNs. For deep SNNs, there
are two main learning algorithms to achieve competitive per-
formance: (1) indirect supervised learning such as ANN-
SNN conversion learning; (2) direct supervised learning, the
gradient descent-based backpropagation method.
The purpose of ANN-SNN conversion learning is to make
the SNNs have the same input-output mapping as the ANNs.
Conversion learning avoids the problem of the weak expression ability of binary spike signals by approximating the real-valued output of ReLU with the spike sequence, from which the inevitable conversion loss arises. A lot of works
focus on reducing the conversion loss [Han et al., 2020;
Yan et al., 2021] and achieve competitive performances.
However, conversion learning ignores the effective TD in-
formation and needs a large number of timesteps to ensure
accuracy. As a result, it is often limited to non-neuromorphic datasets and suffers from serious inference latency.
In recent years, direct supervised learning of SNNs has de-
veloped rapidly. From spatial back propagation [Lee et al., 2016] to spatio-temporal back propagation [Wu et al., 2018;
Gu et al., 2019; Fang et al., 2020], people have realized the
utilization of spatial and temporal information for training.
On this basis, [Zheng et al., 2021] realized the direct training of large-size networks and achieved state-of-the-art performance on the neuromorphic datasets. However, existing methods did not solve the problem of the limited width of the ap-
proximate derivative and weak expression ability of binary
spike signals, which makes the direct training of deep SNNs
inefficient. Gradient vanishing and network degradation se-
riously restrict directly-trained SNNs from going very deep,
which is what we want to overcome.
Gradient vanishing or explosion. Gradient vanishing or
explosion is the shared challenge of deep ANNs and deep
SNNs. For deep ANNs, there are quite a few successful meth-
ods to address this problem. Batch normalization (BN) [Ioffe
and Szegedy, 2015]reduces internal covariate shift to avoid
gradient vanishing or explosion. The residual structure [He et al., 2016] makes the gradient propagate across layers by introducing a shortcut connection, which is one of the most
widely used basic blocks in deep learning.
For directly-trained deep SNNs, existing research on the
gradient vanishing or explosion problem is limited. It is
worth noting that the threshold-dependent batch normaliza-
tion (tdBN) method proposed by [Zheng et al., 2021] can ad-
just the firing rate and avoid gradient vanishing or explosion
to some extent, which is helpful for our further research on
gradient vanishing. On this basis, we will combat the gradi-
ent vanishing problem in SD caused by the limited width of
the approximate derivative.
Deep network degradation. Network degradation will
result in a worse performance of deeper networks than that
of shallower networks. For deep ANNs, one of the most
successful methods to solve the degradation problem is the residual structure [He et al., 2016]. It introduces a shortcut connec-
tion to increase the identity mapping ability of the network
and enable the networks to reach hundreds of layers without degradation, greatly expanding the depth of the networks.
For directly-trained deep SNNs, there are few efforts on the
degradation problem. Even if tdBN has explored the directly
trained deep SNNs with residual structure and made SNNs
go deeper, the degradation of deep SNNs is still serious. Our
work will try to fill this gap in the field of SNNs.
3 Preliminaries
In this section, we review the spatio-temporal back propagation (STBP) [Wu et al., 2018] and the iterative LIF model [Wu et al., 2019] to introduce the foundation of our work.
STBP realizes error backpropagation in both TD and SD for the direct training of SNNs. On this basis, [Wu et al., 2019] develops the iterative LIF model into an easy-to-program version and accelerates the direct training of SNNs. Considering a fully connected network, the forward process of the iterative LIF model can be described as
$$x_i^{t+1,n} = \sum_{j=1}^{l(n-1)} w_{ij}^{n}\, o_j^{t+1,n-1} + b_i^{n}, \tag{1}$$
$$u_i^{t+1,n} = k_\tau u_i^{t,n} (1 - o_i^{t,n}) + x_i^{t+1,n}, \tag{2}$$
$$o_i^{t+1,n} = f(u_i^{t+1,n} - V_{th}) = \begin{cases} 1, & u_i^{t+1,n} \ge V_{th} \\ 0, & u_i^{t+1,n} < V_{th}, \end{cases} \tag{3}$$
[Figure 1 graphic: weighted inputs $w_{i1}^n, w_{i2}^n, \ldots$ are summed and drive $K$ LIF neurons with thresholds $V_{th1}, V_{th2}, \ldots, V_{thK}$; their spikes are combined by a union operation to form the MLF unit output.]
Figure 1: Illustration of the MLF unit. An MLF unit contains multiple LIF neurons with different level thresholds. After receiving the input, these neurons update their membrane potentials. Once the membrane potential of a level neuron reaches the corresponding threshold, a spike is fired. The final output of the MLF unit is the union of the spikes fired by all level neurons.
where $k_\tau$ is a decay factor. $n$ and $l(n-1)$ denote the $n$-th layer and the number of neurons in the $(n-1)$-th layer, respectively. $t$ is the time index. $u_i^{t,n}$ and $o_i^{t,n}$ are the membrane potential and the output of the $i$-th neuron in the $n$-th layer at time $t$, respectively. $o_i^{t,n} \in \{0, 1\}$ is generated by the activation function $f(\cdot)$, which is the step function. $V_{th}$ is the firing threshold. When the membrane potential exceeds the firing threshold, the neuron fires a spike and the membrane potential is reset to zero. $w_{ij}^{n}$ is the synaptic weight from the $j$-th neuron in the $(n-1)$-th layer to the $i$-th neuron in the $n$-th layer, and $b_i^{n}$ is the bias.
4 Method
4.1 The MLF Method
The forward process
As shown in Fig. 1, we replace LIF neurons with MLF
units, which contain multiple LIF neurons with different level
thresholds. The output is the union of all spikes fired by these
neurons. The forward process can be described as
$$u_i^{t+1,n} = k_\tau u_i^{t,n} \odot (1 - o_i^{t,n}) + x_i^{t+1,n}, \tag{4}$$
$$o_i^{t+1,n} = f(u_i^{t+1,n} - V_{th}), \tag{5}$$
$$\hat{o}_i^{t+1,n} = s(o_i^{t+1,n}), \tag{6}$$
where $u_i^{t,n} = (u_{i,1}^{t,n}, u_{i,2}^{t,n}, \ldots, u_{i,k}^{t,n}, \ldots, u_{i,K}^{t,n})$ and $o_i^{t,n} = (o_{i,1}^{t,n}, o_{i,2}^{t,n}, \ldots, o_{i,k}^{t,n}, \ldots, o_{i,K}^{t,n})$ denote the membrane potential vector and the output vector of the $i$-th MLF unit in the $n$-th layer at time $t$, respectively. $\odot$ denotes the Hadamard product. $k$ and $K$ denote the $k$-th level and the number of levels, respectively. $V_{th} = (V_{th1}, V_{th2}, \ldots, V_{thk}, \ldots, V_{thK})$ is the threshold vector. To facilitate the calculation of the pre-synaptic input $x_i^{t,n}$, we define a spike encoder as $s(o_i^{t,n}) = o_{i,1}^{t,n} + o_{i,2}^{t,n} + \ldots + o_{i,K}^{t,n}$, which is completely equivalent to the union (see Appendix A). $\hat{o}_i^{t,n} = s(o_i^{t,n})$ is the final output of the $i$-th MLF unit in the $n$-th layer at time $t$. Then, $x_i^{t,n}$ can be computed by Eq. (1), where $o_i^{t,n}$ is replaced with $\hat{o}_i^{t,n}$.
Comparing Eq. (2)-(3) and Eq. (4)-(6), it can be seen that the MLF unit does not introduce additional trainable parameters to the network, but just replaces LIF neurons with MLF units. Benefiting from the union of multiple spikes, the MLF unit can distinguish some sharp features with large values from the non-sharp features with small values.
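The forward pass of Eqs. (4)-(6) can be sketched in a few lines of NumPy, as shown below. This is a hedged illustration only: the layer size, the number of levels $K$, and the threshold values are assumptions made for the example, not the paper's configuration.

```python
import numpy as np

def mlf_step(u, o, x, thresholds, k_tau=0.5):
    """One timestep of a layer of MLF units, following Eqs. (4)-(6).

    u, o:       (l_n, K) per-level membrane potentials and previous-step spikes
    x:          (l_n,)   pre-synaptic input, shared by all K levels of a unit
    thresholds: (K,)     level thresholds (V_th1, ..., V_thK)
    Returns the updated (u, o) and the unit outputs o_hat of shape (l_n,).
    """
    u = k_tau * u * (1.0 - o) + x[:, None]        # Eq. (4): Hadamard leak/reset plus input
    o = (u >= thresholds[None, :]).astype(float)  # Eq. (5): per-level step activation
    o_hat = o.sum(axis=1)                         # Eq. (6): spike encoder s(.), i.e. the union
    return u, o, o_hat

# Hypothetical example: 3 units, K = 3 levels.
thr = np.array([0.5, 1.0, 1.5])
u = np.zeros((3, 3)); o = np.zeros((3, 3))
u, o, o_hat = mlf_step(u, o, np.array([0.4, 0.9, 1.7]), thr)
print(o_hat)  # [0. 1. 3.] -- inputs of different magnitude yield different outputs
```

In this sketch, o_hat is what replaces $o_i^{t,n}$ in Eq. (1) when computing the next layer's pre-synaptic input, so the multi-level structure stays invisible to the synaptic weights.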
The backward process
To demonstrate that MLF can make the gradient propagation more efficient in SD, we next deduce the backward process of the MLF method.
In order to obtain the gradients of weights and biases, we first derive the gradients of $o_{i,k}^{t,n}$, $\hat{o}_i^{t,n}$ and $u_{i,k}^{t,n}$. With $L$ representing the loss function, the gradients $\partial L/\partial o_{i,k}^{t,n}$, $\partial L/\partial \hat{o}_i^{t,n}$ and $\partial L/\partial u_{i,k}^{t,n}$ can be computed by applying the chain rule as follows:
$$\frac{\partial L}{\partial o_{i,k}^{t,n}} = \frac{\partial L}{\partial \hat{o}_i^{t,n}} + \frac{\partial L}{\partial u_{i,k}^{t+1,n}}\, u_{i,k}^{t,n} (-k_\tau), \tag{7}$$
$$\frac{\partial L}{\partial \hat{o}_i^{t,n}} = \sum_{j=1}^{l(n+1)} \sum_{m=1}^{K} \left( \frac{\partial L}{\partial u_{j,m}^{t,n+1}}\, w_{ji}^{n} \right), \tag{8}$$
$$\frac{\partial L}{\partial u_{i,k}^{t,n}} = \frac{\partial L}{\partial o_{i,k}^{t,n}} \frac{\partial o_{i,k}^{t,n}}{\partial u_{i,k}^{t,n}} + \frac{\partial L}{\partial u_{i,k}^{t+1,n}}\, k_\tau (1 - o_{i,k}^{t,n}), \tag{9}$$
We can observe that the gradients $\partial L/\partial o_{i,k}^{t,n}$ and $\partial L/\partial u_{i,k}^{t,n}$ come from two directions: SD (the left part in Eq. (7), (9)) and TD (the right part in Eq. (7), (9)). The gradient $\partial L/\partial \hat{o}_i^{t,n}$ comes from SD. Finally, we can obtain the gradients of the weights $w^n$ and biases $b^n$ as follows:
$$\frac{\partial L}{\partial w^{n}} = \sum_{t=1}^{T} \sum_{k=1}^{K} \frac{\partial L}{\partial u_{k}^{t,n}} \frac{\partial u_{k}^{t,n}}{\partial w^{n}} = \sum_{t=1}^{T} \left( \sum_{k=1}^{K} \frac{\partial L}{\partial u_{k}^{t,n}} \right) \hat{o}^{t,n-1\,\top}, \tag{10}$$
$$\frac{\partial L}{\partial b^{n}} = \sum_{t=1}^{T} \sum_{k=1}^{K} \frac{\partial L}{\partial u_{k}^{t,n}} \frac{\partial u_{k}^{t,n}}{\partial b^{n}} = \sum_{t=1}^{T} \left( \sum_{k=1}^{K} \frac{\partial L}{\partial u_{k}^{t,n}} \right), \tag{11}$$
where $T$ is the number of timesteps. Due to the non-differentiable property of spiking activity, $\partial o_k/\partial u_k$ cannot be derived. To solve this problem, we adopt the rectangular function $h_k(u_k)$ [Wu et al., 2018] to approximate the derivative of spike activity, which is defined by
$$\frac{\partial o_k}{\partial u_k} \approx h_k(u_k) = \frac{1}{a}\,\mathrm{sign}\!\left(|u_k - V_{thk}| < \frac{a}{2}\right), \tag{12}$$
where $a$ is the width parameter of the rectangular function.
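For reference, a minimal NumPy version of the rectangular window in Eq. (12) is given below. In an actual training pipeline it would be wired into the backward pass of the step activation (for example via a custom autograd function); the values of $a$ and $V_{thk}$ used here are purely illustrative assumptions.

```python
import numpy as np

def rect_surrogate_grad(u, v_thk, a=1.0):
    """Rectangular approximation h_k(u_k) of d o_k / d u_k, Eq. (12).

    Returns 1/a inside the window |u - V_thk| < a/2 and 0 outside it, so only
    membrane potentials near the level threshold receive a non-zero gradient.
    """
    return (np.abs(u - v_thk) < a / 2.0).astype(float) / a

# Hypothetical values: with V_thk = 1.0 and width a = 1.0,
# u = 0.7 lies inside the window (grad 1.0) while u = 2.0 is saturated (grad 0.0).
print(rect_surrogate_grad(np.array([0.7, 2.0]), v_thk=1.0))
```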
Considering the gradient propagation in SD from the $n$-th layer to the $(n-1)$-th layer, the spatial propagation link can be described as: $\partial L/\partial \hat{o}_i^{t,n} \rightarrow (\partial L/\partial o_{i,1}^{t,n}, \ldots, \partial L/\partial o_{i,K}^{t,n}) \rightarrow (\partial L/\partial u_{i,1}^{t,n}, \ldots, \partial L/\partial u_{i,K}^{t,n}) \rightarrow \partial L/\partial \hat{o}_i^{t,n-1}$. If there is only one-level firing ($K = 1$), the model will become the standard STBP model. In this case, numerous neurons will fall into the saturation area outside the rectangular area, some of which