eters. The above non-differentiable spike activity is one of
the reasons for network degradation, and the weak spatial ex-
pression ability of binary spike signals is another significant
factor. For widely used spiking neuron models such as the leaky
integrate-and-fire (LIF) model, a sharp feature with a larger
value and a non-sharp feature with a smaller value produce
the same output in the forward process as long as both of the
corresponding membrane potentials exceed the firing threshold. As a
result, the information loss caused by discrete spikes
makes it hard for residual structures to perform identity mapping.
We take steps to address these two challenges to enable
better and deeper directly-trained SNNs. First, we propose
the multi-level firing (MLF) method. MLF expands
the non-zero area of the rectangular approximate derivatives
by allocating a different coverage to the approximate derivative of each
level. In this way, the membrane potentials of neurons are
more likely to fall into the area where the derivative is non-zero,
which alleviates gradient vanishing. Moreover, since
the activation function of the neurons in MLF generates spikes
against different thresholds when activating the input, the expression
ability of the neurons is improved. Second, we
propose spiking dormant-suppressed residual network (spik-
ing DS-ResNet). Spiking DS-ResNet can efficiently per-
form identity mapping of discrete spikes as well as reduce
the probability of dormant unit generation, making it more
suitable for gradient propagation. To demonstrate the ef-
fectiveness of our work, we perform experiments on a non-
neuromorphic dataset (CIFAR10) and neuromorphic datasets
(DVS-Gesture, CIFAR10-DVS). Our model achieves state-
of-the-art performance on all datasets with far fewer train-
able parameters. Experimental analysis indicates that MLF
effectively reduces the proportion of dormant units and im-
proves performance, and that MLF combined with spiking DS-ResNet
allows SNNs to go very deep without degradation.
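As a rough, non-authoritative illustration of the MLF idea described above, the PyTorch-style sketch below implements a multi-level spiking activation in which each level fires against its own threshold and carries a rectangular surrogate derivative centered on that threshold, so the union of the per-level windows widens the overall non-zero gradient area. The level count, threshold values, function names, and window width a are illustrative assumptions rather than the exact formulation used in this paper.

```python
import torch

class LevelSpike(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate derivative of width a."""

    @staticmethod
    def forward(ctx, u, v_th, a):
        ctx.save_for_backward(u)
        ctx.v_th, ctx.a = v_th, a
        return (u >= v_th).float()

    @staticmethod
    def backward(ctx, grad_out):
        u, = ctx.saved_tensors
        # gradient is non-zero only inside a window of width a around this level's threshold
        inside = (torch.abs(u - ctx.v_th) < ctx.a / 2).float() / ctx.a
        return grad_out * inside, None, None

def mlf_activation(u, thresholds=(0.5, 1.5, 2.5), a=1.0):
    # Each level fires against its own threshold; the per-level surrogate windows
    # cover different membrane-potential ranges, so their union enlarges the
    # region where gradients can flow.
    return sum(LevelSpike.apply(u, v_th, a) for v_th in thresholds)
```

In this sketch the unit output is the sum of the per-level spikes, so a sharper input that crosses more thresholds yields a larger response, which is one way to read the improved expression ability mentioned above.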
2 Related Work
Learning algorithms of deep SNNs. For deep SNNs, there
are two main classes of learning algorithms that achieve competitive per-
formance: (1) indirect supervised learning, such as ANN-
SNN conversion learning; (2) direct supervised learning, i.e.,
gradient descent-based backpropagation.
The purpose of ANN-SNN conversion learning is to make
the SNN have the same input-output mapping as the ANN.
Conversion learning circumvents the weak ex-
pression ability of binary spike signals by approximating
the real-valued output of ReLU with the spike sequence, which
inevitably introduces conversion loss. Many works
focus on reducing this conversion loss [Han et al., 2020;
Yan et al., 2021] and achieve competitive performance.
However, conversion learning ignores the effective TD in-
formation and needs a large number of timesteps to ensure
accuracy. As a result, it is often limited to non-neuromorphic
datasets and suffers from severe inference latency.
In recent years, direct supervised learning of SNNs has de-
veloped rapidly. From spatial backpropagation [Lee et al.,
2016] to spatio-temporal backpropagation [Wu et al., 2018;
Gu et al., 2019; Fang et al., 2020], training has come to exploit
both spatial and temporal information.
On this basis, [Zheng et al., 2021] realized the direct train-
ing of large-scale networks and achieved state-of-the-art per-
formance on neuromorphic datasets. However, existing
methods do not solve the problems of the limited width of the ap-
proximate derivative and the weak expression ability of binary
spike signals, which make the direct training of deep SNNs
inefficient. Gradient vanishing and network degradation se-
riously restrict directly-trained SNNs from going very deep,
which is what we aim to overcome.
Gradient vanishing or explosion. Gradient vanishing or
explosion is the shared challenge of deep ANNs and deep
SNNs. For deep ANNs, there are quite a few successful meth-
ods to address this problem. Batch normalization (BN) [Ioffe
and Szegedy, 2015] reduces internal covariate shift to avoid
gradient vanishing or explosion. The residual structure [He
et al., 2016] allows the gradient to propagate across layers by
introducing shortcut connections, and is one of the most
widely used basic blocks in deep learning.
For directly-trained deep SNNs, existing research on the
gradient vanishing or explosion problem is limited. It is
worth noting that the threshold-dependent batch normaliza-
tion (tdBN) method proposed by [Zheng et al., 2021] can ad-
just the firing rate and avoid gradient vanishing or explosion
to some extent, which is helpful for our further research on
gradient vanishing. On this basis, we will combat the gradi-
ent vanishing problem in SD caused by the limited width of
the approximate derivative.
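For concreteness, the rectangular surrogate commonly used in direct training (e.g., one of the approximate derivatives in STBP [Wu et al., 2018]) can be written as

\frac{\partial o}{\partial u} \approx \frac{1}{a}\, \mathrm{sign}\!\left( |u - V_{th}| < \frac{a}{2} \right),

where a is a width hyperparameter: the gradient is non-zero only when the membrane potential u falls within a window of width a around the threshold V_{th}, which is exactly the limitation that MLF is designed to relax.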
Deep network degradation. Network degradation causes
deeper networks to perform worse than shallower ones.
For deep ANNs, one of the most
successful methods to solve the degradation problem is the residual
structure [He et al., 2016]. It introduces a shortcut connec-
tion to increase the identity mapping ability of the network
and enables networks to reach hundreds of layers without
degradation, greatly expanding the achievable depth.
For directly-trained deep SNNs, there are few efforts on the
degradation problem. Although tdBN has explored directly-
trained deep SNNs with residual structure and enabled SNNs to
go deeper, the degradation of deep SNNs is still serious. Our
work aims to fill this gap in the field of SNNs.
3 Preliminaries
In this section, we review spatio-temporal backprop-
agation (STBP) [Wu et al., 2018] and the iterative LIF
model [Wu et al., 2019] to introduce the foundation of our
work.
STBP realizes error backpropagation in both TD and SD
for the direct training of SNNs. On this basis, [Wu et
al., 2019] develops the iterative LIF model into an easy-to-
program version and accelerates the direct training of SNNs.
For a fully connected network, the forward process
of the iterative LIF model can be described as
x_i^{t+1,n} = \sum_{j=1}^{l(n-1)} w_{ij}^{n} o_j^{t+1,n-1} + b_i^{n}, \qquad (1)

u_i^{t+1,n} = k_{\tau} u_i^{t,n} \left(1 - o_i^{t,n}\right) + x_i^{t+1,n}, \qquad (2)

o_i^{t+1,n} = f\left(u_i^{t+1,n} - V_{th}\right) =
\begin{cases}
1, & u_i^{t+1,n} \ge V_{th} \\
0, & u_i^{t+1,n} < V_{th}
\end{cases}, \qquad (3)