motivate rank-coding for ANN. As such, many existing works train SNNs without exact gradients, using approaches that range from heuristic rules like Hebbian learning [26, 44] and STDP [31, 33] to SNN-ANN conversion [43, 13, 22] and surrogate gradient approximations [37].
In this work, by applying the implicit function theorem (IFT) at the firing times of the neurons in an SNN, we first show that, under fairly general conditions, gradients of the loss w.r.t. the network weights are well-defined. We do this by proving that the conditions of the IFT are always satisfied at firing times. We then provide what we call a forward-propagation (FP) algorithm, which uses the causality structure of the network firing times together with our IFT-based gradient expressions to compute exact gradients of the loss w.r.t. the network weights.
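To make the role of the IFT concrete, the following is a minimal sketch in illustrative notation (u_i for the membrane potential of neuron i, θ for the firing threshold, t_k for a firing time, and L for a loss are placeholders; the precise conditions and resulting expressions are derived in later sections). A firing time is defined implicitly by a threshold-crossing condition, and whenever the potential crosses the threshold with nonzero slope, the IFT yields a well-defined derivative of that firing time with respect to a weight:
\[
u_i(t_k; W) = \theta, \qquad \left.\frac{\partial u_i}{\partial t}\right|_{t = t_k} \neq 0
\quad \Longrightarrow \quad
\frac{\partial t_k}{\partial W_{ji}} = -\left.\frac{\partial u_i/\partial W_{ji}}{\partial u_i/\partial t}\right|_{t = t_k}.
\]
If, for this sketch, the loss depends on the weights only through the firing times, the chain rule then gives
\[
\frac{\partial L}{\partial W_{ji}} = \sum_{k} \frac{\partial L}{\partial t_k}\,\frac{\partial t_k}{\partial W_{ji}}.
\]
In a network, each firing time also depends on earlier firing times, and it is precisely this causal dependency structure that the FP algorithm accumulates forward in time, so the derivatives above are chained in causal order.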
We call this algorithm forward propagation because the intermediate quantities needed for the final gradient are computed forward in time (or forward through the layers for feed-forward networks). We highlight the following features of our method:
• Our method can be applied in networks with arbitrary recurrent connections (up to self-loops) and is agnostic to how the forward pass is implemented. We provide an implementation for computing the firing times in the forward pass, but as long as we can obtain accurate firing times and causality information (for instance, using existing libraries), we can calculate gradients.
• Our method can be seen as an extension of Hebbian learning, as it illustrates that the gradient w.r.t. a weight W_ji connecting neuron j to neuron i is almost an average of the feeding kernel y_ji between these neurons at the firing times. In the context of Hebbian learning (especially from a biological perspective), this is interpreted as the well-known fact that stronger feeding/activation amplifies the association between the neurons [8, 19].
• In our method, the smoothing kernels y_ji arise naturally from applying the IFT at the firing times, resembling the smoothing kernels used in surrogate gradient methods. As a result, (1) our method sheds some light on why surrogate gradient methods may work quite well, and (2) the smoothing kernels y_ji in our method vary according to the firing times between two neurons and can thus be seen as an adaptive version of the fixed smoothing kernels used in surrogate gradient methods.
• Most methods in the literature apply a time-quantized version of the neuron dynamics and convert the continuous-time system into a discrete-time system. While we derive our results in the continuous-time regime, our IFT formulation is also applicable in these discrete-time scenarios. To do so, one needs to treat the weight parameters and all the time-quantized versions of the variables (such as the synaptic and membrane potentials) as separate variables, as sketched below. The number of these state variables, however, grows proportionally to the simulation time and the precision of the time quantization, which is why the continuous-time regime is preferred.
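As an illustration of the last point only (and not of the method proposed in this work), the following hypothetical sketch unrolls a time-quantized leaky integrate-and-fire model in Python; every time step keeps its own copy of the synaptic and membrane state, so the number of state variables grows with the number of steps T, i.e., with the simulation time and the time resolution. All names, constants, and the specific update rule are illustrative assumptions.

import numpy as np

def unrolled_lif(W, in_spikes, dt=1e-3, tau_syn=5e-3, tau_mem=10e-3, theta=1.0):
    """Hypothetical time-quantized LIF unrolling (illustrative only).

    W         : (n_post, n_pre) weight matrix
    in_spikes : (T, n_pre) binary input spike trains
    Every one of the T steps stores its own copy of the synaptic current,
    membrane potential, and output spikes, so the number of state variables
    grows linearly with T (and hence with 1/dt for a fixed simulation time).
    """
    T = in_spikes.shape[0]
    n_post = W.shape[0]
    syn = np.zeros((T, n_post))    # synaptic current, one copy per step
    mem = np.zeros((T, n_post))    # membrane potential, one copy per step
    out = np.zeros((T, n_post))    # emitted spikes, one copy per step
    alpha = np.exp(-dt / tau_syn)  # per-step synaptic decay
    beta = np.exp(-dt / tau_mem)   # per-step membrane decay
    for t in range(1, T):
        syn[t] = alpha * syn[t - 1] + in_spikes[t] @ W.T
        mem[t] = beta * mem[t - 1] * (1.0 - out[t - 1]) + syn[t]  # reset-to-zero after a spike
        out[t] = (mem[t] >= theta).astype(float)
    return syn, mem, out

Treating every entry of syn, mem, and out as a separate variable is what makes a discrete-time IFT formulation possible, at the cost of a variable count proportional to the simulation length.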
1.1 Related Work
A review of learning in deep spiking networks can be found in [48, 40, 42, 49], with [42] also discussing developments in neuromorphic computing in both software (algorithms) and hardware. [37] focuses on surrogate gradient methods, which use smooth activation functions in place of the hard thresholding for compatibility with standard backpropagation and have been used to train SNNs in a variety of settings [16, 3, 23, 51, 47, 45].
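To make the idea of a fixed surrogate concrete, one common choice in that literature is a smooth, fixed pseudo-derivative of the spike nonlinearity as a function of the membrane potential (for example, a fast-sigmoid-shaped kernel). The following minimal sketch shows the general shape only; the function names and the sharpness constant beta are illustrative and not taken from any of the cited works.

import numpy as np

def spike(u, theta=1.0):
    # hard-thresholding spike nonlinearity used in the forward pass
    return (u >= theta).astype(float)

def surrogate_derivative(u, theta=1.0, beta=10.0):
    # fixed, smooth pseudo-derivative substituted for the (almost-everywhere-zero)
    # derivative of the step function during the backward pass; beta controls sharpness
    return 1.0 / (1.0 + beta * np.abs(u - theta)) ** 2

The contrast drawn in this paper is that such a kernel is a fixed function of the membrane potential, whereas the kernels y_ji produced by the IFT depend on the actual firing times.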
A number of works explore backpropagation in SNNs [5, 25, 52]. The SpikeProp [5] framework assumes a linear relationship between the post-synaptic input and the resulting spiking time, an assumption our framework does not rely on. The method in [25] and its RSNN version [52] are limited to a rate-coded loss that depends on spike counts. The continuous “spike time” representation of spikes in our framework is related to temporal coding [36], but in the context of differentiating losses the authors of [36] largely ignore the discontinuities that occur at spike times, stating “the derivative...is discontinuous at such points [but] many feedforward ANNs use activation functions with a discontinuous first derivative”. In contrast with [36], we prove that exact gradients can be calculated despite this discontinuity.