between each other. This allows event-based computation and enables efficient processing on
neuromorphic hardware with low energy consumption [14, 15, 16].
However, the supervised training of SNNs is challenging due to the non-differentiable neuron model
with discrete spike-generation procedures. Several kinds of methods have been proposed to tackle this
problem, and recent progress has achieved empirically successful results. Backpropagation through
time (BPTT) with surrogate gradients (SG) is one of the mainstream methods which enables the
training of deep SNNs with high performance on large-scale datasets (e.g., ImageNet) with extremely
low latency (e.g., 4-6 time steps) [6, 10, 11, 13]. These approaches unfold the iterative expression
of spiking neurons, backpropagate the errors through time [17], and use surrogate derivatives to
approximate the gradient of the spiking function [3, 4, 18, 19, 20, 21, 22, 23]. As a result, during
training, they suffer from significant memory costs proportional to the number of time steps, and the
optimization with approximate surrogate gradients lacks a rigorous theoretical guarantee. Another
branch of works builds the closed-form formulation for the spike representation of neurons, e.g. the
(weighted) firing rate or spiking time, which is similar to conventional artificial neural networks
(ANNs). Then SNNs can be either optimized by calculating the gradients from the equivalent
mappings between spike representations [2, 24, 25, 26, 9, 27], or converted from a trained equivalent
ANN counterpart [28, 29, 30, 31, 32, 7, 33, 8, 34]. The optimization of these methods is clearer
than that with surrogate gradients. However, they require a larger number of time steps than BPTT
with SG, so they suffer from high latency, and more energy is consumed if the representation is
rate-based. Another critical issue with both kinds of methods is that they are inconsistent
with biological online learning, which is also the learning rule on neuromorphic hardware [15].
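For concreteness, the following is a minimal sketch (with notation assumed here rather than taken from the cited works) of the discretized leaky integrate-and-fire (LIF) dynamics that BPTT with SG unfolds over time, together with one common choice of surrogate derivative for the non-differentiable spiking function:

```latex
% Illustrative discretized LIF neuron over time steps t = 1, ..., T (assumed notation):
% u_t: membrane potential, s_t: binary spike, x_t: input, \lambda: leak factor, V_th: threshold.
\begin{aligned}
u_{t} &= \lambda \left( u_{t-1} - V_{\mathrm{th}}\, s_{t-1} \right) + \mathbf{W} x_{t}, \\
s_{t} &= H\!\left( u_{t} - V_{\mathrm{th}} \right) \quad \text{(Heaviside step, zero derivative almost everywhere)}, \\
\frac{\partial s_{t}}{\partial u_{t}} &\approx \frac{1}{a}\, \sigma'\!\left( \frac{u_{t} - V_{\mathrm{th}}}{a} \right) \quad \text{(a typical sigmoid-like surrogate with width } a > 0 \text{)}.
\end{aligned}
```

BPTT stores the states of all T steps to backpropagate through the unrolled computation, which is the source of the memory cost proportional to the number of time steps noted above.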
In this work, we develop a novel approach for training SNNs to achieve high performance with
low latency, and maintain the online learning property to pave a path for learning on neuromorphic
chips. We call our method online training through time (OTTT). We first derive OTTT from the
commonly used BPTT with SG method by analyzing the temporal dependency and proposing to
track the presynaptic activities in order to decouple this dependency. With the instantaneous loss
calculation, OTTT can perform forward-in-time learning, i.e., calculations are done online in time
without computing backward through time. Then we theoretically analyze the gradients of OTTT
and gradients of spike representation-based methods. We show that they have similar expressions and
prove that they can provide a similar descent direction for the optimization problem formulated
by spike representation. For the feedforward network condition, gradients are easily calculated and
analyzed. For the recurrent network condition, we follow the framework in [12] that weighted firing
rates will converge to an equilibrium state and gradients can be calculated by implicit differentiation.
With this formulation, the gradients correspond to an approximation of gradients calculated by
implicit differentiation, which can be proved to be a descent direction for the optimization problem
as well [35, 36]. In this way, a connection between OTTT and spike representation-based methods is
bridged. Finally, we show that OTTT is in the form of a three-factor Hebbian learning rule [37], which
could pave a path for online learning on neuromorphic chips. Our contributions include:
1. We propose online training through time (OTTT) for SNNs, which enables forward-in-time learning and only requires constant training memory agnostic to the number of time steps, avoiding the large training memory costs of backpropagation through time (BPTT) (a minimal illustrative sketch follows this list).
2. We theoretically analyze and connect the gradients of OTTT and gradients based on spike representations, and prove the descent guarantee for optimization under both feedforward and recurrent conditions.
3. We show that OTTT is in the form of a three-factor Hebbian learning rule, which could pave a path for on-chip online learning. With OTTT, a connection between BPTT with SG, spike representation-based methods, and biological three-factor Hebbian learning is bridged for the first time.
4. We conduct extensive experiments on CIFAR-10, CIFAR-100, ImageNet, and CIFAR10-DVS, which demonstrate the superior results of our method on large-scale static and neuromorphic datasets with a small number of time steps.
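As referenced in contribution 1, below is a minimal, illustrative sketch of an OTTT-style online update for a single linear spiking layer; it is a toy illustration under assumed notation and arbitrary hyperparameters, not the paper's implementation. A presynaptic trace is tracked forward in time and combined with an instantaneous error and a surrogate factor, so no per-time-step history needs to be stored as in BPTT.

```python
import numpy as np

# Toy sketch (not the authors' code): online update with a tracked presynaptic trace.
rng = np.random.default_rng(0)
T, n_in, n_out = 6, 100, 10           # time steps, input size, output size (arbitrary)
W = rng.normal(0.0, 0.1, (n_out, n_in))
lam, v_th, lr = 0.5, 1.0, 1e-2        # leak factor, threshold, learning rate (arbitrary)

def surrogate(u):
    """Sigmoid-like surrogate derivative of the spike w.r.t. the membrane potential."""
    sig = 1.0 / (1.0 + np.exp(-(u - v_th) / 0.25))
    return sig * (1.0 - sig) / 0.25

u = np.zeros(n_out)                   # membrane potentials
s = np.zeros(n_out)                   # spikes at the previous step
a_hat = np.zeros(n_in)                # tracked presynaptic trace (constant memory in T)
grad_W = np.zeros_like(W)

for t in range(T):
    x = (rng.random(n_in) < 0.2).astype(float)   # random binary input spikes
    a_hat = lam * a_hat + x                      # running presynaptic activity
    u = lam * (u - v_th * s) + W @ x             # discretized LIF update with reset
    s = (u >= v_th).astype(float)                # spike generation
    err = s - 0.1                                # instantaneous error (toy target rate)
    # Online gradient accumulation computed forward in time, without storing history:
    grad_W += np.outer(err * surrogate(u), a_hat)

W -= lr * grad_W
print("updated W norm:", np.linalg.norm(W))
```

The update has the three-factor structure (presynaptic trace) x (local surrogate factor) x (error signal), which is the form referred to in contribution 3.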
2 Related Work
SNN Training Methods.
For the supervised training of SNNs, there are two main research directions.
One direction is to build a connection between spike representations (e.g., firing rates) of SNNs
with equivalent ANN-like closed-form mappings. With the connection, SNNs can be converted from