and spiking behaviour (IF, LIF...). The weights of this layer
are learned during training, so the encoding can be tuned to reduce spiking activity in the network [7]. In the present work, we evaluate Direct Encoding on the CIFAR-10 and GSC datasets. Additionally, we evaluate native spike encoding using
event cameras [8]. In this method, each pixel of the sensor
generates a spike whenever it detects a brightness variation,
thus encoding movement into spikes. With such a sensor, the input spiking activity is very low, since only the information of interest (i.e., moving objects) is returned by the camera. This property helps improve the computational and energy efficiency of SNNs. We evaluate native encoding using the
Prophesee NCARS dataset.
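For concreteness, the sketch below illustrates the Direct Encoding scheme discussed above: a learnable convolution followed by IF neurons converts a static frame into spikes over a fixed number of timesteps. It is a minimal PyTorch illustration under our own assumptions (arbitrary channel counts, threshold and timestep values, IF neurons with soft reset), not the exact encoding layer evaluated in this work.

```python
import torch
import torch.nn as nn

class DirectEncoder(nn.Module):
    """Minimal Direct Encoding sketch: a learnable convolution drives IF neurons
    that turn a static frame into spike trains over T timesteps (illustrative only)."""

    def __init__(self, in_channels=3, out_channels=16, timesteps=4, threshold=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.timesteps = timesteps
        self.threshold = threshold

    def forward(self, x):
        current = self.conv(x)              # same learned input current at every timestep
        v_mem = torch.zeros_like(current)   # membrane potential of the IF neurons
        spikes = []
        for _ in range(self.timesteps):
            v_mem = v_mem + current                   # integrate
            s = (v_mem >= self.threshold).float()     # fire
            v_mem = v_mem - s * self.threshold        # soft reset
            spikes.append(s)
        return torch.stack(spikes)          # shape: (T, batch, C_out, H, W)
```

Since the encoding weights are learned, training can tune them to reduce the spiking activity fed to the rest of the network, which is the property exploited in [7]; in practice the thresholding itself is trained with a surrogate gradient, as discussed in the next subsection.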
2) Training of SNNs in the literature: Spiking neural networks cannot use the classical backpropagation training algorithm to learn their weights, because their activations (spikes) are binary and thus non-differentiable. Encoding static data such as images using rate coding enables the conversion of an already trained FNN to an SNN. The most common way is to replace the ReLU neurons of the FNN with IF neurons [9]. However, the prediction accuracy obtained through conversion is systematically inferior to that of the FNN counterpart, while generating many spikes over a large number of timesteps.
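As a rough illustration of this conversion approach, the following sketch (our own minimal PyTorch example, not the exact procedure of [9]) copies a trained FNN and swaps every ReLU for an IF neuron, reusing the trained weights unchanged; the threshold value and the soft-reset dynamics are assumptions made for the example.

```python
import copy
import torch
import torch.nn as nn

class IFNeuron(nn.Module):
    """Integrate-and-fire neuron with soft reset, keeping its membrane state across timesteps."""

    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold
        self.v_mem = None

    def forward(self, x):
        if self.v_mem is None:
            self.v_mem = torch.zeros_like(x)
        self.v_mem = self.v_mem + x                          # integrate the input current
        spikes = (self.v_mem >= self.threshold).float()      # fire when the threshold is crossed
        self.v_mem = self.v_mem - spikes * self.threshold    # soft reset
        return spikes

def convert_relu_to_if(fnn, threshold=1.0):
    """Return a copy of a trained FNN in which every ReLU is replaced by an IF neuron."""
    snn = copy.deepcopy(fnn)

    def _swap(module):
        for name, child in list(module.named_children()):
            if isinstance(child, nn.ReLU):
                setattr(module, name, IFNeuron(threshold))
            else:
                _swap(child)

    _swap(snn)
    return snn

# Usage sketch: present a rate-coded image for T timesteps and accumulate output spikes,
# with x a normalized image in [0, 1] so each pixel spikes with probability equal to its intensity:
#     for _ in range(T):
#         out_counts = out_counts + snn(torch.bernoulli(x))
```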
Numerous works have studied how to train SNNs directly in the spiking domain. The best results were obtained using backpropagation-based learning rules, such as the surrogate gradient [10]. To circumvent the non-differentiability of spikes in SNNs, the main idea of surrogate gradient learning is to use two distinct functions in the forward and backward passes: a Heaviside step function in the former, and a differentiable approximation of the Heaviside in the latter, such as a sigmoid function. Using surrogate gradient learning requires a fixed number of timesteps. Since the number of computations performed by an SNN increases with the number of timesteps, being able to fix it beforehand and to tune it during training is vital to increase the computational efficiency of SNNs.
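To make the mechanism concrete, here is a minimal PyTorch sketch of a surrogate spike function (our own illustrative choice of a sigmoid-based surrogate with an arbitrary slope, not the exact formulation of [10]): the forward pass applies a Heaviside step, while the backward pass uses the derivative of a sigmoid as a smooth stand-in.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside step in the forward pass, sigmoid derivative in the backward pass."""

    slope = 5.0  # steepness of the sigmoid surrogate (arbitrary choice for the example)

    @staticmethod
    def forward(ctx, v_minus_threshold):
        ctx.save_for_backward(v_minus_threshold)
        return (v_minus_threshold > 0).float()          # non-differentiable spike

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(SurrogateSpike.slope * x)
        # derivative of sigmoid(slope * x): a smooth surrogate of the Dirac
        return grad_output * SurrogateSpike.slope * sig * (1.0 - sig)

# Usage inside a neuron model: spikes = SurrogateSpike.apply(v_mem - threshold)
```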
3) Comparisons based on measurements: In the literature,
a few comparisons of SNNs and FNNs have been produced
based on hardware measurements. Some papers show com-
petitive results for SNNs: in [11], the authors highlighted
the influence of the spike encoding method on the accuracy
and computational efficiency of the SNN. They compared the
spiking and formal networks using a ResNet-18-like architecture on two classification datasets, and found that SNNs reached higher or equivalent accuracy and energy efficiency. In [1], the authors showed that an SNN could reach twice the power and resource efficiency of an FNN, with an MLP on the MNIST dataset targeting an ASIC. However, those encouraging
results are still very specific and thus hardly generalizable. In a
more holistic approach, the authors of [4] performed a design
space exploration (including encoding, training method, level
of parallelism...) and showed that the advantage of the SNN
depended on the considered case, making it difficult to draw
general rules. In [2], researchers showed that SNNs on dedi-
cated hardware (Loihi) demonstrated better energy efficiency
than equivalent FNNs on generic hardware (CPU and GPU)
for small topologies, but observed the opposite using larger
CNNs. Once more, the conclusions depended on the studied
case and could not be generalized. Albeit encouraging, those
results are not sufficient to draw general conclusions regarding
the savings offered by event-based processing, since they
depend on the selected application, network hyper-parameters
and hardware targets. Therefore, another approach consists in
comparing both coding domains through estimation metrics,
taking a step back to produce more general conclusions.
4) Comparisons based on metrics: Most energy consump-
tion metrics are based on the number of synaptic opera-
tions: accumulations (ACC) in the SNN and multiplication-
accumulations (MAC) in the FNN. Those models have limitations: the energy consumption of the network is reduced to that of the synaptic operations [12], so other factors (such as neuron addressing in multiplexed architectures or memory accesses) are often neglected. Moreover, the models usually do not take into account specific mechanisms such as membrane potential leakage, reset and bias integration.
In [11], the authors proposed a metric based on synaptic operations only, and found large energy consumption savings for the SNN (up to 126× more efficient than the FNN baseline). In [4], the authors demonstrated that such simplistic metrics were not always consistent with the actual energy consumption of circuits on FPGA. When taking memory into account, another team [13] found equivalent energy consumptions for SNNs and FNNs using various topologies on CIFAR-10. Additionally, reference [14] derived a theoretical maximum spike rate of 1.72 to guarantee energy savings in the SNN, based on a detailed metric accounting for synaptic operations, memory accesses and activation broadcast. Those energy consumption models
are enlightening, but still fail to settle whether event-based
processing is sufficient to increase energy efficiency. That is
mostly because those metrics are too hardware-specific, or do
not take all significant sources of energy consumption into
account.
In the present work, we propose a metric intended to be independent of low-level implementation choices, based on
three main operations: neuron addressing, synaptic operations
and memory accesses.
III. METRICS
A. Operational cost
In this section, we define a metric that counts the synaptic operations of SNNs and FNNs, i.e. the numbers of ACC and MAC operations respectively.
1) Convolutional layers: For a convolution layer, the number of filters is defined by $C_{out}$ and their size is noted $C_{in} \times H_{kernel} \times W_{kernel}$, where $C$, $H$ and $W$ stand for channel, height and width. The input and output of the layer are composed of a set of feature maps, with shapes $(C_{in} \times H_{in} \times W_{in})$ and $(C_{out} \times H_{out} \times W_{out})$ respectively. In the following, we consider the padding mode “same” and a stride $S$. The number of timesteps is noted $T$. The equations describing the number of MAC and ACC operations in FNNs and SNNs, for convolution layers, are summarized in Eq. 1.
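Before giving the exact expressions, the sketch below gives a rough back-of-the-envelope version of such operation counts. It is our own illustrative approximation (assuming a known average input spike rate, that each input spike triggers one accumulation per output connection it reaches, and spatial dimensions divisible by the stride); the precise formulas used in this work are those of Eq. 1.

```python
def conv_layer_op_counts(c_in, h_in, w_in, c_out, h_k, w_k,
                         stride=1, timesteps=1, spike_rate=0.0):
    """Approximate synaptic operation counts for a "same"-padded convolution layer.
    FNN: one MAC per kernel weight and per output position.
    SNN: one ACC per input spike and per output connection it reaches, over T timesteps."""
    h_out, w_out = h_in // stride, w_in // stride                 # "same" padding
    mac_fnn = c_out * h_out * w_out * c_in * h_k * w_k            # dense multiply-accumulates
    n_spikes = spike_rate * c_in * h_in * w_in * timesteps        # total input spikes
    acc_snn = n_spikes * c_out * h_k * w_k / (stride * stride)    # fan-out of each spike
    return mac_fnn, acc_snn

# Example (hypothetical values): a 3x32x32 input, 64 filters of size 3x3, 4 timesteps, 10% spike rate:
#     conv_layer_op_counts(3, 32, 32, 64, 3, 3, stride=1, timesteps=4, spike_rate=0.1)
```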