Energy-Efficient Deployment of Machine Learning
Workloads on Neuromorphic Hardware
Peyton Chandarana, Mohammadreza Mohammadi, James Seekings, Ramtin Zand
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC
Abstract—As the technology industry is moving towards implementing tasks such as natural language processing, path planning, image classification, and more on smaller edge computing devices, the demand for more efficient implementations of
algorithms and hardware accelerators has become a significant
area of research. In recent years, several edge deep learning
hardware accelerators have been released that specifically focus
on reducing the power and area consumed by deep neural
networks (DNNs). On the other hand, spiking neural networks
(SNNs), which operate on discrete time-series data, have been
shown to achieve substantial power reductions over even the
aforementioned edge DNN accelerators when deployed on specialized neuromorphic event-based/asynchronous hardware. While
neuromorphic hardware has demonstrated great potential for
accelerating deep learning tasks at the edge, the current space
of algorithms and hardware is limited and still in rather early
development. Thus, many hybrid approaches have been proposed
which aim to convert pre-trained DNNs into SNNs. In this work,
we provide a general guide to converting pre-trained DNNs into
SNNs while also presenting techniques to improve the deployment
of converted SNNs on neuromorphic hardware with respect to
latency, power, and energy. Our experimental results show that
when compared against the Intel Neural Compute Stick 2, Intel’s
neuromorphic processor, Loihi, consumes up to 27× less power and 5× less energy in the tested image classification tasks by using our SNN improvement techniques.
Index Terms—edge computing, hardware accelerator, spiking
neural networks, deep neural networks, neuromorphic computing
I. INTRODUCTION
In the last 10 years, deep learning has transformed the technology industry, enabling computers to perform image classification and recognition, translation, path planning, and more [1]–[4]. While these efforts have been fruitful in terms of providing the desired functionality, most of these implementations rely on power-hungry hardware such as GPUs and TPUs [5] and are deployed in systems that are not constrained by power limitations. In recent years, many approaches have been proposed to reduce the power demands of deep learning, including quantization [6], [7] and approximate computing [8]. Building on these approaches, many new
edge-specific devices have been introduced such as the Nvidia
Jetson Nano, Intel Neural Compute Stick 2, and Google Coral
Edge TPU. Many of these edge devices were created to take
advantage of quantized networks that operate on lower-precision values rather than the standard single- or double-precision
floating point representations. As a result, the overall power
consumption and architecture area are reduced to specifically
benefit applications where space and power are limited.
Spiking neural networks (SNNs), considered the latest generation of artificial neural networks (ANNs), are a class of networks that emphasize biological plausibility, energy efficiency, and event-based computing [9]. Unlike their deep neural network (DNN) counterparts, which compute with continuous floating-point values, SNNs operate on discrete spike events in time, modeled after the action potentials of neurons in the brain [9]. Many works [10]–[12] have shown that SNNs can achieve accuracy comparable to DNNs while significantly reducing power and energy consumption.
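To make this event-based style of computation concrete, below is a minimal, illustrative sketch of a discrete-time leaky integrate-and-fire (LIF) neuron, a neuron model widely used in SNNs; the parameter values, function name, and spike encoding are our own illustrative choices rather than details taken from this paper or from any particular hardware.

```python
import numpy as np

def lif_neuron(input_spikes, weights, v_th=1.0, decay=0.9):
    """Simulate a single discrete-time leaky integrate-and-fire neuron.

    input_spikes: (T, N) binary array of presynaptic spikes over T timesteps.
    weights:      (N,) synaptic weights.
    Returns a length-T binary array of output spikes.
    """
    v = 0.0                                       # membrane potential
    out = np.zeros(len(input_spikes), dtype=np.uint8)
    for t, spikes in enumerate(input_spikes):
        v = decay * v + np.dot(weights, spikes)   # leak, then integrate weighted input spikes
        if v >= v_th:                             # crossing the threshold emits an output spike
            out[t] = 1
            v = 0.0                               # reset the membrane potential after firing
    return out

# Example: three input synapses driven by random spike trains for 10 timesteps.
rng = np.random.default_rng(0)
spike_train = (rng.random((10, 3)) < 0.3).astype(np.uint8)
print(lif_neuron(spike_train, weights=np.array([0.5, 0.4, 0.6])))
```

Because such neurons exchange only sparse binary events, event-driven hardware can skip work whenever no spikes arrive, which is the source of the energy savings discussed above.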
While SNNs can be more energy and power efficient, training deep SNNs (DSNNs) has been a recurring challenge due to the lack of suitable training/learning algorithms that perform as well as the backpropagation algorithm used in DNNs [13]–[15]. Many SNN-specific learning algorithms have been proposed, such as spike-timing-dependent plasticity (STDP) and its variants [14], [16]. These learning approaches rely on the temporal patterns in the timing between spikes to adapt the weight values as the network sees more input [17]. While efficient and well suited to shallow SNNs, this style of learning does not typically scale to deeper networks because subsequent layers provide no feedback during training [13], [15]. To address, or even bypass, the training and design challenges of SNNs, many DNN-to-SNN conversion approaches have been proposed [18]–[20].
One such approach, the SNN Conversion Toolbox [20], uses the parameters of a pre-trained DNN to create an equivalent SNN that can be deployed on Loihi, bringing energy-efficient, event-based computation to highly constrained edge computing environments. In this work, we aim to generalize the process of converting pre-trained DNNs into SNNs and deploying them on neuromorphic hardware such as Loihi by contributing the following:
• We provide general guidelines for designing and training DNNs for conversion into SNNs.
• We present analysis and optimization techniques to improve the converted SNNs with respect to power, latency, and energy.
• We compare the performance of SNNs on Loihi against the Intel Neural Compute Stick 2 in classifying static images.
The remainder of this work is organized as follows. In
Section II, we provide an overview of the two hardware
platforms used in this work, the Intel Neural Compute Stick
2 and Intel Loihi, along with their respective APIs. In Section
III, we discuss the conversion methodology and network
considerations for converting DNNs to SNNs using the SNN
Conversion Toolbox. We then provide some insights and
techniques, in Section IV, for optimizing the SNNs in terms of
latency and energy consumption. In Section V, we present our
experimental results with respect to inference accuracy, power,
latency, and energy on the three separate image classification
tasks. Finally, in Section VI, we conclude our work with a
discussion of the findings and future directions of research.

Fig. 1. Intel Neural Compute Stick 2 (top) [21] and Intel Loihi Kapoho Bay (bottom) [22].

Fig. 2. Nahuku-32 Loihi server blade with 32 interconnected Loihi chips [22].
II. EDGE HARDWARE
To demonstrate the benefits of neuromorphic hardware for machine learning tasks, we use two hardware platforms along with their respective software APIs to perform our experiments. Here, we briefly describe the architectures and APIs of
an edge computing neural network accelerator, the Intel Neural
Compute Stick 2, and a neuromorphic hardware platform, Intel
Loihi.
A. Intel Neural Compute Stick 2
In 2017, Intel launched the Movidius Neural Compute Stick, meant to be used in edge computing devices to accelerate neural networks, specifically convolutional neural networks (CNNs) in computer vision applications. Since
then, Intel has released an improved version called the Neural
Compute Stick 2 (NCS2), which we use herein.
The NCS2, shown in Fig. 1, provides a plug-and-play USB interface for use with edge devices and small computers like the Raspberry Pi. Specifically, the NCS2 is clocked at 700 MHz and includes 16 SHAVE cores, a neural compute engine, and 4 GB of memory, which together implement a Vision Processing Unit (VPU) [23]. Whereas conventional hardware typically computes with single- or double-precision floating-point values, the Intel NCS2 performs only 16-bit floating-point operations, trading precision/accuracy for power and area savings.
To deploy models on the NCS2, Intel provides the OpenVINO toolkit [24], which compiles and scales/quantizes models for the device. Once a model has been optimized in this way, it can be loaded onto the NCS2 and presented with input to perform inference.
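As a rough illustration of this flow, the sketch below compiles an OpenVINO intermediate representation (IR) for the NCS2 and runs a single inference. The calls follow the openvino.runtime Python API as we understand it (circa OpenVINO 2022) and may differ in other versions; the model files, device name, and input shape are illustrative assumptions rather than details taken from this paper.

```python
# Hedged sketch of deploying a model on the NCS2 with OpenVINO's Python runtime.
# "model.xml"/"model.bin" are assumed to have been produced beforehand by the
# OpenVINO Model Optimizer (e.g., converted to FP16 to match the VPU).
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                         # IR files from the Model Optimizer
compiled = core.compile_model(model, device_name="MYRIAD")   # "MYRIAD" targets the NCS2 VPU

# Run one inference on a dummy input (an NCHW image shape is assumed here).
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([dummy])[compiled.output(0)]
print("Predicted class:", int(np.argmax(result)))
```

In practice, the IR itself would first be generated from a trained TensorFlow or ONNX model with the Model Optimizer, typically with 16-bit weights to match the VPU's native precision.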
B. Intel Loihi
Intel’s neuromorphic platform, Loihi, was introduced in 2018 [25] with the goal of deploying SNNs in hardware and establishing neuromorphic computing’s viability for accelerating tasks such as image classification, as well as event-based and real-time computing problems.
In the years since its launch, Loihi has been shown to consume orders of magnitude less energy in machine learning applications while achieving comparable, and in some cases better, accuracy than traditional DNNs deployed on GPUs and TPUs. In terms of scalability, Loihi’s architecture ranges from small USB form-factor devices with one to four Loihi chips to much larger data-center implementations with many Loihi chips on a single server blade, as seen in Fig. 2 [25], [26]. Each first-generation Loihi chip comprises 128 specialized event-driven neuro-cores, each capable of implementing up to 1,024 spiking neurons of an SNN. Additionally, each neuro-core contains 128 KB of state memory and supports up to 4,096 fan-in or fan-out axons connecting to other neurons.
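These per-core limits imply a simple upper bound of 128 × 1,024 ≈ 131k neurons per chip. The sketch below is our own back-of-the-envelope estimate, not an Intel tool, of the minimum number of neuro-cores and chips a layered SNN would occupy; in practice, the axon and synaptic-memory limits usually reduce the achievable neuron density per core.

```python
# Back-of-the-envelope Loihi capacity estimate using the figures above:
# 128 neuro-cores per chip, up to 1,024 spiking neurons per neuro-core.
# Fan-in/fan-out (4,096 axons per core) and synaptic memory are ignored here,
# so this is a lower bound on the resources a real mapping would need.
import math

NEURONS_PER_CORE = 1024
CORES_PER_CHIP = 128

def estimate_cores_and_chips(layer_sizes):
    """Minimum neuro-cores/chips needed if each layer is packed densely."""
    cores = sum(math.ceil(n / NEURONS_PER_CORE) for n in layer_sizes)
    chips = math.ceil(cores / CORES_PER_CHIP)
    return cores, chips

# Example: a small CNN-like SNN with roughly 12.5k neurons across its layers.
layers = [28 * 28, 16 * 24 * 24, 32 * 8 * 8, 512, 10]
print(estimate_cores_and_chips(layers))   # -> (14, 1) for this hypothetical network
```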
In this work, we employ the SNN Conversion Toolbox [20]
along with its custom Loihi backend, NxTF [27], to convert
DNNs into SNNs to be deployed on Intel’s Loihi platform.
We go into further detail about this conversion process and
methodology in Section III.
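As a preview of the flow detailed in Section III, below is a hedged sketch of how the toolbox is typically driven: an INI-style configuration points at a pre-trained Keras model and a dataset and selects a simulator backend (here the Loihi/NxTF backend). The section and key names follow the toolbox's documented configuration format as we understand it and may differ between versions, so treat this as illustrative rather than a verbatim recipe from this work.

```python
# Illustrative, version-dependent use of the SNN Conversion Toolbox [20]:
# write an INI-style config pointing at a pre-trained Keras model, then run
# the converter. Section/key names follow the toolbox's documented layout
# ([paths], [tools], [simulation]) and should be checked against your release.
import configparser
from snntoolbox.bin.run import main

config = configparser.ConfigParser()
config["paths"] = {
    "path_wd": "experiments/mnist",        # working dir containing the trained model
    "dataset_path": "experiments/mnist",   # dir with x_test.npz / y_test.npz
    "filename_ann": "mnist_cnn",           # pre-trained Keras model name (no extension)
}
config["tools"] = {
    "evaluate_ann": "True",                # sanity-check the original DNN first
    "normalize": "True",                   # parameter normalization before conversion
}
config["simulation"] = {
    "simulator": "loihi",                  # NxTF backend targeting Intel Loihi
    "duration": "256",                     # timesteps per inference sample
    "num_to_test": "1000",
    "batch_size": "1",
}

with open("config", "w") as f:
    config.write(f)

main("config")   # converts the DNN to an SNN and runs it on the chosen backend
```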
III. DNN TO SNN CONVERSION METHODOLOGY
Our experiments perform image classification on three distinct image datasets: the MNIST handwritten digit dataset [28], the fashion MNIST (FMNIST) clothing dataset [29], and the American Sign Language (ASL) Alphabet dataset [30]. MNIST and FMNIST consist of 10 distinct classes each. MNIST contains a total of 70,000 images of handwritten digits zero through nine. FMNIST likewise contains 70,000 static images, depicting different articles of clothing such as pullovers, trousers, and bags. Unlike MNIST and FMNIST, the ASL Alphabet dataset contains 24 classes representing static hand gestures corresponding to letters of the English alphabet.
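As an example of the kind of preprocessing used before conversion, the sketch below loads one of these datasets with Keras, scales pixel values to [0, 1], and saves the test split as compressed NumPy archives; the file names follow the npz dataset convention we understand the SNN Conversion Toolbox to expect (verify against your version), and the ASL Alphabet images [30] would be prepared analogously from their source files.

```python
# Example preprocessing for the MNIST/FMNIST experiments (our own sketch):
# scale images to [0, 1], one-hot encode labels, and save the test split as
# compressed .npz files in the layout we believe the SNN Conversion Toolbox's
# "npz" dataset format expects (x_test.npz / y_test.npz / x_norm.npz).
import os
import numpy as np
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Add a channel axis and normalize pixel intensities to [0, 1] floats.
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_test = keras.utils.to_categorical(y_test, num_classes=10)

os.makedirs("experiments/mnist", exist_ok=True)
np.savez_compressed("experiments/mnist/x_test", x_test)
np.savez_compressed("experiments/mnist/y_test", y_test)
np.savez_compressed("experiments/mnist/x_norm", x_test[:1000])  # subset for parameter normalization
```

For rate-based conversion it is also commonly recommended to design the DNN around constructs that map cleanly onto spiking neurons, for example ReLU activations and average rather than max pooling.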