Energy-Efficient Deployment of Machine Learning
Workloads on Neuromorphic Hardware
Peyton Chandarana, Mohammadreza Mohammadi, James Seekings, Ramtin Zand
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC
Abstract—As the technology industry is moving towards implementing tasks such as natural language processing, path planning, image classification, and more on smaller edge computing devices, the demand for more efficient implementations of
algorithms and hardware accelerators has become a significant
area of research. In recent years, several edge deep learning
hardware accelerators have been released that specifically focus
on reducing the power and area consumed by deep neural
networks (DNNs). On the other hand, spiking neural networks
(SNNs), which operate on discrete time-series data, have been
shown to achieve substantial power reductions over even the
aforementioned edge DNN accelerators when deployed on specialized neuromorphic event-based/asynchronous hardware. While
neuromorphic hardware has demonstrated great potential for
accelerating deep learning tasks at the edge, the current space
of algorithms and hardware is limited and still in rather early
development. Thus, many hybrid approaches have been proposed
which aim to convert pre-trained DNNs into SNNs. In this work,
we provide a general guide to converting pre-trained DNNs into
SNNs while also presenting techniques to improve the deployment
of converted SNNs on neuromorphic hardware with respect to
latency, power, and energy. Our experimental results show that
when compared against the Intel Neural Compute Stick 2, Intel’s
neuromorphic processor, Loihi, consumes up to 27× less power and 5× less energy in the tested image classification tasks by using our SNN improvement techniques.
Index Terms—edge computing, hardware accelerator, spiking
neural networks, deep neural networks, neuromorphic computing
I. INTRODUCTION
In the last 10 years, deep learning has transformed the technology industry, enabling computers to perform image classification and recognition, translation, path planning, and more [1]–[4]. While these efforts have been fruitful in terms of providing the desired functionality, most of these implementations rely on power-hungry hardware such as GPUs and TPUs [5] and are deployed in systems that are not constrained by power limitations. In recent years, many approaches have been proposed to reduce the power demands of deep learning, including quantization [6], [7] and approximate computing [8]. Building on these approaches, many new
edge-specific devices have been introduced such as the Nvidia
Jetson Nano, Intel Neural Compute Stick 2, and Google Coral
Edge TPU. Many of these edge devices were created to take
advantage of quantized networks that operate on lower-precision values rather than the standard single- or double-precision
floating point representations. As a result, the overall power
consumption and architecture area are reduced to specifically
benefit applications where space and power are limited.
Spiking neural networks (SNNs), considered the latest generation of artificial neural networks (ANNs), are a class of networks that emphasize biological plausibility, energy efficiency, and event-based computing [9]. Unlike their deep neural network (DNN) counterparts, which compute with continuous floating-point values, SNNs operate on discrete spike events in time, modeled after the action potentials of neurons in the brain [9]. Many works [10]–[12] have shown that SNNs can achieve accuracy comparable to DNNs while significantly reducing power and energy consumption.
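To make this event-based style of computation concrete, below is a minimal, illustrative sketch of a discrete-time leaky integrate-and-fire (LIF) neuron, a neuron model widely used in SNNs; the parameter values, function name, and spike encoding are our own illustrative choices rather than details taken from this paper or from any particular hardware.

```python
import numpy as np

def lif_neuron(input_spikes, weights, v_th=1.0, decay=0.9):
    """Simulate a single discrete-time leaky integrate-and-fire neuron.

    input_spikes: (T, N) binary array of presynaptic spikes over T timesteps.
    weights:      (N,) synaptic weights.
    Returns a length-T binary array of output spikes.
    """
    v = 0.0                                       # membrane potential
    out = np.zeros(len(input_spikes), dtype=np.uint8)
    for t, spikes in enumerate(input_spikes):
        v = decay * v + np.dot(weights, spikes)   # leak, then integrate weighted input spikes
        if v >= v_th:                             # crossing the threshold emits an output spike
            out[t] = 1
            v = 0.0                               # reset the membrane potential after firing
    return out

# Example: three input synapses driven by random spike trains for 10 timesteps.
rng = np.random.default_rng(0)
spike_train = (rng.random((10, 3)) < 0.3).astype(np.uint8)
print(lif_neuron(spike_train, weights=np.array([0.5, 0.4, 0.6])))
```

Because such neurons exchange only sparse binary events, event-driven hardware can skip work whenever no spikes arrive, which is the source of the energy savings discussed above.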
While SNNs can be more energy and power efficient, training deep SNNs (DSNNs) has been a recurring challenge due to the lack of suitable training/learning algorithms that perform as well as the backpropagation algorithm used in DNNs [13]–[15]. Many SNN-specific learning algorithms have been proposed, such as spike-timing-dependent plasticity (STDP) and its variants [14], [16]. These learning approaches rely on the temporal patterns in the timing between spikes to adapt the weight values as the network sees more input [17]. While efficient and well suited to shallow SNNs, this style of learning does not typically scale to deeper networks because subsequent layers provide no feedback during training [13], [15]. To address, or even bypass, the training and design challenges of SNNs, many DNN-to-SNN conversion approaches have been proposed [18]–[20].
One such approach, the SNN Conversion Toolbox [20], uses the parameters of a pre-trained DNN to create an equivalent SNN that can be deployed on Loihi, bringing energy-efficient, event-based computation to highly constrained edge computing environments. In this work, we aim to generalize the process of converting pre-trained DNNs into SNNs and deploying them on neuromorphic hardware such as Loihi by contributing the following:
• We provide general guidelines for designing and training DNNs for conversion into SNNs.
• We present analysis and optimization techniques to improve the converted SNNs with respect to power, latency, and energy.
• We compare the performance of SNNs on Loihi against the Intel Neural Compute Stick 2 in classifying static images.
The remainder of this work is organized as follows. In
Section II, we provide an overview of the two hardware
platforms used in this work, the Intel Neural Compute Stick
2 and Intel Loihi, along with their respective APIs. In Section
III, we discuss the conversion methodology and network
considerations for converting DNNs to SNNs using the SNN
Conversion Toolbox. We then provide some insights and
techniques, in Section IV, for optimizing the SNNs in terms of
latency and energy consumption. In Section V, we present our
experimental results with respect to inference accuracy, power,
latency, and energy on the three separate image classification
tasks. Finally, in Section VI, we conclude our work with a
discussion of the findings and future directions of research.

Fig. 1. Intel Neural Compute Stick 2 (top) [21] and Intel Loihi Kapoho Bay (bottom) [22].

Fig. 2. Nahuku-32 Loihi server blade with 32 interconnected Loihi chips [22].
II. EDGE HARDWARE
To demonstrate the benefits of neuromorphic hardware for machine learning tasks, we use two hardware platforms along with their respective software APIs to perform our experiments. Here, we briefly describe the architectures and APIs of
an edge computing neural network accelerator, the Intel Neural
Compute Stick 2, and a neuromorphic hardware platform, Intel
Loihi.
A. Intel Neural Compute Stick 2
In 2017, Intel launched the Movidius Neural Compute Stick, meant to be used in edge computing devices to accelerate neural networks, specifically convolutional neural networks (CNNs) in computer vision applications. Since
then, Intel has released an improved version called the Neural
Compute Stick 2 (NCS2), which we use herein.
The NCS2, shown in Fig. 1, provides a plug-and-play USB interface for use with edge devices and small computers like the Raspberry Pi. Specifically, the NCS2 is clocked at 700 MHz and includes 16 SHAVE cores, a neural compute engine, and 4 GB of memory, which together implement a Vision Processing Unit (VPU) [23]. Whereas conventional hardware typically computes with single- or double-precision floating-point values, the Intel NCS2 performs only 16-bit floating-point operations, trading precision/accuracy for power and area savings.
To deploy models on the NCS2, Intel provides the OpenVINO toolkit [24], which compiles and scales/quantizes models for the device. Once a model has been optimized in this way, it can be loaded onto the NCS2 and presented with input to perform inference.
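As a rough illustration of this flow, the sketch below compiles an OpenVINO intermediate representation (IR) for the NCS2 and runs a single inference. The calls follow the openvino.runtime Python API as we understand it (circa OpenVINO 2022) and may differ in other versions; the model files, device name, and input shape are illustrative assumptions rather than details taken from this paper.

```python
# Hedged sketch of deploying a model on the NCS2 with OpenVINO's Python runtime.
# "model.xml"/"model.bin" are assumed to have been produced beforehand by the
# OpenVINO Model Optimizer (e.g., converted to FP16 to match the VPU).
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                         # IR files from the Model Optimizer
compiled = core.compile_model(model, device_name="MYRIAD")   # "MYRIAD" targets the NCS2 VPU

# Run one inference on a dummy input (an NCHW image shape is assumed here).
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([dummy])[compiled.output(0)]
print("Predicted class:", int(np.argmax(result)))
```

In practice, the IR itself would first be generated from a trained TensorFlow or ONNX model with the Model Optimizer, typically with 16-bit weights to match the VPU's native precision.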
B. Intel Loihi
Intel’s neuromorphic platform, Loihi, was introduced in 2018 [25] with the goal of deploying SNNs in hardware and establishing neuromorphic computing’s viability for accelerating tasks such as image classification, as well as event-based and real-time computing problems.
In the years since its launch, Loihi has been shown to consume orders of magnitude less energy in machine learning applications while achieving comparable, and in some cases better, accuracy than traditional DNNs deployed on GPUs and TPUs. In terms of scalability, Loihi’s architecture ranges from small USB form-factor devices with one to four Loihi chips to much larger data-center implementations with many Loihi chips on a single server blade, as seen in Fig. 2 [25], [26]. Each first-generation Loihi chip comprises 128 specialized event-driven neuro-cores, each capable of implementing up to 1,024 spiking neurons of an SNN. Additionally, each neuro-core contains 128 KB of state memory and supports up to 4,096 fan-in or fan-out axons connecting to other neurons.
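These per-core limits imply a simple upper bound of 128 × 1,024 ≈ 131k neurons per chip. The sketch below is our own back-of-the-envelope estimate, not an Intel tool, of the minimum number of neuro-cores and chips a layered SNN would occupy; in practice, the axon and synaptic-memory limits usually reduce the achievable neuron density per core.

```python
# Back-of-the-envelope Loihi capacity estimate using the figures above:
# 128 neuro-cores per chip, up to 1,024 spiking neurons per neuro-core.
# Fan-in/fan-out (4,096 axons per core) and synaptic memory are ignored here,
# so this is a lower bound on the resources a real mapping would need.
import math

NEURONS_PER_CORE = 1024
CORES_PER_CHIP = 128

def estimate_cores_and_chips(layer_sizes):
    """Minimum neuro-cores/chips needed if each layer is packed densely."""
    cores = sum(math.ceil(n / NEURONS_PER_CORE) for n in layer_sizes)
    chips = math.ceil(cores / CORES_PER_CHIP)
    return cores, chips

# Example: a small CNN-like SNN with roughly 12.5k neurons across its layers.
layers = [28 * 28, 16 * 24 * 24, 32 * 8 * 8, 512, 10]
print(estimate_cores_and_chips(layers))   # -> (14, 1) for this hypothetical network
```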
In this work, we employ the SNN Conversion Toolbox [20]
along with its custom Loihi backend, NxTF [27], to convert
DNNs into SNNs to be deployed on Intel’s Loihi platform.
We go into further detail about this conversion process and
methodology in Section III.
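As a preview of the flow detailed in Section III, below is a hedged sketch of how the toolbox is typically driven: an INI-style configuration points at a pre-trained Keras model and a dataset and selects a simulator backend (here the Loihi/NxTF backend). The section and key names follow the toolbox's documented configuration format as we understand it and may differ between versions, so treat this as illustrative rather than a verbatim recipe from this work.

```python
# Illustrative, version-dependent use of the SNN Conversion Toolbox [20]:
# write an INI-style config pointing at a pre-trained Keras model, then run
# the converter. Section/key names follow the toolbox's documented layout
# ([paths], [tools], [simulation]) and should be checked against your release.
import configparser
from snntoolbox.bin.run import main

config = configparser.ConfigParser()
config["paths"] = {
    "path_wd": "experiments/mnist",        # working dir containing the trained model
    "dataset_path": "experiments/mnist",   # dir with x_test.npz / y_test.npz
    "filename_ann": "mnist_cnn",           # pre-trained Keras model name (no extension)
}
config["tools"] = {
    "evaluate_ann": "True",                # sanity-check the original DNN first
    "normalize": "True",                   # parameter normalization before conversion
}
config["simulation"] = {
    "simulator": "loihi",                  # NxTF backend targeting Intel Loihi
    "duration": "256",                     # timesteps per inference sample
    "num_to_test": "1000",
    "batch_size": "1",
}

with open("config", "w") as f:
    config.write(f)

main("config")   # converts the DNN to an SNN and runs it on the chosen backend
```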
III. DNN TO SNN CONVERSION METHODOLOGY
Our experiments perform image classification on three distinct image datasets: the MNIST handwritten digit dataset [28], the fashion MNIST (FMNIST) clothing dataset [29], and the American Sign Language (ASL) Alphabet dataset [30]. MNIST and FMNIST consist of 10 distinct classes each. MNIST contains a total of 70,000 images of handwritten digits zero through nine. FMNIST likewise contains 70,000 static images, depicting different articles of clothing such as pullovers, trousers, and bags. Unlike MNIST and FMNIST, the ASL Alphabet dataset contains 24 classes representing static hand gestures corresponding to letters of the English alphabet.
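As an example of the kind of preprocessing used before conversion, the sketch below loads one of these datasets with Keras, scales pixel values to [0, 1], and saves the test split as compressed NumPy archives; the file names follow the npz dataset convention we understand the SNN Conversion Toolbox to expect (verify against your version), and the ASL Alphabet images [30] would be prepared analogously from their source files.

```python
# Example preprocessing for the MNIST/FMNIST experiments (our own sketch):
# scale images to [0, 1], one-hot encode labels, and save the test split as
# compressed .npz files in the layout we believe the SNN Conversion Toolbox's
# "npz" dataset format expects (x_test.npz / y_test.npz / x_norm.npz).
import os
import numpy as np
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Add a channel axis and normalize pixel intensities to [0, 1] floats.
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_test = keras.utils.to_categorical(y_test, num_classes=10)

os.makedirs("experiments/mnist", exist_ok=True)
np.savez_compressed("experiments/mnist/x_test", x_test)
np.savez_compressed("experiments/mnist/y_test", y_test)
np.savez_compressed("experiments/mnist/x_norm", x_test[:1000])  # subset for parameter normalization
```

For rate-based conversion it is also commonly recommended to design the DNN around constructs that map cleanly onto spiking neurons, for example ReLU activations and average rather than max pooling.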