NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks
Huihong Shi1,2*, Haoran You1, Yang Zhao3, Zhongfeng Wang2, and Yingyan Lin1
1Georgia Institute of Technology, USA; 2Nanjing University, P.R. China; 3Rice University, USA
{eiclab,hyou37,celine.lin}@gatech.edu, zy34@rice.edu, zfwang@nju.edu.cn
Abstract
Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications. To tackle this limitation, pioneering works have developed handcrafted multiplication-free DNNs, which require expert knowledge and time-consuming manual iteration, calling for fast development tools. To this end, we propose a Neural Architecture Search and Acceleration framework dubbed NASA, which enables automated multiplication-reduced DNN development and integrates a dedicated multiplication-reduced accelerator for boosting DNNs' achievable efficiency. Specifically, NASA adopts neural architecture search (NAS) spaces that augment the state-of-the-art one with hardware inspired multiplication-free operators, such as shift and adder, armed with a novel progressive pretrain strategy (PGP) together with customized training recipes to automatically search for optimal multiplication-reduced DNNs. On top of that, NASA further develops a dedicated accelerator, which advocates a chunk-based template and auto-mapper dedicated for NASA-NAS resulting DNNs to better leverage their algorithmic properties for boosting hardware efficiency. Experimental results and ablation studies consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency tradeoffs. Codes are available at https://github.com/GATECH-EIC/NASA.
ACM Reference Format:
Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin. 2022. NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD '22), October 30-November 3, 2022, San Diego, CA, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3508352.3549478

* Work done when Huihong was a visiting student at Georgia Tech. Correspondence should be addressed to: Zhongfeng Wang and Yingyan Lin.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICCAD '22, October 30-November 3, 2022, San Diego, CA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9217-4/22/10…$15.00
https://doi.org/10.1145/3508352.3549478
1 Introduction
Modern deep neural networks (DNNs) have achieved great success in various computer vision tasks [7, 14, 15, 17], which has motivated a substantially increased demand for DNN-powered solutions in numerous real-world applications. However, the extensively used multiplications in DNNs dominate their energy consumption and have largely challenged DNNs' achievable hardware efficiency, motivating multiplication-free DNNs that adopt hardware-friendly operators, such as additions and bit-wise shifts, which require a smaller unit energy and area cost as compared to multiplications [26]. In particular, pioneering works of multiplication-free DNNs include (1) DeepShift [6], which proposes to adopt merely shift layers for DNNs, (2) AdderNet [20], which advocates using adder layers to implement DNNs for trading the massive multiplications with lower-cost additions, and (3) ShiftAddNet [26], which combines both shift and adder layers to construct DNNs for better trading off the achievable accuracy and efficiency.
Despite the promising hardware efficiency of the multiplication-free DNNs, their models' expressiveness capacity and thus achievable accuracy are generally inferior to their multiplication-based counterparts. As such, it is highly desired to develop hybrid multiplication-reduced DNNs that integrate both multiplication-based and multiplication-free operators (e.g., shift and adder) to boost the hardware efficiency while maintaining the task accuracy. Motivated by the recent success of neural architecture search (NAS) in automating the design of efficient and accurate DNNs, one natural thought is to leverage NAS to automatically search for the aforementioned hybrid DNNs for various applications and tasks, each of which often requires a different accuracy-efficiency trade-off and thus calls for a dedicated design of the algorithms and their corresponding accelerators.
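For intuition on why automated search is attractive here, consider the following toy count (our illustration; the operator set and layer count are arbitrary), which shows how quickly a layer-wise hybrid design space grows:

    # If each of L layers may independently be a multiplication-based conv,
    # a shift layer, or an adder layer, the hybrid design space has 3**L members.
    candidate_ops = ["conv", "shift", "adder"]
    num_layers = 20
    print(len(candidate_ops) ** num_layers)   # 3486784401 candidate hybrid networks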
Figure 1. An overview of our NASA framework integrating neural architecture search (NAS) and acceleration engines dedicated for hybrid DNNs, where NASA-NAS searches for hybrid models via NAS with the proposed progressive pretrain strategy (PGP) while NASA-Accelerator advocates a dedicated chunk-based design armed with an auto-mapper to support NASA-NAS searched DNNs.

In parallel, various techniques [1, 5, 16, 21, 28, 29] have been proposed to boost the hardware efficiency of DNNs, promoting their real-world deployment from the hardware perspective. For example, Eyeriss [5] proposes a row stationary dataflow and a micro-architecture with hierarchical memories to enhance data locality and minimize the
dominant data movement cost; and [21] explores a low-bit quantization algorithm paired with a minimalist hardware design for AdderNet [20] to leverage its algorithmic benefits for boosted hardware efficiency. While it has been shown that dedicated accelerators can achieve up to three orders-of-magnitude efficiency improvement as compared to general computing platforms, such as GPUs and CPUs, existing accelerators are customized for either multiplication-based or multiplication-free DNNs, and thus cannot fully leverage the algorithmic properties of the aforementioned hybrid DNNs for maximal efficiency. It is therefore promising and desirable to develop dedicated accelerators for hybrid DNNs consisting of both multiplication-based and multiplication-free operators, which remains underexplored.
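As a rough illustration of what such a dedicated design entails (the layer and chunk names below are taken from Fig. 1, where CLP, SLP, and ALP are assumed to denote the convolution-, shift-, and adder-layer processors; the dispatch logic itself is our simplification, not the actual NASA-Accelerator):

    # Route each heterogeneous layer of a hybrid DNN to the processing chunk
    # specialized for its operator type, so that every layer runs on hardware
    # matched to its operation.
    layers = [("Conv1", "conv"), ("Shift2", "shift"), ("Adder3", "adder"),
              ("Shift4", "shift"), ("Conv5", "conv")]
    chunk_of = {"conv": "CLP", "shift": "SLP", "adder": "ALP"}
    schedule = [(name, chunk_of[op]) for name, op in layers]
    print(schedule)   # [('Conv1', 'CLP'), ('Shift2', 'SLP'), ...]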
To marry the best of both worlds - the higher achievable accuracy of multiplication-based DNNs and the better hardware efficiency of multiplication-free DNNs - we target the exploration and acceleration of hybrid DNNs, and make the following contributions:
• We propose NASA, a Neural Architecture Search and Acceleration framework (see Fig. 1) to search for and accelerate hardware inspired hybrid DNNs. To the best of our knowledge, NASA is the first algorithm and hardware co-design framework dedicated for hybrid DNNs.
• We develop a dedicated NAS engine called NASA-NAS integrated in NASA, which incorporates hardware-friendly shift layers [6] and/or adder layers [20] into a state-of-the-art (SOTA) hardware friendly NAS search space [22] to construct hybrid DNN search spaces. Furthermore, to enable effective NAS on top of the hybrid search space, we propose a ProGressive Pretrain strategy (PGP) paired with customized training recipes.
• We further develop a dedicated accelerator called NASA-Accelerator to better leverage the algorithmic properties of hybrid DNNs for improved hardware efficiency. Our NASA-Accelerator advocates a dedicated chunk-based accelerator to better support the heterogeneous layers in hybrid DNNs, and integrates an auto-mapper to automatically search for optimal dataflows for executing hybrid DNNs in the above chunk-based accelerators to further improve efficiency.
• Extensive experiments and ablation studies validate the effectiveness and advantages of our NASA in terms of achievable accuracy and efficiency tradeoffs, against both SOTA multiplication-free and multiplication-based systems. We believe our work can open up an exciting perspective for the exploration and deployment of multiplication-reduced hybrid models to boost both task accuracy and efficiency.
2 Related Works
2.1 Multiplication-free DNNs
To favor hardware eciency, pioneering eorts have been
made to replace the cost-dominant multiplications in vanilla
DNNs with more hardware-friendly operators, e.g., bit-wise
shift and adder, for enabling handcrafted multiplication-free
DNNs. For instance, ShiftNet [
23
] treats shift operations
as a zero op/parameter alternative to spatial convolutions
and advocates DNNs featuring shift layers; DeepShift [
6
]
substitutes multiplications with bit-wise shifts; AdderNets
[
20
] trades multiplications with lower-cost additions and
employs an
1-normal distance as a cross-correlation substi-
tute to measure the similarity between input features and
weights; and inspired by a common hardware practice that
implements multiplications with logical bitwise shifts and
additions [
25
], ShiftAddNet [
26
] unies both shift and adder
layers to design DNNs with merely shift and adder opera-
tors. However, multiplication-free DNNs in general are still
inferior to their multiplication-based counterparts in terms
of task accuracy, motivating our NASA framework aiming to
marry the best of both worlds from powerful multiplication-
based and hardware ecient multiplication-free DNNs.
2.2 Neural Architecture Search
Early NAS methods [13, 31, 32] utilize reinforcement learning (RL) to search for DNN architectures, which gained great success but were resource- and time-consuming. To tackle this limitation, weight sharing methods [3, 11, 12] have been proposed. Among them, differentiable NAS (DNAS) algorithms [3, 11, 22] have achieved SOTA results by relaxing the discrete search space to be continuous and then applying gradient-based optimization methods to find optimal architectures from a pre-defined differentiable supernet. Specifically, FBNet [22] employs a Gumbel Softmax sampling method [9] and gradient-based optimization to search for efficient and accurate DNNs targeting mobile devices; FBNetV2 [18] designs a masking mechanism for feature map reuses in both spatial and channel dimensions, expanding the search space greatly at a cost of a small memory overhead; alternatively, ProxylessNAS [3] activates only a few paths during the forward and backward processes of search, making it possible for DNAS to optimize with large search spaces. Despite the prosperity of NAS for vanilla DNNs, there is still a lack of effort in exploring NAS designs for hybrid DNNs.