NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks
Huihong Shi1,2*, Haoran You1, Yang Zhao3, Zhongfeng Wang2, and Yingyan Lin1
1Georgia Institute of Technology, USA; 2Nanjing University, P.R. China; 3Rice University, USA
{eiclab,hyou37,celine.lin}@gatech.edu, zy34@rice.edu, zfwang@nju.edu.cn
Abstract
Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications. To tackle this limitation, pioneering works have developed handcrafted multiplication-free DNNs, which require expert knowledge and time-consuming manual iteration, calling for fast development tools. To this end, we propose a Neural Architecture Search and Acceleration framework dubbed NASA, which enables automated multiplication-reduced DNN development and integrates a dedicated multiplication-reduced accelerator for boosting DNNs' achievable efficiency. Specifically, NASA adopts neural architecture search (NAS) spaces that augment the state-of-the-art one with hardware inspired multiplication-free operators, such as shift and adder, armed with a novel progressive pretrain strategy (PGP) together with customized training recipes to automatically search for optimal multiplication-reduced DNNs. On top of that, NASA further develops a dedicated accelerator, which advocates a chunk-based template and auto-mapper dedicated for NASA-NAS resulting DNNs to better leverage their algorithmic properties for boosting hardware efficiency. Experimental results and ablation studies consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency tradeoffs. Codes are available at https://github.com/GATECH-EIC/NASA.
ACM Reference Format:
Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin. 2022. NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD '22), October 30-November 3, 2022, San Diego, CA, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3508352.3549478

* Work done when Huihong was a visiting student at Georgia Tech. Correspondence should be addressed to: Zhongfeng Wang and Yingyan Lin.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICCAD '22, October 30-November 3, 2022, San Diego, CA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9217-4/22/10…$15.00
https://doi.org/10.1145/3508352.3549478
1 Introduction
Modern deep neural networks (DNNs) have achieved great success in various computer vision tasks [7, 14, 15, 17], which has motivated a substantially increased demand for DNN-powered solutions in numerous real-world applications. However, the extensively used multiplications in DNNs dominate their energy consumption and have largely challenged DNNs' achievable hardware efficiency, motivating multiplication-free DNNs that adopt hardware-friendly operators, such as additions and bit-wise shifts, which require a smaller unit energy and area cost as compared to multiplications [26]. In particular, pioneering works of multiplication-free DNNs include (1) DeepShift [6], which proposes to adopt merely shift layers for DNNs, (2) AdderNet [20], which advocates using adder layers to implement DNNs for trading the massive multiplications with lower-cost additions, and (3) ShiftAddNet [26], which combines both shift and adder layers to construct DNNs for better trading off the achievable accuracy and efficiency.
Despite the promising hardware efficiency of the multiplication-free DNNs, their models' expressiveness capacity and thus achievable accuracy are generally inferior to their multiplication-based counterparts. As such, it is highly desired to develop hybrid multiplication-reduced DNNs that integrate both multiplication-based and multiplication-free operators (e.g., shift and adder) to boost the hardware efficiency while maintaining the task accuracy. Motivated by the recent success of neural architecture search (NAS) in automating the design of efficient and accurate DNNs, one natural thought is to leverage NAS to automatically search for the aforementioned hybrid DNNs for various applications and tasks, each of which often requires a different accuracy-efficiency trade-off and thus calls for a dedicated design of the algorithms and their corresponding accelerators.
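For intuition on why automated search is attractive here, consider the following toy count (our illustration; the operator set and layer count are arbitrary), which shows how quickly a layer-wise hybrid design space grows:

    # If each of L layers may independently be a multiplication-based conv,
    # a shift layer, or an adder layer, the hybrid design space has 3**L members.
    candidate_ops = ["conv", "shift", "adder"]
    num_layers = 20
    print(len(candidate_ops) ** num_layers)   # 3486784401 candidate hybrid networks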
Figure 1. An overview of our NASA framework integrating neural architecture search (NAS) and acceleration engines dedicated for hybrid DNNs, where NASA-NAS searches for hybrid models via NAS with the proposed progressive pretrain strategy (PGP) while NASA-Accelerator advocates a dedicated chunk-based design armed with an auto-mapper to support NASA-NAS searched DNNs.

In parallel, various techniques [1, 5, 16, 21, 28, 29] have been proposed to boost the hardware efficiency of DNNs, promoting their real-world deployment from the hardware perspective. For example, Eyeriss [5] proposes a row stationary dataflow and a micro-architecture with hierarchical memories to enhance data locality and minimize the
dominant data movement cost; and [21] explores a low-bit quantization algorithm paired with a minimalist hardware design for AdderNet [20] to leverage its algorithmic benefits for boosted hardware efficiency. While it has been shown that dedicated accelerators can achieve up to three orders-of-magnitude efficiency improvement as compared to general computing platforms, such as GPUs and CPUs, existing accelerators are customized for either multiplication-based or multiplication-free DNNs, and thus cannot fully leverage the algorithmic properties of the aforementioned hybrid DNNs for maximal efficiency. It is therefore promising and desirable to develop dedicated accelerators for hybrid DNNs consisting of both multiplication-based and multiplication-free operators, which remains underexplored.
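As a rough illustration of what such a dedicated design entails (the layer and chunk names below are taken from Fig. 1, where CLP, SLP, and ALP are assumed to denote the convolution-, shift-, and adder-layer processors; the dispatch logic itself is our simplification, not the actual NASA-Accelerator):

    # Route each heterogeneous layer of a hybrid DNN to the processing chunk
    # specialized for its operator type, so that every layer runs on hardware
    # matched to its operation.
    layers = [("Conv1", "conv"), ("Shift2", "shift"), ("Adder3", "adder"),
              ("Shift4", "shift"), ("Conv5", "conv")]
    chunk_of = {"conv": "CLP", "shift": "SLP", "adder": "ALP"}
    schedule = [(name, chunk_of[op]) for name, op in layers]
    print(schedule)   # [('Conv1', 'CLP'), ('Shift2', 'SLP'), ...]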
To marry the best of both worlds - the higher achievable accuracy of multiplication-based DNNs and the better hardware efficiency of multiplication-free DNNs - we target the exploration and acceleration of hybrid DNNs, and make the following contributions:
• We propose NASA, a Neural Architecture Search and Acceleration framework (see Fig. 1) to search for and accelerate hardware inspired hybrid DNNs. To the best of our knowledge, NASA is the first algorithm and hardware co-design framework dedicated for hybrid DNNs.
• We develop a dedicated NAS engine called NASA-NAS integrated in NASA, which incorporates hardware-friendly shift layers [6] and/or adder layers [20] into a state-of-the-art (SOTA) hardware friendly NAS search space [22] to construct hybrid DNN search spaces. Furthermore, to enable effective NAS on top of the hybrid search space, we propose a ProGressive Pretrain strategy (PGP) paired with customized training recipes.
• We further develop a dedicated accelerator called NASA-Accelerator to better leverage the algorithmic properties of hybrid DNNs for improved hardware efficiency. Our NASA-Accelerator advocates a dedicated chunk-based accelerator to better support the heterogeneous layers in hybrid DNNs, and integrates an auto-mapper to automatically search for optimal dataflows for executing hybrid DNNs in the above chunk-based accelerators to further improve efficiency.
• Extensive experiments and ablation studies validate the effectiveness and advantages of our NASA in terms of achievable accuracy and efficiency tradeoffs, against both SOTA multiplication-free and multiplication-based systems. We believe our work can open up an exciting perspective for the exploration and deployment of multiplication-reduced hybrid models to boost both task accuracy and efficiency.
2 Related Works
2.1 Multiplication-free DNNs
To favor hardware eciency, pioneering eorts have been
made to replace the cost-dominant multiplications in vanilla
DNNs with more hardware-friendly operators, e.g., bit-wise
shift and adder, for enabling handcrafted multiplication-free
DNNs. For instance, ShiftNet [
23
] treats shift operations
as a zero op/parameter alternative to spatial convolutions
and advocates DNNs featuring shift layers; DeepShift [
6
]
substitutes multiplications with bit-wise shifts; AdderNets
[
20
] trades multiplications with lower-cost additions and
employs an
1-normal distance as a cross-correlation substi-
tute to measure the similarity between input features and
weights; and inspired by a common hardware practice that
implements multiplications with logical bitwise shifts and
additions [
25
], ShiftAddNet [
26
] unies both shift and adder
layers to design DNNs with merely shift and adder opera-
tors. However, multiplication-free DNNs in general are still
inferior to their multiplication-based counterparts in terms
of task accuracy, motivating our NASA framework aiming to
marry the best of both worlds from powerful multiplication-
based and hardware ecient multiplication-free DNNs.
2.2 Neural Architecture Search
Early NAS methods [13, 31, 32] utilize reinforcement learning (RL) to search for DNN architectures, which gained great success but were resource- and time-consuming. To tackle this limitation, weight sharing methods [3, 11, 12] have been proposed. Among them, differentiable NAS (DNAS) algorithms [3, 11, 22] have achieved SOTA results by relaxing the discrete search space to be continuous and then applying gradient-based optimization methods to find optimal architectures from a pre-defined differentiable supernet. Specifically, FBNet [22] employs a Gumbel Softmax sampling method [9] and gradient-based optimization to search for efficient and accurate DNNs targeting mobile devices; FBNetV2 [18] designs a masking mechanism for feature map reuses in both spatial and channel dimensions, expanding the search space greatly at a cost of a small memory overhead; alternatively, ProxylessNAS [3] activates only a few paths during the forward and backward processes of search, making it possible for DNAS to optimize with large search spaces. Despite the prosperity of NAS for vanilla DNNs, there is still a lack of effort in exploring NAS designs for hybrid DNNs.