
NASA: Neural Architecture Search and Acceleration
for Hardware Inspired Hybrid Networks
Huihong Shi1,2∗, Haoran You1, Yang Zhao3, Zhongfeng Wang2, and Yingyan Lin1
1Georgia Institute of Technology, 2Nanjing University, 3Rice University,
1USA, 2P.R. China, 3USA
{eiclab,hyou37,celine.lin}@gatech.edu,zy34@rice.edu,zfwang@nju.edu.cn
Abstract
Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications. To tackle this limitation, pioneering works have developed handcrafted multiplication-free DNNs, which require expert knowledge and time-consuming manual iteration, calling for fast development tools. To this end, we propose a Neural Architecture Search and Acceleration framework dubbed NASA, which enables automated multiplication-reduced DNN development and integrates a dedicated multiplication-reduced accelerator for boosting DNNs' achievable efficiency. Specifically, NASA adopts neural architecture search (NAS) spaces that augment the state-of-the-art one with hardware-inspired multiplication-free operators, such as shift and adder, armed with a novel progressive pretrain strategy (PGP) together with customized training recipes to automatically search for optimal multiplication-reduced DNNs. On top of that, NASA further develops a dedicated accelerator, which advocates a chunk-based template and auto-mapper dedicated to NASA-NAS resulting DNNs to better leverage their algorithmic properties for boosting hardware efficiency. Experimental results and ablation studies consistently validate the advantages of NASA's algorithm-hardware co-design framework in terms of achievable accuracy and efficiency trade-offs. Codes are available at https://github.com/GATECH-EIC/NASA.
∗Work done when Huihong was a visiting student at Georgia Tech. Correspondence should be addressed to: Zhongfeng Wang and Yingyan Lin.
1 Introduction
Modern deep neural networks (DNNs) have achieved great success in various computer vision tasks [7, 14, 15, 17], which has motivated a substantially increased demand for DNN-powered solutions in numerous real-world applications. However, the extensively used multiplications in DNNs dominate their energy consumption and have largely challenged DNNs' achievable hardware efficiency, motivating multiplication-free DNNs that adopt hardware-friendly operators, such as additions and bit-wise shifts, which require a smaller unit energy and area cost as compared to multiplications [26]. In particular, pioneering works on multiplication-free DNNs include (1) DeepShift [6], which proposes to adopt merely shift layers for DNNs; (2) AdderNet [20], which advocates using adder layers to implement DNNs, trading the massive multiplications for lower-cost additions; and (3) ShiftAddNet [26], which combines both shift and adder layers to construct DNNs for a better trade-off between the achievable accuracy and efficiency.
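For concreteness, the sketch below illustrates, per output element, how such operators replace the usual multiply-accumulate: a DeepShift-style shift layer constrains weights to signed powers of two so each product becomes a bit-shift, while an AdderNet-style adder layer computes a negated L1 distance using only additions and subtractions. This is a minimal NumPy illustration under those assumptions, not code from NASA, and the function names and toy values are our own.

```python
import numpy as np

def mult_output(x, w):
    # Standard multiplication-based layer: inner product (multiply-accumulate).
    return np.sum(x * w)

def adder_output(x, w):
    # Adder layer (AdderNet-style): negated L1 distance, additions/subtractions only.
    return -np.sum(np.abs(x - w))

def shift_output(x, p, s):
    # Shift layer (DeepShift-style): weights are signed powers of two (s * 2**p),
    # so each "multiplication" reduces to a bit-shift plus a sign flip.
    return np.sum(s * np.ldexp(x, p))

x = np.array([0.5, -1.0, 2.0])   # toy activations
w = np.array([0.25, 1.0, -0.5])  # toy weights
p = np.array([-2, 0, -1])        # power-of-two exponents approximating |w|
s = np.array([1, 1, -1])         # signs of w
print(mult_output(x, w), adder_output(x, w), shift_output(x, p, s))
```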
Despite the promising hardware efficiency of multiplication-free DNNs, their expressiveness and thus achievable accuracy are generally inferior to those of their multiplication-based counterparts. As such, it is highly desirable to develop hybrid multiplication-reduced DNNs that integrate both multiplication-based and multiplication-free operators (e.g., shift and adder) to boost the hardware efficiency while maintaining the task accuracy. Motivated by the recent success of neural architecture search (NAS) in automating the design of efficient and accurate DNNs, one natural thought is to leverage NAS to automatically search for the aforementioned hybrid DNNs for various applications and tasks, each of which often requires a different accuracy-efficiency trade-off and thus calls for a dedicated design of the algorithms and their corresponding accelerators.
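To make the notion of such a hybrid search space concrete, the toy sketch below assigns one operator type per layer from a mixed candidate pool; the candidate list and the trivial random sampling are illustrative assumptions only, not NASA's actual NAS space or search algorithm.

```python
import random

# Candidate operator types per searchable layer: a multiplication-based option
# ("conv") alongside multiplication-free alternatives ("shift", "adder").
# Illustrative names, not NASA's exact search space.
CANDIDATE_OPS = ["conv", "shift", "adder"]

def sample_hybrid_network(num_layers, rng):
    # A trivial stand-in for NAS: pick one candidate operator for each layer,
    # yielding one hybrid multiplication-reduced architecture.
    return [rng.choice(CANDIDATE_OPS) for _ in range(num_layers)]

print(sample_hybrid_network(6, random.Random(0)))
```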
In parallel, various techniques [1, 5, 16, 21, 28, 29] have been proposed to boost the hardware efficiency of DNNs, promoting their real-world deployment from the hardware perspective. For example, Eyeriss [5] proposes a row-stationary dataflow and a micro-architecture with hierarchical memories to enhance data locality and minimize the