Low Error-Rate Approximate Multiplier Design for
DNNs with Hardware-Driven Co-Optimization
Yao Lu, Jide Zhang, Su Zheng, Zhen Li, Lingli Wang
State Key Laboratory of ASIC & System, Fudan University, Shanghai, China
Emails: {yaolu20, szheng19, lizhen19, llwang}@fudan.edu.cn, zhangjd21@m.fudan.edu.cn
Abstract—In this paper, two approximate 3×3 multipliers are
proposed. Synthesis results with the ASAP 7nm process library
show that they reduce the area by 31.38% and 36.17% and the
power consumption by 36.73% and 35.66%, respectively, compared
with the exact multiplier. They can be aggregated with a 2×2
multiplier to produce an 8×8 multiplier with a low error rate
based on the distribution of DNN weights. We propose a
hardware-driven software co-optimization method that improves
DNN accuracy by retraining. Based on the two proposed
approximate 3-bit multipliers, three approximate 8-bit
multipliers with low error rates are designed for DNNs.
Compared with the exact 8-bit unsigned multiplier, our design
can achieve a significant advantage over other approximate
multipliers on public datasets.
Index Terms—hardware-driven co-optimization, approximate
computing, low error-rate multiplier design, low DNN accuracy
loss
I. INTRODUCTION
Approximate computing has gradually become an energy-saving
solution for digital systems [1]. In error-tolerant applications
that do not require high precision, such as image processing,
communication systems, and deep neural networks (DNNs),
approximate arithmetic circuits can usually reduce hardware
costs such as area, power, and critical path delay. In DNN
applications, the accuracy loss caused by approximation can be
made negligible through careful co-optimization of the
approximate multiplier architecture with the DNN retraining
software platform.
Reference [2] has shown that approximate multipliers can
effectively reduce area and power consumption with a small
precision loss. A common way to approximate multiplication is
by algebraic methods. A logarithm-based multiplication system
is introduced in [3], which approximates the logarithm with a
linear function using Mitchell's algorithm. An iterative
logarithmic multiplier is introduced in [4], which approaches
the exact value step by step by controlling the number of
iterations. A further derived truncated error correction [5] is
applied to the truncated iterative multiplier in [6], which
provides greater flexibility than static approximate
multiplication and improves latency, precision, and hardware
cost.
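As a concrete illustration of the logarithm-based approach, the following is a minimal Python sketch of Mitchell's approximation, which replaces log2(1 + x) by x for 0 <= x < 1. It is a software model only and is not taken from [3]; the function name `mitchell_multiply` is hypothetical.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers with Mitchell's method.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, and
    log2(1 + x) is approximated by the linear function x.
    """
    if a == 0 or b == 0:
        return 0
    ka = a.bit_length() - 1          # position of the leading one of a
    kb = b.bit_length() - 1          # position of the leading one of b
    fa = a - (1 << ka)               # xa scaled by 2**ka
    fb = b - (1 << kb)               # xb scaled by 2**kb
    # (xa + xb) scaled by 2**(ka + kb), kept as an exact integer
    frac_sum = (fa << kb) + (fb << ka)
    one = 1 << (ka + kb)             # the value 1.0 in the same scaling
    if frac_sum < one:
        # xa + xb < 1: antilog gives 2**(ka+kb) * (1 + xa + xb)
        return one + frac_sum
    # xa + xb >= 1: antilog gives 2**(ka+kb+1) * (xa + xb)
    return 2 * frac_sum


if __name__ == "__main__":
    for a, b in [(4, 8), (3, 3), (5, 6)]:
        print(a, b, a * b, mitchell_multiply(a, b))
```

The approximation is exact when both operands are powers of two and otherwise underestimates the product, with a relative error of at most 1/9 (reached, for example, at 3 × 3).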
Moreover, research on approximate multiplication is no longer
limited to arithmetic algorithms but also covers hardware
architectures. Reference [7] describes in detail a low-power,
high-performance approximate multiplier, SiEi, which improves
accuracy by compensating some of the errors. The RoBA
multiplier in [8] offers high speed and energy savings. It
rounds the operands to the nearest power of two, which avoids
the computation-intensive part of the multiplication at the
cost of a high error rate. The error-tolerant multiplier (ETM)
of [9] splits the multiplier into an accurate multiplication
part for the most-significant bits (MSBs) and a
non-multiplication part for the least-significant bits (LSBs).
The method proposed in [10], which modifies a low bit-width
multiplier based on its Karnaugh map (K-map), has been proven
to effectively reduce the area and critical path delay. This
method is also used in [11], [12] for the approximate 4-2
compressor, full adder, and half adder, which are then applied
in the reduction process of the Wallace multiplier tree.
This work is supported by the National Natural Science
Foundation of China under grants 61971143 and 62174035.
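The rounding idea attributed to RoBA [8] above can be sketched in a few lines of Python. The text only states that operands are rounded to the nearest power of two; the decomposition A·B ≈ Ar·B + Br·A − Ar·Br used below follows the commonly cited RoBA formulation and is an assumption here, as are the tie-breaking rule and the function names.

```python
def round_to_power_of_two(x: int) -> int:
    """Round a non-negative integer to the nearest power of two.

    Ties (e.g. 3, 6, 12) round up; this tie rule is an assumption.
    """
    if x <= 0:
        return 0
    k = x.bit_length() - 1
    low, high = 1 << k, 1 << (k + 1)
    return low if (x - low) < (high - x) else high


def roba_like_multiply(a: int, b: int) -> int:
    """Approximate a * b using rounded operands ar and br.

    Exact product: a*b = ar*b + br*a - ar*br + (a - ar)*(b - br).
    The last term is dropped, so every remaining product has a
    power-of-two factor and reduces to a shift in hardware.
    """
    ar, br = round_to_power_of_two(a), round_to_power_of_two(b)
    return ar * b + br * a - ar * br


if __name__ == "__main__":
    for a, b in [(8, 5), (3, 3), (7, 7)]:
        print(a, b, a * b, roba_like_multiply(a, b))
```

The dropped term (a − ar)(b − br) is nonzero whenever neither operand is a power of two, which is consistent with the high error rate noted above even though each individual error is small.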
References [13], [14] define several error metrics for
approximate multipliers: error distance (ED), mean error
distance (MED), normalized mean error distance (NMED), mean
relative error distance (MRED), and error rate (ER). These
metrics are useful for evaluating multipliers but are not
necessarily suitable for every application. Hence, DNN
accuracy loss (DAL) is adopted to reflect the loss of DNN
accuracy caused by approximate multipliers.
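The metrics listed above can be computed by exhaustive enumeration for small multipliers. The sketch below uses definitions that are common in the approximate-arithmetic literature and may differ in detail from those in [13], [14]; `approx_2x2` is a hypothetical toy multiplier used only to exercise the function, not a design from this paper.

```python
from itertools import product


def error_metrics(approx_mul, bits: int) -> dict:
    """Exhaustively evaluate an unsigned bits x bits approximate multiplier.

    ED   = |approx - exact| for one input pair
    MED  = mean of ED over all input pairs
    NMED = MED divided by the maximum exact product
    MRED = mean of ED / exact (pairs with exact == 0 contribute 0)
    ER   = fraction of input pairs with ED != 0
    """
    n = 1 << bits
    count = n * n
    max_product = (n - 1) ** 2
    total_ed, total_red, wrong = 0, 0.0, 0
    for a, b in product(range(n), repeat=2):
        exact = a * b
        ed = abs(approx_mul(a, b) - exact)
        total_ed += ed
        wrong += ed != 0
        if exact:
            total_red += ed / exact
    med = total_ed / count
    return {
        "MED": med,
        "NMED": med / max_product,
        "MRED": total_red / count,
        "ER": wrong / count,
    }


def approx_2x2(a: int, b: int) -> int:
    """Hypothetical 2x2 multiplier that returns 7 instead of 9 for 3 * 3."""
    return 7 if (a, b) == (3, 3) else a * b


if __name__ == "__main__":
    print(error_metrics(approx_2x2, bits=2))
```

For this toy multiplier only one of the 16 input pairs is wrong, so ER is 1/16 and MED is 2/16. Such circuit-level numbers say nothing about how often that input pair occurs in a real DNN workload, which is the motivation for an application-level metric such as DAL.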
Through the aggregation of low bit-width multipliers, the
Wallace multiplier can effectively reduce the number of adder
layers and thus shorten the critical path. Reference [10]
introduces a 2×2 approximate multiplier, which significantly
improves both area and power consumption but leads to a high
accuracy loss after aggregation into a large multiplier. Hence,
two 3×3 approximate multiplier architectures are proposed,
which can be used for partial product generation when
aggregating into large multipliers according to the
distribution of DNN weights. In DNN accelerators, 8-bit is a
common configuration. The results of the 8-bit quantization
experiment in [15] are convincing, and Eyeriss-v2 in [16] also
uses an 8-bit quantization configuration. Three 8×8 unsigned
multiplier architectures are then proposed based on 3-bit
multipliers and evaluated by the extended DNN platform based
on [17].
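The aggregation of low bit-width multipliers into an 8×8 multiplier can be modeled in software as follows. This is a minimal sketch under stated assumptions: each 8-bit operand is split into 2-bit, 3-bit, and 3-bit fields from LSB to MSB, and every pair of fields is multiplied by a pluggable sub-multiplier. The field ordering, the names `split_fields` and `aggregated_multiply`, and the use of one sub-multiplier for all partial products are illustrative choices, not the architectures proposed in this paper.

```python
def split_fields(x: int, widths=(2, 3, 3)):
    """Split x into bit fields, LSB first. Returns (value, shift) pairs."""
    fields, shift = [], 0
    for w in widths:
        fields.append(((x >> shift) & ((1 << w) - 1), shift))
        shift += w
    return fields


def aggregated_multiply(a: int, b: int, sub_mul=lambda x, y: x * y,
                        widths=(2, 3, 3)) -> int:
    """Build an 8x8 unsigned product from small partial products.

    sub_mul models the low bit-width multiplier (exact by default).
    Each partial product is shifted by the combined field offsets
    and accumulated, as an adder tree would do in hardware.
    """
    total = 0
    for va, sa in split_fields(a, widths):
        for vb, sb in split_fields(b, widths):
            total += sub_mul(va, vb) << (sa + sb)
    return total


if __name__ == "__main__":
    print(aggregated_multiply(200, 123), 200 * 123)
```

With an exact `sub_mul` the result equals the exact product for all 8-bit inputs. Passing an approximate sub-multiplier shows how an error in one partial product is scaled by the shift of the fields it came from, which is why the placement of approximate blocks relative to the weight distribution matters.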
With the proposed approximate multipliers, the evaluation
results show that the average DAL is about 0.4% and 12% for
LeNet on the MNIST dataset [18] and the CIFAR10 dataset [19],
respectively. Our platform then retrains the DNN with
regularization to improve DNN accuracy, which helps reduce the
average accuracy loss to 0.2% and 9%, respectively. In addition,
arXiv:2210.03916v2 [cs.AR] 16 Nov 2022