BayesFT: Bayesian Optimization for Fault Tolerant
Neural Network Architecture
Nanyang Ye
Shanghai Jiao Tong University
Shanghai, China
ynylincoln@sjtu.edu.cn
Jingbiao Mei
University of Cambridge
Cambridge, United Kingdom
jm2245@cam.ac.uk
Zhicheng Fang
Shanghai Jiao Tong University
Shanghai, China
fangzhicheng@sjtu.edu.cn
Yuwen Zhang
University College London
London, United Kingdom
yuwen.zhang.20@ucl.ac.uk
Ziqing Zhang
University of Cambridge
Cambridge, United Kingdom
zz404@cam.ac.uk
Huaying Wu
Shanghai Jiao Tong University
Shanghai, China
wuhuaying@sjtu.edu.cn
Xiaoyao Liang
Shanghai Jiao Tong University
Shanghai, China
liang-xy@sjtu.edu.cn
Abstract—To deploy deep learning algorithms in resource-limited scenarios, the emerging resistive random access memory (ReRAM) device has been regarded as a promising platform for analog computing. However, the practicality of ReRAM is primarily limited by the weight drifting of ReRAM neural networks, which arises from multiple factors including manufacturing variation and thermal noise. In this paper, we propose a novel Bayesian optimization method for fault tolerant neural network architecture (BayesFT). For the neural architecture search space design, instead of conducting neural architecture search over the whole feasible search space, we first systematically explore the weight drifting tolerance of different neural network components, such as dropout, normalization, number of layers, and activation functions, and find that dropout improves the robustness of neural networks to weight drifting. Based on this analysis, we propose an efficient search space that searches only over the dropout rate of each layer. We then use Bayesian optimization to search for the neural architecture most robust to weight drifting. Empirical experiments demonstrate that our algorithmic framework outperforms state-of-the-art methods by up to 10 times on various tasks, such as image classification and object detection.
I. INTRODUCTION
Deep learning has achieved tremendous success in various fields, such as image classification, object detection, natural language processing, and autonomous driving. To deploy deep learning algorithms in resource-limited scenarios, such as the internet of things, much research has gone into integrating deep learning algorithms into deep neural network (DNN) accelerators, such as FPGAs and domain-specific ASICs. While these approaches have demonstrated energy, latency, and throughput improvements over the traditional approach of using a general-purpose graphics processing unit (GPU), one inherent limitation is that digital circuits consume considerable power to maintain a triggering voltage high enough to differentiate two states. Moreover, unlike the human brain, where every neuron is capable of both computation and storage, information has to be transmitted repeatedly between the computing components and memory to update DNNs. These properties are fundamentally different from the human brain, lead to high energy costs, and arguably keep our DNN systems from emulating human intelligence.
To build machines more like humans, neuromorphic computing has been proposed to simulate the brain's circuits for deep learning, and it has received wide attention from both academia and industry. One emerging trend in neuromorphic computing is resistive random access memory (ReRAM) for deep learning with memristors [1]–[3]. The memristor is a non-volatile electronic memory device and one of the four fundamental circuit elements, whose physical realization took decades.
However, ReRAM has been shown to be poorly compatible with existing deep learning paradigms, which are designed for deterministic circuit behavior. Due to the analog nature of ReRAM, its stability can be strongly affected by thermal noise, electrical noise, process variations, and programming errors. The DNN weights, represented by the memristance of memristor cells, can be easily distorted, largely jeopardizing the utility of ReRAM deep learning systems.
To mitigate the negative effects of memristance distortion, several methods have been proposed, but most come at the cost of extra hardware. For example, Liu et al. first learned the importance of neural network weights and then fine-tuned the important weights that were distorted [4]. Chen et al. proposed to re-write the DNN into ReRAM after diagnosing each ReRAM device; this approach is not scalable, as the DNN must be re-trained for each device's weight distortion pattern [5]. While improvements have been observed, these methods ignore factors such as programming errors and weight drifting during usage. Moreover, they do not scale to the massive production of ReRAM devices: diagnosing and re-training DNNs for each ReRAM device is time-consuming and expensive. Recently, Liu et al. mitigated this problem with a new DNN architecture that replaces the commonly used softmax output layer for image classification with an error correction code scheme [6].
Instead of predicting each class's probability, this approach computes a series of binary codes from the image and predicts the image's class by comparing the computed code with each class's code, which is precomputed and stored in a codebook. For example, suppose the computed code for an image is 10001, and the codebook assigns the class cat the code 10000 and the class dog the code 11111. Since the computed code has a smaller Hamming distance to the cat code, the neural network outputs cat. Although this method does not require re-training the DNN each time, errors caused by weight drifts in earlier layers propagate to later layers, leading to heavy error entanglement in the last layers responsible for generating the codes. Moreover, the error correction scheme is designed for image classification and cannot be directly applied to object detection tasks, which are crucial in many applications such as autonomous driving.
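To make the decoding step concrete, the following toy Python sketch (ours, not the implementation of [6]) selects the class whose codebook entry has the smallest Hamming distance to the computed code; the function name and the two-class codebook are illustrative assumptions taken from the example above.

```python
# Toy sketch (ours) of Hamming-distance decoding against a codebook.
def hamming_decode(predicted_code: str, codebook: dict) -> str:
    """Return the class whose code is closest to `predicted_code`."""
    def hamming(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))
    return min(codebook, key=lambda cls: hamming(codebook[cls], predicted_code))

# Example from the text: 10001 is closer to the cat code than the dog code.
codebook = {"cat": "10000", "dog": "11111"}
print(hamming_decode("10001", codebook))   # -> cat (distance 1 vs 3)
```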
In this paper, we revisit the problem of fault tolerance in neural networks and identify several factors that are crucial for robustness to weight drifting. Perhaps surprisingly, we find that architectural choices (e.g., dropout, normalization, and model complexity) play an essential role in determining this robustness. We propose a Bayesian optimization method to automatically search for fault tolerant neural network architectures, which we name "BayesFT".
Our contributions can be summarized as follows:
1) We systematically analyze the weight drifting robustness of different neural architecture components and identify the key architectural factors that determine it, such as dropout, normalization, and model complexity.
2) Based on this analysis, we propose a Bayesian optimization framework, BayesFT, that automatically searches for fault tolerant neural network architectures robust to weight drifting in ReRAM devices.
3) We conduct extensive numerical experiments on various tasks and datasets, including CIFAR-10 and traffic sign recognition for image classification and PennFudanPed for object detection. The results demonstrate that our methods improve robustness by 10 to 100 times with negligible computational cost and engineering effort.
II. PRELIMINARY
A. Basics of DNN
A DNN can be viewed as the composition of many non-linear functions. Formally, given input data x ∈ R^d and its corresponding label y, the task is to minimize the loss ℓ(f_θ(x), y), where ℓ is the loss function and f_θ is the neural network with weights θ. For a K-layer neural network, f can be viewed as the composition of a series of functions f = f_1 ∘ f_2 ∘ · · · ∘ f_K. There are several commonly used layers in DNNs. Convolutional layers extract features by convolving the input with learned kernels. Fully connected layers apply a non-linear function after a matrix product. For a more detailed introduction, we refer readers to the deep learning book [7].
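As an illustration only (not code from this paper), such a network can be written as a stack of layers applied in sequence; the sketch below assumes PyTorch and 3×32×32 inputs, both of which are our own choices.

```python
# Minimal sketch (ours): a DNN built as a stack of commonly used layers,
# here assuming PyTorch and 3x32x32 inputs (e.g. CIFAR-10-sized images).
import torch.nn as nn

f = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # non-linear activation
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),                 # fully connected layer
)
# A forward pass f(x) feeds each layer's output into the next, so f acts
# as the composition of its constituent functions.
```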
B. Memristance drifting modeling
Following the settings of [6] and [5], to simulate memristance drifting caused by the multiple factors mentioned above, we apply the following multiplicative drift to each neural network weight θ:

θ' = θ · e^λ,   λ ∼ N(0, σ²)        (1)

where θ' is the drifted neural network parameter, which follows a log-normal distribution. We can vary σ to change the level of variation and thereby simulate different ReRAM devices and deployment scenarios. It is worth noting that although we consider this setting in our paper, our methodology can be seamlessly extended to other possible weight drifting distributions.
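For concreteness, the following is a minimal sketch, under the assumption of a PyTorch model, of how the drift model in Eq. (1) could be simulated in software; the function name apply_memristance_drift is ours, not from the paper.

```python
# Minimal sketch (ours) of the drift model in Eq. (1) for a PyTorch model:
# every weight theta is replaced by theta * exp(lambda), lambda ~ N(0, sigma^2).
import copy
import torch

def apply_memristance_drift(model: torch.nn.Module, sigma: float) -> torch.nn.Module:
    """Return a drifted copy of `model`; the original is left untouched."""
    drifted = copy.deepcopy(model)
    with torch.no_grad():
        for param in drifted.parameters():
            lam = torch.randn_like(param) * sigma   # lambda ~ N(0, sigma^2)
            param.mul_(torch.exp(lam))              # theta' = theta * e^lambda
    return drifted
```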
Weight drifting in ReRAM can cause significant performance degradation for DNNs. To visualize this, Figure 1 plots the decision boundary of a classifier trained on a simple binary classification dataset generated with Scikit-Learn. As the level of weight perturbation increases, the shape of the decision boundary shifts, and the classification accuracy drops accordingly. These plots give the intuition that weight perturbation causes a reduction in classification accuracy.

Fig. 1: Decision boundary shifts caused by memristance drifting
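The following minimal sketch, again our own illustration rather than the paper's code, captures the spirit of this experiment: it trains a small classifier on a Scikit-Learn two-moons dataset and reports accuracy as the drift level σ from Eq. (1) increases. The dataset choice, network size, and training settings are all assumptions.

```python
# Minimal sketch (ours) in the spirit of Fig. 1: train a small classifier on
# a Scikit-Learn toy dataset, then watch accuracy fall as drift grows.
import copy
import torch
import torch.nn as nn
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):                              # short training loop
    opt.zero_grad()
    nn.functional.cross_entropy(net(X), y).backward()
    opt.step()

for sigma in [0.0, 0.2, 0.5, 1.0]:                # increasing drift levels
    drifted = copy.deepcopy(net)
    with torch.no_grad():
        for p in drifted.parameters():
            p.mul_(torch.exp(torch.randn_like(p) * sigma))   # Eq. (1)
        acc = (drifted(X).argmax(dim=1) == y).float().mean().item()
    print(f"sigma={sigma:.1f}  accuracy={acc:.3f}")
```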
III. BAYESFT: BAYESIAN OPTIMIZATION FOR FAULT
TOLERANT NEURAL NETWORK ARCHITECTURE
A. Exploration of fault tolerant neural architecture
We first conduct an ablation study to investigate the fault tolerance of neural architecture factors, such as dropout, normalization, model complexity, and activation function, using a multi-layer perceptron (MLP) on the MNIST dataset.¹ The results are shown in Figure 2. Next, we discuss the experimental results in detail.

¹The same experiments are also conducted with larger models on the CIFAR-10 dataset, and the results are similar.
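As a rough illustration of how such an ablation could be set up (our own sketch, not the paper's experimental code), the helper below builds MLP variants that differ only in dropout and normalization; each variant would then be trained on MNIST and evaluated under the weight drift of Eq. (1). All names and hyper-parameters here are assumptions.

```python
# Sketch (ours) of MLP variants for the ablation: identical except for the
# presence of dropout or batch normalization.
import torch.nn as nn

def make_mlp(dropout_rate: float = 0.0, use_batchnorm: bool = False,
             hidden: int = 256, depth: int = 2) -> nn.Module:
    layers, in_dim = [nn.Flatten()], 28 * 28      # MNIST images flattened
    for _ in range(depth):
        layers.append(nn.Linear(in_dim, hidden))
        if use_batchnorm:
            layers.append(nn.BatchNorm1d(hidden))
        layers.append(nn.ReLU())
        if dropout_rate > 0:
            layers.append(nn.Dropout(dropout_rate))
        in_dim = hidden
    layers.append(nn.Linear(in_dim, 10))
    return nn.Sequential(*layers)

variants = {
    "plain":       make_mlp(),
    "dropout-0.3": make_mlp(dropout_rate=0.3),
    "batchnorm":   make_mlp(use_batchnorm=True),
}
# Each variant would be trained on MNIST and then evaluated under the
# multiplicative weight drift of Eq. (1) over a range of sigma values.
```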