BayesFT: Bayesian Optimization for Fault Tolerant
Neural Network Architecture
Nanyang Ye
Shanghai Jiao Tong University
Shanghai, China
ynylincoln@sjtu.edu.cn
Jingbiao Mei
University of Cambridge
Cambridge, United Kingdom
jm2245@cam.ac.uk
Zhicheng Fang
Shanghai Jiao Tong University
Shanghai, China
fangzhicheng@sjtu.edu.cn
Yuwen Zhang
University College London
London, United Kingdom
yuwen.zhang.20@ucl.ac.uk
Ziqing Zhang
University of Cambridge
Cambridge, United Kingdom
zz404@cam.ac.uk
Huaying Wu
Shanghai Jiao Tong University
Shanghai, China
wuhuaying@sjtu.edu.cn
Xiaoyao Liang
Shanghai Jiao Tong University
Shanghai, China
liang-xy@sjtu.edu.cn
Abstract—To deploy deep learning algorithms in resource-limited scenarios, an emerging device, resistive random access memory (ReRAM), has been regarded as a promising platform for analog computing. However, the practicality of ReRAM is mainly limited by the weight drifting of ReRAM neural networks, which arises from multiple factors, including manufacturing variations, thermal noise, etc. In this paper, we propose a novel Bayesian optimization method for fault-tolerant neural network architecture (BayesFT). To design the neural architecture search space, instead of searching over the whole feasible space, we first systematically explore the weight-drifting tolerance of different neural network components, such as dropout, normalization, the number of layers, and activation functions, and find that dropout improves the robustness of neural networks to weight drifting. Based on this analysis, we propose an efficient search space that searches only over the dropout rate of each layer. We then use Bayesian optimization to search for the neural architecture most robust to weight drifting. Empirical experiments demonstrate that our algorithmic framework outperforms state-of-the-art methods by up to 10 times on various tasks, such as image classification and object detection.
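The search described in the abstract can be sketched in miniature: a Gaussian-process surrogate with an expected-improvement acquisition function searching over a single dropout rate. This is a hedged illustration, not the paper's implementation — the toy `robustness` objective (peaking at an arbitrary rate of 0.3) stands in for accuracy under simulated weight drifting, and the kernel length scale, grid, and evaluation budget are illustrative choices.

```python
import numpy as np
from math import erf

# Toy stand-in for "accuracy under weight drifting" as a function of a
# single dropout rate p. In the paper, this objective would be obtained
# by evaluating a trained network under simulated drift; the peak at
# p = 0.3 here is purely illustrative.
def robustness(p):
    return np.exp(-((np.asarray(p) - 0.3) ** 2) / 0.02)

def rbf(a, b, length=0.1):
    # Squared-exponential kernel between two 1-D point sets.
    d = np.asarray(a).reshape(-1, 1) - np.asarray(b).reshape(1, -1)
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_tr, y_tr, x_te, jitter=1e-6):
    # Standard Gaussian-process posterior mean and std with an RBF kernel.
    K = rbf(x_tr, x_tr) + jitter * np.eye(len(x_tr))
    Ks = rbf(x_tr, x_te)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y_tr
    var = 1.0 - np.sum((Ks.T @ Kinv) * Ks.T, axis=1)  # diag of posterior cov
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    z = (mu - best) / np.maximum(sigma, 1e-9)
    Phi = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (mu - best) * Phi + sigma * phi

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 0.8, 3)          # initial random dropout rates
y = robustness(x)
grid = np.linspace(0.0, 0.8, 161)     # candidate dropout rates
for _ in range(10):                   # small illustrative budget
    mu, sigma = gp_posterior(x, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    x = np.append(x, x_next)
    y = np.append(y, robustness(x_next))

best_p = x[np.argmax(y)]
print(f"best dropout rate found: {best_p:.3f}")
```

In the full method each layer contributes its own dropout-rate dimension, so the surrogate is fit over a multi-dimensional search space rather than the single rate used here.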
I. INTRODUCTION
Deep learning has achieved tremendous success in various fields, such as image classification, object detection, natural language processing, and autonomous driving. To deploy deep learning algorithms in resource-limited scenarios, such as the Internet of Things, much research has been conducted on integrating deep learning algorithms into deep neural network (DNN) accelerators, such as FPGAs and domain-specific ASICs. While these approaches have demonstrated improvements in energy, latency, and throughput over the traditional approach of using a general-purpose graphics processing unit (GPU), one inherent limitation is that digital circuits consume considerable power to maintain a triggering voltage high enough to differentiate two states. Moreover, unlike the human brain, where every neuron is capable of both computation and storage, information must be transmitted repeatedly between the computing components and memory to update DNNs. These properties are fundamentally different from those of the human brain; they lead to high energy costs and arguably keep our DNN systems from emulating human intelligence.
To build machines that work like humans, neuromorphic computing has been proposed to simulate human brain circuitry for deep learning, and it has received wide attention from both academia and industry. One emerging trend in neuromorphic computing is resistive random access memory (ReRAM) for deep learning with memristors [1]–[3]. The memristor is a non-volatile electronic memory device and one of the four fundamental electronic circuit elements, whose physical realization took decades.
However, ReRAM has been shown to be poorly compatible with existing deep learning paradigms, which are designed for deterministic circuit behavior. Due to the analog nature of ReRAM, its stability can be largely affected by thermal noise, electrical noise, process variations, and programming errors. The weights of a DNN, represented by the memristance of memristor cells, can easily be distorted, largely jeopardizing the utility of ReRAM deep learning systems.
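The distortion just described is commonly modeled in this literature as multiplicative log-normal noise on each weight, i.e. each weight w becomes w·e^θ with θ ~ N(0, λ²). The sketch below follows that common assumption; the drift level `lam` and the example weights are illustrative, not values from the paper.

```python
import numpy as np

def drift_weights(w, lam=0.1, rng=None):
    """Perturb weights with multiplicative log-normal noise, a common
    model of ReRAM memristance drift: w' = w * exp(theta), where
    theta ~ N(0, lam^2) and lam controls drift severity."""
    rng = rng or np.random.default_rng()
    theta = rng.normal(0.0, lam, size=np.shape(w))
    return w * np.exp(theta)

# Example: perturb a small weight vector; each weight is scaled by a
# positive random factor near 1, so signs are preserved.
w = np.array([0.5, -1.2, 2.0])
w_drift = drift_weights(w, lam=0.1, rng=np.random.default_rng(1))
print(w_drift)
```

Because the noise factor e^θ is always positive, this model perturbs weight magnitudes without flipping signs, and larger λ corresponds to more severe drift.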
To mitigate the negative effects of memristance distortion, several methods have been proposed, though most come at the cost of extra hardware. For example, Liu et al. first learned the importance of the neural network weights and then fine-tuned the important weights that were distorted [4]. Chen et al. proposed re-writing the DNN into ReRAM after diagnosing each ReRAM device; this approach is not scalable, as the DNN must be re-trained for each device's weight distortion pattern [5]. While improvements have been observed, these methods ignore factors such as programming errors and weight drifting during usage. Moreover, they do not scale to the mass production of ReRAM devices: diagnosing and re-training DNNs for each ReRAM device is time-consuming and expensive. Recently, Liu et al. mitigated this problem with a new DNN architecture that substitutes an error-correction-code scheme for the softmax layer commonly used to output predictions in image classification tasks [6]. In this approach, instead of
arXiv:2210.01795v1 [cs.LG] 30 Sep 2022