BayesFT: Bayesian Optimization for Fault Tolerant
Neural Network Architecture
Nanyang Ye
Shanghai Jiao Tong University
Shanghai, China
ynylincoln@sjtu.edu.cn
Jingbiao Mei
University of Cambridge
Cambridge, United Kingdom
jm2245@cam.ac.uk
Zhicheng Fang
Shanghai Jiao Tong University
Shanghai, China
fangzhicheng@sjtu.edu.cn
Yuwen Zhang
University College London
London, United Kingdom
yuwen.zhang.20@ucl.ac.uk
Ziqing Zhang
University of Cambridge
Cambridge, United Kingdom
zz404@cam.ac.uk
Huaying Wu
Shanghai Jiao Tong University
Shanghai, China
wuhuaying@sjtu.edu.cn
Xiaoyao Liang
Shanghai Jiao Tong University
Shanghai, China
liang-xy@sjtu.edu.cn
Abstract—To deploy deep learning algorithms in resource-limited scenarios, the emerging resistive random access memory (ReRAM) device has been regarded as a promising platform for analog computing. However, the practicality of ReRAM is primarily limited by the weight drifting of ReRAM neural networks, which arises from multiple factors including manufacturing variation and thermal noise. In this paper, we propose a novel Bayesian optimization method for fault tolerant neural network architecture (BayesFT). For the neural architecture search space design, instead of conducting neural architecture search over the whole feasible search space, we first systematically explore the weight drifting tolerance of different neural network components, such as dropout, normalization, number of layers, and activation functions, and find that dropout improves the robustness of neural networks to weight drifting. Based on this analysis, we propose an efficient search space that searches only over the dropout rate of each layer. We then use Bayesian optimization to search for the neural architecture most robust to weight drifting. Empirical experiments demonstrate that our algorithmic framework outperforms state-of-the-art methods by up to 10 times on various tasks, such as image classification and object detection.
I. INTRODUCTION
Deep learning has achieved tremendous success in various fields, such as image classification, object detection, natural language processing, and autonomous driving. To deploy deep learning algorithms in resource-limited scenarios, such as the internet of things, much research has gone into integrating deep learning algorithms into deep neural network (DNN) accelerators, such as FPGAs and domain-specific ASICs. While these approaches have demonstrated energy, latency, and throughput improvements over the traditional approach of using a general-purpose graphics processing unit (GPU), one inherent limitation is that digital circuits consume considerable power to maintain a triggering voltage high enough to differentiate two states. Moreover, unlike the human brain, where every neuron is capable of both computation and storage, information has to be transmitted repeatedly between the computing components and memory to update DNNs. These properties are fundamentally different from the human brain, lead to high energy costs, and arguably keep our DNN systems from emulating human intelligence.
To build machines more like humans, neuromorphic computing has been proposed to simulate the brain's circuits for deep learning, and it has received wide attention from both academia and industry. One emerging trend in neuromorphic computing is resistive random access memory (ReRAM) for deep learning with memristors [1]–[3]. The memristor is a non-volatile electronic memory device and one of the four fundamental circuit elements, whose physical realization took decades.
However, ReRAM has been shown to be poorly compatible with existing deep learning paradigms, which are designed for deterministic circuit behavior. Due to the analog nature of ReRAM, its stability can be strongly affected by thermal noise, electrical noise, process variations, and programming errors. The DNN weights, represented by the memristance of memristor cells, can be easily distorted, largely jeopardizing the utility of ReRAM deep learning systems.
To mitigate the negative effects of memristance distortion, several methods have been proposed, but most come at the cost of extra hardware. For example, Liu et al. first learned the importance of neural network weights and then fine-tuned the important weights that were distorted [4]. Chen et al. proposed to re-write the DNN into ReRAM after diagnosing each ReRAM device; this approach is not scalable, as the DNN must be re-trained for each device's weight distortion pattern [5]. While improvements have been observed, these methods ignore factors such as programming errors and weight drifting during usage. Moreover, they do not scale to the massive production of ReRAM devices: diagnosing and re-training DNNs for each ReRAM device is time-consuming and expensive. Recently, Liu et al. mitigated this problem with a new DNN architecture that replaces the commonly used softmax output layer for image classification with an error correction code scheme [6].
Instead of predicting each class's probability, this approach computes a series of binary codes from the image and predicts the image's class by comparing the computed code with each class's code, which is precomputed and stored in a codebook. For example, suppose the computed code for an image is 10001, and the codebook assigns the class cat the code 10000 and the class dog the code 11111. Since the computed code has a smaller Hamming distance to the cat code, the neural network outputs cat. Although this method does not require re-training the DNN each time, errors caused by weight drifts in earlier layers propagate to later layers, leading to heavy error entanglement in the last layers responsible for generating the codes. Moreover, the error correction scheme is designed for image classification and cannot be directly applied to object detection tasks, which are crucial in many applications such as autonomous driving.
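To make the decoding step concrete, the following toy Python sketch (ours, not the implementation of [6]) selects the class whose codebook entry has the smallest Hamming distance to the computed code; the function name and the two-class codebook are illustrative assumptions taken from the example above.

```python
# Toy sketch (ours) of Hamming-distance decoding against a codebook.
def hamming_decode(predicted_code: str, codebook: dict) -> str:
    """Return the class whose code is closest to `predicted_code`."""
    def hamming(a: str, b: str) -> int:
        return sum(x != y for x, y in zip(a, b))
    return min(codebook, key=lambda cls: hamming(codebook[cls], predicted_code))

# Example from the text: 10001 is closer to the cat code than the dog code.
codebook = {"cat": "10000", "dog": "11111"}
print(hamming_decode("10001", codebook))   # -> cat (distance 1 vs 3)
```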
In this paper, we revisit the problem of fault tolerance in neural networks and identify several factors that are crucial for robustness to weight drifting. Perhaps surprisingly, we find that architectural choices (e.g., dropout, normalization, and model complexity) play an essential role in determining this robustness. We propose a Bayesian optimization method to automatically search for fault tolerant neural network architectures, which we name "BayesFT".
Our contributions can be summarized as follows:
1) We systematically analyze the weight drifting robustness of different neural architecture components and identify the key architectural factors that determine it, such as dropout, normalization, and model complexity.
2) Based on this analysis, we propose a Bayesian optimization framework, BayesFT, that automatically searches for fault tolerant neural network architectures robust to weight drifting in ReRAM devices.
3) We conduct extensive numerical experiments on various tasks and datasets, including CIFAR-10 and traffic sign recognition for image classification and PennFudanPed for object detection. The results demonstrate that our methods improve robustness by 10 to 100 times with negligible computational cost and engineering effort.
II. PRELIMINARY
A. Basics of DNN
A DNN can be viewed as the composition of many non-linear functions. Formally, given input data x ∈ R^d and its corresponding label y, the task is to minimize the loss ℓ(f_θ(x), y), where ℓ is the loss function and f_θ is the neural network with weights θ. For a K-layer neural network, f can be viewed as the composition of a series of functions f = f_1 ∘ f_2 ∘ · · · ∘ f_K. There are several commonly used layers in DNNs. Convolutional layers extract features by convolving the input with learned kernels. Fully connected layers apply a non-linear function after a matrix product. For a more detailed introduction, we refer readers to the deep learning book [7].
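As an illustration only (not code from this paper), such a network can be written as a stack of layers applied in sequence; the sketch below assumes PyTorch and 3×32×32 inputs, both of which are our own choices.

```python
# Minimal sketch (ours): a DNN built as a stack of commonly used layers,
# here assuming PyTorch and 3x32x32 inputs (e.g. CIFAR-10-sized images).
import torch.nn as nn

f = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # non-linear activation
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),                 # fully connected layer
)
# A forward pass f(x) feeds each layer's output into the next, so f acts
# as the composition of its constituent functions.
```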
B. Memristance drifting modeling
Following the settings of [6] and [5], to simulate memristance drifting caused by the multiple factors mentioned above, we apply the following multiplicative drift to each neural network weight θ:

θ' = θ · e^λ,   λ ∼ N(0, σ²)        (1)

where θ' is the drifted neural network parameter, which follows a log-normal distribution. We can vary σ to change the level of variation and thereby simulate different ReRAM devices and deployment scenarios. It is worth noting that although we consider this setting in our paper, our methodology can be seamlessly extended to other possible weight drifting distributions.
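For concreteness, the following is a minimal sketch, under the assumption of a PyTorch model, of how the drift model in Eq. (1) could be simulated in software; the function name apply_memristance_drift is ours, not from the paper.

```python
# Minimal sketch (ours) of the drift model in Eq. (1) for a PyTorch model:
# every weight theta is replaced by theta * exp(lambda), lambda ~ N(0, sigma^2).
import copy
import torch

def apply_memristance_drift(model: torch.nn.Module, sigma: float) -> torch.nn.Module:
    """Return a drifted copy of `model`; the original is left untouched."""
    drifted = copy.deepcopy(model)
    with torch.no_grad():
        for param in drifted.parameters():
            lam = torch.randn_like(param) * sigma   # lambda ~ N(0, sigma^2)
            param.mul_(torch.exp(lam))              # theta' = theta * e^lambda
    return drifted
```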
Weight drifting in ReRAM can cause significant performance degradation for DNNs. To visualize this, Figure 1 plots the decision boundary of a classifier trained on a simple binary classification dataset generated with Scikit-Learn. As the level of weight perturbation increases, the shape of the decision boundary shifts, and the classification accuracy drops accordingly. These plots give the intuition that weight perturbation causes a reduction in classification accuracy.

Fig. 1: Decision boundary shifts caused by memristance drifting
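The following minimal sketch, again our own illustration rather than the paper's code, captures the spirit of this experiment: it trains a small classifier on a Scikit-Learn two-moons dataset and reports accuracy as the drift level σ from Eq. (1) increases. The dataset choice, network size, and training settings are all assumptions.

```python
# Minimal sketch (ours) in the spirit of Fig. 1: train a small classifier on
# a Scikit-Learn toy dataset, then watch accuracy fall as drift grows.
import copy
import torch
import torch.nn as nn
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(200):                              # short training loop
    opt.zero_grad()
    nn.functional.cross_entropy(net(X), y).backward()
    opt.step()

for sigma in [0.0, 0.2, 0.5, 1.0]:                # increasing drift levels
    drifted = copy.deepcopy(net)
    with torch.no_grad():
        for p in drifted.parameters():
            p.mul_(torch.exp(torch.randn_like(p) * sigma))   # Eq. (1)
        acc = (drifted(X).argmax(dim=1) == y).float().mean().item()
    print(f"sigma={sigma:.1f}  accuracy={acc:.3f}")
```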
III. BAYESFT: BAYESIAN OPTIMIZATION FOR FAULT
TOLERANT NEURAL NETWORK ARCHITECTURE
A. Exploration of fault tolerant neural architecture
We first conduct an ablation study to investigate the fault tolerance of neural architecture factors, such as dropout, normalization, model complexity, and activation function, using a multi-layer perceptron (MLP) on the MNIST dataset.¹ The results are shown in Figure 2. Next, we discuss the experimental results in detail.

¹The same experiments are also conducted with larger models on the CIFAR-10 dataset, and the results are similar.
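As a rough illustration of how such an ablation could be set up (our own sketch, not the paper's experimental code), the helper below builds MLP variants that differ only in dropout and normalization; each variant would then be trained on MNIST and evaluated under the weight drift of Eq. (1). All names and hyper-parameters here are assumptions.

```python
# Sketch (ours) of MLP variants for the ablation: identical except for the
# presence of dropout or batch normalization.
import torch.nn as nn

def make_mlp(dropout_rate: float = 0.0, use_batchnorm: bool = False,
             hidden: int = 256, depth: int = 2) -> nn.Module:
    layers, in_dim = [nn.Flatten()], 28 * 28      # MNIST images flattened
    for _ in range(depth):
        layers.append(nn.Linear(in_dim, hidden))
        if use_batchnorm:
            layers.append(nn.BatchNorm1d(hidden))
        layers.append(nn.ReLU())
        if dropout_rate > 0:
            layers.append(nn.Dropout(dropout_rate))
        in_dim = hidden
    layers.append(nn.Linear(in_dim, 10))
    return nn.Sequential(*layers)

variants = {
    "plain":       make_mlp(),
    "dropout-0.3": make_mlp(dropout_rate=0.3),
    "batchnorm":   make_mlp(use_batchnorm=True),
}
# Each variant would be trained on MNIST and then evaluated under the
# multiplicative weight drift of Eq. (1) over a range of sigma values.
```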