
outputs satisfy a particular criterion. The working mechanism of early-termination AdNNs can be formulated as,
\[
\begin{cases}
\mathit{ExitNN}(x) = \mathit{Exit}_i(x), & \text{if } B_i(x) \ge \tau_i \\
\mathit{In}_{i+1}(x) = \mathit{Out}_i(x), & \text{otherwise}
\end{cases}
\tag{2}
\]
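For concreteness, the snippet below is a minimal sketch of the early-termination mechanism in Eq. (2), written against PyTorch. The EarlyExitNet class, its block and exit modules, and the use of softmax confidence as the criterion B_i are illustrative assumptions, not the exact architectures evaluated in this paper.

```python
# Minimal early-termination sketch (illustrative; assumes batch size 1 and
# shape-compatible user-supplied blocks/exit heads).
import torch
import torch.nn as nn


class EarlyExitNet(nn.Module):
    def __init__(self, blocks, exits, thresholds):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)   # backbone blocks producing Out_i(x)
        self.exits = nn.ModuleList(exits)     # exit classifiers Exit_i
        self.thresholds = thresholds          # per-exit thresholds tau_i

    def forward(self, x):
        for block, exit_head, tau in zip(self.blocks, self.exits, self.thresholds):
            x = block(x)                      # In_{i+1}(x) = Out_i(x)
            logits = exit_head(x)             # candidate prediction Exit_i(x)
            # B_i(x): here, the maximum softmax confidence of the exit head
            confidence = torch.softmax(logits, dim=-1).max(dim=-1).values
            if confidence.item() >= tau:      # B_i(x) >= tau_i: terminate early
                return logits
        return logits                         # otherwise fall through to the last exit
```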
2.2 Redundant Computation
In a software program, if an operation is performed but not required, we term it a redundant operation. For Adaptive Neural Networks, if a component is activated without affecting the AdNN’s final prediction, we define the computation as redundant computation. AdNNs are built on the philosophy that not all inputs should require all DNN components for inference. For example, consider the images in Fig. 2. The left box shows the AdNNs’ design philosophy: AdNNs consume more energy to process images with higher semantic complexity. However, when the third image in the left box is perturbed with minimal perturbations and becomes the rightmost one, the AdNN’s inference energy consumption increases significantly (from 30 J to 68 J). We refer to such additional computation as redundant computation or performance degradation.
2.3 Performance & Computational Complexity
In this section, we describe the relationship between hardware-dependent performance metrics and DNN computational complexity. Although many metrics can reflect DNN performance, we chose latency and energy consumption as hardware-dependent performance metrics because of their critical nature for real-time embedded systems [3, 49]. Measuring hardware-dependent performance metrics (e.g., latency, energy consumption) usually requires many repeated experiments, which is costly. Hence, existing work [12, 14, 29, 35, 41, 52] proposes to apply floating-point operations (FLOPs) to represent DNN computational complexity.
However, a recent study [43] demonstrates that simply lowering DNN computational complexity (FLOPs) does not always improve DNN runtime performance. This is because modern hardware platforms usually apply parallelism to handle DNN floating-point operations. Parallelism can accelerate computation within a layer, while the DNN layers themselves are computed sequentially. Thus, for two DNNs with the same total FLOPs, different FLOP allocation strategies will result in different parallelism utilization and different DNN model performance. For AdNNs, however, each layer/block usually has a similar structure and FLOP count [12, 14, 34, 52], so the parallelism utilization is similar for each block. Because parallelism cannot accelerate computation across blocks, increasing the number of activated computational blocks/layers will degrade AdNNs’ performance. To further understand the relation between AdNNs’ FLOPs and AdNNs’ model performance, we conduct a study in §3.
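To make this argument concrete, the back-of-the-envelope sketch below uses a simplified analytic latency model (with hypothetical throughput and per-layer overhead numbers, not a real profiler) to show how two networks with identical total FLOPs but different per-layer allocations can differ in estimated latency.

```python
# Illustrative only: intra-layer work is parallelized, but layers run sequentially,
# so more (smaller) layers can mean higher latency at the same total FLOPs.
PEAK_FLOPS_PER_SEC = 1e12    # assumed parallel throughput within a layer
LAYER_OVERHEAD_SEC = 5e-5    # assumed fixed sequential cost per layer


def estimated_latency(per_layer_flops):
    """Sum of (parallel compute time + fixed overhead) over sequential layers."""
    return sum(f / PEAK_FLOPS_PER_SEC + LAYER_OVERHEAD_SEC for f in per_layer_flops)


total_flops = 1e9
shallow = [total_flops / 10] * 10     # 10 wide layers
deep = [total_flops / 100] * 100      # 100 narrow layers, same total FLOPs

print(estimated_latency(shallow))     # ~1.5e-3 s
print(estimated_latency(deep))        # ~6.0e-3 s: more sequential steps, slower
```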
3 PRELIMINARY STUDY
3.1 Study Approach
Our intuition is to explore the worst-case computational complexity of an algorithm or model. For AdNNs, the basic computations are floating-point operations (FLOPs). Thus, we make the assumption that the FLOP count of an AdNN is a hardware-independent metric that approximates AdNN performance. To validate this assumption, we conduct an empirical study.
Figure 2: The left box shows that AdNNs allocate different computational resources to images with different semantic complexity; the right box shows that a perturbed image can trigger redundant computation and cause an energy surge.
Specifically, we compute the Pearson product-moment correlation coefficients (PCCs) [40] between AdNN FLOPs and AdNN latency and energy consumption. PCCs are widely used in statistical methods to measure the linear correlation between two variables. PCCs are normalized covariance measurements ranging from -1 to 1; a higher PCC indicates that the two variables are more positively correlated. If the PCCs between FLOPs and system latency and between FLOPs and system energy consumption are both high, then our assumption is validated.
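The correlation analysis itself is straightforward; the sketch below assumes that per-input FLOP counts and measured latency/energy have already been collected, and the arrays are placeholders rather than the measurements reported in this study.

```python
# PCC between FLOPs and hardware-dependent metrics (placeholder data).
from scipy.stats import pearsonr

flops   = [1.2e8, 2.5e8, 3.1e8, 4.0e8, 5.6e8]    # FLOPs consumed per test input
latency = [0.011, 0.021, 0.026, 0.034, 0.047]    # measured latency (s), same inputs
energy  = [12.0, 24.5, 30.1, 39.8, 55.2]         # measured energy (J), same inputs

pcc_latency, _ = pearsonr(flops, latency)
pcc_energy, _ = pearsonr(flops, energy)
print(f"PCC(FLOPs, latency) = {pcc_latency:.3f}")
print(f"PCC(FLOPs, energy)  = {pcc_energy:.3f}")
```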
3.2 Study Model & Dataset
We select subjects (e.g., model,dataset) following policies below.
•The selected subjects are publicly available.
•The selected subjects are widely used in existing work.
•
The selected dataset and models should be diverse from dierent
perspectives. e.g.,, the selected models should include both early-
termination and conditional-skipping AdNNs.
We select ve popular model-dataset combinations used for image
classication tasks as our experimental subjects. The dataset and the
corresponding model are listed in Table 1. We explain the selected
datasets and corresponding models below.
Datasets. CIFAR-10 [25] is a database for object recognition. There are ten object classes in this dataset, and each image in CIFAR-10 is 32×32 pixels. CIFAR-10 contains 50,000 training images and 10,000 testing images. CIFAR-100 [25] is similar to CIFAR-10 [25] but with 100 classes. It also contains 50,000 training images and 10,000 testing images. SVHN [36] is a real-world image dataset obtained from house numbers in Google Street View images. There are 73,257 training images and 26,032 testing images in SVHN.
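For reference, all three test sets are available through standard libraries such as torchvision; the snippet below is an illustrative loading sketch (the root path and transform are assumptions, and this study does not prescribe a particular data pipeline).

```python
# Loading the hold-out test splits used as subjects (illustrative).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

cifar10_test = datasets.CIFAR10(root="data", train=False, download=True, transform=to_tensor)
cifar100_test = datasets.CIFAR100(root="data", train=False, download=True, transform=to_tensor)
svhn_test = datasets.SVHN(root="data", split="test", download=True, transform=to_tensor)

print(len(cifar10_test), len(cifar100_test), len(svhn_test))  # 10000 10000 26032
```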
Models. For the CIFAR-10 dataset, we use the SkipNet [52] and BlockDrop [53] models. SkipNet applies reinforcement learning to train DNNs to skip unnecessary blocks, and BlockDrop trains a policy network to activate only a subset of blocks to save computation costs. We download the trained SkipNet and BlockDrop models from the authors’ websites. For the CIFAR-100 dataset, we use the RaNet [56] and DeepShallow [24] models for evaluation. DeepShallow adaptively scales DNN depth, while RaNet scales both input resolution and DNN depth to balance accuracy and performance. For the SVHN dataset, DeepShallow [24] is used for evaluation. For the RaNet [56] and DeepShallow [24] architectures, the authors do not release the trained model weights but open-source their training code. Therefore, we follow the authors’ instructions to train the model weights.
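To complement the early-termination sketch above, the snippet below illustrates the conditional-skipping idea behind SkipNet and BlockDrop with a simplified gated residual block; the gate design and the 0.5 decision threshold are illustrative assumptions, not the released models.

```python
# Simplified conditional-skipping block (illustrative; one gate decision per batch,
# whereas SkipNet/BlockDrop decide per input).
import torch.nn as nn


class GatedResidualBlock(nn.Module):
    def __init__(self, block, gate):
        super().__init__()
        self.block = block   # the costly residual branch F(x)
        self.gate = gate     # cheap gate module producing values in [0, 1]

    def forward(self, x):
        g = self.gate(x)                  # gating decision for this input
        if g.mean().item() < 0.5:         # gate closed: skip the residual branch
            return x                      # identity shortcut only
        return x + self.block(x)          # gate open: full residual computation
```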
3.3 Study Process
We begin by evaluating each model’s computational complexity on
the original hold-out test dataset. After that, we deploy the AdNN