DeepPerform: An Eicient Approach for Performance Testing of
Resource-Constrained Neural Networks
Simin Chen
simin.chen@UTDallas.edu
UT Dallas
Dallas, USA
Mirazul Haque
mirazul.haque@utdallas.edu
UT Dallas
Dallas, USA
Cong Liu
congl@ucr.edu
UC Riverside
Riverside, USA
Wei Yang
wei.yang@utdallas.edu
UT Dallas
Dallas, USA
ABSTRACT
Today, an increasing number of Adaptive Deep Neural Networks (AdNNs) are being used on resource-constrained embedded devices. We observe that, similar to traditional software, redundant computation exists in AdNNs, resulting in considerable performance degradation. The performance degradation is dependent on the input and is referred to as an input-dependent performance bottleneck (IDPB). To ensure an AdNN satisfies the performance requirements of resource-constrained applications, it is essential to conduct performance testing to detect IDPBs in the AdNN. Existing neural network testing methods are primarily concerned with correctness testing, which does not involve performance testing. To fill this gap, we propose DeepPerform, a scalable approach to generate test samples that detect IDPBs in AdNNs. We first demonstrate how the problem of generating performance test samples that detect IDPBs can be formulated as an optimization problem. Following that, we demonstrate how DeepPerform efficiently handles the optimization problem by learning and estimating the distribution of AdNNs' computational consumption. We evaluate DeepPerform on three widely used datasets against five popular AdNN models. The results show that DeepPerform generates test samples that cause more severe performance degradation (FLOPs increase of up to 552%). Furthermore, DeepPerform is substantially more efficient than the baseline methods in generating test inputs (runtime overhead of only 6–10 milliseconds).
CCS CONCEPTS
• Software and its engineering → Software notations and tools; • Computing methodologies → Machine learning.
KEYWORDS
Machine learning, software testing, performance analysis
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ASE '22, October 10–14, 2022, Rochester, MI, USA
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9475-8/22/10.
https://doi.org/10.1145/3551349.3561158
ACM Reference Format:
Simin Chen, Mirazul Haque, Cong Liu, and Wei Yang. 2022. DeepPerform: An Efficient Approach for Performance Testing of Resource-Constrained Neural Networks. In 37th IEEE/ACM International Conference on Automated Software Engineering (ASE '22), October 10–14, 2022, Rochester, MI, USA. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3551349.3561158
1 INTRODUCTION
Deep Neural Networks (DNNs) have shown potential in many applications, such as image classification, image segmentation, and object detection [9, 20, 46]. However, the power of DNNs comes at substantial computational cost [19, 30, 34, 47, 54]. The cost, especially the inference-time cost, can be a concern for deploying DNNs on resource-constrained embedded devices such as mobile phones and IoT devices. To enable deploying DNNs on resource-constrained devices, researchers have proposed a series of Adaptive Neural Networks (AdNNs) [2, 12, 14, 23, 49, 51]. Rather than using all computation units for every input, AdNNs selectively activate partial computation units (e.g., convolution layers, fully connected layers) for different inputs. This partial unit selection mechanism enables AdNNs to achieve real-time prediction on resource-constrained devices.
Similar to traditional systems [55], performance bottlenecks also exist in AdNNs. Some of these bottlenecks can be detected only when specific input values are given; hence, they are referred to as input-dependent performance bottlenecks (IDPBs). Some IDPBs cause severe performance degradation and can result in catastrophic consequences. For example, consider an AdNN deployed on a drone for obstacle detection. If the AdNN's energy consumption suddenly increases five-fold for specific inputs, the drone will run out of battery in the middle of a trip. For these reasons, conducting performance testing to find IDPBs is a crucial step before deploying AdNNs.
However, to the best of our knowledge, most existing work on testing neural networks focuses on correctness testing, which cannot be applied to performance testing. The main difference between the two is that correctness testing aims to detect a model's incorrect classifications, while performance testing aims to find IDPBs that trigger performance degradation. Because incorrect classifications may not lead to performance degradation, existing correctness testing methods cannot be applied to performance testing. To fill this gap and accelerate the process of deploying neural networks on
resource-constrained devices, there is a strong need for an automated performance testing framework to find IDPBs.
We identify two main challenges in designing such a performance testing framework. First, traditional performance metrics (e.g., latency, energy consumption) are hardware-dependent. Measuring these hardware-dependent metrics requires repeated experiments because of system noise; thus, directly applying them as guidance for generating test samples would be inefficient. Second, AdNNs' performance adjustment strategy is learned from datasets rather than conforming to logic specifications (such as relations between model inputs and outputs). Without a logical relation between AdNNs' inputs and AdNNs' performance, it is challenging to search for inputs that trigger performance degradation in AdNNs.
To address the above challenges, we propose DeepPerform, which enables efficient performance testing for AdNNs by generating test samples that trigger IDPBs (DeepPerform focuses on testing latency degradation and energy consumption degradation, as these two metrics are critical for performance testing [3, 49]). To address the first challenge, we conduct a preliminary study (§3) to illustrate the relationship between computational complexity (FLOPs) and hardware-dependent performance metrics (latency, energy consumption). We then transform the problem of degrading system performance into the problem of increasing AdNNs' computational complexity (Eq. (3)). To address the second challenge, we apply a paradigm similar to Generative Adversarial Networks (GANs) to design DeepPerform. During training, DeepPerform learns and approximates the distribution of the samples that require more computational complexity. Once trained, DeepPerform generates test samples that activate more redundant computational units in AdNNs. In addition, because DeepPerform does not require backward propagation during the test sample generation phase, it generates test samples far more efficiently and is thus more scalable for comprehensive testing on large models and datasets.
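To illustrate why generation is cheap at test time, the following is a minimal sketch of this single-forward-pass workflow. It is an assumption for illustration only: the class name PerturbationGenerator, the two-layer architecture, and the 0.03 perturbation budget are hypothetical, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    """Hypothetical generator: maps a seed image to a bounded perturbation."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Once trained, producing a test sample is a single forward pass:
generator = PerturbationGenerator()           # assume trained weights are loaded
seed = torch.rand(1, 3, 32, 32)               # e.g., a CIFAR-10 seed image
with torch.no_grad():                         # no backward propagation needed
    delta = 0.03 * generator(seed)            # assumed perturbation budget
    test_input = (seed + delta).clamp(0, 1)   # keep pixels in a valid range
```

In contrast, gradient-based test generation must backpropagate through the target AdNN for every candidate input; the GAN-style design amortizes that cost into a one-time training phase.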
To evaluate DeepPerform, we select five widely used model-dataset pairs as experimental subjects and explore four perspectives: effectiveness, efficiency, coverage, and sensitivity. First, to evaluate the severity of the performance degradation caused by test samples generated by DeepPerform, we measure the increase in computational complexity (FLOPs) and resource consumption (latency, energy) caused by the generated inputs. For efficiency, we evaluate DeepPerform's online and total time overheads when generating samples at different scales for experimental subjects of different scales. For coverage, we measure the computational units covered by the test inputs DeepPerform generates. For sensitivity, we measure how DeepPerform's effectiveness depends on the AdNNs' configurations and the hardware platforms. The experimental results show that DeepPerform-generated inputs increase AdNNs' computational FLOPs by up to 552%, with a 6–10 millisecond overhead for generating one test sample. We summarize our contributions as follows:
• Approach. We propose a learning-based approach, namely DeepPerform, to learn the distribution from which to generate test samples for performance testing. Our novel design enables generating test samples more efficiently, thus enabling scalable performance testing. Our implementation is available at https://github.com/SeekingDream/DeepPerform.
• Evaluation. We evaluate DeepPerform on five AdNN models and three datasets. The evaluation results suggest that DeepPerform finds more severe and diverse performance bugs while covering more AdNN behaviors, with only 6–10 milliseconds of online overhead for generating each test input.
• Application. We demonstrate that developers can benefit from DeepPerform. Specifically, developers can use the test samples generated by DeepPerform to train a detector that filters out inputs requiring abnormally high computational resources (§6).
2 BACKGROUND
2.1 AdNNs’ Working Mechanisms
[Figure 1: Working mechanism of AdNNs. (a) Conditional-skipping AdNNs; (b) Early-termination AdNNs.]
The main objective of AdNNs [5, 12, 14, 24, 29, 35, 41, 44, 49, 52] is to balance performance and accuracy. As shown in Fig. 2, AdNNs allocate more computational resources to inputs with more complex semantics. AdNNs use intermediate outputs to deactivate specific components of the neural network, thus reducing computing resource consumption. According to their working mechanisms, AdNNs can be divided into two main types, conditional-skipping AdNNs and early-termination AdNNs, as shown in Fig. 1. Conditional-skipping AdNNs skip specific layers/blocks² (as in the case of ResNet) if the intermediate outputs produced by specified computing units match predefined criteria. The working mechanism of conditional-skipping AdNNs can be formulated as:

\[
\begin{cases}
In_{i+1} = Out_i, & \text{if } B_i(x) \geq \tau_i \\
Out_{i+1} = Out_i, & \text{otherwise}
\end{cases}
\tag{1}
\]
where $x$ is the input, $In_i$ represents the input of the $i$-th layer, $Out_i$ represents the output of the $i$-th layer, $B_i$ represents the output of the specified computing unit of the $i$-th layer, and $\tau_i$ is the configurable threshold that decides the AdNN's performance-accuracy trade-off mode. Early-termination AdNNs terminate computation early if the intermediate outputs satisfy a particular criterion. The working mechanism of early-termination AdNNs can be formulated as:

\[
\begin{cases}
Exit_{NN}(x) = Exit_i(x), & \text{if } B_i(x) \geq \tau_i \\
In_{i+1}(x) = Out_i(x), & \text{otherwise}
\end{cases}
\tag{2}
\]

² A block consists of multiple layers whose output is determined by adding the output of the last layer and the input to the block.
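To make the two mechanisms concrete, below is a minimal PyTorch-style sketch of an adaptive forward pass. It is an illustrative assumption rather than any specific AdNN implementation: it folds both policies into one loop (a real AdNN typically implements one or the other), realizes $B_i$ as gating and confidence scores, and shares a single threshold $\tau$ across blocks.

```python
import torch
import torch.nn as nn

def adnn_forward(x: torch.Tensor,
                 blocks: nn.ModuleList,
                 gates: nn.ModuleList,
                 exits: nn.ModuleList,
                 tau: float = 0.5) -> torch.Tensor:
    """Illustrative adaptive forward pass over a sequence of DNN blocks."""
    out = x
    logits = None
    for block, gate, exit_head in zip(blocks, gates, exits):
        score = torch.sigmoid(gate(out)).mean()   # B_i(x): gating score
        if score >= tau:                          # Eq. (1): run block i ...
            out = block(out)
        # ... otherwise skip block i; `out` flows through unchanged
        logits = exit_head(out)                   # Exit_i(x)
        if logits.softmax(dim=-1).max() >= tau:   # Eq. (2): confident enough,
            return logits                         # terminate early
    return logits                                 # final exit if none fired
```

Note how the amount of computation (which blocks run, and where the loop exits) depends entirely on the input, which is exactly what makes input-dependent performance bottlenecks possible.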
2.2 Redundant Computation
In a software program, if an operation is performed but not required, we term it a redundant operation. For Adaptive Neural Networks, if a component is activated without affecting the AdNN's final prediction, we define that computation as redundant. AdNNs are built on the philosophy that not all inputs require all DNN components for inference. For example, consider the images in Fig. 2. The left box shows the AdNNs' design philosophy: AdNNs consume more energy to detect images with greater semantic complexity. However, when the third image in the left box is perturbed with minimal perturbations and becomes the rightmost one, the AdNN's inference energy consumption increases significantly (from 30 J to 68 J). We refer to such additional computation as redundant computation or performance degradation.
2.3 Performance & Computational Complexity
In this section, we describe the relationship between hardware-dependent performance metrics and DNN computational complexity. Although many metrics can reflect DNN performance, we choose latency and energy consumption as hardware-dependent performance metrics because of their critical nature for real-time embedded systems [3, 49]. Measuring hardware-dependent performance metrics (e.g., latency, energy consumption) usually requires many repeated experiments, which is costly. Hence, existing work [12, 14, 29, 35, 41, 52] proposes to use floating-point operations (FLOPs) to represent DNN computational complexity. However, a recent study [43] demonstrates that simply lowering DNN computational complexity (FLOPs) does not always improve DNN runtime performance, because modern hardware platforms usually apply parallelism to handle DNN floating-point operations. Parallelism can accelerate computation within a layer, but the DNN layers themselves are computed sequentially. Thus, for two DNNs with the same total FLOPs, different FLOPs allocation strategies will result in different parallelism utilization and hence different model performance. For AdNNs, however, each layer/block usually has a similar structure and similar FLOPs [12, 14, 34, 52], so parallelism utilization is similar across blocks. Because parallelism cannot accelerate computation across blocks, increasing the number of activated blocks/layers degrades AdNNs' performance. To further understand the relation between AdNNs' FLOPs and AdNNs' model performance, we conduct a study in §3.
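As a concrete illustration of per-block FLOPs, the sketch below applies the standard convolution FLOPs estimate (each output element costs $C_{in} \times K^2$ multiply-accumulates, counted as two operations each); the block shapes are hypothetical ResNet-style values, not taken from the paper.

```python
def conv2d_flops(c_in: int, c_out: int, kernel: int,
                 h_out: int, w_out: int) -> int:
    """Standard FLOPs estimate for one convolution layer:
    2 * c_in * kernel^2 ops per output element."""
    return 2 * c_in * kernel * kernel * c_out * h_out * w_out

# Two identically shaped 3x3 convolutions per residual block (hypothetical
# ResNet-style shapes): every block then contributes the same FLOPs, so each
# additionally activated block adds a fixed amount of sequential computation.
per_block = 2 * conv2d_flops(64, 64, 3, 32, 32)
print(f"FLOPs per block: {per_block:,}")  # 150,994,944
```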
3 PRELIMINARY STUDY
3.1 Study Approach
Our intuition is to explore the worst-case computational complexity of an algorithm or model. For AdNNs, the basic computations are floating-point operations (FLOPs). Thus, we make the assumption that the FLOPs count of an AdNN is a hardware-independent metric that approximates AdNN performance.
[Figure 2: The left box shows that AdNNs allocate different computational resources to images of different semantic complexity; the right box shows that a perturbed image can trigger redundant computation and cause an energy surge.]
To validate this assumption, we conduct an empirical study. Specifically, we compute the Pearson product-moment correlation coefficient (PCC) [40] between AdNN FLOPs and both AdNN latency and energy consumption. The PCC is widely used in statistics to measure the linear correlation between two variables; it is a normalized covariance measure ranging from -1 to 1, where a higher value indicates that the two variables are more positively related. If the PCCs between FLOPs and system latency and between FLOPs and system energy consumption are both high, then our assumption is validated.
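A minimal sketch of this validation step, assuming per-input FLOPs, latency, and energy measurements have already been collected (the arrays below are hypothetical placeholders, not measured data):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-input measurements from the target device.
flops   = np.array([1.2e9, 0.8e9, 2.1e9, 1.5e9, 0.9e9])  # FLOPs
latency = np.array([11.0,   7.5,  19.8,  14.1,   8.3])   # milliseconds
energy  = np.array([32.0,  21.5,  60.2,  41.0,  24.8])   # joules

r_lat, _ = pearsonr(flops, latency)   # PCC between FLOPs and latency
r_eng, _ = pearsonr(flops, energy)    # PCC between FLOPs and energy
print(f"PCC(FLOPs, latency) = {r_lat:.3f}")
print(f"PCC(FLOPs, energy)  = {r_eng:.3f}")
```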
3.2 Study Model & Dataset
We select subjects (i.e., models and datasets) following the policies below:
• The selected subjects are publicly available.
• The selected subjects are widely used in existing work.
• The selected datasets and models should be diverse from different perspectives; e.g., the selected models should include both early-termination and conditional-skipping AdNNs.
We select five popular model-dataset combinations used for image classification tasks as our experimental subjects. The datasets and their corresponding models are listed in Table 1. We describe the selected datasets and corresponding models below.
Datasets. CIFAR-10 [25] is a database for object recognition with ten object classes; each image in CIFAR-10 is 32×32. CIFAR-10 contains 50,000 training images and 10,000 test images. CIFAR-100 [25] is similar to CIFAR-10 [25] but has 100 classes; it also contains 50,000 training images and 10,000 test images. SVHN [36] is a real-world image dataset obtained from house numbers in Google Street View images; it contains 73,257 training images and 26,032 test images.
Models. For the CIFAR-10 dataset, we use the SkipNet [52] and BlockDrop [53] models. SkipNet applies reinforcement learning to train DNNs to skip unnecessary blocks, and BlockDrop trains a policy network that activates partial blocks to save computation costs. We download the trained SkipNet and BlockDrop models from the authors' websites. For the CIFAR-100 dataset, we use the RaNet [56] and DeepShallow [24] models for evaluation. DeepShallow adaptively scales DNN depth, while RaNet scales both input resolution and DNN depth to balance accuracy and performance. For the SVHN dataset, DeepShallow [24] is used for evaluation. For the RaNet [56] and DeepShallow [24] architectures, the authors do not release trained model weights but open-source their training code; therefore, we follow the authors' instructions to train the model weights.
3.3 Study Process
We begin by evaluating each model’s computational complexity on
the original hold-out test dataset. After that, we deploy the AdNN