DinoDroid: Testing Android Apps Using Deep
Q-Networks
Yu Zhao, Member, IEEE, Brent Harrison, Member, IEEE, and Tingting Yu, Member, IEEE
Abstract—The large demand for mobile devices creates significant concerns about the quality of mobile applications (apps). Developers need to guarantee the quality of mobile apps before they are released to the market. There have been many approaches using different strategies to test the GUI of mobile apps. However, they still need improvement due to their limited effectiveness. In this paper, we propose DinoDroid, an approach based on deep Q-networks to automate testing of Android apps. DinoDroid learns a behavior model from a set of existing apps, and the learned model can be used to explore and generate tests for new apps. DinoDroid is able to capture the fine-grained details of GUI events (e.g., the content of GUI widgets) and use them as features that are fed into a deep neural network, which acts as the agent to guide app exploration. DinoDroid automatically adapts the learned model during the exploration without the need for any modeling strategies or pre-defined rules. We conduct experiments on 64 open-source Android apps. The results show that DinoDroid outperforms existing Android testing tools in terms of code coverage and bug detection.
Index Terms—Mobile Testing, Deep Q-Networks, Reinforcement Learning.
1 INTRODUCTION
Mobile applications (apps) have become extremely popular, with about three million apps in Google Play's app store [1]. The increase in app complexity has created significant concerns about the quality of apps. Also, because of the rapid release cycles of apps and limited human resources, it is difficult for developers to manually construct test cases. Therefore, different automated mobile app testing techniques have been developed and applied [2].
Test cases for mobile apps are often represented by sequences of GUI events^1 to mimic the interactions between users and apps. The goal of an automated test generator is to generate such event sequences to achieve high code coverage and/or detect bugs. A successful test generator is able to exercise the correct GUI widget on the current app page, so that exercising that widget brings the app to a new page, leading to the exploration of new events. However, existing mobile app testing tools often explore a limited set of events because they have limited capability of understanding which GUI events would expand the exploration the way humans do. This can lead to automated test generators performing unnecessary actions that are unlikely to lead to new coverage or detect new bugs.
Yu Zhao is with the Department of Computer Science and Cybersecurity, University of Central Missouri, Warrensburg, MO, 64093. E-mail: yzhao@ucmo.edu
Brent Harrison is with the Department of Computer Science, University of Kentucky, Lexington, KY, 40506. E-mail: harrison@cs.uky.edu
Tingting Yu (corresponding author) is with the Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH, 45221. E-mail: yutt@ucmail.uc.edu
1. In our setting, an event refers to an executable GUI widget associated with an action type (e.g., click, scroll, edit, swipe, etc.).
Many automated GUI testing approaches for mobile apps have been proposed, such as random testing [3], [4] and model-based testing [5], [6], [7]. Random testing (e.g., Monkey [3]) is popular in testing mobile apps because of its simplicity and availability. It generates tests by sending thousands of GUI events per second to the app. While random testing can sometimes be effective, it is difficult for it to explore hard-to-reach events that drive the app to new pages because of the nature of randomness. Model-based testing [7], [8] can improve code coverage by employing pre-defined strategies or rules to guide the app exploration. For example, A3E [7] employs depth-first search (DFS) to explore the model of an app under test (AUT) based on the event flow across app pages. Stoat [8] utilizes a stochastic finite state machine model to describe the behavior of the AUT and then utilizes Markov Chain Monte Carlo sampling [9] to guide the testing. However, model-based testing often relies on human-designed models, and it is almost impossible to precisely model an app's behavior. Also, many techniques apply pre-defined rules to the model to improve testing. For example, Stoat [8] designed rules that assign each event an execution weight in order to speed up exploration. However, these pre-defined rules are often derived from limited observations and may not generalize to a wide range of app categories.
To summarize, the inherent limitation of the above tech-
niques is that they do not automatically understand GUI
layout and the content of the GUI elements, so it is difficult
for them to exercise the most effective events that will bring
the app into new states. Recently, machine learning techniques have been proposed to perform GUI testing of mobile apps [10], [11], [12], [13]. For example, Humanoid [12] uses deep learning to learn from human-generated interaction traces and uses the learned model to guide test generation like a human tester would. However, this approach relies on human-
generated datasets (i.e., interaction traces) to train a model
and needs to combine the model with a set of pre-defined
rules to guide testing.
Reinforcement learning (RL) can teach a machine to decide
which events to explore rather than relying on pre-defined models or human-made strategies [14]. A Q-table is used to record the reward of each event and the knowledge gained from previous testing. The reward function can be defined based on the differences between pages [11] or unique activities [10]. A reinforcement learner learns to maximize the cumulative reward, with the goal of achieving higher code coverage or detecting more bugs.
While existing RL techniques have improved app testing, they focus on abstracting the information of app pages and then using the abstracted features to train behavior models for testing [15], [16]. For example, QBE [15], a Q-learning-based Android app testing tool, abstracts each app page into five categories based on the number of widgets (i.e., too-few, few, moderate, many, too-many). The five categories are used to decide which events to explore. However, existing RL techniques do not understand the fine-grained information of app pages the way human testers normally do during testing, such as the execution frequencies and the content of GUI widgets. This is due to the limitation of the basic tabular setting of RL, which requires a finite number of states [17]. Therefore, the learned model may not accurately capture the behaviors of the app. Also, many RL-based techniques train on each app independently [18], [16], [19], [20], [11] and thus cannot transfer the model learned from one app to another.
To address the aforementioned challenges, we propose a novel approach, DinoDroid, based on deep Q-networks (DQN). DinoDroid learns a behavior model from a set of existing apps, and the learned model can be used to explore and generate tests for new apps. During the training process, DinoDroid is able to understand and learn the details of app events by leveraging a deep neural network (DNN) model [21]. More precisely, we have developed a set of features taken as input by DinoDroid. These features capture what a human tester would consider during exploration. For example, a human tester may decide which widget to execute based on its content or how many times it has been executed in the past. DinoDroid does not use any pre-defined rules or thresholds to tune the parameters of these features, but lets the DQN agent learn a behavior model based on the feature values (represented by vectors) automatically obtained during the training and testing phases.
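As a rough illustration (this is a sketch, not DinoDroid's actual implementation; the field names, embedding size, and helper function are assumptions), the following Python snippet shows how such per-event features might be encoded as a single numeric vector for a neural network:

# A minimal sketch of per-event feature encoding (illustrative only; the
# field names, embedding size, and helper are assumptions, not DinoDroid's code).
from dataclasses import dataclass
from typing import List

EMBED_DIM = 8  # assumed size of a text embedding for widget content

@dataclass
class EventFeatures:
    exec_frequency: int          # how many times this event was executed
    unexecuted_children: int     # unexecuted events on the child page
    text_embedding: List[float]  # embedding of the widget's text content

def embed_text(text: str) -> List[float]:
    # Placeholder for a real text encoder (e.g., a learned word embedding).
    # Here we just hash characters into a fixed-size vector for illustration.
    vec = [0.0] * EMBED_DIM
    for i, ch in enumerate(text.lower()):
        vec[i % EMBED_DIM] += ord(ch) / 1000.0
    return vec

def to_vector(f: EventFeatures) -> List[float]:
    # Concatenate the numeric features and the text embedding into one vector
    # that a deep neural network can take as input.
    return [float(f.exec_frequency), float(f.unexecuted_children)] + f.text_embedding

ok_button = EventFeatures(exec_frequency=0, unexecuted_children=10,
                          text_embedding=embed_text("OK"))
print(to_vector(ok_button))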
A key novel component of DinoDroid is a deep neural network (DNN) model that can process multiple complex features to predict a Q value for each GUI event to guide Q-learning. With the DNN, DinoDroid can be easily extended to handle other types of features. Specifically, to test an app, DinoDroid first trains on a set of existing apps to learn a behavior model. The DNN serves as an agent that computes the Q values used to determine the action (i.e., which event to execute) at each iteration. In the meantime, DinoDroid maintains a special event flow graph (EFG) to record and update the feature vectors, which the DNN uses to compute Q values.
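Continuing the sketch above (again with hypothetical names; DinoDroid's real EFG and agent interface are more involved), the snippet below illustrates how feature vectors stored in an event flow graph could be scored by a network to pick the next event:

# A hypothetical selection loop: store per-event feature vectors in an event
# flow graph (EFG), score each candidate event, and execute the highest-scoring
# one. The class and method names are illustrative, not DinoDroid's actual API.
class EventFlowGraph:
    def __init__(self):
        self.features = {}   # event id -> feature vector
        self.edges = {}      # event id -> id of the page it leads to

    def update(self, event_id, feature_vector, child_page=None):
        self.features[event_id] = feature_vector
        if child_page is not None:
            self.edges[event_id] = child_page

class StubDNN:
    # Stand-in for the real deep network: score = unexecuted children minus
    # execution frequency (purely illustrative scoring rule).
    def predict_q(self, feature_vector):
        exec_frequency, unexecuted_children = feature_vector[0], feature_vector[1]
        return unexecuted_children - exec_frequency

def choose_event(efg, candidate_events, dnn):
    # Select the event on the current page with the largest predicted Q value.
    scores = {e: dnn.predict_q(efg.features[e]) for e in candidate_events}
    return max(scores, key=scores.get)

efg = EventFlowGraph()
efg.update("OK", [0, 10])
efg.update("Cancel", [0, 3])
print(choose_event(efg, ["OK", "Cancel"], StubDNN()))  # -> "OK"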
Because the features are often shared among different
apps, DinoDroid is able to apply the model learned from ex-
isting apps to new AUTs. To do this, the agent continuously
adapts the existing model to the new AUT by generating
new actions to reach the desired testing goal (e.g., code
coverage).
Fig. 1: A Motivating Example
In summary, our paper makes the following contributions:
• An approach to testing Android apps based on deep Q-learning.
• A novel deep Q-learning model, the first that can process complex features at a fine-grained level.
• An empirical study showing that the approach achieves higher code coverage and better bug detection than state-of-the-art tools.
• The implementation of the approach as a publicly available tool, DinoDroid, along with all experiment data [22].
2 MOTIVATION AND BACKGROUND
In this section, we first describe a motivating example of DinoDroid, followed by the background on deep Q-networks (DQN), the problem formulation, and a discussion of existing work.
2.1 A Motivating Example.
Fig. 1 shows an example of the app lockpatterngenerator [23].
This simple example demonstrates the ideas of DinoDroid,
but the real testing process is much more complex. After
clicking “Minimum length”, a message box pops up with
a textfield and two clickable buttons. Therefore, the current
page of the app has a total of five events (i.e., “restart”,
“back”, “menu”, “OK”, and “Cancel”). The home button is
not considered because it is not specific to the app. When a
human tester encounters this page, he/she needs to decide
which event to execute based on his/her prior experience.
For example, a tester is likely to execute the events that have
never been executed before. The tester may also need to
know the execution context of the current page (e.g., the
layout of the next page) to decide which widget to exercise.
In this example, suppose none of the five events on
the current page have been executed before. Intuitively, the
tester tends to select the “OK” event to execute because it
is more likely to bring the app to a new page. “Cancel”
is likely to be the next event to consider because
“restart”, “back”, and “menu” are general events, so the
tester may have already had experience in executing them
when testing other apps. In summary, to decide whether an
event has a higher priority to be executed, the tester may
need to consider its “features”, such as how many times
it was executed (i.e., execution frequency) and the content
of the widget. DinoDroid is able to automatically learn a
behavior model from a set of existing apps based on these
features and the learned model can be used to test new apps.
Tab. (a)-Tab. (c) in Fig. 1 are used to illustrate DinoDroid. In this example, DinoDroid dynamically records the feature values of each event, including its execution frequency, the number of events not yet executed on the next page (i.e., the child page), and the text content associated with the event. Next, DinoDroid employs a deep neural network to predict the cumulative reward (i.e., Q value) of each event on the current page based on the aforementioned features and selects the event with the largest Q value to execute.
Tab. (a) shows the feature values and Q values when the page appears for the first time. Since "OK" has the largest Q value, it is selected for execution. DinoDroid continues exploring the events on the new page and updating the Q values. When this page appears for the second time, the Q value of the "OK" event decreases because it has already been executed. As a result, "Cancel" now has the largest Q value and is selected for execution. In this case (Tab. (b)), the child page of "OK" contains 10 unexecuted events. However, if the child page contained zero unexecuted events (Tab. (c)), the Q value would be much smaller. This is because DinoDroid tends to select the event whose child page contains more unexecuted events.
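As a toy illustration of this walkthrough (the Q values below are invented for exposition and are not the numbers in Fig. 1's tables), the selection step can be viewed as an argmax over predicted Q values:

# Toy illustration of the selection step in the motivating example.
q_values = {"OK": 4.2, "Cancel": 3.1, "restart": 1.0, "back": 0.8, "menu": 0.7}

# First visit: "OK" has the largest Q value and is executed.
first_choice = max(q_values, key=q_values.get)
assert first_choice == "OK"

# After executing "OK", its execution frequency increases and the agent
# lowers its predicted Q value, so "Cancel" is chosen on the next visit.
q_values["OK"] = 1.5
second_choice = max(q_values, key=q_values.get)
assert second_choice == "Cancel"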
The underlying assumption of our approach is that the
application under test should follow the principle of least
surprise (PLS). If an app does not meet the PLS, e.g., an "OK" textual widget is incorrectly associated with the functionality of "Cancel", this would mislead DinoDroid in finding the right events to execute. Specifically, DinoDroid exploits the learned knowledge to execute the correct events that result in higher code coverage or trigger bugs.
2.2 Background
2.2.1 Q-Learning
Q-learning [24] is a model-free reinforcement learning method that seeks to learn a behavior policy for any finite Markov decision process (FMDP). Q-learning finds an optimal policy, π, that maximizes the expected cumulative reward over a sequence of actions. Q-learning is based on trial-and-error learning, in which an agent interacts with its environment and assigns utility estimates, known as Q values, to each state-action pair.
Fig. 2: Deep Q-Networks

As shown in Fig. 2, the agent iteratively interacts with the outside environment. At each iteration $t$, the agent selects an action $a_t \in A$ based on the current state $s_t \in S$ and executes it on the outside environment. After exercising the action, there is a new state $s_{t+1} \in S$, which can be observed by the agent. In the meantime, an immediate reward $r_t \in R$ is received. Then the agent will update the Q values using the Bellman equation [25] as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big( r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \big)$$

In this equation, $\alpha$ is a learning rate between 0 and 1, $\gamma$ is a discount factor between 0 and 1, $s_t$ is the state at time $t$, and $a_t$ is the action taken at time $t$. Once learned, these Q values can be used to determine optimal behavior in each state by selecting the action $a_t = \arg\max_a Q(s_t, a)$.
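For illustration, a minimal tabular Q-learning sketch of this update rule might look as follows (generic reinforcement-learning code in Python, not DinoDroid's implementation; the epsilon-greedy selection is a common companion strategy, not mandated by the equation):

# Minimal tabular Q-learning update (generic sketch).
from collections import defaultdict
import random

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration rate

q_table = defaultdict(float)  # maps (state, action) -> Q value

def select_action(state, actions):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state, next_actions):
    # Bellman update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    td_error = reward + GAMMA * best_next - q_table[(state, action)]
    q_table[(state, action)] += ALPHA * td_error

update(state="page_1", action="OK", reward=5, next_state="page_2",
       next_actions=["Cancel", "back"])
print(q_table[("page_1", "OK")])  # 0.5 after one update (0.1 * 5)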
2.2.2 Deep Q-Networks
Deep Q-networks (DQN) are used to scale classic Q-learning to more complex state and action spaces [26], [27]. In classical Q-learning, the values $Q(s_t, a_t)$ are stored in and looked up from a Q-table, which can only handle fully observed, low-dimensional state and action spaces. As shown in Fig. 2, in DQN, a deep neural network (DNN), specifically one involving convolutional neural networks (CNN) [28], is a multi-layered neural network that, for a given state $s_t$, outputs a Q value $Q(s_t, a)$ for each action. Because a neural network can take high-dimensional states and actions as input and output, DQN is able to scale to more complex state and action spaces. A neural network can also generalize Q values to unseen states, which is not possible when using a Q-table. DQN utilizes the temporal difference (TD) [29] error as a loss function [27] to update the network: $loss = r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)$, where $\gamma$ is the discount factor between 0 and 1. In other words, with the input $(s_t, a_t)$, the neural network is trained to predict the Q value as:

$$Q(s_t, a_t) = r_t + \gamma \max_a Q(s_{t+1}, a) \quad (1)$$

So in a training sample, the input is $(s_t, a_t)$ and the output is the corresponding Q value, which can be computed as $r_t + \gamma \max_a Q(s_{t+1}, a)$.
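To make Eq. (1) concrete, the following is a minimal PyTorch-style sketch of one DQN training step (the network architecture, feature dimension, and batch shapes are assumptions for illustration, not DinoDroid's actual model):

# One DQN training step (generic sketch): regress Q(s_t, a_t) toward the
# TD target r_t + gamma * max_a Q(s_{t+1}, a).
import torch
import torch.nn as nn

GAMMA = 0.9

# Assumed shapes: a small fully connected network over 10-dimensional
# state features with 5 possible actions.
q_net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 5))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_step(states, actions, rewards, next_states):
    # Q values predicted for the actions actually taken.
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # TD target from Eq. (1); no gradient flows through the target.
    with torch.no_grad():
        q_target = rewards + GAMMA * q_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

states = torch.randn(4, 10)
actions = torch.randint(0, 5, (4,))
rewards = torch.randn(4)
next_states = torch.randn(4, 10)
print(train_step(states, actions, rewards, next_states))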
2.3 Terminologies
A GUI widget is a graphical element of an app, such as a button, a text field, or a check box. An event is an executable GUI widget with a particular event type (e.g., click, long-click, swipe, edit), so a widget can be associated with one or more events. In our setting, a state $s$ represents an app page (i.e., the set of widgets shown on the current screen; if the set of widgets differs, it is a different page). We use $s_t$ to represent the current state and $s_{t+1}$ to represent the next state. A reward $r$ is calculated based on the improvement of coverage: if code coverage increases, $r$ is assigned a positive number ($r = 5$ by default); otherwise, $r$ is assigned a negative number ($r = -2$ by default). An Agent