Putting Them under Microscope: A Fine-Grained Approach for
Detecting Redundant Test Cases in Natural Language
Zhiyuan Chang
Mingyang Li
{zhiyuan2019,mingyang2017}@iscas.ac.cn
Laboratory for Internet Software Technologies,
Institute of Software Chinese Academy of Sciences
Beijing, China
University of Chinese Academy of Sciences, Beijing, China
Junjie Wang
junjie@iscas.ac.cn
Laboratory for Internet Software Technologies,
Institute of Software Chinese Academy of Sciences
Beijing, China
University of Chinese Academy of Sciences, Beijing, China
Qing Wang
wq@iscas.ac.cn
Laboratory for Internet Software Technologies,
Institute of Software Chinese Academy of Sciences
Beijing, China
State Key Laboratory of Computer Science,
Institute of Software Chinese Academy of Sciences
Beijing, China
University of Chinese Academy of Sciences, Beijing, China
Shoubin Li
shoubin@iscas.ac.cn
Laboratory for Internet Software Technologies,
Institute of Software Chinese Academy of Sciences
Beijing, China
University of Chinese Academy of Sciences, Beijing, China
ABSTRACT
Natural language (NL) documentation is the bridge between software
managers and testers, and NL test cases are prevalent in system-level
testing and other quality assurance activities. Due to reasons such as
requirements redundancy, parallel testing, and tester turnover within a
long evolution history, there are inevitably many redundant test cases,
which significantly increase the cost. Previous redundancy detection
approaches typically treat the textual descriptions as a whole to compare
their similarity and suffer from low precision. Our observation reveals
that a test case can have explicit test-oriented entities, such as tested
function Components, Constraints, etc.; and there are also specific
relations between these entities. This inspires us with a potential
opportunity for accurate redundancy detection. In this paper, we first
define five test-oriented entity categories and four associated relation
categories and re-formulate the NL test case redundancy detection problem
as the comparison of detailed testing content guided by the test-oriented
entities and relations. Following that, we propose Tscope, a fine-grained
approach for redundant NL test case detection by dissecting test cases
into atomic test tuple(s) with the entities restricted by associated
relations. To support the test case dissection, Tscope designs a
context-aware model for the automatic entity and relation extraction.
Evaluation on 3,467 test cases from ten projects shows Tscope could
achieve 91.8% precision, 74.8% recall, and 82.4% F1, significantly
outperforming state-of-the-art approaches and commonly-used classifiers.
This new formulation of the NL test case redundancy detection problem can
motivate the follow-up studies to further improve this task and other
related tasks involving NL descriptions.

Zhiyuan Chang and Mingyang Li contributed equally to this research.
Corresponding authors.

Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
third-party components of this work must be honored. For all other uses,
contact the owner/author(s).
ESEC/FSE ’22, November 14–18, 2022, Singapore, Singapore
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9413-0/22/11.
https://doi.org/10.1145/3540250.3549089
CCS CONCEPTS
Software and its engineering Software testing and de-
bugging;Acceptance testing.
KEYWORDS
Test Case Redundancy, Entity and Relation Extraction, Natural
Language Processing
ACM Reference Format:
Zhiyuan Chang, Mingyang Li, Junjie Wang, Qing Wang, and Shoubin Li. 2022.
Putting Them under Microscope: A Fine-Grained Approach for Detecting
Redundant Test Cases in Natural Language. In Proceedings of the 30th ACM
Joint European Software Engineering Conference and Symposium on the
Foundations of Software Engineering (ESEC/FSE ’22), November 14–18, 2022,
Singapore, Singapore. ACM, New York, NY, USA, 12 pages.
https://doi.org/10.1145/3540250.3549089
1 INTRODUCTION
Software testing is an activity to ensure that an entire system meets its
requirements [5]. In the testing phase, testers need to analyze the
requirements specification, identify all the test execution scenarios, and
then instantiate them in manually written test cases [54]. Such test cases
are typically described in natural language (NL). Due to their
adjustability and interpretability, the NL test cases are still prevalent
in industrial practice [32].
A requirement covers multiple features, and there may be overlapping
features among requirements. For a large software project, the
requirements are typically tested by different engineers, who may not be
aware of the overlapping features. Test redundancy may arise when each
test engineer individually designs test case(s) for the assigned
requirements [14, 36]. As the system evolves, the redundant test cases
significantly increase the cost of testing, as well as the maintenance
effort [36]. The problem is especially obvious in the manual testing
scenario, where human testers must read through test steps and carry them
out manually by interacting with the system [22].

arXiv:2210.01661v1 [cs.SE] 4 Oct 2022
To alleviate the issue, information retrieval-based approaches have been
proposed to automatically detect redundancy among the NL test cases
[32, 49, 53]. The general idea is to vectorize the description of the test
case with text representation models, e.g., the vector space model or
Doc2Vec, and conduct the similarity comparison on the resulting vectors.
However, these existing approaches suffer from low accuracy because they
treat the test cases’ textual descriptions as a whole, and thus cannot
capture their fine-grained semantic information and inherent meaning.
Meanwhile, we have the following two observations, which can facilitate
the similarity comparison and redundancy detection of NL test cases.
First, the test case has explicit categories of test-oriented entities
which can facilitate accurate redundancy detection. Taking Figure 1 as an
example, the two test cases look similar in their textual descriptions and
would be detected as redundant by the aforementioned information
retrieval-based approaches. However, if we put these two test cases under
the microscope, we find that their executing manners (“mesa-util tool” and
“UnixBench tool”) are different, based on which we can distinguish them
accurately. Moreover, one can easily observe that there are different
categories of test-oriented entities; for example, “gear rotation
processing” is the tested functional component, while “when drawing 3D
graphics” is the pre-condition for executing the test case. Only when the
specific categories of test-oriented entities are mapped can the two test
cases be determined as redundant. In this sense, this paper aims at
identifying the test-oriented entities to facilitate the accurate
redundancy detection of NL test cases.
Figure 1: Non-redundant test cases with similar descriptions
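
To make the weakness of whole-text comparison concrete, the following is a minimal sketch of a bag-of-words cosine similarity, a crude stand-in for the vector space model mentioned above. The two descriptions are hypothetical sentences assembled from the phrases quoted in the text (the wording in Figure 1 may differ), and the 0.8 threshold is an illustrative assumption rather than a value taken from any of the cited approaches.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word tokens (a crude vector space model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical descriptions built from the phrases quoted in the text.
case_a = "When drawing 3D graphics, test gear rotation processing with the mesa-util tool"
case_b = "When drawing 3D graphics, test gear rotation processing with the UnixBench tool"

similarity = cosine_similarity(bag_of_words(case_a), bag_of_words(case_b))
is_flagged = similarity >= 0.8  # illustrative threshold, not from the paper
```

Under these assumptions the two descriptions share all but the tool name, so the pair is flagged as redundant even though the executing manner differs, which is the failure mode the first observation describes.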
Second, there might be multiple test-oriented entities that need to be
carefully parsed and matched to ensure accurate redundancy detection. The
first observation has motivated us to conduct the comparison within the
same category of test-oriented entities for determining redundancy.
However, when we put the two test cases in Figure 2 under the microscope,
a second observation is made. Both test cases contain the testing Behavior
“browse” and the tested Component “visit history”, yet they express
different test-oriented operational information. In detail, in test case
#346, the Behavior “browse” targets the Component “content of each
resource directory”, and the Component “visit history” is associated with
the Behavior “switch”, while in test case #525 the Behavior “browse”
directly targets the Component “visit history”. The observation implies
that the multiple test-oriented entities need to be carefully parsed and
matched, and that it is necessary to identify the test-oriented
operational information, i.e., entities and associated relations, when
analyzing test cases to achieve accurate redundancy detection.
Figure 2: Non-redundant test cases with multiple test-oriented entities
Motivated by the two findings, we define five test-oriented entity
categories, i.e., Component, Behavior, Prerequisite, Manner and
Constraint, and four relation categories associated with the entities. We
then re-formulate the NL test case redundancy detection problem as the
comparison of detailed testing content guided by the test-oriented
entities and relations.
Following that, we propose a fine-grained redundant test case detection
approach, Tscope¹, which dissects the test case into atomic test tuple(s)
with the five entities restricted by their associated relations, and
conducts the comparison on them. One example test tuple dissected from
test case #525 in Figure 2 is as follows: Behavior “browse”, Component
“visit history” and Manner “mouse”. To achieve this, Tscope first designs
a context-aware model for extracting test-oriented entities and relations
from test case descriptions, which considers the global context of the
test case for entity extraction, and the local context of the involved
entities for relation extraction. After that, Tscope dissects each test
case into the structured atomic test tuple(s) guided by the extracted
entities and relations. Finally, Tscope detects redundancy by comparing
the entities in each tuple pair, considering the semantic meaning of the
entities as well as their involved indicative words.
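
The tuple-based comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the Tscope implementation: the `TestTuple` class, the exact-match comparison after lowercasing, and the rule that two test cases are redundant when their tuple sets coincide are all hypothetical simplifications (the paper compares entities semantically and also weighs indicative words). The tuples for test case #346 only contain the entities named in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# The five entity categories named in the text, used as tuple slots.
FIELDS = ("component", "behavior", "prerequisite", "manner", "constraint")


@dataclass(frozen=True)
class TestTuple:
    """One atomic test tuple: a Component plus the entities related to it."""
    component: str
    behavior: Optional[str] = None
    prerequisite: Optional[str] = None
    manner: Optional[str] = None
    constraint: Optional[str] = None


def _norm(value: Optional[str]) -> Optional[str]:
    """Normalize an entity string; missing slots stay None."""
    return value.strip().lower() if value is not None else None


def normalize(t: TestTuple) -> TestTuple:
    """Return a copy of the tuple with every slot normalized."""
    return TestTuple(*(_norm(getattr(t, name)) for name in FIELDS))


def cases_redundant(a: Iterable[TestTuple], b: Iterable[TestTuple]) -> bool:
    """Assumed rule: two test cases are redundant if their tuple sets match."""
    return {normalize(t) for t in a} == {normalize(t) for t in b}


# Tuples assembled from the entities quoted in the text for Figure 2.
case_346 = [
    TestTuple(component="content of each resource directory", behavior="browse"),
    TestTuple(component="visit history", behavior="switch"),
]
case_525 = [TestTuple(component="visit history", behavior="browse", manner="mouse")]

different = not cases_redundant(case_346, case_525)
```

Although both test cases mention “browse” and “visit history”, the slots are bound to different Components, so the tuple sets differ and the pair is not reported as redundant.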
We evaluate Tscope on 3,467 test cases from ten projects. The evaluation
results show that Tscope could reach 97.5% precision and 94.8% recall for
the entity extraction, and 90.4% precision and 97.6% recall for the
relation extraction, which significantly outperforms two state-of-the-art
approaches. For the redundancy detection task, Tscope could achieve 91.8%
precision, 74.8% recall and 82.4% F1. Compared with the two
state-of-the-art redundancy detection approaches and four commonly-used
classifiers, Tscope is 19.8%-23.4% higher in F1. Moreover, the results of
ablation experiments show that the five entity categories all play
significant roles in Tscope.
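
As a quick consistency check on the reported redundancy detection figures, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the stated precision and recall; the function name is ours.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported redundancy detection results: 91.8% precision, 74.8% recall.
tscope_f1 = f1_score(0.918, 0.748)
```

Rounded to three decimals, the result agrees with the reported 82.4% F1.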
The new formulation of the NL test case redundancy detection problem can
motivate the follow-up studies to further improve this task and other
related tasks involving NL descriptions. In fact, several tasks in the
software engineering domain involve the similarity comparison of two
textual documents, e.g., duplicate test report detection [24, 25], similar
Stack Overflow question identification [55], duplicate requirement
detection [38], etc. The previous techniques typically treat the textual
descriptions as a whole for the similarity comparison, while ignoring the
fine-grained semantic information hidden in the text. The new formulation
proposed in this paper, i.e., comparison of detailed content guided by the
scenario-related entities and relations, could potentially motivate the
researchers in these related fields.

¹ We name our approach Tscope because it works like a microscope,
inspecting the detailed information in test cases to facilitate
redundancy detection.
In summary, the key contributions of this paper are as follows:
• The new formulation of the NL test case redundancy detection problem,
  i.e., the comparison of detailed testing content guided by the
  test-oriented entities and relations.
• A fine-grained redundancy detection approach, Tscope, for NL test cases,
  which dissects the test case into atomic test tuple(s) with the five
  entities restricted by their associated relations, and conducts the
  comparison on them.
• A context-aware model for extracting test-oriented entities and their
  relations from test case descriptions, which involves the global context
  of the test case in entity extraction, and the local context of the
  involved entities for relation extraction.
• Evaluation with 3,467 test cases from ten projects, with promising
  results. We also publicize the source code² for facilitating follow-up
  studies and other related tasks.
The remainder of the paper is organized as follows: Section 2 presents the
empirical studies of the entity categories for redundancy detection.
Section 3 elaborates the approach. Section 4 presents the experiment
design. Section 5 describes the results. Section 6 discusses the lessons
learned. Section 7 introduces the related work and its limitations.
Section 8 concludes our work.
2 EMPIRICAL ANALYSIS OF ENTITIES AND RELATIONS
2.1 Categories of Entities and Relations
Motivated by the observations in Section 1, we provide a new formulation
of the NL test case redundancy detection problem, i.e., the comparison of
detailed testing content guided by the test-oriented entities and
relations. To achieve this, we define five categories of entities and four
categories of relations associated with the entities.
We explore the entity and relation categories through a bottom-up
analysis. Specifically, three researchers (details in Section 4.2) are
involved in mining the categories of entities and relations that affect
redundancy detection in the test case text. If all three researchers agree
on adding a category, this entity category is admitted and added to the
entity category set. If their views diverge, the decision is made through
a voting mechanism, i.e., the entity category is added to the set if it is
admitted by at least two researchers. Finally, we obtain the five entity
categories and the corresponding relations among the entity categories.
Table 1 shows each entity/relation category and examples.
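
The admission rule described above reduces to a two-out-of-three majority vote, since unanimous agreement also satisfies the threshold. A minimal sketch, with a hypothetical function name and the threshold exposed as a parameter:

```python
from typing import Sequence


def admit_category(votes: Sequence[bool], min_agree: int = 2) -> bool:
    """Return True when at least `min_agree` researchers admit the category."""
    return sum(bool(v) for v in votes) >= min_agree


unanimous = admit_category([True, True, True])    # all three agree
majority = admit_category([True, False, True])    # views diverge, two admit
rejected = admit_category([False, True, False])   # only one admits
```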
2.1.1 Categories of Entities. The definition of the five entity categories
is based on the purpose and basics of software testing, as well as the
observations on NL test cases. First, test cases are driven by the
feature(s) in requirements, and a feature specifies the behavior of one or
more components in terms of their current conditions [9]. In this sense,
the key entities in a feature will also be reflected in the test case
descriptions. Therefore, we identify three entity categories, “Component”,
“Behavior” and “Prerequisite”, respectively.

² https://github.com/czycurefun/testcase_detection
Second, according to our observations, test cases sometimes differ by the
Manner. For example, there are descriptions of two non-redundant test
cases in Figure 1. The two test cases have the same Prerequisite (“When
drawing 3D graphics”) and Component (“gear rotation processing”), but
different operation manners (“mesa-util tool” and “UnixBench tool”). To
reflect this difference, we define an entity category “Manner”.
Third, in some cases, test cases may differ by the constraints to be
satisfied. For example, consider the two descriptions “Test there are
preset applications after the system installation” and “Test the preset
applications including FTP application after the system installation”. The
two test cases have the same Component (“preset applications”), but the
latter additionally involves the constraint (“including FTP application”).
Accordingly, we define an entity category “Constraint” to indicate the
difference.
2.1.2 Categories of Relations. As shown in Figure 2, there may be multiple
test-oriented entities per entity category within a test case, which
implies the need for inspecting the entities within the test case a step
further. Taking test case #346 in Figure 2 as an example, the Behavior
“browse” targets the Component “contents of each resource directory”, and
the Behavior “switch” acts on the Component “visit history”. This
demonstrates the mapping between Components and Behaviors, and we define
it as the Act relation.
We also observe relations involving the other three categories of
entities, e.g., the executing manner of the testing. Considering that the
components in the test case are the basic objects of the testing content,
we define three other relations between Component and Prerequisite,
Manner, and Constraint, respectively, to indicate the detailed information
of the testing (details in Table 1).
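
Because every relation links a Component to an entity of another category, extracted entities and relations can be grouped per Component. The sketch below illustrates this grouping for test case #346 under simplifying assumptions: the index-based entity list, the `dissect` helper, and the use of the related entity's category as the slot name are hypothetical choices for illustration, and only the Act relation is named in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Entities as (text, category); indices are used to express relations.
entities: List[Tuple[str, str]] = [
    ("browse", "Behavior"),
    ("contents of each resource directory", "Component"),
    ("switch", "Behavior"),
    ("visit history", "Component"),
]
# Each relation is (component_index, related_entity_index).
relations: List[Tuple[int, int]] = [(1, 0), (3, 2)]


def dissect(
    entities: List[Tuple[str, str]], relations: List[Tuple[int, int]]
) -> Dict[str, Dict[str, List[str]]]:
    """Group related entities under their Component, keyed by entity category."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for comp_idx, other_idx in relations:
        comp_text, comp_cat = entities[comp_idx]
        if comp_cat != "Component":
            raise ValueError("the first element of a relation must be a Component")
        other_text, other_cat = entities[other_idx]
        grouped[comp_text][other_cat].append(other_text)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {comp: dict(slots) for comp, slots in grouped.items()}


tuples_346 = dissect(entities, relations)
```

In this grouping, “browse” is attached only to the resource directory contents and “switch” only to “visit history”, which preserves the Behavior-to-Component binding that a flat list of entities would lose.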
2.2 Correlation Analysis
We conduct an empirical study to investigate the effectiveness of the
entity categories for redundancy detection. Specifically, we randomly
sample 5,000 test case pairs and manually label each pair for redundancy
by comparing its two test cases.³ Then, we build five Boolean variables by
manual judgment, i.e., $EQ_{com}$, $EQ_{beh}$, $EQ_{pre}$, $EQ_{man}$ and
$EQ_{con}$. Each variable indicates whether the entities belonging to the
corresponding category in the two test case descriptions are manually
judged as equivalent. At the same time, a variable $Redundant$ is built
according to the redundancy label (not based on entity comparison),
representing whether a test case pair is truly redundant.
We analyze the correlation between the above five variables and the
variable $Redundant$. Table 2 shows the Pearson correlation coefficient
and p-value of the correlation test. The results show that the five entity
categories are significantly correlated with the variable $Redundant$,
which indicates the effectiveness of each entity category for redundancy
detection. Moreover, we analyze the consistency of the two variables
$EQ_{all}$ and $Redundant$, where $EQ_{all}$ represents that the entities
belonging to the five entity categories in the test case pair are all
equivalent by manual comparison. The Cohen’s kappa coefficient is 0.984,
which shows the significant consistency of the two distributions. The
results indicate that redundant

³ The test case pairs are built from the dataset in Table 4. The pairing
and labeling processes are consistent with the descriptions in Section
4.2.
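
The two statistics used in this analysis can be computed directly from Boolean labels. The sketch below is a plain-Python illustration on toy data that we made up for the example (it is not the labeled data from the study), and it assumes neither variable is constant, since both statistics are undefined in that case.

```python
import math
from typing import Sequence


def pearson(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cohen_kappa(x: Sequence[int], y: Sequence[int]) -> float:
    """Cohen's kappa for two binary (0/1) label sequences."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    p_x, p_y = sum(x) / n, sum(y) / n
    expected = p_x * p_y + (1 - p_x) * (1 - p_y)  # chance agreement
    return (observed - expected) / (1 - expected)


# Toy labels for eight hypothetical test case pairs (1 = true, 0 = false).
eq_all = [1, 1, 0, 0, 1, 0, 0, 0]
redundant = [1, 1, 0, 0, 0, 0, 0, 0]

r = pearson(eq_all, redundant)
kappa = cohen_kappa(eq_all, redundant)
```

On this toy data the labels disagree on one pair out of eight, which already lowers kappa to about 0.71; a value of 0.984 over 5,000 pairs therefore corresponds to near-complete agreement.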