Putting Them under Microscope: A Fine-Grained Approach for Detecting Redundant Test Cases in Natural Language

Zhiyuan Chang∗, Mingyang Li∗
{zhiyuan2019,mingyang2017}@iscas.ac.cn
Laboratory for Internet Software Technologies, Institute of Software Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China

Junjie Wang†
junjie@iscas.ac.cn
Laboratory for Internet Software Technologies, Institute of Software Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China

Qing Wang†
wq@iscas.ac.cn
Laboratory for Internet Software Technologies, Institute of Software Chinese Academy of Sciences, Beijing, China
State Key Laboratory of Computer Science, Institute of Software Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China

Shoubin Li
shoubin@iscas.ac.cn
Laboratory for Internet Software Technologies, Institute of Software Chinese Academy of Sciences, Beijing, China
University of Chinese Academy of Sciences, Beijing, China

ABSTRACT

Natural language (NL) documentation is the bridge between software managers and testers, and NL test cases are prevalent in system-level testing and other quality assurance activities. Due to reasons such as requirements redundancy, parallel testing, and tester turnover within a long evolving history, there are inevitably many redundant test cases, which significantly increase the cost. Previous redundancy detection approaches typically treat the textual descriptions as a whole to compare their similarity and suffer from low precision. Our observation reveals that a test case can have explicit test-oriented entities, such as tested function Components, Constraints, etc.; and there are also specific relations between these entities. This inspires us with a potential opportunity for accurate redundancy detection. In this paper, we first define five test-oriented entity categories and four associated relation categories and re-formulate the NL test case redundancy detection problem as the comparison of detailed testing content guided by the test-oriented entities and relations. Following that, we propose Tscope, a fine-grained approach for redundant NL test case detection by dissecting test cases into atomic test tuple(s) with the entities restricted by associated relations. To serve the test case dissection, Tscope designs a context-aware model for automatic entity and relation extraction. Evaluation on 3,467 test cases from ten projects shows Tscope could achieve 91.8% precision, 74.8% recall, and 82.4% F1, significantly outperforming state-of-the-art approaches and commonly-used classifiers. This new formulation of the NL test case redundancy detection problem can motivate follow-up studies to further improve this task and other related tasks involving NL descriptions.

∗Both authors contributed equally to this research.
†Corresponding authors.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

ESEC/FSE ’22, November 14–18, 2022, Singapore, Singapore
ACM ISBN 978-1-4503-9413-0/22/11.
https://doi.org/10.1145/3540250.3549089

CCS CONCEPTS

• Software and its engineering → Software testing and debugging; Acceptance testing.

KEYWORDS

Test Case Redundancy, Entity and Relation Extraction, Natural Language Processing

ACM Reference Format:

Zhiyuan Chang, Mingyang Li, Junjie Wang, Qing Wang, and Shoubin Li. 2022. Putting Them under Microscope: A Fine-Grained Approach for Detecting Redundant Test Cases in Natural Language. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’22), November 14–18, 2022, Singapore, Singapore. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3540250.3549089

1 INTRODUCTION

Software testing is an activity to ensure that an entire system meets its requirements. In the testing phase, testers need to analyze the requirements specification, identify all the test execution scenarios, and then instantiate them in manually written test cases. Such test cases are typically described in natural language (NL). Due to their adjustability and interpretability, NL test cases are still prevalent in industrial practice [32].

arXiv:2210.01661v1 [cs.SE] 4 Oct 2022

A requirement covers multiple features, and there may be overlapping features among requirements. For a large software project, the requirements are typically tested by different engineers, and engineers are not aware of the feature overlapping. Test redundancy may arise when each test engineer individually designs test case(s) for the assigned requirements. As the system evolves, the redundant test cases significantly increase the cost of testing, as well as the maintenance effort. The problem is especially obvious in the manual testing scenario, where human testers must read through test steps and carry them out manually by interacting with the system [22].

To alleviate the issue, information retrieval-based approaches have been proposed to automatically detect redundancy among NL test cases. The general idea is to vectorize the description of the test case with text representation models, e.g., the vector space model or Doc2Vec, and conduct the similarity comparison on the resulting vectors. However, these existing approaches suffer from low accuracy because they treat test cases’ textual descriptions as a whole, and thus cannot capture their fine-grained semantic information and inherent meaning. Meanwhile, we have the following two observations which can facilitate the similarity comparison and redundancy detection of NL test cases.

First, the test case has explicit categories of test-oriented entities which can facilitate accurate redundancy detection. Take Figure 1 as an example: the two test cases look similar in their textual descriptions, and would be detected as redundant by the aforementioned information retrieval-based approaches. However, if we put these two test cases under the microscope, we can find that their executing manners (“mesa-util tool” and “UnixBench tool”) are different, based on which we can distinguish them accurately. More than that, one can easily observe that there are different categories of test-oriented entities; for example, “gear rotation processing” is the tested functional component, while “when drawing 3D graphics” is the pre-condition for executing the test case. Only when the specific categories of test-oriented entities are mapped can the two test cases be determined as redundant. Taken in this sense, this paper aims at identifying the test-oriented entities to facilitate the accurate detection of redundant NL test cases.

Figure 1: Non-redundant test cases with similar descriptions
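The weakness of whole-text comparison can be illustrated with a minimal sketch. It assumes a plain bag-of-words vector space model with cosine similarity, which is only one of the text representations mentioned above, and the two descriptions are hypothetical paraphrases assembled from the entities quoted in the text, not the actual wording of Figure 1.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical paraphrases; the real descriptions in Figure 1 may differ.
case_a = "When drawing 3D graphics, test gear rotation processing with the mesa-util tool"
case_b = "When drawing 3D graphics, test gear rotation processing with the UnixBench tool"

# Comparing the descriptions as a whole hides the one difference that matters.
whole_text_score = cosine_similarity(bag_of_words(case_a), bag_of_words(case_b))

# Comparing only the Manner entities exposes it.
manner_score = cosine_similarity(
    bag_of_words("mesa-util tool"), bag_of_words("UnixBench tool")
)
```

With these inputs the whole-text score is about 0.88 while the Manner-only score is about 0.41, so a threshold tuned on whole descriptions would likely flag the pair as redundant even though the executing manner differs.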

Second, there might be multiple test-oriented entities that need to be carefully parsed and matched to ensure accurate redundancy detection. The first observation has motivated us to conduct the comparison within the same category of test-oriented entities for determining redundancy. However, when we put the two test cases in Figure 2 under the microscope, a second observation is made. There are both the testing Behavior “browse” and the tested Component “visit history” in these two test cases, yet they express different test-oriented operational information. In detail, in test case #346, the Behavior “browse” is targeted at the Component “content of each resource directory”, and the Component “visit history” is associated with the Behavior “switch”, while in test case #525 the Behavior “browse” is directly for the Component “visit history”. The observation implies that the multiple test-oriented entities need to be carefully parsed and matched, and that it is necessary to identify the test-oriented operational information, i.e., entities and associated relations, when analyzing test cases to achieve accurate redundancy detection.

Figure 2: Non-redundant test cases with multiple test-oriented entities

Motivated by the two findings, we define five test-oriented entity categories, i.e., Component, Behavior, Prerequisite, Manner, and Constraint, and four relation categories associated with the entities. We then re-formulate the NL test case redundancy detection problem as the comparison of detailed testing content guided by the test-oriented entities and relations.

Following that, we propose a fine-grained redundant test case detection approach, Tscope¹, which dissects the test case into atomic test tuple(s) with the five entities restricted by their associated relations, and conducts the comparison on them. One example test tuple dissected from test case #525 in Figure 2 is as follows: Behavior “browse”, Component “visit history”, and Manner “mouse”. To achieve this, Tscope first designs a context-aware model for extracting test-oriented entities and relations from test case descriptions, which considers the global context of the test case for entity extraction, and the local context of the involved entities for relation extraction. After that, Tscope dissects each test case into structured atomic test tuple(s) guided by the extracted entities and relations. Finally, Tscope detects redundancy by comparing the entities in each tuple pair, considering the semantic meaning of the entities as well as their involved indicative words.
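As a rough illustration of comparing test cases tuple by tuple, the sketch below represents an atomic test tuple as a record with one optional slot per entity category. The class and function names are hypothetical, exact matching on normalized strings stands in for the semantic comparison described above, and the rule "two test cases are redundant when their tuple sets coincide" is an assumption made for this example rather than the decision rule of Tscope.

```python
from dataclasses import dataclass
from typing import Optional

FIELDS = ("component", "behavior", "prerequisite", "manner", "constraint")


@dataclass(frozen=True)
class AtomicTuple:
    """One atomic test tuple; a slot is None when the entity is absent."""
    component: Optional[str] = None
    behavior: Optional[str] = None
    prerequisite: Optional[str] = None
    manner: Optional[str] = None
    constraint: Optional[str] = None


def normalize(entity: Optional[str]) -> Optional[str]:
    """Lowercase and collapse whitespace (a stand-in for semantic matching)."""
    return " ".join(entity.lower().split()) if entity else None


def tuple_key(t: AtomicTuple) -> tuple:
    return tuple(normalize(getattr(t, name)) for name in FIELDS)


def is_redundant(case_a: list[AtomicTuple], case_b: list[AtomicTuple]) -> bool:
    """Assumed rule: redundant when both cases yield the same set of tuples."""
    return {tuple_key(t) for t in case_a} == {tuple_key(t) for t in case_b}


# Tuples following the description of Figure 2 in the text.
case_346 = [
    AtomicTuple(component="content of each resource directory", behavior="browse"),
    AtomicTuple(component="visit history", behavior="switch"),
]
case_525 = [AtomicTuple(component="visit history", behavior="browse", manner="mouse")]

# A hypothetical rewording of test case #525 that differs only in letter case.
case_525_copy = [AtomicTuple(component="Visit History", behavior="Browse", manner="Mouse")]

differs = is_redundant(case_346, case_525)       # False
matches = is_redundant(case_525, case_525_copy)  # True
```

Although both test cases in Figure 2 mention “browse” and “visit history”, the entities are paired differently, so no tuple of one case matches a tuple of the other.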

We evaluate Tscope on 3,467 test cases from ten projects. The evaluation results show that Tscope could reach 97.5% precision and 94.8% recall for entity extraction, and 90.4% precision and 97.6% recall for relation extraction, which significantly outperforms two state-of-the-art approaches. For the redundancy detection task, Tscope could achieve 91.8% precision, 74.8% recall, and 82.4% F1. Compared with the two state-of-the-art redundancy detection approaches and four commonly-used classifiers, Tscope is 19.8%-23.4% higher in F1. Moreover, the results of ablation experiments show that the five entity categories all play significant roles in Tscope.
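The reported F1 for redundancy detection can be checked against the reported precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function name below is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the redundancy detection task.
reported_f1 = f1_score(0.918, 0.748)
reported_f1_percent = round(reported_f1 * 100, 1)  # 82.4
```

The result agrees with the 82.4% F1 stated in the text.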

The new formulation of the NL test case redundancy detection problem can motivate follow-up studies to further improve this task and other related tasks involving NL descriptions. Indeed, there are several tasks in the software engineering domain that involve the similarity comparison of two textual documents, e.g., duplicate test report detection, similar Stack Overflow question identification, duplicate requirement detection, etc. The previous techniques typically treat the textual descriptions as a whole for the similarity comparison, while ignoring the fine-grained semantic information hidden in the text. The new formulation proposed in this paper, i.e., comparison of detailed content guided by the scenario-related entities and relations, could potentially motivate researchers in these related fields.

¹ We name our approach Tscope because it works like a microscope, inspecting the detailed information in test cases to facilitate redundancy detection.

In summary, the key contributions of this paper are as follows:

• The new formulation of the NL test case redundancy detection problem, i.e., the comparison of detailed testing content guided by the test-oriented entities and relations.

• A fine-grained redundancy detection approach Tscope for NL test cases, which dissects the test case into atomic test tuple(s) with the five entities restricted by their associated relations, and conducts the comparison on them.

• A context-aware model for extracting test-oriented entities and their relations from test case descriptions, which involves the global context of the test case in entity extraction, and the local context of the involved entities for relation extraction.

• Evaluation with 3,467 test cases from ten projects, with promising results. We also publicize the source code² for facilitating follow-up studies and other related tasks.

The remainder of the paper is organized as follows: Section 2 presents the empirical study of the entity categories for redundancy detection. Section 3 elaborates the approach. Section 4 presents the experiment design. Section 5 describes the results. Section 6 discusses the lessons learned. Section 7 introduces the related work and its limitations. Section 8 concludes our work.

2 EMPIRICAL ANALYSIS OF ENTITIES AND RELATIONS

2.1 Categories of Entities and Relations

Motivated by the observations in Section 1, we provide a new formulation of the NL test case redundancy detection problem, i.e., the comparison of detailed testing content guided by the test-oriented entities and relations. To achieve this, we define five categories of entities and four categories of relations associated with the entities. We explore the entity and relation categories through a bottom-up analysis approach. Specifically, three researchers (details in Section 4.2) are involved in mining the categories of entities and relations that affect redundancy detection in the test case text. If all three researchers agree on adding a category, this entity category is admitted and added to the entity category set. If their views diverge, the decision is made through a voting mechanism, i.e., the entity category will be added to the set if it is admitted by at least two researchers. Finally, we obtain the five entity categories and the corresponding relations among the entity categories. Table 1 shows each entity/relation category and examples.
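The admission rule described above (unanimous agreement, or at least two of the three researchers when views diverge) reduces to a simple majority vote. The sketch below is only an illustration of that rule; the function name and the example votes are made up.

```python
def admit_category(votes: list[bool], min_votes: int = 2) -> bool:
    """Admit a candidate category when at least `min_votes` researchers accept it."""
    return sum(votes) >= min_votes


# Hypothetical votes from three researchers on three candidate categories.
unanimous = admit_category([True, True, True])     # admitted
two_of_three = admit_category([True, False, True])  # admitted by voting
one_of_three = admit_category([False, True, False])  # rejected
```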

2.1.1 Categories of Entities. The definition of the five entity categories is based on the purpose and basics of software testing, as well as the observations on NL test cases. First, test cases are driven by the feature(s) in requirements, and a feature specifies the behavior of one or more components in terms of their current conditions. Taken in this sense, the key entities in a feature will also be reflected in the test case descriptions. Therefore, we identify three entity categories, “Component”, “Behavior”, and “Prerequisite”, respectively.

² https://github.com/czycurefun/testcase_detection

Second, according to our observations, test cases sometimes differ by the Manner. For example, there are descriptions of two non-redundant test cases in Figure 1. The two test cases have the same Prerequisite (“When drawing 3D graphics”) and Component (“gear rotation processing”), but different operation manners (“mesa-util tool” and “UnixBench tool”). To reflect this difference, we define an entity category “Manner”.

Third, in some cases, test cases may differ by the satisfied constraints. For example, consider two descriptions, “Test there are preset applications after the system installation” and “Test the preset applications including FTP application after the system installation”. The two test cases have the same Component (“preset applications”), but the latter additionally involves the constraint (“including FTP application”). Accordingly, we define an entity category “Constraint” to indicate the difference.

2.1.2 Categories of Relations. As shown in Figure 2, there may be multiple test-oriented entities per entity category within a test case, which implies the need for inspecting the entities within the test case a step further. Taking test case #346 in Figure 2 as an example, the Behavior “browse” targets the Component “contents of each resource directory”, and the Behavior “switch” acts on the Component “visit history”. This demonstrates the mapping between Components and Behaviors, and we define it as the Act relation.

We also observe relations involving the other three categories of entities, e.g., the executing manner of the testing. Considering that the components in the test case are the basic object of the testing content, we define three other relations between Component and Prerequisite, Manner, and Constraint, respectively, to indicate the detailed information of the testing (details in Table 1).
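One way to picture how Component-centred relations lead to atomic test tuples is sketched below. Everything here is an assumption for illustration: the record types, the grouping rule (one tuple per Component and Behavior pair, with the Component's other entities attached), and the simplification of keeping only the first entity per non-Behavior category. Only the relation name "Act" comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    category: str  # "Component", "Behavior", "Prerequisite", "Manner", or "Constraint"
    text: str


@dataclass(frozen=True)
class Relation:
    name: str          # "Act" is named in the text; other names would be placeholders
    component: Entity  # every relation is anchored on a Component
    other: Entity      # the Behavior, Prerequisite, Manner, or Constraint it links to


def dissect(relations: list[Relation]) -> list[dict[str, str]]:
    """Group relations by Component and emit one tuple per linked Behavior."""
    by_component: dict[str, dict[str, list[str]]] = {}
    for rel in relations:
        slots = by_component.setdefault(rel.component.text, {})
        slots.setdefault(rel.other.category, []).append(rel.other.text)

    tuples: list[dict[str, str]] = []
    for component, slots in by_component.items():
        # Simplification: keep only the first entity of each non-Behavior category.
        shared = {cat: texts[0] for cat, texts in slots.items() if cat != "Behavior"}
        for behavior in slots.get("Behavior", [None]):
            atomic = {"Component": component, **shared}
            if behavior is not None:
                atomic["Behavior"] = behavior
            tuples.append(atomic)
    return tuples


# Test case #346 as described in the text: two Act relations.
relations_346 = [
    Relation("Act", Entity("Component", "content of each resource directory"),
             Entity("Behavior", "browse")),
    Relation("Act", Entity("Component", "visit history"),
             Entity("Behavior", "switch")),
]
tuples_346 = dissect(relations_346)
```

For test case #346 this yields two tuples, one pairing “browse” with the resource directory content and one pairing “switch” with “visit history”, which keeps the two Behaviors from being matched against the wrong Component.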

2.2 Correlation Analysis

We conduct an empirical study to investigate the effectiveness of the entity categories for redundancy detection. Specifically, we randomly sample 5,000 test case pairs³ and manually label each pair as redundant or not by comparing its two test cases.

Then, we build five Boolean variables by manual judgment, i.e., $EQ_{com}$, $EQ_{beh}$, $EQ_{pre}$, $EQ_{man}$, and $EQ_{con}$. Each variable represents whether the entities belonging to the corresponding category in the summaries are manually judged as equivalent. At the same time, a variable $Redundant$ is built according to the redundancy label (not based on entity comparison), representing whether a test case pair is truly redundant.

We analyze the correlation between the above five variables and the variable $Redundant$. Table 2 shows the Pearson correlation coefficient and p-value of the correlation test. The results show that the five entity categories are significantly correlated to the variable $Redundant$, which indicates the effectiveness of each entity category for redundancy detection. Moreover, we analyze the consistency of two variables, i.e., $EQ_{all}$ and $Redundant$, where $EQ_{all}$ represents that the entities belonging to the five entity categories in the test case pair are all equivalent by manual comparison. Cohen's kappa coefficient is 0.984, which shows the significant consistency of the two distributions. The results indicate that redundant

³ The test case pairs are built from the dataset in Table 4. The pairing and labeling processes are consistent with the descriptions in Section 4.2.
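The two statistics used in this analysis, the Pearson correlation coefficient and Cohen's kappa, can be computed for Boolean variables as in the minimal sketch below. The label vectors are made-up toy data, not the 5,000 labeled pairs from the study, and the variable names are illustrative.

```python
import math


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation coefficient of two equally long 0/1 sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cohen_kappa(x: list[int], y: list[int]) -> float:
    """Cohen's kappa for two binary labelings of the same items."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    p_x1, p_y1 = sum(x) / n, sum(y) / n
    # Agreement expected by chance from the two marginal distributions.
    expected = p_x1 * p_y1 + (1 - p_x1) * (1 - p_y1)
    return (observed - expected) / (1 - expected)


# Toy labels for eight hypothetical test case pairs (1 = true, 0 = false).
eq_all = [1, 1, 0, 0, 1, 0, 0, 0]     # all five entity categories judged equivalent
redundant = [1, 1, 0, 0, 0, 0, 0, 0]  # pair labeled as truly redundant

toy_r = pearson(eq_all, redundant)
toy_kappa = cohen_kappa(eq_all, redundant)
```

On this toy data the two labelings disagree on one pair out of eight, which gives a kappa of about 0.71; a value such as the reported 0.984 corresponds to near-perfect agreement.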
