Requirements Engineering for Machine Learning: A Review and Reflection

Zhongyi Pei, Lin Liu, Chen Wang, Jianmin Wang

National Engineering Research Center for Big Data Software

School of Software, Tsinghua University

Beijing, China

{peizhyi, linliu, wang_chen, jimwang}@tsinghua.edu.cn

Abstract—Today, many industrial processes are undergoing

digital transformation, which often requires the integration of well-understood domain models and state-of-the-art machine learning technology in business processes. However, requirements elicitation and design decision making about when, where and how to embed various domain models and end-to-end machine learning techniques properly into a given business workflow require further exploration. This paper aims to provide an overview of the requirements engineering process for machine learning applications in terms of cross-domain collaborations. We first review the literature on requirements engineering for machine learning, and then go through the collaborative requirements analysis process step by step. An example case of industrial data-driven intelligence applications is also discussed in relation to the aforementioned steps.

Index Terms—requirements engineering, machine learning,

domain model, industrial engineering, review

I. INTRODUCTION

TODAY, the world is witnessing many successful applications of machine learning techniques, including image recognition, speech recognition, traffic prediction, self-driving cars, virtual personal assistants, buyers’ preference prediction and product recommendations [1]. In recent years, there have been many research efforts on understanding how software engineering processes should respond to the needs of machine learning applications, and what changes data-intensive intelligent systems have brought to requirements engineering [2]. In requirements engineering, there is growing interest in understanding the various needs and aspects of machine learning application systems. Research topics of interest include non-functional requirements elicitation and quality assurance of machine learning models and applications, especially where they differ from traditional information systems development. For instance, performance metrics such as precision, recall, F-measure and the ROC curve are critical acceptance criteria for the viability of specific machine learning algorithms in specific contexts, and they also direct the continuous optimization of ML models. In addition, Berry discussed requirements specifications for AI applications in terms of performance measures acceptable in a given context, as a value or criteria [3]. Other well-discussed topics include the explainability of machine learning models [4], the fairness and unbiasedness of predictive analysis results [5], and the legal and ethical compliance requirements of ML-intensive systems.

Financial support from the National Key Research and Development Program Project 2021YFB1715200 and NSFC Innovation Group Project 62021002 is gratefully acknowledged.

Three sub-disciplines are involved, namely software requirements engineering, data and knowledge engineering, and artificial intelligence/machine learning. In requirements engineering, various conceptual modeling approaches are used to elicit software system requirements and specify the expected system structure and behaviour. For instance, goal-oriented requirements modeling first represents the high-level objectives of system users and designers, and then elaborates on the success and acceptance criteria of the required system by goal decomposition and refinement [6]. Once the high-level objectives are fully understood, the system architecture and behavior are designed and represented as formal or semi-formal modeling specifications. For example, automata and state machine diagrams in UML and SysML [7] have proven useful in analysing the requirements of reactive systems and in specifying domain object properties and business logic through human-understandable patterns, and they are widely used in the domain of industrial automation and control. In addition, quality assurance of the specified system behaviors and causal relationships can be conducted through formal verification and validation [8].

On the other hand, in many science and engineering domains, there are dominant physical or process models, such as mechanical models in mechanical engineering, chemical reaction models in chemical engineering, and structural mechanics models in building and construction. These mathematical models take the form of equations, directed causal networks, or 3D simulations of structures or dynamic behaviors [9], and they define the nature of the learning problem as well as the structure, loss functions and hyperparameters of the neural network models and algorithms, referred to as machine learning models.

The collaboration of people with different expertise is considered a major challenge, as we need to bridge semantic gaps between different knowledge areas, integrate interdisciplinary methods and tools into a coherent process, and generate evolvable learning systems.

This paper aims to provide an overview of the collaboration among the different roles in requirements engineering for machine learning systems. We first review the literature on requirements engineering for machine learning, and then examine the concerns of each role during the collaborative requirements understanding and system development process. We further summarize the typical patterns for collaboration, and propose high-level guidelines for the evaluation and selection of viable patterns.

The rest of the paper is structured as follows: Section II explains our research method, by which we select the literature; Section III gives our analysis results, a brief review of related work, and a summary of the general concerns and challenges of collaboration; Section IV proposes a collaborative requirements analysis process and presents one example case and the lessons learnt from actual requirements analysis; Section V concludes the paper.

II. RESEARCH METHOD

Research on RE4ML (requirements engineering for machine learning) has attracted growing interest in recent years. In this section, we first raise the research questions, and then introduce our review method. The review protocol includes: (i) how to select the document sources; (ii) what to use as the search string; and (iii) the inclusion or exclusion criteria in this review. Following this protocol, the researchers performed a parallel search in order to identify studies that address the research questions.

A. Research Questions

The main research questions we aim to answer in this paper are as follows:

RQ1: What are the roles involved in engineering data-driven intelligence applications?
RQ2: What are the major areas for engineers playing different roles to collaborate during the requirements stage?
RQ3: What kind of support is needed for collaborative requirements engineering for machine learning?
RQ4: What are the important issues that require more future study?

We use these questions to direct the review of the literature. We first examine the issues concerning different roles, and summarize the scenarios in which collaboration and mutual understanding are required. Then we give some example patterns for collaboration across knowledge areas. Finally, we try to propose a routine by which the patterns of collaboration are evaluated and adapted for a given problem.

B. Search Strategy

Our search strategy set out to find work at the intersection of requirements engineering, data science and machine learning. We conducted a search-string-based database search on two digital libraries, IEEE Xplore and ACM Digital Library. To avoid missing related papers, we used as few words as possible to filter the papers. We used requirements as a required word in the title, while requirements engineering and machine learning were required as author keywords. The search terms were combined with AND operators. The year range from January 2016 to June 2022 was adopted since we focus on research that follows the recent trend of machine learning.
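As a rough illustration of how the search conditions combine, the following Python sketch applies them to an in-memory list of records. It is not the query syntax of either digital library: the `Paper` record, its field names and the `matches_search` helper are assumptions made only for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Paper:
    """Hypothetical search-result record; the field names are assumptions."""
    title: str
    author_keywords: List[str]
    published: date


# Conditions taken from the search strategy described in the text.
TITLE_TERM = "requirements"
KEYWORD_TERMS = ("requirements engineering", "machine learning")
START, END = date(2016, 1, 1), date(2022, 6, 30)


def matches_search(paper: Paper) -> bool:
    """Return True only if all conditions hold (the AND combination)."""
    keywords = {k.lower() for k in paper.author_keywords}
    return (
        TITLE_TERM in paper.title.lower()
        and all(term in keywords for term in KEYWORD_TERMS)
        and START <= paper.published <= END
    )
```

Keeping the title condition to a single word mirrors the stated aim of using as few words as possible, at the cost of returning more candidates that must be screened afterwards.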

C. Inclusion and Exclusion Criteria

The above search strategy yielded 83 papers, 42 from IEEE Xplore and 41 from ACM Digital Library. We first applied our exclusion criteria to these papers. By our exclusion criteria, we filtered out the publications whose topics have little association with software engineering. An efficient way to do this is to filter out the papers whose titles contain words like teach, student, education and child. The large number of papers that use machine learning to support requirements engineering steps (commonly known as ML for RE) should also be filtered out because their motivations are not consistent with our research goals. We found that some phrases in the titles could help us locate them, such as automatic elicitation, automated identification, requirements classification and machine learning-driven requirements. In addition to the above filtering methods, we had to complete the exclusion by reading the abstracts and checking the motivations. After applying the exclusion criteria, only 16 papers were left.
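The title-based part of this screening can be sketched in Python as below. The word and phrase lists are the examples named in the text. The function names and the substring matching are assumptions made for illustration, and the manual reading of abstracts that completed the exclusion is not modelled.

```python
from typing import List, Optional

# Title words suggesting weak association with software engineering.
OFF_TOPIC_WORDS = ("teach", "student", "education", "child")

# Title phrases suggesting "ML for RE" rather than "RE for ML".
ML_FOR_RE_PHRASES = (
    "automatic elicitation",
    "automated identification",
    "requirements classification",
    "machine learning-driven requirements",
)


def exclusion_reason(title: str) -> Optional[str]:
    """Return why a title is excluded, or None if it passes the title screen."""
    lowered = title.lower()
    for word in OFF_TOPIC_WORDS:
        if word in lowered:
            return f"off-topic: {word}"
    for phrase in ML_FOR_RE_PHRASES:
        if phrase in lowered:
            return f"ML for RE: {phrase}"
    return None


def screen(titles: List[str]) -> List[str]:
    """Keep only the titles that no exclusion rule matches."""
    return [t for t in titles if exclusion_reason(t) is None]
```

Substring matching is deliberately loose (for instance, "teach" also matches "teaching"), so a filter like this only reduces the pile that has to be read by hand.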

Then we applied an iterative backward and forward snowballing method via Google Scholar to refine our results based on the remaining papers. The scope was limited to software engineering methods for machine learning, machine learning applications, and development issues of machine learning, ranging from 2016 to 2022. The final list includes 163 papers. The filtering and refining were done by the first two authors, and a detailed discussion was held to reach consensus among all the authors.

III. SURVEY RESULTS AND DISCUSSION

We first list all the selected papers in Table I. As an early milestone in the data-driven intelligence development paradigm, the Cross-Industry Standard Process for Data Mining (CRISP-DM) organizes related analytics activities into six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment [168]. CRISP-DM defines a sequence of tasks with iterative feedback loops, which suggests a requirements analysis cycle of data preparation, model design and evaluation. More recently, CRISP-ML(Q) extends CRISP-DM to support the development of machine learning applications, with a special focus on quality measurements of machine learning models, including robustness, scalability, explainability, model complexity and resource demands [169].

Vogelsang and Borg set out to define the characteristics and challenges unique to Requirements Engineering (RE) for ML-based systems [20]. They identified several major changes in development paradigms, including the elicitation of ML performance measurements and the emergence of quality requirements such as explainability, freedom from discrimination, and specific legal requirements.
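To make the idea of eliciting ML performance measurements concrete, the sketch below computes precision, recall and the F-measure (here the balanced F1 score) from confusion-matrix counts and checks them against acceptance thresholds. The function names and the threshold values in the usage note are hypothetical. In practice such thresholds would be negotiated with stakeholders for the given context.

```python
from typing import Dict, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def meets_acceptance(metrics: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """A model is acceptable only if every measured value reaches its minimum."""
    return all(metrics[name] >= minimum for name, minimum in thresholds.items())
```

For example, with 80 true positives, 20 false positives and 10 false negatives, precision is 0.8 and recall is 80/90, so a hypothetical requirement of precision at least 0.75 and recall at least 0.85 would be met, while a requirement of recall at least 0.9 would not.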

There are many recent proposals on software engineering approaches for machine learning applications. Amershi et al. [178] studied several representative ML projects at Microsoft and summarised several major challenges and success factors, including: a sustainable end-to-end pipeline; data collection, cleaning and accessibility; and model evaluation, evolution and deployment. A nine-stage process model was then proposed to address the above data-oriented challenges (e.g., collection, cleaning, and labeling) and model-oriented challenges (e.g., model requirements, feature engineering, training, evaluation, deployment, and monitoring), in which feedback loops are constructed from model evaluation and monitoring back to the previous stages, and from model training to feature engineering (e.g., in representation learning).

TABLE I: Topics of All the Selected Papers

Topics                                       Sum   Papers
Big Picture                                  15    [10-24]
Stakeholders, Roles and Collaboration        8     [25-32]
Requirements Process Model                   7     [33-39]
Requirements Elicitation and Specification   9     [3, 40-47]
Quality, Security, Ethics, and Assessment    38    [48-85]
Physics-Informed and Knowledge-based         19    [9, 86-103]
Machine Learning System Development          15    [103-117]
Interpretability and Explainability          17    [118-134]
Data Pipeline                                8     [135-142]
Model Provenance, Verification               7     [143-149]
Applications                                 18    [150-167]

TABLE II: Distribution of Requirements-Related Concerns for ML Applications

Business Experts
Concerns (functional goals, non-functional requirements): Business Goals; Accuracy; Stability; Efficiency; Fairness
Key challenges of RE for data-driven intelligence: In data-driven intelligent applications, the satisfaction of business goals is constrained by the limitations of technological solutions. Sometimes the business experts have to make compromises and accept a less than expected solution.

Requirements Engineers
Concerns (functional goals, non-functional requirements): Stakeholders; User Stories; Domain Models; Resources; System Scope
Key challenges of RE for data-driven intelligence: The requirements process for data-driven intelligence applications is more complex than traditional requirements engineering, and hence imposes changes on the existing vocabulary and requirements analysis tools.

Software Engineers
Concerns (functional goals, non-functional requirements): Prototyping; Architecture; Interface; Speed and Cost; Capacity
Key challenges of RE for data-driven intelligence: The complexity of the software architecture requires an extension to include data and machine learning models. What is more, it is harder to define a prototype that relies on a model that is not explainable.

Domain Experts
Concerns (functional goals, non-functional requirements): Mechanism Design; Data Explanation; Knowledge Acquisition
Key challenges of RE for data-driven intelligence: Domain experts share their understanding and knowledge about the working mechanism of a given problem. However, this is a progressive task, as our understanding of the domain evolves constantly.

Data Scientists
Concerns (functional goals, non-functional requirements): Data Pipeline; Task Definition; Training Resources; Model Performance; Explainability
Key challenges of RE for data-driven intelligence: It is extremely challenging for data scientists, as good quality data is always hard to get. Overcoming this limitation, making good use of the available data, and conveying technical limitations as early as possible are equally important.

Reference: [170] [171] [79] [47] [80] [172] [173] [17] [22] [174] [175] [176] [87] [20] [177] [32]

Nalchigar et al. [39] propose a modeling methodology that represents generic ML designs as solution patterns for business analytics. A pattern maps an actual business decision goal to a few questions, which are then answered through insights obtained from machine learning on the given data. Washizaki et al. [179] review architectural patterns and design patterns for ML systems covering different ML-related tasks, such as a data lake for storage, provision of raw data for analytics, decoupling of business logic from the machine learning workflow, adoption of event-driven microservices, and version management of machine learning models. This know-how is rich and reusable but does not cover the ML application design process systematically. Trustworthiness of ML applications requires compliance with applicable laws and regulations, as well as with a series of domain-specific physical laws. Hence the elicitation and evaluation of compliance has become another major topic of interest in RE for ML. Sothilingam et al. [180] conducted an empirical case study of three ML software project organizations, and examined variations in project team designs using the i* concepts of Agents, Roles, and Positions to support the analysis of complex organizational relationships where the mapping of roles and expertise is insufficient.

There are related studies on integrating scientific knowledge with machine learning for engineering and environmental systems, as well as on hybrid modelling approaches that combine machine learning and simulations [181]. The integration can go both ways: either using ML to enhance domain models where the cause-effect relations are not fully evident [182], or using common-sense knowledge, common knowledge and domain knowledge models to adapt generic models to a specific domain. This is also called physics-aware learning or informed machine learning [98].

A. RQ1: What are the roles involved in engineering data-driven intelligence applications?

In requirements engineering for traditional software development, the main roles are business experts, software requirements engineers and development engineers. A general requirements process starts with defining the scope of the business problem, which identifies the stakeholders by establishing the extent of the work. The software requirements engineer further identifies the requirements through elicitation and specification in communication with the stakeholders, especially the business expert. When it comes to the requirements of machine learning (or data-driven intelligence) functionalities, data scientists take part in the RE process, and domain experts also play an irreplaceable role in industrial applications, since domain knowledge is always necessary for understanding the relevant theory and scenarios.

We summarize the concerns and challenges in the process of RE for ML in Table II. It is not an exhaustive list, but includes the ones that are most often mentioned in the literature related to data-driven intelligence requirements. For example, fairness is introduced into the non-functional requirements since machine learning models can be biased by choosing training datasets that favor a certain group. Stability also becomes more important than ever, as the predictive results generated by machine learning models can be unreliable when there are minor changes in the situation.
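Both concerns can be turned into measurable checks. The sketch below is one possible reading, not a method taken from the surveyed literature: fairness is approximated by the gap in positive-prediction rates between groups (often called the demographic parity difference), and stability by the share of predictions that flip under a small input perturbation. All function names, the choice of these two measures, and the perturbation are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence


def selection_rates(predictions: Sequence[int], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of positive (== 1) predictions within each group."""
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(predictions: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def flip_rate(model: Callable[[float], int], inputs: Sequence[float],
              perturb: Callable[[float], float]) -> float:
    """Fraction of inputs whose prediction changes after a small perturbation."""
    if not inputs:
        return 0.0
    flips = sum(model(x) != model(perturb(x)) for x in inputs)
    return flips / len(inputs)
```

A non-functional requirement could then be phrased as an upper bound on `parity_gap` or `flip_rate`, with the bound itself agreed between business experts and data scientists.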

The challenges represent the urgent problems to be solved by each role. For business experts, building a reasonable understanding of the related technologies is valuable, since it gives the proposed business goals firmer support. For requirements engineers, researchers have proposed some novel requirements modeling methods for machine learning applications in recent years, considering factors like privacy [183], security [76], scenarios [47] and goal revision [184]. For the other roles, the challenges mainly come from multidisciplinary and technical bottlenecks.

B. RQ2: What are the areas for the engineers to collaborate during the requirements stage?

In RE, there are many proven practices for the elicitation, modeling, specification, verification and management of requirements. These include goal-oriented modeling and analysis of functional requirements using KAOS and of non-functional requirements using the NFR framework, actor-based analysis of organizational structures with iStar, and scenario-based description of user-system interactions with use cases and user stories. These approaches apply well to the requirements processes for current industrial applications. For example, the Volere Requirements Process [185] is generally applicable to any early requirements stage in which we try to understand the business context, form a system design idea, and verify it.

However, as discussed in Section III-A, the concerns of each role have changed and more roles must be involved. Digging into the concerns of each role, we can see the connections between them. For example, the business goals from business experts should be fulfilled by the prototypes from development engineers, while the prototypes must correctly use the machine learning models from data scientists. We describe the connections in Fig. 1, where the roles are represented by circles, and red lines highlight the analysis process of using a data-driven ML approach to address a problem.

Here we list the most widely discussed collaboration-related issues covered by the references.

• What should be considered if we want to use machine learning models as expected? This issue covers a wide range, including the widely discussed topic of XAI (or trustworthy AI). The collaboration on this issue generally happens between requirements engineers and data scientists [77].
• How can software architectures be designed to enable robust integration of machine learning models? This issue exists because there is a huge gap between software development technologies and data science. Architecture design considerations have to include data quality, uncertainty, privacy and so on. This is a matter for the partnership between development engineers and data scientists [85].
• How can the process of requirements analysis be adapted to machine learning systems? Due to the big gap between traditional software and machine learning systems, existing requirements methods have to be improved accordingly.
