Requirements Engineering for Machine Learning: A
Review and Reflection
Zhongyi Pei, Lin Liu, Chen Wang, Jianmin Wang
National Engineering Research Center for Big Data Software
School of Software, Tsinghua University
Beijing, China
{peizhyi, linliu, wang_chen, jimwang}@tsinghua.edu.cn
Abstract—Today, many industrial processes are undergoing
digital transformation, which often requires the integration of
well-understood domain models and state-of-the-art machine
learning technology in business processes. However, requirements
elicitation and design decision making about when, where and
how to embed various domain models and end-to-end machine
learning techniques properly into a given business workflow
require further exploration. This paper aims to provide an
overview of the requirements engineering process for machine
learning applications in terms of cross-domain collaboration.
We first review the literature on requirements engineering
for machine learning, and then go through the collaborative
requirements analysis process step-by-step. An example case of
industrial data-driven intelligence applications is also discussed
in relation to the aforementioned steps.
Index Terms—requirements engineering, machine learning,
domain model, industrial engineering, review
I. INTRODUCTION
TODAY, the world is witnessing many successful appli-
cations of machine learning techniques, including image
recognition, speech recognition, traffic prediction, self-driving
cars, virtual personal assistants, buyers’ preference prediction
and product recommendations [1]. In recent years, there have
been many research efforts on understanding how software
engineering processes should respond to the needs of machine
learning applications, and what changes data-intensive
intelligent systems have brought to requirements engineering [2].
In requirements engineering, there is growing interest in
understanding various needs and aspects of machine learning
application systems. Research topics of interest include the
non-functional requirements elicitation and quality assurance
of machine learning models and applications, especially those
that differ from traditional information systems development.
For instance, performance metrics such as precision, recall,
F-measure and the ROC curve are critical acceptance criteria
for the viability of specific machine learning algorithms in
specific contexts, and they also direct the continuous
optimization of ML models. In addition, Berry discussed
requirements specifications for AI applications in terms of
performance measures acceptable in a given context, as a
value or criterion [3]. Other well-discussed topics include the
explainability of machine learning models [4], the fairness
and unbiasedness of predictive analysis results [5], the legal and
ethical compliance requirements of ML-intensive systems, etc.
Financial support from National Key Research and Development Program
Project 2021YFB1715200 and NSFC Innovation Group Project 62021002 is
gratefully acknowledged.
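To make the acceptance metrics named in the Introduction concrete, the following minimal sketch computes precision, recall and F-measure (here F1) for a binary classifier and checks them against thresholds. It is an illustration only, not a method from the reviewed papers; the function names `precision_recall_f1` and `meets_acceptance`, the thresholds and the sample labels are all assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def meets_acceptance(y_true, y_pred, min_precision, min_recall):
    """Hypothetical acceptance check: both thresholds must be reached."""
    precision, recall, _ = precision_recall_f1(y_true, y_pred)
    return precision >= min_precision and recall >= min_recall


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0]       # assumed ground truth
    predictions = [1, 1, 0, 1, 0]  # assumed model output
    print(precision_recall_f1(labels, predictions))
    print(meets_acceptance(labels, predictions, 0.6, 0.6))
```

Stating the thresholds as explicit parameters mirrors the idea that the acceptable value depends on the context in which the model is used.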
There are three sub-disciplines involved, namely software
requirements engineering, data and knowledge engineering,
and artificial intelligence/machine learning. In requirements
engineering, various conceptual modeling approaches are used
to elicit software system requirements and specify the expected
system structure and behaviour. For instance, goal-oriented
requirements modeling first represents the high-level objectives
of system users and designers, and then elaborates on the
success and acceptance criteria of the required system by goal
decomposition and refinement [6]. Once the high-level
objectives are fully understood, system architecture and
behavior are designed and represented as formal or semi-formal
modeling specifications. For example, automata and state
machine diagrams in UML and SysML [7] have proven useful in
analysing the requirements of reactive systems and in specifying
domain object properties and business logic through
human-understandable patterns, and they are widely used in the
domain of industrial automation and control. Besides, quality
assurance of specified system behaviors and causal relationships
can be conducted through formal verification and validation [8].
On the other hand, in many science and engineering domains,
there are dominating physical or process models, such as
mechanical models in mechanical engineering, chemical
reaction models in chemical engineering, structural mechanics
models in building and construction, etc. The mathematical
models are in the form of equations, directed causal networks,
3D simulations of structures or dynamic behaviors [9], which
define the nature of the learning problem, the structure, the
loss functions and hyperparameters of neural network models
and algorithms, referred to as machine learning models.
The collaboration of people with different expertise is
considered a major challenge, as we need to bridge semantic
gaps between different knowledge areas, integrate
interdisciplinary methods and tools into a coherent process,
and generate evolvable learning systems.
This paper aims to provide an overview of the collabora-
tion among the different roles in requirements engineering
for machine learning systems. We first review the literature
on requirements engineering for machine learning, and then
dig into the concerns of each role during the collaborative
requirements understanding and system development process.
We further summarize the typical patterns for collaborations,
and propose high-level guidelines for evaluation and selection
of viable patterns.
The rest of the paper is structured as follows: Section II
explains our research method, by which we select literature
papers; Section III gives our analysis results, a brief review
of related work and a summary of the general concerns
and challenges of collaboration; Section IV proposes
a collaborative requirements analysis process and presents one
example case and the lessons learnt from actual requirements
analysis; Section V concludes the paper.
II. RESEARCH METHOD
Research on RE4ML (requirements engineering for machine
learning) has attracted growing interest in recent years. In
this section, we first raise the research questions, and then
introduce our review method. The review protocol includes:
(i) how to select the document sources; (ii) what to use as the
search string; and (iii) the inclusion or exclusion criteria in this
review. Following this protocol, the researchers performed a
parallel search in order to identify studies that address the
research questions.
A. Research Questions
The main research questions we aim to answer in this paper
are as follows:
RQ1: What are the roles involved in engineering data-driven
intelligence applications?
RQ2: What are the major areas for engineers playing different
roles to collaborate during the requirements stage?
RQ3: What kind of support is needed for collaborative
requirements engineering for machine learning?
RQ4: What are the important issues that require further study?
We use these questions to direct the review of the literature.
We first examine the issues concerning different roles, and
summarize the scenarios in which collaboration and mutual
understanding are required. Then we give some example patterns
for cross-knowledge-area collaboration. Finally, we try to
propose a routine by which the patterns of collaboration are
evaluated and adapted for a given problem.
B. Search Strategy
Our search strategy was set out to find the conjunction of
requirements engineering, data science and machine learning.
We conduct a search-string-based database search on two
digital libraries, IEEE Xplore and ACM Digital Library.
To avoid missing related papers, we use as few words as
possible to filter the papers. We require the word requirements
in the title, and requirements engineering and machine
learning as author keywords. The search terms are combined
with AND operators. We restrict the year range to January 2016
through June 2022, since we focus on the research that follows
the recent trend of machine learning.
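The search criteria above can be read as a conjunction of three conditions. The sketch below expresses that conjunction as a filter over bibliographic records. It does not reproduce the query syntax of either digital library; the record fields (`title`, `author_keywords`, `published`), the function name and the choice of 30 June 2022 as the end date are assumptions made for illustration.

```python
from datetime import date

REQUIRED_TITLE_WORD = "requirements"
REQUIRED_KEYWORDS = ("requirements engineering", "machine learning")
# Assumed concrete bounds for "January 2016 to June 2022".
START, END = date(2016, 1, 1), date(2022, 6, 30)


def matches_search(record):
    """Return True if a record satisfies all three search conditions (AND)."""
    title_ok = REQUIRED_TITLE_WORD in record["title"].lower()
    keywords = {k.lower() for k in record["author_keywords"]}
    keywords_ok = all(k in keywords for k in REQUIRED_KEYWORDS)
    date_ok = START <= record["published"] <= END
    return title_ok and keywords_ok and date_ok


if __name__ == "__main__":
    sample = {  # hypothetical record, not a real search hit
        "title": "Requirements for Learning Systems",
        "author_keywords": ["Requirements Engineering", "Machine Learning"],
        "published": date(2020, 5, 1),
    }
    print(matches_search(sample))
```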
C. Inclusion and Exclusion Criteria
The above search strategy yielded 83 papers, 42 from IEEE
Xplore and 41 from ACM Digital Library. We first applied
our exclusion criteria to these papers. By our exclusion
criteria, we filtered out the publications whose topics have
little association with software engineering. An efficient way to
do this is to filter out the papers whose titles contain words
like teach, student, education and child. The large number of
papers that use machine learning to support requirements
engineering steps (commonly known as ML for RE) were
also filtered out because their motivations are not consistent
with our research goals. We found that some phrases in the
titles could help us locate them, like automatic elicitation,
automated identification, requirements classification and
machine learning-driven requirements. In addition to the above
filtering methods, we completed the exclusion by reading
the abstracts and checking the motivations. After applying the
exclusion criteria, only 16 papers were left.
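The title-based part of the exclusion step can be sketched as a simple keyword screen. The word and phrase lists below are the ones quoted in the text; everything else (the function name, the reason labels, and the use of case-insensitive substring matching) is an assumption for illustration. As the text notes, such a screen only pre-filters, and the remaining papers still need a manual reading of abstracts.

```python
OFF_TOPIC_WORDS = ("teach", "student", "education", "child")
ML_FOR_RE_PHRASES = (
    "automatic elicitation",
    "automated identification",
    "requirements classification",
    "machine learning-driven requirements",
)


def exclusion_reason(title):
    """Return a reason label if the title triggers exclusion, else None."""
    lowered = title.lower()
    # Substring matching also catches inflections such as "teaching".
    if any(word in lowered for word in OFF_TOPIC_WORDS):
        return "off-topic"
    if any(phrase in lowered for phrase in ML_FOR_RE_PHRASES):
        return "ml-for-re"
    return None


if __name__ == "__main__":
    titles = [  # hypothetical titles
        "Teaching Requirements Engineering to Students",
        "Automated Identification of Requirements with Machine Learning",
        "Requirements Engineering for Machine Learning Systems",
    ]
    for t in titles:
        print(t, "->", exclusion_reason(t))
```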
Then we conducted iterative backward and forward
snowballing via Google Scholar to refine our results, starting
from the remaining papers. The scope was limited to
software engineering methods for machine learning, machine
learning applications, and development issues of machine
learning, ranging from 2016 to 2022. The final list includes 163
papers. The filtering and refining were done by
the first two authors, and a detailed discussion was held to
reach consensus among all the authors.
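Iterative backward and forward snowballing can be sketched as a breadth-first traversal of a citation graph: backward steps follow a paper's reference list, forward steps follow the papers that cite it, and the process repeats until no new in-scope paper appears. The sketch below is an illustration under assumptions: the `references` mapping, the `in_scope` predicate and the paper identifiers are hypothetical, and in the actual review the scope decision was made by the authors rather than by a function.

```python
from collections import deque


def snowball(seeds, references, in_scope):
    """Expand a seed set by backward and forward citation links.

    references maps a paper id to the ids it cites (backward links);
    forward links are derived by inverting that mapping.
    """
    cited_by = {}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    selected = set(seeds)
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        backward = set(references.get(paper, ()))
        forward = cited_by.get(paper, set())
        for candidate in sorted(backward | forward):
            if candidate not in selected and in_scope(candidate):
                selected.add(candidate)
                queue.append(candidate)  # newly added papers are expanded too
    return selected


if __name__ == "__main__":
    refs = {"A": ["B", "X"], "C": ["A"], "D": ["C"], "B": []}
    print(sorted(snowball(["A"], refs, lambda p: p != "X")))
```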
III. SURVEY RESULTS AND DISCUSSION
We first list all the selected papers in Table I.
As an early milestone in the data-driven intelligence development
paradigm, the Cross-Industry Standard Process for Data
Mining (CRISP-DM) organizes related analytics activities into
six phases: Business Understanding, Data Understanding, Data
Preparation, Modeling, Evaluation and Deployment [168].
CRISP-DM prescribes a well-defined sequence of tasks
with iterative feedback loops, which suggests a requirements
analysis cycle of data preparation, model design and evaluation.
Recently, CRISP-ML(Q) extended CRISP-DM to support the
development of machine learning applications, with a special
focus on quality measurements of machine learning models,
including robustness, scalability, explainability, model
complexity and resource demands [169].
Vogelsang and Borg set out to define characteristics and
challenges unique to Requirements Engineering (RE) for ML-
based systems [20]. They identified several major changes
in development paradigms, including the elicitation of ML
performance measurements, the emergence of quality requirements
such as explainability, freedom from discrimination, and
specific legal requirements.
TABLE I: Topics of All the Selected Papers
Topics                                       Sum  Papers
Big Picture                                  15   [10-24]
Stakeholders, Roles and Collaboration        8    [25-32]
Requirements Process Model                   7    [33-39]
Requirements Elicitation and Specification   9    [3, 40-47]
Quality, Security, Ethics, and Assessment    38   [48-85]
Physics-Informed and Knowledge-based         19   [9, 86-103]
Machine Learning System Development          15   [103-117]
Interpretability and Explainability          17   [118-134]
Data Pipeline                                8    [135-142]
Model Provenance, Verification               7    [143-149]
Applications                                 18   [150-167]
TABLE II: Distribution of Requirements-Related Concerns for ML Applications
Business Experts
  Concerns (functional goals, non-functional requirements): Business Goals, Accuracy, Stability, Efficiency, Fairness
  Key challenges of RE for data-driven intelligence: In data-driven intelligent applications, the satisfaction of business goals is constrained by the limitations of technological solutions. Sometimes the business experts have to make compromises and accept a less than expected solution.
Requirements Engineers
  Concerns (functional goals, non-functional requirements): Stakeholders, User Stories, Domain Models, Resources, System Scope
  Key challenges of RE for data-driven intelligence: The requirements process for data-driven intelligence applications is more complex than traditional requirements engineering, and hence imposes changes on existing vocabulary and requirements analysis tools.
Software Engineers
  Concerns (functional goals, non-functional requirements): Prototyping, Architecture, Interface, Speed and Cost, Capacity
  Key challenges of RE for data-driven intelligence: The complexity of the software architecture requires extension to include data and machine learning models. What is more, it is harder to define a prototype that relies on a model that is not explainable.
Domain Experts
  Concerns (functional goals, non-functional requirements): Mechanism design, Data Explanation, Knowledge acquisition
  Key challenges of RE for data-driven intelligence: Domain experts share their understanding and knowledge about the working mechanism of a given problem. However, this is a progressive task, as our understanding of the domain evolves constantly.
Data Scientists
  Concerns (functional goals, non-functional requirements): Data Pipeline, Task Definition, Train Resources, Model Performance, Explainability
  Key challenges of RE for data-driven intelligence: It is extremely challenging for data scientists, as good quality data is always hard to get. Overcoming this limitation by making good use of the available data, and conveying technical limitations as early as possible, are equally important.
Reference: [170] [171] [79] [47] [80] [172] [173] [17] [22] [174] [175] [176] [87] [20] [177] [32]
There are many recent proposals on software engineering
approaches for machine learning applications. Amershi et
al. [178] studied several representative example ML projects
in Microsoft, in which several major challenges and success
factors are summarised, including: sustainable end-to-end
pipeline; data collection, cleaning and accessibility; model
evaluation, evolution and deployment, etc. Then a nine-stage
process model was proposed to address the above data-oriented
challenges (e.g., collection, cleaning, and labeling) and
model-oriented challenges (e.g., model requirements, feature
engineering, training, evaluation, deployment, and monitoring), in
which feedback loops are constructed from model evaluation
and monitoring back to the previous stages, and from model
training to feature engineering (e.g., in representation learning).
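The nine-stage process with its feedback loops can be pictured as a small directed graph. The sketch below is an illustration under assumptions rather than a reproduction of the model in [178]: the stage order is inferred from the stage names given in the text, and "back to the previous stages" is read as "back to any earlier stage". The names `STAGES` and `allowed_transitions` are hypothetical.

```python
# Assumed stage order; the text names the stages but not their sequence.
STAGES = [
    "model requirements",
    "data collection",
    "data cleaning",
    "data labeling",
    "feature engineering",
    "model training",
    "model evaluation",
    "model deployment",
    "model monitoring",
]


def allowed_transitions():
    """Return the set of (source, target) stage transitions."""
    edges = set()
    # Forward flow: each stage leads to the next one.
    for current, following in zip(STAGES, STAGES[1:]):
        edges.add((current, following))
    # Feedback from evaluation and monitoring to every earlier stage.
    for source in ("model evaluation", "model monitoring"):
        for earlier in STAGES[:STAGES.index(source)]:
            edges.add((source, earlier))
    # Feedback from training to feature engineering.
    edges.add(("model training", "feature engineering"))
    return edges


if __name__ == "__main__":
    for source, target in sorted(allowed_transitions()):
        print(f"{source} -> {target}")
```

Encoding the loops explicitly makes it easy to check which rework paths a given project plan permits.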
Nalchigar et al. [39] propose a modeling methodology
that represents generic ML designs as solution patterns for
business analytics. A pattern maps an actual business decision
goal to a few questions, which are then answered through
insights obtained from machine learning based on given data.
Washizaki et al. [179] review architectural patterns and design
patterns for ML systems covering different ML-related
tasks, such as a data lake for storage, provision of raw data for
analytics, decoupling of business logic from the machine learning
workflow, adoption of event-driven microservices, version
management of machine learning models, etc. The know-how
is rich and reusable but does not cover the ML application design
process systematically. Trustworthiness of ML applications
requires compliance with applicable laws and regulations,
as well as a series of domain-specific physical laws. Hence
the elicitation and evaluation of compliance has become
another major topic of interest in RE for ML. Sothilingam et al.
[180] conducted an empirical case study of three ML software
project organizations, and examined variations in project team
designs using the i* concepts of Agents, Roles, and Positions to
support the analysis of complex organizational relationships
in which roles and expertise are insufficiently mapped.
There are related studies on integrating scientific knowledge
with machine learning for engineering and environmental
systems, as well as hybrid modelling approaches that combine
machine learning and simulations [181]. The integration can
go both ways: either using ML to enhance domain models
where the cause-effect relations are not fully evident [182],
or using common-sense knowledge, common knowledge and
domain knowledge models to modify generic models for
a specific domain. This is also called physics-aware learning or
informed machine learning [98].
A. RQ1: What are the roles involved in engineering data-
driven intelligence applications?
In requirements engineering for traditional software de-
velopment, the main roles are business experts, software
requirements engineers and development engineers. A gen-
eral requirements process starts with defining the scope of
the business problem, which identifies the stakeholders by
establishing the extent of the work. The software requirements
engineer further identifies the requirements after requirements
elicitation and specification through communication with the
stakeholders, especially the business expert. When it comes to
requirements of machine learning (or data-driven intelligence)
functionalities, data scientists will take part in the RE process,
and domain experts also play an irreplaceable role in industrial
applications since domain knowledge is always necessary for
understanding relevant theory and scenarios.
We summarize the concerns and challenges in the process of
RE for ML in Table II. It is not an exhaustive list, but includes
the ones that are most frequently mentioned in the literature
related to data-driven intelligence requirements. For example,
fairness is introduced into the non-functional requirements since
machine learning models can be biased when the training datasets
are chosen in favor of a certain group. Stability becomes more
important than ever, as the predictions generated by machine
learning models can become unreliable when there are minor
changes in the situation.
The challenges represent urgent problems to be solved by
each role. For business experts, building a reasonable
understanding of the related technologies is valuable, as it gives
the proposed business goals firmer support. For requirements
engineers, researchers have in recent years proposed some novel
requirements modeling methods for machine learning applications,
considering factors like privacy [183], security [76],
scenarios [47] and goal revision [184]. For the other roles, the
challenges mainly come from multidisciplinary and technical
bottlenecks.
B. RQ2: What are the areas for the engineers to collaborate
during requirements stage?
In RE, there are many proven practices for the elicitation,
modeling, specification, verification and management of
requirements. These include goal-oriented modeling and analysis
of functional requirements using KAOS and of non-functional
requirements using the NFR framework, actor-based analysis of
organizational structures with iStar, and scenario-based description
of user-system interactions with use cases and user stories.
These approaches apply well to the requirements processes
of current industrial applications. For example, the Volere
Requirements Process [185] is generally applicable to any early
requirements stage in which we try to understand the business
context, form a system design idea, and verify it.
However, as discussed in Section III-A, the concerns
of each role have changed and more roles must be involved.
Digging into the concerns of each role, we can see the
connections between them. For example, the business goals from
business experts should be fulfilled by the prototypes from
development engineers, while the prototypes must correctly use
the machine learning models from data scientists. We describe
the connections in Fig. 1, where the roles are represented by
circles, and red lines highlight the analysis process of using
a data-driven ML approach to address a problem.
Here we list the most widely discussed collaboration-related
issues covered by the references.
• What should be considered if we want to use machine
learning models as expected? This issue covers a wide
range, including the widely discussed topic of XAI (or
trustworthy AI). Collaboration on this issue generally
happens between requirements engineers and data
scientists [77].
• How can software architectures be designed to enable
robust integration of machine learning models? This issue
exists because there is a huge gap between software
development technologies and data science. Architecture
design considerations have to include data quality,
uncertainty, privacy and so on. This issue belongs
to the partnership of development engineers and data
scientists [85].
• How can the process of requirements analysis be adapted
to machine learning systems? Due to the big gap between
traditional software and machine learning systems, existing
requirements methods have to be improved accordingly.