OSS Mentor: A framework for improving developers’ contributions via
deep reinforcement learning
Jiakuan Fan
School of Data Science and Engineering
East China Normal University
Shanghai, China
jkfan@stu.ecnu.edu.cn
Haoyue Wang
School of Data Science and Engineering
East China Normal University
Shanghai, China
51195100024@stu.ecnu.edu.cn
Wei Wang
School of Data Science and Engineering
East China Normal University
Shanghai, China
wwang@dase.ecnu.edu.cn
Ming Gao
School of Data Science and Engineering
East China Normal University
Shanghai, China
mgao@dase.ecnu.edu.cn
Shengyu Zhao
College of Electronical and Information Engineering
Tongji University
Shanghai, China
frank zsy@tongji.edu.cn
Abstract—In open source project governance, there has been much concern about how to measure developers' contributions. However, very little work has focused on enabling developers to improve their contributions, even though such work is significant and valuable. In this paper, we introduce a deep reinforcement learning framework named Open Source Software (OSS) Mentor, which can be trained from empirical knowledge and then adaptively help developers improve their contributions. Extensive experiments demonstrate that OSS Mentor achieves excellent experimental results. Moreover, this is the first framework to explore deep reinforcement learning techniques for managing open source software, which enables us to design a more robust framework for improving developers' contributions.
Index Terms—open source software, contribution measurement, contribution enhancement, deep reinforcement learning
I. INTRODUCTION
In recent years, open source software, exemplified by Apache projects, has achieved great success [17]. Developers with different development experience from different regions gather spontaneously, motivated by their interests, prestige, and employment needs [18]. Developers of open source software generally contribute to a project by sharing experience [12], debugging code [1], and submitting functional patches [2]. Many studies have used project contribution as an important indicator for evaluating the status of developers [7].
There are many use cases in which we need to compare and recognize different developers' contributions. While traditional value-based software engineering [3], [4], [8] focuses on creating economic value as a way to prioritize resource allocation and scheduling, other measurements of value may be more relevant in some use cases. For example, instructors need a tool to evaluate individual students' code contributions to group projects (besides non-code contributions); such a measurement of code contributions has nothing to do with economic returns. As a second example, an engineering manager may need a quantitative measurement of team members' performance. Additionally, in open source software projects, developers' contributions heavily influence collaboration, coordination, and leadership [11], [14].
Therefore, the main research significance of this topic lies in modeling all the data of a project's top contributors and quantitatively analyzing the correlations among their actions, so as to provide insights for improving developers' contributions. Increasing developers' contributions helps software engineering project managers set up project teams based on developers' profiles, thereby improving the productivity and code quality of the teams' development. As far as we know, there is no guidance on how to improve developers' contributions in open source projects; existing guidance focuses mostly on how to participate in open source projects. Because developers can contribute in many different ways, a general guide is difficult to produce.
At present, when developers want to participate in open source projects, the most common form of guidance is contribution guidelines. Contribution guidelines are textual documentation files that embody a software project's contribution process and document the contribution expectations of project maintainers. However, what contribution guidelines contain, and whether projects adhere to the workflows they prescribe, has yet to be explored. Currently, almost no work aims to improve developers' contributions in open source software, even though such work is significant and valuable.
In this paper, we first define a contribution model that reflects the relative importance of actions over time, considering all possible actions (both code and non-code) from the perspective of the whole project. We then propose the Open Source Software (OSS) Mentor framework, which helps developers maximize their contributions by translating the practical problem into a reinforcement learning problem. In addition, we significantly improve the performance of the algorithm by improving the utilization of parameters during training. The main contributions of this paper are as follows:
arXiv:2210.13990v1 [cs.SE] 24 Oct 2022
1. We propose a data-driven contribution evaluation model, which can dynamically measure developers' contributions as the underlying data change.
2. We address the challenge of how to improve developers' contributions, a significant problem that has rarely been studied so far.
3. This is the first framework to explore deep reinforcement learning techniques for managing open source software, which enables us to design a more robust framework for improving developers' contributions.
4. We have performed extensive experiments demonstrating the effectiveness of our proposed framework.
The rest of the paper is organized as follows. Section II presents our proposed framework, OSS Mentor. Section III validates our model. Section IV presents related work, and Sections V and VI present the discussion and summary.
II. OSS MENTOR FRAMEWORK
In this section, we first present a framework for contribution assessment and show the overall architecture of the proposed model. Next, we quantify the essential elements of reinforcement learning in the context of the practical problem. After that, we illustrate the algorithmic flow of the model based on these foundations. Finally, we describe the training process of the model.
A. Overview
Definition of contribution. Previous work on measuring developers' contributions remained largely at the visualization level: contributions were quantified by directly counting issues, issue comments, pull requests (PRs), PR comments, etc., and summing the counts with empirically assigned weights [16]. This method is unreliable because the weights are not obtained by analyzing the project's data, so the results do not conform to objective patterns, and they reflect neither the characteristics of individual projects nor their changes over time. In our work, we first introduce the concept of entropy to measure how well each action distinguishes developers, and then analyze project-wide data to determine the weights of the actions from an objective perspective. The entropy is calculated as:
$$H(X) = -\sum_{i=1}^{n} P(x_i)\log P(x_i) \qquad (1)$$
where $P(x_i)$ is the probability of event $x_i$. In information theory, entropy measures how dispersed a distribution is: the more dispersed the distribution, the greater the entropy and the greater the amount of information it represents, so entropy is maximized when the distribution is uniform. In our work, we select the action events that developers actually execute in open source projects and measure the dispersion of each action by computing its entropy. The higher the entropy $H(X)$ on the i-th action dimension, the more evenly that action is spread across developers, and thus the less it distinguishes one developer from another; in other words, everyone is similarly inclined to perform it. Next, we use the entropy weight method to determine the importance of each action in an open source project.
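The entropy weight method referred to above is not spelled out in this excerpt. The following is a minimal Python sketch of the textbook version of that method, assuming a hypothetical matrix `counts` whose rows are developers and whose columns are action types (e.g., issues, issue comments, PRs). The normalization by $\ln m$, the use of $1 - e_i$ as the divergence, and the zero weight for actions nobody performed are standard or illustrative choices, not details confirmed by the source. Like Equation 1, it treats actions as independent.

```python
import math


def entropy_weights(counts):
    """Textbook entropy weight method (a sketch, not the paper's exact code).

    counts[j][i] = number of times developer j performed action type i
    (hypothetical layout: rows are developers, columns are action types).
    Returns one weight per action type; the weights sum to 1.
    """
    m = len(counts)                  # number of developers (needs m >= 2)
    n = len(counts[0])               # number of action types
    k = 1.0 / math.log(m)            # rescales each entropy into [0, 1]
    divergences = []
    for i in range(n):
        column = [row[i] for row in counts]
        total = sum(column)
        if total == 0:
            # Assumption: an action nobody performed cannot distinguish developers.
            divergences.append(0.0)
            continue
        # Normalized entropy of action i across developers (0 * log 0 := 0).
        e = 0.0
        for x in column:
            if x > 0:
                p = x / total
                e -= p * math.log(p)
        e *= k
        # Low entropy = action concentrated on few developers = more discriminative.
        # max() clamps tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - e))
    s = sum(divergences)
    return [d / s for d in divergences]


weights = entropy_weights([[10, 1], [10, 0], [10, 9]])
print(weights)
```

In this example the first action is performed equally often by all three developers, so its normalized entropy is 1 and its weight is (numerically) zero, while the concentrated second action receives essentially all of the weight. The sketch needs at least two developers and at least one action whose counts differ across developers; otherwise the normalizing sums are zero.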
However, the entropy weight method presupposes that actions are independent. In practice, actions share mutual information (e.g., issues and issue comments are strongly correlated), so this assumption does not hold and the entropy weighting scheme cannot be applied directly. To address this mutual-information problem, we replace information entropy with conditional entropy:
$$H_i(Y \mid X) = -\sum_{x \in X,\, y \in Y} p_i(x, y)\log\frac{p_i(x, y)}{p_i(x)} \qquad (2)$$
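Equation 2 can be estimated from paired observations. The Python sketch below computes $H(Y \mid X)$ from a list of hypothetical $(x, y)$ samples, for example a developer's binned issue count paired with their binned issue-comment count. The sampling scheme, and the step that turns conditional entropies into the weight vector, are not specified in this excerpt, so the sketch stops at the conditional entropy itself.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Estimate H(Y|X) = -sum_{x,y} p(x,y) * log(p(x,y) / p(x)) from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)                        # empirical counts of (x, y)
    marginal_x = Counter(x for x, _ in pairs)     # empirical counts of x
    h = 0.0
    for (x, _), c in joint.items():
        p_xy = c / n
        p_x = marginal_x[x] / n
        h -= p_xy * math.log(p_xy / p_x)          # natural log, so the result is in nats
    return h


# Y fully determined by X: knowing X leaves no uncertainty about Y.
print(conditional_entropy([(0, 0), (1, 1), (0, 0), (1, 1)]))
# Y independent of X and uniform: H(Y|X) equals H(Y) = log 2.
print(conditional_entropy([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

The two calls bracket the possible range: a strongly dependent pair of actions (such as issues and issue comments) yields a conditional entropy near zero, while unrelated actions keep their full entropy. Any logarithm base works as long as it matches the one used in Equation 1.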
By introducing conditional probabilities, we drop the assumption that actions in open source projects are independent of each other (e.g., issues and issue comments are not independent). With this method, we can calculate the weight vector $W_i$ over the action dimensions of project $i$. Finally, the contribution is calculated as:
$$C_{(i,k)} = \sum_{t=1}^{T} W_{(i,k)} A^{t}_{(i,k)} \qquad (3)$$
where $A^{t}_{(i,k)}$ denotes the number of times each action is executed at the t-th step of the k-th episode of the i-th project. $W_{(i,k)}$ is computed from the conditional entropy model and is normalized, i.e., $\sum_{t=1}^{T} W_t = 1$.
Our approach to determining the weights is significant in two respects. First, the weights are determined entirely from data, unlike previous work [16], in which they are set manually based on expert experience. Second, and most importantly, the weights are dynamic: changes in project data and project status over time, among other factors, cause the weights to be updated. Therefore, the weights we define reflect not only the relative importance of actions over time, but also the project-specific importance of actions across different projects.
Proposed model. This is the first time that deep reinforcement learning has been explored in the field of open source project governance. The goal of the model is to maximize the developer's cumulative contribution over multiple executions of actions. The detailed flow of the model is given in Figure 1. First, the environment is a pre-trained model that uses the contribution quantification method (Equation 3) to pre-train the weight vector $W_i$, and it contains the contributors' action dataset $E_i = \{e_i^1, e_i^2, e_i^3, \cdots, e_i^n\}$. The developer, acting as the agent, selects action $a^t_{d,i}$ in state $s^t_i$ according to its own policy. At this point, $a^t_{d,i}$ is matched against the action sequence $a^t_{e,i}$ of the corresponding contributor whose ability matches the developer's current state, drawn from a project $P_i$