Code Librarian: A Software Package Recommendation System

Lili Tao, Alexandru-Petre Cazan, Senad Ibraimoski and Sean Moran

JP Morgan Chase

Email: {lili.tao,alexandru-petre.cazan,senad.ibraimoski,sean.j.moran}@jpmchase.com

Abstract: The use of packaged libraries can significantly shorten the software development life cycle by improving the quality and readability of code. In this paper, we present a recommendation engine called Code Librarian for open source libraries. A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation; and 4) it can be used efficiently in the context of the provided code. We apply a state-of-the-art CodeBERT-based model to analyse the context of the source code and deliver relevant library recommendations to users.

Index Terms: artificial intelligence, software engineering, recommender systems

I. INTRODUCTION

Reusing existing software libraries brings many benefits, including the acceleration of software development and an increase in the quality and readability of code. In this paper, we introduce Code Librarian, a software library recommendation system that uses machine learning techniques to suggest relevant open source libraries based on the context of the code already written by a developer [1], [2]. For Python developers there are more than 350,000 libraries [3] available on PyPI, and new library packages are frequently added. In addition, standard library practices evolve rapidly across various tasks. For example, in the field of Natural Language Processing (NLP) the commonly used libraries quickly expanded from scikit-learn and gensim to bertopic, top2vec, and octis, following recent advances in NLP. Librarian is an intelligent coding assistant that helps developers find and reuse quality code and components.

II. APPROACH AND METHODOLOGY

Figure 1 shows the approach: a) recommendation of complementary libraries by learning which libraries are used most frequently with those already imported; b) recommendation of alternative libraries that can replace functionally similar code.

A. Complementary library recommendation

Learning embeddings for library packages: To discover complementary libraries we learnt a contextual embedding of libraries based on their co-occurrence in the same scripts. We followed [4] for learning the vector representation of library packages, in which a skip-gram model [5] is used to learn embeddings for libraries based on their usage context. A pair of imported libraries is deemed a positive example when the target library co-occurred with the context library within a file of at least one project. A negative pair consists of libraries that were rarely imported together in any source file of any project in the dataset. Cosine similarity between the embeddings is then used to find the most similar, and therefore complementary, libraries.
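
The two ends of this step, collecting positive pairs and ranking by cosine similarity, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the skip-gram training itself is omitted, the helper names (`imported_libraries`, `positive_pairs`, `most_similar`) are hypothetical, and the two-file corpus and two-dimensional embeddings are made up for the example.

```python
import ast
import math
from itertools import permutations


def imported_libraries(source: str) -> set[str]:
    """Return the top-level package names imported by one Python source file."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def positive_pairs(files: list[str]) -> set[tuple[str, str]]:
    """Collect (target, context) pairs of libraries imported in the same file."""
    pairs = set()
    for source in files:
        pairs.update(permutations(sorted(imported_libraries(source)), 2))
    return pairs


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query: str, embeddings: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank the other libraries by cosine similarity to the query library."""
    others = [name for name in embeddings if name != query]
    others.sort(key=lambda n: cosine(embeddings[query], embeddings[n]), reverse=True)
    return others[:k]


# Hypothetical two-file corpus and hand-made 2-d embeddings, for illustration only.
files = [
    "import numpy as np\nimport pandas\n",
    "from sklearn.cluster import KMeans\nimport numpy\n",
]
pairs = positive_pairs(files)
toy_embeddings = {"numpy": [1.0, 0.1], "pandas": [0.9, 0.2], "flask": [0.0, 1.0]}
ranked = most_similar("numpy", toy_embeddings, k=2)
```

Here `pairs` contains both orderings of each co-imported pair, so every library serves as target and as context; in a real pipeline the embeddings would come from the trained skip-gram model rather than being written by hand.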

Out of sample extension: For new library packages not included in the training data, rather than re-training the model, an embedding of the new package is obtained by projecting it into the latent space. The new embedding is calculated as the weighted average of the embeddings of the $N$ packages that co-occur with it in the same file, with each weight representing the number of times the pair appeared together:

$$\frac{\sum_{i=1}^{N} w_i P_i}{\sum_{i=1}^{N} w_i},$$

where the weight $w_i$ is the number of times the unseen library co-occurred with library $i$, and $P_i$ is the embedding of library $i$.
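
The weighted average can be computed directly from co-occurrence counts. The sketch below assumes embeddings are stored as plain lists of floats keyed by library name; the function name `out_of_sample_embedding` and the sample values are hypothetical.

```python
def out_of_sample_embedding(
    cooccurrences: dict[str, int],
    embeddings: dict[str, list[float]],
) -> list[float]:
    """Weighted average of the embeddings of known co-occurring libraries.

    cooccurrences maps a known library name to w_i, the number of times it
    appeared in the same file as the unseen library.
    """
    # Keep only libraries that have a trained embedding and a positive count.
    known = {n: w for n, w in cooccurrences.items() if n in embeddings and w > 0}
    total = sum(known.values())
    if total == 0:
        raise ValueError("the new library co-occurs with no known library")
    dim = len(next(iter(embeddings.values())))
    result = [0.0] * dim
    for name, weight in known.items():
        for d, value in enumerate(embeddings[name]):
            result[d] += weight * value
    return [x / total for x in result]


# Hypothetical embeddings and co-occurrence counts, for illustration only.
embeddings = {"numpy": [1.0, 0.0], "pandas": [0.0, 1.0]}
new_vec = out_of_sample_embedding({"numpy": 3, "pandas": 1}, embeddings)
print(new_vec)  # [0.75, 0.25]
```

Because the result is a convex combination of existing embeddings, the new library always lands inside the region spanned by its co-occurring neighbours, which is what makes it comparable by cosine similarity without re-training.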

B. Alternative library recommendation

Understanding the topic and functionality of source code assists with the selection of relevant libraries. We leverage CodeBERT [6] to learn the contextual representation of the code and capture the semantic connection between natural language and programming language. CodeBERT is applied to generate a text description of each function or, for IPython notebook files, of each Jupyter notebook cell. We concatenate the text descriptions and use the result as a query. The query is used to retrieve matching libraries based on their descriptions, using a vector-space retrieval method (bag-of-words, TF-IDF).
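
The retrieval step can be illustrated with a small pure-Python TF-IDF index. This is a sketch under stated assumptions: the library descriptions are invented, the query is written by hand where the system would use the concatenated CodeBERT-generated descriptions, and the smoothed IDF weighting is one common choice rather than the weighting the paper reports.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(descriptions: dict[str, str]):
    """Compute smoothed IDF weights and a TF-IDF vector per library description."""
    docs = {name: Counter(tokenize(text)) for name, text in descriptions.items()}
    n_docs = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1.0 for t, f in df.items()}
    vectors = {
        name: {t: c * idf[t] for t, c in counts.items()}
        for name, counts in docs.items()
    }
    return idf, vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as term-to-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query: str, idf, vectors, k: int = 2) -> list[str]:
    """Return the k libraries whose descriptions best match the query text."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in counts.items()}
    ranked = sorted(vectors, key=lambda n: cosine(q_vec, vectors[n]), reverse=True)
    return ranked[:k]


# Hypothetical library descriptions and a hand-written query, for illustration only.
descriptions = {
    "requests": "send http requests and handle responses",
    "pandas": "data frames for tabular data analysis",
    "beautifulsoup4": "parse html and xml documents",
}
idf, vectors = build_index(descriptions)
top = retrieve("download a page over http and read the response", idf, vectors, k=1)
```

Query terms that never occur in any library description are dropped, so the ranking depends only on vocabulary shared between the generated code description and the library descriptions.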

C. Deployment of Librarian

We developed a demo, shown in Figure 2. CodeBERT was packaged and deployed on AWS SageMaker, while the main application was deployed as a service on a Kubernetes cluster. The user receives library recommendations after uploading a Jupyter notebook file. For a more seamless user experience, we built a Jupyter notebook extension which subscribes to cell change events and recommends libraries in real time. On every cell update event, the current notebook source code is sent to a CodeBERT model for inference and the results are shown in a sub-panel (Figure 3).
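
The paper does not describe how the notebook source is extracted before it is sent for inference. One plausible sketch, relying only on the fact that an `.ipynb` file is a JSON document whose `cells` entries carry a `cell_type` and a `source`, is shown below; the function name `notebook_code` and the sample notebook are hypothetical.

```python
import json


def notebook_code(notebook_json: str) -> list[str]:
    """Return the source of each code cell in an .ipynb (nbformat 4) document."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows the source to be a string or a list of line strings.
        cells.append(source if isinstance(source, str) else "".join(source))
    return cells


# Minimal hand-written notebook, for illustration only.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["import numpy as np\n", "x = np.ones(3)"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": "print(x)"},
    ],
})
cells = notebook_code(example)
payload = "\n\n".join(cells)  # text that would be sent to the model endpoint
```

Keeping the cells as separate strings also matches the per-cell granularity at which text descriptions are generated, while `payload` shows the whole-notebook form an extension could send on each cell update.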

III. EXPERIMENTAL RESULTS

The proposed system has been evaluated on 375,128 publicly available Python files from GitHub, and on 11,893 files from a proprietary repository.

Complementary library recommendation: To evaluate complementary library recommendation, we randomly remove one

arXiv:2210.05406v2 [cs.SE] 7 Feb 2023
