Predicting the Citation Count and CiteScore of Journals
One Year in Advance*

William L. Croft (a,†), Jörg-Rüdiger Sack (a)

(a) School of Computer Science, Carleton University, Ottawa, Canada.
Abstract
Prediction of the future performance of academic journals is a task that can
benefit a variety of stakeholders including editorial staff, publishers, indexing
services, researchers, university administrators and granting agencies. Using
historical data on journal performance, this can be framed as a machine
learning regression problem. In this work, we study two such regression tasks:
1) prediction of the number of citations a journal will receive during the next
calendar year, and 2) prediction of the Elsevier CiteScore a journal will be
assigned for the next calendar year. To address these tasks, we first create
a dataset of historical bibliometric data for journals indexed in Scopus. We
propose the use of neural network models trained on our dataset to predict
the future performance of journals. To this end, we perform feature selection
and model configuration for a Multi-Layer Perceptron and a Long Short-
Term Memory. Through experimental comparisons to heuristic prediction
baselines and classical machine learning models, we demonstrate superior
performance in our proposed models for the prediction of future citation and
CiteScore values.
Keywords: Impact metrics, Predictive modeling, Neural networks, Long
Short-Term Memory, CiteScore
* Funding: This work was supported by the Natural Sciences and Engineering Research
Council of Canada (NSERC) [grant number RGPIN-2016-06253].
† Corresponding author.
Email addresses: leecroft@cmail.carleton.ca (William L. Croft),
sack@scs.carleton.ca (Jörg-Rüdiger Sack)
1. Introduction
Academic journals are one of the primary channels through which sci-
entific research is published and disseminated to both research communities
and wider audiences. The performance of such journals is a matter of inter-
est to a variety of stakeholders for diverse reasons. While information on the
past and present performance of journals is crucial in this regard, the ability
to predict future performance is also highly valuable, as it provides complementary
information that supports a deeper understanding of a journal's state. The more
detail that can be provided on the prob-
able future performance of a journal, the easier it becomes to make decisions
that are impacted by the future state of the journal. There are numerous
situations in which this type of information is of value:
• Editorial boards for journals are directly responsible for ensuring a con-
sistent level of quality in the journals they manage. Decisions around
journal management and editorial board composition are made with
the future performance of the journal explicitly in mind.
• Indexing services must ensure that only journals of sufficient quality
are accepted for indexing. Knowledge about the future performance
of journals therefore helps to inform decisions about whether new journals
should be accepted and whether currently indexed journals should be
removed.
• Publishers must monitor the performance of journals (both their own
and those of other publishers) to inform strategic decision-making.
Knowledge about the trajectory of journal performance and its pro-
jection into the future can help to make decisions on the acquisition of
existing journals and the launching of new journals.
• Some granting agencies maintain lists of journals categorized by strength.
These lists are updated regularly and early indications of which jour-
nals to move up or down might be helpful for granting agencies and
applicants.
• To academic institutions and countries, the quality of relevant journals
may serve as an indicator in the assessment of their scientific output.
Information on the expected performance of journals may help to in-
form decisions on budget allocation.
• For research groups and authors, information on journal quality is often
factored into the decision on where to submit their work. Due to the
lengthy duration of most review and publication processes, the future
performance of a journal is highly relevant in making this decision.
• To librarians building collections of journals, the performance of a jour-
nal may help to assess its merit for selection. Information on future
performance helps in the selection of journals leading to a sustainable
collection.
The information needed to quantitatively measure the performance of a
journal for a future point in time is necessarily incomplete since the events
contributing to any relevant performance metrics have not yet fully occurred.
It is therefore necessary to use predictive modeling to produce this informa-
tion. In this paper, we study the task of predicting the future performance of
journals. To this end, we construct a dataset on the historical performance
of journals and develop predictive models trained on this data.
1.1. Problem Domain
We consider two predictive tasks for the purpose of projecting the future
performance of journals.
Task 1: For a given journal, predict how many citations it will receive in
the next calendar year.
Alone, the number of citations received may not be indicative of journal
performance. However, this value offers great flexibility in its usage as it
can be fed into various calculations or models to assess journal performance.
Simple examples include examination of the predicted value in relation to the
average number of yearly publications of the journal, or plotting the citations
received by the journal during past years with the predicted value used to
extrapolate the plotted series. The number of citations may also be used as
an input feature to more sophisticated models that examine multiple aspects
of a journal.
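To make these simple downstream uses concrete, here is a brief sketch in Python; the journal record, the predicted value, and the average publication count are hypothetical placeholders rather than entries from our dataset:

    # Illustrative sketch only: the journal record and predicted value are hypothetical.
    import matplotlib.pyplot as plt

    history = {2017: 410, 2018: 455, 2019: 430, 2020: 470}  # citations received per year
    predicted_2021 = 495                                    # output of the citation model
    avg_yearly_publications = 120                           # journal's average yearly output

    # Relate the prediction to the journal's publication volume.
    citations_per_document = predicted_2021 / avg_yearly_publications
    print(f"Predicted citations per document: {citations_per_document:.2f}")

    # Extrapolate the historical citation series with the predicted value.
    years = sorted(history) + [2021]
    values = [history[y] for y in sorted(history)] + [predicted_2021]
    plt.plot(years[:-1], values[:-1], marker="o", label="observed")
    plt.plot(years[-2:], values[-2:], linestyle="--", marker="x", label="predicted")
    plt.xlabel("Year")
    plt.ylabel("Citations received")
    plt.legend()
    plt.show()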
Task 2: For a given journal, predict its Elsevier CiteScore (Elsevier, 2020a)
for the next calendar year.
As an impact-based metric, CiteScore values are useful indicators of the
performance of journals. We have chosen CiteScore over other impact-based
measures such as Journal Impact Factor due to its transparent and repro-
ducible calculation process. Further information on the strengths and weak-
nesses of various measures of impact is provided in Section 2.1.
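For reference, the publicly documented CiteScore methodology (as revised by Elsevier in 2020) can be sketched as follows; this is a summary of Elsevier's public description rather than a formula introduced in this paper:

    \mathrm{CiteScore}_y = \frac{C_{[y-3,\,y]}}{D_{[y-3,\,y]}}

where C_{[y-3, y]} denotes the citations received in years y-3 through y by documents
published in those same years, and D_{[y-3, y]} denotes the number of such documents.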
While journal performance often does not change greatly from one year to
the next, there is still ample room for improvement over simple heuristic pre-
dictions (e.g., predicting the same value for the next year or a value following
from a linear trend), as later demonstrated in our experimental comparisons
(see Sections 7 and 8). Furthermore, the task of predicting values one year
in advance is an important step towards making longer range predictions.
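The heuristic baselines mentioned above can be sketched in a few lines of Python; the function names and the least-squares trend fit are illustrative assumptions rather than the exact baseline definitions evaluated later:

    # Illustrative sketch of the two heuristic baselines mentioned above.
    # Function names and the least-squares trend fit are assumptions, not the
    # exact baseline definitions used in the experiments.
    import numpy as np

    def persistence_baseline(history: list[float]) -> float:
        """Predict that next year's value equals the most recent observed value."""
        return history[-1]

    def linear_trend_baseline(history: list[float]) -> float:
        """Fit a straight line to the observed series and extrapolate one year ahead."""
        years = np.arange(len(history))
        slope, intercept = np.polyfit(years, history, deg=1)
        return slope * len(history) + intercept

    citations = [410.0, 455.0, 430.0, 470.0]  # hypothetical yearly citation counts
    print(persistence_baseline(citations))    # 470.0
    print(linear_trend_baseline(citations))   # 480.0, following the fitted trend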
We formalize the two predictive tasks as regression problems. For each,
we consider the use of various input features on the historical performance
of journals to predict the desired value. We investigate the potential of
deep learning approaches in this context and propose two such models which
achieve good performance.
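As an illustration of this regression framing, the following is a minimal Keras sketch of an LSTM regressor over per-journal yearly feature sequences; the layer sizes, window length, feature count, and training settings are placeholders, not the configurations selected in this work:

    # Minimal sketch of framing the tasks as sequence regression with an LSTM.
    # All dimensions and hyperparameters below are placeholders.
    import numpy as np
    import tensorflow as tf

    n_journals, n_years, n_features = 1000, 5, 8   # hypothetical dataset dimensions
    X = np.random.rand(n_journals, n_years, n_features).astype("float32")
    y = np.random.rand(n_journals).astype("float32")  # e.g., next-year CiteScore

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_years, n_features)),
        tf.keras.layers.LSTM(32),          # summarizes the journal's recent history
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),          # single regression output
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    model.fit(X, y, validation_split=0.2, epochs=10, batch_size=64)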
1.2. Contributions and Paper Outline
The primary contributions of this work are as follows:
• We have collected and assembled a dataset covering publication metrics
for over 24,000 journals indexed in Scopus during the period of 2000
through 2020 (Section 4).
• We have applied four standard machine learning algorithms (Linear
Regression, Decision Tree, Random Forest and k-Nearest Neighbors) to
establish baselines for the predictive tasks we have set out (Subsections
5.3 and 6); a brief illustrative sketch of these baselines follows this list.
• We have identified and configured two deep neural network models
appropriate for our predictive tasks, one of which is particularly well-
suited to handling the type of time-series data we have compiled (Subsections
5.4 and 6).
• We have conducted experimental evaluations of our proposed models,
demonstrating improvements over the selected baselines (Section 7).
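As a brief illustration of the classical baselines listed above, the following scikit-learn sketch trains the four regressors on flattened journal features; the synthetic data and hyperparameters are placeholders, not the settings evaluated in our experiments:

    # Illustrative sketch of the four classical baselines; the synthetic data and
    # hyperparameters are placeholders, not the settings used in the experiments.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    X = rng.random((1000, 40))   # flattened per-journal historical features
    y = rng.random(1000)         # target: next-year citations or CiteScore

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    baselines = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=0),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
        "k-Nearest Neighbors": KNeighborsRegressor(n_neighbors=5),
    }

    for name, model in baselines.items():
        model.fit(X_train, y_train)
        mae = mean_absolute_error(y_test, model.predict(X_test))
        print(f"{name}: MAE = {mae:.3f}")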
We begin by providing a review of relevant literature in Section 2. In
Section 3, we provide preliminary background information on concepts used
throughout the paper. We describe the data that we have collected and
the process we have applied to build a dataset from it in Section 4. We then
describe the features we have selected for the predictive tasks and the machine
learning models we have identified to handle these tasks in Section 5. We
provide details on the experiments we have performed for the configuration of
the models in Section 6. In Section 7, we detail the experimental comparisons
we have conducted on finalized configurations of the predictive models and
the results we have obtained. We conclude the paper in Section 9.
2. Literature Review
The evaluation of performance in academic journals is dependent upon
the definition of metrics that provide meaningful information on various prop-
erties of journals. Among such metrics, those which examine the impact of
journals are often associated with some indication of the quality of journals.
We provide here a review of some of the most commonly-used measures of
impact and the works that have designed predictive tasks around them. We
additionally provide a brief review of predictive tasks related to paper-level
measures of impact.
2.1. Journal Bibliometrics
The field of bibliometrics pertains to the statistical analysis of publi-
cations and their channels of publication (i.e., sources such as journals)
Manolopoulos & Katsaros (2017); Roldan-Valadez et al. (2019); García-Villar
& García-Santos (2021). In the context of academic journals, bibliometrics
often focuses on measures built around numbers of citations and publications
to assess the quality of journals. While general terms such as performance or
quality are often used when describing the goal of journal-centric bibliomet-
rics, these notions are typically vague and ill-defined. The more precise intent
of many bibliometric measures is to capture scientific impact, generally em-
ploying some form of a ratio of citations received over documents published
Kim Kihong (2018). The notion of an “impact factor” in scientific literature
was first introduced in the context of the proposal by Garfield to develop a
citation index to track the linkages between indexed publications and their
citations Garfield (2006). Impact factor refers to the effect a publication or
a corpus of published literature has on the research community, observed
through the lens of citations attributed to the published work. Although the
measure of impact has a precise meaning that was not originally intended to
be employed as a general reflection of quality, it has found widespread usage
as a proxy for journal quality Prathap (2012).