Predicting the Citation Count and CiteScore of Journals
One Year in Advance*

William L. Croft (a,†), Jörg-Rüdiger Sack (a)

(a) School of Computer Science, Carleton University, Ottawa, Canada.
Abstract
Prediction of the future performance of academic journals is a task that can
benefit a variety of stakeholders including editorial staff, publishers, indexing
services, researchers, university administrators and granting agencies. Using
historical data on journal performance, this can be framed as a machine
learning regression problem. In this work, we study two such regression tasks:
1) prediction of the number of citations a journal will receive during the next
calendar year, and 2) prediction of the Elsevier CiteScore a journal will be
assigned for the next calendar year. To address these tasks, we first create
a dataset of historical bibliometric data for journals indexed in Scopus. We
propose the use of neural network models trained on our dataset to predict
the future performance of journals. To this end, we perform feature selection
and model configuration for a Multi-Layer Perceptron and a Long Short-
Term Memory. Through experimental comparisons to heuristic prediction
baselines and classical machine learning models, we demonstrate superior
performance in our proposed models for the prediction of future citation and
CiteScore values.
Keywords: Impact metrics, Predictive modeling, Neural networks, Long
Short-Term Memory, CiteScore
* Funding: This work was supported by the Natural Sciences and Engineering Research
Council of Canada (NSERC) [grant number RGPIN-2016-06253].
† Corresponding author.
Email addresses: leecroft@cmail.carleton.ca (William L. Croft),
sack@scs.carleton.ca (Jörg-Rüdiger Sack)
1. Introduction
Academic journals are one of the primary channels through which sci-
entific research is published and disseminated to both research communities
and wider audiences. The performance of such journals is a matter of inter-
est to a variety of stakeholders for diverse reasons. While information on the
past and present performance of journals is crucial in this regard, the ability
to predict future performance is also highly valuable, as it provides complementary
information that supports a deeper understanding of a journal's state. The more
detail that can be provided on the prob-
able future performance of a journal, the easier it becomes to make decisions
that are impacted by the future state of the journal. There are numerous
situations in which this type of information is of value:
• Editorial boards for journals are directly responsible for ensuring a con-
sistent level of quality in the journals they manage. Decisions around
journal management and editorial board composition are made with
the future performance of the journal explicitly in mind.
• Indexing services must ensure that only journals of sufficient quality
are accepted for indexing. Knowledge about the future performance
of journals therefore helps to inform decisions about whether new journals
should be accepted and whether currently indexed journals should be
removed.
• Publishers must monitor the performance of journals (both their own
and those of other publishers) to inform strategic decision-making.
Knowledge about the trajectory of journal performance and its pro-
jection into the future can help to make decisions on the acquisition of
existing journals and the launching of new journals.
• Some granting agencies maintain lists of journals categorized by strength.
These lists are updated regularly and early indications of which jour-
nals to move up or down might be helpful for granting agencies and
applicants.
• To academic institutions and countries, the quality of relevant journals
may serve as an indicator in the assessment of their scientific output.
Information on the expected performance of journals may help to in-
form decisions on budget allocation.
• For research groups and authors, information on journal quality is often
factored into the decision on where to submit their work. Due to the
lengthy duration of most review and publication processes, the future
performance of a journal is highly relevant in making this decision.
• To librarians building collections of journals, the performance of a jour-
nal may help to assess its merit for selection. Information on future
performance helps in the selection of journals leading to a sustainable
collection.
The information needed to quantitatively measure the performance of a
journal for a future point in time is necessarily incomplete since the events
contributing to any relevant performance metrics have not yet fully occurred.
It is therefore necessary to use predictive modeling to produce this informa-
tion. In this paper, we study the task of predicting the future performance of
journals. To this end, we construct a dataset on the historical performance
of journals and develop predictive models trained on this data.
1.1. Problem Domain
We consider two predictive tasks for the purpose of projecting the future
performance of journals.
Task 1: For a given journal, predict how many citations it will receive in
the next calendar year.
Alone, the number of citations received may not be indicative of journal
performance. However, this value offers great flexibility in its usage as it
can be fed into various calculations or models to assess journal performance.
Simple examples include examination of the predicted value in relation to the
average number of yearly publications of the journal, or plotting the citations
received by the journal during past years with the predicted value used to
extrapolate the plotted series. The number of citations may also be used as
an input feature to more sophisticated models that examine multiple aspects
of a journal.
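To make these simple downstream uses concrete, here is a brief sketch in Python; the journal record, the predicted value, and the average publication count are hypothetical placeholders rather than entries from our dataset:

    # Illustrative sketch only: the journal record and predicted value are hypothetical.
    import matplotlib.pyplot as plt

    history = {2017: 410, 2018: 455, 2019: 430, 2020: 470}  # citations received per year
    predicted_2021 = 495                                    # output of the citation model
    avg_yearly_publications = 120                           # journal's average yearly output

    # Relate the prediction to the journal's publication volume.
    citations_per_document = predicted_2021 / avg_yearly_publications
    print(f"Predicted citations per document: {citations_per_document:.2f}")

    # Extrapolate the historical citation series with the predicted value.
    years = sorted(history) + [2021]
    values = [history[y] for y in sorted(history)] + [predicted_2021]
    plt.plot(years[:-1], values[:-1], marker="o", label="observed")
    plt.plot(years[-2:], values[-2:], linestyle="--", marker="x", label="predicted")
    plt.xlabel("Year")
    plt.ylabel("Citations received")
    plt.legend()
    plt.show()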
Task 2: For a given journal, predict its Elsevier CiteScore (Elsevier, 2020a)
for the next calendar year.
As an impact-based metric, CiteScore values are useful indicators of the
performance of journals. We have chosen CiteScore over other impact-based
measures such as Journal Impact Factor due to its transparent and repro-
ducible calculation process. Further information on the strengths and weak-
nesses of various measures of impact is provided in Section 2.1.
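For reference, the publicly documented CiteScore methodology (as revised by Elsevier in 2020) can be sketched as follows; this is a summary of Elsevier's public description rather than a formula introduced in this paper:

    \mathrm{CiteScore}_y = \frac{C_{[y-3,\,y]}}{D_{[y-3,\,y]}}

where C_{[y-3, y]} denotes the citations received in years y-3 through y by documents
published in those same years, and D_{[y-3, y]} denotes the number of such documents.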
While journal performance often does not change greatly from one year to
the next, there is still ample room for improvement over simple heuristic pre-
dictions (e.g., predicting the same value for the next year or a value following
from a linear trend), as later demonstrated in our experimental comparisons
(see Sections 7 and 8). Furthermore, the task of predicting values one year
in advance is an important step towards making longer range predictions.
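The heuristic baselines mentioned above can be sketched in a few lines of Python; the function names and the least-squares trend fit are illustrative assumptions rather than the exact baseline definitions evaluated later:

    # Illustrative sketch of the two heuristic baselines mentioned above.
    # Function names and the least-squares trend fit are assumptions, not the
    # exact baseline definitions used in the experiments.
    import numpy as np

    def persistence_baseline(history: list[float]) -> float:
        """Predict that next year's value equals the most recent observed value."""
        return history[-1]

    def linear_trend_baseline(history: list[float]) -> float:
        """Fit a straight line to the observed series and extrapolate one year ahead."""
        years = np.arange(len(history))
        slope, intercept = np.polyfit(years, history, deg=1)
        return slope * len(history) + intercept

    citations = [410.0, 455.0, 430.0, 470.0]  # hypothetical yearly citation counts
    print(persistence_baseline(citations))    # 470.0
    print(linear_trend_baseline(citations))   # 480.0, following the fitted trend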
We formalize the two predictive tasks as regression problems. For each,
we consider the use of various input features on the historical performance
of journals to predict the desired value. We investigate the potential of
deep learning approaches in this context and propose two such models which
achieve good performance.
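As an illustration of this regression framing, the following is a minimal Keras sketch of an LSTM regressor over per-journal yearly feature sequences; the layer sizes, window length, feature count, and training settings are placeholders, not the configurations selected in this work:

    # Minimal sketch of framing the tasks as sequence regression with an LSTM.
    # All dimensions and hyperparameters below are placeholders.
    import numpy as np
    import tensorflow as tf

    n_journals, n_years, n_features = 1000, 5, 8   # hypothetical dataset dimensions
    X = np.random.rand(n_journals, n_years, n_features).astype("float32")
    y = np.random.rand(n_journals).astype("float32")  # e.g., next-year CiteScore

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_years, n_features)),
        tf.keras.layers.LSTM(32),          # summarizes the journal's recent history
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),          # single regression output
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    model.fit(X, y, validation_split=0.2, epochs=10, batch_size=64)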
1.2. Contributions and Paper Outline
The primary contributions of this work are as follows:
• We have collected and assembled a dataset covering publication metrics
for over 24,000 journals indexed in Scopus during the period of 2000
through 2020 (Section 4).
• We have applied four standard machine learning algorithms (Linear
Regression, Decision Tree, Random Forest and k-Nearest Neighbors) to
establish baselines for the predictive tasks we have set out (Subsections
5.3 and 6); a brief illustrative sketch of these baselines follows this list.
• We have identified and configured two deep neural network models
appropriate for our predictive tasks, one of which is particularly well-
suited to handling the type of time-series data we have compiled (Subsections
5.4 and 6).
• We have conducted experimental evaluations of our proposed models,
demonstrating improvements over the selected baselines (Section 7).
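As a brief illustration of the classical baselines listed above, the following scikit-learn sketch trains the four regressors on flattened journal features; the synthetic data and hyperparameters are placeholders, not the settings evaluated in our experiments:

    # Illustrative sketch of the four classical baselines; the synthetic data and
    # hyperparameters are placeholders, not the settings used in the experiments.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    X = rng.random((1000, 40))   # flattened per-journal historical features
    y = rng.random(1000)         # target: next-year citations or CiteScore

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    baselines = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=0),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
        "k-Nearest Neighbors": KNeighborsRegressor(n_neighbors=5),
    }

    for name, model in baselines.items():
        model.fit(X_train, y_train)
        mae = mean_absolute_error(y_test, model.predict(X_test))
        print(f"{name}: MAE = {mae:.3f}")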
We begin by providing a review of relevant literature in Section 2. In
Section 3, we provide preliminary background information on concepts used
throughout the paper. We describe the data that we have collected and
the process we have applied to build a dataset from it in Section 4. We then
describe the features we have selected for the predictive tasks and the machine
learning models we have identified to handle these tasks in Section 5. We
provide details on the experiments we have performed for the configuration of
the models in Section 6. In Section 7, we detail the experimental comparisons
we have conducted on finalized configurations of the predictive models and
the results we have obtained. We conclude the paper in Section 9.
2. Literature Review
The evaluation of performance in academic journals is dependent upon
the definition of metrics that provide meaningful information on various prop-
erties of journals. Among such metrics, those which examine the impact of
journals are often associated with some indication of the quality of journals.
We provide here a review of some of the most commonly-used measures of
impact and the works that have designed predictive tasks around them. We
additionally provide a brief review of predictive tasks related to paper-level
measures of impact.
2.1. Journal Bibliometrics
The field of bibliometrics pertains to the statistical analysis of publi-
cations and their channels of publication (i.e., sources such as journals)
Manolopoulos & Katsaros (2017); Roldan-Valadez et al. (2019); García-Villar
& García-Santos (2021). In the context of academic journals, bibliometrics
often focuses on measures built around numbers of citations and publications
to assess the quality of journals. While general terms such as performance or
quality are often used when describing the goal of journal-centric bibliomet-
rics, these notions are typically vague and ill-defined. The more precise intent
of many bibliometric measures is to capture scientific impact, generally em-
ploying some form of a ratio of citations received over documents published
Kim Kihong (2018). The notion of an “impact factor” in scientific literature
was first introduced in the context of the proposal by Garfield to develop a
citation index to track the linkages between indexed publications and their
citations Garfield (2006). Impact factor refers to the effect a publication or
a corpus of published literature has on the research community, observed
through the lens of citations attributed to the published work. Although the
measure of impact has a precise meaning that was not originally intended to
be employed as a general reflection of quality, it has found widespread usage
as a proxy for journal quality Prathap (2012).