How reliable are unsupervised author disambiguation algorithms in assessment of
research organization performance?
Giovanni Abramo,1 Ciriaco Andrea D’Angelo2
1 giovanni.abramo@iasi.cnr.it
Laboratory for Studies in Research Evaluation, Institute for System Analysis and
Computer Science (IASI-CNR), National Research Council of Italy (Italy)
ORCID: 0000-0003-0731-3635
2 dangelo@dii.uniroma2.it
Department of Engineering and Management, University of Rome “Tor Vergata” (Italy)
&
Laboratory for Studies in Research Evaluation, Institute for System Analysis and
Computer Science (IASI-CNR), National Research Council of Italy (Italy)
ORCID: 0000-0002-6977-6611
Abstract
The paper examines the extent of bias in the performance rankings of research organisations
when the assessments are based on unsupervised author-name disambiguation algorithms.
It compares the outcomes of a research performance evaluation exercise of Italian
universities using the unsupervised approach by Caron and van Eck (2014) for derivation
of the universities’ research staff, with those of a benchmark using the supervised
algorithm of D'Angelo, Giuffrida, and Abramo (2011), which avails of input data. The
methodology developed could be replicated for comparative analyses in other
frameworks of national or international interest, meaning that practitioners would have a
precise measure of the extent of distortions inherent in any evaluation exercise using
unsupervised algorithms. This could in turn be useful in informing policy-makers’
decisions on whether to invest in building national research staff databases, instead of
settling for the unsupervised approaches with their measurement biases.
Keywords
Research assessment; evaluative scientometrics; author name disambiguation; FSS;
ORP; CWTS; universities; Italy.
Acknowledgement
We are indebted to the Centre for Science and Technology Studies (CWTS) at Leiden
University for providing us with access to the in-house WoS database from which we
extracted data at the basis of our elaborations.
1. Introduction
The tools of performance assessment play a fundamental role in the strategic planning
and analysis of national and regional research systems, member organizations and
individuals. At the level of research organizations, assessment serves in identifying fields
of strength and weakness, which in turn inform competitive strategies, organizational
restructuring, resource allocation, and individual incentive systems. For regions and
countries, knowledge of strengths and weaknesses relative to others, and also the
comparative performances of one’s own research institutions, enables formulation of
informed research policies and selective allocation of public funding across fields and
institutions. By assessing performance before and after their interventions, institutions and governments
can evaluate the effectiveness of their strategic actions and policy implementation. The
communication of the results from research assessment exercises, applied at any level,
stimulates the assessed subjects towards continuous improvement. Such assessments also
serve in reducing information asymmetries between the suppliers (researchers,
institutions, territories) and the end users of research (companies, students, investors). At
the macro-economic level, this yields a twofold benefit, giving rise to a virtuous
circle: i) in selecting research suppliers, users can make more effective choices; and ii)
suppliers, aiming to attract more users, will be stimulated to improve their research
production. The reduction of asymmetric information is also beneficial within the
scientific communities themselves, particularly in the face of the increasing challenges of
complex interdisciplinary research, by lowering obstacles among prospective partners as
they seek to identify others suited for inclusion in team-building.
Over recent years, the stakeholders of research systems have demanded more timely
assessment, capable of informing in an ever more precise, reliable and robust manner.
Bibliometrics, and in particular evaluative bibliometrics, has the great advantage of
enabling large-scale research evaluations with levels of accuracy, costs and timescales far
more advantageous than traditional peer-review (Abramo, D'Angelo, & Reale, 2019), as
well as possibilities for informing small-scale peer-review evaluations. For years, in view
of the needs expressed by policy makers, research managers and stakeholders in general,
scholars have continuously improved the indicators and methods of evaluative
bibliometrics. In our opinion, however, the factor holding us back from a great leap
forward is the lack of input data, which in almost all nations has been very difficult to
assemble.
In all production systems, the comparative performance of any unit is always given
by the ratio of outputs to inputs. In the case of research systems, the inputs or production
factors consist basically of labor (the researchers) and capital (all resources other than
labor, e.g. equipment, facilities, databases, etc.). For any research unit, therefore,
comparison to another demands that we know its component researchers, and the
resources they draw on for conducting their research. In addition, bias in the results would
occur unless we are also informed of the prevailing research discipline of each researcher, since
output is in part a function of discipline (Sorzano, Vargas, Caffarena-Fernández, & Iriarte,
2014; Piro, Aksnes & Rørstad, 2013; Lillquist & Green, 2010; Sandström & Sandström,
2009; Iglesias & Pecharromán, 2007; Zitt, Ramanana-Rahari, & Bassecoulard, 2005):
scholars of blood diseases, for example, publish an average of about five times as much
as scholars of legal medicine (D'Angelo & Abramo, 2015). Finally, the measure of the
researcher's contribution to each scientific output should also take into account the
number of co-authors, and in some cases their position in the author list (Waltman & Van
Eck, 2015; Abramo, D'Angelo, & Rosati, 2013; Aksnes, Schneider, & Gunnarsson, 2012;
Huang, Lin, & Chen, 2011; Gauffriau & Larsen, 2005; van Hooydonk, 1997; Rinia, De
Lange, & Moed, 1993).
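To fix ideas, these requirements can be condensed into a stylized yearly performance measure for a research unit $u$; the notation below is purely illustrative and is not the formal definition of any specific indicator:

$$
P_u \;=\; \frac{1}{I_u}\sum_{i=1}^{N_u} f_i \,\frac{c_i}{\bar{c}_{s(i)}}
$$

where $I_u$ is the input of the unit (labor and, where measurable, capital), $N_u$ the number of its publications, $c_i$ the citations received by publication $i$, $\bar{c}_{s(i)}$ the average citations of publications of the same year and subject category $s(i)$, and $f_i$ the fractional contribution of the unit’s authors. Field normalization enters through $\bar{c}_{s(i)}$, co-authorship through $f_i$, and size dependence is removed through the input $I_u$.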
Yet for many years, regardless of all these requirements, organizations have regularly
published research institution performance rankings that are coauthor-, size- and field-
dependent, among which the most renowned would be the Academic Ranking of World
Universities (ARWU),1 issued by Shanghai Jiao Tong University, the Times Higher
Education World University Rankings,2 and the QS World University Rankings.3 Despite
the strong distortions in these rankings, many decision-makers persist in giving them
serious credit. One of the most recent gestures of the sort came in May 2022, when the
British government, intending to assist those seeking immigration without a job offer,
offered early-career “High Potential Individuals” the possibility of a visa, subject to
graduation within the past five years from an eligible university: meaning any university
placing near the top of the above – highly distorted – rankings.4
To get around the obstacle of missing input data, some bibliometricians have seen a
solution in the so-called “size-independent” indicators of research performance, among
these the mean normalized citation score (MNCS), proposed by the CWTS of the
University of Leiden (Waltman et al., 2011; Moed, 2010). As we have pointed out,
however, these indicators have strong limitations (Abramo & D'Angelo, 2016a, 2016b),
and result in performance scores and ranks that are different from those obtained using
other indicators, such as FSS (fractional scientific strength),5 which do account for
inputs, albeit with certain unavoidable assumptions. But the FSS indicator has thus far
been applied in only two countries, both with advantages of government records on
inputs: extensively, in Italy, for the evaluation of performance at the level of individuals
(Abramo & D'Angelo, 2011) and then aggregated at the levels of research field and
university (Abramo, D'Angelo, & Di Costa, 2011), and to a lesser extent in Norway, with
additional assumptions (Abramo, Aksnes, & D'Angelo, 2020).
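For concreteness, the structural difference between the two families of indicators can be written compactly; the FSS expression below is an approximate recall of the individual-level formulation in Abramo and D’Angelo (2014), to which the reader should refer for the exact notation and assumptions:

$$
\text{MNCS} \;=\; \frac{1}{N}\sum_{i=1}^{N}\frac{c_i}{e_i},
\qquad\qquad
\text{FSS}_R \;\approx\; \frac{1}{w_R}\,\frac{1}{t}\sum_{i=1}^{N}\frac{c_i}{\bar{c}}\,f_i
$$

where $e_i$ is the expected number of citations of publications of the same field, year and document type; $w_R$ is the average yearly salary of researcher $R$; $t$ the number of years of work in the period of observation; $\bar{c}$ the average citations of the publications of the same year and subject category; and $f_i$ the fractional contribution of $R$ to publication $i$. MNCS divides by the number of publications, FSS by a proxy of the labor input, which is precisely why the two indicators can score and rank the same institutions very differently.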
For policy-makers and administrators, but also all interested others, the question then
becomes: "in demanding and/or using large-scale assessments of the positioning of
research institution performance, what margin of error is acceptable in the measure of
their scores and ranks?" To give an idea of the potential margins of error: a comparison
of research-performance scores and ranks of Italian universities by MNCS and FSS
revealed that 48.4% of universities shifted quartiles under these two indicators, and that
31.3% of universities in the top quartile by FSS fell into lower quartiles by MNCS
(Abramo & D'Angelo, 2016c).
Italy is an almost unique case in the provision of the data on research staff
at universities necessary for institutional performance evaluation. Here, at the close of
each year, the Ministry of University and Research (MUR) updates a database of all
university faculty members, listing the first and last names of each researcher, their
gender, institutional affiliation, field classification and academic rank.6 The Norwegian
1 https://www.shanghairanking.com/rankings/arwu (last accessed 20/06/2022).
2 https://www.timeshighereducation.com/world-university-rankings (last accessed 20/06/2022).
3 https://www.topuniversities.com/university-rankings/world-university-rankings/2022 (last accessed
20/06/2022).
4 https://www.gov.uk/government/publications/high-potential-individual-visa-global-universities-list.
5 A thorough explanation of the theory and assumptions underlying FSS can be found in Abramo and
D’Angelo (2014), and in the more recent Abramo, Aksnes, and D’Angelo (2020).
6 http://cercauniversita.cineca.it/php5/docenti/cerca.php, last accessed on 20/06/2022.
Research Personnel Register also offers a useful database of statistics,7 including notation
of the capital cost of research per man-year aggregated at area level, based on regular
reports from the institutions to the Nordic Institute for Studies in Innovation, Research
and Education (NIFU).8
The challenge facing practitioners is then how to apply output-input indicators of
research performance aligned with microeconomic theory of production (like FSS), in all
those countries where databases of personnel are not maintained. One possibility is to
trace the research personnel of the institutions indirectly, through their publications, using
bibliographic repertories such as Scopus or Web of Science (WoS) and, referring
exclusively to bibliometric metadata, applying algorithms for the disambiguation of authors'
names and the reconciliation of the institutions' names.
Computer scientists and bibliometricians have developed several unsupervised
algorithms for disambiguation, at national and international levels (Rose & Kitchin, 2019;
Backes, 2018; Hussain & Asghar, 2018; Zhu et al., 2017; Liu et al., 2015; Caron & van
Eck, 2014; Schulz et al., 2014; Wu, Li, Pei, & He, 2014; Wu & Ding, 2013; Cota,
Gonçalves, and Laender, 2007). The term “unsupervised” signifies that the algorithms
operate without manually labelled data, instead approaching the author-name
disambiguation problem as a clustering task, where each cluster would contain all the
publications written by a specific author. Tekles & Bornmann (2020), using a large
validation set containing more than one million author mentions, each annotated with a
Researcher ID (an identifier maintained by the researchers), compared a set of such
unsupervised disambiguation approaches. The best-performing algorithm turned out to be the
one by Caron and van Eck (2014), hereinafter “CvE”. As discussed above, however, the
conduct of performance comparisons at organizational level requires more than just
precision in unambiguously attributing publications to each author. At that point we also
need precise identification of the research staff of each organization,9 the fields of
research, etc. And so if the aim is to apply bibliometrics for the comparative evaluation
of organizational research performance, the goodness of the algorithms should be
assessed on the basis of the precision with which they actually enable measurement of
such performance.
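As a purely illustrative aid, the sketch below shows the general shape of a rule-based scoring approach to author-name disambiguation of the kind just described: publications sharing an author-name variant are compared pairwise on a few metadata signals, pairs scoring above a threshold are linked, and the connected components become the clusters (oeuvres). The fields, weights and threshold are invented for the example and do not reproduce the actual CvE rules or parameters.

```python
from itertools import combinations
from collections import defaultdict

# Toy publication records sharing the author-name variant "Rossi, M.".
# All fields, weights and the threshold are illustrative assumptions,
# not the actual rules of Caron and van Eck (2014).
PUBS = [
    {"id": 1, "email": "m.rossi@uni-a.it", "affil": "Univ A", "coauthors": {"Bianchi"}, "refs": {10, 11}},
    {"id": 2, "email": "m.rossi@uni-a.it", "affil": "Univ A", "coauthors": {"Verdi"},   "refs": {11, 12}},
    {"id": 3, "email": None,               "affil": "Univ B", "coauthors": {"Neri"},    "refs": {99}},
]

def pair_score(a, b):
    """Score the evidence that two publications were written by the same person."""
    score = 0
    if a["email"] and a["email"] == b["email"]:
        score += 10                                       # shared e-mail: strong signal
    if a["affil"] == b["affil"]:
        score += 3                                        # same affiliation: weaker signal
    score += 2 * len(a["coauthors"] & b["coauthors"])     # shared co-authors
    score += 1 * len(a["refs"] & b["refs"])               # shared cited references
    return score

def cluster(pubs, threshold=5):
    """Link all pairs scoring above the threshold and return the connected components."""
    parent = {p["id"]: p["id"] for p in pubs}             # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(pubs, 2):
        if pair_score(a, b) >= threshold:
            parent[find(a["id"])] = find(b["id"])         # merge the two clusters

    groups = defaultdict(list)
    for p in pubs:
        groups[find(p["id"])].append(p["id"])
    return list(groups.values())

print(cluster(PUBS))   # -> [[1, 2], [3]]: two distinct "Rossi, M." oeuvres
```

Real implementations use many more signals (name forms, journals, subject categories, self-citations) and carefully calibrated weights and thresholds; the point here is only the overall clustering logic, which produces oeuvres without any labelled training data or external register.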
To answer the research question, we therefore compare measures of the research
performance of universities in the Italian academic system, which arise from the
application of the aforementioned CvE unsupervised algorithm, with those arising
from the use of the supervised algorithm by D'Angelo, Giuffrida, & Abramo (2011),
hereinafter “DGA”. Over more than a decade, this algorithm has been applied by the
authors for feeding and continuous updating of the Public Research Observatory (ORP)
of Italy, a database derived under license from Clarivate Analytics' WoS Core Collection.
It indexes the scientific production of Italian academics at individual level, achieving a 97%
harmonic average of precision and recall (F-measure),10 thanks to the operation of the
DGA algorithm, which avails of a series of “certain” metadata available in the MUR
database on university personnel, including their institutional affiliation, academic rank,
7 https://www.nifu.no/en/statistics-indicators/4897-2/, last accessed on 20/06/2022
8 http://www.foustatistikkbanken.no/nifu/?language=en, last accessed on 20/06/2022.
9 The affiliation in the byline, in some cases multiple, does not always allow unequivocal identification of the
organisation to which the author belongs.
10 The most frequently used indicators to measure the reliability of bibliometric datasets are precision and
recall, which originate from the field of information retrieval (Hjørland, 2010). Precision is the fraction of
retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved; the
F-measure is their harmonic mean, F = 2PR/(P+R).
years of tenure, field of research, and gender (for details see D'Angelo, Giuffrida, &
Abramo, 2011).
Given the maturity of the ORP, developed and refined year by year through the
manual correction of the rare false cases, it can be considered a reliable benchmark against
which to measure the deviations arising from an evaluation conducted using CvE. The
deviations, as we shall see in some detail, are attributable to causes beyond simply
its lesser ability to correctly disambiguate authorship. The aim of our work, however,
is not to criticize CvE, but to give bibliometricians, practitioners, and especially decision
makers, an idea of the extent of distortions in the research performance ranks of research
institutions at overall and area level when forced to use unsupervised algorithms of this
kind, rather than supervised ones based on research staff databases, such as DGA.
The paper is organized as follows: Section 2 presents the methodology and describes
the data and methods used. In Section 3 we show the results of the analysis. Section 4
concludes, summarizing and commenting on the results, particularly for practitioners
and scholars who may wish to replicate the exercise in other geographical and institutional
frameworks.
2. Data and methods
The assessment of the comparative research performance of an organization cannot
proceed without a survey of the scientific activity of its individual researchers, since
evaluations that operate directly at an aggregate level, without accounting for the sectoral
distribution of input, produce results with unacceptable error (Abramo & D'Angelo,
2011). Analyses at micro level, however, presuppose precise knowledge of the research
staff of the organization, as well as of all “competitor” organizations eligible for
comparative evaluation. As explained above, the current study aims to compare the
outcomes of the evaluation of research performance by Italian universities, based on two
different bibliometric datasets:
- The first one, hereinafter "ORP", relying on the DGA supervised heuristic approach, which "integrates" the Italian National Citation Report (indexing all WoS articles by those authors who indicated Italy as affiliation country) with data retrieved from the database maintained by the MUR,11 indexing the full name, academic rank, research field and institutional affiliation of all researchers at Italian universities, at the close of each year.
- The second one, hereinafter “CWTS”, relying on the CvE unsupervised approach, a rule-based scoring and oeuvre identification method for disambiguation of authors used for the WoS in-house database of the Centre for Science and Technology Studies (CWTS) at Leiden University.
Much fuller descriptions of the DGA and CvE approaches can be found in D’Angelo
and van Eck (2020).
In ORP:
- a priori, the availability of MUR data allows precise knowledge of the members of the research staff of national universities;
- the census of their scientific production is then carried out by applying the DGA algorithm to the Italian WoS publications (a toy sketch of this kind of register-based matching is given below).
11 http://cercauniversita.cineca.it/php5/docenti/cerca.php, last accessed 20/06/2022.
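By contrast with the unsupervised sketch given earlier, the following toy example illustrates how a staff register can be exploited in a supervised setting: a candidate authorship is accepted only if the author-name variant is compatible with a registered researcher and the byline contains that researcher's university. The records and matching rules are invented for illustration and are far cruder than the actual DGA heuristics, which also exploit field classification, academic rank, addresses and co-authorship.

```python
# Toy illustration of supervised matching against a staff register.
# Records and rules are invented; the real DGA algorithm uses a richer
# set of "certain" metadata (field, rank, address, co-authors).
STAFF = [
    {"last": "rossi", "first": "maria", "university": "univ a"},
    {"last": "rossi", "first": "marco", "university": "univ b"},
]

PUB_AUTHORS = [
    {"pub_id": 7, "name": "Rossi, M.", "byline_affils": ["Univ A", "CNR"]},
    {"pub_id": 8, "name": "Rossi, M.", "byline_affils": ["Univ C"]},
]

def name_compatible(variant, person):
    """'Rossi, M.' is compatible with any registered Rossi whose first name starts with 'M'."""
    last, _, initials = variant.lower().partition(",")
    initial = initials.strip().rstrip(".")[:1]
    return last.strip() == person["last"] and person["first"].startswith(initial)

def match(author, staff):
    """Return registered researchers whose name is compatible and whose university appears in the byline."""
    affils = {a.lower() for a in author["byline_affils"]}
    return [p for p in staff
            if name_compatible(author["name"], p) and p["university"] in affils]

for a in PUB_AUTHORS:
    print(a["pub_id"], match(a, STAFF))
# pub 7 -> one match (the registered "rossi, maria" of "univ a")
# pub 8 -> no match: "Univ C" hosts no registered Rossi, so the authorship is not attributed
```

The register thus does double duty: it delimits the population under evaluation (who belongs to which university and field) and it supplies the "certain" metadata against which ambiguous bylines are resolved, which is exactly the input that unsupervised approaches must do without.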