Toward improved inference for Krippendorff’s Alpha agreement coefficient
John Hughes
Lehigh University
Bethlehem, PA, USA 18015
October 25, 2022
Abstract
In this article I recommend a better point estimator for Krippendorff’s Alpha agreement coefficient,
and develop a jackknife variance estimator that leads to much better interval estimation than does
the customary bootstrap procedure or an alternative bootstrap procedure. Having developed the new
methodology, I analyze nominal data previously analyzed by Krippendorff, and two experimentally observed
datasets: (1) ordinal data from an imaging study of congenital diaphragmatic hernia, and (2)
United States Environmental Protection Agency air pollution data for the Philadelphia, Pennsylvania
area. The latter two applications are novel. The proposed methodology is now supported in version 2.0
of my open source R package, krippendorffsalpha, which supports common and user-defined distance
functions, and can accommodate any number of units, any number of coders, and missingness. Interval
computation can be parallelized.
1 Introduction
Krippendorff’s α (Hayes and Krippendorff, 2007) is a well-known methodology for statistically assessing agreement. Although α is non-parametric, the customary α estimator is motivated by an estimator of the intraclass correlation coefficient in the one-way mixed-effects analysis of variance (ANOVA) model (Ravishanker et al., 2021), a much studied fully parametric model. In this article I leverage α’s connection with the one-way mixed-effects ANOVA model to explore the customary approach to inference for Krippendorff’s α, finding that both the point estimator and interval estimation have substantial drawbacks for smaller, yet realistic, sample sizes. Then I consider a better point estimator and propose a jackknife variance estimator (Hinkley, 1977) that yields interval estimates having very nearly their desired coverage rates, even in unfavorable conditions. I evaluate the various procedures not only formally but also by way of extensive and realistic simulation studies. Finally, I analyze data previously analyzed by Krippendorff and others, along with two experimentally observed datasets: (1) radiologist-assigned grades in an imaging study of congenital diaphragmatic hernia (CDH), and (2) United States Environmental Protection Agency (EPA) PM2.5 data from seven geographically dispersed air sensors in or near Philadelphia, Pennsylvania.
2 Measuring agreement
An inter-coder agreement coefficient—which takes a value in the unit interval, with 0 indicating no agreement
and 1 indicating perfect agreement—is a statistical measure of the extent to which two or more coders agree
regarding the same units of analysis. The agreement problem has a long history and is important in many
fields of inquiry, and numerous agreement statistics have been proposed.
The earliest agreement coefficients were S (Bennett et al., 1954), π (Scott, 1955), and κ (Cohen, 1960). Bennett et al. (1954) proposed the S score as a measure of the extent to which two methods of communication provide identical information. Scott (1955) proposed the π coefficient for measuring agreement between two coders. Cohen (1960) criticized π and proposed the κ coefficient as an alternative to π, although Smeeton (1985) noted that Francis Galton mentioned a κ-like statistic in his 1892 book, Finger Prints. Fleiss (1971) proposed multi-κ, a generalization of Scott’s π for measuring agreement among more than two coders. Conger (1980) and Davies and Fleiss (1982) likewise generalized κ to the multi-coder setting. Other generalizations of κ, e.g., weighted κ (Cohen, 1968), have also been proposed. The κ coefficient and its generalizations can fairly be said to dominate the field and are still widely used despite their well-known shortcomings (Feinstein and Cicchetti, 1990; Cicchetti and Feinstein, 1990).
arXiv:2210.13265v1 [stat.ME] 24 Oct 2022
Other oft-used measures of agreement are Gwet’s AC1 and AC2 (Gwet, 2008) and Krippendorff’s α (Hayes and Krippendorff, 2007), the latter of which is the subject of this article. An even newer agreement methodology is Sklar’s ω (Hughes, 2022), a parametric Gaussian copula-based framework. For more comprehensive reviews of the literature on agreement, I refer the interested reader to the article by Banerjee et al. (1999), the article by Artstein and Poesio (2008), and the book by Gwet (2014).
3 A motivating example
To fix ideas, let us consider an example dataset that was previously analyzed by Krippendorff (2013). The
dataset, which comprises 41 nominal codes assigned to a dozen units of analysis by four coders, is shown
below. The dots represent missing values.
       c1  c2  c3  c4
u1      1   1   •   1
u2      2   2   3   2
u3      3   3   3   3
u4      3   3   3   3
u5      2   2   2   2
u6      1   2   3   4
u7      4   4   4   4
u8      1   1   2   1
u9      2   2   2   2
u10     •   5   5   5
u11     •   •   1   1
u12     •   3   •   •
Figure 1: Nominal scores previously analyzed by Krippendorff, for twelve units and four coders. The dots
represent missing values.
Because this dataset is small and the codes are nominal, it is easy to hypothesize by inspection that
agreement is high. Indeed, eight of the units exhibit perfect agreement, and two of the remaining units
exhibit near-perfect agreement. The only unit about which the coders evidently disagreed is unit 6. And of
course the final unit carries no information regarding agreement. These facts taken together suggest that an
estimated agreement coefficient for these data should not be too far from 1, unless the estimator in question
is strongly influenced by the disagreement over unit 6.
Before analyzing these data I should mention that I will interpret results according to the agreement
scale given in Table 1 (Landis and Koch, 1977). Although this scale is well-established, agreement scales
remain a subject of debate (Taber, 2018), and so the following scale—indeed, any agreement scale—should
be applied circumspectly.
Applying the customary Krippendorff’s α methodology to these data, with the discrete metric $d^2(x, y) = 1\{x \neq y\}$ as the distance function, yields point estimate $\hat{\alpha} = 0.743$ and 95% confidence interval (0.459, 1.000). This estimate of α indicates substantial agreement, and the interval suggests that these data are consistent with agreement ranging from moderate to perfect. If one repeats the analysis having removed unit 6, the point estimate changes to 0.857, and the interval becomes (0.679, 1.000). Thus we see that unit 6 was (perhaps unduly) influential, since the new results indicate near-perfect agreement (point estimate) and at least substantial agreement (interval estimate).
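The point estimates above can be checked with a short Python sketch. It follows Krippendorff’s pairable-value formulation, $\hat{\alpha} = 1 - D_o/D_e$, where $D_o$ averages the within-unit disagreement (each unit weighted by $1/(m_u - 1)$, with $m_u$ its number of observed values) and $D_e$ averages the disagreement among all pairable values regardless of unit. The names `FIG1` and `nominal_alpha` are hypothetical and are not the interface of the krippendorffsalpha R package; the sketch reproduces only the point estimates, not the confidence intervals.

```python
from collections import Counter
from itertools import permutations

# Figure 1 data: rows are units u1..u12, columns are coders c1..c4.
# None marks a missing value (a dot in Figure 1).
FIG1 = [
    [1, 1, None, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 3],
    [3, 3, 3, 3],
    [2, 2, 2, 2],
    [1, 2, 3, 4],
    [4, 4, 4, 4],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
    [None, 5, 5, 5],
    [None, None, 1, 1],
    [None, 3, None, None],
]


def nominal_alpha(units):
    """Krippendorff's alpha with the discrete metric d2(x, y) = 1{x != y}."""
    # Keep observed values only, and only units with at least two of them.
    pairable = [[v for v in row if v is not None] for row in units]
    pairable = [vals for vals in pairable if len(vals) >= 2]
    n = sum(len(vals) for vals in pairable)

    # n * D_o: ordered within-unit pairs whose values differ,
    # each unit weighted by 1 / (m_u - 1).
    observed = sum(
        sum(x != y for x, y in permutations(vals, 2)) / (len(vals) - 1)
        for vals in pairable
    )
    # n * D_e: ordered pairs of differing values among all n pairable
    # values (ignoring units), divided by n - 1.
    counts = Counter(v for vals in pairable for v in vals)
    expected = (n * n - sum(c * c for c in counts.values())) / (n - 1)
    return 1.0 - observed / expected


print(round(nominal_alpha(FIG1), 3))                 # 0.743
print(round(nominal_alpha(FIG1[:5] + FIG1[6:]), 3))  # 0.857 (unit 6 removed)
```

Unit 12 drops out automatically because it has a single value, which matches the observation that it carries no information regarding agreement.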
Table 1: Guidelines for interpreting values of an agreement coefficient.
Range of Agreement      Interpretation
α ≤ 0.2                 Slight Agreement
0.2 < α ≤ 0.4           Fair Agreement
0.4 < α ≤ 0.6           Moderate Agreement
0.6 < α ≤ 0.8           Substantial Agreement
α > 0.8                 Near-Perfect Agreement
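Because the bands in Table 1 have inclusive upper bounds, a boundary value such as 0.4 is read as fair rather than moderate. A minimal Python helper that applies the table literally might look as follows; the function name `interpret_agreement` is hypothetical.

```python
def interpret_agreement(alpha: float) -> str:
    """Return the Table 1 (Landis and Koch, 1977) label for an agreement value."""
    # Upper bounds are inclusive, as in Table 1: 0.2 is Slight, 0.4 is Fair, etc.
    bands = [(0.2, "Slight"), (0.4, "Fair"), (0.6, "Moderate"), (0.8, "Substantial")]
    for upper, label in bands:
        if alpha <= upper:
            return f"{label} Agreement"
    return "Near-Perfect Agreement"


# Point estimates and interval bounds from the Figure 1 analysis.
for value in (0.743, 0.459, 0.857, 0.679):
    print(value, interpret_agreement(value))
```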
I will return to these data in Section 9, where I will apply my proposed methodology and compare those
results to these.
4 The customary Krippendorff’s α methodology
Hughes (2021a) showed that Krippendorff’s α finds its origin in the well-known one-way mixed-effects ANOVA model. In this section I will review Hughes’ demonstration, and then elaborate on it for the purposes of this article.
4.1 Krippendorff’s α and the one-way mixed-effects ANOVA model
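The one-way mixed-effects ANOVA model is given by
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad (i = 1, 2, \ldots, a), \quad (j = 1, 2, \ldots, n_i),$$
where
$Y_{ij}$ is the $j$th score (of $n_i$ scores) for the $i$th unit (of $a$ units of analysis);
$\mu \in \mathbb{R}$ is the population mean score;
$\tau_i \overset{\text{ind}}{\sim} \text{Normal}(0, \sigma^2_\tau)$ are random unit effects such that $\sigma^2_\tau \geq 0$;
$\varepsilon_{ij} \overset{\text{ind}}{\sim} \text{Normal}(0, \sigma^2_\varepsilon)$ are errors such that $\sigma^2_\varepsilon > 0$; and
the unit effects are independent of the errors.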
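Since the scores for the $i$th unit share the unit effect $\tau_i$, said scores are dependent. Specifically, for $j \neq j'$, $\operatorname{cov}(Y_{ij}, Y_{ij'}) = \sigma^2_\tau$, and
$$\alpha = \operatorname{cor}(Y_{ij}, Y_{ij'}) = \frac{\sigma^2_\tau}{\sigma^2_\tau + \sigma^2_\varepsilon}. \tag{1}$$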
This correlation among the scores for a given unit is usually called the intraclass correlation coefficient (ICC). I denote the ICC as ‘α’ precisely because the ICC is the population parameter for Krippendorff’s α when the data conform to the one-way mixed-effects ANOVA model. To reveal this connection it suffices to show that Krippendorff’s estimator, which I denote as $\hat{\alpha}$, is an estimator of α.
First, note that α can be written as
$$\alpha = 1 - \frac{\sigma^2_\varepsilon}{\sigma^2_\tau + \sigma^2_\varepsilon}.$$
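This suggests the estimator
$$\hat{\alpha} = 1 - \frac{\widehat{\sigma^2_\varepsilon}}{\widehat{\sigma^2_\tau + \sigma^2_\varepsilon}},$$
which we can completely specify by identifying estimators $\widehat{\sigma^2_\varepsilon}$ and $\widehat{\sigma^2_\tau + \sigma^2_\varepsilon}$. For the one-way mixed-effects ANOVA model, the customary estimator of the error variance $\sigma^2_\varepsilon$ is the so-called mean squared error (MSE):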
$$\widehat{\sigma^2_\varepsilon} = \text{MSE} = \frac{\text{SSE}}{N - a} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2}{N - a},$$
where SSE denotes the error sum of squares, $N = \sum_i n_i$ is the total sample size, and $\bar{Y}_i$ is the sample mean for the $i$th unit. MSE is both the method of moments (MoM) estimator and the maximum likelihood estimator of the error variance for the balanced design (i.e., when $n_i = n$ for all $i$). For the unbalanced design, MSE is once again the MoM estimator of $\sigma^2_\varepsilon$, but the maximum likelihood estimator of $\sigma^2_\varepsilon$ is not available in closed form. For both designs MSE is unbiased for $\sigma^2_\varepsilon$.
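Now, to estimate the total variance $\sigma^2_\tau + \sigma^2_\varepsilon$, Krippendorff uses
$$\widehat{\sigma^2_\tau + \sigma^2_\varepsilon} = \text{MST}_c = \frac{\text{SST}_c}{N - 1} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2}{N - 1},$$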
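where $\text{SST}_c$ denotes the corrected (for the population mean) total sum of squares and $\bar{Y}_{\bullet\bullet}$ denotes the mean for the entire sample. This estimator seems quite natural given that
$$E(\text{MST}_c) = \frac{N - \sum_i n_i^2 / N}{N - 1}\,\sigma^2_\tau + \sigma^2_\varepsilon \leq \sigma^2_\tau + \sigma^2_\varepsilon,$$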
with equality only when $\sigma^2_\tau = 0$ (or $n_i = 1$ for all $i$, which makes no sense). In any case, we arrive at Krippendorff’s point estimator:
$$\hat{\alpha} = 1 - \frac{\text{MSE}}{\text{MST}_c}. \tag{2}$$
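Estimator (2) translates directly into code. In the Python sketch below each unit is a list of its observed scores, so units may have different numbers of scores; the function name `anova_alpha` is hypothetical, and units with fewer than two scores are assumed to have been removed beforehand.

```python
def anova_alpha(units):
    """Customary estimator (2): alpha_hat = 1 - MSE / MST_c."""
    a = len(units)
    scores = [y for unit in units for y in unit]
    N = len(scores)
    grand_mean = sum(scores) / N

    # Error sum of squares: deviations of scores from their own unit mean.
    sse = 0.0
    for unit in units:
        unit_mean = sum(unit) / len(unit)
        sse += sum((y - unit_mean) ** 2 for y in unit)

    # Corrected total sum of squares: deviations from the grand mean.
    sst_c = sum((y - grand_mean) ** 2 for y in scores)

    mse = sse / (N - a)      # unbiased for sigma2_eps
    mst_c = sst_c / (N - 1)  # E(MST_c) <= sigma2_tau + sigma2_eps
    return 1.0 - mse / mst_c


# Three units with two scores each: MSE = 0.5, MST_c = 3.5, so alpha_hat = 6/7.
print(anova_alpha([[1, 2], [3, 4], [5, 6]]))
```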
This estimator is the customary estimator for Krippendorff’s α when squared Euclidean distance $d^2(x, y) = (x - y)^2$ is employed as the measure of discrepancy. Hughes (2021a) showed how this form of $\hat{\alpha}$ can give rise to the non-parametric form of Krippendorff’s α, which is incidentally a modified multi-response permutation procedure (Mielke and Berry, 2007). The non-parametric form of α simply makes $d^2$ a parameter whose value is chosen by the practitioner based on the type of outcomes to be analyzed, e.g., the discrete metric $d^2(x, y) = 1\{x \neq y\}$ for nominal observations, the distance function $d^2(x, y) = \{(x - y)/(x + y)\}^2$ for ratio observations, etc.
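Treating $d^2$ as a parameter can be sketched in Python with Krippendorff’s pairwise form, $\hat{\alpha} = 1 - D_o/D_e$, where $D_o$ averages $d^2$ over ordered pairs of values within units (unit $u$ weighted by $1/(m_u - 1)$) and $D_e$ averages $d^2$ over ordered pairs of all pairable values, ignoring units. For balanced data with squared Euclidean distance this reduces to $1 - \text{MSE}/\text{MST}_c$, since in that case $D_o = 2\,\text{MSE}$ and $D_e = 2\,\text{MST}_c$. The function and metric names are hypothetical, and the ratio metric assumes positive values.

```python
from itertools import permutations


def alpha(units, d2):
    """Krippendorff's alpha for a user-supplied distance function d2."""
    # Observed values only; a unit needs at least two values to be pairable.
    pairable = [[v for v in unit if v is not None] for unit in units]
    pairable = [vals for vals in pairable if len(vals) >= 2]
    values = [v for vals in pairable for v in vals]
    n = len(values)

    # Observed disagreement: ordered within-unit pairs, weighted by 1/(m_u - 1).
    d_o = sum(
        sum(d2(x, y) for x, y in permutations(vals, 2)) / (len(vals) - 1)
        for vals in pairable
    ) / n
    # Expected disagreement: ordered pairs of pairable values at distinct
    # positions, ignoring unit membership.
    d_e = sum(d2(x, y) for x, y in permutations(values, 2)) / (n * (n - 1))
    return 1.0 - d_o / d_e


def discrete(x, y):
    """Nominal data: d2(x, y) = 1{x != y}."""
    return float(x != y)


def squared_euclidean(x, y):
    """Interval data: d2(x, y) = (x - y)^2."""
    return (x - y) ** 2


def ratio(x, y):
    """Ratio data: d2(x, y) = {(x - y)/(x + y)}^2, assuming x + y > 0."""
    return ((x - y) / (x + y)) ** 2


data = [[1, 2], [3, 4], [5, 6]]
for name, metric in [("discrete", discrete), ("interval", squared_euclidean), ("ratio", ratio)]:
    print(name, round(alpha(data, metric), 3))
```

For this balanced toy matrix the interval version gives 6/7, the same value as $1 - \text{MSE}/\text{MST}_c$, while the discrete version gives 0 because no two scores coincide anywhere.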
It is important to note that α is an agreement coefficient for all types of outcomes and suitable distance functions $d^2$. However, α is not a well-defined population parameter for every sensible choice of $d^2$. For example, when the observations are categorical and the discrete metric is used, $\hat{\alpha}$ is surely an estimator of agreement, but the population parameter that $\hat{\alpha}$ estimates cannot be described precisely. This reminds one of the g factor (Warne and Burningham, 2019), a construct that has been defined operationally as that which is measured by various cognitive tests.
4.2 Bias of the customary point estimator
Note that $\text{MST}_c$ is biased downward, and the magnitude of the bias grows as the (average) number of coders increases (for fixed $N$). This implies that $\hat{\alpha}$, which already has a negative bias, becomes much more biased as the shape of the data matrix goes from tall to square to short. This is shown in Figure 2, where the simulated outcomes were Gaussian and three balanced designs were used: (1) 16 units and 4 coders, (2) 8 units and 8 coders, and (3) 4 units and 16 coders.
We see that, as the aspect ratio of the data matrix increases from 1/4 to 1 to 4, the percent bias increases dramatically. For the 16 × 4 data matrix the maximum percent bias is nearly 10%. The maximum percent bias then increases to approximately 15% and 30% for the square and short matrices, respectively. This unappealing behavior can be remedied (Section 5).
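The downward bias of $\text{MST}_c$ follows from the expectation given in Section 4.1: for a balanced design with $a$ units and $n$ coders, the coefficient of $\sigma^2_\tau$ in $E(\text{MST}_c)$ is $(N - \sum_i n_i^2/N)/(N - 1) = n(a - 1)/(an - 1)$, which equals 20/21, 8/9, and 16/21 for the 16 × 4, 8 × 8, and 4 × 16 designs. The Python sketch below computes these coefficients exactly and runs a small Monte Carlo estimate of the percent bias of estimator (2). It is not the simulation behind Figure 2: the true value α = 0.3, the number of replications, the seed, and all function names are illustrative assumptions, so its numbers need not match the figure.

```python
import random
from fractions import Fraction


def mstc_coefficient(unit_sizes):
    """Coefficient c in E(MST_c) = c * sigma2_tau + sigma2_eps."""
    N = sum(unit_sizes)
    return Fraction(N * N - sum(n * n for n in unit_sizes), N * (N - 1))


def alpha_hat(units):
    """Customary estimator (2): 1 - MSE / MST_c."""
    scores = [y for unit in units for y in unit]
    N, a = len(scores), len(units)
    grand_mean = sum(scores) / N
    sse = 0.0
    for unit in units:
        unit_mean = sum(unit) / len(unit)
        sse += sum((y - unit_mean) ** 2 for y in unit)
    sst_c = sum((y - grand_mean) ** 2 for y in scores)
    return 1.0 - (sse / (N - a)) / (sst_c / (N - 1))


def percent_bias(a, n, alpha, reps=4000, seed=2022):
    """Monte Carlo percent bias of alpha_hat for a balanced a-by-n design."""
    rng = random.Random(seed)
    # Total variance fixed at 1, so sigma2_tau = alpha and sigma2_eps = 1 - alpha.
    sd_tau, sd_eps = alpha ** 0.5, (1.0 - alpha) ** 0.5
    total = 0.0
    for _ in range(reps):
        units = []
        for _ in range(a):
            tau = rng.gauss(0.0, sd_tau)
            units.append([tau + rng.gauss(0.0, sd_eps) for _ in range(n)])
        total += alpha_hat(units)
    return 100.0 * (total / reps - alpha) / alpha


designs = [(16, 4), (8, 8), (4, 16)]  # (units, coders): tall, square, short
results = {(a, n): percent_bias(a, n, alpha=0.3) for a, n in designs}
for a, n in designs:
    c = mstc_coefficient([n] * a)
    print(f"{a:>2} x {n:<2}  c = {c} ({float(c):.3f})  "
          f"percent bias ~ {results[(a, n)]:.1f}%")
```

Exact fractions are used for the coefficient so that the tall, square, and short designs can be compared without rounding; the Monte Carlo column should become more negative as the matrix goes from tall to short.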