
Toward improved inference for Krippendorff’s Alpha agreement coefficient

John Hughes

Lehigh University

Bethlehem, PA, USA 18015

October 25, 2022

Abstract

In this article I recommend a better point estimator for Krippendorff’s Alpha agreement coefficient, and develop a jackknife variance estimator that leads to much better interval estimation than does the customary bootstrap procedure or an alternative bootstrap procedure. Having developed the new methodology, I analyze nominal data previously analyzed by Krippendorff, and two experimentally observed datasets: (1) ordinal data from an imaging study of congenital diaphragmatic hernia, and (2) United States Environmental Protection Agency air pollution data for the Philadelphia, Pennsylvania area. The latter two applications are novel. The proposed methodology is now supported in version 2.0 of my open source R package, krippendorffsalpha, which supports common and user-defined distance functions, and can accommodate any number of units, any number of coders, and missingness. Interval computation can be parallelized.

1 Introduction

Krippendorff’s α (Hayes and Krippendorff, 2007) is a well-known methodology for statistically assessing agreement. Although α is non-parametric, the customary α estimator is motivated by an estimator of the intraclass correlation coefficient in the one-way mixed-effects analysis of variance (ANOVA) model (Ravishanker et al., 2021), a much studied fully parametric model. In this article I leverage α’s connection with the one-way mixed-effects ANOVA model to explore the customary approach to inference for Krippendorff’s α, finding that both the point estimator and interval estimation have substantial drawbacks for smaller, yet realistic, sample sizes. Then I consider a better point estimator and propose a jackknife variance estimator (Hinkley, 1977) that yields interval estimates having very nearly their desired coverage rates, even in unfavorable conditions. I evaluate the various procedures not only formally but also by way of extensive and realistic simulation studies. Finally, I analyze data previously analyzed by Krippendorff and others, along with two experimentally observed datasets: (1) radiologist-assigned grades in an imaging study of congenital diaphragmatic hernia (CDH), and (2) United States Environmental Protection Agency (EPA) PM2.5 data from seven geographically dispersed air sensors in or near Philadelphia, Pennsylvania.

2 Measuring agreement

An inter-coder agreement coefficient, which takes a value in the unit interval, with 0 indicating no agreement and 1 indicating perfect agreement, is a statistical measure of the extent to which two or more coders agree regarding the same units of analysis. The agreement problem has a long history and is important in many fields of inquiry, and numerous agreement statistics have been proposed.

The earliest agreement coefficients were S (Bennett et al., 1954), π (Scott, 1955), and κ (Cohen, 1960). Bennett et al. (1954) proposed the S score as a measure of the extent to which two methods of communication provide identical information. Scott (1955) proposed the π coefficient for measuring agreement between two coders. Cohen (1960) criticized π and proposed the κ coefficient as an alternative to π, although Smeeton (1985) noted that Francis Galton mentioned a κ-like statistic in his 1892 book, Finger Prints. Fleiss (1971) proposed multi-κ, a generalization of Scott’s π for measuring agreement among more than two coders. Conger (1980) and Davies and Fleiss (1982) likewise generalized κ to the multi-coder setting. Other generalizations of κ, e.g., weighted κ (Cohen, 1968), have also been proposed. The κ coefficient and its generalizations can fairly be said to dominate the field and are still widely used despite their well-known shortcomings (Feinstein and Cicchetti, 1990; Cicchetti and Feinstein, 1990).

arXiv:2210.13265v1 [stat.ME] 24 Oct 2022

Other oft-used measures of agreement are Gwet’s AC1 and AC2 (Gwet, 2008) and Krippendorff’s α (Hayes and Krippendorff, 2007), the latter of which is the subject of this article. An even newer agreement methodology is Sklar’s ω (Hughes, 2022), a parametric Gaussian copula-based framework. For more comprehensive reviews of the literature on agreement, I refer the interested reader to the article by Banerjee et al. (1999), the article by Artstein and Poesio (2008), and the book by Gwet (2014).

3 A motivating example

To fix ideas, let us consider an example dataset that was previously analyzed by Krippendorff (2013). The dataset, which comprises 41 nominal codes assigned to a dozen units of analysis by four coders, is shown below. The dots represent missing values.

        c1   c2   c3   c4
u1       1    1    •    1
u2       2    2    3    2
u3       3    3    3    3
u4       3    3    3    3
u5       2    2    2    2
u6       1    2    3    4
u7       4    4    4    4
u8       1    1    2    1
u9       2    2    2    2
u10      •    5    5    5
u11      •    •    1    1
u12      •    3    •    •

Figure 1: Nominal scores previously analyzed by Krippendorff, for twelve units and four coders. The dots represent missing values.

Because this dataset is small and the codes are nominal, it is easy to hypothesize by inspection that agreement is high. Indeed, eight of the units exhibit perfect agreement, and two of the remaining units exhibit near-perfect agreement. The only unit about which the coders evidently disagreed is unit 6. And of course the final unit carries no information regarding agreement. These facts taken together suggest that an estimated agreement coefficient for these data should not be too far from 1, unless the estimator in question is strongly influenced by the disagreement over unit 6.

Before analyzing these data I should mention that I will interpret results according to the agreement

scale given in Table 1 (Landis and Koch, 1977). Although this scale is well-established, agreement scales

remain a subject of debate (Taber, 2018), and so the following scale—indeed, any agreement scale—should

be applied circumspectly.

Applying the customary Krippendorff’s α methodology to these data, with the discrete metric $d^2(x, y) = 1\{x \neq y\}$ as the distance function, yields point estimate $\hat{\alpha} = 0.743$ and 95% confidence interval (0.459, 1.000). This estimate of α indicates substantial agreement, and the interval suggests that these data are consistent with agreement ranging from fair to perfect. If one repeats the analysis having removed unit 6, the point estimate changes to 0.857, and the interval becomes (0.679, 1.000). Thus we see that unit 6 was (perhaps unduly) influential since the new results indicate near-perfect agreement (point estimate) and at least substantial agreement (interval estimate).
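As a check on the two point estimates quoted above, here is a minimal Python sketch. It assumes the textbook observed/expected disagreement form of α for nominal codes, in which each ordered pair of codes within a unit is weighted by 1/(m − 1), where m is the number of codes that unit received. The function name `nominal_alpha` and the data layout are illustrative only and are not taken from the krippendorffsalpha package. The sketch covers the point estimate only, not the interval.

```python
from collections import Counter
from itertools import permutations

# Figure 1 data: rows are units u1..u12, columns are coders c1..c4.
# None marks a missing code.
DATA = [
    [1, 1, None, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 3],
    [3, 3, 3, 3],
    [2, 2, 2, 2],
    [1, 2, 3, 4],
    [4, 4, 4, 4],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
    [None, 5, 5, 5],
    [None, None, 1, 1],
    [None, 3, None, None],
]


def nominal_alpha(units):
    """Alpha with the discrete metric: 1 - observed / expected disagreement."""
    observed = 0.0      # weighted count of disagreeing ordered pairs within units
    totals = Counter()  # frequency of each code among pairable values
    for row in units:
        codes = [c for c in row if c is not None]
        m = len(codes)
        if m < 2:
            continue    # a unit with a single code carries no agreement information
        totals.update(codes)
        mismatches = sum(1 for x, y in permutations(codes, 2) if x != y)
        observed += mismatches / (m - 1)
    n = sum(totals.values())
    d_obs = observed / n
    d_exp = (n * n - sum(v * v for v in totals.values())) / (n * (n - 1))
    return 1.0 - d_obs / d_exp


alpha_all = nominal_alpha(DATA)
alpha_without_u6 = nominal_alpha(DATA[:5] + DATA[6:])  # drop unit 6
print(round(alpha_all, 3), round(alpha_without_u6, 3))  # 0.743 0.857
```

Under these assumptions the sketch reproduces both reported values, and it shows how much of the observed disagreement comes from unit 6 alone: that unit contributes 4 of the 8 weighted disagreements.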

Table 1: Guidelines for interpreting values of an agreement coefficient.

Range of Agreement     Interpretation
α ≤ 0.2                Slight Agreement
0.2 < α ≤ 0.4          Fair Agreement
0.4 < α ≤ 0.6          Moderate Agreement
0.6 < α ≤ 0.8          Substantial Agreement
α > 0.8                Near-Perfect Agreement

I will return to these data in Section 9, where I will apply my proposed methodology and compare those

results to these.

4 The customary Krippendorff’s α methodology

Hughes (2021a) showed that Krippendorff’s α finds its origin in the well-known one-way mixed-effects ANOVA model. In this section I will review Hughes’ demonstration, and then elaborate on it for the purposes of this article.

4.1 Krippendorff’s α and the one-way mixed-effects ANOVA model

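The one-way mixed-effects ANOVA model is given by

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad (i = 1, 2, \dots, a) \quad (j = 1, 2, \dots, n_i),$$

where

• $Y_{ij}$ is the $j$th score (of $n_i$ scores) for the $i$th unit (of $a$ units of analysis);

• $\mu \in \mathbb{R}$ is the population mean score;

• $\tau_i \overset{\text{ind}}{\sim} \text{Normal}(0, \sigma_\tau^2)$ are random unit effects such that $\sigma_\tau^2 \geq 0$;

• $\varepsilon_{ij} \overset{\text{ind}}{\sim} \text{Normal}(0, \sigma_\varepsilon^2)$ are errors such that $\sigma_\varepsilon^2 > 0$; and

• the unit effects are independent of the errors.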

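Since the scores for the $i$th unit share the unit effect $\tau_i$, said scores are dependent. Specifically, for $j \neq j'$, $\text{cov}(Y_{ij}, Y_{ij'}) = \sigma_\tau^2$, and

$$\alpha = \text{cor}(Y_{ij}, Y_{ij'}) = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma_\varepsilon^2}. \tag{1}$$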

This correlation among the scores for a given unit is usually called the intraclass correlation coefficient (ICC). I denote the ICC as ‘α’ precisely because the ICC is the population parameter for Krippendorff’s α when the data conform to the one-way mixed-effects ANOVA model. To reveal this connection it suffices to show that Krippendorff’s estimator, which I denote as $\hat{\alpha}$, is an estimator of α.
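To make the intraclass correlation concrete, the following Python sketch draws two scores per unit from the one-way mixed-effects model and compares their sample correlation with the ratio of the unit-effect variance to the total variance. The parameter values, the seed, and the function name `simulate_pair_correlation` are arbitrary choices made for illustration; they do not come from the article.

```python
import random


def simulate_pair_correlation(sigma_tau, sigma_eps, units=20000, seed=1):
    """Draw two scores per unit from Y_ij = mu + tau_i + eps_ij and
    return the sample correlation between the two scores."""
    rng = random.Random(seed)
    mu = 10.0  # arbitrary population mean; it does not affect the correlation
    first, second = [], []
    for _ in range(units):
        tau = rng.gauss(0.0, sigma_tau)  # unit effect shared by both scores
        first.append(mu + tau + rng.gauss(0.0, sigma_eps))
        second.append(mu + tau + rng.gauss(0.0, sigma_eps))
    mean1 = sum(first) / units
    mean2 = sum(second) / units
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(first, second))
    var1 = sum((x - mean1) ** 2 for x in first)
    var2 = sum((y - mean2) ** 2 for y in second)
    return cov / (var1 * var2) ** 0.5


sigma_tau, sigma_eps = 2.0, 1.0  # assumed standard deviations
icc = sigma_tau**2 / (sigma_tau**2 + sigma_eps**2)  # population ICC: 4 / 5
estimate = simulate_pair_correlation(sigma_tau, sigma_eps)
print(icc, round(estimate, 2))
```

With many simulated units the sample correlation should land close to 0.8, because the shared unit effect is the only source of dependence between two scores for the same unit.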

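First, note that α can be written as

$$\alpha = 1 - \frac{\sigma_\varepsilon^2}{\sigma_\tau^2 + \sigma_\varepsilon^2}.$$

This suggests the estimator

$$\hat{\alpha} = 1 - \frac{\widehat{\sigma_\varepsilon^2}}{\widehat{\sigma_\tau^2 + \sigma_\varepsilon^2}},$$

which we can completely specify by identifying estimators $\widehat{\sigma_\varepsilon^2}$ and $\widehat{\sigma_\tau^2 + \sigma_\varepsilon^2}$. For the one-way mixed-effects ANOVA model, the customary estimator of the error variance $\sigma_\varepsilon^2$ is the so-called mean squared error (MSE):

$$\widehat{\sigma_\varepsilon^2} = \text{MSE} = \frac{\text{SSE}}{N - a} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2}{N - a},$$

where SSE denotes the error sum of squares, $N = \sum_i n_i$ is the total sample size, and $\bar{Y}_{i\bullet}$ is the sample mean for the $i$th unit. MSE is both the method of moments (MoM) estimator and the maximum likelihood estimator of the error variance for the balanced design (i.e., when $n_i = n$ for all $i$). For the unbalanced design, MSE is once again the MoM estimator of $\sigma_\varepsilon^2$, but the maximum likelihood estimator of $\sigma_\varepsilon^2$ is not available in closed form. For both designs MSE is unbiased for $\sigma_\varepsilon^2$.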

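Now, to estimate the total variance $\sigma_\tau^2 + \sigma_\varepsilon^2$, Krippendorff uses

$$\widehat{\sigma_\tau^2 + \sigma_\varepsilon^2} = \text{MST}_c = \frac{\text{SST}_c}{N - 1} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2}{N - 1},$$

where $\text{SST}_c$ denotes the corrected (for the population mean) total sum of squares and $\bar{Y}_{\bullet\bullet}$ denotes the mean for the entire sample. This estimator seems quite natural given that

$$\mathbb{E}\,\text{MST}_c = \frac{N - \sum_i n_i^2 / N}{N - 1}\,\sigma_\tau^2 + \sigma_\varepsilon^2 \approx \sigma_\tau^2 + \sigma_\varepsilon^2,$$

with equality only when $\sigma_\tau^2 = 0$ (or $n_i = 1$ for all $i$, which makes no sense). In any case, we arrive at Krippendorff’s point estimator:

$$\hat{\alpha} = 1 - \frac{\text{MSE}}{\text{MST}_c}. \tag{2}$$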

This estimator is the customary estimator for Krippendorff’s α when squared Euclidean distance $d^2(x, y) = (x - y)^2$ is employed as the measure of discrepancy. Hughes (2021a) showed how this form of $\hat{\alpha}$ can give rise to the non-parametric form of Krippendorff’s α, which is incidentally a modified multi-response permutation procedure (Mielke and Berry, 2007). The non-parametric form of α simply makes $d^2$ a parameter whose value is chosen by the practitioner based on the type of outcomes to be analyzed, e.g., the discrete metric $d^2(x, y) = 1\{x \neq y\}$ for nominal observations, distance function $d^2(x, y) = \{(x - y)/(x + y)\}^2$ for ratio observations, etc.
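The ANOVA form of the point estimator, one minus the ratio of MSE to the corrected total mean square, can be sketched in a few lines of Python. The function name `alpha_hat` and the toy inputs are illustrative assumptions. The sketch handles complete numeric scores with possibly different numbers of scores per unit, and it does not reproduce the package’s handling of missingness or of other distance functions.

```python
def alpha_hat(units):
    """1 - MSE / MST_c for a list of per-unit score lists (unbalanced allowed)."""
    a = len(units)                                   # number of units
    n_total = sum(len(scores) for scores in units)   # total sample size N
    grand_mean = sum(sum(scores) for scores in units) / n_total
    sse = 0.0
    for scores in units:
        unit_mean = sum(scores) / len(scores)
        sse += sum((y - unit_mean) ** 2 for y in scores)
    sst_c = sum((y - grand_mean) ** 2 for scores in units for y in scores)
    mse = sse / (n_total - a)        # error mean square
    mst_c = sst_c / (n_total - 1)    # corrected total mean square
    return 1.0 - mse / mst_c


# Toy inputs (hypothetical): perfect within-unit agreement, then two units
# whose coders differ in exactly the same way.
perfect = alpha_hat([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
noisy = alpha_hat([[1.0, 2.0], [1.0, 2.0]])
print(perfect, noisy)
```

The first input gives an estimate of exactly 1 because the error sum of squares is zero. The second gives roughly −0.5, which illustrates that the estimate itself is not constrained to the unit interval even though the target coefficient is.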

It is important to note that α is an agreement coefficient for all types of outcomes and suitable distance functions $d^2$. However, α is not a well-defined population parameter for every sensible choice of $d^2$. For example, when the observations are categorical and the discrete metric is used, $\hat{\alpha}$ is surely an estimator of agreement, but the population parameter that $\hat{\alpha}$ estimates cannot be described precisely. This reminds one of the g factor (Warne and Burningham, 2019), a construct that has been defined operationally as that which is measured by various cognitive tests.

4.2 Bias of the customary point estimator

Note that $\text{MST}_c$ is biased downward, and the magnitude of the bias grows as the (average) number of coders increases (for fixed $N$). This implies that $\hat{\alpha}$, which already has a negative bias, becomes much more biased as the shape of the data matrix goes from tall to square to short. This is shown in Figure 2, where the simulated outcomes were Gaussian and three balanced designs were used: (1) 16 units and 4 coders, (2) 8 units and 8 coders, and (3) 4 units and 16 coders.

We see that, as the aspect ratio of the data matrix increases from 1/4 to 1 to 4, the percent bias increases dramatically. For the 16 × 4 data matrix the maximum percent bias is nearly 10%. The maximum percent bias then increases to approximately 15% and 30% for the square and short matrices, respectively. This unappealing behavior can be remedied (Section 5).
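One way to see why the total-variance estimator deteriorates as the matrix gets shorter is to evaluate the weight that its expectation places on the unit-effect variance, namely (N − Σ nᵢ²/N)/(N − 1), for the three balanced designs. The Python sketch below does only that arithmetic. The helper name `mstc_tau_weight` is hypothetical, and the weights describe the bias of the denominator, not the percent bias of the α estimator plotted in Figure 2.

```python
from fractions import Fraction


def mstc_tau_weight(group_sizes):
    """Weight on the unit-effect variance in E[MST_c]:
    (N - sum(n_i^2) / N) / (N - 1)."""
    n_total = sum(group_sizes)
    sum_sq = sum(n * n for n in group_sizes)
    return (n_total - Fraction(sum_sq, n_total)) / (n_total - 1)


# Each list holds the number of coders per unit for one balanced design (N = 64).
designs = {
    "16 units x 4 coders": [4] * 16,
    "8 units x 8 coders": [8] * 8,
    "4 units x 16 coders": [16] * 4,
}
weights = {name: mstc_tau_weight(sizes) for name, sizes in designs.items()}
for name, w in weights.items():
    print(name, w, round(float(w), 3))
```

The weights come out as 20/21, 8/9, and 16/21 (about 0.952, 0.889, and 0.762). A weight of 1 would mean no bias, so the total variance is increasingly underestimated as coders replace units at fixed N.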
