Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data



Samuel Anyaso-Samuel, Somnath Datta

Department of Biostatistics, University of Florida

Abstract

Informative cluster size (ICS) arises in situations with clustered data where a latent relationship exists between the number of participants in a cluster and the outcome measures. Although this phenomenon has been sporadically reported in statistical literature for nearly two decades now, further exploration is needed in certain statistical methodologies to avoid potentially misleading inferences. For inference about population quantities without covariates, inverse cluster size reweightings are often employed to adjust for ICS. Further, to study the effect of covariates on disease progression described by a multistate model, the pseudo-value regression technique has gained popularity in time-to-event data analysis. We seek to answer the question: "How to apply pseudo-value regression to clustered time-to-event data when cluster size is informative?" ICS adjustment by the reweighting method can be performed in two steps: estimation of marginal functions of the multistate model and fitting the estimating equations based on pseudo-value responses, leading to four possible strategies. We present theoretical arguments and thorough simulation experiments to ascertain the correct strategy for adjusting for ICS. A further extension of our methodology is implemented to include informativeness induced by the intra-cluster group size. We demonstrate the methods in two real-world applications: (i) to determine predictors of tooth survival in a periodontal study, and (ii) to identify indicators of ambulatory recovery in spinal cord injury patients who participated in locomotor-training rehabilitation.

1 Introduction

Researchers often encounter complex time-to-event data that characterize disease progression in biomedical studies. Multistate models (which are a form of multivariate survival models) are traditional statistical tools for describing the transitions and state occupation of patients with such event history data. The multistate system could be subject to right-censoring and/or left-truncation. Usually, additional covariate information is also available, and researchers may be equally interested in studying the effects of these covariates on the probability of occupying a disease state at a given time point. Although traditional methods such as the Cox model may be employed for such analysis, the pseudo-value regression1 approach that directly models the covariate effects on marginal temporal functions of a multistate process is a robust alternative.

Moreover, the event history data may be correlated among subjects belonging to distinct clusters, and the cluster size could be potentially informative. This is fairly common in dental studies when data are collected on all available teeth, where the teeth in a patient's mouth (the cluster) are correlated due to shared behavioral and genetic patterns, and the number of surviving or available teeth is indicative of the patient's oral health. Several methodological papers2–6 have shown that failure to adjust for informative cluster size (ICS) leads to potentially invalid results. This has not been widely recognized in medical research. Moreover, for population-level inference, these papers propose inverse cluster size reweightings to adjust for ICS. Bakoyannis and colleagues7,8 (and the references stated therein) provide a recent review of the analysis of cluster-correlated multistate models. Using the inverse cluster size reweighting, they proposed nonparametric marginal estimators7 and two-sample tests7,8 for the transition and state occupation probabilities that adjust for ICS. However, their methods are inapplicable when inference about the association between a set of covariates and the transition outcomes is the object of the analysis. In this paper, we give novel insights into the problem of ICS, especially in the context of pseudo-value regression for cluster-correlated multistate models. This in turn provides novel extensions of the pseudo-value regression in clustered data with ICS (and also informative intra-cluster group (ICG) size).

arXiv:2210.13410v2 [stat.ME] 23 Mar 2023

The regression analysis proceeds by obtaining jackknife pseudo-values that are based on nonparametric estimates of a marginal function of a multistate process. Subsequently, the pseudo-values are used as the responses in a generalized estimating equation (GEE) to conduct inference based on available covariates. For noninformative cluster sizes, the pseudo-value regression9 was extended to model the cumulative incidence of transplant-related mortality as a function of patient and transplant characteristics. Adjusting for ICS in the context of the pseudo-value regression is generally not so obvious since such adjustments should be considered in two steps: (i) estimation of the marginal function and (ii) estimating equations for the regression analysis. This leads to four possible strategies for ICS adjustment: no reweightings in either step, reweighting in the first step alone, reweighting in the second step alone, and reweightings in both steps. We study the appropriate strategies for ICS adjustments leading to accurate methods for the regression analysis. We provide theoretical arguments and simulation studies to justify these methods.

To illustrate the proposed methods, we applied them to two real-world datasets. First, we analyze the periodontal data studied by McGuire and Nunn10. The objective of the analysis was to model tooth survival as a function of clinical and behavioral factors. However, a simple Kaplan-Meier calculation11 showed that patients with more teeth at baseline have a higher tooth survival probability, indicating potential ICS.

Next, we analyze a data set from a multicenter study of spinal cord injury (SCI) patients receiving activity-based rehabilitation12. The study aimed at identifying prognostic factors of ambulatory recovery. Based on their walking speed, which is evaluated periodically after receiving standardized therapy sessions, the event history of the patients is described by a progressive illness-death model. A potential reason for the informativeness of the cluster size in this study stems from the recruitment of patients with worse prognoses in several centers.

The rest of the paper is organized as follows. In Section 2, we present a nonparametric marginal estimator of a temporal function for clustered multistate data when the cluster size is informative. Here, the temporal function of interest is the state occupation probability (SOP). The pseudo-value regression is discussed in Section 3. Detailed simulation exercises are presented in Section 4. We illustrate the proposed methods using the real-world applications in Section 5. In Section 6, we consider an extension of our approach to a multilevel design. Additional results are placed in the web-based supplementary material showing that our recommended strategy works in more complex informative cluster size settings based on group sizes within clusters as well. The main body of the paper ends with a discussion in Section 7. The appendix contains a theoretical investigation into the matter of ICS adjustment for clustered pseudo-value regression where a linear model structure is assumed to simplify calculations and get the main message across.

2 Non-parametric estimators for the marginal state occupation probabilities

2.1 Notation

Consider the stochastic process $X(t) \in \mathcal{S}$ with a finite set of states $\mathcal{S} = \{1, \ldots, Q\}$ for a general multistate model, where $X(t)$ corresponds to the state a patient is currently in at time $t \ge 0$. For continuous time, we assume the data may be subject to right censoring and/or left-truncation. For $i = 1, \ldots, n$ and positive integer $k$, let $T^*_{ik}$ indicate the time for the $k$th transition for the $i$th individual, where $T^*_{i0} \equiv 0$ and $T^*_{ik} = \infty$ if the $i$th individual enters an absorbing state before the $k$th transition is made. Let $C_i$ and $L_i$ respectively denote the right-censoring and left-truncation times, which are independent of $X(t)$. Let $T^*_i = \sup_k \{T^*_{ik} : T^*_{ik} < \infty\}$ be the time for the last transition for the $i$th individual.

2.2 Marginal estimators for independent data

The transition probability matrix $P(s, t)$ gives the probability of transitioning between every state in $\mathcal{S}$; the $\ell\ell'$th element of $P(s, t)$ is given by $P_{\ell\ell'}(s, t) = \Pr\{X(t) = \ell' \mid X(s) = \ell\}$ for $s < t$. Primarily, we are interested in estimating and modeling the SOPs, which are given by $\pi_\ell(t) = \Pr\{X(t) = \ell\} = \sum_{\ell' \in \mathcal{S}} \pi_{\ell'}(0) P_{\ell'\ell}(0, t)$, where $\pi_{\ell'}(0)$ are the initial state occupation probabilities.
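The relation between the SOPs and the transition probability matrix is a vector-matrix product. A minimal numerical sketch follows, assuming a hypothetical three-state illness-death layout with made-up values for the initial distribution and for $P(0, t)$; none of these numbers come from the paper, and states are indexed from 0 only for array convenience.

```python
import numpy as np

# Hypothetical 3-state example: index 0 = healthy, 1 = ill, 2 = dead.
# P0t[l_from, l_to] plays the role of P_{l l'}(0, t); each row sums to one.
P0t = np.array([
    [0.6, 0.3, 0.1],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
])

# Illustrative initial state occupation probabilities pi_{l'}(0).
pi0 = np.array([0.8, 0.2, 0.0])

# pi_l(t) = sum over l' of pi_{l'}(0) * P_{l' l}(0, t):
# a row vector multiplied by the transition probability matrix.
pi_t = pi0 @ P0t
print(pi_t)
```

Because each row of the transition matrix sums to one and the initial probabilities sum to one, the resulting SOPs also sum to one.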

In a right-censored and left-truncated experiment, the transition counts of patients moving from state $\ell$ to $\ell'$ in the time interval $[0, t]$, denoted by $N_{\ell\ell'}(t)$, and the number of individuals at risk of transitioning out of state $\ell$ at time $t$, denoted by $M_\ell(t)$, are crucial to developing estimators for the multistate parameters. These are given as follows:

$$N_{\ell\ell'}(t) = \sum_{i=1}^{n} \sum_{k \ge 1} \frac{I\{T^*_{ik} \le t,\ C_i \ge T^*_{ik},\ L_i < t,\ X(T^*_{ik}) = \ell,\ X(T^*_{i,k-1}) = \ell'\}}{K(T^*_{ik}-)} \tag{1}$$

and

$$M_\ell(t) = \sum_{i=1}^{n} \sum_{k \ge 1} \frac{I\{T^*_{i,k-1} < t \le T^*_{ik},\ C_i \ge t,\ L_i < t,\ X(T^*_{ik}) = \ell\}}{K(t-)} \tag{2}$$

where $I(\cdot)$ is the indicator function and $K(\cdot)$ is the Kaplan-Meier estimator of the survival function of the censoring variable (such that the $C_i$'s are the failure times that are right-censored by $T^*_i$). A careful examination will show that $N_{\ell\ell'}(t)$ and $M_\ell(t)$ are calculable based on observed data (that is, right-censored and/or left-truncated data). The Nelson-Aalen estimator of the cumulative transition intensity matrix, denoted by $\widehat{A}$, is a necessary quantity for obtaining the estimator of the SOP. The elements of $\widehat{A}$ are given by

$$\widehat{A}_{\ell\ell'}(t) = \begin{cases} \displaystyle\int_0^t I\{M_\ell(u) > 0\}\, M_\ell(u)^{-1}\, dN_{\ell\ell'}(u), & \ell \ne \ell', \\[1ex] -\displaystyle\sum_{\ell' \ne \ell} \widehat{A}_{\ell\ell'}(t), & \ell = \ell'. \end{cases} \tag{3}$$

Further, the Aalen-Johansen estimator of the transition probability matrix is obtained via the product integration of $\widehat{A}(t)$, thus,

$$\widehat{P}(s, t) = \prod_{(s, t]} \{I + d\widehat{A}(u)\}, \tag{4}$$

where $I$ denotes the $Q \times Q$ identity matrix. Then, the estimator of the SOP is given by

$$\widehat{\pi}_\ell(t) = \sum_{\ell'=1}^{Q} \widehat{\pi}_{\ell'}(0)\, \widehat{P}_{\ell'\ell}(0, t), \tag{5}$$

where $\widehat{\pi}_{\ell'}(0)$ is essentially the initial proportion of individuals in state $\ell'$. Andersen et al.13 give a detailed description of the development of these estimators.

Historically, the consistency and large-sample properties of the estimators given above were established under the assumption that the transitions between states were Markov13. However, the estimator of the SOP given by (5) remains valid even when the process is non-Markovian14.
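To make the chain from (3) to (5) concrete, the sketch below computes Nelson-Aalen increments, multiplies them into a product integral, and applies the initial proportions. It is an illustration under strong simplifying assumptions that are not the paper's setting: every path is fully observed (no censoring or truncation, so the $K(\cdot)$ weights equal one), event times are strictly positive, and states are indexed from 0. The function name `aalen_johansen_sop` and the path format are hypothetical.

```python
import numpy as np

def aalen_johansen_sop(paths, n_states, t):
    """SOP estimate at time t from fully observed paths.

    paths: one list per subject of (time, state) pairs; the first pair is
    (0, initial_state) and later pairs record transitions at times > 0.
    """
    # Distinct transition times up to and including t.
    times = sorted({tm for p in paths for tm, _ in p[1:] if tm <= t})
    P = np.eye(n_states)
    for u in times:
        dN = np.zeros((n_states, n_states))  # transition counts at time u
        M = np.zeros(n_states)               # at-risk counts just before u
        for p in paths:
            prev = [s for tm, s in p if tm < u][-1]  # state occupied before u
            M[prev] += 1
            for tm, s in p[1:]:
                if tm == u:
                    dN[prev, s] += 1
        dA = np.zeros_like(dN)
        for l in range(n_states):
            if M[l] > 0:                 # indicator I{M_l(u) > 0}
                dA[l] = dN[l] / M[l]     # off-diagonal increments
            dA[l, l] = -dA[l].sum()      # diagonal makes each row sum to zero
        P = P @ (np.eye(n_states) + dA)  # one factor of the product integral
    # Initial proportions in each state.
    starts = [p[0][1] for p in paths]
    pi0 = np.bincount(starts, minlength=n_states) / len(paths)
    return pi0 @ P

# Toy data: four subjects, states 0 and 1 transient, state 2 absorbing.
paths = [
    [(0, 0), (1.0, 1), (3.0, 2)],
    [(0, 0), (2.0, 2)],
    [(0, 0)],
    [(0, 1), (2.5, 2)],
]
print(aalen_johansen_sop(paths, 3, 2.2))
```

With complete data such as this, the result coincides with the empirical proportion of subjects in each state at time $t$, which gives a quick sanity check. Handling censoring and truncation would require the weighted counts in (1) and (2), which this sketch omits.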

2.3 Marginal estimators for clustered data when cluster size is informative

Consider the situation where the event history of a cluster of patients is described by a multistate model and the event histories of patients belonging to the same cluster are correlated, while the processes for patients from separate clusters are independent. Suppose the data consist of $m$ clusters indexed by $i$ with $j = 1, \ldots, n_i$ patients in cluster $i$ and total sample size $n = \sum_{i=1}^{m} n_i$. Also, consider the situation where the cluster size is potentially informative, that is, there exists a latent relationship between the cluster sizes and observed transition outcomes. Under such a scenario, the cluster size $n_i$ is a random variable. Recently, Bakoyannis7 proposed a weighted nonparametric estimator of the SOP that accounts for ICS and established its large-sample properties using empirical process theory. Inference around this estimator is based on a population formed by randomly selecting a cluster unit from a randomly selected cluster. Following the notation presented in Section 2.2, we present the weighted estimator of the SOPs.

Let $X_{IJ}(t) \in \mathcal{S}$ denote the stochastic process for a typical patient indexed by $J$ that belongs to a typical cluster indexed by $I$, where $I \sim U\{1, m\}$, and given $I = i$, $J \sim U\{1, n_i\}$; $U$ denotes the discrete uniform distribution. Primarily, our interest lies in carrying out covariate inference on the SOP for a randomly chosen patient in a randomly selected cluster. We define the marginal SOPs by taking the expectation over all $V_i = \{n_i, X_{i1}, X_{i2}, \ldots, X_{in_i}\}$, which gives

$$\pi_\ell(t) = E[I\{X_{IJ}(t) = \ell\}] = E\left[\frac{1}{m} \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i} I\{X_{ij}(t) = \ell\}\right], \quad t \ge 0.$$

This expression defines the probability of patient $j$ in cluster $i$ being in state $\ell$ at time $t$, under the assumption that the processes $V_i$, $i = 1, \ldots, m$, are IID and the $X_{ij}$ within a distinct cluster $i$ are exchangeable given $n_i$15.

Using the convention in Section 2.1, let $T^*_{ijk}$ denote the time for the $k$th transition for the $j$th subject in cluster $i$, let $C_{ij}$ and $L_{ij}$ respectively denote the right-censoring and left-truncation times, which are independent of the multistate process, and let $T^*_{ij} = \sup_k \{T^*_{ijk} : T^*_{ijk} < \infty\}$. Following (1) and (2), $N_{\ell\ell'}(t)$ and $M_\ell(t)$ are respectively expressed by a weighted sum over all individuals ($1 \le i \le m$; $1 \le j \le n_i$), where the weights $w_{ij} = 1$ correspond to no reweighting and $w_{ij} = 1/n_i$ correspond to inverse cluster size reweighting. We maintain the preceding notation for the counting process and the at-risk set because the meaning is clear from the context. The reweighted versions of $N_{\ell\ell'}(t)$ and $M_\ell(t)$ are then plugged into the estimators for the cumulative integrated hazards, transition probability matrix, and the SOPs given in (3)-(5) of Section 2.2. By reweighting the contributions of $N_{\ell\ell'}(t)$ and $M_\ell(t)$ by the inverse cluster size, one ensures equal total contribution from each cluster. The difference in the weighting implementations would not lead to disparate conclusions unless the cluster size is informative.
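The effect of the two weighting choices can be seen in a stripped-down setting. The sketch below assumes complete data at a single time point, so that the SOP estimate reduces to a weighted proportion of state indicators; it is a stand-in for, not an implementation of, the weighted Aalen-Johansen estimator. The function name `occupation_probability` and the toy clusters are hypothetical.

```python
def occupation_probability(states_by_cluster, state, reweight):
    """Weighted proportion of subjects occupying `state` at a fixed time.

    states_by_cluster: list of clusters, each a list of observed states.
    reweight=False uses w_ij = 1; reweight=True uses w_ij = 1 / n_i.
    """
    num = 0.0
    den = 0.0
    for states in states_by_cluster:
        n_i = len(states)
        w = 1.0 / n_i if reweight else 1.0
        num += w * sum(s == state for s in states)  # weighted count in `state`
        den += w * n_i                              # total weight of cluster i
    return num / den

# Toy example where cluster size is related to the outcome: the large
# cluster is mostly in state 1, the small cluster is entirely in state 2.
clusters = [[1, 1, 1, 1, 1, 1, 1, 2], [2, 2]]
print(occupation_probability(clusters, 2, reweight=False))
print(occupation_probability(clusters, 2, reweight=True))
```

With inverse cluster size weights each cluster contributes total weight one, so the estimate is the average of the within-cluster proportions; without reweighting, the larger cluster dominates the pooled proportion. The two answers differ here only because the toy data tie cluster size to the outcome.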

3 Pseudo-value Regression

The pseudo-value regression approach1 is a flexible technique for the direct modeling of covariate effects on temporal marginal functions of a general multistate model. The regression analysis is based on pseudo-values obtained from the "leave-one-out" jackknife statistic constructed from nonparametric marginal estimators. These pseudo-values are computed at pre-specified time point(s) and are then used as the responses in a generalized linear model. Logan et al.9 extended the pseudo-value approach for the analysis of clustered competing risks data with non-informative cluster size. In the current setup, we describe the application of the pseudo-value regression for modeling covariate effects on SOPs in the analysis of clustered data where the cluster sizes are potentially informative. In this section, we explain the formulation of the pseudo-values and marginal models for estimating the covariate effects.

3.1 Constructing the Pseudo-values

Let the mean value parameter $\pi_\ell(t) = E\{I(X(t) = \ell)\} = P\{X(t) = \ell\}$ denote the SOP, so that $f\{X(t)\} = I\{X(t) = \ell\}$. Let $\widehat{\pi}^{w}_\ell(t)$ and $\widehat{\pi}^{uw}_\ell(t)$ denote the marginal (Aalen-Johansen) estimators of the SOP formulated with $w_{ij} = 1/n_i$ and $w_{ij} = 1$, respectively. For $i = 1, \ldots, m$; $j = 1, \ldots, n_i$, we consider two methods for constructing the jackknife pseudo-values at a given time:

Method 1: For clustered multistate data with non-informative cluster size, the pseudo-values at time $t \ge 0$ are given by

$$Y^{uw}_{ij}(t) = n \cdot \widehat{\pi}^{uw}_\ell(t) - (n - 1)\, \widehat{\pi}^{uw}_{\ell,-ij}(t), \tag{6}$$

where $\widehat{\pi}^{uw}_{\ell,-ij}(t)$ is obtained by omitting the $j$-th individual in the $i$-th cluster9.

Method 2: For the ICS-adjusted estimator $\widehat{\pi}^{w}_\ell(t)$, we define the pseudo-values as

$$Y^{w}_{ij}(t) = m \cdot \{n_i\, \widehat{\pi}^{w}_\ell(t) - (n_i - 1)\, \widehat{\pi}^{w}_{\ell,-ij}(t)\} - (m - 1) \cdot \widehat{\pi}^{w}_{\ell,-i}(t), \tag{7}$$

where $\widehat{\pi}^{w}_{\ell,-i}(t)$ and $\widehat{\pi}^{w}_{\ell,-ij}(t)$ are obtained by omitting the $i$-th cluster and the $j$-th individual in the $i$-th cluster, respectively.
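The two jackknife constructions in (6) and (7) can be sketched with a simple plug-in estimator. The code below assumes complete data at one time point, so the SOP estimator is replaced by a weighted mean of 0/1 state indicators; this substitution and all function names are assumptions made for illustration, and in practice the weighted Aalen-Johansen estimator would take the place of `sop_estimate`.

```python
def sop_estimate(clusters, reweight):
    """Complete-data stand-in for the SOP estimator.

    clusters: list of lists of 0/1 indicators I{X_ij(t) = l}.
    """
    num = den = 0.0
    for y in clusters:
        if len(y) == 0:
            continue  # a cluster emptied by leave-one-out contributes nothing
        w = 1.0 / len(y) if reweight else 1.0
        num += w * sum(y)
        den += w * len(y)
    return num / den

def drop_member(clusters, i, j):
    out = [list(y) for y in clusters]
    del out[i][j]
    return out

def drop_cluster(clusters, i):
    return [list(y) for k, y in enumerate(clusters) if k != i]

def pseudo_values_method1(clusters):
    """Equation (6): leave one individual out of the unweighted estimator."""
    n = sum(len(y) for y in clusters)
    full = sop_estimate(clusters, reweight=False)
    return [
        [n * full - (n - 1) * sop_estimate(drop_member(clusters, i, j), False)
         for j in range(len(y))]
        for i, y in enumerate(clusters)
    ]

def pseudo_values_method2(clusters):
    """Equation (7): two-level jackknife on the weighted estimator (m >= 2)."""
    m = len(clusters)
    full = sop_estimate(clusters, reweight=True)
    values = []
    for i, y in enumerate(clusters):
        n_i = len(y)
        without_cluster = sop_estimate(drop_cluster(clusters, i), True)
        row = []
        for j in range(n_i):
            # For n_i = 1 this term is multiplied by (n_i - 1) = 0 below.
            without_member = (
                sop_estimate(drop_member(clusters, i, j), True)
                if n_i > 1 else 0.0
            )
            inner = n_i * full - (n_i - 1) * without_member
            row.append(m * inner - (m - 1) * without_cluster)
        values.append(row)
    return values

# Toy indicators for three clusters of sizes 4, 2, and 1.
indicators = [[1, 0, 0, 1], [1, 1], [0]]
print(pseudo_values_method1(indicators))
print(pseudo_values_method2(indicators))
```

In this complete-data setting both constructions return the original indicators (up to floating-point error), which is a convenient check that the leave-one-out bookkeeping is right. With censored or truncated data the pseudo-values generally differ from the raw indicators and can fall outside the unit interval.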
