Adjusting for informative cluster size in pseudo-value based regression approaches with clustered time to event data

Samuel Anyaso-Samuel, Somnath Datta
Department of Biostatistics, University of Florida
Abstract
Informative cluster size (ICS) arises in situations with clustered data where a latent
relationship exists between the number of participants in a cluster and the outcome mea-
sures. Although this phenomenon has been sporadically reported in statistical literature
for nearly two decades now, further exploration is needed in certain statistical methodolo-
gies to avoid potentially misleading inferences. For inference about population quantities
without covariates, inverse cluster size reweightings are often employed to adjust for ICS.
Further, to study the effect of covariates on disease progression described by a multi-
state model, the pseudo-value regression technique has gained popularity in time-to-event
data analysis. We seek to answer the question: "How to apply pseudo-value regression
to clustered time-to-event data when cluster size is informative?" ICS adjustment by the
reweighting method can be performed in two steps: estimation of the marginal functions of the
multistate model, and fitting of the estimating equations based on pseudo-value responses.
Since each step can be carried out with or without reweighting, this leads to four possible
strategies. We present theoretical arguments and thorough simulation experiments to
ascertain the correct strategy for adjusting for ICS. A further
extension of our methodology is implemented to include informativeness induced by the
intra-cluster group size. We demonstrate the methods in two real-world applications:
(i) to determine predictors of tooth survival in a periodontal study, and (ii) to iden-
tify indicators of ambulatory recovery in spinal cord injury patients who participated in
locomotor-training rehabilitation.
1 Introduction
Researchers often encounter complex time-to-event data that characterize disease progression
in biomedical studies. Multistate models (which are a form of multivariate survival models)
are traditional statistical tools for describing the transitions and state occupation of patients
with such event history data. The multistate system could be subject to right-censoring and/or
left-truncation. Usually, additional covariate information is also available, and researchers may
be equally interested in studying the effects of these covariates on the probability of occupying
a disease state at a given time point. Although traditional methods such as the Cox model may
be employed for such analysis, the pseudo-value regression approach [1], which directly models the
covariate effects on marginal temporal functions of a multistate process, is a robust alternative.
arXiv:2210.13410v2 [stat.ME] 23 Mar 2023
Moreover, the event histories of subjects belonging to the same cluster may be correlated,
and the cluster size could be informative. This is fairly common in dental studies when data
are collected on all available teeth: the teeth in a patient's mouth (the cluster) are
correlated due to shared behavioral and genetic patterns, and the number of surviving or
available teeth is indicative of the patient's oral health. Several methodological papers [2–6]
have shown that failure to adjust for informative cluster size (ICS) leads to potentially
invalid results, yet this has not been widely recognized in medical research. For
population-level inference, these papers propose inverse cluster size reweightings to adjust
for ICS. Bakoyannis and colleagues [7,8] (and the references stated therein) provide a recent
review of the analysis of cluster-correlated multistate models. Using inverse cluster size
reweighting, they proposed nonparametric marginal estimators [7] and two-sample tests [7,8]
for the transition and state occupation probabilities that adjust for ICS. However, their
methods are inapplicable when inference about the association between a set of covariates and
the transition outcomes is the object of the analysis. In this paper, we give novel insights
into the problem of ICS, especially in the context of pseudo-value regression for
cluster-correlated multistate models. This in turn provides novel extensions of pseudo-value
regression to clustered data with ICS (and also with informative intra-cluster group (ICG) size).
The regression analysis proceeds by obtaining jackknife pseudo-values that are based on
nonparametric estimates of a marginal function of a multistate process. Subsequently, the
pseudo-values are used as the responses in a generalized estimating equation (GEE) to conduct
inference based on available covariates. For noninformative cluster sizes, pseudo-value
regression was extended [9] to model the cumulative incidence of transplant-related mortality as
a function of patient and transplant characteristics. Adjusting for ICS in the context of
pseudo-value regression is not obvious because such adjustments must be considered
in two steps: (i) estimation of the marginal function and (ii) the estimating equations for the
regression analysis. This leads to four possible strategies for ICS adjustment: no reweighting
in either step, reweighting in the first step alone, reweighting in the second step alone, and
reweighting in both steps. We study the appropriate strategies for ICS adjustments leading to
accurate methods for the regression analysis. We provide theoretical arguments and simulation
studies to justify these methods.
To illustrate the proposed methods, we apply them to two real-world datasets. First, we
analyze the periodontal data studied by McGuire and Nunn [10]. The objective of the analysis
was to model tooth survival as a function of clinical and behavioral factors. However, a simple
Kaplan-Meier calculation [11] showed that patients with more teeth at baseline have a higher
tooth survival probability, indicating potential ICS.
Next, we analyze a data set from a multicenter study of spinal cord injury (SCI) patients
receiving activity-based rehabilitation [12]. The study aimed to identify prognostic factors
of ambulatory recovery. Based on their walking speed, which is evaluated periodically after
standardized therapy sessions, the event history of the patients is described by a
progressive illness-death model. A potential reason for the informativeness of the cluster size
in this study is the recruitment of patients with worse prognoses in several centers.
The rest of the paper is organized as follows. In Section 2, we present a nonparamet-
ric marginal estimator of a temporal function for clustered multistate data when the cluster
size is informative. Here, the temporal function of interest is the state occupation probability
(SOP). The pseudo-value regression is discussed in Section 3. Detailed simulation exercises are
presented in Section 4. We illustrate the proposed methods using the real-world applications
in Section 5. In Section 6, we consider an extension of our approach to a multilevel design.
Additional results are placed in the web-based supplementary material showing that our recom-
mended strategy works in more complex informative cluster size settings based on group sizes
within clusters as well. The main body of the paper ends with a discussion in Section 7. The
appendix contains a theoretical investigation into the matter of ICS adjustment for clustered
pseudo-value regression where a linear model structure is assumed to simplify calculations and
get the main message across.
2 Non-parametric estimators for the marginal state occupation probabilities
2.1 Notation
Consider the stochastic process $X(t) \in \mathcal{S}$ with a finite set of states
$\mathcal{S} = \{1, \ldots, Q\}$ for a general multistate model, where $X(t)$ is the state a
patient occupies at time $t \ge 0$. In continuous time, we assume the data may be subject to
right-censoring and/or left-truncation. For $i = 1, \ldots, n$ and positive integer $k$, let
$T_{ik}$ denote the time of the $k$th transition for the $i$th individual, where
$T_{i0} \equiv 0$ and $T_{ik} = \infty$ if the $i$th individual enters an absorbing state before
the $k$th transition is made. Let $C_i$ and $L_i$ respectively denote the right-censoring and
left-truncation times, which are independent of $X(t)$. Let
$T_i = \sup_k \{T_{ik} : T_{ik} < \infty\}$ be the time of the last transition for the $i$th
individual.
2.2 Marginal estimators for independent data
The transition probability matrix $P(s, t)$ gives the probability of transitioning between every
pair of states in $\mathcal{S}$; the $(\ell, \ell')$th element of $P(s, t)$ is given by
$P_{\ell\ell'}(s, t) = \Pr\{X(t) = \ell' \mid X(s) = \ell\}$ for $s < t$. Primarily, we are
interested in estimating and modeling the SOPs, which are given by
$\pi_\ell(t) = \Pr\{X(t) = \ell\} = \sum_{\ell' \in \mathcal{S}} \pi_{\ell'}(0) P_{\ell'\ell}(0, t)$,
where $\pi_{\ell'}(0)$ are the initial state occupation probabilities.
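As a small numerical illustration of the relation between the SOPs, the initial state occupation probabilities, and the transition probability matrix, the sketch below multiplies an initial distribution by one transition matrix. The three-state matrix, its entries, and the function name `state_occupation` are hypothetical and chosen only for illustration; none of them come from the paper.

```python
def state_occupation(pi0, P):
    """Return [pi_l(t)] with pi_l(t) = sum over l' of pi0[l'] * P[l'][l]."""
    q = len(pi0)
    return [sum(pi0[a] * P[a][b] for a in range(q)) for b in range(q)]


# Hypothetical illness-death system: 0 = healthy, 1 = ill, 2 = dead (absorbing).
pi0 = [1.0, 0.0, 0.0]  # everyone starts in state 0
P_0t = [
    [0.6, 0.3, 0.1],   # row l' holds Pr{X(t) = l | X(0) = l'} for l = 0, 1, 2
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
sop = state_occupation(pi0, P_0t)
```

Because every row of a transition probability matrix sums to one, the resulting SOPs also sum to one across states.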
In a right-censored and left-truncated experiment, the number of transitions from state $\ell$
to $\ell'$ in the time interval $[0, t]$, denoted by $N_{\ell\ell'}(t)$, and the number of
individuals at risk of transitioning out of state $\ell$ at time $t$, denoted by $M_\ell(t)$,
are crucial to developing estimators for the multistate parameters. These are given by
\[
N_{\ell\ell'}(t) = \sum_{i=1}^{n} \sum_{k \ge 1}
\frac{I\{T_{ik} \le t,\ C_i \ge T_{ik},\ L_i < t,\ X(T_{ik}-) = \ell,\ X(T_{ik}) = \ell'\}}{K(T_{ik}-)}
\tag{1}
\]
and
\[
M_\ell(t) = \sum_{i=1}^{n} \sum_{k \ge 1}
\frac{I\{T_{i,k-1} < t \le T_{ik},\ C_i \ge t,\ L_i < t,\ X(T_{ik}-) = \ell\}}{K(t-)}
\tag{2}
\]
where $I(\cdot)$ is the indicator function and $K(\cdot)$ is the Kaplan-Meier estimator of the
survival function of the censoring variable (such that the $C_i$'s are the failure times that
are right-censored by $T_i$). A careful examination will show that $N_{\ell\ell'}(t)$ and
$M_\ell(t)$ are calculable from the observed data (that is, right-censored and/or
left-truncated data). The Nelson-Aalen estimator of the cumulative transition intensity matrix,
denoted by $\hat{A}$, is a necessary quantity for obtaining the estimator of the SOP. The
elements of $\hat{A}$ are given by
\[
\hat{A}_{\ell\ell'}(t) =
\begin{cases}
\int_0^t I\{M_\ell(u) > 0\}\, M_\ell(u)^{-1}\, dN_{\ell\ell'}(u), & \ell \neq \ell', \\
-\sum_{\ell'' \neq \ell} \hat{A}_{\ell\ell''}(t), & \ell = \ell'.
\end{cases}
\tag{3}
\]
Further, the Aalen-Johansen estimator of the transition probability matrix is obtained via the
product integration of $\hat{A}(t)$; thus,
\[
\hat{P}(s, t) = \prod_{(s, t]} \{I + d\hat{A}(u)\}, \tag{4}
\]
where $I$ denotes the $Q \times Q$ identity matrix. Then, the estimator of the SOP is given by
\[
\hat{\pi}_\ell(t) = \sum_{\ell'=1}^{Q} \hat{\pi}_{\ell'}(0)\, \hat{P}_{\ell'\ell}(0, t), \tag{5}
\]
where $\hat{\pi}_{\ell'}(0)$ is essentially the initial proportion of individuals in state
$\ell'$. Andersen et al. [13] give a detailed description of the development of these estimators.
Historically, the consistency and large-sample properties of the estimators given above were
established under the assumption that the transitions between states were Markov [13]. However,
the estimator of the SOP given by (5) remains valid even when the process is non-Markovian [14].
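To make the chain from at-risk counts to Nelson-Aalen increments to product integration concrete, here is a minimal sketch for the simplest possible case. It assumes complete data (no censoring and no truncation, so the Kaplan-Meier weight is identically one), 0-based state labels, and at most one transition per subject at any time point. The path format and all function names are hypothetical, and this is not the authors' implementation.

```python
def state_before(path, u):
    """State occupied just before time u; path = [(time, state), ...] sorted by time."""
    current = path[0][1]
    for time, state in path[1:]:
        if time < u:
            current = state
    return current


def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    q = len(A)
    return [[sum(A[a][c] * B[c][b] for c in range(q)) for b in range(q)] for a in range(q)]


def aalen_johansen_sop(paths, q, t):
    """SOP estimate at time t from complete (uncensored, untruncated) paths."""
    identity = [[float(a == b) for b in range(q)] for a in range(q)]
    P = [row[:] for row in identity]
    jump_times = sorted({time for p in paths for time, _ in p[1:] if time <= t})
    for u in jump_times:
        M = [0.0] * q                       # at-risk counts just before u
        dN = [[0.0] * q for _ in range(q)]  # transition counts at u
        for p in paths:
            src = state_before(p, u)
            M[src] += 1.0
            for time, dst in p[1:]:
                if time == u:
                    dN[src][dst] += 1.0
        step = [row[:] for row in identity]  # the factor I + dA(u)
        for a in range(q):
            if M[a] > 0:
                for b in range(q):
                    if b != a:
                        inc = dN[a][b] / M[a]  # Nelson-Aalen increment
                        step[a][b] += inc
                        step[a][a] -= inc      # diagonal is minus the off-diagonal sum
        P = matmul(P, step)
    n = len(paths)
    pi0 = [sum(p[0][1] == a for p in paths) / n for a in range(q)]
    return [sum(pi0[a] * P[a][b] for a in range(q)) for b in range(q)]


# Hypothetical paths: each starts at time 0 in state 0; state 2 is absorbing.
paths = [
    [(0.0, 0), (1.0, 1), (3.0, 2)],
    [(0.0, 0), (2.0, 2)],
    [(0.0, 0)],
    [(0.0, 0), (1.5, 1)],
]
sop = aalen_johansen_sop(paths, 3, 2.5)
```

With complete data the product-integral estimate should agree with the empirical proportion of subjects in each state at the chosen time, which gives a convenient sanity check for the sketch.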
2.3 Marginal estimators for clustered data when cluster size is informative
Consider the situation where the event history of a cluster of patients is described by a
multistate model, and the event histories of patients belonging to the same cluster are
correlated while the processes for patients from separate clusters are independent. Suppose the
data consist of $m$ clusters indexed by $i$, with $j = 1, \ldots, n_i$ patients in cluster $i$
and total sample size $n = \sum_{i=1}^{m} n_i$. Also, consider the situation where the cluster
size is potentially informative, that is, there exists a latent relationship between the cluster
sizes and the observed transition outcomes. Under such a scenario, the cluster size $n_i$ is a
random variable. Recently, Bakoyannis [7] proposed a weighted nonparametric estimator of the SOP
that accounts for ICS and established its large-sample properties using empirical process
theory. Inference around this estimator is based on a population formed by randomly selecting a
unit from a randomly selected cluster. Following the notation presented in Section 2.2, we
present the weighted estimator of the SOPs.
Let $X_{IJ}(t) \in \mathcal{S}$ denote the stochastic process for a typical patient indexed by
$J$ who belongs to a typical cluster indexed by $I$, where $I \sim \mathcal{U}\{1, m\}$ and,
given $I = i$, $J \sim \mathcal{U}\{1, n_i\}$; $\mathcal{U}$ denotes the discrete uniform
distribution. Primarily, our interest lies in carrying out covariate inference on the SOP for a
randomly chosen patient in a randomly selected cluster. We define the marginal SOPs by taking
the expectation over all $V_i = \{n_i, X_{i1}, X_{i2}, \ldots, X_{in_i}\}$, which gives
\[
\pi_\ell(t) = E[I\{X_{IJ}(t) = \ell\}]
= E\left[\frac{1}{m} \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i} I\{X_{ij}(t) = \ell\}\right],
\quad t \ge 0.
\]
This expression defines the probability of patient $j$ in cluster $i$ being in state $\ell$ at
time $t$, under the assumption that the $V_i$, $i = 1, \ldots, m$, are IID and the $X_{ij}$
within a given cluster $i$ are exchangeable given $n_i$ [15].
Using the convention in Section 2.1, let $T_{ijk}$ denote the time of the $k$th transition for
the $j$th subject in cluster $i$, let $C_{ij}$ and $L_{ij}$ respectively denote the
right-censoring and left-truncation times, which are independent of the multistate process, and
let $T_{ij} = \sup_k \{T_{ijk} : T_{ijk} < \infty\}$. Following (1) and (2), $N_{\ell\ell'}(t)$
and $M_\ell(t)$ are each expressed as a weighted sum over all individuals
($1 \le i \le m$; $1 \le j \le n_i$), where the weights $w_{ij} = 1$ correspond to no
reweighting and $w_{ij} = 1/n_i$ correspond to inverse cluster size reweighting. We retain the
preceding notation for the counting process and the at-risk set because the meaning is clear
from the context. The reweighted versions of $N_{\ell\ell'}(t)$ and $M_\ell(t)$ are then plugged
into the estimators of the cumulative transition intensities, the transition probability matrix,
and the SOPs given in (3)-(5) of Section 2.2. By reweighting the contributions to
$N_{\ell\ell'}(t)$ and $M_\ell(t)$ by the inverse cluster size, one ensures an equal total
contribution from each cluster. The two weighting schemes would not lead to disparate
conclusions unless the cluster size is informative.
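A toy calculation can show how the two weighting schemes diverge when cluster size is informative. The sketch below assumes complete data (no censoring or truncation), in which case I take the estimate of an SOP at a fixed time to be a weighted proportion of the state indicators. The data and function names are hypothetical.

```python
def sop_unweighted(clusters):
    """w_ij = 1: every individual counts equally (pooled proportion)."""
    total = sum(len(c) for c in clusters)
    return sum(sum(c) for c in clusters) / total


def sop_weighted(clusters):
    """w_ij = 1/n_i: every cluster counts equally (mean of cluster means)."""
    return sum(sum(c) / len(c) for c in clusters) / len(clusters)


# Hypothetical indicators of being in the state of interest at a fixed time;
# the one large cluster is the only cluster whose members are in that state.
clusters = [[1, 1, 1, 1], [0], [0]]
pi_uw = sop_unweighted(clusters)  # 4 of 6 individuals
pi_w = sop_weighted(clusters)     # (1 + 0 + 0) / 3 clusters
```

Here the pooled proportion is dominated by the large cluster, while the inverse cluster size weights give each cluster the same total contribution. If all clusters had the same size, the two quantities would coincide.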
3 Pseudo-value Regression
The pseudo-value regression approach [1] is a flexible technique for directly modeling
covariate effects on temporal marginal functions of a general multistate model. The regression
analysis is based on pseudo-values obtained from the "leave-one-out" jackknife statistic
constructed from nonparametric marginal estimators. These pseudo-values are computed at
pre-specified time point(s) and are then used as the responses in a generalized linear model.
Logan et al. [9] extended the pseudo-value approach to the analysis of clustered competing risks
data with non-informative cluster size. In the current setup, we describe the application of
pseudo-value regression to modeling covariate effects on SOPs in the analysis of clustered data
where the cluster sizes are potentially informative. In this section, we explain the formulation
of the pseudo-values and the marginal models for estimating the covariate effects.
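To illustrate what reweighting in the regression step can look like, the following sketch solves a weighted estimating equation with pseudo-values as responses. Everything specific here is an assumption made for illustration: an identity link, an independence working correlation, a single time point, one binary cluster-level covariate, and invented pseudo-values. The model and estimating equations actually used in the paper may differ.

```python
def weighted_linear_fit(x, y, w):
    """Solve sum_k w_k * (1, x_k)' * (y_k - b0 - b1 * x_k) = 0 for (b0, b1)."""
    sw = sum(w)
    sx = sum(wk * xk for wk, xk in zip(w, x))
    sy = sum(wk * yk for wk, yk in zip(w, y))
    sxx = sum(wk * xk * xk for wk, xk in zip(w, x))
    sxy = sum(wk * xk * yk for wk, xk, yk in zip(w, x, y))
    b1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    b0 = (sy - b1 * sx) / sw
    return b0, b1


# Hypothetical data: (cluster-level covariate, pseudo-values of the cluster's members).
clusters = [(0, [1.0, 1.0, 1.0, 1.0]), (0, [0.0]), (1, [1.0, 0.0]), (1, [0.0])]
x = [xi for xi, ys in clusters for _ in ys]
y = [yk for _, ys in clusters for yk in ys]
w_none = [1.0 for _, ys in clusters for _ in ys]           # no reweighting
w_ics = [1.0 / len(ys) for _, ys in clusters for _ in ys]  # inverse cluster size

fit_none = weighted_linear_fit(x, y, w_none)
fit_ics = weighted_linear_fit(x, y, w_ics)
```

With a binary covariate, the fitted intercept and slope are simply the weighted mean response in the reference group and the difference in weighted means, so the two weight choices can be compared by hand on this toy data.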
3.1 Constructing the Pseudo-values
Let the mean value parameter $\pi_\ell(t) = E[I\{X(t) = \ell\}] = P\{X(t) = \ell\}$ denote the
SOP, so that $f\{X(t)\} = I\{X(t) = \ell\}$. Let $\hat{\pi}^{w}_\ell(t)$ and
$\hat{\pi}^{uw}_\ell(t)$ denote the marginal (Aalen-Johansen) estimators of the SOP formulated
with $w_{ij} = 1/n_i$ and $w_{ij} = 1$, respectively. For $i = 1, \ldots, m$;
$j = 1, \ldots, n_i$, we consider two methods for constructing the jackknife pseudo-values at a
given time $t$.
Method 1: For clustered multistate data with non-informative cluster size, the pseudo-values
at time $t \ge 0$ are given by
\[
Y^{uw}_{ij}(t) = n \cdot \hat{\pi}^{uw}_\ell(t) - (n - 1)\, \hat{\pi}^{uw}_{\ell,-ij}(t), \tag{6}
\]
where $\hat{\pi}^{uw}_{\ell,-ij}(t)$ is obtained by omitting the $j$th individual in the $i$th
cluster [9].
Method 2: For the ICS-adjusted estimator $\hat{\pi}^{w}_\ell(t)$, we define the pseudo-values as
\[
Y^{w}_{ij}(t) = m \cdot \{n_i \hat{\pi}^{w}_\ell(t) - (n_i - 1)\, \hat{\pi}^{w}_{\ell,-ij}(t)\}
- (m - 1) \cdot \hat{\pi}^{w}_{\ell,-i}(t), \tag{7}
\]
where $\hat{\pi}^{w}_{\ell,-i}(t)$ and $\hat{\pi}^{w}_{\ell,-ij}(t)$ are obtained by omitting the
$i$th cluster and the $j$th individual in the $i$th cluster, respectively.
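The two jackknife constructions can be sketched on toy data as follows. The sketch assumes complete data, so that the unweighted and ICS-adjusted estimators reduce to a pooled proportion and a mean of cluster means. It also assumes that omitting an individual shrinks that cluster's size to $n_i - 1$ when the estimator is recomputed. The helper names and the data are hypothetical.

```python
def pi_uw(clusters):
    """Unweighted estimate (w_ij = 1) under complete data: pooled proportion."""
    n = sum(len(c) for c in clusters)
    return sum(sum(c) for c in clusters) / n


def pi_w(clusters):
    """ICS-adjusted estimate (w_ij = 1/n_i) under complete data: mean of cluster means."""
    return sum(sum(c) / len(c) for c in clusters) / len(clusters)


def drop_individual(clusters, i, j):
    """Copy of the data without individual j of cluster i."""
    return [c[:j] + c[j + 1:] if k == i else c for k, c in enumerate(clusters)]


def drop_cluster(clusters, i):
    """Copy of the data without cluster i."""
    return [c for k, c in enumerate(clusters) if k != i]


def pseudo_method1(clusters, i, j):
    """Method 1: n * estimate - (n - 1) * leave-one-individual-out estimate."""
    n = sum(len(c) for c in clusters)
    return n * pi_uw(clusters) - (n - 1) * pi_uw(drop_individual(clusters, i, j))


def pseudo_method2(clusters, i, j):
    """Method 2: jackknife within cluster i, then across clusters."""
    m, n_i = len(clusters), len(clusters[i])
    inner = n_i * pi_w(clusters)
    if n_i > 1:  # for a singleton cluster the factor (n_i - 1) is zero
        inner -= (n_i - 1) * pi_w(drop_individual(clusters, i, j))
    return m * inner - (m - 1) * pi_w(drop_cluster(clusters, i))


# Hypothetical indicators of occupying the state of interest at a fixed time.
clusters = [[1, 1, 0, 1], [0], [0, 1]]
y_uw = [[pseudo_method1(clusters, i, j) for j in range(len(c))] for i, c in enumerate(clusters)]
y_w = [[pseudo_method2(clusters, i, j) for j in range(len(c))] for i, c in enumerate(clusters)]
```

In this complete-data setting both sets of pseudo-values reproduce the individual indicators up to floating-point error, which serves as a simple check on the formulas. With censored or truncated data the estimators would be the Aalen-Johansen estimators, and the pseudo-values need not lie in {0, 1}.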