
Linear mixed model vs two-stage methods: Developing prognostic models of diabetic kidney disease
progression
Brian Kwan1,2, Lin Liu1,2, David Strong2, H. Irene Su2,3, and Loki Natarajan1,2
1Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health, University of California, San Diego,
La Jolla, CA, USA;
2Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA;
3Department of Obstetrics, Gynecology and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA.
Corresponding author:
Loki Natarajan, Division of Biostatistics, Department of Family Medicine and Public Health, University of California San Diego,
3855 Health Sciences Dr #0901, La Jolla, CA 92093, USA. Email: lnatarajan@ucsd.edu.
Abstract
Identifying prognostic factors for disease progression is a cornerstone of medical research. Repeated
assessments of a marker outcome are often used to evaluate disease progression, and the primary research
question is to identify factors associated with the longitudinal trajectory of this marker. Our work is motivated
by diabetic kidney disease (DKD), where serial measures of estimated glomerular filtration rate (eGFR) are the
longitudinal measure of kidney function, and there is notable interest in identifying factors, such as metabolites,
that are prognostic for DKD progression. Linear mixed models (LMM) with serial marker outcomes (e.g.,
eGFR) are a standard approach for prognostic model development, namely by evaluating the time × prognostic
factor (e.g., metabolite) interaction. However, two-stage methods that first estimate individual-specific eGFR
slopes, and then use these as outcomes in a regression framework with metabolites as predictors are easy to
interpret and implement for applied researchers. Herein, we compared the LMM and two-stage methods, in
terms of bias and mean squared error via analytic methods and simulations, allowing for irregularly spaced
measures and missingness. Our findings provide novel insights into when two-stage methods are suitable
longitudinal prognostic modeling alternatives to the LMM. Notably, our findings generalize to other disease
studies.
Keywords
Biomarker, disease progression, longitudinal study, mixed model, multilevel model, prognostic model
1 Introduction
Repeated longitudinal assessment of a marker of disease occurrence or progression is common in
medical studies, e.g., serial measures of prostate specific antigen as a marker of prostate cancer, or repeated
hemoglobin A1C for diabetes control (Lyons and Basu, 2012; O’Brien et al., 2011). Often, interest lies in
identifying baseline factors associated with longitudinal trajectories of these markers, as these factors could
provide early insights into actionable guidelines/treatments for the condition in question. Statistical methods for
modeling these risk factor-longitudinal marker assessments are the focus of this article, with the specific research
question motivated by our prior work in diabetic kidney disease (DKD) (Kwan et al., 2020).
Diabetes is a leading cause of kidney disease and patients with DKD are at high risk of morbidity,
hospitalization, and overall mortality (American Journal of Kidney Diseases, 2018; Grams et al., 2017). Studies
have shown that the human metabolome has considerable potential for characterizing patients with DKD versus
healthy controls (Abbiss et al., 2019; Colhoun and Marcovecchio, 2018; Hirayama et al., 2012; Kalim and
Rhee, 2017; Sharma et al., 2013; Zhang et al., 2015). By incorporating metabolomic analysis into statistical
model development, we could construct prognostic models for early detection of patients at high risk of
developing DKD, potentially leading to earlier and more targeted treatments. Estimated glomerular filtration
rate (eGFR) is a clinically accepted method for measuring kidney function, with higher eGFR indicating better
kidney function (Levey et al., 2009); the slope of serial eGFR assessments, interpreted as annual eGFR change, is
widely used to evaluate kidney disease progression. In our previous work (Kwan et al., 2020), we implemented
a two-stage approach for identifying metabolomic predictors of DKD progression by first estimating eGFR
slope and then using this slope as the outcome in a regression model with baseline metabolites as predictors.
We used data collected from the Chronic Renal Insufficiency Cohort (CRIC) (Denker et al., 2015; Feldman,
2003; Lash et al., 2009), one of the largest such cohorts in the US, comprising a racially and ethnically diverse
group of adults aged 21 to 74 years with a broad spectrum of renal disease severity and comprehensive data on
clinical and metabolite profiles. However, a more conventional and statistically accepted modeling approach is to fit a single
linear mixed model with serial eGFR measures (outcomes) and evaluate the coefficient of the metabolite
(biomarker) × time (year) interaction term, also interpreted as annual eGFR change. Nonetheless, two-stage
methods offer the advantage of estimating individual slopes, which are by themselves of interest as a marker of
disease progression, and can be readily implemented as outcomes in standard regression models by researchers,
as evidenced by the plethora of research that uses eGFR slopes as outcomes in DKD research (Anderson et al.,
2020; de Hauteclocque et al., 2014; Heinzel et al., 2018; Koye et al., 2018; Osonoi et al., 2020; Parsa et al.,
2013). Given their widespread use by DKD researchers, in this paper, we aim to provide novel insights into
when two-stage methods are suitable longitudinal prognostic modeling alternatives to the linear mixed model.
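
The two-stage procedure described above (estimate each individual's eGFR slope, then regress those slopes on the baseline metabolite) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the toy data, the `patients` dictionary, and the `ols_slope` helper are hypothetical and are not taken from CRIC or from the authors' code, and ordinary least squares is assumed at both stages, which is one of several possible two-stage variants.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy data (not real patients): visit times in years, eGFR values,
# and one baseline metabolite value per individual. Visit schedules differ on
# purpose to mimic irregularly spaced measurements.
patients = {
    "A": {"metabolite": 1.0, "times": [0, 1, 2, 3], "egfr": [60.0, 58.0, 56.0, 54.0]},
    "B": {"metabolite": 2.0, "times": [0, 0.5, 2], "egfr": [70.0, 68.5, 64.0]},
    "C": {"metabolite": 3.0, "times": [0, 1, 4], "egfr": [50.0, 46.0, 34.0]},
}

# Stage 1: estimate one eGFR slope (annual change) per individual.
slopes = {pid: ols_slope(p["times"], p["egfr"]) for pid, p in patients.items()}

# Stage 2: regress the estimated slopes on the baseline metabolite.
ids = sorted(patients)
stage2_coef = ols_slope(
    [patients[pid]["metabolite"] for pid in ids],
    [slopes[pid] for pid in ids],
)

print(slopes)       # per-individual annual eGFR change
print(stage2_coef)  # change in eGFR slope per one-unit higher metabolite
```

In this noise-free toy example the stage-2 coefficient is approximately -1, meaning each one-unit higher metabolite value goes with an eGFR decline that is about one unit per year steeper. It plays the same interpretive role as the metabolite × time interaction coefficient in the linear mixed model.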
In prior statistical investigations, Sayers et al. (2017) conducted a simulation study comparing two-stage
methods with individual slope as a predictor (i.e., independent variable) for a dependent outcome by examining
the bias and coverage of the association between birth length, linear growth and later blood pressure under
several study design scenarios. Our set-up is different in that the slopes are the dependent variable in our
models, and we aim to evaluate a variety of two-stage approaches for assessing the prognostic value of a
covariate for predicting this slope. In particular, using the framework of our previous work (Kwan et al., 2020),
we will consider the baseline metabolite as the predictor for annual eGFR change (slope). In addition,
expanding on the statistical approaches of Sayers et al. (2017), we compare via simulations the linear mixed
effects model to our two-stage methods under an expanded set of study design scenarios that incorporate
irregularly spaced time measures and missing data; we also analytically examine and compare bias and
efficiency across methods. More specifically, in Section 2, we outline our statistical approaches, which include a
range of two-stage methods. In Section 3, we describe in detail our simulation process, study design scenarios,
and comparison performance metrics for our statistical approaches. Section 4 showcases analytical derivations
for the relationships between our statistical models. Section 5 presents the simulation results for our statistical
approaches under our set of study design scenarios. Lastly, Section 6 discusses the overall findings, current
limitations, and future directions for this work. We emphasize that although this paper is motivated by the
metabolite-DKD context with the terms metabolite and eGFR serving as predictor and longitudinal outcome in
the following sections, this work applies to any predictor-longitudinal disease modeling application.
2 Statistical Approaches
2.1 Linear Mixed Model (LMM) Approach
The linear mixed effects model (Fitzmaurice et al., 2011), ubiquitously used in longitudinal settings,
incorporates fixed and random effects to model individual eGFR trajectories over time. Fixed effects are shared
between all individuals and model the population mean eGFR trajectory. Random effects are unique to each
individual and characterize individual eGFR profiles. Our model, which incorporated fixed effects for
metabolite, time, and their interaction as well as random intercept and slope terms, was expressed as
 
for individual and occasion where  is the eGFR response,  are fixed effects and 
are random effects, is individual ’s baseline metabolite value,  is time in years, and  is the within-
individual error. We assume the random effects  where 
  are
independent of both and  The within-individual error  is assumed to be normally distributed with mean
zero and variance As our investigation primarily focuses on the association between metabolite and annual
rate of eGFR change, the metabolite × time interaction coefficient is our main effect of interest. The
coefficient is interpreted as the population-averaged annual rate of eGFR change for a one-unit higher in
metabolite value.
An advantage of the linear mixed effects model is that it can accommodate incomplete and unbalanced
longitudinal data. It therefore avoids the bias of complete-case analysis and requires neither an equal
number of eGFR measurements per individual nor a common set of measurement occasions across
individuals. A more extensive overview of the method is given in
Chapter 8 of Fitzmaurice et al. (2011).
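
To make the model structure concrete, the following minimal sketch generates unbalanced longitudinal data from a model of this form: fixed effects for metabolite, time, and their interaction, plus a random intercept and a random slope per individual. Everything specific here is an assumption for illustration only: the function name `simulate_lmm`, all parameter values, the visit schedule, the completely-at-random dropping of follow-up visits, and the simplification that the random intercept and slope are drawn independently (the model above allows them to be correlated). It is not the authors' simulation design.

```python
import random


def simulate_lmm(n_subjects=200, seed=1, beta=(60.0, -1.0, -2.0, -0.5),
                 sd_b0=5.0, sd_b1=1.0, sd_eps=2.0, p_missing=0.2):
    """Return rows (subject id, metabolite, time in years, eGFR).

    beta = (intercept, metabolite, time, metabolite x time) fixed effects.
    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    b_int, b_met, b_time, b_inter = beta
    rows = []
    for i in range(n_subjects):
        x = rng.gauss(0.0, 1.0)      # baseline metabolite value
        u0 = rng.gauss(0.0, sd_b0)   # random intercept
        u1 = rng.gauss(0.0, sd_b1)   # random slope (independent of u0: a simplification)
        # Baseline visit at time 0, then four roughly annual visits with jitter,
        # giving irregularly spaced measurement times.
        times = [0.0] + [k + rng.uniform(-0.25, 0.25) for k in range(1, 5)]
        for j, t in enumerate(times):
            # Follow-up visits are dropped completely at random; baseline is always kept.
            if j > 0 and rng.random() < p_missing:
                continue
            y = (b_int + b_met * x + b_time * t + b_inter * x * t
                 + u0 + u1 * t + rng.gauss(0.0, sd_eps))
            rows.append((i, x, t, y))
    return rows


data = simulate_lmm()
visits_per_subject = {}
for subject_id, _, _, _ in data:
    visits_per_subject[subject_id] = visits_per_subject.get(subject_id, 0) + 1
print(len(data), "observations from", len(visits_per_subject), "individuals")
```

Under this data-generating model, each individual's true annual eGFR change is the time coefficient, plus the interaction coefficient times that individual's metabolite value, plus that individual's random slope. This is why regressing individual slopes on the metabolite targets the same quantity as the interaction term. Data in this long format, with differing numbers and timings of visits per individual, can be passed directly to mixed-model software without discarding individuals who have incomplete follow-up.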
2.2 Two-Stage Approaches
Our two-stage methods model the association between metabolite and annual rate of eGFR change in
two stages: (1st) estimate individual eGFR slopes and (2nd) regress eGFR slope on metabolite as the sole