

Linear mixed model vs two-stage methods: Developing prognostic models of diabetic kidney disease progression

Brian Kwan1,2, Lin Liu1,2, David Strong2, H. Irene Su2,3, and Loki Natarajan1,2

1Division of Biostatistics and Bioinformatics, Department of Family Medicine and Public Health, University of California, San Diego,

La Jolla, CA, USA;

2Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA;

3Department of Obstetrics, Gynecology and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA.

Corresponding author:

Loki Natarajan, Division of Biostatistics, Department of Family Medicine and Public Health, University of California San Diego,

3855 Health Sciences Dr #0901, La Jolla, CA 92093, USA. Email: lnatarajan@ucsd.edu.

Abstract

Identifying prognostic factors for disease progression is a cornerstone of medical research. Repeated

assessments of a marker outcome are often used to evaluate disease progression, and the primary research

question is to identify factors associated with the longitudinal trajectory of this marker. Our work is motivated

by diabetic kidney disease (DKD), where serial measures of estimated glomerular filtration rate (eGFR) are the

longitudinal measure of kidney function, and there is notable interest in identifying factors, such as metabolites,

that are prognostic for DKD progression. Linear mixed models (LMM) with serial marker outcomes (e.g.,

eGFR) are a standard approach for prognostic model development, namely by evaluating the time × prognostic

factor (e.g., metabolite) interaction. However, two-stage methods that first estimate individual-specific eGFR

slopes, and then use these as outcomes in a regression framework with metabolites as predictors are easy to

interpret and implement for applied researchers. Herein, we compared the LMM and two-stage methods, in

terms of bias and mean squared error via analytic methods and simulations, allowing for irregularly spaced

measures and missingness. Our findings provide novel insights into when two-stage methods are suitable

longitudinal prognostic modeling alternatives to the LMM. Notably, our findings generalize to other disease

studies.

Keywords

Biomarker, disease progression, longitudinal study, mixed model, multilevel model, prognostic model

1 Introduction

Repeated longitudinal assessment of a marker of disease occurrence or progression is common in

medical studies, e.g., serial measures of prostate specific antigen as a marker of prostate cancer, or repeated

hemoglobin A1C for diabetes control (Lyons and Basu, 2012; O’Brien et al., 2011). Often, interest lies in

identifying baseline factors associated with longitudinal trajectories of these markers, as these factors could

provide early insights into actionable guidelines/treatments for the condition in question. Statistical methods for

modeling these risk factor-longitudinal marker assessments are the focus of this article, with the specific research

question motivated by our prior work in diabetic kidney disease (DKD) (Kwan et al., 2020).

Diabetes is a leading cause of kidney disease and patients with DKD are at high risk of morbidity,

hospitalization, and overall mortality (American Journal of Kidney Diseases, 2018; Grams et al., 2017). Studies

have shown that the human metabolome has considerable potential for characterizing patients with DKD versus

healthy controls (Abbiss et al., 2019; Colhoun and Marcovecchio, 2018; Hirayama et al., 2012; Kalim and

Rhee, 2017; Sharma et al., 2013; Zhang et al., 2015). By incorporating metabolomic analysis into statistical

model development, we could construct prognostic models for early detection of patients at high risk of

developing DKD, potentially leading to earlier and more targeted treatments. Estimated glomerular filtration

rate (eGFR) is a clinically accepted method for measuring kidney function, with higher eGFR indicating better

kidney function (Levey et al., 2009); slopes of serial eGFR assessments, interpreted as annual eGFR change, are

widely used to evaluate kidney disease progression. In our previous work (Kwan et al., 2020), we implemented

a two-stage approach for identifying metabolomic predictors of DKD progression by first estimating eGFR

slope, and then using this slope as the outcome in a regression model with baseline metabolites as predictors.

We used data collected from the Chronic Renal Insufficiency Cohort (CRIC) (Denker et al., 2015; Feldman,

2003; Lash et al., 2009), a racially and ethnically diverse group of adults aged 21 to 74 years with a broad

spectrum of renal disease severity, one of the largest in the US, with comprehensive data on clinical and

metabolite profiles. However, a more conventional and statistically accepted modeling approach is to fit a single

linear mixed model with serial eGFR measures (outcomes) and evaluate the coefficient of the metabolite

(biomarker) × time (year) interaction term, also interpreted as annual eGFR change. Nonetheless, two-stage

methods offer the advantage of estimating individual slopes, which are by themselves of interest as a marker of

disease progression, and can be readily implemented as outcomes in standard regression models by researchers,

as evidenced by the plethora of research that uses eGFR slopes as outcomes in DKD research (Anderson et al.,

2020; de Hauteclocque et al., 2014; Heinzel et al., 2018; Koye et al., 2018; Osonoi et al., 2020; Parsa et al.,

2013). Given their widespread use by DKD researchers, in this paper, we aim to provide novel insights into

when two-stage methods are suitable longitudinal prognostic modeling alternatives to the linear mixed model.
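The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest variant, in which stage 1 fits an ordinary least squares line to each individual's eGFR measurements and stage 2 regresses those slopes on the baseline metabolite. The simulated cohort, the function names (`ols_slope`, `simulate_cohort`, `two_stage_estimate`), and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_cohort(n_subjects=500, true_effect=-0.5, seed=1):
    """Simulate hypothetical data (all values are made up for illustration).

    Each individual has one baseline metabolite value and 3 to 6 eGFR
    measurements taken at irregularly spaced times (in years).
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n_subjects):
        metabolite = rng.gauss(0.0, 1.0)
        intercept = 60.0 + rng.gauss(0.0, 5.0)
        # The individual's true slope depends linearly on the metabolite.
        slope = -2.0 + true_effect * metabolite + rng.gauss(0.0, 1.0)
        n_visits = rng.randint(3, 6)
        # Roughly annual visits with random delays, so spacing is irregular.
        times = [k + rng.uniform(0.0, 0.5) for k in range(n_visits)]
        egfr = [intercept + slope * t + rng.gauss(0.0, 1.0) for t in times]
        cohort.append((metabolite, times, egfr))
    return cohort


def two_stage_estimate(cohort):
    """Two-stage estimate of the metabolite effect on annual eGFR change."""
    # Stage 1: estimate one eGFR slope per individual.
    metabolites = [metabolite for metabolite, _, _ in cohort]
    slopes = [ols_slope(times, egfr) for _, times, egfr in cohort]
    # Stage 2: regress the estimated slopes on the baseline metabolite.
    return ols_slope(metabolites, slopes)


cohort = simulate_cohort()
estimate = two_stage_estimate(cohort)
print(f"Two-stage estimate of the metabolite effect on eGFR slope: {estimate:.3f}")
```

Stage 1 here gives equal weight to every individual's slope regardless of how many measurements it is based on; that is a simplification of this sketch, and other choices are possible.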

In prior statistical investigations, Sayers et al. (2017) conducted a simulation study comparing two-stage

methods with individual slope as a predictor (i.e., independent variable) for a dependent outcome by examining

the bias and coverage of the association between birth length, linear growth and later blood pressure under

several study design scenarios. Our set-up is different in that the slopes are the dependent variable in our

models, and we aim to evaluate a variety of two-stage approaches for assessing the prognostic value of a

covariate for predicting this slope. In particular, using the framework of our previous work (Kwan et al., 2020),

we will consider the baseline metabolite as the predictor for annual eGFR change (slope). In addition,

expanding on the statistical approaches of Sayers et al. (2017), we compare via simulations the linear mixed

effects model to our two-stage methods under an expanded set of study design scenarios that incorporate

irregularly spaced time measures and missing data, and we also analytically examine and compare bias and

efficiency across methods. More specifically, in Section 2, we outline our statistical approaches which include a

range of two-stage methods. In Section 3, we describe in detail our simulation process, study design scenarios,

and comparison performance metrics for our statistical approaches. Section 4 showcases analytical derivations

for the relationships between our statistical models. Section 5 presents the simulation results for our statistical

approaches under our set of study design scenarios. Lastly, Section 6 discusses the overall findings, current

limitations, and future directions for this work. We emphasize that although this paper is motivated by the

metabolite-DKD context with the terms metabolite and eGFR serving as predictor and longitudinal outcome in

the following sections, this work applies to any predictor-longitudinal disease modeling application.

2 Statistical Approaches

2.1 Linear Mixed Model (LMM) Approach

The linear mixed effects model (Fitzmaurice et al., 2011), ubiquitously used in longitudinal settings,

incorporates fixed and random effects to model individual eGFR trajectories over time. Fixed effects are shared

between all individuals and model the population mean eGFR trajectory. Random effects are unique to each

individual and characterize individual eGFR profiles. Our model, which incorporated fixed effects for

metabolite, time, and their interaction as well as random intercept and slope terms, was expressed as

 

$$Y_{ij} = \beta_0 + \beta_1 X_i + \beta_2 t_{ij} + \beta_3 X_i t_{ij} + b_{0i} + b_{1i} t_{ij} + \epsilon_{ij}$$

for individual $i$ and occasion $j$, where $Y_{ij}$ is the eGFR response, $\beta_0, \beta_1, \beta_2, \beta_3$ are fixed effects and $b_{0i}, b_{1i}$ are random effects, $X_i$ is individual $i$'s baseline metabolite value, $t_{ij}$ is time in years, and $\epsilon_{ij}$ is the within-individual error. We assume the random effects $(b_{0i}, b_{1i})^\top \sim N(0, G)$, where $G = \begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}$, are independent of both $X_i$ and $\epsilon_{ij}$. The within-individual error $\epsilon_{ij}$ is assumed to be normally distributed with mean zero and variance $\sigma^2$. As our investigation primarily focuses on the association between metabolite and annual rate of eGFR change, the $\beta_3$ metabolite × time interaction coefficient is our main effect of interest. The coefficient is interpreted as the difference in the population-averaged annual rate of eGFR change for a one-unit higher metabolite value.
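To make the role of the interaction coefficient concrete, the model described above can be rearranged by collecting the terms that multiply time. The notation is an assumed reconstruction, not necessarily the authors' exact symbols: $\beta_0, \ldots, \beta_3$ for the fixed effects, $b_{0i}$ and $b_{1i}$ for the random intercept and slope, $X_i$ for the baseline metabolite, $t_{ij}$ for time, and $\epsilon_{ij}$ for the within-individual error.

```latex
\begin{aligned}
Y_{ij} &= (\beta_0 + \beta_1 X_i + b_{0i})
        + (\beta_2 + \beta_3 X_i + b_{1i})\, t_{ij} + \epsilon_{ij}, \\
E\left[\beta_2 + \beta_3 X_i + b_{1i} \mid X_i\right] &= \beta_2 + \beta_3 X_i .
\end{aligned}
```

Under this reading, individual $i$'s eGFR slope is $\beta_2 + \beta_3 X_i + b_{1i}$. Because $b_{1i}$ has mean zero and is independent of $X_i$, the mean slope differs by $\beta_3$ per one-unit difference in metabolite, which is the quantity that a regression of individual slopes on metabolite also aims to estimate.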

An advantage of the linear mixed effects model is that it accommodates incomplete and unbalanced longitudinal data. It therefore avoids the bias of a complete-case analysis, and it requires neither an equal number of eGFR measurements per individual nor a common set of measurement occasions. A more extensive overview of the method is given in Chapter 8 of Fitzmaurice et al. (2011).
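As a minimal sketch of how such a model might be fitted in practice, the example below uses the `mixedlm` formula interface from the Python package statsmodels on simulated, unbalanced data. The choice of software is an assumption of this sketch (the text does not say which package the authors used), and the column names (`id`, `metabolite`, `time`, `egfr`) and all parameter values are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
true_interaction = -0.5  # hypothetical metabolite x time effect

# Simulate unbalanced data: 3 to 6 irregularly timed visits per individual.
rows = []
for subject in range(300):
    metabolite = rng.normal()
    # Correlated random intercept and slope for this individual.
    b0, b1 = rng.multivariate_normal([0.0, 0.0], [[25.0, -1.0], [-1.0, 1.0]])
    n_visits = rng.integers(3, 7)  # upper bound is exclusive
    times = np.arange(n_visits) + rng.uniform(0.0, 0.5, size=n_visits)
    for t in times:
        egfr = (
            60.0 + 1.0 * metabolite - 2.0 * t
            + true_interaction * metabolite * t
            + b0 + b1 * t
            + rng.normal(scale=2.0)
        )
        rows.append({"id": subject, "metabolite": metabolite, "time": t, "egfr": egfr})
data = pd.DataFrame(rows)

# Fixed effects: metabolite, time, and their interaction.
# Random effects: intercept and time slope within each individual.
model = smf.mixedlm(
    "egfr ~ metabolite * time",
    data,
    groups=data["id"],
    re_formula="~time",
)
fit = model.fit(reml=True)

interaction_hat = fit.params["metabolite:time"]
print(f"Estimated metabolite x time coefficient: {interaction_hat:.3f}")
```

No individual is dropped for having fewer visits than others, and visit times need not coincide across individuals, which reflects the flexibility described above.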

2.2 Two-Stage Approaches

Our two-stage methods model the association between metabolite and annual rate of eGFR change in

two stages: (1st) estimate individual eGFR slopes and (2nd) regress eGFR slope on metabolite as the sole
