

A copula-based boosting model for time-to-event prediction with dependent censoring

Alise Danielle Midtfjord1,*, Riccardo De Bin1, and Arne Bang Huseby1

1Department of Mathematics, University of Oslo, Norway

Abstract

A characteristic feature of time-to-event data analysis is possible censoring of the event time. Most of the

statistical learning methods for handling censored data are limited by the assumption of independent censoring,

even if this can lead to biased predictions when the assumption does not hold. This paper introduces Clayton-

boost, a boosting approach built upon the accelerated failure time model, which uses a Clayton copula to handle

the dependency between the event and censoring distributions. By taking advantage of a copula, the independent

censoring assumption is no longer needed. In comparisons with commonly used methods, Clayton-boost shows a strong ability to remove prediction bias in the presence of dependent censoring and outperforms the competing methods when either the dependency strength or the percentage of censoring is considerable. The encouraging performance of Clayton-boost shows that there are indeed reasons to be critical of the independent censoring assumption, and that real-world data could benefit greatly from modelling the potential dependency.

1 Introduction

The analysis of time-to-event data is an important topic in

statistics. It deals with response variables that record the

time until a speciﬁc event happens, for example death due

to a speciﬁc disease or failure of a mechanical component.

While very common in biomedicine [45, 65, 12, 74] and

engineering [54, 60, 7], where it takes the name of survival

analysis and reliability theory, respectively, the analysis

of time-to-event data is used in many other ﬁelds as well,

such as economics [70], ecology [76], and agronomy [69].

A characterizing feature of time-to-event data is the

presence of censored observations, i.e., some statistical

units are not observed until the event of interest appears,

but up to (or from) a different time point. Examples where we only observe a lower bound of the time are when patients exit a clinical study before a test result is obtained, or when a machine component is replaced before its failure [50]. An example where we only observe an upper

bound of the time, or observe an interval which the times

lie within, is when systems are investigated at a speciﬁc

time point, and components which have failed are identi-

ﬁed at that time [77, 86, 24]. Speciﬁc methods have been

developed to handle data involving censored observations,

including the proportional hazard model [10], the ﬁrst hit-

ting threshold models [19] and the accelerated failure time

models [73].

The development of advanced machine learning in recent years has also influenced the analysis of time-to-event data. Techniques such as neural networks (e.g., [42, 49]), support vector machines [91, 44], random forests [38, 33], and many others (for a recent review, see [93]) have been adapted to work in this context. In this paper, we will focus on boosting techniques, which have also been successfully implemented for modelling time-to-event data [59]. For example, boosting counterparts of the proportional hazard model (e.g., [3]), first hitting time model [16] and accelerated failure time model [81] are available in the literature, and well implemented in statistical software [14].

Like many of the classical models they are based on, however, currently available boosting algorithms are limited by the assumption of independent censoring. This means that they require the censoring mechanism to be independent of the event of interest. While plausible in some cases, this assumption may not hold in many others [11]. Typical examples are patients who drop out of a study for reasons related to the therapy or because of

the eﬀect of a competing risk [50]. The consequences of in-

correctly assuming independent censoring range from the

wrong assessment of the eﬀect of a variable to the overesti-

mation/underestimation of the survival probabilities, and

several research papers have clearly stated that the in-

dependent censoring assumption deserves special scrutiny

[41, 51, 20]. Illustrative examples can be found in [36].

Their simulation study shows how much one can overesti-

mate the survival probabilities when incorrectly assuming

independent censoring (Figure 1(a) in [36]) and how much

bias can be introduced in the estimate of the regression

coeﬃcients (Table 1 in [36]). Their application also shows

how much the p-value related to a covariate can vary depending on whether the model is fitted with or without the independent censoring assumption, to the point that a risk factor can appear statistically significant when it is not (or the other way around). Many other simulation-based (see, e.g., [95, 6]) and real-data (see, e.g., [53]) examples can be found in the literature on this issue.

While there is much work in the classical statistical framework on this topic (see, among many others, [52, 35, 80, 84, 55, 39, 17]), less attention has been devoted to extending statistical learning methods to the case of dependent censoring. A notable exception is [64], where the survival version of random forests is adapted to tackle the issue. The goal of our paper is to help fill this gap. In particular, we introduce a gradient boosting model that handles

arXiv:2210.04869v2 [stat.ME] 26 Oct 2022

time-to-event data with dependent censoring. First developed in the machine learning community [79, 22] and then translated into the statistical world [23], gradient boosting is one of the most effective machine learning tools currently available [58]. In particular, recent implementations such as XGBoost [8], LightGBM [43] and CatBoost [18] have proven to work extremely well in many situations [61, 67], and a boosting model that deals with dependent censoring is highly desirable.

We will tackle the problem with the help of copulas

[85]. Copulas are cumulative distribution functions with

uniform marginals, which can be interpreted as the de-

pendence structure between two or more distributions.

In this paper, we use a Clayton copula to model the dependency between the event time and the censoring time. This approach is not new in the literature; it started with the seminal work of [96], in which a generalization of the Kaplan-Meier estimator, called the copula-graphic estimator, was developed to handle dependent censoring. Extensions of that work include [75], which focuses on Archimedean copulas, and [4], which studies it in the fixed-design situation. Among more recent works, we mention [89], which considers the left-truncation case, and [11], which relaxes the assumption that the parameter defining the copula function is known. For a full description of the use of copulas to model dependent censoring in time-to-event data problems, we refer the reader to the book of [20].

The rest of the paper is organized as follows. In Section 1 we describe dependent censoring and introduce the basic concepts of our approach: the Clayton copula, the Accelerated Failure Time model, and the boosting algorithm. The novel approach is presented in Section 2 and evaluated via simulation and on real data in Section 3. Section 4 ends the paper with some remarks.

Methods

1.1 Time-to-event prediction and

likelihood-based inference

In survival analysis, the term survival time, or event time,

refers to the time progressed from an origin to the occur-

rence of an event. While the target variable is commonly

referred to as ”time”, it can also consist of other units such

as cycles, rounds, or even friction, as will be seen in Sec-

tion 3. One common factor for the diﬀerent applications of

time-to-event prediction is the presence of censored data.

There are three main ways of censoring: In right censor-

ing, the event time is known to be higher than a certain

value, while in left censoring, the event time is known to

be lower than a certain value. In interval censoring, the

event time is known to be between two values. This paper

will focus on right censoring, as this is the most common

way of censoring. However, the proposed methodology is easily extended to handle left and interval censoring by making a small change in the likelihood function.

In addition to the three ways of censoring, censoring mechanisms differ in some further characteristics. Type I censoring means that the event is censored only if it

happens after a pre-speciﬁed time. This is also called ad-

ministrative censoring, and all remaining subjects at the

speciﬁed time are right censored. A Type II censoring

means that an experiment stops after a pre-speciﬁed num-

ber of events has happened, and the rest of the subjects

are right censored. When the censoring is Independent and

non-informative, every subject has a probability of being

censored which is statistically independent of the event

time. On the contrary, if a censoring is referred to as De-

pendent, the subject is censored by a mechanism related to

the event time. This paper concerns the latter censoring

characteristic and proposes a method for making statisti-

cal learning methods useful for this type of censoring, and

not only in the case of independent and non-informative

censoring, which is often assumed in the literature. Before

going into the details, we review the classical terminology

of time-to-event data, which is further used in this paper.

Consider two random variables: T is the event time and U is the censoring time. The two variables are mutually exclusive in the sense that only one of T or U is observed. We observe T if the event occurs no later than censoring (T ≤ U), and we do not observe T if censoring happens earlier than the event (U < T). An observation i in time-to-event data consists of (t, δ, x), where t is the event time or censoring time, depending on which comes first, δ is the censoring indicator, which is 1 if the event time is observed and 0 if the observation is censored, and x is the vector of covariates. To put it differently, t = min{T, U} and δ = I(T ≤ U).
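The construction of an observation from the latent pair (T, U) can be sketched in a few lines. This is a minimal illustration only: the function names and the exponential distributions are assumptions made for the example, and the two times are drawn independently here purely for simplicity.

```python
import random

def observe(event_time, censoring_time):
    """Return (t, delta) with t = min(T, U) and delta = I(T <= U)."""
    t = min(event_time, censoring_time)
    delta = 1 if event_time <= censoring_time else 0
    return t, delta

def simulate(n, seed=0):
    """Draw n latent pairs (T, U) and the corresponding observed (t, delta).

    The exponential rates are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        T = rng.expovariate(1.0)   # latent event time
        U = rng.expovariate(0.5)   # latent censoring time
        rows.append((T, U) + observe(T, U))
    return rows

rows = simulate(100)
```

In real data only the last two entries of each row, together with the covariates, are available; the latent T and U are kept here just to make the mapping visible.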

To perform likelihood-based inference on time-to-event

data, one should consider both the case of censored and

complete observations, such that the likelihood becomes

\[
L = \Pr(T = t,\, U > t \mid x)^{\delta}\, \Pr(T > t,\, U = t \mid x)^{1-\delta}, \tag{1}
\]

which yields a computationally difficult expression, due to the two joint probability distributions. A common assumption made to simplify this expression is that of independent and non-informative censoring (as defined in [20]):

• Independent censoring: Event time and censoring time are independent given the covariates.

• Non-informative censoring: The censoring distribution does not involve any parameters related to the distribution of the survival times.

In real-world situations, censoring is usually non-informative if it is independent, and in the rest of the paper we assume that independent censoring implies non-informative censoring. Note that the independent censoring assumption states that T and U are conditionally independent given x. In other words, even if there exists dependency between T and U, the independent censoring assumption holds when the covariates contain all information about the dependency. This specific case is not explored further in this paper, as real-world situations rarely provide all the necessary information in the covariates.

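Under the assumption of independent censoring, the likelihood function can be rewritten as

\[
\begin{aligned}
L &= [\Pr(T = t \mid x)\Pr(U > t \mid x)]^{\delta}\,[\Pr(T > t \mid x)\Pr(U = t \mid x)]^{1-\delta} \\
  &= [f_T(t \mid x)\, S_U(t \mid x)]^{\delta}\,[S_T(t \mid x)\, f_U(t \mid x)]^{1-\delta} \\
  &= [f_T(t \mid x)^{\delta}\, S_T(t \mid x)^{1-\delta}]\,[f_U(t \mid x)^{1-\delta}\, S_U(t \mid x)^{\delta}],
\end{aligned} \tag{2}
\]

where $S_T(t \mid x) = \Pr(T > t \mid x)$, $f_T(t \mid x) = -dS_T(t \mid x)/dt$, $S_U(t \mid x) = \Pr(U > t \mid x)$, and $f_U(t \mid x) = -dS_U(t \mid x)/dt$. When assuming independent and non-informative censoring, the factor $f_U(t \mid x)^{1-\delta} S_U(t \mid x)^{\delta}$ is unrelated to the event time distribution. Hence, the likelihood function is simplified to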

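\[
L \propto f_T(t \mid x)^{\delta}\, S_T(t \mid x)^{1-\delta} = f_T(t \mid x)^{\delta}\, (1 - F_T(t \mid x))^{1-\delta}, \tag{3}
\]

which only depends on the probability density function and cumulative distribution function of T. However, if the assumption of independent censoring does not hold, the factorization in Eq. (2) is no longer valid. In this case, the likelihood function is expressed as

\[
\begin{aligned}
L &= \Pr(T = t,\, U > t \mid x)^{\delta}\, \Pr(T > t,\, U = t \mid x)^{1-\delta} \\
  &= \left[ -\frac{\partial}{\partial y} \Pr(T > y,\, U > t \mid x) \Big|_{y=t} \right]^{\delta}
     \cdot \left[ -\frac{\partial}{\partial z} \Pr(T > t,\, U > z \mid x) \Big|_{z=t} \right]^{1-\delta},
\end{aligned} \tag{4}
\]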

where the joint survival functions Pr(T > y, U > t|x) and Pr(T > t, U > z|x) depend on both the random variables T and U. This makes the distribution of T non-identifiable without making further assumptions on the censoring mechanism [88]. However, by assuming a specific structure on the dependence between T and U, as done with copula theory, L can be written in terms of the marginals, as we will see in Section 1.2. Before going into details on how the copula function is integrated into the likelihood in Section 2, we also describe the two other parts on which our methodology is based: the Accelerated Failure Time model in Section 1.3 and gradient boosting in Section 1.4.
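As a concrete instance of the independent-censoring likelihood in Eq. (3), consider the simplest possible case: no covariates and an exponential event time, an assumption made here only for illustration. Then f_T(t) = λ exp(−λt) and S_T(t) = exp(−λt), so each observation contributes δ log λ − λt to the log-likelihood, and the maximiser is the number of observed events divided by the total follow-up time. The function names below are hypothetical.

```python
import math

def censored_loglik(rate, times, deltas):
    """Log of Eq. (3), summed over observations, for an exponential event time.

    Each observation contributes delta * log(rate) - rate * t.
    """
    return sum(d * math.log(rate) - rate * t for t, d in zip(times, deltas))

def exponential_mle(times, deltas):
    """Closed-form maximiser: observed events / total follow-up time."""
    return sum(deltas) / sum(times)

# Toy data: two observed events and two right-censored observations.
times = [1.0, 2.0, 3.0, 4.0]
deltas = [1, 0, 1, 0]
rate_hat = exponential_mle(times, deltas)
```

Censored observations still contribute through their follow-up time in the denominator. This is exactly the information that is handled incorrectly when the censoring is in fact dependent on the event time.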

1.2 Clayton Copula

Copulas are functions used to describe the dependency between random variables, and were introduced by [85]. The theory of copulas is based on Sklar's theorem, which states that any multivariate joint distribution can be modelled by its marginal distributions and a copula function,

\[
F(x_1, \ldots, x_D) = C(F_1(x_1), \ldots, F_D(x_D)), \tag{5}
\]

where F is a joint multivariate cumulative distribution function, C is the copula function and $F_d$, $d = 1, \ldots, D$, is the marginal cumulative distribution function of the random variable $X_d$. Sklar's theorem also relates a joint survival function $S(x_1, \ldots, x_D)$ to its marginal survival functions $S_d(x_d)$ in a way analogous to how it relates a joint distribution function $F(x_1, \ldots, x_D)$ to its margins $F_d(x_d)$ [25, 66, 40]. This means that the joint survival function of the event time and censoring time, $S_{T,U}(t, u \mid x) = \Pr(T > t, U > u \mid x)$, can be expressed in terms of the marginals $S_T(t \mid x) = \Pr(T > t \mid x)$ and $S_U(u \mid x) = \Pr(U > u \mid x)$ and a suitable bivariate copula C,

\[
S_{T,U}(t, u \mid x) = \Pr(T > t,\, U > u \mid x) = C(S_T(t \mid x),\, S_U(u \mid x)). \tag{6}
\]

When a copula function is applied to the survival functions instead of the cumulative distribution functions, it is referred to as the survival copula of T and U.

A copula function C: [0, 1] × [0, 1] → [0, 1] must satisfy certain conditions in order to be valid:

1. C(a, 0) = C(0, b) = 0, C(a, 1) = a, and C(1, b) = b for 0 ≤ a ≤ 1 and 0 ≤ b ≤ 1,

2. C is non-decreasing in every argument,

which ensures that C is the joint cumulative distribution function of two random variables with uniform marginals. One copula that satisfies these conditions is the Clayton copula, one of the most prominent Archimedean copulas. A bivariate Archimedean copula is defined as

\[
C_\theta(v_1, v_2) = \varphi_\theta^{-1}\big(\varphi_\theta(v_1) + \varphi_\theta(v_2)\big), \tag{7}
\]

where the function $\varphi_\theta$, the generator of the copula, is continuous and strictly decreasing, and θ is a parameter that describes the dependency strength between the two random variables $v_1$ and $v_2$. The Clayton copula has

\[
\varphi_\theta(t) = \frac{t^{-\theta} - 1}{\theta} \tag{8}
\]

as generator, with inverse function

\[
\varphi_\theta^{-1}(t) = (t\theta + 1)^{-1/\theta}, \tag{9}
\]

which satisfies conditions 1 and 2 as long as θ > 0. An increasing θ leads to a higher dependency, while if θ → 0, $v_1$ and $v_2$ tend towards independence.
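The generator construction in Eqs. (7) to (9) can be checked numerically. The sketch below is illustrative only (the function names are ours); it builds the Clayton copula from its generator and compares it with the closed form $(v_1^{-\theta} + v_2^{-\theta} - 1)^{-1/\theta}$ obtained by substituting Eqs. (8) and (9) into Eq. (7). Inputs are assumed to lie in (0, 1], since the generator is not finite at 0.

```python
def clayton_generator(t, theta):
    """Eq. (8): phi_theta(t) = (t^(-theta) - 1) / theta, for 0 < t <= 1."""
    return (t ** -theta - 1.0) / theta

def clayton_generator_inverse(s, theta):
    """Eq. (9): phi_theta^{-1}(s) = (s * theta + 1)^(-1 / theta)."""
    return (s * theta + 1.0) ** (-1.0 / theta)

def clayton_copula(v1, v2, theta):
    """Eq. (7) with the Clayton generator; requires theta > 0."""
    total = clayton_generator(v1, theta) + clayton_generator(v2, theta)
    return clayton_generator_inverse(total, theta)

def clayton_closed_form(v1, v2, theta):
    """Closed form obtained by substituting Eqs. (8) and (9) into Eq. (7)."""
    return (v1 ** -theta + v2 ** -theta - 1.0) ** (-1.0 / theta)

# Near-independence (theta close to 0) versus stronger dependence.
weak = clayton_copula(0.3, 0.6, 1e-6)
strong = clayton_copula(0.3, 0.6, 5.0)
```

For θ close to 0 the value approaches the product $v_1 v_2$ (independence), while for large θ it moves towards min($v_1$, $v_2$), the upper bound corresponding to perfect positive dependence.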

The Clayton copula is the simplest copula among the Archimedean copulas: it does not require any logarithmic or exponential operation and has only one parameter governing the dependency strength. The Clayton copula has already been extensively explored within the area of survival analysis, as it has played a historically important role in the introduction of copulas within this field [9, 68, 34, 83]. One reason for this is the mathematical simplicity of the Clayton copula, which makes it preferable to other copulas when modelling the dependency between event and censoring time. The Clayton copula is also especially interesting within time-to-event prediction as its survival version has an asymmetric structure and exhibits greater dependence in the upper tail. A dependence between T and U that increases with higher values seems reasonable in many real-world applications. An example is the dataset FRICTION described in Section 3.3, where censoring will happen almost exclusively at higher friction values. Also for medical use cases, such as the GBSG2 dataset described in Section 3.3, it is reasonable to assume that censoring mechanisms occurring later in the disease progression (e.g. death or drop-out) are more related to the disease progression, while censoring mechanisms occurring early are more generally caused by randomness.

If a dataset is known to have a dependency structure

with a greater dependency in the lower tail, it would be

beneﬁcial to explore the use of a Gumbel copula. If the

dependency structure is known to be symmetric, a Frank

copula would be suitable. However, due to the depen-

dency structure of the Clayton Copula and its mathemat-

ical simplicity and important role within survival analysis,

the Clayton Copula is explored in this work, limited to the

case when θ > 0 to satisfy conditions 1 and 2.

1.3 Accelerated Failure Time Model

The most popular regression model for handling censored data is arguably the Cox Proportional Hazards (Cox PH) model, widely used within biomedicine, followed by the Accelerated Failure Time (AFT) model, which is more used within reliability theory. The former assumes a multiplicative effect of the covariates on the hazard rate. The main reason why many prefer the Cox PH model is that it is semi-parametric, so estimation and inference are possible without making any assumptions about the baseline distribution. However, Cox PH is based on the proportional hazards assumption, which means that the ratio of hazards for two individuals is constant over time. If this assumption does not hold, the Cox PH model might give biased predictions or might not converge.

The AFT model, on the other hand, assumes that the effect of the covariates is to accelerate or decelerate the event time by some constant factor. In this way, the parameters of the AFT model are easier to interpret than those of the Cox PH model, since they directly measure the effect of the covariates on the event time [94]. Another advantage of the AFT model is that it does not need the proportional hazards assumption. The main disadvantage of the AFT model is that it is a parametric model, and one has to make an assumption on the baseline distribution of the event time. For a more exhaustive discussion of the differences between the two models, we refer the reader to [94] and [71].

In this work, we choose to work with the AFT model, as this gives easier access to the predicted event time than modelling hazard functions. The original Accelerated Failure Time model assumes a linear model for the logarithm of the event time,

\[
\log(T) = \beta X + E, \tag{10}
\]

where β is the vector of coefficients, X the covariate matrix and $E = \sigma_Z Z$ is the error, where Z follows a baseline distribution and E has mean 0 and variance $\sigma_Z^2$. Typical choices for the probability distribution of Z are Gumbel, normal and logistic, which in turn lead to the baseline distributions of T being Weibull, log-normal and log-logistic, respectively. Since we do not want to limit our model to linear effects of the covariates, we generalize the AFT model by substituting the linear function of the covariates with a general function,

\[
\log(T) = h(X) + \sigma_Z Z, \tag{11}
\]

where h(X) captures the effect of X on the response, and can be estimated by any model in broad generality.
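To illustrate the "acceleration" interpretation of Eq. (11), the following sketch generates event times for two covariate vectors that share the same baseline draw z. The function h below is a hypothetical non-linear effect invented for the example, and the normal baseline (giving a log-normal T) and the value of sigma_z are likewise assumptions.

```python
import math
import random

def h(x):
    """Hypothetical non-linear covariate effect, chosen only for illustration."""
    return 0.5 * x[0] + math.sin(x[1])

def aft_event_time(x, z, sigma_z=0.3):
    """Event time implied by Eq. (11): log(T) = h(x) + sigma_z * z."""
    return math.exp(h(x) + sigma_z * z)

rng = random.Random(0)
z = rng.gauss(0.0, 1.0)            # one draw from a normal baseline Z
x_a, x_b = (1.0, 0.0), (3.0, 0.0)  # two covariate vectors
ratio = aft_event_time(x_b, z) / aft_event_time(x_a, z)
```

Whatever the value of z, the ratio equals exp(h(x_b) − h(x_a)): the covariates rescale the event time by a constant factor, which is what makes the AFT parameters directly interpretable on the time scale.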

1.4 Gradient boosting and XGBoost

As mentioned in the introduction, from a machine learning point of view this paper focuses on the gradient boosting approach. As in any supervised learning setting, the goal here is to construct the function h(x) that, given the vector of covariates x, provides a good estimate ŷ = h(x) of the response y, which in our case is log(T). The goodness of the estimate is evaluated in terms of a loss function, L(y, ŷ). When working with time-to-event data, y is formed by the event time t and the censoring indicator δ. The idea behind gradient boosting is rather simple: starting from the null model, the algorithm iteratively improves the model by fitting a base learner (we will use a statistical tree) to the negative gradient of the loss function computed at the current model. Basically, at each iteration the algorithm seeks the direction that decreases the loss function fastest from the current point and fits the base learner to capture it. The ensemble of all the fits is the final model. A peculiarity of gradient boosting is that these single improvements are made artificially small through a penalization parameter, to control the speed of the minimization process and therefore better explore the model space. See Algorithm 1 for a schematic view of boosting.

Algorithm 1 Gradient boosting

1. Initialize h(x);

2. Update the model by, for k = 1, . . . , K,

2.1 compute the negative gradient of the loss function;

2.2 obtain the improvement $h_k(x)$ by fitting the base learner on the negative gradient;

3. Aggregate the results, $h(x) = \sum_{k=0}^{K} h_k(x)$.
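The steps of Algorithm 1 can be sketched in plain Python. This is a toy version under stated assumptions, not the XGBoost procedure: the base learner is a single-split regression stump on one-dimensional inputs, the loss is the squared loss ½(y − ŷ)², for which the negative gradient is simply the residual, and the names `fit_stump`, `gradient_boost` and `learning_rate` are ours.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump (one split) on 1-D inputs.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, targets) if x <= thr]
        right = [r for x, r in zip(xs, targets) if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    init = sum(ys) / len(ys)                      # step 1: initialise h(x)
    preds = [init] * len(ys)
    learners = []
    for _ in range(n_rounds):                     # step 2: K updates
        neg_grad = [y - p for y, p in zip(ys, preds)]   # 2.1 negative gradient
        stump = fit_stump(xs, neg_grad)                 # 2.2 fit base learner
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    # step 3: aggregate all (shrunken) improvements
    return lambda x: init + learning_rate * sum(s(x) for s in learners)

xs = [float(i) for i in range(10)]
ys = [0.0] * 5 + [1.0] * 5
model = gradient_boost(xs, ys)
```

The `learning_rate` plays the role of the penalization parameter mentioned above: each stump's contribution is deliberately shrunk, so many small steps are needed to fit the data.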

To implement the boosting approach we use the eXtreme Gradient Boosting (XGBoost) algorithm [8], with statistical trees as the base learner. A notable characteristic of XGBoost is that it uses, in addition to numerous computational tricks, a second-order approximation of the loss function in its computations, to speed up the procedure. In particular, it turns the updating step 2 of Algorithm 1 into the minimization problem

\[
h_k(x) = \operatorname*{argmin}_{h_k(x)} \left\{ L\!\left(y,\, \sum_{j=0}^{k-1} h(x)^{[j]}\right) + g_1 h_k(x) + \frac{1}{2} g_2 h_k^2(x) + \Omega(h_k(x)) \right\}, \tag{12}
\]

where $\sum_{j=0}^{k-1} h(x)^{[j]}$ is the current (i.e., up to the previous step) estimate of the model,

\[
g_1 = \frac{\partial}{\partial h(x)} L(y, h(x)) \Big|_{h(x) = \sum_{j=0}^{k-1} h(x)^{[j]}},
\qquad
g_2 = \frac{\partial^2}{\partial h(x)^2} L(y, h(x)) \Big|_{h(x) = \sum_{j=0}^{k-1} h(x)^{[j]}},
\]

and $\Omega(h_k(x))$ is a penalty term that penalizes the complexity of the base learner, in this case the tree. $h_k(x)$ is indeed the statistical tree that solves Eq. (12). Notably, its fitting, i.e. the computation of the split points and the leaf weights, only depends on the loss function through $g_1$ and $g_2$, and its complexity is penalised by

\[
\Omega(h_k(x)) = \gamma T_k + \frac{1}{2} \lambda \lVert w_k \rVert^2, \tag{13}
\]

where γ controls the penalty for the number of tree leaves $T_k$ and λ the magnitude of the leaf weights $w_k$, with ||·|| denoting the L2 norm. As we mentioned above, it is important for the boosting algorithm that the update at each step does not improve the model too much.
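To make concrete how the tree fit depends on the loss only through g1 and g2: for a fixed tree structure, minimising the quadratic objective in Eq. (12) with the penalty in Eq. (13) gives each leaf a closed-form weight w* = −G/(H + λ), where G and H are the sums of g1 and g2 over the observations in the leaf, as derived in the XGBoost paper [8]. The sketch below uses hypothetical helper names and is not part of the XGBoost API.

```python
def leaf_weight(g1, g2, lam):
    """Optimal weight of one leaf: w* = -G / (H + lambda)."""
    G, H = sum(g1), sum(g2)
    return -G / (H + lam)

def leaf_objective(g1, g2, lam, gamma):
    """Value of the penalised quadratic objective at w*, for one leaf."""
    G, H = sum(g1), sum(g2)
    return -0.5 * G * G / (H + lam) + gamma

def quadratic(w, g1, g2, lam):
    """Leaf contribution G*w + 0.5*(H + lambda)*w^2 (constant terms dropped)."""
    return sum(g1) * w + 0.5 * (sum(g2) + lam) * w * w

# Squared loss 0.5 * (y - yhat)^2 at yhat = 0: g1 = -y and g2 = 1.
ys = [1.0, 2.0, 3.0]
g1 = [-y for y in ys]
g2 = [1.0] * len(ys)
w_star = leaf_weight(g1, g2, lam=1.0)
```

With λ = 0 the weight would be the plain mean of the responses in the leaf; a positive λ shrinks it towards zero, which is one of the ways the size of each boosting update is kept small.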

Note that the algorithm is very general: to apply XGBoost to a specific problem, one merely needs to specify the right loss function (and provide the algorithm with the related first and second derivatives). The squared loss L(y, ŷ) = (y − ŷ)^2 is usually used for Gaussian regression, and the negative log-likelihood of the binomial distribution for classification problems. Therefore, our main task here is to derive a loss function that works for data involving dependent censoring. This is covered in the next section.

2 Clayton-boost

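Derivation of the appropriate loss function starts with modelling the joint survival function in Eq. (4) in terms of the marginal survival functions. By Sklar's theorem (Eq. (6)), we know that the copula function yields a valid model for the joint survival function, such that the likelihood can be expressed as

\[
L = \left[ -\frac{\partial}{\partial y} C_\theta\big(S_T(y \mid x),\, S_U(t \mid x)\big) \Big|_{y=t} \right]^{\delta}
    \cdot \left[ -\frac{\partial}{\partial z} C_\theta\big(S_T(t \mid x),\, S_U(z \mid x)\big) \Big|_{z=t} \right]^{1-\delta}. \tag{14}
\]

Taking the log of the likelihood, ℓ = log(L), and expressing Eq. (14) in terms of a Clayton copula (Eq. (8)) yields

\[
\begin{aligned}
\ell ={}& \delta \log\left[ -\frac{\partial}{\partial y} \left( S_T(y \mid x)^{-\theta} + S_U(t \mid x)^{-\theta} - 1 \right)^{-1/\theta} \Big|_{y=t} \right] \\
        &+ (1-\delta) \log\left[ -\frac{\partial}{\partial z} \left( S_T(t \mid x)^{-\theta} + S_U(z \mid x)^{-\theta} - 1 \right)^{-1/\theta} \Big|_{z=t} \right],
\end{aligned} \tag{15}
\]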

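which is in a computationally easier form, as the log-likelihood depends only on the marginal survival functions and the dependency parameter θ. The expression can be further simplified by the use of the chain rule and basic rules for logarithms, and by using the fact that $S(t \mid x) = 1 - F(t \mid x)$ and $\frac{\partial}{\partial y} F(y \mid x) = f(y \mid x)$, to become

\[
\begin{aligned}
\ell ={}& -\left(1 + \frac{1}{\theta}\right) \log\left[ (1 - F_T(t \mid x))^{-\theta} + (1 - F_U(t \mid x))^{-\theta} - 1 \right] \\
        &- (1+\theta)\,\delta \log\big(1 - F_T(t \mid x)\big) + \delta \log f_T(t \mid x) \\
        &- (1+\theta)(1-\delta) \log\big(1 - F_U(t \mid x)\big) + (1-\delta) \log f_U(t \mid x).
\end{aligned} \tag{16}
\]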
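For readers who want the intermediate step between Eq. (15) and Eq. (16), the following worked derivation spells out the δ = 1 term. The shorthand A(y) is introduced here only for compactness and is not notation from the paper.

```latex
\begin{align*}
A(y) &:= S_T(y \mid x)^{-\theta} + S_U(t \mid x)^{-\theta} - 1, \\
-\frac{\partial}{\partial y} A(y)^{-1/\theta}
  &= \frac{1}{\theta}\, A(y)^{-(1 + 1/\theta)}\, \frac{\partial A(y)}{\partial y} \\
  &= \frac{1}{\theta}\, A(y)^{-(1 + 1/\theta)} \cdot (-\theta)\, S_T(y \mid x)^{-(1+\theta)}\,
     \frac{\partial S_T(y \mid x)}{\partial y} \\
  &= A(y)^{-(1 + 1/\theta)}\, S_T(y \mid x)^{-(1+\theta)}\, f_T(y \mid x),
\end{align*}
% using dS_T/dy = -f_T. Taking logs and evaluating at y = t:
\begin{equation*}
\log\left[ -\frac{\partial}{\partial y} A(y)^{-1/\theta} \Big|_{y=t} \right]
  = -\left(1 + \frac{1}{\theta}\right) \log A(t)
    - (1+\theta) \log S_T(t \mid x) + \log f_T(t \mid x).
\end{equation*}
```

The δ = 0 term follows by symmetry, with the roles of T and U exchanged. Since both terms contain the same factor −(1 + 1/θ) log A(t) and the weights δ and 1 − δ sum to one, this factor appears without δ in the first line of Eq. (16).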

Maximizing this log-likelihood function is the same as minimizing its negative, so the loss function to be minimized is

\[
\text{loss} = \left(1 + \frac{1}{\theta}\right) \log\left[ (1 - F_T(t \mid x))^{-\theta} + (1 - F_U(t \mid x))^{-\theta} - 1 \right] + g(\delta, t), \tag{17}
\]

where

\[
g(\delta, t) =
\begin{cases}
(1+\theta) \log\big(1 - F_T(t \mid x)\big) - \log f_T(t \mid x), & \delta = 1, \\
(1+\theta) \log\big(1 - F_U(t \mid x)\big) - \log f_U(t \mid x), & \delta = 0.
\end{cases}
\]

The first part of the loss function is independent of δ, while the contribution of g(δ, t) depends on δ and is affected either by the conditional distribution of T, if we observe the true event time, or by the conditional distribution of U, if the time is censored.
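A useful sanity check on Eq. (17) is its behaviour as θ → 0: the loss should reduce to the negative logarithm of the independent-censoring likelihood in Eq. (2), that is −log(f_T S_U) for δ = 1 and −log(S_T f_U) for δ = 0. The sketch below evaluates the loss for given pointwise values of the marginal CDFs and PDFs; the function name and the numerical inputs are illustrative assumptions, not values from the paper.

```python
import math

def clayton_loss(F_T, f_T, F_U, f_U, delta, theta):
    """Loss of Eq. (17) for one observation, given marginal CDF/PDF values at t."""
    S_T, S_U = 1.0 - F_T, 1.0 - F_U
    joint = (1.0 + 1.0 / theta) * math.log(S_T ** -theta + S_U ** -theta - 1.0)
    if delta == 1:
        g = (1.0 + theta) * math.log(S_T) - math.log(f_T)
    else:
        g = (1.0 + theta) * math.log(S_U) - math.log(f_U)
    return joint + g

# Arbitrary marginal values at some time t, for illustration only.
F_T, f_T, F_U, f_U = 0.3, 0.8, 0.2, 0.5

near_independent_event = clayton_loss(F_T, f_T, F_U, f_U, delta=1, theta=1e-6)
near_independent_cens = clayton_loss(F_T, f_T, F_U, f_U, delta=0, theta=1e-6)
independent_event = -math.log(f_T * (1.0 - F_U))   # -log(f_T * S_U)
independent_cens = -math.log((1.0 - F_T) * f_U)    # -log(S_T * f_U)
```

For small θ the two pairs of values agree up to a small numerical error, which confirms that the Clayton loss contains the usual independent-censoring likelihood as a limiting case.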

Unfortunately, the distribution functions of T and U, conditional on the covariates, are rarely known. This can be addressed by assuming a model for the effect of the covariates on the event and censoring time, and assuming a known baseline distribution, as done with an Accelerated Failure Time model in Eq. (11). To write the loss function in Eq. (17) in terms of the known baseline distribution of Z, we apply the transformation $\omega(Z) = \exp(h(x) + \sigma_Z Z)$ according to the AFT model, and the fact that $F_T(t) = F_Z(\omega^{-1}(t))$ and $f_T(t) = f_Z(\omega^{-1}(t)) \frac{d}{dt} \omega^{-1}(t)$, since $t = \omega(Z)$ and ω is an increasing, continuous function on $\mathcal{Z} = \{z : f_Z(z) > 0\}$. Applying this transformation yields:

\[
\begin{aligned}
F_T(t) &= F_Z\!\left( \frac{\log(t) - h(x)}{\sigma_Z} \right) = F_Z(s(t)), \\
f_T(t) &= f_Z\!\left( \frac{\log(t) - h(x)}{\sigma_Z} \right) \frac{1}{\sigma_Z t} = \frac{f_Z(s(t))}{\sigma_Z t},
\end{aligned} \tag{18}
\]

where $s(t) = \frac{\log(t) - h(x)}{\sigma_Z}$. This is a similar approach to the independent censoring AFT implementation in XGBoost [1], which we later refer to as Std-boost. However, since our model also includes information about the censoring distribution, we apply the same transformation to the distribution of the censoring times:

\[
\begin{aligned}
F_U(t) &= F_V\!\left( \frac{\log(t) - h(x)}{\sigma_V} \right) = F_V(r(t)), \\
f_U(t) &= f_V\!\left( \frac{\log(t) - h(x)}{\sigma_V} \right) \frac{1}{\sigma_V t} = \frac{f_V(r(t))}{\sigma_V t},
\end{aligned} \tag{19}
\]

where $F_V$ and $f_V$ are the CDF and PDF of the random variable V, which has mean 0, $\sigma_V^2$ is the variance of the error term, and $r(t) = \frac{\log(t) - h(x)}{\sigma_V}$.

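Substituting these transformations into Eq. (17) gives the final loss function

\[
\text{loss} = \left(1 + \frac{1}{\theta}\right) \log\left[ \big(1 - F_Z(s(t) \mid x)\big)^{-\theta} + \big(1 - F_V(r(t) \mid x)\big)^{-\theta} - 1 \right] + g(\delta, t), \tag{20}
\]

where

\[
g(\delta, t) =
\begin{cases}
(1+\theta) \log\big(1 - F_Z(s(t) \mid x)\big) - \log\left( f_Z(s(t) \mid x)\, \dfrac{1}{\sigma_Z t} \right), & \delta = 1, \\[2ex]
(1+\theta) \log\big(1 - F_V(r(t) \mid x)\big) - \log\left( f_V(r(t) \mid x)\, \dfrac{1}{\sigma_V t} \right), & \delta = 0.
\end{cases}
\]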

The loss function can be incorporated into many sta-

tistical and machine learning frameworks, as long as they

allow a customized loss function. In this work, we illus-

trate the eﬀect of the loss function by incorporating it

into a gradient boosting approach described in Section 1.4,

which gives us Clayton-boost, the boosting algorithm for

dependent censoring.

As seen in Eq. (20), Clayton-boost requires the user to provide the baseline functions for both the event distribution Z and the censoring distribution V, as well as their standard deviations $\sigma_Z$ and $\sigma_V$. When used in practice, these distributions can be empirically inferred from the

data by looking at the sampling distributions for both the

uncensored and censored event times. Another approach

is to test diﬀerent distributions and standard deviations on

the dataset by using cross validation, as done in [1], and

choosing the distribution which provides the best perfor-

mance. The latter approach can be done when no prelim-

inary information is possible to retrieve from the data, or
