PERSONALIZED TREATMENT SELECTION VIA PRODUCT
PARTITION MODELS WITH COVARIATES
Matteo Pedone
Department of Statistics, Computer Science and Applications
University of Florence
matteo.pedone@unifi.it
Raffaele Argiento
Department of Economics
University of Bergamo
raffaele.argiento@unibg.it
Francesco C. Stingo
Department of Statistics, Computer Science and Applications
University of Florence
francescoclaudio.stingo@unifi.com
ABSTRACT
Precision medicine is an approach to disease treatment that defines treatment strategies based on the individual characteristics of the patients. Motivated by an open problem in cancer genomics, we develop a novel model that flexibly clusters patients with similar predictive characteristics and similar treatment responses; this approach identifies, via predictive inference, which one among a set of treatments is better suited for a new patient. The proposed method is fully model-based, avoiding the underestimation of uncertainty that occurs when treatment assignment relies on heuristic clustering procedures, and belongs to the class of product partition models with covariates, here extended to include the cohesion induced by the Normalized Generalized Gamma process. In simulation studies, the method performs particularly well in scenarios characterized by considerable heterogeneity of the predictive covariates. A cancer genomics case study illustrates the potential benefits, in terms of treatment response, yielded by the proposed approach. Finally, being model-based, the approach allows estimating cluster-specific response probabilities and then identifying patients more likely to benefit from personalized treatment.
1 Introduction
Cancer comprises a collection of complex diseases characterized by heterogeneous cellular alterations across patients and across cancer cells within the same neoplasm (Bedard et al., 2013). Patients with similar clinical diagnoses may show diverse responses to the same treatment due to tumor heterogeneity. A treatment for a particular diagnosis may be effective on average, but its effectiveness may vary across subpopulations. In recent years, many attempts have been made to devise personalized treatment strategies that leverage patients' characteristics, including the tumor's genome, to identify the treatment with the highest likelihood of success (Simon, 2010). Within this precision medicine paradigm, there is increasing interest in discovering individualized treatment rules (ITRs) for patients who show heterogeneous responses to treatment, i.e., when the treatment effect varies across groups of patients. An ITR is a decision rule that assigns a patient to a treatment given patient and disease characteristics (Ma et al., 2015). The optimal ITR is the one that maximizes the population mean outcome. Statistical methodology research in precision medicine is devoted to developing personalized treatment rules to inform decision-making. The distinctive mark of statistical inference under the precision medicine paradigm is to leverage heterogeneity to improve therapeutic strategies (Kosorok and Laber, 2019).
Our interest specifically lies in developing frontline treatment selection rules rather than estimating treatments' causal effects, as is commonly done within the ITR framework. Conventional methods for treatment selection rules are based on semi- and non-parametric procedures that identify subgroups of patients more likely to benefit from a treatment by leveraging a few baseline markers (Bonetti and Gelber, 2000; Song and Pepe, 2004). The subgroup approach can provide valuable information when performed according to a prespecified analysis plan. Nonetheless, stratified subsets of patients defined by one or a few biomarkers are often inadequate to account for patient heterogeneity and ultimately fail to establish effective treatment selection rules (Pocock et al., 2002). Other approaches account for patient heterogeneity by including covariates (Zhang et al., 2012; Zhao et al., 2012). However, for these methods, the correct definition of treatment-by-marker interactions is crucial and relies on sensitive assumptions, which are difficult to specify in clinical practice; moreover, these methods may be limited to generalized linear models (Ma et al., 2016).

arXiv:2210.06030v2 [stat.ME] 1 Sep 2023
To overcome these limitations, Ma et al. (2016, 2018, 2019) established a hybrid two-step predictive model for personalized treatment selection. In the first step, a clustering algorithm based on a predefined genomic signature (predictive markers) is used to obtain a heuristic measure of the patients' molecular similarity. In the second step, given this measure of patients' similarity and a set of prognostic markers, a Bayesian model is used for treatment selection; specifically, for a new, untreated patient, the model predicts the treatment response probabilities for each competing treatment. This framework offers two significant improvements over existing methods. First, the common assumption of statistical exchangeability among patients is relaxed: since each tumor is unique, patients are considered partially exchangeable only to the extent that their tumors are molecularly similar. Second, this approach utilizes complementary sources of information for treatment selection, integrating the predictive and prognostic characteristics of a patient.
This paper proposes a Bayesian predictive model for personalized treatment selection that builds upon Ma et al. (2019) and overcomes some of its main limitations. As in Ma et al. (2019), we leverage prognostic determinants and predictive biomarkers for treatment selection. We propose a fully Bayesian integrative framework for clustering and prediction that performs all inferential tasks in a single model, avoiding multi-step procedures; the proposed approach results in a treatment selection rule that fully accounts for patients' heterogeneity. Note that in Ma et al. (2019), the patients' similarities were estimated in the first step and included as known quantities in the second step; moreover, in the first step, two arbitrary choices had to be made, namely the clustering algorithm and the number of clusters. The proposed method accounts for the uncertainty in all modeling steps, resulting in improved predictive performance. In particular, we use a product partition model with covariates (PPMx, Müller et al., 2011) to cluster observations that are similar in terms of the values of the predictive covariates; specifically, the predictive covariates enter the model through the prior for the random partition. The resulting partitions are only partially exchangeable, and patients with similar covariates are a priori more likely to be clustered together. In this paper, we use the cohesion function induced by the Normalized Generalized Gamma process (NGGP) as a building block of our PPMx model to mitigate the rich-get-richer property of Bayesian nonparametric (BNP) priors. The rich-get-richer property is the tendency of a small number of clusters to become overrepresented as more data points are added, resulting in a few large clusters and potentially many singletons. Despite being well studied in the BNP literature as a prior inducing a Gibbs-type random partition (Lijoi et al., 2007), the NGGP is still not in common use. To the best of our knowledge, this is one of the first works in which the NGGP is employed as the cohesion function in a PPMx model (see Argiento et al., 2022).
We devise a method that, given the patients' prognostic and predictive markers, assigns them to the treatment with the highest likelihood of positive response. Prognostic covariates influence disease progression regardless of the treatment given to the patient, whereas predictive covariates change the likelihood of a positive response to a particular treatment. Conceptually, our strategy for selecting the optimal treatment for a new, untreated patient can be broken down into three steps. First, we consider historical patients and cluster them separately for each treatment according to their predictive markers. In this way, patients who underwent the same treatment are divided into clusters that are homogeneous with respect to predictive biomarkers. Then, we compute the utility provided by each competing treatment to the new, untreated patient by assigning the new patient to the subgroup of historical patients with whom they share the largest similarity in terms of predictive markers. The utility function relies on the model's posterior predictive distribution, which depends on both prognostic and predictive biomarkers. Finally, we select the treatment that ensures the largest predicted benefit.
We apply the proposed method to a brain cancer dataset (Ma et al., 2019) comprising 158 patients equally assigned to either standard or targeted treatment. For each patient, prognostic and predictive biomarkers, both consisting of preselected genomic markers, are available in addition to their categorical response to treatment. To facilitate optimal treatment selection, we assign numerical utilities to each treatment response level. This leads to a median utility score, which serves as a one-dimensional criterion for treatment selection. Our model shows good predictive performance and provides a sound framework for the identification and interpretation of clusters of patients.
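The utility-based selection step can be illustrated with a small sketch. It is not the authors' implementation: it assumes that, for each treatment, posterior draws of the new patient's response probabilities are already available (one row per MCMC draw, one column per response level), computes the expected utility of each draw, summarizes the draws by their median, and selects the treatment with the largest median. The helper names, the utility values, and the probability draws below are hypothetical toy inputs.

```python
import numpy as np


def median_utility(prob_draws, utilities):
    """Median over posterior draws of the expected utility sum_k u_k * pi_k.

    prob_draws: array (S, K), each row one draw of the K response probabilities.
    utilities:  array (K,), numeric utility attached to each response level.
    """
    per_draw = np.asarray(prob_draws, dtype=float) @ np.asarray(utilities, dtype=float)
    return float(np.median(per_draw))


def select_treatment(draws_by_treatment, utilities):
    """Return the treatment with the largest median utility, plus all scores."""
    scores = {a: median_utility(d, utilities) for a, d in draws_by_treatment.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy inputs (made up): K = 3 response levels, 3 posterior draws per treatment.
utilities = np.array([100.0, 40.0, 0.0])   # hypothetical utilities, best response first
draws = {
    "standard": np.array([[0.3, 0.3, 0.4], [0.2, 0.5, 0.3], [0.35, 0.25, 0.4]]),
    "targeted": np.array([[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2]]),
}
best, scores = select_treatment(draws, utilities)
print(best, scores)
```

Taking the median rather than the mean of the per-draw utilities makes the criterion less sensitive to a few extreme posterior draws; either summary yields a one-dimensional score per treatment.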
2 Bayesian Integrative Model
We consider $n$ historical patients treated with $T$ alternative treatments, whose predictive and prognostic biomarkers are measured along with a discrete set of response levels of the clinical outcome. Let $a = 1, \dots, T$ index treatments and let $n = \sum_{a=1}^{T} n^a$ be the total number of treated patients, of whom $n^a$ were assigned to therapy $a$. Note that, in our notation, the superscript $a$ is solely a treatment index. The treatment response $y_i^a$ of patient $i$ is a categorical variable with $K$ levels that encodes the residual disease extent after a clinically relevant post-therapy follow-up period. In particular, $y_i^a$ follows a multinomial distribution, $y_i^a \mid \boldsymbol{\pi}_i^a \overset{ind}{\sim} \text{Multinomial}(1, \boldsymbol{\pi}_i^a)$, for $i = 1, \dots, n^a$, with associated probability vector $\boldsymbol{\pi}_i^a = (\pi_{i1}^a, \dots, \pi_{iK}^a)$, where $\pi_{ik}^a$ is the probability of observing outcome $k$ for the $i$th patient under treatment $a$, for $k = 1, \dots, K$. These probabilities depend on $\boldsymbol{z}_i^a$ and $\boldsymbol{x}_i^a$, the $P$- and $Q$-dimensional vectors of prognostic and predictive features, respectively, measured on the $i$th patient who received treatment $a$. We assume that patients with similar predictive biomarkers and the same prognostic covariates respond similarly to a given treatment. To quantify the effectiveness of each competing treatment for patients with similar values of the predictive biomarkers, we adopt a covariate-dependent random partition model (RPM). For each treatment $a = 1, \dots, T$, patients receiving treatment $a$ are partitioned into clusters based on their predictive biomarkers $\boldsymbol{x}^a$; that is, the random partitions depend on the predictive biomarkers. Section 4 describes the covariate-dependent RPM we use to achieve this goal. In this section, we assume that $\mathcal{P}_{n^a}^a = \{S_1^a, \dots, S_{C_{n^a}^a}^a\}$ is a given treatment-specific partition of the indices $\{1, \dots, n^a\}$, where $C_{n^a}^a$ is the number of clusters among patients treated with therapy $a$ and $n_j^a = |S_j^a|$ is the cardinality of cluster $j$, for $j = 1, \dots, C_{n^a}^a$. Since we will later treat the partition of the units as a random quantity, the partition itself and the number of clusters depend on the number of observations, $n^a$. Following a common convention, we identify cluster-specific quantities using the superscript "$\star$". For example, for cluster $S_j^a$, the response vector is $\boldsymbol{y}_j^{a\star} = \{y_i^a : i \in S_j^a\}$, while $\boldsymbol{x}_j^{a\star} = \{\boldsymbol{x}_i^a : i \in S_j^a\}$ is the partitioned covariate matrix. We define the following hierarchical model for the response variables:
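$$
\begin{aligned}
y_i^a \mid \boldsymbol{\pi}_i^a &\overset{ind}{\sim} \text{Multinomial}(1, \boldsymbol{\pi}_i^a), \\
\boldsymbol{\pi}_1^a, \dots, \boldsymbol{\pi}_{n^a}^a \mid \boldsymbol{\eta}_1^{a\star}, \dots, \boldsymbol{\eta}_{C_{n^a}^a}^{a\star}, \mathcal{P}_{n^a}^a, \boldsymbol{\beta} &\sim \prod_{j=1}^{C_{n^a}^a} \prod_{i \in S_j^a} \text{Dirichlet}\big(\boldsymbol{\gamma}_i^a(\boldsymbol{\eta}_j^{a\star}, \boldsymbol{\beta})\big),
\end{aligned}
\tag{2.1}
$$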
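where $\boldsymbol{\gamma}_i^a(\boldsymbol{\eta}_j^{a\star}, \boldsymbol{\beta}) = \big(\gamma_{i1}^a(\eta_{j1}^{a\star}, \boldsymbol{\beta}_1), \dots, \gamma_{iK}^a(\eta_{jK}^{a\star}, \boldsymbol{\beta}_K)\big)$ is a vector of log-linear functions of the prognostic markers and cluster-specific parameters, defined as follows:
$$
\log\big(\gamma_{ik}^a(\eta_{jk}^{a\star}, \boldsymbol{\beta}_k)\big) = \eta_{jk}^{a\star} + \beta_{1k} z_{i1}^a + \cdots + \beta_{Pk} z_{iP}^a. \tag{2.2}
$$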
Model (2.1) is robust to overdispersion (Corsini and Viroli, 2022), which is usually observed in multivariate categorical data (Chen and Li, 2013). Predictive biomarkers, $\boldsymbol{x}^a$, enter equation (2.2) through the parameter vectors $\boldsymbol{\eta}_1^{a\star}, \dots, \boldsymbol{\eta}_{C_{n^a}^a}^{a\star}$ and the partition $\mathcal{P}_{n^a}^a$, which depends on the predictive covariates $\boldsymbol{x}^a$, as we elaborate in Sections 3 and 4. The $K$-dimensional vectors $\boldsymbol{\eta}_1^{a\star}, \dots, \boldsymbol{\eta}_{C_{n^a}^a}^{a\star}$ are cluster-specific parameters; high values of $\eta_{jk}^{a\star}$ correspond to a high probability of observing response $k$ for an individual treated with treatment $a$ in cluster $j$. We enforce $\boldsymbol{\eta}_j^{a\star}$ to be treatment-specific and, as a consequence, the partitions $\{\mathcal{P}_{n^a}^a\}_{a=1,\dots,T}$ are independent across treatments. This construction enables a comparison among competing treatments: it allows patients with similar genetic profiles who received different treatments to have distinct response probabilities. Finally, $\boldsymbol{\beta} = (\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_K)$ is a $P \times K$ matrix of regression parameters shared across treatments. Prognostic biomarkers enter equation (2.2) as linear terms. Since prognostic determinants affect the likelihood of achieving a given therapeutic response regardless of the treatment, the associated coefficients are shared across therapies. Thus, prognostic covariates set a baseline response probability. Since patients should not be regarded as statistically exchangeable with respect to predictive biomarkers (Ma et al., 2016), we leverage predictive biomarkers to drive the clustering process within each treatment. The resulting cluster-specific parameters $\boldsymbol{\eta}_j^{a\star}$ assess the benefit offered by a specific treatment to groups of similar patients. Note that the linear predictor is a function of the prognostic biomarkers only: the predictive covariates enter equation (2.2) non-linearly, only through the cluster- and treatment-specific parameters $\boldsymbol{\eta}_j^{a\star}$. This construction results in a random intercept that estimates the adjustment provided by predictive biomarkers to the baseline prognostic response probability for groups of patients with similar predictive determinants. Note that, while the multinomial logit model (Agresti, 2019) could have provided similar predictive performance, its interpretation would have been less straightforward, since its parameters represent log odds ratios with respect to a specific baseline response level.
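To make equations (2.1) and (2.2) concrete, the sketch below draws one patient's response from the model given the cluster-specific intercepts $\boldsymbol{\eta}_j^{a\star}$ and the shared prognostic coefficients $\boldsymbol{\beta}$. It is a minimal illustration with arbitrary toy parameter values and hypothetical helper names, not the authors' code. Because each patient contributes a single categorical response, integrating out the Dirichlet layer gives $P(y_i^a = k) = \gamma_{ik}^a / \sum_{l} \gamma_{il}^a$, which the sketch also returns.

```python
import numpy as np


def gamma_vector(eta_j, beta, z_i):
    """Equation (2.2): log gamma_ik = eta*_jk + sum_p beta_pk * z_ip, for k = 1..K.

    eta_j: (K,) cluster-specific intercepts; beta: (P, K); z_i: (P,) prognostic markers.
    """
    return np.exp(np.asarray(eta_j) + np.asarray(z_i) @ np.asarray(beta))


def simulate_patient(eta_j, beta, z_i, rng):
    """Equation (2.1): pi_i ~ Dirichlet(gamma_i), then y_i ~ Multinomial(1, pi_i)."""
    gam = gamma_vector(eta_j, beta, z_i)
    pi_i = rng.dirichlet(gam)
    y_i = rng.multinomial(1, pi_i)        # one-hot vector over the K response levels
    marginal = gam / gam.sum()            # P(y_i = k) after integrating out pi_i
    return y_i, pi_i, marginal


rng = np.random.default_rng(0)
beta = np.array([[0.5, 0.0, -0.5],
                 [0.2, 0.1, 0.0]])        # toy P x K coefficients (P = 2, K = 3)
eta_j = np.array([1.0, 0.0, -1.0])        # toy intercepts for one cluster
z_i = np.array([1.0, -0.5])               # toy prognostic markers
y_i, pi_i, marginal = simulate_patient(eta_j, beta, z_i, rng)
print(y_i, pi_i.round(3), marginal.round(3))
```

Raising $\eta_{jk}^{a\star}$ for one level increases $\gamma_{ik}^a$ and therefore the marginal probability of that response for every patient in cluster $j$, which is the interpretation given above.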
3 Prior distributions
We assume independent shrinkage priors for the parameters $\boldsymbol{\beta}_k$. In particular, we adopt horseshoe priors (Carvalho et al., 2010), which belong to the class of global-local scale mixtures of normals. In more detail, for $p = 1, \dots, P$ and $k = 1, \dots, K$,
$$
\beta_{pk} \overset{iid}{\sim} N(0, \lambda_{pk}^2 \tau_k^2), \qquad \lambda_{pk}, \tau_k \overset{iid}{\sim} HC(0, 1),
$$
where HC denotes a half-Cauchy distribution, $\{\lambda_{pk}\}$ are local shrinkage parameters, and $\{\tau_k\}$ are global shrinkage parameters. All coefficients will be nonzero, but only those supported by the data will take large values, owing to the heavy tails of the prior.

The joint distribution of the clustering and the cluster-specific parameters, $(\mathcal{P}_{n^a}^a, \boldsymbol{\eta}_j^{a\star})$, is assumed to be independent across treatments; therefore, we omit the superscript $a$ throughout Sections 3 and 4. In particular, we assume a product partition model with covariates (PPMx, Müller et al., 2011), which induces independence across clusters and conditional independence within clusters. We detail our proposal for the PPMx on $\mathcal{P}_n$ in Section 4. Here, given $\mathcal{P}_n$, we detail the prior for $\boldsymbol{\eta}_j^\star$, $j = 1, \dots, C_n$. We assume conditional independence between clusters, that is, $\boldsymbol{\eta}_j^\star \sim G_0$ for $j = 1, \dots, C_n$, where $G_0$ is a prior for the cluster-specific parameters. Then, the joint law of $(\mathcal{P}_n, \boldsymbol{\eta}_j^\star)$ is assigned hierarchically as:
$$
\boldsymbol{\eta}_j^\star \overset{iid}{\sim} G_0, \quad \text{for } j = 1, \dots, C_n, \qquad \mathcal{P}_n \mid \boldsymbol{x} \sim PPMx(\boldsymbol{x}).
$$
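Specifically, we take $G_0$ to be a $K$-dimensional multivariate normal distribution and assume that $\boldsymbol{\eta}_j^\star \mid \boldsymbol{\theta}, \boldsymbol{\Lambda} \overset{iid}{\sim} N_K(\boldsymbol{\theta}, \boldsymbol{\Lambda}^{-1})$. To achieve more flexibility, we add an extra layer of hierarchy by assuming $\boldsymbol{\theta} \mid \boldsymbol{\mu}_0, \boldsymbol{\Lambda}, \nu_0 \sim N_K(\boldsymbol{\mu}_0, (\nu_0 \boldsymbol{\Lambda})^{-1})$ and $\boldsymbol{\Lambda} \mid s_0, \boldsymbol{\Lambda}_0 \sim W(\boldsymbol{\Lambda}_0, s_0)$, where $W$ denotes a Wishart distribution with mean $s_0 \boldsymbol{\Lambda}_0$. As customary hyperparameter choices, we set $\boldsymbol{\mu}_0$ to be the $K$-dimensional vector of zeros, $s_0 = K + 2$, $\boldsymbol{\Lambda}_0$ to be a $K \times K$ diagonal matrix with diagonal elements equal to 10, and $\nu_0 = 10$. Elicitation for the latter two parameters is discussed in Supplementary Material A.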
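Simulating from the priors of this section is a quick way to see what the hyperparameters imply. The sketch below is an illustration under stated assumptions rather than the authors' code: it draws $\boldsymbol{\beta}$ from the horseshoe prior (half-Cauchy scales obtained as absolute values of standard Cauchy draws) and the cluster intercepts $\boldsymbol{\eta}_j^\star$ from the normal-Wishart hierarchy with the hyperparameter values stated above. It relies on SciPy's `wishart`, whose mean is `df * scale`, so `df = s0` and `scale = Λ0` reproduce the stated mean $s_0 \boldsymbol{\Lambda}_0$. The helper name `sample_prior` is hypothetical.

```python
import numpy as np
from scipy.stats import wishart


def sample_prior(P, K, n_clusters, rng):
    """One joint draw of (beta, eta*) from the Section 3 priors (hypothetical helper)."""
    # Horseshoe: beta_pk ~ N(0, lambda_pk^2 tau_k^2), with lambda_pk, tau_k ~ HC(0, 1).
    tau = np.abs(rng.standard_cauchy(size=K))         # global scale for each level k
    lam = np.abs(rng.standard_cauchy(size=(P, K)))    # local scale for each coefficient
    beta = rng.normal(0.0, lam * tau)                 # tau broadcasts across rows p

    # Base measure G0 with the hyperparameters stated in the text.
    mu0, s0, nu0 = np.zeros(K), K + 2, 10.0
    Lambda0 = 10.0 * np.eye(K)
    # Lam ~ W(Lambda0, s0); scipy's wishart has mean df * scale = s0 * Lambda0.
    Lam = np.atleast_2d(wishart.rvs(df=s0, scale=Lambda0, random_state=rng))
    cov = np.linalg.inv(Lam)
    cov = (cov + cov.T) / 2.0                         # remove round-off asymmetry
    theta = rng.multivariate_normal(mu0, cov / nu0)   # theta ~ N_K(mu0, (nu0 Lam)^{-1})
    eta = rng.multivariate_normal(theta, cov, size=n_clusters)  # eta*_j ~ N_K(theta, Lam^{-1})
    return beta, eta


rng = np.random.default_rng(1)
beta, eta = sample_prior(P=4, K=3, n_clusters=5, rng=rng)
print(beta.shape, eta.shape)
```

Repeating the draw many times and inspecting the spread of `eta` is one way to check whether a given choice of $\boldsymbol{\Lambda}_0$ and $\nu_0$ is as informative as intended.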
4 Bayesian Nonparametric Covariate Driven Clustering
In this section, we introduce the Product Partition Model (PPM) and describe its extension to incorporate the Normalized Generalized Gamma process (NGGP). We follow the approach of Müller et al. (2011) to integrate predictive biomarkers into the model, making the random partition dependent on predictive markers. We devise a covariate-dependent prior on the random partition that enables predictive markers to drive the clustering process, thereby inducing clusters of observations that are homogeneous in terms of predictive biomarkers. The resulting model implies independence across clusters and exchangeability only within clusters. The joint evaluation of prognostic and predictive covariates guides optimal treatment selection, our main inferential goal; still, only the predictive markers identify patients likely to benefit from a particular therapy. In this way, we can quantify the benefit offered by a specific treatment to groups of patients characterized by similar values of the predictive markers. We denote by $\mathcal{P}_n := \{S_1, \dots, S_{C_n}\}$ the partition of the data label set $\{1, \dots, n\}$ into $C_n$ subsets $S_j$, for $j = 1, \dots, C_n$, with $n_j = |S_j|$ the cardinality of cluster $j$. In the seminal paper by Hartigan (1990), the prior on $\mathcal{P}_n$ is assigned by letting
$$
p(\mathcal{P}_n) = V_{n,C_n} \prod_{j=1}^{C_n} \rho(S_j), \tag{4.1}
$$
where $\rho(\cdot)$ is referred to as the cohesion function and quantifies the unnormalized probability of each cluster (Müller et al., 2011). Moreover, $V_{n,C_n}$ is a normalizing constant ensuring that the prior sums to one over the space of all partitions of the integers $\{1, \dots, n\}$. If $\rho(S_j)$ is only a function of $n_j = |S_j|$, then the resulting model for $\mathcal{P}_n$ is invariant under permutations of the labels $\{1, \dots, n\}$. Under this assumption, the resulting model for $\mathcal{P}_n$ falls in the class of Gibbs-type priors (Gnedin and Pitman, 2006). In this framework, the cohesion takes the analytical form $\rho(S_j) = (1 - \sigma)_{n_j - 1}$ with $\sigma < 1$, where $(a)_n = a(a+1)\cdots(a+n-1)$, with $(a)_0 = 1$, denotes the rising factorial; $p(\mathcal{P}_n)$ is called the exchangeable partition probability function (eppf), and the normalizing constant must satisfy the triangular recursion $V_{n,k} = V_{n+1,k}(n - \sigma k) + V_{n+1,k+1}$ for each $n \ge 1$ and $1 \le k \le n$, with $V_{1,1} = 1$. Note that, since $\rho(S_j)$ is an increasing function of the cluster size $n_j$, heavily populated clusters are more likely. This leads to the rich-get-richer behaviour in the clustering induced by the BNP prior. The connection between product partition models and Gibbs-type priors has been investigated in depth since the seminal paper by Quintana and Iglesias (2003); see also De Blasi et al. (2013).
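The definitions above can be checked by brute force on a small example. The sketch below is an illustration, not part of the proposed method: it enumerates all partitions of $n = 4$ labels and evaluates the Gibbs-type eppf (4.1) for the Dirichlet-process case $\sigma = 0$, for which the standard normalizing constant is $V_{n,k} = \theta^k / (\theta)_n$ with total mass $\theta > 0$. Exact rational arithmetic confirms that the probabilities sum to one and that this $V_{n,k}$ satisfies the triangular recursion. The function names are hypothetical.

```python
from fractions import Fraction


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):       # add `first` to an existing block ...
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition           # ... or open a new singleton block


def rising(a, m):
    """Rising factorial (a)_m = a (a + 1) ... (a + m - 1), with (a)_0 = 1."""
    out = Fraction(1)
    for i in range(m):
        out *= a + i
    return out


def dp_V(n, k, theta):
    """V_{n,k} = theta^k / (theta)_n: the Dirichlet-process case (sigma = 0)."""
    theta = Fraction(theta)
    return theta ** k / rising(theta, n)


def gibbs_eppf(partition, V, sigma):
    """Equation (4.1) with cohesion rho(S_j) = (1 - sigma)_{n_j - 1}."""
    n, k = sum(len(b) for b in partition), len(partition)
    cohesion = Fraction(1)
    for block in partition:
        cohesion *= rising(1 - Fraction(sigma), len(block) - 1)
    return V(n, k) * cohesion


theta, sigma = 2, 0


def V(n, k):
    return dp_V(n, k, theta)


parts = list(set_partitions([0, 1, 2, 3]))
total = sum(gibbs_eppf(p, V, sigma) for p in parts)
print(len(parts), total)   # 15 partitions; the eppf sums exactly to one
recursion_ok = all(V(n, k) == V(n + 1, k) * (n - sigma * k) + V(n + 1, k + 1)
                   for n in range(1, 6) for k in range(1, n + 1))
print(recursion_ok)
```

Replacing `dp_V` with another family of normalizing constants (for example, the NGGP one) and re-running the same checks is a simple way to validate an implementation of $V_{n,k}$.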
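In this paper, we choose $\sigma \ge 0$ and introduce a new parameter $\kappa > 0$ such that
$$
V_{n,C_n} = \frac{1}{\Gamma(n)} \int_0^{\infty} u^{n-1} \exp\left\{-\frac{1}{\sigma}\left[(\kappa + u)^{\sigma} - \kappa^{\sigma}\right]\right\} (\kappa + u)^{-n + \sigma C_n} \, du.
$$
In this way, the law of $\mathcal{P}_n$ coincides with the one induced by the Normalized Generalized Gamma process (NGGP, Lijoi et al., 2007). The NGGP encompasses the well-known Dirichlet process (DP) when $\sigma = 0$. In particular, Lijoi et al. (2007) highlighted the role of $\sigma$ in the predictive mechanism of an NGGP:
$$
p(\tilde{\boldsymbol{\eta}} \in \cdot \mid \mathcal{P}_n, \boldsymbol{\eta}_1^\star, \dots, \boldsymbol{\eta}_{C_n}^\star) = \frac{V_{n+1,C_n+1}}{V_{n,C_n}} G_0(\cdot) + \frac{V_{n+1,C_n}}{V_{n,C_n}} \sum_{j=1}^{C_n} (n_j - \sigma) \, \delta_{\boldsymbol{\eta}_j^\star}(\cdot).
$$
The above formula describes the rule used to assign a new observation to a cluster, where the summand on the left represents the probability of forming a new group, and the one on the right represents the probability of being assigned to an already observed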
group. It is apparent that larger values of $\sigma$ increase the probability of generating new groups. Our simulation study (Supplementary Material B.1; see also Lijoi et al., 2007, Section 3.2) shows that large values of $\sigma$ also reduce the number of estimated singletons. Together, these behaviours mitigate the rich-get-richer property of BNP priors. We also assume a discrete prior distribution for $(\kappa, \sigma)$. In this way, we let the data choose the appropriate reinforcement rate (Lijoi et al., 2007), and we overcome a critical "trade-off" that arises when $\kappa$ and $\sigma$ are set to fixed values. Indeed, both $\sigma$ and $\kappa$ affect the number of clusters $C_n$ and the reinforcement mechanism (see Lijoi et al., 2007; Favaro et al., 2013; Argiento et al., 2016, for an in-depth discussion). Both have an increasing effect on the probability of observing a new cluster and on the prior (and posterior) number of clusters. Interestingly, $\sigma$ also enters the expression of the weights of existing clusters and, as observed before, reduces the probability of clusters with few elements. We refer to this double effect of $\sigma$ as the "trade-off" between the number of clusters and reinforcement. In particular, for $(\kappa, \sigma)$ we adopt a discrete prior on a $10 \times 10$ grid in $(0, 15) \times (0.0, 0.6)$, such that the marginal distributions are discrete approximations of $\kappa \sim \text{Gamma}(2, 1)$ and $\sigma \sim \text{Beta}(5, 23)$, respectively.

Extending the work by Müller et al. (2011), we aim to obtain a prior for the random partition that encourages two subjects to co-cluster when they have similar covariates, i.e., predictive biomarkers. In particular, the prior on the random partition is defined by perturbing the cohesion function of the product partition model in equation (4.1) via a similarity function $g$ that induces the desired dependence on covariates. In more detail, the similarity function $g$ is a non-negative function that depends on the covariates associated with the subjects in each cluster. Let $\boldsymbol{x}_i$ denote the covariates for the $i$th unit, while $\boldsymbol{x}_j^\star = (\boldsymbol{x}_i, i \in S_j)$ represents the covariates arranged by cluster. The product partition distribution with covariates is
$$
p(\mathcal{P}_n) \propto V_{n,C_n} \prod_{j=1}^{C_n} \rho(n_j) \, g(\boldsymbol{x}_j^\star). \tag{4.2}
$$
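The NGGP quantities entering (4.1) and (4.2) can be evaluated numerically. The sketch below computes $V_{n,k}$ from the integral representation given above by quadrature with `scipy.integrate.quad`, and plugs it into the predictive rule to obtain the probabilities that a new observation joins each existing cluster or opens a new one. It is a minimal illustration for $0 < \sigma < 1$ (the DP case $\sigma = 0$ is only approached as a limit) with hypothetical function names; an implementation meant for large $n$ would work on the log scale to avoid underflow.

```python
import math

from scipy.integrate import quad


def nggp_V(n, k, kappa, sigma):
    """V_{n,k} of the NGGP partition prior, by numerical quadrature.

    V_{n,k} = 1/Gamma(n) * int_0^inf u^(n-1) exp{-[(kappa+u)^sigma - kappa^sigma]/sigma}
                                     * (kappa+u)^(sigma*k - n) du
    Valid here for kappa > 0 and 0 < sigma < 1 (sigma = 0 is the DP limit).
    """
    if not (kappa > 0 and 0 < sigma < 1):
        raise ValueError("requires kappa > 0 and 0 < sigma < 1")

    def integrand(u):
        psi = ((kappa + u) ** sigma - kappa ** sigma) / sigma
        return u ** (n - 1) * math.exp(-psi) * (kappa + u) ** (sigma * k - n)

    value, _abserr = quad(integrand, 0.0, math.inf, limit=200)
    return value / math.gamma(n)


def nggp_predictive(cluster_sizes, kappa, sigma):
    """Probabilities that observation n + 1 joins each existing cluster or a new one."""
    n, k = sum(cluster_sizes), len(cluster_sizes)
    v_nk = nggp_V(n, k, kappa, sigma)
    weight_old = nggp_V(n + 1, k, kappa, sigma) / v_nk
    p_new = nggp_V(n + 1, k + 1, kappa, sigma) / v_nk
    p_existing = [(nj - sigma) * weight_old for nj in cluster_sizes]
    return p_existing, p_new


# Toy configuration: three clusters of sizes 5, 1, 1 and kappa = 1.
for sigma in (0.1, 0.3, 0.5):
    p_existing, p_new = nggp_predictive([5, 1, 1], kappa=1.0, sigma=sigma)
    print(sigma, [round(p, 3) for p in p_existing], round(p_new, 3))
```

The existing-cluster weights are proportional to $n_j - \sigma$, so, for fixed $V$ ratios, a larger $\sigma$ shifts relative mass away from small clusters, which is the reinforcement effect discussed above.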
The choice of the similarity function is of paramount importance for our modeling. It measures the homogeneity of the covariates arranged by cluster; thus, the more similar the covariate values, the larger the value of $g$ must be. The default choice, proposed by Müller et al. (2011), defines $g$ as the marginal probability of an auxiliary Bayesian model. Several alternatives are available (see, for example, Page and Quintana, 2018; Argiento et al., 2022), since the only requirement for $g$ is to be a symmetric non-negative function. We implement the "Double Dipper" similarity function because it has been shown to work well both in settings with a large number of covariates and in settings where prediction is the main inferential goal (Page and Quintana, 2016, 2018):
$$
g(\boldsymbol{x}_j^\star) = \prod_{q=1}^{Q} \int \prod_{i \in S_j} p(x_{iq} \mid \boldsymbol{\xi}_j^\star) \, p(\boldsymbol{\xi}_j^\star \mid \boldsymbol{x}_{jq}^\star) \, d\boldsymbol{\xi}_j^\star, \tag{4.3}
$$
with $p(\boldsymbol{\xi}_j^\star \mid \boldsymbol{x}_{jq}^\star) \propto \prod_{i \in S_j} p(x_{iq} \mid \boldsymbol{\xi}_j^\star) \, p(\boldsymbol{\xi}_j^\star)$. This structure does not derive from any probabilistic property, since the covariates are not considered random; rather, it measures the similarity of the covariates in cluster $S_j$. The name comes from the fact that the covariates are used twice: $g$ corresponds to the posterior predictive of $\boldsymbol{x}_j^\star$ evaluated at $\boldsymbol{x}_j^\star$ itself. The model in equation (4.3) is completed by assuming $p(\cdot \mid \boldsymbol{\xi}_j^\star) = N(\cdot \mid m_j^\star, v_j^\star)$, where $N(\cdot \mid m, v)$ is a Gaussian density with mean $m$ and variance $v$, and $p(\boldsymbol{\xi}_j^\star) = p(m_j^\star, v_j^\star) = NIG(m_j^\star, v_j^\star \mid m_0, k_0, v_0, n_0)$ is the Normal-Inverse-Gamma density function. The resulting similarity function can model scenarios with heterogeneous within-cluster variability. We follow Page and Quintana (2018) and set the parameters of the Normal-Inverse-Gamma density to the default values $m_0 = 0$, $k_0 = 1.0$, $v_0 = 1.0$, $n_0 = 2$; since there is no notion of the $\boldsymbol{x}_i$ being random, the parameters $\boldsymbol{\xi}_j^\star$ are not updated. Approaches based on covariate-dependent random partitions perform well if the clustering is not completely driven by the covariates. As the number of covariates increases, similarity functions tend to overwhelm the information provided by the response, completely driving the clustering process. To counteract this behavior, we calibrate the influence of the covariates on clustering. To this end, with an abuse of notation, $g$ in equation (4.2) is taken to be $g(\boldsymbol{x}_j^\star) := g(\boldsymbol{x}_j^\star)^{1/Q}$, a small variation of the coarsened similarity function of Page and Quintana (2018). The impact of the cohesion and similarity functions on the number of clusters is evaluated in a simulation study reported in Supplementary Material B; in summary, this simulation study demonstrates the effectiveness of the NGGP in controlling the prior mass allocated to different partitions through the reinforcement mechanism induced by $\sigma$. Additionally, we observe that the covariates included in the prior effectively drive the clustering process, as desired.
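Under a conjugate Normal-Inverse-Gamma specification, the integral in (4.3) has a closed form. Since $p(\boldsymbol{\xi} \mid \boldsymbol{x}) = \prod_i p(x_i \mid \boldsymbol{\xi}) \, p(\boldsymbol{\xi}) / m(\boldsymbol{x})$, each covariate's factor equals $m(\boldsymbol{x}, \boldsymbol{x}) / m(\boldsymbol{x})$: the marginal likelihood of the cluster's values duplicated, divided by the marginal likelihood of the values themselves. The sketch below computes $\log g$ on that basis, with the coarsening exponent $1/Q$ as an option. The mapping of $(m_0, k_0, v_0, n_0)$ to $m \mid v \sim N(m_0, v/k_0)$ and $v \sim \text{InvGamma}(n_0/2, n_0 v_0/2)$ is our assumption and may differ from the parameterization of Page and Quintana (2018); the function names are hypothetical.

```python
import math

import numpy as np


def nig_log_marginal(x, m0=0.0, k0=1.0, v0=1.0, n0=2.0):
    """Log marginal density of x_1..x_n iid N(m, v) with (m, v) ~ Normal-Inverse-Gamma.

    ASSUMED parameterization: m | v ~ N(m0, v / k0), v ~ InvGamma(n0 / 2, n0 * v0 / 2).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    a0, b0 = n0 / 2.0, n0 * v0 / 2.0
    xbar = x.mean()
    kn, an = k0 + n, a0 + n / 2.0
    bn = b0 + 0.5 * np.sum((x - xbar) ** 2) + k0 * n * (xbar - m0) ** 2 / (2.0 * kn)
    return (math.lgamma(an) - math.lgamma(a0)
            + a0 * math.log(b0) - an * math.log(bn)
            + 0.5 * (math.log(k0) - math.log(kn))
            - 0.5 * n * math.log(2.0 * math.pi))


def double_dipper_log_similarity(x_cluster, coarsen=True, **nig_params):
    """log g(x*_j) from (4.3); rows are the patients in S_j, columns the Q covariates."""
    x_cluster = np.asarray(x_cluster, dtype=float)
    if x_cluster.ndim == 1:                 # a single covariate given as a flat vector
        x_cluster = x_cluster[:, None]
    Q = x_cluster.shape[1]
    log_g = 0.0
    for q in range(Q):
        col = x_cluster[:, q]
        # Posterior predictive of the column evaluated at itself: m(x, x) / m(x).
        log_g += (nig_log_marginal(np.concatenate([col, col]), **nig_params)
                  - nig_log_marginal(col, **nig_params))
    # Coarsened version g^(1/Q), used to temper the influence of many covariates.
    return log_g / Q if coarsen else log_g


tight = double_dipper_log_similarity([[0.0], [0.1]])
spread = double_dipper_log_similarity([[0.0], [3.0]])
print(tight, spread)   # the tighter cluster receives the larger similarity
```

Working on the log scale keeps the product over covariates in (4.3) numerically stable, and the coarsening then amounts to a simple division by $Q$.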
5 Posterior Inference and Treatment Selection
We implement an MCMC algorithm to simulate from the posterior distribution of the parameters of interest. The core part of the MCMC algorithm is the update of cluster membership; the computations associated with the joint law of $(\mathcal{P}_{n^a}^a, \boldsymbol{\eta}_j^{a\star})$ are based on Algorithm 8 of Neal (2000) with a reuse strategy (Favaro et al., 2013). Conditional on the updated cluster labels, all the remaining parameters are easily updated with Gibbs sampler or Metropolis-Hastings