
Personalized Treatment Selection via Product Partition Models with Covariates
where HC denotes a half-Cauchy distribution, $\{\lambda_{pk}\}$ are local shrinkage parameters, and $\{\tau_k\}$ are global
shrinkage parameters. All coefficients will be nonzero, but only those supported by the data will have large values due to
the heavy tails of the prior. The joint distribution of the clustering and the cluster-specific parameters,
$(P^{a}_{n^{a}}, \eta^{a\star}_{j})$, is assumed to be independent across treatments. Therefore we will omit the superscript $a$
throughout Sections 3 and 4. In particular, we assume a product partition model with covariates (PPMx, Müller et al., 2011),
which induces independence across clusters and conditional independence within clusters. We detail our proposal for the
PPMx on $P_n$ in Section 4. Here, given $P_n$, we detail the prior for $\eta_j^\star$, $j = 1, \dots, C_n$. We assume
conditional independence between clusters, that is, $\eta_j^\star \sim G_0$ for $j = 1, \dots, C_n$, where $G_0$ is a prior
for the cluster-specific parameters. Then, the joint law of $(P_n, \eta_j^\star)$ is assigned hierarchically as:
\[
\eta_j^\star \overset{iid}{\sim} G_0, \quad \text{for } j = 1, \dots, C_n, \qquad P_n \mid x \sim \mathrm{PPMx}(x).
\]
Specifically, we take $G_0$ to be a $K$-dimensional multivariate normal distribution and assume that
$\eta_j^\star \mid \theta, \Lambda \overset{iid}{\sim} N_K(\theta, \Lambda^{-1})$. To achieve more flexibility, we add an
extra layer of hierarchy by assuming $\theta \mid \mu_0, \Lambda, \nu_0 \sim N_K(\mu_0, (\nu_0 \Lambda)^{-1})$ and
$\Lambda \mid s_0, \Lambda_0 \overset{iid}{\sim} W(\Lambda_0, s_0)$, where $W$ is a Wishart distribution with mean
$s_0 \Lambda_0$. As a customary hyperparameter choice, we set $\mu_0$ to be the $K$-dimensional vector of zeros,
$s_0 = K + 2$, $\Lambda_0$ to be a $K \times K$ diagonal matrix with diagonal elements equal to 10, and $\nu_0 = 10$.
Elicitation for the latter two parameters is discussed in Supplementary Material A.
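
As an illustration of the hierarchy above, the following minimal sketch draws $\Lambda$, then $\theta$, then the cluster-specific parameters $\eta_j^\star$ with the stated hyperparameters ($\mu_0 = 0$, $s_0 = K + 2$, $\Lambda_0 = 10 I_K$, $\nu_0 = 10$). It is not the authors' implementation. The function names are hypothetical, and it uses only the Python standard library. The Wishart draw is built as a sum of $s_0$ outer products, which assumes that $s_0$ is an integer and $\Lambda_0$ is diagonal, as is the case here.

```python
import math
import random


def wishart_draw(scale_diag, df, rng):
    """Draw Lambda ~ W(Lambda0, df) for diagonal Lambda0 and integer df >= K,
    as a sum of df outer products z z^T with z ~ N_K(0, Lambda0)."""
    K = len(scale_diag)
    lam = [[0.0] * K for _ in range(K)]
    for _ in range(df):
        z = [math.sqrt(s) * rng.gauss(0.0, 1.0) for s in scale_diag]
        for a in range(K):
            for b in range(K):
                lam[a][b] += z[a] * z[b]
    return lam


def cholesky(P):
    """Lower-triangular L with L L^T = P (P symmetric positive definite)."""
    K = len(P)
    L = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def normal_from_precision(mean, P, rng):
    """Draw x ~ N_K(mean, P^{-1}) by solving L^T y = z, with z ~ N_K(0, I)."""
    K = len(mean)
    L = cholesky(P)
    z = [rng.gauss(0.0, 1.0) for _ in range(K)]
    y = [0.0] * K
    for i in reversed(range(K)):  # back-substitution on the upper-triangular L^T
        s = z[i] - sum(L[m][i] * y[m] for m in range(i + 1, K))
        y[i] = s / L[i][i]
    return [mean[i] + y[i] for i in range(K)]


def sample_cluster_parameters(K, n_clusters, seed=0):
    """One prior draw of (Lambda, theta, [eta_1*, ..., eta_Cn*])."""
    rng = random.Random(seed)
    mu0, s0, nu0 = [0.0] * K, K + 2, 10.0       # hyperparameters from the text
    lam = wishart_draw([10.0] * K, s0, rng)      # Lambda0 = 10 * I_K
    theta = normal_from_precision(mu0, [[nu0 * v for v in row] for row in lam], rng)
    etas = [normal_from_precision(theta, lam, rng) for _ in range(n_clusters)]
    return lam, theta, etas


lam, theta, etas = sample_cluster_parameters(K=3, n_clusters=4, seed=1)
print(len(etas), len(etas[0]))
```

Working with the precision matrix directly avoids an explicit inversion: if $LL^\top = \Lambda$, then $y = L^{-\top} z$ has covariance $\Lambda^{-1}$.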
4 Bayesian Nonparametric Covariate Driven Clustering
In this section, we introduce the Product Partition Model (PPM) and describe its extension to incorporate the
Normalized Generalized Gamma process (NGGP). We follow the approach of Müller et al. (2011) to integrate predictive
biomarkers into the model, making the random partition dependent on predictive markers. We devise a covariate-dependent
prior on the random partition that enables predictive markers to drive the clustering process. We thereby induce
clusters of observations that are homogeneous in terms of predictive biomarkers. The resulting model implies
independence across clusters and exchangeability only within clusters. The joint evaluation of prognostic and
predictive covariates guides the optimal treatment selection, our main inferential goal. Still, only the predictive
markers identify patients likely to benefit from a particular therapy. In this way, we may quantify the extent of
benefit offered by a specific treatment to groups of patients characterized by similar values of the predictive
markers. We denote by $P_n := \{S_1, \dots, S_{C_n}\}$ the partition of the data label set $\{1, \dots, n\}$ into $C_n$
subsets $S_j$, for $j = 1, \dots, C_n$, with $n_j = |S_j|$ being the cardinality of cluster $j$. In the seminal paper by
Hartigan (1990), the prior on $P_n$ is assigned by letting
\[
p(P_n) = V_{n,C_n} \prod_{j=1}^{C_n} \rho(S_j), \tag{4.1}
\]
where $\rho(\cdot)$ is referred to as the cohesion function and quantifies the unnormalized probability of each cluster
(Müller et al., 2011). Moreover, $V_{n,C_n}$ is a normalizing constant ensuring that the prior sums to one over the
space of all partitions of the integers $\{1, \dots, n\}$. If $\rho(S_j)$ is only a function of $n_j = |S_j|$, then the
resulting model for $P_n$ is invariant under permutations of the labels of the set of integers $\{1, \dots, n\}$. Under
this assumption, the resulting model for $P_n$ falls in the class of Gibbs-type priors (Gnedin and Pitman, 2006). In
this framework, the cohesion assumes the analytical expression $\rho(S_j) = (1-\sigma)_{n_j-1}$ with $\sigma < 1$, where
$(1-\sigma)_{n_j-1}$ is a rising factorial, defined as $(a)_n = a(a+1)\cdots(a+n-1)$, with $(a)_0 = 1$; $p(P_n)$ is called
the exchangeable partition probability function (eppf), and the normalizing constant $V_{n,C_n}$ must satisfy the
triangular recursion $V_{n,C_n} = (n - \sigma C_n) V_{n+1,C_n} + V_{n+1,C_n+1}$ for each $n > 1$ and each number of
clusters $1 \le C_n \le n$, with the proviso that $V_{1,1} = 1$. Note that, since $\rho(S_j)$ is an increasing function of
the cluster size $n_j$, heavily populated clusters are more likely. This leads to the rich-get-richer behaviour in the
clustering induced by the BNP prior. The connection between product partition models and Gibbs-type priors has
been deeply investigated since the seminal paper by Quintana and Iglesias (2003); see also De Blasi et al. (2013).
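
The quantities above can be checked numerically on a small example. The sketch below computes the rising factorial and the cohesion $(1-\sigma)_{n_j-1}$, then verifies the triangular recursion and the sum-to-one property of (4.1) by enumerating all partitions of $\{1, \dots, 5\}$. To obtain weights $V_{n,C_n}$ in closed form it assumes the two-parameter (Pitman-Yor) member of the Gibbs-type family, with a hypothetical parameter `mass`. This choice is made only for illustration and is not necessarily the one adopted in this paper. All function names are hypothetical.

```python
from math import isclose, prod


def rising_factorial(a, n):
    """(a)_n = a (a + 1) ... (a + n - 1), with (a)_0 = 1."""
    out = 1.0
    for i in range(n):
        out *= a + i
    return out


def cohesion(n_j, sigma):
    """Gibbs-type cohesion rho(S_j) = (1 - sigma)_{n_j - 1}.
    Adding one member to a cluster of size n_j multiplies it by (n_j - sigma)."""
    return rising_factorial(1.0 - sigma, n_j - 1)


def v_weight(n, k, sigma, mass):
    """ASSUMPTION: closed-form two-parameter (Pitman-Yor) weights,
    V_{n,k} = prod_{i=1}^{k-1} (mass + i * sigma) / (mass + 1)_{n-1}."""
    numerator = prod(mass + i * sigma for i in range(1, k))
    return numerator / rising_factorial(mass + 1.0, n - 1)


def set_partitions(labels):
    """Yield every partition of `labels` as a list of blocks (lists)."""
    if not labels:
        yield []
        return
    first, rest = labels[0], labels[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):  # add `first` to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part  # or open a new block


def eppf(partition, sigma, mass):
    """p(P_n) = V_{n,C_n} * prod_j rho(S_j), as in (4.1)."""
    n = sum(len(block) for block in partition)
    cohesions = prod(cohesion(len(block), sigma) for block in partition)
    return v_weight(n, len(partition), sigma, mass) * cohesions


def recursion_holds(sigma, mass, n_max=6):
    """Check V_{n,k} = (n - sigma * k) V_{n+1,k} + V_{n+1,k+1} on a grid."""
    return all(
        isclose(
            v_weight(n, k, sigma, mass),
            (n - sigma * k) * v_weight(n + 1, k, sigma, mass)
            + v_weight(n + 1, k + 1, sigma, mass),
        )
        for n in range(1, n_max + 1)
        for k in range(1, n + 1)
    )


sigma, mass = 0.3, 1.0  # hypothetical values, for illustration only
total = sum(eppf(p, sigma, mass) for p in set_partitions([1, 2, 3, 4, 5]))
print(recursion_holds(sigma, mass), isclose(total, 1.0))
```

Because `eppf` depends on a partition only through its block sizes, relabelling the units leaves the probability unchanged, which is the permutation invariance described above.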
In this paper we choose $\sigma \ge 0$ and introduce a new parameter $\kappa > 0$ such that
\[
V_{n,C_n} = \frac{1}{\Gamma(n)} \int_0^\infty u^{n-1} \exp\left\{-\frac{1}{\sigma}\left[(\kappa+u)^{\sigma} - \kappa^{\sigma}\right]\right\} (\kappa+u)^{-n+\sigma C_n} \, du.
\]
In this way, the law of $P_n$ coincides with the one induced by the Normalized generalized gamma process (NGGP,
Lijoi et al., 2007). The NGGP encompasses the well-known Dirichlet process (DP) when $\sigma = 0$. In particular,
Lijoi et al. (2007) highlighted the role of $\sigma$ in the predictive mechanism of an NGGP:
\[
p(\tilde{\eta} \in \cdot \mid P_n, \eta_1^\star, \dots, \eta_{C_n}^\star) = \frac{V_{n+1,C_n+1}}{V_{n,C_n}} G_0(\cdot) + \frac{V_{n+1,C_n}}{V_{n,C_n}} \sum_{j=1}^{C_n} (n_j - \sigma)\, \delta_{\eta_j^\star}.
\]
The above formula describes the rule used to assign a new observation to a cluster, where the summand on the left
represents the probability of forming a new group, and the one on the right represents the probability of being
assigned to an already observed