Bayesian mixture models (in)consistency
for the number of clusters
Louise Alamichel1,†, Daria Bystrova1,†, Julyan Arbel1 & Guillaume Kon Kam King2
1Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
{louise.alamichel, daria.bystrova, julyan.arbel}@inria.fr
2Université Paris-Saclay, INRAE, MaIAGE, 78350 Jouy-en-Josas, France
guillaume.kon-kam-king@inrae.fr
Abstract
Bayesian nonparametric mixture models are common for modeling complex data. While these models are well-suited for density estimation, recent results proved posterior inconsistency of the number of clusters when the true number of components is finite, for the Dirichlet process and Pitman–Yor process mixture models. We extend these results to additional Bayesian nonparametric priors such as Gibbs-type processes and finite-dimensional representations thereof. The latter include the Dirichlet multinomial process and the recently proposed Pitman–Yor and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a post-processing algorithm introduced for the Dirichlet process can be extended to more general models and provides a consistent method to estimate the number of components.
Keywords: Clustering; Finite mixtures; Finite-dimensional BNP representations; Gibbs-type process
1 Introduction
Motivation. Mixture models appeared as a natural way to model heterogeneous data,
where observations may come from different populations. Complex probability distributions
can be broken down into a combination of simpler models for each population. Mixture
models are used for density estimation, model-based clustering (Fraley and Raftery, 2002) and regression (Müller et al., 1996). Due to their flexibility and simplicity, they are widely used in many applications such as healthcare (Ramírez et al., 2019, Ullah and Mengersen, 2019), econometrics (Frühwirth-Schnatter et al., 2012), ecology (Attorre et al., 2020) and many others (further examples in Frühwirth-Schnatter et al., 2019).

This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) funded by the French program Investissements d'Avenir.
† Equal contribution.

arXiv:2210.14201v3 [math.ST] 30 May 2024
In a mixture model, data X_{1:n} = (X_1, . . . , X_n), X_i ∈ 𝒳 ⊆ ℝ^p, are modeled as coming from a K-component mixture distribution. If the mixing measure G is discrete, i.e. G = ∑_{i=1}^K w_i δ_{θ_i} with positive weights w_i summing to one and atoms θ_i, then the mixture density is

    f_X(x) = ∫ f(x | θ) G(dθ) = ∑_{k=1}^K w_k f(x | θ_k),    (1)

where f(· | θ) represents a component-specific kernel density parameterized by θ. We denote the set of parameters by θ_{1:K} = (θ_1, . . . , θ_K), where each θ_k ∈ ℝ^d, k = 1, . . . , K. Model (1) can be equivalently represented through latent allocation variables z_{1:n} = (z_1, . . . , z_n), z_i ∈ {1, . . . , K}. Each z_i denotes the component from which observation X_i comes: p(X_i | θ_k) = p(X_i | z_i = k), with w_k = P(z_i = k). Allocation variables z_i define a clustering such that X_i and X_j belong to the same cluster if z_i = z_j. Moreover, z_1, . . . , z_n define a partition A = (A_1, . . . , A_{K_n}) of {1, . . . , n}, where K_n denotes the number of clusters.
It is important to distinguish between the number of components K, which is a model parameter, and the number of clusters K_n, which is the number of components from which we observed at least one data point in a dataset of size n (Frühwirth-Schnatter et al., 2021, Argiento and De Iorio, 2022, Greve et al., 2022). For a data-generating process with K_0 components, inference on K is typically done by considering the number of clusters K_n, and the present article investigates to what extent this is warranted.
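The distinction can be made concrete by simulating the generative scheme above (a hypothetical example: Gaussian kernels with unit variance and weights chosen purely for illustration). When some weights are small, the number of observed clusters K_n can fall short of the number of components K:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite mixture: K = 5 components, two of them with small weight.
K, n = 5, 50
w = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
theta = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])  # component means (illustrative)

# Latent allocations z_i ~ Categorical(w), then X_i | z_i = k ~ N(theta_k, 1).
z = rng.choice(K, size=n, p=w)
x = rng.normal(theta[z], 1.0)

# K_n counts the components that received at least one observation.
K_n = len(np.unique(z))
print(K, K_n)  # K_n <= K; small components may remain empty at moderate n
```

Here K_n is the number of distinct values among the allocations z_1, . . . , z_n, matching the definition of clusters above.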
Although mixture models are widely used in practice, they remain the focus of active theoretical investigations, owing to multiple challenges related to the estimation of mixture model parameters. These challenges stem from identifiability problems (Frühwirth-Schnatter, 2006), label switching (Celeux et al., 2000), and computational complexity due to the large dimension of the parameter space.
Another critical question, which is the main focus of this article, regards the number
of components and clusters, and whether it is possible to infer them from the data. This
question is even more crucial when the aim of inference is clustering. The typical approach
to estimating the number of components in a mixture is to fit models of varying complexity and perform model selection using a classic criterion such as the Bayesian Information
Criterion (BIC), the Akaike Information Criterion (AIC), etc. This approach is not entirely
satisfactory in general, because of the need to fit many separate models and the difficulty of
performing a reliable model selection. Therefore, several methods that bypass the need to
fit multiple models have been proposed. They define a single flexible model accommodating
various possibilities for the number of components: mixtures of finite mixtures, Bayesian
nonparametric mixtures, and overfitted mixtures. These methods have been prominently
proposed in the Bayesian framework, where the specification of prior information is a pow-
erful and versatile method to avoid overfitting by unduly complex mixture models.
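As a concrete illustration of the classic fit-and-select approach (not of the single-model methods discussed in this article), the sketch below fits 1-D Gaussian mixtures of increasing complexity with a minimal EM algorithm and selects K by BIC. The data, the quantile initialization, and the candidate range are all illustrative assumptions.

```python
import numpy as np

def gmm_loglik(x, K, n_iter=200):
    """Fit a 1-D Gaussian mixture with K components by EM and
    return the final log-likelihood (quantile initialization)."""
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means
    sigma = np.full(K, x.std())
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k N(x_i | mu_k, sigma_k^2).
        logp = np.log(w) - np.log(sigma) - 0.5 * ((x[:, None] - mu) / sigma) ** 2
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, standard deviations.
        nk = np.maximum(r.sum(axis=0), 1e-12)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.maximum(
            np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-3)
    dens = (w / (np.sqrt(2 * np.pi) * sigma)
            * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)).sum(axis=1)
    return np.log(dens).sum()

rng = np.random.default_rng(1)
# Illustrative data: three well-separated Gaussian components.
x = np.concatenate([rng.normal(m, 0.5, 200) for m in (-3.0, 0.0, 3.0)])

# BIC = -2 log L + p log n, with p = 3K - 1 free parameters here
# (K - 1 weights, K means, K standard deviations).
bic = {K: -2 * gmm_loglik(x, K) + (3 * K - 1) * np.log(len(x))
       for K in range(1, 6)}
K_hat = min(bic, key=bic.get)
print(K_hat)
```

Each candidate K requires a separate fit, which is precisely the cost the single-model approaches below are designed to avoid.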
Three types of discrete mixtures. Although we consider discrete mixing measures, G could be any probability distribution (for continuous mixing measures, see for instance Chapter 10 in Frühwirth-Schnatter et al., 2019). Depending on the specification of the mixing measure, there exist three main types of discrete mixture models: finite mixture models, where the number of components K is considered fixed (known, equal to K_0, or unknown); mixtures of finite mixtures (MFM), where K is random and follows some specific distribution; and infinite mixtures, where K is infinite. Under a Bayesian approach, the latter category is often referred to as Bayesian nonparametric (BNP) mixtures.

The specification of the number of components K differs across the three types of mixtures. When K is unknown, the Bayesian approach provides a natural way to define the number of components by considering it random and placing a prior on K, as is done for mixtures of finite mixtures. Inference methods for MFM were introduced by Nobile (1994) and Richardson and Green (1997).
Using Bayesian nonparametric (BNP) priors for mixture modeling is another way to bypass the choice of the number of components K. This is achieved by assuming an infinite number of components, which adapts the number of clusters found in a dataset to the structure of the data. The most commonly used BNP prior is the Dirichlet process, introduced by Ferguson (1973); the corresponding Dirichlet process mixture was first introduced by Lo (1984). The success of the Dirichlet process mixture is based on its ease of implementation and computational tractability. However, in some cases the Dirichlet process prior may be restrictive, so more flexible priors such as the Pitman–Yor process can be used. Gibbs-type processes, introduced by Gnedin and Pitman (2006), form an important general class of priors which contains the Dirichlet and Pitman–Yor processes and has flexible clustering properties while maintaining mathematical tractability; see Lijoi and Prünster (2010) and De Blasi et al. (2015) for a review. Compared to the Dirichlet process, Gibbs-type priors exhibit a predictive distribution which involves more information, namely both the sample size and the number of clusters (refer to the sufficientness postulates for Gibbs-type priors of Bacallado et al., 2017). The class of Gibbs-type priors encompasses BNP processes which are widely used, for instance in species sampling problems (Lijoi et al., 2007b, Favaro et al., 2009, 2012, Cesari et al., 2014, Arbel et al., 2017), survival analysis (Jara et al., 2010), network inference (Caron and Fox, 2017, Legramanti et al., 2022), linguistics (Teh and Jordan, 2010) and mixture modeling (Ishwaran and James, 2001, Lijoi et al., 2005a, 2007a). Miller and Harrison (2018), Frühwirth-Schnatter et al. (2021), and Argiento and De Iorio (2022) study the connection between mixtures of finite mixtures and BNP mixtures with Gibbs-type priors. A common approach to inferring the number of clusters in Bayesian nonparametric models is through the posterior distribution of the number of clusters.
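The predictive schemes underlying these priors can be sketched with a short simulation (illustrative only, not the article's construction): the Chinese restaurant process for the Dirichlet process, and its two-parameter generalization for the Pitman–Yor process, both special cases of Gibbs-type predictive rules. Each new observation joins an existing cluster with probability proportional to its discounted size, or opens a new cluster:

```python
import numpy as np

def crp_clusters(n, alpha, sigma=0.0, seed=0):
    """Sample the number of clusters K_n under the Pitman-Yor
    (alpha, sigma) predictive scheme; sigma = 0 recovers the
    Dirichlet process (Chinese restaurant process)."""
    rng = np.random.default_rng(seed)
    counts = []  # current cluster sizes
    for _ in range(n):
        # Existing cluster j has weight counts[j] - sigma;
        # a new cluster has weight alpha + sigma * (number of clusters).
        weights = np.array(counts + [0.0], dtype=float)
        weights[:-1] -= sigma
        weights[-1] = alpha + sigma * len(counts)
        weights /= weights.sum()
        j = rng.choice(len(weights), p=weights)
        if j == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[j] += 1
    return len(counts)

print(crp_clusters(1000, alpha=1.0))             # Dirichlet process
print(crp_clusters(1000, alpha=1.0, sigma=0.5))  # Pitman-Yor process
```

Under the Dirichlet process (σ = 0) the number of clusters grows like α log n, whereas for σ ∈ (0, 1) it grows like n^σ, illustrating the more flexible clustering behavior of Pitman–Yor and, more generally, Gibbs-type priors.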
Finally, finite mixture models are considered when K is assumed to be finite. We distinguish two cases, depending on whether the number of components is known or unknown. The case when the number of components is known, say K = K_0, is referred to as the exact-fitted setting. An appealing way to handle the other case (K_0 unknown) is to use a chosen upper bound on K_0, i.e. to take the number of components K such that K ≥ K_0, yielding the so-called overfitted mixture models. A classic overfitted mixture model is based on the Dirichlet multinomial process, which is a finite approximation of the Dirichlet process (see Ishwaran and Zarepour, 2002, for instance). Generalizations of the Dirichlet multinomial process were recently introduced by Lijoi et al. (2020a,b), which lead to more flexible overfitted mixture models.
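A brief sketch of the weights involved (α and K below are illustrative values): the Dirichlet multinomial process draws K component weights from a symmetric Dirichlet(α/K, . . . , α/K) distribution, which approximates a Dirichlet process with concentration α as K grows. Most of the K weights are then negligible, which is what makes such an overfitted mixture effectively parsimonious:

```python
import numpy as np

rng = np.random.default_rng(2)

# Symmetric Dirichlet(alpha/K) weights: finite approximation of a
# Dirichlet process with concentration alpha (illustrative values).
alpha, K = 1.0, 50
w = rng.dirichlet(np.full(K, alpha / K))

# The prior concentrates mass on a handful of components, so most of
# the K components in the overfitted mixture stay essentially empty.
print(np.round(np.sort(w)[::-1][:5], 3))
```

This sparsity of the prior weights is the mechanism behind the asymptotic emptying of extra components discussed below.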
Asymptotic properties of Bayesian mixtures. A minimal requirement for the reliability of a statistical procedure is that it should have reasonable asymptotic properties, such as consistency. This consideration also plays a role in the Bayesian framework, where asymptotic properties of the posterior distribution may be studied. In Table 1, we provide a summary of existing posterior consistency results for the three types of mixture models, when it is assumed that data come from a finite mixture and that the kernel f(· | θ) correctly describes the data generation process (the so-called well-specified setting). We denote by K_0 the true number of components, G_0 the true mixing measure, and f_0^X the true density, written in the form of (1). For finite-dimensional mixtures, Doob's theorem provides posterior consistency in density estimation (Nobile, 1994). However, this is a more delicate question for BNP mixtures. Extensive research in this area provides consistency results for density estimation under different assumptions for Bayesian nonparametric mixtures, such as for Dirichlet process mixtures (Ghosal et al., 1999, Ghosal and Van Der Vaart, 2007, Kruijer et al., 2010) and other types of BNP priors (Lijoi et al., 2005b). In the case of MFM, posterior consistency in the number of clusters as well as in the mixing measure follows from Doob's theorem and was proved by Nobile (1994). Recently, Miller (2023) provided a new proof with simplified assumptions.
For finite mixtures and Bayesian nonparametric mixtures, under some conditions of identifiability, kernel continuity, and uniformity of the prior, Nguyen (2013) proves consistency for mixing measures and provides corresponding contraction rates. These results only guarantee consistency for the mixing measure and do not imply consistency of the posterior distribution of the number of clusters. In contrast, posterior inconsistency of the number of clusters for Dirichlet process mixtures and Pitman–Yor process mixtures is proved by Miller and Harrison (2014). To the best of our knowledge, this result was not shown to hold for other classes of priors. We fill this gap and extend the results of Miller and Harrison (2014) to Gibbs-type process mixtures and some of their finite-dimensional representations.
Inconsistency results for mixture models do not impede real-world applications, but they suggest that inference about the number of clusters must be treated with care. On the positive side, in the case of overfitted mixtures, Rousseau and Mengersen (2011) establish that the weights of extra components vanish asymptotically under certain conditions. Additional results by Chambaz and Rousseau (2008) establish posterior consistency for the mode of the number of clusters. Guha et al. (2021) propose a post-processing procedure that allows consistent inference of the number of clusters in mixture models. They focus on Dirichlet process mixtures, and in this article we extend their procedure to Pitman–Yor process mixtures and overfitted mixtures. Another possible remedy for the inconsistency is to make the prior distribution on the mixing measure more flexible through a prior on its hyperparameters. For Dirichlet multinomial process mixtures, Malsiner-Walli et al. (2016) observe empirically that adding a prior on the α parameter helps center the posterior distribution of the number of clusters on the true value (see their Tables 1 and 2). A similar result is proved theoretically by Ascolani et al. (2022) for Dirichlet process mixtures under mild assumptions.
As a last remark, although we focus on the well-specified case, an important research line in mixture models revolves around misspecified-kernel mixture models, where data are generated from a finite mixture of distributions that do not belong to the kernel family f(· | θ). Miller and Dunson (2019) show how so-called coarsened posteriors allow performing inference on the number of components in MFMs with Gaussian kernels when data come from skew-normal mixtures. Cai et al. (2021) provide theoretical results for MFMs when the mixture component family is misspecified, showing that the posterior distribution of the number of components diverges. Misspecification is of course a topic of critical importance in practice; however, the well-specified case is challenging enough to warrant its own extensive investigation.
Contributions and outline. In this rather technical landscape, it can be difficult for
the non-specialist to keep track of theoretical advances in Bayesian mixture models. This
article aims to provide an accessible review of existing results, as well as the following novel