Bayesian mixture models (in)consistency
for the number of clusters
Louise Alamichel1,†, Daria Bystrova1,†, Julyan Arbel1 & Guillaume Kon Kam King2
1Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
{louise.alamichel, daria.bystrova, julyan.arbel}@inria.fr
2Université Paris-Saclay, INRAE, MaIAGE, 78350 Jouy-en-Josas, France
guillaume.kon-kam-king@inrae.fr
Abstract
Bayesian nonparametric mixture models are common for modeling complex data. While these models are well-suited for density estimation, recent results proved posterior inconsistency of the number of clusters when the true number of components is finite, for the Dirichlet process and Pitman–Yor process mixture models. We extend these results to additional Bayesian nonparametric priors such as Gibbs-type processes and finite-dimensional representations thereof. The latter include the Dirichlet multinomial process and the recently proposed Pitman–Yor and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a post-processing algorithm introduced for the Dirichlet process can be extended to more general models and provides a consistent method to estimate the number of components.
Keywords: Clustering; Finite mixtures; Finite-dimensional BNP representations; Gibbs-type process
1 Introduction
Motivation. Mixture models appeared as a natural way to model heterogeneous data,
where observations may come from different populations. Complex probability distributions
can be broken down into a combination of simpler models for each population. Mixture
models are used for density estimation, model-based clustering (Fraley and Raftery, 2002) and regression (Müller et al., 1996). Due to their flexibility and simplicity, they are widely used in many applications such as healthcare (Ramírez et al., 2019, Ullah and Mengersen, 2019), econometrics (Frühwirth-Schnatter et al., 2012), ecology (Attorre et al., 2020) and many others (further examples in Frühwirth-Schnatter et al., 2019).

This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) funded by the French program Investissements d'Avenir.
† Equal contribution.

arXiv:2210.14201v3 [math.ST] 30 May 2024
In a mixture model, data X_{1:n} = (X_1, . . . , X_n), X_i ∈ 𝒳 ⊆ ℝ^p, are modeled as coming from a K-component mixture distribution. If the mixing measure G is discrete, i.e. G = ∑_{i=1}^K w_i δ_{θ_i} with positive weights w_i summing to one and atoms θ_i, then the mixture density is

    f_X(x) = ∫ f(x | θ) G(dθ) = ∑_{k=1}^K w_k f(x | θ_k),    (1)

where f(· | θ) represents a component-specific kernel density parameterized by θ. We denote the set of parameters by θ_{1:K} = (θ_1, . . . , θ_K), where each θ_k ∈ ℝ^d, k = 1, . . . , K. Model (1) can be equivalently represented through latent allocation variables z_{1:n} = (z_1, . . . , z_n), z_i ∈ {1, . . . , K}. Each z_i denotes the component from which observation X_i comes: p(X_i | θ_k) = p(X_i | z_i = k), with w_k = P(z_i = k). Allocation variables z_i define a clustering such that X_i and X_j belong to the same cluster if z_i = z_j. Moreover, z_1, . . . , z_n define a partition A = (A_1, . . . , A_{K_n}) of {1, . . . , n}, where K_n denotes the number of clusters.
It is important to distinguish between the number of components K, which is a model parameter, and the number of clusters K_n, which is the number of components from which we observed at least one data point in a dataset of size n (Frühwirth-Schnatter et al., 2021, Argiento and De Iorio, 2022, Greve et al., 2022). For a data-generating process with K_0 components, inference on K is typically done by considering the number of clusters K_n, and the present article investigates to what extent this is warranted.
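The distinction can be made concrete by simulating the generative scheme above (a hypothetical example: Gaussian kernels with unit variance and weights chosen purely for illustration). When some weights are small, the number of observed clusters K_n can fall short of the number of components K:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite mixture: K = 5 components, two of them with small weight.
K, n = 5, 50
w = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
theta = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])  # component means (illustrative)

# Latent allocations z_i ~ Categorical(w), then X_i | z_i = k ~ N(theta_k, 1).
z = rng.choice(K, size=n, p=w)
x = rng.normal(theta[z], 1.0)

# K_n counts the components that received at least one observation.
K_n = len(np.unique(z))
print(K, K_n)  # K_n <= K; small components may remain empty at moderate n
```

Here K_n is the number of distinct values among the allocations z_1, . . . , z_n, matching the definition of clusters above.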
Although mixture models are widely used in practice, they remain the focus of active theoretical investigations, owing to multiple challenges related to the estimation of mixture model parameters. These challenges stem from identifiability problems (Frühwirth-Schnatter, 2006), label switching (Celeux et al., 2000), and computational complexity due to the large dimension of the parameter space.
Another critical question, which is the main focus of this article, regards the number
of components and clusters, and whether it is possible to infer them from the data. This
question is even more crucial when the aim of inference is clustering. The typical approach
to estimating the number of components in a mixture is to fit models of varying complexity and perform model selection using a classic criterion such as the Bayesian Information
Criterion (BIC), the Akaike Information Criterion (AIC), etc. This approach is not entirely
satisfactory in general, because of the need to fit many separate models and the difficulty of
performing a reliable model selection. Therefore, several methods that bypass the need to
fit multiple models have been proposed. They define a single flexible model accommodating
various possibilities for the number of components: mixtures of finite mixtures, Bayesian
nonparametric mixtures, and overfitted mixtures. These methods have been prominently
proposed in the Bayesian framework, where the specification of prior information is a pow-
erful and versatile method to avoid overfitting by unduly complex mixture models.
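As a concrete illustration of the classic fit-and-select approach (not of the single-model methods discussed in this article), the sketch below fits 1-D Gaussian mixtures of increasing complexity with a minimal EM algorithm and selects K by BIC. The data, the quantile initialization, and the candidate range are all illustrative assumptions.

```python
import numpy as np

def gmm_loglik(x, K, n_iter=200):
    """Fit a 1-D Gaussian mixture with K components by EM and
    return the final log-likelihood (quantile initialization)."""
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means
    sigma = np.full(K, x.std())
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k N(x_i | mu_k, sigma_k^2).
        logp = np.log(w) - np.log(sigma) - 0.5 * ((x[:, None] - mu) / sigma) ** 2
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, standard deviations.
        nk = np.maximum(r.sum(axis=0), 1e-12)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.maximum(
            np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk), 1e-3)
    dens = (w / (np.sqrt(2 * np.pi) * sigma)
            * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)).sum(axis=1)
    return np.log(dens).sum()

rng = np.random.default_rng(1)
# Illustrative data: three well-separated Gaussian components.
x = np.concatenate([rng.normal(m, 0.5, 200) for m in (-3.0, 0.0, 3.0)])

# BIC = -2 log L + p log n, with p = 3K - 1 free parameters here
# (K - 1 weights, K means, K standard deviations).
bic = {K: -2 * gmm_loglik(x, K) + (3 * K - 1) * np.log(len(x))
       for K in range(1, 6)}
K_hat = min(bic, key=bic.get)
print(K_hat)
```

Each candidate K requires a separate fit, which is precisely the cost the single-model approaches below are designed to avoid.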
Three types of discrete mixtures. Although we consider discrete mixing measures, G could be any probability distribution (for continuous mixing measures, see for instance Chapter 10 in Frühwirth-Schnatter et al., 2019). Depending on the specification of the mixing measure, there exist three main types of discrete mixture models: finite mixture models, where the number of components K is considered fixed (known, equal to K_0, or unknown); mixtures of finite mixtures (MFM), where K is random and follows some specific distribution; and infinite mixtures, where K is infinite. Under a Bayesian approach, the latter category is often referred to as Bayesian nonparametric (BNP) mixtures.

The specification of the number of components K differs across the three types of mixtures. When K is unknown, the Bayesian approach provides a natural way to define the number of components by considering it random and placing a prior on K, as is done for mixtures of finite mixtures. Inference methods for MFM were introduced by Nobile (1994) and Richardson and Green (1997).
Using Bayesian nonparametric (BNP) priors for mixture modeling is another way to bypass the choice of the number of components K. This is achieved by assuming an infinite number of components, which adapts the number of clusters found in a dataset to the structure of the data. The most commonly used BNP prior is the Dirichlet process, introduced by Ferguson (1973); the corresponding Dirichlet process mixture was first introduced by Lo (1984). The success of the Dirichlet process mixture is based on its ease of implementation and computational tractability. However, in some cases the Dirichlet process prior may be restrictive, so more flexible priors such as the Pitman–Yor process can be used. Gibbs-type processes, introduced by Gnedin and Pitman (2006), form an important general class of priors which contains the Dirichlet and Pitman–Yor processes and has flexible clustering properties while maintaining mathematical tractability; see Lijoi and Prünster (2010) and De Blasi et al. (2015) for a review. Compared to the Dirichlet process, Gibbs-type priors exhibit a predictive distribution which involves more information, namely both the sample size and the number of clusters (refer to the sufficientness postulates for Gibbs-type priors of Bacallado et al., 2017). The class of Gibbs-type priors encompasses BNP processes which are widely used, for instance in species sampling problems (Lijoi et al., 2007b, Favaro et al., 2009, 2012, Cesari et al., 2014, Arbel et al., 2017), survival analysis (Jara et al., 2010), network inference (Caron and Fox, 2017, Legramanti et al., 2022), linguistics (Teh and Jordan, 2010) and mixture modeling (Ishwaran and James, 2001, Lijoi et al., 2005a, 2007a). Miller and Harrison (2018), Frühwirth-Schnatter et al. (2021), and Argiento and De Iorio (2022) study the connection between mixtures of finite mixtures and BNP mixtures with Gibbs-type priors. A common approach to inferring the number of clusters in Bayesian nonparametric models is through the posterior distribution of the number of clusters.
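The predictive schemes underlying these priors can be sketched with a short simulation (illustrative only, not the article's construction): the Chinese restaurant process for the Dirichlet process, and its two-parameter generalization for the Pitman–Yor process, both special cases of Gibbs-type predictive rules. Each new observation joins an existing cluster with probability proportional to its discounted size, or opens a new cluster:

```python
import numpy as np

def crp_clusters(n, alpha, sigma=0.0, seed=0):
    """Sample the number of clusters K_n under the Pitman-Yor
    (alpha, sigma) predictive scheme; sigma = 0 recovers the
    Dirichlet process (Chinese restaurant process)."""
    rng = np.random.default_rng(seed)
    counts = []  # current cluster sizes
    for _ in range(n):
        # Existing cluster j has weight counts[j] - sigma;
        # a new cluster has weight alpha + sigma * (number of clusters).
        weights = np.array(counts + [0.0], dtype=float)
        weights[:-1] -= sigma
        weights[-1] = alpha + sigma * len(counts)
        weights /= weights.sum()
        j = rng.choice(len(weights), p=weights)
        if j == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[j] += 1
    return len(counts)

print(crp_clusters(1000, alpha=1.0))             # Dirichlet process
print(crp_clusters(1000, alpha=1.0, sigma=0.5))  # Pitman-Yor process
```

Under the Dirichlet process (σ = 0) the number of clusters grows like α log n, whereas for σ ∈ (0, 1) it grows like n^σ, illustrating the more flexible clustering behavior of Pitman–Yor and, more generally, Gibbs-type priors.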
Finally, finite mixture models are considered when K is assumed to be finite. We distinguish two cases, depending on whether the number of components is known or unknown. The case when the number of components is known, say K = K_0, is referred to as the exact-fitted setting. An appealing way to handle the other case (K_0 unknown) is to use a chosen upper bound on K_0, i.e. to take the number of components K such that K ≥ K_0, yielding the so-called overfitted mixture models. A classic overfitted mixture model is based on the Dirichlet multinomial process, which is a finite approximation of the Dirichlet process (see Ishwaran and Zarepour, 2002, for instance). Generalizations of the Dirichlet multinomial process were recently introduced by Lijoi et al. (2020a,b), which lead to more flexible overfitted mixture models.
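A brief sketch of the weights involved (α and K below are illustrative values): the Dirichlet multinomial process draws K component weights from a symmetric Dirichlet(α/K, . . . , α/K) distribution, which approximates a Dirichlet process with concentration α as K grows. Most of the K weights are then negligible, which is what makes such an overfitted mixture effectively parsimonious:

```python
import numpy as np

rng = np.random.default_rng(2)

# Symmetric Dirichlet(alpha/K) weights: finite approximation of a
# Dirichlet process with concentration alpha (illustrative values).
alpha, K = 1.0, 50
w = rng.dirichlet(np.full(K, alpha / K))

# The prior concentrates mass on a handful of components, so most of
# the K components in the overfitted mixture stay essentially empty.
print(np.round(np.sort(w)[::-1][:5], 3))
```

This sparsity of the prior weights is the mechanism behind the asymptotic emptying of extra components discussed below.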
Asymptotic properties of Bayesian mixtures. A minimal requirement for the reliability of a statistical procedure is that it should have reasonable asymptotic properties, such as consistency. This consideration also plays a role in the Bayesian framework, where asymptotic properties of the posterior distribution may be studied. In Table 1, we provide a summary of existing posterior consistency results for the three types of mixture models, when it is assumed that data come from a finite mixture and that the kernel f(· | θ) correctly describes the data generation process (the so-called well-specified setting). We denote by K_0 the true number of components, G_0 the true mixing measure, and f_0^X the true density, written in the form of (1). For finite-dimensional mixtures, Doob's theorem provides posterior consistency in density estimation (Nobile, 1994). However, this is a more delicate question for BNP mixtures. Extensive research in this area provides consistency results for density estimation under different assumptions for Bayesian nonparametric mixtures, such as for Dirichlet process mixtures (Ghosal et al., 1999, Ghosal and Van Der Vaart, 2007, Kruijer et al., 2010) and other types of BNP priors (Lijoi et al., 2005b). In the case of MFM, posterior consistency in the number of clusters as well as in the mixing measure follows from Doob's theorem and was proved by Nobile (1994). Recently, Miller (2023) provided a new proof with simplified assumptions.
For finite mixtures and Bayesian nonparametric mixtures, under some conditions of identifiability, kernel continuity, and uniformity of the prior, Nguyen (2013) proves consistency for mixing measures and provides corresponding contraction rates. These results only guarantee consistency for the mixing measure and do not imply consistency of the posterior distribution of the number of clusters. In contrast, posterior inconsistency of the number of clusters for Dirichlet process mixtures and Pitman–Yor process mixtures is proved by Miller and Harrison (2014). To the best of our knowledge, this result was not shown to hold for other classes of priors. We fill this gap and extend the results of Miller and Harrison (2014) to Gibbs-type process mixtures and some of their finite-dimensional representations.
Inconsistency results for mixture models do not impede real-world applications, but they suggest that inference about the number of clusters must be treated with care. On the positive side, in the case of overfitted mixtures, Rousseau and Mengersen (2011) establish that the weights of extra components vanish asymptotically under certain conditions. Additional results by Chambaz and Rousseau (2008) establish posterior consistency for the mode of the number of clusters. Guha et al. (2021) propose a post-processing procedure that allows consistent inference of the number of clusters in mixture models. They focus on Dirichlet process mixtures, and in this article we extend their procedure to Pitman–Yor process mixtures and overfitted mixtures. Another possible remedy for the inconsistency is to make the prior distribution on the mixing measure more flexible through a prior on its hyperparameters. For Dirichlet multinomial process mixtures, Malsiner-Walli et al. (2016) observe empirically that adding a prior on the α parameter helps center the posterior distribution of the number of clusters on the true value (see their Tables 1 and 2). A similar result is proved theoretically by Ascolani et al. (2022) for Dirichlet process mixtures under mild assumptions.
As a last remark, although we focus on the well-specified case, an important research line in mixture models revolves around misspecified-kernel mixture models, where data are generated from a finite mixture of distributions that do not belong to the kernel family f(· | θ). Miller and Dunson (2019) show how so-called coarsened posteriors allow performing inference on the number of components in MFMs with Gaussian kernels when data come from skew-normal mixtures. Cai et al. (2021) provide theoretical results for MFMs when the mixture component family is misspecified, showing that the posterior distribution of the number of components diverges. Misspecification is of course a topic of critical importance in practice; however, the well-specified case is challenging enough to warrant its own extensive investigation.
Contributions and outline. In this rather technical landscape, it can be difficult for
the non-specialist to keep track of theoretical advances in Bayesian mixture models. This
article aims to provide an accessible review of existing results, as well as the following novel