BAYESIAN REPULSIVE MIXTURE MODELING WITH MATÉRN POINT PROCESSES

A PREPRINT
Hanxi Sun
Department of Statistics
Purdue University
West Lafayette, IN 47906
hanxi-sun@purdue.edu
Boqian Zhang
Department of Statistics
Purdue University
West Lafayette, IN 47906
zhan1977@purdue.edu
Vinayak Rao
Department of Statistics
Purdue University
West Lafayette, IN 47906
varao@purdue.edu
ABSTRACT
Mixture models are a standard tool in statistical analysis, widely used for density modeling and model-based clustering. Current approaches typically model the parameters of the mixture components as independent variables. This can result in overlapping or poorly separated clusters when either the number of clusters or the form of the mixture components is misspecified. Such model misspecification can undermine the interpretability and simplicity of these mixture models. To address this problem, we propose a Bayesian mixture model with repulsion between mixture components. The repulsion is induced by a generalized Matérn type-III repulsive point process model, obtained through a dependent sequential thinning scheme on a primary Poisson point process. We derive a novel and efficient Gibbs sampler for posterior inference, and demonstrate the utility of the proposed method on a number of synthetic and real-world problems.
Keywords Poisson process · Thinning · Data Augmentation · Parsimony
1 Introduction
As statistics and machine learning methods find wide application in real-world problems, practitioners are increasingly seeking to balance statistical fidelity and predictive accuracy with interpretability, parsimony and fairness. Popular instances of these trade-offs include introducing smoothness, sparsity or low-dimensional structure into statistical models. In this paper, we focus on interpretability and diversity, through the use of repulsive priors in mixture modeling applications. Mixture models are a powerful and flexible class of models, capable of approximating increasingly complex distributions as the number of mixture components increases. Such models are useful both in density modeling applications as well as in clustering applications [McLachlan and Basford, 1988, Banfield and Raftery, 1993, Bensmail et al., 1997], with goals for the latter typically including data exploration, visualization and summarization. Problems where they have been applied include topic modeling [Blei et al., 2003], genetics [Pagel and Meade, 2004, McLachlan et al., 2002] and computer vision [Friedman and Russell, 1997], among many others. However, the flexibility of mixture models often comes at the cost of interpretability and parsimony. For computational tractability, the parameters of the mixture components are typically modeled as independent and identically distributed draws from some 'base distribution'. This can result in overlapping clusters: unless the clusters are very widely separated, the posterior will typically assign probability to multiple clusters in some neighborhood. The nearly identical locations of these clusters lead to redundancy and a lack of interpretability. Since mixture models are typically composed of simple parametric components, even if the data exhibits clear clustered structure, a slight deviation of individual clusters from the parametric form will again result in overlapping components.

One approach towards addressing this problem is to control the number of mixture components through an appropriate prior. However, trying to induce interpretability in this indirect fashion can make model specification quite challenging, especially in nonparametric applications where the number of components depends on the dataset size. Further, this
approach is still sensitive to any misspecification of the form of the individual components. Another approach is to use more flexible (e.g. nonparametric) densities for each component of a mixture model [Gassiat, 2017], though once again this raises problems with model specification, identifiability and computation. A more modern approach is to directly address the problem of overlapping clusters, enforcing diversity through repulsive priors. Here, rather than being sampled independently from the base measure, component locations are jointly sampled from a prior distribution that penalizes realizations where components are situated too close to each other. Such priors typically draw from the point process literature, examples including Gibbs point processes [Stoyan et al., 1987] and determinantal point processes [Hough et al., 2006, Lavancier et al., 2015]. Mixture models built on such priors have been shown to provide simpler, clearer and more interpretable results, often without too much loss of predictive performance [Petralia et al., 2012, Xu et al., 2016, Bianchini et al., 2018]. Nevertheless, they present computational challenges, since the repulsive models often involve normalization constants that are intractable to evaluate.
In this work, we propose a new class of repulsive priors based on the Matérn type-III point process. Matérn point processes are a class of repulsive point processes first studied in Matérn [1960, 2013]. More recently, Rao et al. [2017] developed a simple and efficient Markov chain Monte Carlo (MCMC) sampling algorithm for a generalized Matérn type-III process (see section 2). In this paper, we bring this process to the setting of mixture models, using it as a repulsive prior over the number of components and their locations. Treating the Matérn realization as a latent, rather than a fully observed, point process raises computational challenges that the algorithm from Rao et al. [2017] does not handle. We develop an efficient MCMC sampler for our model and demonstrate the practicality and flexibility of our proposed repulsive mixture model on a variety of datasets. Our proposed algorithm is also useful in Matérn point process applications with missing observations, as well as for mixture models without repulsion, as an alternative to often hard-to-tune reversible jump MCMC methods [Richardson and Green, 1997] to sample the unknown number of components.
We organize this paper as follows. Section 2 reviews the generalized Matérn type-III point process, while sections 3 and 4 outline our proposed Matérn Repulsive Mixture Model (MRMM) and our novel MCMC algorithm. Section 5 discusses related work on repulsive mixture models, and we apply our model to a number of datasets in section 6.
2 Matérn repulsive point processes
The Poisson process [Kingman, 1992] is a completely random point process, where events in disjoint sets are independent of each other. To incorporate repulsion between events, Matérn [1960, 2013] introduced three spatial point process models that build on the Poisson process. The three models, called the Matérn hardcore point processes of type I, II and III, only allow point process realizations with pairs of events separated by at least some fixed distance η, where η is a parameter of the model. The three models are constructed by applying different dependent thinning schemes on a primary homogeneous Poisson point process. Despite being theoretically more challenging than the other two processes, the type-III process has the most natural thinning mechanism, and supports higher densities of points. Rao et al. [2017] showed how this can easily be generalized to include probabilistic thinning and spatial inhomogeneity. Furthermore, Rao et al. [2017] showed that posterior inference for a completely observed type-III process can be carried out in a relatively straightforward manner. These advantages make the generalized Matérn type-III process superior to the other two as a prior for mixture models. For simplicity, we will refer to this process as the Matérn process in the rest of this paper.
Formally, the Matérn process is a finite point process defined on a space $\Theta$, parameterized by a thinning kernel $K_\eta : \Theta \times \Theta \to [0,1]$ and a nonnegative intensity function $\lambda_\Theta : \Theta \to [0,\infty)$. We will find it convenient to decompose the function $\lambda_\Theta(\theta)$ as $\lambda_\Theta(\theta) = \bar\lambda \cdot p_\Theta(\theta)$, for a finite normalizing constant $\bar\lambda > 0$ and some probability density $p_\Theta(\theta)$ on $\Theta$. Simulating this process proceeds in four steps: (1) Simulate the primary process $F_\Theta = \{\theta_1, \ldots, \theta_{|F_\Theta|}\}$ from a Poisson process with intensity $\lambda_\Theta(\cdot)$ on $\Theta$. (2) Assign each event $\theta_j$ in $F_\Theta$ an independent random birth-time drawn uniformly on the interval $\mathcal{T} = [0,1]$. (3) Sequentially visit events in the primary process according to their birth-times (from the oldest to the youngest) and attempt to thin (delete) them. Specifically, at step $j$, the $j$th oldest event $(\theta, t)$ is thinned by each surviving older primary event $(\theta', t')$, $t' < t$, with probability $K_\eta(\theta, \theta')$. (4) Write $G_\Theta$ and $\tilde G_\Theta$ for the elements of $F_\Theta$ that survive and are thinned in the previous step, respectively. The set $G_\Theta$ forms the Matérn process realization.
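The four steps above translate directly into a short simulation routine. The following is a minimal sketch (not the authors' code), assuming a one-dimensional space, a uniform density $p_\Theta$ on $[0, 10]$, and a hardcore kernel; all function names and numeric values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_matern_iii(lam_bar, eta, low=0.0, high=10.0):
    """Simulate a one-dimensional hardcore Matern type-III process on [low, high]."""
    # (1) Primary Poisson process F_Theta: total mass lam_bar, locations ~ p_Theta
    #     (here p_Theta is taken to be uniform on [low, high]).
    n = rng.poisson(lam_bar)
    locations = rng.uniform(low, high, size=n)
    # (2) Independent uniform birth-times on T = [0, 1].
    births = rng.uniform(0.0, 1.0, size=n)
    # (3) Visit events from oldest to youngest; an event is thinned if it lies
    #     within distance eta of an already-surviving (older) event.
    survivors, thinned = [], []
    for idx in np.argsort(births):
        theta = locations[idx]
        if any(abs(theta - s) < eta for s in survivors):
            thinned.append(theta)
        else:
            survivors.append(theta)
    # (4) The surviving locations form the Matern realization G_Theta.
    return np.array(survivors), np.array(thinned)

G_theta, G_theta_thinned = simulate_matern_iii(lam_bar=20.0, eta=0.8)
print(len(G_theta), "surviving events;", len(G_theta_thinned), "thinned events")
```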
For a hardcore Matérn process (figure 1), the thinning kernel satisfies $K_\eta(\theta, \theta_j) = 1_{\{\|\theta - \theta_j\| \le \eta\}}$, where $\eta$ is the thinning radius, so that thinning is deterministic: newer events within distance $\eta$ of a previously surviving event are thinned with probability 1. Other approaches are probabilistic thinning [Rao et al., 2017], where $K_\eta(\theta, \theta_j) = \eta_p \, 1_{\{\|\theta - \theta_j\| \le R\}}$ (with $\eta_p \in [0,1]$), or the smoother squared-exponential thinning, where $K_\eta(\theta, \theta_j) = \exp\!\left(-\|\theta - \theta_j\|^2 / (2\eta)\right)$.
Figure 1: The generative process of a one-dimensional Matérn process with a hardcore thinning kernel $K_\eta$. (1) A primary Poisson point process $F_\Theta$ with intensity $\lambda_\Theta(\theta)$ is simulated on $\Theta$. (2) Events in $F_\Theta$ are assigned random birth times uniformly from $\mathcal{T} = [0,1]$. (3) Events in the shadow (i.e. within horizontal distance $\eta$) of earlier surviving events are thinned. (4) Surviving events are projected onto $\Theta$ to form the Matérn realization $G_\Theta$.
Huber and Wolpert [2009] propose soft-core thinning, where each event $\theta_j$ has its own thinning radius $\eta_j$ drawn from some distribution, and $K_\eta(\theta, \theta_j) = 1_{\{\|\theta - \theta_j\| \le \eta_j\}}$.
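For concreteness, the thinning kernels above can be written as small functions. This is a sketch under the stated forms; the exact parameterization of the squared-exponential lengthscale is an assumption, and none of this is code from the paper.

```python
import numpy as np

def hardcore_kernel(theta, theta_j, eta):
    # Deterministic thinning: probability 1 within the radius eta, 0 outside.
    return 1.0 if np.linalg.norm(np.atleast_1d(theta - theta_j)) <= eta else 0.0

def probabilistic_kernel(theta, theta_j, R, eta_p):
    # Thin with probability eta_p within radius R [Rao et al., 2017].
    return eta_p if np.linalg.norm(np.atleast_1d(theta - theta_j)) <= R else 0.0

def squared_exponential_kernel(theta, theta_j, eta):
    # Smooth thinning probability decaying with squared distance.
    return float(np.exp(-np.linalg.norm(np.atleast_1d(theta - theta_j)) ** 2 / (2.0 * eta)))
```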
Observe that since each event $\theta_j$ has an independently and uniformly distributed birth-time $t_j$ associated with it, the set of pairs $\{(\theta_1, t_1), \ldots, (\theta_{|F_\Theta|}, t_{|F_\Theta|})\}$ is itself distributed as a Poisson process on $\Theta \times \mathcal{T}$, with intensity $\lambda(\theta, t) = \lambda_\Theta(\theta) 1_{[0,1]}(t)$. We write this extended primary process as $F$. Consistent with our use of $F_\Theta$ to represent the set of locations of each point in $F$, we will use $F_\mathcal{T}$ to represent the set of birth-times. Similarly, we will use $G$ for the extended Matérn events, and $G_\mathcal{T}$ for the associated birth-times (and $\tilde G$ and $\tilde G_\mathcal{T}$ for their thinned counterparts).
Following Rao et al. [2017], we will specify the thinning process through a shadow function $H_\eta : \Theta \times \mathcal{T} \to [0,1]$, parameterized by a possibly vector-valued $\eta$. This gives the probability that an event $(\theta, t) \in \Theta \times \mathcal{T}$ is thinned by a collection of events $G$ as
$$ H_\eta((\theta, t); G) = 1 - \prod_{g \in G} \left[1 - H_\eta((\theta, t); g)\right], \qquad (1) $$
where for a single event $g = (\theta', t')$, $H_\eta((\theta, t); (\theta', t')) = 1_{[t',1]}(t) \, K_\eta(\theta, \theta')$. Note that the $1_{[t',1]}(t)$ formalizes the fact that an event $(\theta, t)$ can only be thinned by earlier events. We will write $\text{MatérnThin}_K(F, \eta)$ for the sequential thinning process that assigns elements of $F$ to one of $G$ or $\tilde G$ according to thinning kernel $K_\eta$ (algorithm 1 in the appendix), and $\mathrm{Proj}_A(\cdot)$ for the operator that projects elements of a set onto some subspace $A$. The generative process of $G_\Theta \sim \text{MatérnProcess}_K(\lambda, \eta)$ can be written as
$$ F \mid \lambda \sim \mathrm{PoissonProcess}(\lambda(\cdot, \cdot)), $$
$$ G, \tilde G \mid F, K_\eta \sim \text{MatérnThin}_K(F, \eta), \qquad G_\Theta = \mathrm{Proj}_\Theta(G). $$
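To make the shadow function and the MatérnThin step concrete, here is a small sketch in Python; events are (theta, birth-time) pairs, `K` is a two-argument thinning kernel (e.g. one of the kernels above with its parameters fixed), and the function names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def shadow(event, G, K):
    """Probability that event = (theta, t) is thinned by the collection G (equation (1))."""
    theta, t = event
    survive_all = 1.0
    for theta_g, t_g in G:
        # Only strictly earlier events can thin (theta, t).
        h = K(theta, theta_g) if t_g < t else 0.0
        survive_all *= (1.0 - h)
    return 1.0 - survive_all

def matern_thin(F, K, rng):
    """Sequentially assign elements of F to the surviving set G or the thinned set G_tilde."""
    G, G_tilde = [], []
    for event in sorted(F, key=lambda e: e[1]):   # visit oldest birth-time first
        if rng.random() < shadow(event, G, K):
            G_tilde.append(event)
        else:
            G.append(event)
    return G, G_tilde
```

Note that the sequential loop only ever evaluates the shadow against events that have already survived, matching the type-III rule that thinned events never thin others.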
With a Matérn model of point pattern data, one seeks to infer the intensity function $\lambda_\Theta(\theta)$ and the thinning parameters $\eta$ from a realization $G_\Theta$. Observe that for the Matérn type-III process, an event can only be thinned by a surviving event, so that the probability of thinning at any location depends only on the set $G$. Rao et al. [2017] showed that in fact, conditioned on $G$, the events in $\tilde G$ are distributed as an inhomogeneous Poisson process with intensity $\lambda_\Theta(\theta) H_\eta((\theta, t); G)$. This allowed them to develop an efficient Gibbs sampler when Matérn events $G_\Theta$ are fully observed. This proceeds by sequentially updating the thinned events $\tilde G$, the Matérn birth times $G_\mathcal{T}$, the Poisson intensity $\lambda_\Theta$ and thinning kernel parameter $\eta$, each conditioned on the rest. The fact that the thinned events $\tilde G$ can be jointly sampled avoids the need for any birth-death steps in the MCMC algorithm, both simplifying the algorithm and improving its efficiency. We adapt this sampler for our MCMC algorithm, where the fact that the Matérn events are hidden or partially observed will create new challenges. In the next section, we first describe our model that uses the Matérn process to impose repulsion between components of a finite mixture model.
3 Matérn repulsive mixture model (MRMM)
We start with a primary Poisson process $F$ that includes mixture weights, defining it on an extended space $\Theta \times \mathcal{W} \times \mathcal{T}$ with $\mathcal{T} = [0,1]$, $\mathcal{W} = [0, \infty)$. Write its intensity function as
$$ \lambda(\theta, w, t) = \bar\lambda \cdot p_\Theta(\theta) \cdot p_W(w) \cdot 1_{[0,1]}(t). \qquad (2) $$
We will set $p_W(w) = \mathrm{Gamma}(w; \alpha, 1)$, while $p_\Theta(\theta)$ will be a problem-specific prior over component parameters. Unlike the Matérn process, we model $F$ as a Poisson process conditioned to have at least one event. Given $F$, we will produce a Matérn realization $G = \{(\theta_1, w_1, t_1), \ldots, (\theta_{|G|}, w_{|G|}, t_{|G|})\}$ by applying $\text{MatérnThin}_K(F, \eta)$ for some kernel $K$ on $\Theta$ with parameter $\eta$. Each element $(\theta, w, t) \in G$ will form a component of a mixture model, with $\theta$ and $w$ representing the parameter and unnormalized weight of that component; see also figure S1 in the appendix. Our model thus serves as a prior over both the number of components in a mixture model, as well as the component weights and locations. Since events in $F$ can only be thinned by surviving events, our modified Matérn prior on $F$ ensures the mixture model has at least one component.
For a set $A$, write $\sum A$ for the sum of its elements. Consistent with the notation of $G_\Theta$ and $G_\mathcal{T}$, we write $G_W$ for $\mathrm{Proj}_W(G)$. Then, given $G$, the observed data $X = \{x_i, i = 1, \ldots, n\}$ is modeled as follows:
$$ x_i \mid G \sim \sum_{(\theta, w, t) \in G} \frac{w}{\sum G_W} \, p_X(\cdot\,; \theta), \qquad i = 1, \ldots, n. \qquad (3) $$
Here, $p_X(\cdot\,; \theta)$ represents some family of probability densities parameterized by $\theta \in \Theta$. As an example, if the observations lie on a Euclidean space, $p_X(\cdot\,; \theta)$ could be a normal distribution, with $\theta$ representing the location and variance of a component in a Gaussian mixture model. In this case, the density $p_\Theta(\theta)$ might be a Normal-Inverse-Wishart distribution.
Note that when the $w$s are independent $\mathrm{Gamma}(\alpha, 1)$ variables, the vector of normalized weights $\left(w_1 / \sum G_W, \ldots, w_{|G|} / \sum G_W\right)$ follows a symmetric Dirichlet distribution with concentration parameter $\alpha$ [Devroye, 1986]. Thus, conditioned on the number of components, we assume a symmetric Dirichlet prior on the weights. If the thinning kernel $K_\eta$ equals 0, our model then reduces to a standard mixture model, with i.i.d. component parameters, Dirichlet-distributed component weights, and a conditional Poisson distribution on the number of components. Different settings of $K_\eta$ (hardcore, probabilistic or squared-exponential thinning) allow different kinds of repulsion between the component parameters. Observe that repulsion is only between the component parameters $\theta$ (and not the component weights $w$). Further, in many settings we allow $K_\eta$ to depend on only a subset of the components of $\theta$. For instance, writing $\theta = (\theta_\mu, \theta_\sigma)$ where $\theta_\mu$ is the component location and $\theta_\sigma$ is the component variance, a common requirement is to enforce repulsion only between the component locations, but not their variances. This can easily be achieved by setting $K_\eta$ to depend only on $\theta_\mu$.
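The Gamma-to-Dirichlet fact above is easy to verify numerically. The following sketch (with purely illustrative values) compares normalized independent Gamma(α, 1) draws against direct draws from a symmetric Dirichlet(α).

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, k, n_samples = 2.0, 4, 50_000

w = rng.gamma(alpha, 1.0, size=(n_samples, k))
normalized = w / w.sum(axis=1, keepdims=True)                 # normalized Gamma draws
direct = rng.dirichlet(alpha * np.ones(k), size=n_samples)    # direct Dirichlet draws

# The two constructions should agree in distribution; compare the first component.
print(normalized[:, 0].mean(), direct[:, 0].mean())           # both close to 1/k = 0.25
print(normalized[:, 0].var(), direct[:, 0].var())
```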
All that is left to complete a Bayesian model is to specify hyperpriors on the hyperparameters $\bar\lambda$ and $\eta$, as well as any hyperparameters of the density $p_\Theta(\theta)$. The last is problem specific, and is no different from models without repulsion. A natural prior for $\bar\lambda$ is the Gamma distribution, while the choice of the hyperprior on $\eta$ will depend on the type of thinning kernel. In general, we recommend at least a mildly informative prior on the thinning parameter, as otherwise the posterior can settle on a model without any repulsion. For the hardcore process, where $\eta$ is the thinning radius, or for the squared-exponential thinning kernel, where $\eta$ is the lengthscale parameter, we can use a Gamma hyperprior. For probabilistic thinning, where $\eta = (R, p)$, we can use a Beta prior on the thinning probability $p$, and a Gamma prior on the thinning radius $R$. For this last model, we observe competitive performance even by simply fixing $p = 0.95$ and $R$ to a value slightly larger than the desired repulsive distance. We include further discussion of the choice of hyperpriors in section 6 and the appendix.
Write $z = (z_1, \ldots, z_n)$ for the collection of latent cluster assignments of the data in equation (3), with $z_i \in \{1, \ldots, |G|\}$. Following notation in section 2, with hyperpriors omitted for simplicity, the generative process of the MRMM is
$$ F \mid \lambda \sim \mathrm{PoissonProcess}(\lambda(\cdot)) \text{ conditioned on } |F| > 0, \qquad G, \tilde G \mid F, K_\eta \sim \text{MatérnThin}_K(F, \eta), \qquad (4) $$
$$ z_i \mid G \sim \mathrm{Multinomial}\!\left(\frac{w_1}{\sum G_W}, \ldots, \frac{w_{|G|}}{\sum G_W}\right), \qquad x_i \mid z_i, G \sim p_X(\cdot\,; \theta_{z_i}), \qquad i = 1, \ldots, n. $$
Note that for convenience, in the last line above we have imposed an arbitrary ordering on the elements of $G$, and thus the component identities, though the cluster indicators $z_i$ really take values in some categorical space. The proposition below gives the joint density of all variables, and will be useful for deriving our posterior sampling algorithm.
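Putting the pieces together, here is a minimal end-to-end sketch of the generative process in equation (4) for one-dimensional Gaussian components with a known, shared variance; the uniform $p_\Theta$, the hardcore kernel and all numeric values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
lam_bar, alpha, eta, sigma = 8.0, 1.0, 1.5, 0.3   # hyperparameters (illustrative)
low, high, n_data = 0.0, 10.0, 200

# Primary process F on Theta x W x T, conditioned to have at least one event.
n = 0
while n == 0:
    n = rng.poisson(lam_bar)
theta = rng.uniform(low, high, size=n)     # component parameters ~ p_Theta (uniform here)
w = rng.gamma(alpha, 1.0, size=n)          # unnormalized weights ~ p_W = Gamma(alpha, 1)
t = rng.uniform(0.0, 1.0, size=n)          # birth times

# Hardcore MaternThin: keep an event only if no older survivor lies within eta.
# The oldest event always survives, so G is never empty.
keep = []
for idx in np.argsort(t):
    if all(abs(theta[idx] - theta[j]) > eta for j in keep):
        keep.append(idx)
theta_G, w_G = theta[keep], w[keep]

# Cluster assignments and observations, as in the last line of equation (4).
probs = w_G / w_G.sum()
z = rng.choice(len(keep), size=n_data, p=probs)
x = rng.normal(theta_G[z], sigma)
print(f"{len(keep)} mixture components at locations {np.sort(theta_G).round(2)}")
```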
Proposition 3.1. Write $P_\lambda$ for the measure of a rate-$\lambda(\cdot)$ Poisson process on $\Theta \times \mathcal{W} \times \mathcal{T}$. Then the measure of the tuple $(X, G, \tilde G)$ has density with respect to $P_\lambda \times dx^n$ given by
$$ p\left(X, G, \tilde G \mid \lambda, \eta\right) = \frac{1\left(|G \cup \tilde G| > 0\right)}{1 - e^{-\int_{\Theta \times \mathcal{W} \times \mathcal{T}} \lambda(\theta, w, t) \, d\theta \, dw \, dt}} \prod_{g \in G} \left[1 - H_\eta(g; G)\right] \prod_{\tilde g \in \tilde G} H_\eta(\tilde g; G) \prod_{i=1}^{n} \sum_{(\theta, w, t) \in G} \frac{w}{\sum G_W} \, p_X(x_i; \theta). \qquad (5) $$
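As a sanity-check utility, the density in equation (5) can be evaluated on the log scale once the shadow function is available. This sketch assumes `shadow(event, G)` computes $H_\eta(\cdot\,; G)$ for extended events $(\theta, w, t)$ (thinning only by strictly earlier events, so no event shadows itself) and `log_px(x, theta)` is the log component density; both are hypothetical helpers, not part of the paper.

```python
import numpy as np

def log_joint_density(X, G, G_tilde, lam_bar, shadow, log_px):
    """Log of the density in equation (5). G and G_tilde are lists of (theta, w, t) triples."""
    # Normalizer for the primary process conditioned on |F| > 0; the intensity in
    # equation (2) integrates to lam_bar over Theta x W x T.
    log_p = -np.log1p(-np.exp(-lam_bar))
    # Survival probabilities for the Matern events, thinning probabilities for the rest.
    for g in G:
        log_p += np.log(1.0 - shadow(g, G))
    for g_t in G_tilde:
        log_p += np.log(shadow(g_t, G))
    # Mixture likelihood of the observations.
    weights = np.array([w for (_, w, _) in G])
    log_probs = np.log(weights / weights.sum())
    for x in X:
        terms = np.array([lp + log_px(x, theta) for lp, (theta, _, _) in zip(log_probs, G)])
        m = terms.max()
        log_p += m + np.log(np.exp(terms - m).sum())   # log-sum-exp
    return log_p
```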
4 Posterior inference for MRMM
Given a dataset $X = \{x_1, \ldots, x_n\}$ from the MRMM, we are interested in the posterior distribution $p(G, z, \bar\lambda, \eta \mid X)$, summarizing information about the component weights and locations (through $G$), and the cluster assignments (through $z$). We construct a Markov chain Monte Carlo (MCMC) sampler to simulate from it. Our sampler is an auxiliary variable Gibbs sampler that, for computational reasons, also imputes the thinned events $\tilde G$. The sampler proceeds by sequentially updating the latent variables $\bar\lambda, \eta, G, \tilde G$ and $z$ according to their conditional posterior distributions. Among these, the most challenging steps are updating $G$ and $\tilde G$: both of these are variable-dimension objects, where not just the values but also the cardinality of the sets must be sampled. Given Matérn events $G$, sampling $\tilde G$ resembles the sampling problem from Rao et al. [2017], where the Matérn realization was completely observed. However, our modified prior on $F$ (where $|F|$ must be greater than 0) requires some care, and we provide a different and cleaner derivation of this update step using Campbell's theorem [Kingman, 1992]. Updating $G$ given the rest is more challenging, and we further augment our MCMC state space with an independent Poisson process $\tilde F$, and then update the triplet $(G, \tilde G, \tilde F)$ to $(G^*, \tilde G^*, \tilde F^*)$ using a 'relabeling' process that keeps the union unchanged. This approach is simpler than reversible-jump or birth-death approaches, with the augmented Poisson intensity trading off mixing and computation, and forming the only new parameter. Below, we present full details of the Gibbs steps.
1) Updating thinned events $\tilde G$: Given $G$, the thinned events $\tilde G$ are independent of the observations: $p(\tilde G \mid \bar\lambda, \eta, G, z, X) = p(\tilde G \mid \bar\lambda, \eta, G)$. Furthermore, events in $\tilde G$ can only be thinned by events in $G$, suggesting that conditioned on $G, \bar\lambda, \eta$, the events within $\tilde G$ do not interact with each other, and form a Poisson process. The result below formalizes this:

Proposition 4.1. Given all other variables, the conditional distribution of the thinned events $\tilde G$ is a Poisson process with intensity $\lambda(\cdot) H_\eta(\cdot\,; G)$.

This result resembles that of Rao et al. [2017], though our derivation in the appendix exploits proposition 3.1 and works with densities with respect to the rate-$\lambda$ Poisson measure, and is simpler and cleaner. Simulating such a Poisson process is straightforward: simulate a Poisson process with intensity $\lambda(\cdot)$ on the whole space $\Theta \times \mathcal{W} \times \mathcal{T}$, and then keep each event $\tilde g$ in it with probability $H_\eta(\tilde g; G)$ [Lewis and Shedler, 1979]. This makes jointly updating the entire set $\tilde G$ easy and efficient, without any tuning parameters.
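The joint update of the thinned events can be sketched as a Poisson-thinning step [Lewis and Shedler, 1979], following Proposition 4.1. In this illustrative sketch, `shadow(event, G)` is an assumed helper computing $H_\eta(\cdot\,; G)$ for extended events, and the uniform $p_\Theta$ is an assumption; neither is the authors' code.

```python
import numpy as np

def resample_thinned(G, lam_bar, alpha, shadow, rng, low=0.0, high=10.0):
    """Draw G_tilde ~ PoissonProcess(lambda(.) * H_eta(. ; G)) given the survivors G."""
    # Simulate the dominating rate-lambda Poisson process on Theta x W x T.
    n = rng.poisson(lam_bar)
    theta = rng.uniform(low, high, size=n)    # p_Theta assumed uniform here
    w = rng.gamma(alpha, 1.0, size=n)
    t = rng.uniform(0.0, 1.0, size=n)
    # Keep each candidate with probability equal to its shadow under G.
    G_tilde = []
    for j in range(n):
        event = (theta[j], w[j], t[j])
        if rng.random() < shadow(event, G):
            G_tilde.append(event)
    return G_tilde
```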
2) Updating the Matérn events $G$: This step is significantly more challenging, since unlike the thinned events, the Matérn events interact with each other, and with the clustering structure of the data. Consequently, we cannot simply discard $G$ and sample a new realization. Instead, we produce a dependent update of $G$, through a Markov kernel that targets this conditional distribution.

We first discard the cluster assignments $z$; note these can easily be resampled (see step 3 below). A naive approach is then to make a pass through the elements of $G \cup \tilde G$, reassigning each to either $G$ or $\tilde G$ based on the appropriate conditional. This forms a standard sequence of Gibbs updates, and does not involve any reversible jump or stochastic process simulation. At the end of this pass, we have an updated pair $(G^*, \tilde G^*)$, with $G^*$ possibly having a different number of elements from $G$. While this keeps the union $G \cup \tilde G$ unchanged, our ability to efficiently update $\tilde G$ might suggest fast mixing.
In our experiments however, we observed poor mixing, especially with hardcore thinning. The latter setting forbids elements of $G$ from lying within each others' shadow, and also requires $\tilde G$ to lie in the shadow of $G$, making it hard to switch an event from the Matérn set to the thinned set, or vice versa. To address this, we begin this step by augmenting our MCMC state-space with an independent rate-$\gamma\lambda(\cdot)$ Poisson process $\tilde F$ on $\Theta \times \mathcal{W} \times \mathcal{T}$:
$$ \tilde F \mid \gamma, \lambda \sim \mathrm{PoissonProcess}(\gamma\lambda(\cdot)). \qquad (6) $$
We call $\gamma > 0$ the augmentation factor, which forms a parameter of our MCMC algorithm. Since $\tilde F$ is simulated independently of all other variables, the joint distribution of $(G, \tilde G, \tilde F)$ conditioned on all other variables has the conditional distribution of $(G, \tilde G)$ as its marginal. Having simulated $\tilde F$, we cycle through the elements of $G \cup \tilde G \cup \tilde F$, sequentially relabeling each event as "survived", "thinned" or "augmented" to produce a new triplet $(G^*, \tilde G^*, \tilde F^*)$.
This relabeling is carried out to preserve the joint conditional of $(G, \tilde G, \tilde F)$, so that after discarding $\tilde F$, we have updated $(G, \tilde G)$ while maintaining their conditional distribution.

The augmented Poisson process $\tilde F$ more easily allows events to be introduced into, and removed from, $G$, especially in the hardcore setting. Each relabeling step is straightforward, and requires computing a 3-component probability. For each $e \in G \cup \tilde G \cup \tilde F$, write $G \setminus e$, $\tilde G \setminus e$ and $\tilde F \setminus e$ for the sets resulting from removing $e$ (only one of these will