BAYESIAN REPULSIVE MIXTURE MODELING WITH MATÉRN POINT PROCESSES

A PREPRINT
Hanxi Sun
Department of Statistics
Purdue University
West Lafayette, IN 47906
hanxi-sun@purdue.edu
Boqian Zhang
Department of Statistics
Purdue University
West Lafayette, IN 47906
zhan1977@purdue.edu
Vinayak Rao
Department of Statistics
Purdue University
West Lafayette, IN 47906
varao@purdue.edu
ABSTRACT
Mixture models are a standard tool in statistical analysis, widely used for density modeling and model-based clustering. Current approaches typically model the parameters of the mixture components as independent variables. This can result in overlapping or poorly separated clusters when either the number of clusters or the form of the mixture components is misspecified. Such model misspecification can undermine the interpretability and simplicity of these mixture models. To address this problem, we propose a Bayesian mixture model with repulsion between mixture components. The repulsion is induced by a generalized Matérn type-III repulsive point process model, obtained through a dependent sequential thinning scheme on a primary Poisson point process. We derive a novel and efficient Gibbs sampler for posterior inference, and demonstrate the utility of the proposed method on a number of synthetic and real-world problems.
Keywords Poisson process · Thinning · Data Augmentation · Parsimony
1 Introduction
As statistics and machine learning methods find wide application in real-world problems, practitioners are increasingly seeking to balance statistical fidelity and predictive accuracy with interpretability, parsimony and fairness. Popular instances of these trade-offs include introducing smoothness, sparsity or low-dimensional structure into statistical models. In this paper, we focus on interpretability and diversity, through the use of repulsive priors in mixture modeling applications. Mixture models are a powerful and flexible class of models, capable of approximating increasingly complex distributions as the number of mixture components increases. Such models are useful both in density modeling applications as well as in clustering applications [McLachlan and Basford, 1988, Banfield and Raftery, 1993, Bensmail et al., 1997], with goals for the latter typically including data exploration, visualization and summarization. Problems where they have been applied include topic modeling [Blei et al., 2003], genetics [Pagel and Meade, 2004, McLachlan et al., 2002] and computer vision [Friedman and Russell, 1997], among many others. However, the flexibility of mixture models often comes at the cost of interpretability and parsimony. For computational tractability, the parameters of the mixture components are typically modeled as independent and identically distributed draws from some 'base distribution'. This can result in overlapping clusters: unless the clusters are very widely separated, the posterior will typically assign probability to multiple clusters in some neighborhood. The nearly identical locations of these clusters lead to redundancy and a lack of interpretability. Since mixture models are typically composed of simple parametric components, even if the data exhibits clear clustered structure, a slight deviation of individual clusters from the parametric form will again result in overlapping components.

One approach towards addressing this problem is to control the number of mixture components through an appropriate prior. However, trying to induce interpretability in this indirect fashion can make model specification quite challenging, especially in nonparametric applications where the number of components depends on the dataset size. Further, this
approach is still sensitive to any misspecification of the form of the individual components. Another approach is to use more flexible (e.g. nonparametric) densities for each component of a mixture model [Gassiat, 2017], though once again this raises problems with model specification, identifiability and computation. A more modern approach is to directly address the problem of overlapping clusters, enforcing diversity through repulsive priors. Here, rather than being sampled independently from the base measure, component locations are jointly sampled from a prior distribution that penalizes realizations where components are situated too close to each other. Such priors typically draw from the point process literature, examples including Gibbs point processes [Stoyan et al., 1987] and determinantal point processes [Hough et al., 2006, Lavancier et al., 2015]. Mixture models built on such priors have been shown to provide simpler, clearer and more interpretable results, often without too much loss of predictive performance [Petralia et al., 2012, Xu et al., 2016, Bianchini et al., 2018]. Nevertheless, they present computational challenges, since the repulsive models often involve normalization constants that are intractable to evaluate.
In this work, we propose a new class of repulsive priors based on the Matérn type-III point process. Matérn point processes are a class of repulsive point processes first studied in Matérn [1960, 2013]. More recently, Rao et al. [2017] developed a simple and efficient Markov chain Monte Carlo (MCMC) sampling algorithm for a generalized Matérn type-III process (see section 2). In this paper, we bring this process to the setting of mixture models, using it as a repulsive prior over the number of components and their locations. Treating the Matérn realization as a latent, rather than a fully observed, point process raises computational challenges that the algorithm from Rao et al. [2017] does not handle. We develop an efficient MCMC sampler for our model and demonstrate the practicality and flexibility of our proposed repulsive mixture model on a variety of datasets. Our proposed algorithm is also useful in Matérn point process applications with missing observations, as well as for mixture models without repulsion, as an alternative to often hard-to-tune reversible jump MCMC methods [Richardson and Green, 1997] to sample the unknown number of components.
We organize this paper as follows. Section 2 reviews the generalized Matérn type-III point process, while sections 3 and 4 outline our proposed Matérn Repulsive Mixture Model (MRMM) and our novel MCMC algorithm. Section 5 discusses related work on repulsive mixture models, and we apply our model to a number of datasets in section 6.
2 Matérn repulsive point processes
The Poisson process [Kingman, 1992] is a completely random point process, where events in disjoint sets are independent of each other. To incorporate repulsion between events, Matérn [1960, 2013] introduced three spatial point process models that build on the Poisson process. The three models, called the Matérn hardcore point processes of type I, II and III, only allow point process realizations with pairs of events separated by at least some fixed distance η, where η is a parameter of the model. The three models are constructed by applying different dependent thinning schemes on a primary homogeneous Poisson point process. Despite being theoretically more challenging than the other two processes, the type-III process has the most natural thinning mechanism, and supports higher densities of points. Rao et al. [2017] showed how this can easily be generalized to include probabilistic thinning and spatial inhomogeneity. Furthermore, Rao et al. [2017] showed that posterior inference for a completely observed type-III process can be carried out in a relatively straightforward manner. These advantages make the generalized Matérn type-III process superior to the other two as a prior for mixture models. For simplicity, we will refer to this process as the Matérn process in the rest of this paper.
Formally, the Matérn process is a finite point process defined on a space $\Theta$, parameterized by a thinning kernel $K_\eta : \Theta \times \Theta \to [0,1]$ and a nonnegative intensity function $\lambda_\Theta : \Theta \to [0,\infty)$. We will find it convenient to decompose the function $\lambda_\Theta(\theta)$ as $\lambda_\Theta(\theta) = \bar\lambda \cdot p_\Theta(\theta)$, for a finite normalizing constant $\bar\lambda > 0$ and some probability density $p_\Theta(\theta)$ on $\Theta$. Simulating this process proceeds in four steps: (1) Simulate the primary process $F_\Theta = \{\theta_1, \ldots, \theta_{|F_\Theta|}\}$ from a Poisson process with intensity $\lambda_\Theta(\cdot)$ on $\Theta$. (2) Assign each event $\theta_j$ in $F_\Theta$ an independent random birth-time drawn uniformly on the interval $\mathcal{T} = [0,1]$. (3) Sequentially visit events in the primary process according to their birth-times (from the oldest to the youngest) and attempt to thin (delete) them. Specifically, at step $j$, the $j$th oldest event $(\theta, t)$ is thinned by each surviving older primary event $(\theta', t')$, $t' < t$, with probability $K_\eta(\theta, \theta')$. (4) Write $G_\Theta$ and $\tilde G_\Theta$ for the elements of $F_\Theta$ that survive and are thinned in the previous step, respectively. The set $G_\Theta$ forms the Matérn process realization.
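The four steps above translate directly into a short simulation routine. The following is a minimal sketch (not the authors' code), assuming a one-dimensional space, a uniform density $p_\Theta$ on $[0, 10]$, and a hardcore kernel; all function names and numeric values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_matern_iii(lam_bar, eta, low=0.0, high=10.0):
    """Simulate a one-dimensional hardcore Matern type-III process on [low, high]."""
    # (1) Primary Poisson process F_Theta: total mass lam_bar, locations ~ p_Theta
    #     (here p_Theta is taken to be uniform on [low, high]).
    n = rng.poisson(lam_bar)
    locations = rng.uniform(low, high, size=n)
    # (2) Independent uniform birth-times on T = [0, 1].
    births = rng.uniform(0.0, 1.0, size=n)
    # (3) Visit events from oldest to youngest; an event is thinned if it lies
    #     within distance eta of an already-surviving (older) event.
    survivors, thinned = [], []
    for idx in np.argsort(births):
        theta = locations[idx]
        if any(abs(theta - s) < eta for s in survivors):
            thinned.append(theta)
        else:
            survivors.append(theta)
    # (4) The surviving locations form the Matern realization G_Theta.
    return np.array(survivors), np.array(thinned)

G_theta, G_theta_thinned = simulate_matern_iii(lam_bar=20.0, eta=0.8)
print(len(G_theta), "surviving events;", len(G_theta_thinned), "thinned events")
```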
For a hardcore Matérn process (figure 1), the thinning kernel satisfies $K_\eta(\theta, \theta_j) = 1_{\{\|\theta - \theta_j\| \le \eta\}}$, where $\eta$ is the thinning radius, so that thinning is deterministic: newer events within distance $\eta$ of a previously surviving event are thinned with probability 1. Other approaches are probabilistic thinning [Rao et al., 2017], where $K_\eta(\theta, \theta_j) = \eta_p \, 1_{\{\|\theta - \theta_j\| \le R\}}$ (with $\eta_p \in [0,1]$), or the smoother squared-exponential thinning, where $K_\eta(\theta, \theta_j) = \exp\!\left(-\|\theta - \theta_j\|^2 / (2\eta)\right)$.
Figure 1: The generative process of a one-dimensional Matérn process with a hardcore thinning kernel $K_\eta$. (1) A primary Poisson point process $F_\Theta$ with intensity $\lambda_\Theta(\theta)$ is simulated on $\Theta$. (2) Events in $F_\Theta$ are assigned random birth times uniformly from $\mathcal{T} = [0,1]$. (3) Events in the shadow (i.e. within horizontal distance $\eta$) of earlier surviving events are thinned. (4) Surviving events are projected onto $\Theta$ to form the Matérn realization $G_\Theta$.
Huber and Wolpert [2009] propose soft-core thinning, where each event $\theta_j$ has its own thinning radius $\eta_j$ drawn from some distribution, and $K_\eta(\theta, \theta_j) = 1_{\{\|\theta - \theta_j\| \le \eta_j\}}$.
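For concreteness, the thinning kernels above can be written as small functions. This is a sketch under the stated forms; the exact parameterization of the squared-exponential lengthscale is an assumption, and none of this is code from the paper.

```python
import numpy as np

def hardcore_kernel(theta, theta_j, eta):
    # Deterministic thinning: probability 1 within the radius eta, 0 outside.
    return 1.0 if np.linalg.norm(np.atleast_1d(theta - theta_j)) <= eta else 0.0

def probabilistic_kernel(theta, theta_j, R, eta_p):
    # Thin with probability eta_p within radius R [Rao et al., 2017].
    return eta_p if np.linalg.norm(np.atleast_1d(theta - theta_j)) <= R else 0.0

def squared_exponential_kernel(theta, theta_j, eta):
    # Smooth thinning probability decaying with squared distance.
    return float(np.exp(-np.linalg.norm(np.atleast_1d(theta - theta_j)) ** 2 / (2.0 * eta)))
```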
Observe that since each event $\theta_j$ has an independently and uniformly distributed birth-time $t_j$ associated with it, the set of pairs $\{(\theta_1, t_1), \ldots, (\theta_{|F_\Theta|}, t_{|F_\Theta|})\}$ is itself distributed as a Poisson process on $\Theta \times \mathcal{T}$, with intensity $\lambda(\theta, t) = \lambda_\Theta(\theta) 1_{[0,1]}(t)$. We write this extended primary process as $F$. Consistent with our use of $F_\Theta$ to represent the set of locations of each point in $F$, we will use $F_\mathcal{T}$ to represent the set of birth-times. Similarly, we will use $G$ for the extended Matérn events, and $G_\mathcal{T}$ for the associated birth-times (and $\tilde G$ and $\tilde G_\mathcal{T}$ for their thinned counterparts).
Following Rao et al. [2017], we will specify the thinning process through a shadow function $H_\eta : \Theta \times \mathcal{T} \to [0,1]$, parameterized by a possibly vector-valued $\eta$. This gives the probability that an event $(\theta, t) \in \Theta \times \mathcal{T}$ is thinned by a collection of events $G$ as
$$ H_\eta((\theta, t); G) = 1 - \prod_{g \in G} \left[1 - H_\eta((\theta, t); g)\right], \qquad (1) $$
where for a single event $g = (\theta', t')$, $H_\eta((\theta, t); (\theta', t')) = 1_{[t',1]}(t) \, K_\eta(\theta, \theta')$. Note that the $1_{[t',1]}(t)$ formalizes the fact that an event $(\theta, t)$ can only be thinned by earlier events. We will write $\text{MatérnThin}_K(F, \eta)$ for the sequential thinning process that assigns elements of $F$ to one of $G$ or $\tilde G$ according to thinning kernel $K_\eta$ (algorithm 1 in the appendix), and $\mathrm{Proj}_A(\cdot)$ for the operator that projects elements of a set onto some subspace $A$. The generative process of $G_\Theta \sim \text{MatérnProcess}_K(\lambda, \eta)$ can be written as
$$ F \mid \lambda \sim \mathrm{PoissonProcess}(\lambda(\cdot, \cdot)), $$
$$ G, \tilde G \mid F, K_\eta \sim \text{MatérnThin}_K(F, \eta), \qquad G_\Theta = \mathrm{Proj}_\Theta(G). $$
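To make the shadow function and the MatérnThin step concrete, here is a small sketch in Python; events are (theta, birth-time) pairs, `K` is a two-argument thinning kernel (e.g. one of the kernels above with its parameters fixed), and the function names are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def shadow(event, G, K):
    """Probability that event = (theta, t) is thinned by the collection G (equation (1))."""
    theta, t = event
    survive_all = 1.0
    for theta_g, t_g in G:
        # Only strictly earlier events can thin (theta, t).
        h = K(theta, theta_g) if t_g < t else 0.0
        survive_all *= (1.0 - h)
    return 1.0 - survive_all

def matern_thin(F, K, rng):
    """Sequentially assign elements of F to the surviving set G or the thinned set G_tilde."""
    G, G_tilde = [], []
    for event in sorted(F, key=lambda e: e[1]):   # visit oldest birth-time first
        if rng.random() < shadow(event, G, K):
            G_tilde.append(event)
        else:
            G.append(event)
    return G, G_tilde
```

Note that the sequential loop only ever evaluates the shadow against events that have already survived, matching the type-III rule that thinned events never thin others.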
With a Matérn model of point pattern data, one seeks to infer the intensity function $\lambda_\Theta(\theta)$ and the thinning parameters $\eta$ from a realization $G_\Theta$. Observe that for the Matérn type-III process, an event can only be thinned by a surviving event, so that the probability of thinning at any location depends only on the set $G$. Rao et al. [2017] showed that in fact, conditioned on $G$, the events in $\tilde G$ are distributed as an inhomogeneous Poisson process with intensity $\lambda_\Theta(\theta) H_\eta((\theta, t); G)$. This allowed them to develop an efficient Gibbs sampler when Matérn events $G_\Theta$ are fully observed. This proceeds by sequentially updating the thinned events $\tilde G$, the Matérn birth times $G_\mathcal{T}$, the Poisson intensity $\lambda_\Theta$ and thinning kernel parameter $\eta$, each conditioned on the rest. The fact that the thinned events $\tilde G$ can be jointly sampled avoids the need for any birth-death steps in the MCMC algorithm, both simplifying the algorithm and improving its efficiency. We adapt this sampler for our MCMC algorithm, where the fact that the Matérn events are hidden or partially observed will create new challenges. In the next section, we first describe our model that uses the Matérn process to impose repulsion between components of a finite mixture model.
3 Matérn repulsive mixture model (MRMM)
We start with a primary Poisson process $F$ that includes mixture weights, defining it on an extended space $\Theta \times \mathcal{W} \times \mathcal{T}$ with $\mathcal{T} = [0,1]$, $\mathcal{W} = [0, \infty)$. Write its intensity function as
$$ \lambda(\theta, w, t) = \bar\lambda \cdot p_\Theta(\theta) \cdot p_W(w) \cdot 1_{[0,1]}(t). \qquad (2) $$
We will set $p_W(w) = \mathrm{Gamma}(w; \alpha, 1)$, while $p_\Theta(\theta)$ will be a problem-specific prior over component parameters. Unlike the Matérn process, we model $F$ as a Poisson process conditioned to have at least one event. Given $F$, we will produce a Matérn realization $G = \{(\theta_1, w_1, t_1), \ldots, (\theta_{|G|}, w_{|G|}, t_{|G|})\}$ by applying $\text{MatérnThin}_K(F, \eta)$ for some kernel $K$ on $\Theta$ with parameter $\eta$. Each element $(\theta, w, t) \in G$ will form a component of a mixture model, with $\theta$ and $w$ representing the parameter and unnormalized weight of that component; see also figure S1 in the appendix. Our model thus serves as a prior over both the number of components in a mixture model, as well as the component weights and locations. Since events in $F$ can only be thinned by surviving events, our modified Matérn prior on $F$ ensures the mixture model has at least one component.
For a set $A$, write $\sum A$ for the sum of its elements. Consistent with the notation of $G_\Theta$ and $G_\mathcal{T}$, we write $G_W$ for $\mathrm{Proj}_W(G)$. Then, given $G$, the observed data $X = \{x_i, i = 1, \ldots, n\}$ is modeled as follows:
$$ x_i \mid G \sim \sum_{(\theta, w, t) \in G} \frac{w}{\sum G_W} \, p_X(\cdot\,; \theta), \qquad i = 1, \ldots, n. \qquad (3) $$
Here, $p_X(\cdot\,; \theta)$ represents some family of probability densities parameterized by $\theta \in \Theta$. As an example, if the observations lie on a Euclidean space, $p_X(\cdot\,; \theta)$ could be a normal distribution, with $\theta$ representing the location and variance of a component in a Gaussian mixture model. In this case, the density $p_\Theta(\theta)$ might be a Normal-Inverse-Wishart distribution.
Note that when the $w$s are independent $\mathrm{Gamma}(\alpha, 1)$ variables, the vector of normalized weights $\left(w_1 / \sum G_W, \ldots, w_{|G|} / \sum G_W\right)$ follows a symmetric Dirichlet distribution with concentration parameter $\alpha$ [Devroye, 1986]. Thus, conditioned on the number of components, we assume a symmetric Dirichlet prior on the weights. If the thinning kernel $K_\eta$ equals 0, our model then reduces to a standard mixture model, with i.i.d. component parameters, Dirichlet-distributed component weights, and a conditional Poisson distribution on the number of components. Different settings of $K_\eta$ (hardcore, probabilistic or squared-exponential thinning) allow different kinds of repulsion between the component parameters. Observe that repulsion is only between the component parameters $\theta$ (and not the component weights $w$). Further, in many settings we allow $K_\eta$ to depend on only a subset of the components of $\theta$. For instance, writing $\theta = (\theta_\mu, \theta_\sigma)$ where $\theta_\mu$ is the component location and $\theta_\sigma$ is the component variance, a common requirement is to enforce repulsion only between the component locations, but not their variances. This can easily be achieved by setting $K_\eta$ to depend only on $\theta_\mu$.
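The Gamma-to-Dirichlet fact above is easy to verify numerically. The following sketch (with purely illustrative values) compares normalized independent Gamma(α, 1) draws against direct draws from a symmetric Dirichlet(α).

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, k, n_samples = 2.0, 4, 50_000

w = rng.gamma(alpha, 1.0, size=(n_samples, k))
normalized = w / w.sum(axis=1, keepdims=True)                 # normalized Gamma draws
direct = rng.dirichlet(alpha * np.ones(k), size=n_samples)    # direct Dirichlet draws

# The two constructions should agree in distribution; compare the first component.
print(normalized[:, 0].mean(), direct[:, 0].mean())           # both close to 1/k = 0.25
print(normalized[:, 0].var(), direct[:, 0].var())
```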
All that is left to complete a Bayesian model is to specify hyperpriors on the hyperparameters $\bar\lambda$ and $\eta$, as well as any hyperparameters of the density $p_\Theta(\theta)$. The last is problem specific, and is no different from models without repulsion. A natural prior for $\bar\lambda$ is the Gamma distribution, while the choice of the hyperprior on $\eta$ will depend on the type of thinning kernel. In general, we recommend at least a mildly informative prior on the thinning parameter, as otherwise the posterior can settle on a model without any repulsion. For the hardcore process, where $\eta$ is the thinning radius, or for the squared-exponential thinning kernel, where $\eta$ is the lengthscale parameter, we can use a Gamma hyperprior. For probabilistic thinning, where $\eta = (R, p)$, we can use a Beta prior on the thinning probability $p$, and a Gamma prior on the thinning radius $R$. For this last model, we observe competitive performance even by simply fixing $p = 0.95$ and $R$ to a value slightly larger than the desired repulsive distance. We include further discussion of the choice of hyperpriors in section 6 and the appendix.
Write $z = (z_1, \ldots, z_n)$ for the collection of latent cluster assignments of the data in equation (3), with $z_i \in \{1, \ldots, |G|\}$. Following notation in section 2, with hyperpriors omitted for simplicity, the generative process of the MRMM is
$$ F \mid \lambda \sim \mathrm{PoissonProcess}(\lambda(\cdot)) \text{ conditioned on } |F| > 0, \qquad G, \tilde G \mid F, K_\eta \sim \text{MatérnThin}_K(F, \eta), \qquad (4) $$
$$ z_i \mid G \sim \mathrm{Multinomial}\!\left(\frac{w_1}{\sum G_W}, \ldots, \frac{w_{|G|}}{\sum G_W}\right), \qquad x_i \mid z_i, G \sim p_X(\cdot\,; \theta_{z_i}), \qquad i = 1, \ldots, n. $$
Note that for convenience, in the last line above we have imposed an arbitrary ordering on the elements of $G$, and thus the component identities, though the cluster indicators $z_i$ really take values in some categorical space. The proposition below gives the joint density of all variables, and will be useful for deriving our posterior sampling algorithm.
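Putting the pieces together, here is a minimal end-to-end sketch of the generative process in equation (4) for one-dimensional Gaussian components with a known, shared variance; the uniform $p_\Theta$, the hardcore kernel and all numeric values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
lam_bar, alpha, eta, sigma = 8.0, 1.0, 1.5, 0.3   # hyperparameters (illustrative)
low, high, n_data = 0.0, 10.0, 200

# Primary process F on Theta x W x T, conditioned to have at least one event.
n = 0
while n == 0:
    n = rng.poisson(lam_bar)
theta = rng.uniform(low, high, size=n)     # component parameters ~ p_Theta (uniform here)
w = rng.gamma(alpha, 1.0, size=n)          # unnormalized weights ~ p_W = Gamma(alpha, 1)
t = rng.uniform(0.0, 1.0, size=n)          # birth times

# Hardcore MaternThin: keep an event only if no older survivor lies within eta.
# The oldest event always survives, so G is never empty.
keep = []
for idx in np.argsort(t):
    if all(abs(theta[idx] - theta[j]) > eta for j in keep):
        keep.append(idx)
theta_G, w_G = theta[keep], w[keep]

# Cluster assignments and observations, as in the last line of equation (4).
probs = w_G / w_G.sum()
z = rng.choice(len(keep), size=n_data, p=probs)
x = rng.normal(theta_G[z], sigma)
print(f"{len(keep)} mixture components at locations {np.sort(theta_G).round(2)}")
```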
Proposition 3.1. Write $P_\lambda$ for the measure of a rate-$\lambda(\cdot)$ Poisson process on $\Theta \times \mathcal{W} \times \mathcal{T}$. Then the measure of the tuple $(X, G, \tilde G)$ has density with respect to $P_\lambda \times dx^n$ given by
$$ p\left(X, G, \tilde G \mid \lambda, \eta\right) = \frac{1\left(|G \cup \tilde G| > 0\right)}{1 - e^{-\int_{\Theta \times \mathcal{W} \times \mathcal{T}} \lambda(\theta, w, t) \, d\theta \, dw \, dt}} \prod_{g \in G} \left[1 - H_\eta(g; G)\right] \prod_{\tilde g \in \tilde G} H_\eta(\tilde g; G) \prod_{i=1}^{n} \sum_{(\theta, w, t) \in G} \frac{w}{\sum G_W} \, p_X(x_i; \theta). \qquad (5) $$
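As a sanity-check utility, the density in equation (5) can be evaluated on the log scale once the shadow function is available. This sketch assumes `shadow(event, G)` computes $H_\eta(\cdot\,; G)$ for extended events $(\theta, w, t)$ (thinning only by strictly earlier events, so no event shadows itself) and `log_px(x, theta)` is the log component density; both are hypothetical helpers, not part of the paper.

```python
import numpy as np

def log_joint_density(X, G, G_tilde, lam_bar, shadow, log_px):
    """Log of the density in equation (5). G and G_tilde are lists of (theta, w, t) triples."""
    # Normalizer for the primary process conditioned on |F| > 0; the intensity in
    # equation (2) integrates to lam_bar over Theta x W x T.
    log_p = -np.log1p(-np.exp(-lam_bar))
    # Survival probabilities for the Matern events, thinning probabilities for the rest.
    for g in G:
        log_p += np.log(1.0 - shadow(g, G))
    for g_t in G_tilde:
        log_p += np.log(shadow(g_t, G))
    # Mixture likelihood of the observations.
    weights = np.array([w for (_, w, _) in G])
    log_probs = np.log(weights / weights.sum())
    for x in X:
        terms = np.array([lp + log_px(x, theta) for lp, (theta, _, _) in zip(log_probs, G)])
        m = terms.max()
        log_p += m + np.log(np.exp(terms - m).sum())   # log-sum-exp
    return log_p
```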
4 Posterior inference for MRMM
Given a dataset $X = \{x_1, \ldots, x_n\}$ from the MRMM, we are interested in the posterior distribution $p(G, z, \bar\lambda, \eta \mid X)$, summarizing information about the component weights and locations (through $G$), and the cluster assignments (through $z$). We construct a Markov chain Monte Carlo (MCMC) sampler to simulate from it. Our sampler is an auxiliary variable Gibbs sampler that, for computational reasons, also imputes the thinned events $\tilde G$. The sampler proceeds by sequentially updating the latent variables $\bar\lambda, \eta, G, \tilde G$ and $z$ according to their conditional posterior distributions. Among these, the most challenging steps are updating $G$ and $\tilde G$: both of these are variable-dimension objects, where not just the values but also the cardinality of the sets must be sampled. Given Matérn events $G$, sampling $\tilde G$ resembles the sampling problem from Rao et al. [2017], where the Matérn realization was completely observed. However, our modified prior on $F$ (where $|F|$ must be greater than 0) requires some care, and we provide a different and cleaner derivation of this update step using Campbell's theorem [Kingman, 1992]. Updating $G$ given the rest is more challenging, and we further augment our MCMC state space with an independent Poisson process $\tilde F$, and then update the triplet $(G, \tilde G, \tilde F)$ to $(G^*, \tilde G^*, \tilde F^*)$ using a 'relabeling' process that keeps the union unchanged. This approach is simpler than reversible-jump or birth-death approaches, with the augmented Poisson intensity trading off mixing and computation, and forming the only new parameter. Below, we present full details of the Gibbs steps.
1) Updating thinned events $\tilde G$: Given $G$, the thinned events $\tilde G$ are independent of the observations: $p(\tilde G \mid \bar\lambda, \eta, G, z, X) = p(\tilde G \mid \bar\lambda, \eta, G)$. Furthermore, events in $\tilde G$ can only be thinned by events in $G$, suggesting that conditioned on $G, \bar\lambda, \eta$, the events within $\tilde G$ do not interact with each other, and form a Poisson process. The result below formalizes this:

Proposition 4.1. Given all other variables, the conditional distribution of the thinned events $\tilde G$ is a Poisson process with intensity $\lambda(\cdot) H_\eta(\cdot\,; G)$.

This result resembles that of Rao et al. [2017], though our derivation in the appendix exploits proposition 3.1 and works with densities with respect to the rate-$\lambda$ Poisson measure, and is simpler and cleaner. Simulating such a Poisson process is straightforward: simulate a Poisson process with intensity $\lambda(\cdot)$ on the whole space $\Theta \times \mathcal{W} \times \mathcal{T}$, and then keep each event $\tilde g$ in it with probability $H_\eta(\tilde g; G)$ [Lewis and Shedler, 1979]. This makes jointly updating the entire set $\tilde G$ easy and efficient, without any tuning parameters.
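The joint update of the thinned events can be sketched as a Poisson-thinning step [Lewis and Shedler, 1979], following Proposition 4.1. In this illustrative sketch, `shadow(event, G)` is an assumed helper computing $H_\eta(\cdot\,; G)$ for extended events, and the uniform $p_\Theta$ is an assumption; neither is the authors' code.

```python
import numpy as np

def resample_thinned(G, lam_bar, alpha, shadow, rng, low=0.0, high=10.0):
    """Draw G_tilde ~ PoissonProcess(lambda(.) * H_eta(. ; G)) given the survivors G."""
    # Simulate the dominating rate-lambda Poisson process on Theta x W x T.
    n = rng.poisson(lam_bar)
    theta = rng.uniform(low, high, size=n)    # p_Theta assumed uniform here
    w = rng.gamma(alpha, 1.0, size=n)
    t = rng.uniform(0.0, 1.0, size=n)
    # Keep each candidate with probability equal to its shadow under G.
    G_tilde = []
    for j in range(n):
        event = (theta[j], w[j], t[j])
        if rng.random() < shadow(event, G):
            G_tilde.append(event)
    return G_tilde
```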
2) Updating the Matérn events $G$: This step is significantly more challenging, since unlike the thinned events, the Matérn events interact with each other, and with the clustering structure of the data. Consequently, we cannot simply discard $G$ and sample a new realization. Instead, we produce a dependent update of $G$, through a Markov kernel that targets this conditional distribution.

We first discard the cluster assignments $z$; note these can easily be resampled (see step 3 below). A naive approach is then to make a pass through the elements of $G \cup \tilde G$, reassigning each to either $G$ or $\tilde G$ based on the appropriate conditional. This forms a standard sequence of Gibbs updates, and does not involve any reversible jump or stochastic process simulation. At the end of this pass, we have an updated pair $(G^*, \tilde G^*)$, with $G^*$ possibly having a different number of elements from $G$. While this keeps the union $G \cup \tilde G$ unchanged, our ability to efficiently update $\tilde G$ might suggest fast mixing.
In our experiments however, we observed poor mixing, especially with hardcore thinning. The latter setting forbids elements of $G$ from lying within each others' shadow, and also requires $\tilde G$ to lie in the shadow of $G$, making it hard to switch an event from the Matérn set to the thinned set, or vice versa. To address this, we begin this step by augmenting our MCMC state-space with an independent rate-$\gamma\lambda(\cdot)$ Poisson process $\tilde F$ on $\Theta \times \mathcal{W} \times \mathcal{T}$:
$$ \tilde F \mid \gamma, \lambda \sim \mathrm{PoissonProcess}(\gamma\lambda(\cdot)). \qquad (6) $$
We call $\gamma > 0$ the augmentation factor, which forms a parameter of our MCMC algorithm. Since $\tilde F$ is simulated independently of all other variables, the joint distribution of $(G, \tilde G, \tilde F)$ conditioned on all other variables has the conditional distribution of $(G, \tilde G)$ as its marginal. Having simulated $\tilde F$, we cycle through the elements of $G \cup \tilde G \cup \tilde F$, sequentially relabeling each event as "survived", "thinned" or "augmented" to produce a new triplet $(G^*, \tilde G^*, \tilde F^*)$.
This relabeling is carried out to preserve the joint conditional of $(G, \tilde G, \tilde F)$, so that after discarding $\tilde F$, we have updated $(G, \tilde G)$ while maintaining their conditional distribution.

The augmented Poisson process $\tilde F$ more easily allows events to be introduced into, and removed from, $G$, especially in the hardcore setting. Each relabeling step is straightforward, and requires computing a 3-component probability. For each $e \in G \cup \tilde G \cup \tilde F$, write $G \setminus e$, $\tilde G \setminus e$ and $\tilde F \setminus e$ for the sets resulting from removing $e$ (only one of these will