Bayesian Hyperbolic Multidimensional Scaling
Bolun Liu
Department of Biostatistics
Bloomberg School of Public Health
Johns Hopkins University
Shane Lubold
Department of Statistics
University of Washington
Adrian E. Raftery
Department of Statistics
Department of Sociology
University of Washington
Tyler H. McCormick
Department of Statistics
Department of Sociology
University of Washington
Abstract
Multidimensional scaling (MDS) is a widely used approach to representing high-dimensional,
dependent data. MDS works by assigning each observation a location on a low-dimensional
geometric manifold, with distance on the manifold representing similarity. We propose a
Bayesian approach to multidimensional scaling when the low-dimensional manifold is hyper-
bolic. Using hyperbolic space facilitates representing tree-like structures common in many
settings (e.g. text or genetic data with hierarchical structure). A Bayesian approach pro-
vides regularization that minimizes the impact of measurement error in the observed data
and assesses uncertainty. We also propose a case-control likelihood approximation that al-
lows for efficient sampling from the posterior distribution in larger data settings, reducing
computational complexity from approximately O(n2) to O(n). We evaluate the proposed
method against state-of-the-art alternatives using simulations, canonical reference datasets,
Indian village network data, and human gene expression data. Code to reproduce the results
in the paper is available at https://github.com/peterliu599/BHMDS.
Keywords: Bayesian methods, Hyperbolic geometry, Multidimensional scaling
1 Introduction
Multidimensional scaling (MDS) methods represent high-dimensional data in a low-dimensional
space, using dissimilarities between the observations as a means of identifying positions (Kruskal,
1964). Observations with small dissimilarities will be placed close together, while those with larger
dissimilarities will be placed further apart. A long literature on MDS methodology illustrates
the utility of MDS as a means of summarizing complex, dependent data and for downstream
applications, such as detecting clusters from the dissimilarities (Borg and Groenen, 1997; Davison,
1983; Cox and Cox, 2001).

The authors gratefully acknowledge support from NIH grants R01 HD070936 (Raftery) and DP2 MH122405 (McCormick). Correspondence: tylermc@uw.edu.

arXiv:2210.15081v3 [stat.ME] 15 Aug 2023
In many settings, the observed dissimilarities we wish to apply MDS methods to are likely to
contain measurement error. These errors could arise from misreporting in the context of the social
sciences (e.g. a retrospective behavioral inventory) or from miscalibration of machinery or operator
error in industrial settings. Using a probabilistic model is one way to account for this additional
uncertainty. Takane and Carroll (1981); Groenen (1993); MacKay (1989) and others have proposed
maximum likelihood MDS methods for handling measurement error. The use of these methods
relies on asymptotic theory, which might not apply to sample sizes used in applications, and the
problem requires solving a non-linear optimization problem where the number of parameters grows
quickly as the sample size grows (Cox and Cox, 2001). One potential framework to address these
potential issues with MDS is a Bayesian framework. Oh and Raftery (2001) provided a Bayesian
procedure to estimate the configuration of objects given (potentially noisy) dissimilarities between
objects. Extensions of Bayesian MDS to the case of large datasets were discussed in Holbrook et
al. (2021).
Along with the statistical framework, another critical, but often ignored, choice in implement-
ing MDS is the choice of geometry for the low-dimensional manifold. Multidimensional scaling
methods often assume the observed dissimilarities are computed using Euclidean distances among
objects in a Euclidean space. Oh and Raftery (2001), for example, assume a Euclidean distance
model with Gaussian measurement error and propose a Markov chain Monte Carlo algorithm
to compute a Bayesian solution for the object configuration. Yet there is a growing literature
showing that representing objects in other embedding spaces might lead to better representations
and therefore be more useful in downstream tasks (De Sa et al., 2018; Chami et al., 2020; Smith et
al., 2019; Lubold et al., 2020; Keller-Ressel and Nargang, 2020). In particular, hyperbolic spaces,
defined in Section 2, have been shown to produce embeddings with lower distortion, especially for
data that is hierarchical or tree-like.
In this work, we combine hyperbolic MDS methods with a Bayesian procedure. Specifically,
we apply the Bayesian MDS procedure from Oh and Raftery (2001) to a hyperbolic space.
Our model posits that the objects of interest are located in positions on a low-dimensional hyper-
bolic space, with the true dissimilarities between objects being the hyperbolic distances between
positions. We further assume that the observed dissimilarities are the true dissimilarities plus a
Gaussian error. We then derive a Markov chain Monte Carlo (MCMC) method that we use to obtain
Bayesian estimates of the object positions, which allows practitioners to calculate distance-based
measures (e.g. distances between positions, distances to the hyperbolic origin, etc.) that are often
the pragmatic goals in applications.
The paper is organized as follows. In Section 2, we formally define the hyperbolic space
model used in this work, and in Section 3 we posit a model for observed dissimilarities computed
from points in hyperbolic space and discuss a prior distribution over this space. We derive an
MCMC algorithm to draw samples from the posterior in Section 4. Sections 5 and 6 contain
simulations and applications of our method to real datasets in genomics. We conclude in Section 7.
2 Hyperbolic Geometry
We now discuss the mathematical details of hyperbolic geometry. Hyperbolic geometry is a
non-Euclidean geometry with constant negative curvature, commonly visualized as the upper
sheet of the unit two-sheeted hyperboloid. There exist multiple equivalent hyperbolic models,
such as the Klein model, the Poincaré disk model, and the Lorentz (hyperboloid/Minkowski)
model. We use the Lorentz model to parameterize hyperbolic geometry, which parallels the
representation used in existing hyperbolic MDS algorithms (Keller-Ressel and Nargang, 2020;
De Sa et al., 2018). This representation also facilitates convenient priors for our Bayesian model
in Section 3.
To define the Lorentz model, we begin with the definition of the Lorentzian product. For any
x = (x_0, ..., x_p) and y = (y_0, ..., y_p) in R^{p+1}, the Lorentzian product ⟨x, y⟩_L is defined as

\[
\langle \mathbf{x}, \mathbf{y} \rangle_L \equiv -x_0 y_0 + \sum_{i=1}^{p} x_i y_i .
\]
The p-dimensional Lorentz model with curvature parameter κ, which we denote by H^p(κ), can be
represented as the collection of coordinates x in R^{p+1} with x_0 > 0 whose Lorentzian product with
itself is -1, equipped with a hyperbolic distance scaled by 1/√κ. That is,

\[
\mathbb{H}^p(\kappa) \equiv \left\{ \mathbf{x} \in \mathbb{R}^{p+1} : x_0 > 0, \ \langle \mathbf{x}, \mathbf{x} \rangle_L = -1 \right\}, \qquad \kappa > 0,
\]

equipped with the hyperbolic distance

\[
d_{\mathbb{H}^p(\kappa)}(\mathbf{x}, \mathbf{y}) \equiv \frac{1}{\sqrt{\kappa}} \operatorname{arccosh}\left( -\langle \mathbf{x}, \mathbf{y} \rangle_L \right), \tag{1}
\]

which is the geodesic distance between x and y on H^p(κ). Specifically, the parameter κ controls
the hyperbolicity of the geometry: the sectional curvature of H^p(κ) is -κ, so the space becomes
more hyperbolic as κ grows and becomes flatter as κ shrinks to zero (Euclidean geometry has
curvature exactly 0).
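The two quantities above are simple to compute. The following is a minimal Python (NumPy) sketch, with function names of our own choosing, not an implementation from the paper:

```python
import numpy as np

def lorentz_product(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_{i>=1} xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def hyperbolic_distance(x, y, kappa=1.0):
    """Geodesic distance on the Lorentz model H^p(kappa), as in eq. (1)."""
    # For points on the hyperboloid, -<x, y>_L >= 1; the clip guards
    # against floating-point values slightly below 1.
    inner = np.clip(-lorentz_product(x, y), 1.0, None)
    return np.arccosh(inner) / np.sqrt(kappa)
```

For example, the point x = (cosh t, sinh t · u) with u a unit vector in R^p lies on the hyperboloid and sits at hyperbolic distance t/√κ from the origin (1, 0, ..., 0).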
3 Bayesian Modeling Framework
We now describe our statistical framework for Bayesian Hyperbolic Multidimensional Scaling
(BHMDS), which represents the objects of interest by coordinates in hyperbolic geometry, so that
the hyperbolic distances between objects resemble their true dissimilarity measures. We suppose
the objects dwell on H^p(κ) with coordinates x_1, x_2, ..., x_n, and denote by δ_ij the dissimilarity
measure between object i and object j, which equals the hyperbolic distance between x_i and x_j:

\[
\delta_{ij} \equiv d_{\mathbb{H}^p(\kappa)}(\mathbf{x}_i, \mathbf{x}_j) = \frac{1}{\sqrt{\kappa}} \operatorname{arccosh}\left( -\langle \mathbf{x}_i, \mathbf{x}_j \rangle_L \right), \qquad i, j = 1, \dots, n . \tag{2}
\]
In many settings, the observed dissimilarities contain measurement errors. As in Oh and
Raftery (2001), we represent the observed dissimilarity dij as the true dissimilarity plus a Gaussian
error, with the constraint that the observed dissimilarity is always positive. We therefore assume
that d_ij follows a truncated normal distribution:

\[
d_{ij} \sim N\left(\delta_{ij}, \sigma^2\right) I\left(d_{ij} > 0\right), \qquad i < j, \quad i, j = 1, \dots, n , \tag{3}
\]
where δ_ij is as defined in (2), the x_i are unobserved, and σ² is the variance of the measurement error.
Given the statistical model, we now specify priors for X = {x_1, ..., x_n} and σ². First note
that, using (3), the likelihood of the unknown parameters X and σ² is

\[
l\left(D \mid X, \sigma^2\right) \propto \left(\sigma^2\right)^{-m/2} \exp\left[ -\frac{1}{2\sigma^2} \mathrm{SSR} - \sum_{i<j} \log \Phi\left( \frac{\delta_{ij}}{\sigma} \right) \right], \tag{4}
\]

where m = n(n-1)/2 is the number of dissimilarities, SSR = \sum_{i<j} (\delta_{ij} - d_{ij})^2 is the sum
of squared residuals, Φ is the cumulative distribution function of the standard normal distribution,
and D = {d_ij}_{i,j=1,...,n} is the matrix of observed dissimilarities. The square root of the
SSR term in the likelihood is often referred to as stress in the MDS literature, meaning that our
approach falls under the umbrella of stress-minimizing approaches to MDS.
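To make the pieces of (4) concrete, the log-likelihood can be sketched in a few lines of Python (NumPy/SciPy). The function name is our own and this is an illustration of the formula, not code from the paper:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(D, Delta, sigma2):
    """Log of the truncated-normal likelihood in eq. (4), up to an
    additive constant.  D and Delta are (n, n) symmetric matrices of
    observed and model-implied dissimilarities; only pairs i < j enter."""
    iu = np.triu_indices_from(D, k=1)
    d, delta = D[iu], Delta[iu]
    m = d.size                           # m = n(n - 1)/2 dissimilarities
    ssr = np.sum((delta - d) ** 2)       # sum of squared residuals (SSR)
    # The log Phi terms normalize each normal density truncated at 0.
    return (-0.5 * m * np.log(sigma2)
            - ssr / (2.0 * sigma2)
            - np.sum(norm.logcdf(delta / np.sqrt(sigma2))))
```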
We now elaborate on the prior choice for hyperbolic coordinates. Due to the constraint in (2),
hyperbolic coordinate distributions are generally asymmetric and complex in form, often leading to
intractable full conditional posterior distributions. In such settings, Metropolis-Hastings samplers
are computationally challenging. Therefore, the choice of a prior structure has substantial
implications for the feasibility of posterior sampling. We use the hyperbolic wrapped normal
distribution centered on the hyperbolic origin (Nagano et al., 2019) as our hyperbolic prior. This
prior distribution reduces sampling on H^p to sampling on R^p by defining a one-to-one transformation
from R^p to H^p. Specifically, to sample from the hyperbolic prior, we first sample v ~ N_p(0, Λ) in R^p,
then transform v ∈ R^p to x ∈ H^p by the transformation T(·) specified by the prior,

\[
\mathbf{x} = T(\mathbf{v}) = \cosh\left( \lVert \tilde{\mathbf{v}} \rVert_L \right) \boldsymbol{\mu}_0^p + \sinh\left( \lVert \tilde{\mathbf{v}} \rVert_L \right) \frac{\tilde{\mathbf{v}}}{\lVert \tilde{\mathbf{v}} \rVert_L} , \tag{5}
\]

where μ_0^p = (1, 0, ..., 0) ∈ H^p is the hyperbolic origin, ṽ = (0, v) ∈ R^{p+1}, and ‖ṽ‖_L ≡ \sqrt{⟨ṽ, ṽ⟩_L}.
This procedure produces x, a sample from the hyperbolic wrapped normal distribution.
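The two-step recipe above (draw v, then map it through T) can be sketched as follows. The function name and vectorized layout are our own assumptions:

```python
import numpy as np

def sample_wrapped_normal(lam, rng, size=1):
    """Draw from the hyperbolic wrapped normal centered at the origin of
    H^p via the transformation T in eq. (5).

    lam: length-p array of prior variances (the diagonal of Lambda).
    Returns an array of shape (size, p + 1) of points on the hyperboloid."""
    p = len(lam)
    v = rng.normal(scale=np.sqrt(lam), size=(size, p))  # v ~ N_p(0, Lambda)
    # With v~ = (0, v), the Lorentzian norm ||v~||_L equals the Euclidean
    # norm of v, because the first coordinate is zero.
    r = np.linalg.norm(v, axis=1, keepdims=True)
    x0 = np.cosh(r)
    rest = np.sinh(r) * v / np.where(r > 0, r, 1.0)     # safe at r = 0
    return np.hstack([x0, rest])
```

Every returned point satisfies ⟨x, x⟩_L = -1 with x_0 > 0, i.e. lies on the hyperboloid, since cosh² r - sinh² r = 1.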
The transformation T(·) enables us to reparameterize the likelihood in (4) using V = {v_1, ..., v_n},
such that

\[
l\left(D \mid V, \sigma^2\right) = l\left(D \mid X, \sigma^2\right) \propto \left(\sigma^2\right)^{-m/2} \exp\left[ -\frac{1}{2\sigma^2} \mathrm{SSR} - \sum_{i<j} \log \Phi\left( \frac{\delta_{ij}}{\sigma} \right) \right], \tag{6}
\]

where δ_ij = d_{H^p(κ)}(T(v_i), T(v_j)). Imposing a Gaussian prior on V is equivalent to a hyperbolic
wrapped normal prior on X. The transformation allows us to sample in the space of the Gaussian
prior, meaning we can use a simple random-walk Metropolis-Hastings algorithm to sample from
the posterior.
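A single random-walk Metropolis update in the Gaussian v-space might look like the following minimal sketch. This is our own illustrative helper, not the paper's sampler; in practice log_post would combine the reparameterized likelihood (6) with the Gaussian prior on V:

```python
import numpy as np

def rw_metropolis_step(V, i, log_post, step, rng):
    """One random-walk Metropolis update of row i of V (the Gaussian-space
    coordinates v_i); log_post(V) returns the unnormalized log posterior.
    Returns the (possibly unchanged) V and an acceptance flag."""
    V_prop = V.copy()
    V_prop[i] = V[i] + step * rng.standard_normal(V.shape[1])
    # Symmetric proposal, so the acceptance ratio is just the posterior ratio.
    log_ratio = log_post(V_prop) - log_post(V)
    if np.log(rng.uniform()) < log_ratio:
        return V_prop, True
    return V, False
```

Because the proposal perturbs only one object's coordinates, in an efficient implementation log_post would recompute only the n - 1 dissimilarities involving object i.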
We next define the prior for the observation error variance, σ². We use a conjugate prior
σ² ~ IG(a, b), the Inverse Gamma distribution with mode b/(a + 1). For the hyperprior on the
diagonal entries of the auxiliary variable variance matrix Λ = Diag(λ_1, ..., λ_p), we also assume a
conjugate Inverse Gamma prior, λ_j ~ IG(α, β_j), independently for j = 1, ..., p. We will further