Factorized Fusion Shrinkage for Dynamic Relational Data Peng Zhao Department of Applied Economics and Statistics University of Delaware

2025-04-27 0 0 1005.07KB 61 页 10玖币
侵权投诉
Factorized Fusion Shrinkage for Dynamic Relational Data
Peng Zhao
Department of Applied Economics and Statistics, University of Delaware
Anirban Bhattacharya, Debdeep Pati and Bani K. Mallick
Department of Statistics, Texas A&M University
July 16, 2024
Abstract
Modern data science applications often involve complex relational data with dynamic
structures. In systems that experience regime changes, such as changes in alliances between
nations after a war or air transportation networks in the wake of the COVID-19 pandemic,
abrupt alterations in the relational dynamics of such data are commonly observed. To address
this scenario, we propose a Factorized Fusion Shrinkage model, which involves dynamically
shrinking each decomposed factor of the low-rank approximation of the data. To achieve
the dynamic shrinkage, we use global-local shrinkage priors applied to successive differences
of the decomposed factors. The adopted prior preserves both the separability of clusters
and the long-range properties of latent factor dynamics, facilitating post-processing such as
cluster analysis and change-point detection. Under specific conditions, we prove that the
associated fractional posterior attains the minimax optimal rate up to logarithmic factors.
For efficient computation, we introduce a structured mean-field variational inference algorithm
that balances optimal posterior inference with computational scalability. Our framework is
versatile and can accommodate a wide range of models, including latent space models for
networks, dynamic matrix factorization, and low-rank tensor models. The efficacy of our
methodology is tested through extensive simulations and real-world data analysis.
Keywords: Bayesian low-rank modeling; post-processing; posterior contraction; structured shrink-
age; tensor data; variational inference
1
arXiv:2210.00091v3 [stat.ME] 12 Jul 2024
1 Introduction
Relational data describes the relationship between two or more sets of variables and is typically
observed as matrices. One of the objectives of relational data analysis is to explain the variation
in the entries of a relational array through unobserved explanatory factors. For example, given
matrix-valued data YRn×p, the static low-rank plus noise model of the form Y=UV +E,
where Ais the transpose for a matrix A, max{rank(U),rank(V)} ≪ min{n, p}and Eis an error
term, has been extensively studied; see Hoff (2007); Chatterjee (2015); Gavish and Donoho (2014);
Donoho et al. (2020) for a flavor and connections with truncated singular value decompositions.
In recent decades, there has been a rapid growth in the interest in analyzing dynamic, complex
data sets. Models for dynamic relational data have found widespread application in dynamic social
network analysis (Sarkar and Moore,2005;Zhu et al.,2016), subspace tracking (Doukopoulos
and Moustakides,2008), traffic prediction (Tan et al.,2016), recommendation system (Zhang
et al.,2021) among others. Compared to static relational data, additional challenges are posed
when the observation matrices possess a dynamic structure. In particular, modeling the evolution
of Yt, t = 1, ..., T over time has been of particular interest. For example, the classical vector
autoregressive (VAR) model (Stock and Watson,2016;L¨utkepohl,2013) has been employed to
model the evolution of the observations directly and Hoff (2015) considered a bi-linear form of the
autoregressive model; Hoff (2011) and Friel et al. (2016) parameterized time-varying latent factors
in terms of static factors with time-varying weights or coefficients for bipartite network models;
Sarkar and Moore (2005) used an AR(1) type of evolution of the latent factors in the context of
latent space models for dynamic networks.
Abrupt changes are important examples of non-stationarity, typically observed in systems that
undergo regime changes due to an intervention, such as alliances between nations before/after a
war (Gibler,2008), voting records before and after an election (Lee et al.,2004), or the impact of
protein networks after a treatment (Hegde et al.,2008). This article presents a novel shrinkage
process for dynamic relational data to handle such abrupt changes: given the dynamic likelihood
Yt=UtV
t+Et,t= 1, ..., T , we introduce time-dependence by shrinking the successive differences
(e.g., uit ui(t1)) between the row vectors {uit},{vit}of Ut,Vtto group-wise sparse vectors.
In particular, we propose a factorized fusion shrinkage (FFS) prior, where group-wise global-
local shrinkage priors on all the transitions of latent factors are applied to promote group-wise
fusion structures. The adopted global-local prior has a sufficient mass around zero, effectively
shrinking transitions to zeros while keeping large enough transitions, promoting fusion structures.
In addition, all components of the transitions share the same local scales from the prior, leading
to interpretable group-wise shrinkage. The proposed model attempts to account for the dynamic
change and dependence across Ytby reflecting the piecewise constant changes among Utand Vt.
2
We also propose a symmetric version of the model which is useful for modeling adjacency matrices
in network models.
The proposed FFS priors differ substantially from those in literature analyzing dynamic re-
lational data, like normal transition priors with a common variance and Gaussian process priors
(Sarkar and Moore,2005;Sun et al.,2014;Sewell and Chen,2015;Durante and Dunson,2014;
Sewell and Chen,2017;Zhang et al.,2018). While the above commonly used dynamic priors tend
to introduce the smoothness effect among dynamic transitions, FFS priors are designed to intro-
duce the stopping and separation effects so that the transitions are either closed to zeros or large.
In particular, the stopping effect of FFS priors can significantly enhance the interpretation of the
dynamics of latent vectors. Furthermore, the separation effect of FFS priors contributes to the de-
tection of clusters among subjects after estimating the latent factors. The detected cluster results
provide information on which subjects have similar effects in generating the relational data. When
performing clustering algorithms such as K-means, the cluster separation δt= mini̸=juit ujt2
for any uit,ujt not in the same cluster plays a crucial role in determining the difficulty level of
the problem (e.g., eigengap for spectral clustering, see Ng et al. (2001)). Generally, the larger δt
is, the easier it is to detect clusters at time t. The separation effect of FFS priors is justified by
encouraging more separation among clusters: suppose at time twe have uit =ujt for subjects i, j,
and at time t+1, subject imoves to another cluster while subject jstays in the same cluster. Then
the cluster separation at time t+1 satisfies δt+1 ≤ ∥uj(t+1) ui(t+1)2=uj(t+1) ujt2=Dujt2,
which highlights the importance of shrinking towards larger transitions. In contrast, priors that
introduce smoothness to the transitions tend to impede the separation of the clusters. Overall, the
model explicitly links the transitions of latent factors with the changing of cluster memberships for
each subject: a zero transition implies no change in cluster membership, while a change in cluster
membership implies a large transition.
Dynamic shrinkage priors have been studied in a wide range of literature (Fr¨uhwirth-Schnatter
and Wagner,2010;Chan et al.,2012;Nakajima and West,2013;Kalli and Griffin,2014;Kowal et al.,
2019). In this article, we consider matrix-valued responses with complex dependence structures
using a structural dynamic shrinkage model for the transition of the latent factors and illustrate
its applicability across a wide range of problems. Additionally, we provide an in-depth theoretical
analysis of the proposed prior. We first derive an informative-theoretic lower bound for the model
under an appropriate parameter space, incorporating both the initial estimation errors of the
matrix and the selection errors due to the sparsity of the transitions. Using the proposed prior
concentration and a fractional posterior device, we demonstrate that the fractional posterior under
the proposed prior can achieve minimax rates up to logarithmic factors. To our knowledge, both
the lower bound and fractional posterior concentration are the first results reported in the related
literature. Note that additional works are needed to prove similar results for the usual posterior.
3
In addition, it has been established that a frequentist approach cannot achieve this fusion type
of minimax optimal rate when 1-regularization is used (Fan and Guan,2018). Therefore, such
near-optimal convergence rates improve upon related models via 1penalized approaches.
Finally, using dynamic network models as an example, we offer theoretical support for the
proposed post-processing technique. While posterior post-processing has been increasingly popular
in the Bayesian literature, theoretical validations are comparatively rare; see Lee and Lee (2021)
for a recent example in a different context. In particular, dynamic comparisons, cluster analysis,
and change point detection are considered. We show that the optimal estimation of the proposed
method can help achieve better performance in these post-processing tasks.
From a computational aspect, we present posterior approximations based on variational infer-
ence for scalability and computational efficiency while noting that MCMC approaches can also
be readily developed. We consider a structured mean-field (SMF) variational inference framework
where the temporal dependence is taken into account. A corresponding coordinate ascent varia-
tional inference (Bishop and Nasrabadi (2006), CAVI) algorithm is developed to incorporate the
temporal dependence into the variational inference where simple closed-form updatings can be
achieved. The proposed algorithm is more efficient than gradient descent or other first-order algo-
rithm types with little increase in the complexity per iteration since the computation can utilize
the banded (or block tri-diagonal) structure of the second-order moments, which incurs a O(T d3)
cost for matrix inversion. The overall complexity is O(npT d3), which is similar to the complexity
described in the related literature, for example, in Matias and Miele (2017). However, the existing
literature discusses cases where the number of time points Tis small (e.g., in Matias and Miele
(2017), the real data contains around 5 time points), whereas our algorithm is specifically designed
for scenarios with a large number of time points (e.g., T= 200). Finally, we extend our SMF
variational inference framework to tensor data by utilizing a CP type of low-rank factorization in
the appendix.
Notation. For a vector x, we use x2,x1,xto represent its 2,1and norms and
xas its transpose. For a matrix A, let AFbe its Frobenius norm. We use and 1to denote
the identity matrix and vector with all ones. Suppose Pand Qare probability measures on a
common probability space with density pand q. We use DKL {p|| q}=Rplog(p/q)to denote
the KL divergence and Dα{p|| q}= log Rpαq1αto denote the R´enyi divergence of order α.
Given sequences anand bn, we denote an=O(bn) or anbnif there exists a constant C > 0 such
that anCbnfor all large enough n. Similarly, we define anbn. In addition, let an=o(bn) to
be limn→∞ an/bn= 0. Let PXdenote a probability distribution with parameter X, and pXdenote
its density. Denote EXas the expectation taken with respect to a variable x. Let N(µ, σ) be the
normal distribution with mean µand variance σwhile N(x;µ, σ) be the density at x.
4
2 Methodology
2.1 Factorized Fusion Shrinkage
First, we lay down the factorized fusion shrinkage (FFS) approach for dynamic matrix-valued data.
Specifically, let Y={Yt}T
t=1 be the observed data, where YtRn×pis an matrix-valued observa-
tion corresponding to the tth time point. For example, in the context of daily air transportation
networks providing flights between cities, where the entries of the matrices represent whether the
airlines provide flights between two cities on a given date, Ytis a matrix with rows and columns
representing the city of origin, city of destination and the time-index truns over days. For such
data, we consider the following dynamic model:
Ytp(Mt;β),Mt=UtV
t, t [T] : = {1, . . . , T },(1)
where uit,vjt Rdfor each t[T], i[n] and j[p] are the row vectors of Utand Vt
respectively; Ep(Yt|Mt) = g(Mt) for some link function gwhich operates elementwise on a
matrix; and βrepresents additional parameters (e.g., variance, subject-specific effects, etc.).
Even though the Singular value decomposition (SVD) vastly reduces the effective number of pa-
rameters for static matrix factorization models, additional structural assumptions are necessitated
in the dynamic setting to reduce model complexity. We achieve this by proposing a parsimonious
yet flexible evolution of latent factor vectors. To that end, we first introduce some notation. For
each define an n×dmatrix Ut= [u1t, ..., unt]with uit Rdfor each i[n] and similarly
Vt= [v1t, ..., vpt]with vjt Rdfor each j[p]. One may interpret uit as d-dimensional vector
of latent factors for the ith data unit of the column of the tmatrix at time t. For example, in
the air transportation network setting, uit represents latent factors corresponding to the ith city
of origin at time t. We consider the following group-wise fusion structure on the evolution of the
latent factors: T
X
t=2
n
X
i=1 {Duit ̸=0} ≤ su,and
T
X
t=2
p
X
j=1 {Dvjt ̸=0} ≤ sv,(2)
where Duit =uit ui(t1) and a̸=0dmeans that aRdis different from a zero vector. In
particular, when Duit =0d, the entire effect of subject iremains unchanged from time point tto
t+ 1. Therefore, when sunand svp, the proposed dynamic fusion structure in equation (2)
sparsely constrains the transitions over time in a group-wise manner and significantly constrains
the active number of parameters across all time points.
We operate in a Bayesian framework and adopt a continuous shrinkage framework to shrink
towards the fusion structure in (2). Specifically, we propose a factorized fusion shrinkage (FFS)
prior, which employs group-wise global-local shrinkage priors to model the transition of the latent
5
摘要:

FactorizedFusionShrinkageforDynamicRelationalDataPengZhaoDepartmentofAppliedEconomicsandStatistics,UniversityofDelawareAnirbanBhattacharya,DebdeepPatiandBaniK.MallickDepartmentofStatistics,TexasA&MUniversityJuly16,2024AbstractModerndatascienceapplicationsofteninvolvecomplexrelationaldatawithdynamics...

展开>> 收起<<
Factorized Fusion Shrinkage for Dynamic Relational Data Peng Zhao Department of Applied Economics and Statistics University of Delaware.pdf

共61页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:61 页 大小:1005.07KB 格式:PDF 时间:2025-04-27

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 61
客服
关注