Factorized Fusion Shrinkage for Dynamic Relational Data Peng Zhao Department of Applied Economics and Statistics University of Delaware

2025-04-27 0 0 1005.07KB 61 页 10玖币

侵权投诉

Factorized Fusion Shrinkage for Dynamic Relational Data

Peng Zhao

Department of Applied Economics and Statistics, University of Delaware

Anirban Bhattacharya, Debdeep Pati and Bani K. Mallick

Department of Statistics, Texas A&M University

July 16, 2024

Abstract

Modern data science applications often involve complex relational data with dynamic

structures. In systems that experience regime changes, such as changes in alliances between

nations after a war or air transportation networks in the wake of the COVID-19 pandemic,

abrupt alterations in the relational dynamics of such data are commonly observed. To address

this scenario, we propose a Factorized Fusion Shrinkage model, which involves dynamically

shrinking each decomposed factor of the low-rank approximation of the data. To achieve

the dynamic shrinkage, we use global-local shrinkage priors applied to successive diﬀerences

of the decomposed factors. The adopted prior preserves both the separability of clusters

and the long-range properties of latent factor dynamics, facilitating post-processing such as

cluster analysis and change-point detection. Under speciﬁc conditions, we prove that the

associated fractional posterior attains the minimax optimal rate up to logarithmic factors.

For eﬃcient computation, we introduce a structured mean-ﬁeld variational inference algorithm

that balances optimal posterior inference with computational scalability. Our framework is

versatile and can accommodate a wide range of models, including latent space models for

networks, dynamic matrix factorization, and low-rank tensor models. The eﬃcacy of our

methodology is tested through extensive simulations and real-world data analysis.

Keywords: Bayesian low-rank modeling; post-processing; posterior contraction; structured shrink-

age; tensor data; variational inference

arXiv:2210.00091v3 [stat.ME] 12 Jul 2024

1 Introduction

Relational data describes the relationship between two or more sets of variables and is typically

observed as matrices. One of the objectives of relational data analysis is to explain the variation

in the entries of a relational array through unobserved explanatory factors. For example, given

matrix-valued data Y∈Rn×p, the static low-rank plus noise model of the form Y=UV ′+E,

where A′is the transpose for a matrix A, max{rank(U),rank(V)} ≪ min{n, p}and Eis an error

term, has been extensively studied; see Hoﬀ (2007); Chatterjee (2015); Gavish and Donoho (2014);

Donoho et al. (2020) for a ﬂavor and connections with truncated singular value decompositions.

In recent decades, there has been a rapid growth in the interest in analyzing dynamic, complex

data sets. Models for dynamic relational data have found widespread application in dynamic social

network analysis (Sarkar and Moore,2005;Zhu et al.,2016), subspace tracking (Doukopoulos

and Moustakides,2008), traﬃc prediction (Tan et al.,2016), recommendation system (Zhang

et al.,2021) among others. Compared to static relational data, additional challenges are posed

when the observation matrices possess a dynamic structure. In particular, modeling the evolution

of Yt, t = 1, ..., T over time has been of particular interest. For example, the classical vector

autoregressive (VAR) model (Stock and Watson,2016;L¨utkepohl,2013) has been employed to

model the evolution of the observations directly and Hoﬀ (2015) considered a bi-linear form of the

autoregressive model; Hoﬀ (2011) and Friel et al. (2016) parameterized time-varying latent factors

in terms of static factors with time-varying weights or coeﬃcients for bipartite network models;

Sarkar and Moore (2005) used an AR(1) type of evolution of the latent factors in the context of

latent space models for dynamic networks.

Abrupt changes are important examples of non-stationarity, typically observed in systems that

undergo regime changes due to an intervention, such as alliances between nations before/after a

war (Gibler,2008), voting records before and after an election (Lee et al.,2004), or the impact of

protein networks after a treatment (Hegde et al.,2008). This article presents a novel shrinkage

process for dynamic relational data to handle such abrupt changes: given the dynamic likelihood

Yt=UtV′

t+Et,t= 1, ..., T , we introduce time-dependence by shrinking the successive diﬀerences

(e.g., uit −ui(t−1)) between the row vectors {uit},{vit}of Ut,Vtto group-wise sparse vectors.

In particular, we propose a factorized fusion shrinkage (FFS) prior, where group-wise global-

local shrinkage priors on all the transitions of latent factors are applied to promote group-wise

fusion structures. The adopted global-local prior has a suﬃcient mass around zero, eﬀectively

shrinking transitions to zeros while keeping large enough transitions, promoting fusion structures.

In addition, all components of the transitions share the same local scales from the prior, leading

to interpretable group-wise shrinkage. The proposed model attempts to account for the dynamic

change and dependence across Ytby reﬂecting the piecewise constant changes among Utand Vt.

We also propose a symmetric version of the model which is useful for modeling adjacency matrices

in network models.

The proposed FFS priors diﬀer substantially from those in literature analyzing dynamic re-

lational data, like normal transition priors with a common variance and Gaussian process priors

(Sarkar and Moore,2005;Sun et al.,2014;Sewell and Chen,2015;Durante and Dunson,2014;

Sewell and Chen,2017;Zhang et al.,2018). While the above commonly used dynamic priors tend

to introduce the smoothness eﬀect among dynamic transitions, FFS priors are designed to intro-

duce the stopping and separation eﬀects so that the transitions are either closed to zeros or large.

In particular, the stopping eﬀect of FFS priors can signiﬁcantly enhance the interpretation of the

dynamics of latent vectors. Furthermore, the separation eﬀect of FFS priors contributes to the de-

tection of clusters among subjects after estimating the latent factors. The detected cluster results

provide information on which subjects have similar eﬀects in generating the relational data. When

performing clustering algorithms such as K-means, the cluster separation δt= mini̸=j∥uit −ujt∥2

for any uit,ujt not in the same cluster plays a crucial role in determining the diﬃculty level of

the problem (e.g., eigengap for spectral clustering, see Ng et al. (2001)). Generally, the larger δt

is, the easier it is to detect clusters at time t. The separation eﬀect of FFS priors is justiﬁed by

encouraging more separation among clusters: suppose at time twe have uit =ujt for subjects i, j,

and at time t+1, subject imoves to another cluster while subject jstays in the same cluster. Then

the cluster separation at time t+1 satisﬁes δt+1 ≤ ∥uj(t+1) −ui(t+1)∥2=∥uj(t+1) −ujt∥2=∥Dujt∥2,

which highlights the importance of shrinking towards larger transitions. In contrast, priors that

introduce smoothness to the transitions tend to impede the separation of the clusters. Overall, the

model explicitly links the transitions of latent factors with the changing of cluster memberships for

each subject: a zero transition implies no change in cluster membership, while a change in cluster

membership implies a large transition.

Dynamic shrinkage priors have been studied in a wide range of literature (Fr¨uhwirth-Schnatter

and Wagner,2010;Chan et al.,2012;Nakajima and West,2013;Kalli and Griﬃn,2014;Kowal et al.,

2019). In this article, we consider matrix-valued responses with complex dependence structures

using a structural dynamic shrinkage model for the transition of the latent factors and illustrate

its applicability across a wide range of problems. Additionally, we provide an in-depth theoretical

analysis of the proposed prior. We ﬁrst derive an informative-theoretic lower bound for the model

under an appropriate parameter space, incorporating both the initial estimation errors of the

matrix and the selection errors due to the sparsity of the transitions. Using the proposed prior

concentration and a fractional posterior device, we demonstrate that the fractional posterior under

the proposed prior can achieve minimax rates up to logarithmic factors. To our knowledge, both

the lower bound and fractional posterior concentration are the ﬁrst results reported in the related

literature. Note that additional works are needed to prove similar results for the usual posterior.

In addition, it has been established that a frequentist approach cannot achieve this fusion type

of minimax optimal rate when ℓ1-regularization is used (Fan and Guan,2018). Therefore, such

near-optimal convergence rates improve upon related models via ℓ1penalized approaches.

Finally, using dynamic network models as an example, we oﬀer theoretical support for the

proposed post-processing technique. While posterior post-processing has been increasingly popular

in the Bayesian literature, theoretical validations are comparatively rare; see Lee and Lee (2021)

for a recent example in a diﬀerent context. In particular, dynamic comparisons, cluster analysis,

and change point detection are considered. We show that the optimal estimation of the proposed

method can help achieve better performance in these post-processing tasks.

From a computational aspect, we present posterior approximations based on variational infer-

ence for scalability and computational eﬃciency while noting that MCMC approaches can also

be readily developed. We consider a structured mean-ﬁeld (SMF) variational inference framework

where the temporal dependence is taken into account. A corresponding coordinate ascent varia-

tional inference (Bishop and Nasrabadi (2006), CAVI) algorithm is developed to incorporate the

temporal dependence into the variational inference where simple closed-form updatings can be

achieved. The proposed algorithm is more eﬃcient than gradient descent or other ﬁrst-order algo-

rithm types with little increase in the complexity per iteration since the computation can utilize

the banded (or block tri-diagonal) structure of the second-order moments, which incurs a O(T d3)

cost for matrix inversion. The overall complexity is O(npT d3), which is similar to the complexity

described in the related literature, for example, in Matias and Miele (2017). However, the existing

literature discusses cases where the number of time points Tis small (e.g., in Matias and Miele

(2017), the real data contains around 5 time points), whereas our algorithm is speciﬁcally designed

for scenarios with a large number of time points (e.g., T= 200). Finally, we extend our SMF

variational inference framework to tensor data by utilizing a CP type of low-rank factorization in

the appendix.

Notation. For a vector x, we use ∥x∥2,∥x∥1,∥x∥∞to represent its ℓ2,ℓ1and ℓ∞norms and

x′as its transpose. For a matrix A, let ∥A∥Fbe its Frobenius norm. We use and 1to denote

the identity matrix and vector with all ones. Suppose Pand Qare probability measures on a

common probability space with density pand q. We use DKL {p|| q}=Rplog(p/q)dµ to denote

the KL divergence and Dα{p|| q}= log Rpαq1−αdµ to denote the R´enyi divergence of order α.

Given sequences anand bn, we denote an=O(bn) or an≲bnif there exists a constant C > 0 such

that an≤Cbnfor all large enough n. Similarly, we deﬁne an≳bn. In addition, let an=o(bn) to

be limn→∞ an/bn= 0. Let PXdenote a probability distribution with parameter X, and pXdenote

its density. Denote EXas the expectation taken with respect to a variable x. Let N(µ, σ) be the

normal distribution with mean µand variance σwhile N(x;µ, σ) be the density at x.

2 Methodology

2.1 Factorized Fusion Shrinkage

First, we lay down the factorized fusion shrinkage (FFS) approach for dynamic matrix-valued data.

Speciﬁcally, let Y={Yt}T

t=1 be the observed data, where Yt∈Rn×pis an matrix-valued observa-

tion corresponding to the tth time point. For example, in the context of daily air transportation

networks providing ﬂights between cities, where the entries of the matrices represent whether the

airlines provide ﬂights between two cities on a given date, Ytis a matrix with rows and columns

representing the city of origin, city of destination and the time-index truns over days. For such

data, we consider the following dynamic model:

Yt∼p(Mt;β),Mt=UtV′

t, t ∈[T] : = {1, . . . , T },(1)

where uit,vjt ∈Rdfor each t∈[T], i∈[n] and j∈[p] are the row vectors of Utand Vt

respectively; Ep(Yt|Mt) = g(Mt) for some link function gwhich operates elementwise on a

matrix; and βrepresents additional parameters (e.g., variance, subject-speciﬁc eﬀects, etc.).

Even though the Singular value decomposition (SVD) vastly reduces the eﬀective number of pa-

rameters for static matrix factorization models, additional structural assumptions are necessitated

in the dynamic setting to reduce model complexity. We achieve this by proposing a parsimonious

yet ﬂexible evolution of latent factor vectors. To that end, we ﬁrst introduce some notation. For

each deﬁne an n×dmatrix Ut= [u1t, ..., unt]′with uit ∈Rdfor each i∈[n] and similarly

Vt= [v1t, ..., vpt]′with vjt ∈Rdfor each j∈[p]. One may interpret uit as d-dimensional vector

of latent factors for the ith data unit of the column of the tmatrix at time t. For example, in

the air transportation network setting, uit represents latent factors corresponding to the ith city

of origin at time t. We consider the following group-wise fusion structure on the evolution of the

latent factors: T

t=2

i=1 {Duit ̸=0} ≤ su,and

t=2

j=1 {Dvjt ̸=0} ≤ sv,(2)

where Duit =uit −ui(t−1) and a̸=0dmeans that a∈Rdis diﬀerent from a zero vector. In

particular, when Duit =0d, the entire eﬀect of subject iremains unchanged from time point tto

t+ 1. Therefore, when su≪nand sv≪p, the proposed dynamic fusion structure in equation (2)

sparsely constrains the transitions over time in a group-wise manner and signiﬁcantly constrains

the active number of parameters across all time points.

We operate in a Bayesian framework and adopt a continuous shrinkage framework to shrink

towards the fusion structure in (2). Speciﬁcally, we propose a factorized fusion shrinkage (FFS)

prior, which employs group-wise global-local shrinkage priors to model the transition of the latent

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

FactorizedFusionShrinkageforDynamicRelationalDataPengZhaoDepartmentofAppliedEconomicsandStatistics,UniversityofDelawareAnirbanBhattacharya,DebdeepPatiandBaniK.MallickDepartmentofStatistics,TexasA&MUniversityJuly16,2024AbstractModerndatascienceapplicationsofteninvolvecomplexrelationaldatawithdynamics...

展开>> 收起<<

Factorized Fusion Shrinkage for Dynamic Relational Data Peng Zhao Department of Applied Economics and Statistics University of Delaware.pdf

共61页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Factorized Fusion Shrinkage for Dynamic Relational Data Peng Zhao Department of Applied Economics and Statistics University of Delaware

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: