Bayesian Tensor-on-Tensor Regression with Efficient
Computation
Kunbo Wang Yanxun Xu*
Department of Applied Mathematics and Statistics, Johns Hopkins University
Abstract
We propose a Bayesian tensor-on-tensor regression approach to predict a multidimensional array (tensor) of arbitrary dimensions from another tensor of arbitrary dimensions, building upon the Tucker decomposition of the regression coefficient tensor. Traditional tensor regression methods making use of the Tucker decomposition either assume the dimension of the core tensor to be known or estimate it via cross-validation or some model selection criteria. However, no existing method can simultaneously estimate the model dimension (the dimension of the core tensor) and other model parameters. To fill this gap, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm to estimate both the model dimension and parameters for posterior inference. Besides the MCMC sampler, we also develop an ultra-fast optimization-based computing algorithm wherein the maximum a posteriori estimators for parameters are computed, and the model dimension is optimized via a simulated annealing algorithm. The proposed Bayesian framework provides a natural way for uncertainty quantification. Through extensive simulation studies, we evaluate the proposed Bayesian tensor-on-tensor regression model and show its superior performance compared to alternative methods. We also demonstrate its practical effectiveness by applying it to two real-world datasets, including facial imaging data and 3D motion data.
1 Introduction
Multi-dimensional arrays, also called tensors, are widely used to represent data with complex structures in different fields such as genomics, neuroscience, computer vision, and graph analysis. For example, a multi-tissue experiment (Wang et al., 2019) collects gene expression data in different tissues from different individuals, leading to three-dimensional arrays
($Genes \times Tissues \times Individuals$). Other notable examples include magnetic resonance imaging data (MRI, three-dimensional arrays), functional MRI (fMRI) data (four-dimensional arrays), and facial images (four-dimensional arrays) (Vasilescu and Terzopoulos, 2002; Hasan et al., 2011; Guhaniyogi and Spencer, 2021). In this paper, we focus on the task of tensor-on-tensor regression that predicts one multi-dimensional tensor from another multi-dimensional tensor, e.g., predicting gene expression across multiple tissues for multiple individuals from their clinical/omics data with tensor structures.
One simple approach to tensor-on-tensor regression is to turn tensors into vectors and then apply classic regression methods. However, such a treatment introduces high-dimensional unstructured vectors and destroys the correlation structure of the data, resulting in a huge number of parameters to be estimated and a potentially significant loss of information. For example, to predict a response tensor of dimensions $N \times Q_1 \times Q_2$ from a predictor tensor of dimensions $N \times P_1 \times P_2$, the classic linear regression method requires estimating $P_1 \times P_2 \times Q_1 \times Q_2$ parameters, which may cause overfitting or computational issues, especially when the number of parameters is larger than the sample size $N$.
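As a rough illustration of this blow-up (our own sketch with made-up dimensions, not an example from the paper), vectorizing the tensors and fitting ordinary least squares yields a coefficient matrix with far more free entries than samples:

```python
import numpy as np

# Illustrative dimensions: 50 samples, 16x16 predictors, 8x8 responses.
N, P1, P2, Q1, Q2 = 50, 16, 16, 8, 8
X = np.random.randn(N, P1, P2)       # predictor tensor
Y = np.random.randn(N, Q1, Q2)       # response tensor

# Naive approach: vectorize each sample and fit an unstructured linear map.
X_flat = X.reshape(N, P1 * P2)       # N x 256 design matrix
Y_flat = Y.reshape(N, Q1 * Q2)       # N x 64 response matrix
B, *_ = np.linalg.lstsq(X_flat, Y_flat, rcond=None)

# P1*P2*Q1*Q2 = 16,384 free parameters versus N = 50 samples: the
# least-squares problem is underdetermined and the tensor structure is lost.
print(B.shape, B.size)               # (256, 64) 16384
```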
To reduce the number of free parameters while preserving the correlation structure in modeling tensor data, tensor decomposition techniques have been widely applied (Kolda and Bader, 2009). The two most commonly used tensor decomposition methods are the PARAFAC/CANDECOMP (CP) decomposition (Harshman, 1970) and the Tucker decomposition (Tucker, 1966). The CP decomposition reconstructs a tensor as a linear combination of rank-1 tensors, each of which is represented as the outer product of a number of vectors. On the other hand, the Tucker decomposition factorizes a tensor into a small core tensor and a set of factor matrices, one along each dimension. Both decomposition methods are able to reduce the model dimensionality to a manageable size and make parameter estimation more efficient. Compared to the CP decomposition, the Tucker decomposition allows a more flexible correlation structure, captured by the core tensor, and the freedom to choose a different dimension along each mode, making it useful for analyzing data with skewed dimensions (Li et al., 2013). In fact, the CP decomposition is a special case of the Tucker decomposition in which the core tensor is superdiagonal.
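To make the two decompositions concrete, the following numpy sketch (our own illustration with arbitrary dimensions) reconstructs a third-order tensor from a Tucker factorization via one factor matrix per mode, and recovers the CP factorization as the special case of a superdiagonal core:

```python
import numpy as np

I1, I2, I3 = 10, 12, 8                      # tensor dimensions (illustrative)
R1, R2, R3 = 3, 4, 2                        # Tucker core dimensions

G  = np.random.randn(R1, R2, R3)            # small core tensor
U1 = np.random.randn(I1, R1)                # factor matrix for mode 1
U2 = np.random.randn(I2, R2)                # factor matrix for mode 2
U3 = np.random.randn(I3, R3)                # factor matrix for mode 3

# Tucker reconstruction: X = G x_1 U1 x_2 U2 x_3 U3
X_tucker = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

# CP is Tucker with a superdiagonal core: all modes share a common rank R.
R = 3
D = np.zeros((R, R, R))
D[np.arange(R), np.arange(R), np.arange(R)] = 1.0     # superdiagonal core
A1, A2, A3 = (np.random.randn(I, R) for I in (I1, I2, I3))
X_cp = np.einsum('abc,ia,jb,kc->ijk', D, A1, A2, A3)  # sum of R rank-1 tensors
```

Note how the Tucker core dimensions $(R_1, R_2, R_3)$ may differ across modes, whereas CP forces a single shared rank $R$.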
There is a rich literature on regression methods treating tensors as either predictors or responses, in both frequentist and Bayesian statistics. Guo et al. (2012) and Zhou et al. (2013) proposed tensor regression models to predict scalar outcomes from tensor predictors by assuming that the coefficient tensor has a low-rank CP decomposition. Li et al. (2013) later extended the framework by employing the Tucker decomposition for the coefficient tensor, and demonstrated that the Tucker decomposition is more suitable for tensor predictors of skewed dimensions and gains better accuracy in neuroimaging data analysis. Guhaniyogi et al. (2017) proposed a Bayesian approach to regression with a scalar response on tensor
predictors by developing a multiway Dirichlet generalized double Pareto prior on the tensor margins after applying the CP decomposition to the coefficient tensor. Miranda et al. (2018) developed a Bayesian tensor partition regression model using a generalized linear model with a sparsity-inducing normal mixture prior to learn the relationship between a matrix response (clinical outcomes) and a tensor predictor (imaging data). Li and Zhang (2017) proposed a parsimonious regression model with a tensor response and vector predictors, adopting a generalized sparsity principle based on the Tucker decomposition. To detect neuronal activation in fMRI experiments, Guhaniyogi and Spencer (2021) developed a Bayesian regression approach with a tensor response on scalar predictors by introducing a novel multiway stick-breaking shrinkage prior distribution on tensor-structured regression coefficients.
There exist many scientific applications that require methods for predicting a tensor response from another tensor predictor. One typical example in fMRI studies is detecting brain regions activated by an external stimulus or condition (Zhang et al., 2015). Hoff (2015) proposed a tensor-on-tensor bilinear regression framework, making use of the Tucker decomposition, to handle the special case where the tensor predictor has the same dimensions as the tensor response. Billio et al. (2018) introduced a Bayesian tensor autoregressive model to tackle tensor-on-tensor regression, using the CP decomposition to provide a parsimonious parametrization. Lock (2018) proposed to predict a tensor response from another tensor predictor by assuming that the coefficient tensor has a low-rank CP factorization. Gahrooei et al. (2021) extended the work of Lock (2018) to allow multiple tensor inputs under the Tucker decomposition framework.
Despite advances in methods for dealing with tensor data, the aforementioned approaches have some limitations. First, tensor-on-tensor regression methods based on the CP decomposition, e.g., Lock (2018), require both the response tensor and the predictor tensor to have the same rank in the CP decomposition, making them restrictive when the response and predictor tensors have different ranks. Second, the rank in the CP decomposition and the dimension of the core tensor in the Tucker decomposition (i.e., the model dimension) are essential for statistical inference in tensor-on-tensor regression models. However, they are either assumed known or estimated via cross-validation (Gahrooei et al., 2021) or some model selection criterion, such as the Bayesian information criterion (Guhaniyogi and Spencer, 2021). To the best of our knowledge, there is no existing method that can simultaneously estimate the model dimension and parameters.
In this paper, we develop a novel Bayesian approach for tensor-on-tensor regression based on the Tucker decomposition of the coefficient tensor. The main contributions of this work are threefold. First, our Bayesian framework is built upon the flexible Tucker decomposition, so that the response tensor and the predictor tensor can have different dimensions in the core
tensor. Second, we propose an efficient Markov chain Monte Carlo (MCMC) algorithm to simultaneously estimate the model dimension (the dimension of the core tensor) and the parameters. The resulting posterior inference naturally offers a characterization of uncertainty in parameter estimation and prediction. Third, as an alternative to MCMC, we develop an ultra-fast computing algorithm in which the maximum a posteriori (MAP) estimators for the parameters are computed while the dimension of the core tensor is optimized via a simulated annealing (SA) algorithm (Kirkpatrick et al., 1983).
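To convey the flavor of the SA component only (a minimal sketch under our own simplifications, not the algorithm developed in Section 5), one can anneal over integer core dimensions while treating the MAP fit as a black box; the `score` callable below is a hypothetical placeholder for fitting the model at a fixed core dimension and returning a criterion to minimize:

```python
import math
import random

def anneal_core_dims(score, dims0, dims_max, n_iter=500, T0=1.0, cooling=0.99):
    """Simulated annealing over integer core-tensor dimensions.

    score(dims): hypothetical black box that computes MAP estimates for a
    fixed core dimension `dims` and returns an objective to minimize.
    """
    dims, best = list(dims0), list(dims0)
    f_cur = f_best = score(dims)
    T = T0
    for _ in range(n_iter):
        # Propose perturbing one core dimension by +/-1, staying within bounds.
        cand = list(dims)
        k = random.randrange(len(cand))
        cand[k] = min(dims_max[k], max(1, cand[k] + random.choice((-1, 1))))
        f_cand = score(cand)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-(f_cand - f_cur) / T).
        if f_cand < f_cur or random.random() < math.exp((f_cur - f_cand) / T):
            dims, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = list(dims), f_cur
        T *= cooling                      # geometric cooling schedule
    return best, f_best
```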
The rest of the article is organized as follows. We start by introducing some preliminaries in Section 2. Section 3 describes the proposed Bayesian tensor-on-tensor regression model. We develop an efficient MCMC algorithm to simultaneously estimate the model dimension and parameters in Section 4. An optimization-based, ultra-fast computational algorithm for inference is described in Section 5. Section 6 evaluates the proposed approach via simulation studies and comparisons to alternative methods. Section 7 provides real data analyses on facial imaging data and 3D motion data. Section 8 concludes with a discussion.
2 Preliminaries
2.1 Notations
We begin by introducing notations and operations that will be used throughout the paper. We use uppercase blackboard bold characters ($\mathbb{X}$) to denote tensors, bold uppercase characters ($\mathbf{X}$) to denote matrices, and bold lowercase characters ($\mathbf{a}$) to denote vectors. The order of a tensor is the number of dimensions. For example, $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ denotes an $N$th-order tensor, where $I_n$ denotes the dimension of the $n$th mode, $n = 1, \ldots, N$. The $i$th entry of a vector $\mathbf{a}$ is denoted by $a_i$; the $(i, j)$ element of a matrix $\mathbf{X}$ is denoted by $X_{ij}$; and the entries of a tensor are defined by indices enclosed in square brackets: $\mathbb{X}_{[i_1,\cdots,i_N]}$, where $i_n \in \{1,\cdots,I_n\}$ for $n \in \{1,\cdots,N\}$. The $n$th element in a sequence of matrices or vectors is denoted by a subscript in parentheses. For example, $\mathbf{X}_{(n)}$ denotes the $n$th matrix in a sequence of matrices, and $\mathbf{x}_{(n)}$ denotes the $n$th vector in a sequence of vectors.
The vectorization of a tensor $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ transforms an $N$th-order tensor into a column vector $\mathrm{vec}(\mathbb{X})$ such that the entry $\mathbb{X}_{[i_1,\cdots,i_N]}$ maps to the $j$th entry of $\mathrm{vec}(\mathbb{X})$, that is,

$$\mathbb{X}_{[i_1,\cdots,i_N]} = \mathrm{vec}(\mathbb{X})_j, \qquad (1)$$

where $j = 1 + \sum_{k=1}^{N} (i_k - 1) \prod_{l=1}^{k-1} I_l$. Similarly, $\mathrm{vec}(\mathbf{X})$ is used to denote the vectorization of a matrix $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2}$, i.e., the case $N = 2$ in (1).
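The index map in (1) is column-major ("Fortran") ordering, so it can be checked directly in numpy (our own illustration, using 0-based indices):

```python
import numpy as np

I1, I2, I3 = 3, 4, 5
X = np.random.randn(I1, I2, I3)
v = X.reshape(-1, order='F')        # column-major vec(X): first index fastest

i1, i2, i3 = 2, 1, 3                # 0-based indices
j = i1 + i2 * I1 + i3 * I1 * I2     # 0-based form of j = 1 + sum_k (i_k - 1) prod_l I_l
assert v[j] == X[i1, i2, i3]
```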
Matricization, also known as unfolding, is the process of transforming a tensor into a matrix. The mode-$n$ matricization of a tensor $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is denoted by $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times J}$, where $J = \prod_{k \neq n} I_k$. The entry $\mathbb{X}_{[i_1,\cdots,i_N]}$ of $\mathbb{X}$ maps to the $(i_n, j)$ element of the resulting matrix $\mathbf{X}_{(n)}$, where

$$j = 1 + \sum_{\substack{k=1 \\ k \neq n}}^{N} (i_k - 1) J_k \quad \text{with} \quad J_k = \prod_{\substack{l=1 \\ l \neq n}}^{k-1} I_l.$$
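One common way to realize this unfolding (our own numpy sketch, consistent with the index map above) is to move mode $n$ to the front and reshape in column-major order:

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: rows index mode n; columns index the
    remaining modes in increasing order, column-major."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

X = np.random.randn(3, 4, 5)
X1 = unfold(X, 1)                   # shape (4, 15)
assert X1.shape == (4, 3 * 5)
assert X1[2, 0] == X[0, 2, 0]       # entry (i_n, j) = (2, 0) comes from X[0, 2, 0]
```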
A more general treatment of tensor matricization is defined as follows. Let $\mathcal{R} = \{r_1,\cdots,r_L\}$ and $\mathcal{C} = \{c_1,\cdots,c_M\}$ be two sets of indices such that $\mathcal{R} \cup \mathcal{C} = \{1,\cdots,N\}$ and $\mathcal{R} \cap \mathcal{C} = \emptyset$. Then the matricized tensor can be specified by $\mathbf{X}_{(\mathcal{R} \times \mathcal{C})} \in \mathbb{R}^{J \times K}$, where $J = \prod_{n \in \mathcal{R}} I_n$ and $K = \prod_{n \in \mathcal{C}} I_n$, and the entry $\mathbb{X}_{[i_1,\cdots,i_N]}$ maps to the $(j, k)$ element of the matrix $\mathbf{X}_{(\mathcal{R} \times \mathcal{C})}$, that is,

$$\mathbb{X}_{[i_1,\cdots,i_N]} = \left( \mathbf{X}_{(\mathcal{R} \times \mathcal{C})} \right)_{jk}, \qquad (2)$$

where

$$j = 1 + \sum_{l=1}^{L} \left[ (i_{r_l} - 1) \prod_{l'=1}^{l-1} I_{r_{l'}} \right] \quad \text{and} \quad k = 1 + \sum_{m=1}^{M} \left[ (i_{c_m} - 1) \prod_{m'=1}^{m-1} I_{c_{m'}} \right].$$
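Under the same conventions, the general $(\mathcal{R} \times \mathcal{C})$ matricization is a transpose followed by a column-major reshape (our own sketch; the helper name `matricize` is ours):

```python
import numpy as np

def matricize(X, R, C):
    """General matricization X_(R x C): rows indexed by the modes in R
    (r_1 varying fastest), columns by the modes in C (c_1 fastest)."""
    J = int(np.prod([X.shape[n] for n in R]))
    K = int(np.prod([X.shape[n] for n in C]))
    return np.transpose(X, axes=list(R) + list(C)).reshape(J, K, order='F')

X = np.random.randn(2, 3, 4, 5)
M = matricize(X, R=(0, 2), C=(1, 3))   # shape (2*4, 3*5) = (8, 15)
assert M.shape == (8, 15)
assert M[1, 2] == X[1, 2, 0, 0]        # j = i0 + 2*i2, k = i1 + 3*i3 (0-based)
# The mode-n matricization is the special case R = {n}, C = all other modes.
```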
The Kronecker product of matrices $\mathbf{U} \in \mathbb{R}^{I \times J}$ and $\mathbf{V} \in \mathbb{R}^{K \times L}$ is denoted by $\mathbf{U} \otimes \mathbf{V}$, with the detailed definition and properties given in the Appendix. The product of a tensor and a matrix in mode $n$ is defined as the $n$-mode product. The $n$-mode product of $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$ is denoted by $\mathbb{X} \times_n \mathbf{U}$, resulting in a new tensor $\mathbb{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ whose $[i_1,\cdots,i_{n-1},j,i_{n+1},\cdots,i_N]$ entry is defined by

$$\mathbb{Y}_{[i_1,\cdots,i_{n-1},j,i_{n+1},\cdots,i_N]} = \sum_{i_n=1}^{I_n} \mathbb{X}_{[i_1,\cdots,i_N]} U_{j i_n}.$$
An important fact regarding the $n$-mode product is that, given matrices $\mathbf{U} \in \mathbb{R}^{J_1 \times I_n}$ and $\mathbf{V} \in \mathbb{R}^{J_2 \times I_m}$ with $m \neq n$, and a tensor $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$,

$$\mathbb{X} \times_n \mathbf{U} \times_m \mathbf{V} = (\mathbb{X} \times_n \mathbf{U}) \times_m \mathbf{V} = (\mathbb{X} \times_m \mathbf{V}) \times_n \mathbf{U}.$$
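A numpy sketch of the $n$-mode product (our own helper, built on `np.tensordot`), which also verifies the commutativity property above:

```python
import numpy as np

def mode_n_product(X, U, n):
    """n-mode product X x_n U: contract mode n of X (size I_n) with the
    rows of U (J x I_n), leaving a tensor with dimension J in position n."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

X = np.random.randn(3, 4, 5)
U = np.random.randn(6, 3)            # acts on mode 0 (I_1 = 3)
V = np.random.randn(7, 4)            # acts on mode 1 (I_2 = 4)

Y1 = mode_n_product(mode_n_product(X, U, 0), V, 1)
Y2 = mode_n_product(mode_n_product(X, V, 1), U, 0)
assert Y1.shape == (6, 7, 5)
assert np.allclose(Y1, Y2)           # products along distinct modes commute
```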
For two tensors $\mathbb{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N \times P_1 \times \cdots \times P_L}$ and $\mathbb{Y} \in \mathbb{R}^{P_1 \times \cdots \times P_L \times J_1 \times \cdots \times J_M}$, the contracted tensor product $\langle \mathbb{X}, \mathbb{Y} \rangle_L$ is defined as

$$\mathbb{Z} = \langle \mathbb{X}, \mathbb{Y} \rangle_L \in \mathbb{R}^{I_1 \times \cdots \times I_N \times J_1 \times \cdots \times J_M},$$

with entries

$$\mathbb{Z}_{[i_1,\cdots,i_N,j_1,\cdots,j_M]} = \sum_{p_1=1}^{P_1} \cdots \sum_{p_L=1}^{P_L} \mathbb{X}_{[i_1,\cdots,i_N,p_1,\cdots,p_L]} \, \mathbb{Y}_{[p_1,\cdots,p_L,j_1,\cdots,j_M]}.$$
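In numpy, this contraction is exactly `np.tensordot` with an integer `axes=L` argument (our own minimal check, illustrative dimensions):

```python
import numpy as np

# Contracted tensor product <X, Y>_L with L = 2 shared modes of size (P1, P2).
I1, I2, P1, P2, J1 = 3, 4, 5, 6, 2
X = np.random.randn(I1, I2, P1, P2)
Y = np.random.randn(P1, P2, J1)

Z = np.tensordot(X, Y, axes=2)   # sums over the last 2 modes of X, first 2 of Y
assert Z.shape == (I1, I2, J1)
```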