Bayesian Tensor-on-Tensor Regression with Efficient
Computation
Kunbo Wang Yanxun Xu*
Department of Applied Mathematics and Statistics, Johns Hopkins University
Abstract
We propose a Bayesian tensor-on-tensor regression approach to predict a multidimensional array (tensor) of arbitrary dimensions from another tensor of arbitrary dimensions, building upon the Tucker decomposition of the regression coefficient tensor. Traditional tensor regression methods making use of the Tucker decomposition either assume the dimension of the core tensor to be known or estimate it via cross-validation or some model selection criteria. However, no existing method can simultaneously estimate the model dimension (the dimension of the core tensor) and other model parameters. To fill this gap, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm to estimate both the model dimension and parameters for posterior inference. Besides the MCMC sampler, we also develop an ultra-fast optimization-based computing algorithm wherein the maximum a posteriori estimators for parameters are computed, and the model dimension is optimized via a simulated annealing algorithm. The proposed Bayesian framework provides a natural way for uncertainty quantification. Through extensive simulation studies, we evaluate the proposed Bayesian tensor-on-tensor regression model and show its superior performance compared to alternative methods. We also demonstrate its practical effectiveness by applying it to two real-world datasets, including facial imaging data and 3D motion data.
1 Introduction
Multi-dimensional arrays, also called tensors, are widely used to represent data with complex structures in different fields such as genomics, neuroscience, computer vision, and graph analysis. For example, a multi-tissue experiment (Wang et al., 2019) collects gene expression data in different tissues from different individuals, leading to three-dimensional arrays
($Genes \times Tissues \times Individuals$). Other notable examples include magnetic resonance imaging data (MRI, three-dimensional arrays), functional MRI (fMRI) data (four-dimensional arrays), and facial images (four-dimensional arrays) (Vasilescu and Terzopoulos, 2002; Hasan et al., 2011; Guhaniyogi and Spencer, 2021). In this paper, we focus on the task of tensor-on-tensor regression that predicts one multi-dimensional tensor from another multi-dimensional tensor, e.g., predicting gene expression across multiple tissues for multiple individuals from their clinical/omics data with tensor structures.
One simple approach to tensor-on-tensor regression is to turn tensors into vectors and then apply classic regression methods. However, such a treatment introduces high-dimensional unstructured vectors and destroys the correlation structure of the data, resulting in a huge number of parameters to be estimated and a potentially significant loss of information. For example, to predict a response tensor of dimensions $N \times Q_1 \times Q_2$ from a predictor tensor of dimensions $N \times P_1 \times P_2$, the classic linear regression method requires estimating $P_1 \times P_2 \times Q_1 \times Q_2$ parameters, which may cause overfitting or computational issues, especially when the number of parameters is larger than the sample size $N$.
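As a rough illustration of this blow-up (our own sketch with made-up dimensions, not an example from the paper), vectorizing the tensors and fitting ordinary least squares yields a coefficient matrix with far more free entries than samples:

```python
import numpy as np

# Illustrative dimensions: 50 samples, 16x16 predictors, 8x8 responses.
N, P1, P2, Q1, Q2 = 50, 16, 16, 8, 8
X = np.random.randn(N, P1, P2)       # predictor tensor
Y = np.random.randn(N, Q1, Q2)       # response tensor

# Naive approach: vectorize each sample and fit an unstructured linear map.
X_flat = X.reshape(N, P1 * P2)       # N x 256 design matrix
Y_flat = Y.reshape(N, Q1 * Q2)       # N x 64 response matrix
B, *_ = np.linalg.lstsq(X_flat, Y_flat, rcond=None)

# P1*P2*Q1*Q2 = 16,384 free parameters versus N = 50 samples: the
# least-squares problem is underdetermined and the tensor structure is lost.
print(B.shape, B.size)               # (256, 64) 16384
```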
To reduce the number of free parameters while preserving the correlation structure in modeling tensor data, tensor decomposition techniques have been widely applied (Kolda and Bader, 2009). The two most commonly used tensor decomposition methods are the PARAFAC/CANDECOMP (CP) decomposition (Harshman, 1970) and the Tucker decomposition (Tucker, 1966). The CP decomposition reconstructs a tensor as a linear combination of rank-1 tensors, each of which is represented as the outer product of a number of vectors. On the other hand, the Tucker decomposition factorizes a tensor into a small core tensor and a set of factor matrices, one along each dimension. Both decomposition methods are able to reduce the model dimensionality to a manageable size and make parameter estimation more efficient. Compared to the CP decomposition, the Tucker decomposition allows a more flexible correlation structure, captured by the core tensor, and the freedom to choose a different dimension along each mode, making it useful for analyzing data with skewed dimensions (Li et al., 2013). In fact, the CP decomposition is a special case of the Tucker decomposition in which the core tensor is superdiagonal.
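To make the two decompositions concrete, the following numpy sketch (our own illustration with arbitrary dimensions) reconstructs a third-order tensor from a Tucker factorization via one factor matrix per mode, and recovers the CP factorization as the special case of a superdiagonal core:

```python
import numpy as np

I1, I2, I3 = 10, 12, 8                      # tensor dimensions (illustrative)
R1, R2, R3 = 3, 4, 2                        # Tucker core dimensions

G  = np.random.randn(R1, R2, R3)            # small core tensor
U1 = np.random.randn(I1, R1)                # factor matrix for mode 1
U2 = np.random.randn(I2, R2)                # factor matrix for mode 2
U3 = np.random.randn(I3, R3)                # factor matrix for mode 3

# Tucker reconstruction: X = G x_1 U1 x_2 U2 x_3 U3
X_tucker = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

# CP is Tucker with a superdiagonal core: all modes share a common rank R.
R = 3
D = np.zeros((R, R, R))
D[np.arange(R), np.arange(R), np.arange(R)] = 1.0     # superdiagonal core
A1, A2, A3 = (np.random.randn(I, R) for I in (I1, I2, I3))
X_cp = np.einsum('abc,ia,jb,kc->ijk', D, A1, A2, A3)  # sum of R rank-1 tensors
```

Note how the Tucker core dimensions $(R_1, R_2, R_3)$ may differ across modes, whereas CP forces a single shared rank $R$.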
There is a rich literature on regression methods treating tensors as either predictors or responses, in both frequentist and Bayesian statistics. Guo et al. (2012) and Zhou et al. (2013) proposed tensor regression models to predict scalar outcomes from tensor predictors by assuming that the coefficient tensor has a low-rank CP decomposition. Li et al. (2013) later extended the framework by employing the Tucker decomposition for the coefficient tensor, and demonstrated that the Tucker decomposition is more suitable for tensor predictors of skewed dimensions and gains better accuracy in neuroimaging data analysis. Guhaniyogi et al. (2017) proposed a Bayesian approach to regression with a scalar response on tensor
predictors by developing a multiway Dirichlet generalized double Pareto prior on the tensor margins after applying the CP decomposition to the coefficient tensor. Miranda et al. (2018) developed a Bayesian tensor partition regression model using a generalized linear model with a sparsity-inducing normal mixture prior to learn the relationship between a matrix response (clinical outcomes) and a tensor predictor (imaging data). Li and Zhang (2017) proposed a parsimonious regression model with a tensor response and vector predictors, adopting a generalized sparsity principle based on the Tucker decomposition. To detect neuronal activation in fMRI experiments, Guhaniyogi and Spencer (2021) developed a Bayesian regression approach with a tensor response on scalar predictors by introducing a novel multiway stick-breaking shrinkage prior distribution on tensor-structured regression coefficients.
There exist many scientific applications that require methods for predicting a tensor response from another tensor predictor. One typical example in fMRI studies is detecting brain regions activated by an external stimulus or condition (Zhang et al., 2015). Hoff (2015) proposed a tensor-on-tensor bilinear regression framework, making use of the Tucker decomposition, to handle the special case where the tensor predictor has the same dimensions as the tensor response. Billio et al. (2018) introduced a Bayesian tensor autoregressive model to tackle tensor-on-tensor regression, using the CP decomposition to provide a parsimonious parametrization. Lock (2018) proposed to predict a tensor response from another tensor predictor by assuming that the coefficient tensor has a low-rank CP factorization. Gahrooei et al. (2021) extended the work of Lock (2018) to allow multiple tensor inputs under the Tucker decomposition framework.
Despite advances in methods for dealing with tensor data, the aforementioned approaches have some limitations. First, tensor-on-tensor regression methods based on the CP decomposition, e.g., Lock (2018), require both the response tensor and the predictor tensor to have the same rank in the CP decomposition, making them restrictive when the response and predictor tensors have different ranks. Second, the rank in the CP decomposition and the dimension of the core tensor in the Tucker decomposition (i.e., the model dimension) are essential for statistical inference in tensor-on-tensor regression models. However, they are either assumed known or estimated via cross-validation (Gahrooei et al., 2021) or some model selection criterion, such as the Bayesian information criterion (Guhaniyogi and Spencer, 2021). To the best of our knowledge, there is no existing method that can simultaneously estimate the model dimension and parameters.
In this paper, we develop a novel Bayesian approach for tensor-on-tensor regression based on the Tucker decomposition of the coefficient tensor. The main contributions of this work are threefold. First, our Bayesian framework is built upon the flexible Tucker decomposition, so that the response tensor and the predictor tensor can have different dimensions in the core
tensor. Second, we propose an efficient Markov chain Monte Carlo (MCMC) algorithm to simultaneously estimate the model dimension (the dimension of the core tensor) and the parameters. The resulting posterior inference naturally offers a characterization of uncertainty in parameter estimation and prediction. Third, as an alternative to MCMC, we develop an ultra-fast computing algorithm in which the maximum a posteriori (MAP) estimators for the parameters are computed while the dimension of the core tensor is optimized via a simulated annealing (SA) algorithm (Kirkpatrick et al., 1983).
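To convey the flavor of the SA component only (a minimal sketch under our own simplifications, not the algorithm developed in Section 5), one can anneal over integer core dimensions while treating the MAP fit as a black box; the `score` callable below is a hypothetical placeholder for fitting the model at a fixed core dimension and returning a criterion to minimize:

```python
import math
import random

def anneal_core_dims(score, dims0, dims_max, n_iter=500, T0=1.0, cooling=0.99):
    """Simulated annealing over integer core-tensor dimensions.

    score(dims): hypothetical black box that computes MAP estimates for a
    fixed core dimension `dims` and returns an objective to minimize.
    """
    dims, best = list(dims0), list(dims0)
    f_cur = f_best = score(dims)
    T = T0
    for _ in range(n_iter):
        # Propose perturbing one core dimension by +/-1, staying within bounds.
        cand = list(dims)
        k = random.randrange(len(cand))
        cand[k] = min(dims_max[k], max(1, cand[k] + random.choice((-1, 1))))
        f_cand = score(cand)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-(f_cand - f_cur) / T).
        if f_cand < f_cur or random.random() < math.exp((f_cur - f_cand) / T):
            dims, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = list(dims), f_cur
        T *= cooling                      # geometric cooling schedule
    return best, f_best
```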
The rest of the article is organized as follows. We start by introducing some preliminaries in Section 2. Section 3 describes the proposed Bayesian tensor-on-tensor regression model. We develop an efficient MCMC algorithm to simultaneously estimate the model dimension and parameters in Section 4. An optimization-based, ultra-fast computational algorithm for inference is described in Section 5. Section 6 evaluates the proposed approach via simulation studies and comparisons to alternative methods. Section 7 provides real data analyses on facial imaging data and 3D motion data. Section 8 concludes with a discussion.
2 Preliminaries
2.1 Notations
We begin by introducing notations and operations that will be used throughout the paper. We use uppercase blackboard bold characters ($\mathbb{X}$) to denote tensors, bold uppercase characters ($\mathbf{X}$) to denote matrices, and bold lowercase characters ($\mathbf{a}$) to denote vectors. The order of a tensor is the number of dimensions. For example, $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ denotes an $N$th-order tensor, where $I_n$ denotes the dimension of the $n$th mode, $n = 1, \ldots, N$. The $i$th entry of a vector $\mathbf{a}$ is denoted by $a_i$; the $(i, j)$ element of a matrix $\mathbf{X}$ is denoted by $X_{ij}$; and the entries of a tensor are defined by indices enclosed in square brackets: $\mathbb{X}_{[i_1,\cdots,i_N]}$, where $i_n \in \{1,\cdots,I_n\}$ for $n \in \{1,\cdots,N\}$. The $n$th element in a sequence of matrices or vectors is denoted by a subscript in parentheses. For example, $\mathbf{X}_{(n)}$ denotes the $n$th matrix in a sequence of matrices, and $\mathbf{x}_{(n)}$ denotes the $n$th vector in a sequence of vectors.
The vectorization of a tensor $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ transforms an $N$th-order tensor into a column vector $\mathrm{vec}(\mathbb{X})$ such that the entry $\mathbb{X}_{[i_1,\cdots,i_N]}$ maps to the $j$th entry of $\mathrm{vec}(\mathbb{X})$, that is,

$$\mathbb{X}_{[i_1,\cdots,i_N]} = \mathrm{vec}(\mathbb{X})_j, \qquad (1)$$

where $j = 1 + \sum_{k=1}^{N} (i_k - 1) \prod_{l=1}^{k-1} I_l$. Similarly, $\mathrm{vec}(\mathbf{X})$ is used to denote the vectorization of a matrix $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2}$, i.e., the case $N = 2$ in (1).
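The index map in (1) is column-major ("Fortran") ordering, so it can be checked directly in numpy (our own illustration, using 0-based indices):

```python
import numpy as np

I1, I2, I3 = 3, 4, 5
X = np.random.randn(I1, I2, I3)
v = X.reshape(-1, order='F')        # column-major vec(X): first index fastest

i1, i2, i3 = 2, 1, 3                # 0-based indices
j = i1 + i2 * I1 + i3 * I1 * I2     # 0-based form of j = 1 + sum_k (i_k - 1) prod_l I_l
assert v[j] == X[i1, i2, i3]
```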
Matricization, also known as unfolding, is the process of transforming a tensor into a matrix. The mode-$n$ matricization of a tensor $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is denoted by $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times J}$, where $J = \prod_{k \neq n} I_k$. The entry $\mathbb{X}_{[i_1,\cdots,i_N]}$ of $\mathbb{X}$ maps to the $(i_n, j)$ element of the resulting matrix $\mathbf{X}_{(n)}$, where

$$j = 1 + \sum_{\substack{k=1 \\ k \neq n}}^{N} (i_k - 1) J_k \quad \text{with} \quad J_k = \prod_{\substack{l=1 \\ l \neq n}}^{k-1} I_l.$$
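One common way to realize this unfolding (our own numpy sketch, consistent with the index map above) is to move mode $n$ to the front and reshape in column-major order:

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: rows index mode n; columns index the
    remaining modes in increasing order, column-major."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

X = np.random.randn(3, 4, 5)
X1 = unfold(X, 1)                   # shape (4, 15)
assert X1.shape == (4, 3 * 5)
assert X1[2, 0] == X[0, 2, 0]       # entry (i_n, j) = (2, 0) comes from X[0, 2, 0]
```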
A more general treatment of tensor matricization is defined as follows. Let $\mathcal{R} = \{r_1,\cdots,r_L\}$ and $\mathcal{C} = \{c_1,\cdots,c_M\}$ be two sets of indices such that $\mathcal{R} \cup \mathcal{C} = \{1,\cdots,N\}$ and $\mathcal{R} \cap \mathcal{C} = \emptyset$. Then the matricized tensor can be specified by $\mathbf{X}_{(\mathcal{R} \times \mathcal{C})} \in \mathbb{R}^{J \times K}$, where $J = \prod_{n \in \mathcal{R}} I_n$ and $K = \prod_{n \in \mathcal{C}} I_n$, and the entry $\mathbb{X}_{[i_1,\cdots,i_N]}$ maps to the $(j, k)$ element of the matrix $\mathbf{X}_{(\mathcal{R} \times \mathcal{C})}$, that is,

$$\mathbb{X}_{[i_1,\cdots,i_N]} = \left( \mathbf{X}_{(\mathcal{R} \times \mathcal{C})} \right)_{jk}, \qquad (2)$$

where

$$j = 1 + \sum_{l=1}^{L} \left[ (i_{r_l} - 1) \prod_{l'=1}^{l-1} I_{r_{l'}} \right] \quad \text{and} \quad k = 1 + \sum_{m=1}^{M} \left[ (i_{c_m} - 1) \prod_{m'=1}^{m-1} I_{c_{m'}} \right].$$
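Under the same conventions, the general $(\mathcal{R} \times \mathcal{C})$ matricization is a transpose followed by a column-major reshape (our own sketch; the helper name `matricize` is ours):

```python
import numpy as np

def matricize(X, R, C):
    """General matricization X_(R x C): rows indexed by the modes in R
    (r_1 varying fastest), columns by the modes in C (c_1 fastest)."""
    J = int(np.prod([X.shape[n] for n in R]))
    K = int(np.prod([X.shape[n] for n in C]))
    return np.transpose(X, axes=list(R) + list(C)).reshape(J, K, order='F')

X = np.random.randn(2, 3, 4, 5)
M = matricize(X, R=(0, 2), C=(1, 3))   # shape (2*4, 3*5) = (8, 15)
assert M.shape == (8, 15)
assert M[1, 2] == X[1, 2, 0, 0]        # j = i0 + 2*i2, k = i1 + 3*i3 (0-based)
# The mode-n matricization is the special case R = {n}, C = all other modes.
```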
The Kronecker product of matrices $\mathbf{U} \in \mathbb{R}^{I \times J}$ and $\mathbf{V} \in \mathbb{R}^{K \times L}$ is denoted by $\mathbf{U} \otimes \mathbf{V}$, with the detailed definition and properties given in the Appendix. The product of a tensor and a matrix in mode $n$ is defined as the $n$-mode product. The $n$-mode product of $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$ is denoted by $\mathbb{X} \times_n \mathbf{U}$, resulting in a new tensor $\mathbb{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ whose $[i_1,\cdots,i_{n-1},j,i_{n+1},\cdots,i_N]$ entry is defined by

$$\mathbb{Y}_{[i_1,\cdots,i_{n-1},j,i_{n+1},\cdots,i_N]} = \sum_{i_n=1}^{I_n} \mathbb{X}_{[i_1,\cdots,i_N]} U_{j i_n}.$$
An important fact regarding the $n$-mode product is that, given matrices $\mathbf{U} \in \mathbb{R}^{J_1 \times I_n}$ and $\mathbf{V} \in \mathbb{R}^{J_2 \times I_m}$ with $m \neq n$, and a tensor $\mathbb{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$,

$$\mathbb{X} \times_n \mathbf{U} \times_m \mathbf{V} = (\mathbb{X} \times_n \mathbf{U}) \times_m \mathbf{V} = (\mathbb{X} \times_m \mathbf{V}) \times_n \mathbf{U}.$$
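A numpy sketch of the $n$-mode product (our own helper, built on `np.tensordot`), which also verifies the commutativity property above:

```python
import numpy as np

def mode_n_product(X, U, n):
    """n-mode product X x_n U: contract mode n of X (size I_n) with the
    rows of U (J x I_n), leaving a tensor with dimension J in position n."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

X = np.random.randn(3, 4, 5)
U = np.random.randn(6, 3)            # acts on mode 0 (I_1 = 3)
V = np.random.randn(7, 4)            # acts on mode 1 (I_2 = 4)

Y1 = mode_n_product(mode_n_product(X, U, 0), V, 1)
Y2 = mode_n_product(mode_n_product(X, V, 1), U, 0)
assert Y1.shape == (6, 7, 5)
assert np.allclose(Y1, Y2)           # products along distinct modes commute
```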
For two tensors $\mathbb{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N \times P_1 \times \cdots \times P_L}$ and $\mathbb{Y} \in \mathbb{R}^{P_1 \times \cdots \times P_L \times J_1 \times \cdots \times J_M}$, the contracted tensor product $\langle \mathbb{X}, \mathbb{Y} \rangle_L$ is defined as

$$\mathbb{Z} = \langle \mathbb{X}, \mathbb{Y} \rangle_L \in \mathbb{R}^{I_1 \times \cdots \times I_N \times J_1 \times \cdots \times J_M},$$

with entries

$$\mathbb{Z}_{[i_1,\cdots,i_N,j_1,\cdots,j_M]} = \sum_{p_1=1}^{P_1} \cdots \sum_{p_L=1}^{P_L} \mathbb{X}_{[i_1,\cdots,i_N,p_1,\cdots,p_L]} \, \mathbb{Y}_{[p_1,\cdots,p_L,j_1,\cdots,j_M]}.$$
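In numpy, this contraction is exactly `np.tensordot` with an integer `axes=L` argument (our own minimal check, illustrative dimensions):

```python
import numpy as np

# Contracted tensor product <X, Y>_L with L = 2 shared modes of size (P1, P2).
I1, I2, P1, P2, J1 = 3, 4, 5, 6, 2
X = np.random.randn(I1, I2, P1, P2)
Y = np.random.randn(P1, P2, J1)

Z = np.tensordot(X, Y, axes=2)   # sums over the last 2 modes of X, first 2 of Y
assert Z.shape == (I1, I2, J1)
```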