Sampling-Based Decomposition Algorithms for Arbitrary
Tensor Networks
Osman Asif Malik*   Vivek Bharadwaj†   Riley Murray‡
Abstract
We show how to develop sampling-based alternating least squares (ALS) algorithms for
decomposition of tensors into any tensor network (TN) format. Provided the TN format
satisfies certain mild assumptions, the resulting algorithms will have input sublinear per-iteration
cost. Unlike most previous works on sampling-based ALS methods for tensor decomposition,
the sampling in our framework is done according to the exact leverage score distribution of the
design matrices in the ALS subproblems. We implement and test two tensor decomposition
algorithms that use our sampling framework in a feature extraction experiment where we
compare them against a number of other decomposition algorithms.
1 Introduction
Tensor decomposition has emerged as an important tool in data mining and machine learning
[24, 4, 5, 11]. Applications in data mining include network analysis, web mining, topic modeling
and recommendation systems. Tensor decomposition is used widely in machine learning for tasks
such as parameter reduction in neural networks, understanding the expressiveness of deep neural networks,
supervised learning and feature extraction.
Due to the multidimensional nature of tensors, they are inherently plagued by the curse of dimensionality. For example, representing a tensor $\mathcal{X} \in \mathbb{R}^{I \times \cdots \times I}$ with $N$ modes requires $I^N$ numbers. This exponential dependence on $N$ makes its way into algorithms for computing tensor decompositions. Alternating least squares (ALS) is arguably the most popular and successful approach for computing a wide range of tensor decompositions. When decomposing a tensor $\mathcal{X}$, each iteration of ALS involves solving a sequence of least squares problems for which the entries of $\mathcal{X}$ feature as the dependent variables. The per-iteration cost of ALS therefore naturally inherits the exponential dependence on $N$.
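To make this cost concrete, the following is a minimal NumPy sketch of ALS for a rank-$R$ CP decomposition of a 3-way tensor. It is our own illustration rather than the algorithm proposed in this paper, and the names `cp_als` and `khatri_rao` are ours; its purpose is to show that every factor update is a least squares solve whose right-hand side is a full unfolding of the data tensor, so each iteration reads all of its entries.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x R) and B (J x R), giving an IJ x R matrix."""
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def cp_als(X, rank, n_iter=25, seed=0):
    """Rank-`rank` CP decomposition of a 3-way array X by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Each update solves a least squares problem whose right-hand side is a
        # full unfolding of X, so all I*J*K entries are touched in every iteration.
        A = np.linalg.lstsq(khatri_rao(B, C), X.reshape(I, J * K).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X.transpose(1, 0, 2).reshape(J, I * K).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X.transpose(2, 0, 1).reshape(K, I * J).T, rcond=None)[0].T
    return A, B, C
```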
A large number of papers have sought to reduce the cost of tensor decomposition by leveraging
techniques from randomized numerical linear algebra. One particularly interesting line of work
[3, 14, 9, 19, 18] seeks to construct ALS algorithms with a per-iteration cost which is sublinear in the number of tensor entries, i.e., $o(I^N)$. Since any algorithm considering all entries of $\mathcal{X}$ immediately incurs a cost of $\Omega(I^N)$, these works have all resorted to sampling-based techniques.
* Lawrence Berkeley National Laboratory, oamalik@lbl.gov
† UC Berkeley, vivek_bharadwaj@berkeley.edu
‡ UC Berkeley, rjmurray@berkeley.edu
More precisely, they all sample the ALS subproblems according to the leverage score distribution
(or an approximation thereof). This is done efficiently by taking advantage of the special structure
of the design matrices in the ALS subproblems for tensor decomposition.
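For readers unfamiliar with the technique, the sketch below shows what leverage score sampling of a single overdetermined least squares problem $\min_x \|Ax - b\|_2$ looks like in the dense case. It is an illustrative snippet of ours (the helper name `leverage_score_solve` is hypothetical): here the scores are read off a thin QR factorization of $A$, whereas the works discussed above never form the design matrix explicitly and instead exploit its structure to sample from the same distribution, or an approximation of it, efficiently.

```python
import numpy as np

def leverage_score_solve(A, b, n_samples, seed=0):
    """Approximately solve min_x ||A x - b||_2 by leverage score row sampling.
    A is n x d with n >> d; the sketch keeps only `n_samples` rescaled rows."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(A, mode="reduced")         # thin QR: A = Q R, Q is n x d
    scores = np.sum(Q**2, axis=1)                  # i-th leverage score = ||Q[i, :]||_2^2
    probs = scores / scores.sum()                  # exact leverage score distribution
    idx = rng.choice(A.shape[0], size=n_samples, replace=True, p=probs)
    scale = 1.0 / np.sqrt(n_samples * probs[idx])  # rescaling keeps the sketched normal equations unbiased
    x_hat, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x_hat
```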
The previous works discussed above develop methods for specific tensor decompositions: the CP decomposition in [3, 14, 18], the Tucker decomposition in [9], and the tensor ring decomposition in [19, 18]. In this paper, we consider all decompositions that can be expressed in tensor network
(TN) format. The TN format allows for a very wide range of decompositions, including the CP,
Tucker, tensor train and tensor ring decompositions. The following summarizes our contributions:
• We first show how to efficiently sample rows of any tall-and-skinny matrix which is in TN format according to the exact leverage score distribution.
• We then show how this sampling technique can be used to yield ALS algorithms with a per-iteration cost which is input sublinear for all TN decompositions that satisfy certain mild assumptions.
The decomposition framework we present builds on the work in [18]. That paper is notable since
it provided the first sampling-based ALS methods for CP and tensor ring decomposition with a
per-iteration cost depending polynomially on N; the dependence in earlier works was exponential [3,
14,19]. We are able to substantially simplify the scheme in [18] by entirely avoiding the complicated
recursive sketching procedure that it relies on. This makes implementation easier and is also what
ensures that the sampling is done according to the exact leverage score distribution rather than an
approximation of it. The simplification is also what paves the way for generalization to arbitrary
TN formats.
2 Related Work
Cheng et al. [3] develop the first ALS method for CP decomposition with an input sublinear per-
iteration cost. Their method uses a mixture of leverage score and row-norm sampling applied to
the matricized-tensor-times-Khatri–Rao product (MTTKRP) which arises as a key computational
kernel in CP decomposition. Larsen and Kolda [14] use leverage score sampling to reduce the size
of each ALS subproblem. Their method has improved theoretical guarantees compared to those in
[3] as well as improved scalability due to various practical improvements. Malik and Becker [19]
propose a sampling-based ALS method for tensor ring decomposition which also uses leverage score
sampling to reduce the size of the ALS subproblems.
All three papers [3, 14, 19] require a number of samples which scales exponentially with the number of input tensor modes, $N$, for performance guarantees to hold. This translates into a per-iteration cost which is $\Omega(R^{N+1})$ for the CP decomposition [3, 14] and $\Omega(R^{2N+2})$ for the tensor ring decomposition [19], where $R$ is the relevant notion of rank. For low-rank decompositions ($R \ll I$), the cost will be input sublinear despite this exponential dependence, but for higher ranks it might not be (recall that, unlike in matrix decomposition, we can have $R > I$ in tensor decomposition). Malik [18] develops methods for both CP and tensor ring decomposition which avoid this exponential dependence on $N$, instead improving it to a polynomial dependence. This is achieved by sampling from a distribution much closer to the exact leverage score distribution than the previous works [3, 14, 19] do. Since our work builds on and improves the scheme in [18], it also has a polynomial dependence on $N$ when used for CP and tensor ring decomposition (see Tables 1 and 2).
Fahrbach et al. [9] develop a method for efficient sampling of ridge regression problems involving
Kronecker product design matrices. They use these techniques to achieve an efficient sampling-based
ALS method for regularized Tucker decomposition.
Efficient sketching of structured matrices is an active research area with applications beyond
tensor decomposition. The recent work by Ma and Solomonik [16] is particularly interesting since
it proposes structured sketching operators for general TN matrices. These operators take the
form of TNs made up of Gaussian tensors and are therefore dense. When used in ALS for tensor
decomposition, their sketching operators therefore need to access all entries of the data tensor in each least squares solve, leading to a per-iteration cost that is at least linear in the input tensor size (and therefore exponential in $N$).
Due to space constraints, we have focused on previous works that develop sampling-based ALS
methods here. There is a large number of works that develop randomized tensor decomposition
methods using other techniques. For a comprehensive list of such works, see [18, Sec. 2]. For an
overview of work on structured sketching, see [20, Sec. 1.2].
3 Preliminaries
3.1 General Notation
By tensor, we mean a multidimensional array containing real numbers. We refer to a tensor with $N$ indices as an $N$-way tensor and we say that it is of order $N$ or that it has $N$ modes. We use bold uppercase Euler script letters (e.g., $\mathcal{X}$) to denote tensors of order 3 or greater, bold uppercase letters (e.g., $\mathbf{X}$) to denote matrices, bold lowercase letters (e.g., $\mathbf{x}$) to denote vectors, and regular lowercase letters to denote scalars (e.g., $x$). We denote entries of an object either in parentheses or with subscripts. For example, $\mathcal{X}(i, j, k) = X_{ijk}$ is the entry in position $(i, j, k)$ in the 3-way tensor $\mathcal{X}$, and $\mathbf{x}(i) = x_i$ is the $i$th entry in the vector $\mathbf{x}$. We use a colon to denote all entries along a particular mode. For example, $\mathbf{X}(i, :)$ denotes the $i$th row of the matrix $\mathbf{X}$. Superscripts in parentheses will be used to denote a sequence of objects. For example, $\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(M)}$ is a sequence of tensors and $\mathbf{e}^{(i)}$ is the $i$th canonical basis vector. The values that a tensor's indices can take are referred to as dimensions. For example, if $\mathcal{X} = (X_{ijk})_{i=1, j=1, k=1}^{I, J, K}$, then the dimensions of modes 1, 2 and 3 are $I$, $J$ and $K$, respectively. Note that dimension does not refer to the number of modes, which is 3 for $\mathcal{X}$. For a positive integer $I$, we use the notation $[I] \overset{\text{def}}{=} \{1, \ldots, I\}$. For indices $i_1 \in [I_1], \ldots, i_N \in [I_N]$, the notation $\overline{i_1 \cdots i_N} \overset{\text{def}}{=} 1 + \sum_{n=1}^{N} (i_n - 1) \prod_{j=1}^{n-1} I_j$ will be used to denote the linear index corresponding to the multi-index $(i_1, \ldots, i_N)$. The pseudoinverse of a matrix $\mathbf{A}$ is denoted by $\mathbf{A}^+$. We use $\otimes$, $\odot$ and $\circledast$ to denote the Kronecker, Khatri–Rao and Hadamard matrix products which are defined in Section A.
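As a small sanity check of the linear index convention above (an illustrative snippet of ours; `linear_index` is a hypothetical helper), note that the first index varies fastest, which matches Fortran-order raveling once the 1-based indices are shifted to 0-based ones.

```python
import numpy as np

def linear_index(multi_index, dims):
    """1-based linear index of the 1-based multi-index (i_1, ..., i_N) for mode dimensions (I_1, ..., I_N)."""
    lin, stride = 1, 1
    for i_n, I_n in zip(multi_index, dims):
        lin += (i_n - 1) * stride   # (i_n - 1) * prod_{j < n} I_j
        stride *= I_n
    return lin

dims = (3, 4, 5)                                   # I_1, I_2, I_3
assert linear_index((1, 1, 1), dims) == 1
assert linear_index((2, 1, 1), dims) == 2          # the first index varies fastest
assert linear_index((1, 2, 1), dims) == 4          # stride of mode 2 is I_1 = 3
# Agrees with NumPy's Fortran-order raveling after shifting to 0-based indices:
assert linear_index((2, 3, 4), dims) == np.ravel_multi_index((1, 2, 3), dims, order="F") + 1
```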
3.2 Tensor Networks
A tensor network (TN) consists of tensors and connections between them that indicate how they
should be contracted with each other. Since the mathematical notation easily gets unwieldy when
working with TNs, it is common practice to use graphical representations when discussing such
networks. Figure 1 shows how scalars, vectors, matrices and tensors can be represented graphically.
A circle or node is used to represent a tensor, and dangling edges are used to indicate modes of the
tensor.
Figure 1: Graphical TN representation of a (a) scalar, (b) vector, (c) matrix and (d) 4-way tensor.
Contraction between two tensors or a tensor with itself can be represented by connecting the
appropriate dangling edges. This is illustrated in Figure 2. In mathematical notation, these contractions can be written elementwise as
(a) $\sum_i A_{ii} = \operatorname{trace}(\mathbf{A}) = c$,
(b) $\sum_j A_{ij} x_j = y_i$,
(c) $\sum_j A_{ij} B_{jk} = C_{ik}$,
(d) $\sum_{\ell m n} X_{\ell m n} A_{i\ell} B_{jm} C_{kn} = Y_{ijk}$.
Figure 2: TN representation of (a) matrix trace, (b) matrix-vector multiplication, (c) matrix-
matrix multiplication and (d) a 3-way Tucker decomposition.
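The four contractions above map directly onto einsum subscript strings. The snippet below is our own illustration with arbitrarily chosen shapes; it evaluates (a)-(d) with numpy.einsum, where U, V and W play the roles of the factor matrices A, B and C in (d).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))                  # square matrix, needed for the trace in (a)
B = rng.standard_normal((4, 6))
x = rng.standard_normal(4)
X = rng.standard_normal((3, 5, 7))               # core tensor for the Tucker contraction (d)
U = rng.standard_normal((8, 3))                  # factors for (d): U_{il}, V_{jm}, W_{kn}
V = rng.standard_normal((9, 5))
W = rng.standard_normal((10, 7))

c = np.einsum("ii->", A)                         # (a) sum_i A_ii = trace(A)
y = np.einsum("ij,j->i", A, x)                   # (b) sum_j A_ij x_j = y_i
C = np.einsum("ij,jk->ik", A, B)                 # (c) sum_j A_ij B_jk = C_ik
Y = np.einsum("lmn,il,jm,kn->ijk", X, U, V, W)   # (d) sum_{lmn} X_lmn U_il V_jm W_kn = Y_ijk
```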
To reduce computational cost when contracting a TN, it is optimal to contract two tensors at
a time [26]. Determining the optimal contraction order is NP-hard, and developing heuristics for
this problem is an active research area [26,21]. There are several software packages that compute
exact or approximate optimal contraction orders, e.g., NCON [25] for Matlab and the opt_einsum package for Python (available at https://github.com/dgasmith/opt_einsum).
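As a brief illustration of such tooling, the snippet below asks opt_einsum for a pairwise contraction order for the Tucker-style network of Figure 2(d) and then evaluates it along that path. This is a sketch of ours (the shapes and the particular contraction are arbitrary choices), assuming the opt_einsum package is installed.

```python
import numpy as np
import opt_einsum as oe

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5, 7))     # core tensor
A = rng.standard_normal((20, 3))       # factor matrices
B = rng.standard_normal((20, 5))
C = rng.standard_normal((20, 7))

# contract_path chooses a sequence of pairwise contractions and reports its cost;
# contract then evaluates the network along that path.
path, info = oe.contract_path("lmn,il,jm,kn->ijk", X, A, B, C)
Y = oe.contract("lmn,il,jm,kn->ijk", X, A, B, C, optimize=path)
print(info)                            # summary of the chosen order and estimated FLOP count
```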