Sampling-Based Decomposition Algorithms for Arbitrary
Tensor Networks
Osman Asif Malik*   Vivek Bharadwaj†   Riley Murray‡
Abstract
We show how to develop sampling-based alternating least squares (ALS) algorithms for
decomposition of tensors into any tensor network (TN) format. Provided the TN format
satisfies certain mild assumptions, the resulting algorithms will have input sublinear per-iteration
cost. Unlike most previous works on sampling-based ALS methods for tensor decomposition,
the sampling in our framework is done according to the exact leverage score distribution of the
design matrices in the ALS subproblems. We implement and test two tensor decomposition
algorithms that use our sampling framework in a feature extraction experiment where we
compare them against a number of other decomposition algorithms.
1 Introduction
Tensor decomposition has emerged as an important tool in data mining and machine learning
[24, 4, 5, 11]. Applications in data mining include network analysis, web mining, topic modeling
and recommendation systems. Tensor decomposition is used widely in machine learning for tasks
such as parameter reduction in neural networks, understanding the expressiveness of deep neural networks,
supervised learning and feature extraction.
Due to the multidimensional nature of tensors, they are inherently plagued by the curse of dimensionality. For example, representing a tensor $\mathcal{X} \in \mathbb{R}^{I \times \cdots \times I}$ with $N$ modes requires $I^N$ numbers. This exponential dependence on $N$ makes its way into algorithms for computing tensor decompositions. Alternating least squares (ALS) is arguably the most popular and successful approach for computing a wide range of tensor decompositions. When decomposing a tensor $\mathcal{X}$, each iteration of ALS involves solving a sequence of least squares problems for which the entries of $\mathcal{X}$ feature as the dependent variables. The per-iteration cost of ALS therefore naturally inherits the exponential dependence on $N$.
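To make this cost concrete, the following is a minimal NumPy sketch of ALS for a rank-$R$ CP decomposition of a 3-way tensor. It is our own illustration rather than the algorithm proposed in this paper, and the names `cp_als` and `khatri_rao` are ours; its purpose is to show that every factor update is a least squares solve whose right-hand side is a full unfolding of the data tensor, so each iteration reads all of its entries.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x R) and B (J x R), giving an IJ x R matrix."""
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def cp_als(X, rank, n_iter=25, seed=0):
    """Rank-`rank` CP decomposition of a 3-way array X by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Each update solves a least squares problem whose right-hand side is a
        # full unfolding of X, so all I*J*K entries are touched in every iteration.
        A = np.linalg.lstsq(khatri_rao(B, C), X.reshape(I, J * K).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X.transpose(1, 0, 2).reshape(J, I * K).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X.transpose(2, 0, 1).reshape(K, I * J).T, rcond=None)[0].T
    return A, B, C
```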
A large number of papers have sought to reduce the cost of tensor decomposition by leveraging
techniques from randomized numerical linear algebra. One particularly interesting line of work
[3, 14, 9, 19, 18] seeks to construct ALS algorithms with a per-iteration cost which is sublinear in the number of tensor entries, i.e., $o(I^N)$. Since any algorithm considering all entries of $\mathcal{X}$ immediately incurs a cost of $\Omega(I^N)$, these works have all resorted to sampling-based techniques.
* Lawrence Berkeley National Laboratory, oamalik@lbl.gov
† UC Berkeley, vivek_bharadwaj@berkeley.edu
‡ UC Berkeley, rjmurray@berkeley.edu
More precisely, they all sample the ALS subproblems according to the leverage score distribution
(or an approximation thereof). This is done efficiently by taking advantage of the special structure
of the design matrices in the ALS subproblems for tensor decomposition.
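For readers unfamiliar with the technique, the sketch below shows what leverage score sampling of a single overdetermined least squares problem $\min_x \|Ax - b\|_2$ looks like in the dense case. It is an illustrative snippet of ours (the helper name `leverage_score_solve` is hypothetical): here the scores are read off a thin QR factorization of $A$, whereas the works discussed above never form the design matrix explicitly and instead exploit its structure to sample from the same distribution, or an approximation of it, efficiently.

```python
import numpy as np

def leverage_score_solve(A, b, n_samples, seed=0):
    """Approximately solve min_x ||A x - b||_2 by leverage score row sampling.
    A is n x d with n >> d; the sketch keeps only `n_samples` rescaled rows."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(A, mode="reduced")         # thin QR: A = Q R, Q is n x d
    scores = np.sum(Q**2, axis=1)                  # i-th leverage score = ||Q[i, :]||_2^2
    probs = scores / scores.sum()                  # exact leverage score distribution
    idx = rng.choice(A.shape[0], size=n_samples, replace=True, p=probs)
    scale = 1.0 / np.sqrt(n_samples * probs[idx])  # rescaling keeps the sketched normal equations unbiased
    x_hat, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x_hat
```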
The previous works discussed above develop methods for specific tensor decompositions: the CP decomposition in [3, 14, 18], the Tucker decomposition in [9], and the tensor ring decomposition in [19, 18]. In this paper, we consider all decompositions that can be expressed in tensor network
(TN) format. The TN format allows for a very wide range of decompositions, including the CP,
Tucker, tensor train and tensor ring decompositions. The following summarizes our contributions:
• We first show how to efficiently sample rows of any tall-and-skinny matrix which is in TN format according to the exact leverage score distribution.
• We then show how this sampling technique can be used to yield ALS algorithms with a per-iteration cost which is input sublinear for all TN decompositions that satisfy certain mild assumptions.
The decomposition framework we present builds on the work in [18]. That paper is notable since
it provided the first sampling-based ALS methods for CP and tensor ring decomposition with a
per-iteration cost depending polynomially on N; the dependence in earlier works was exponential [3,
14,19]. We are able to substantially simplify the scheme in [18] by entirely avoiding the complicated
recursive sketching procedure that it relies on. This makes implementation easier and is also what
ensures that the sampling is done according to the exact leverage score distribution rather than an
approximation of it. The simplification is also what paves the way for generalization to arbitrary
TN formats.
2 Related Work
Cheng et al. [3] develop the first ALS method for CP decomposition with an input sublinear per-
iteration cost. Their method uses a mixture of leverage score and row-norm sampling applied to
the matricized-tensor-times-Khatri–Rao product (MTTKRP) which arises as a key computational
kernel in CP decomposition. Larsen and Kolda [14] use leverage score sampling to reduce the size
of each ALS subproblem. Their method has improved theoretical guarantees compared to those in
[3] as well as improved scalability due to various practical improvements. Malik and Becker [19]
propose a sampling-based ALS method for tensor ring decomposition which also uses leverage score
sampling to reduce the size of the ALS subproblems.
All three papers [3, 14, 19] require a number of samples which scales exponentially with the number of input tensor modes, $N$, for performance guarantees to hold. This translates into a per-iteration cost which is $\Omega(R^{N+1})$ for the CP decomposition [3, 14] and $\Omega(R^{2N+2})$ for the tensor ring decomposition [19], where $R$ is the relevant notion of rank. For low-rank decompositions ($R \ll I$), the cost will be input sublinear despite this exponential dependence, but for higher ranks it might not be (recall that, unlike in matrix decomposition, we can have $R > I$ in tensor decomposition). Malik [18] develops methods for both CP and tensor ring decomposition which avoid this exponential dependence on $N$, instead improving it to a polynomial dependence. This is achieved by sampling from a distribution much closer to the exact leverage score distribution than the previous works [3, 14, 19] do. Since our work builds on and improves the scheme in [18], it also has a polynomial dependence on $N$ when used for CP and tensor ring decomposition (see Tables 1 and 2).
Fahrbach et al. [9] develop a method for efficient sampling of ridge regression problems involving
Kronecker product design matrices. They use these techniques to achieve an efficient sampling-based
ALS method for regularized Tucker decomposition.
Efficient sketching of structured matrices is an active research area with applications beyond
tensor decomposition. The recent work by Ma and Solomonik [16] is particularly interesting since
it proposes structured sketching operators for general TN matrices. These operators take the
form of TNs made up of Gaussian tensors and are therefore dense. When used in ALS for tensor
decomposition, their sketching operators therefore need to access all entries of the data tensor in each least squares solve, leading to a per-iteration cost that is at least linear in the input tensor size (and therefore exponential in $N$).
Due to space constraints, we have focused on previous works that develop sampling-based ALS
methods here. There is a large number of works that develop randomized tensor decomposition
methods using other techniques. For a comprehensive list of such works, see [18, Sec. 2]. For an
overview of work on structured sketching, see [20, Sec. 1.2].
3 Preliminaries
3.1 General Notation
By tensor, we mean a multidimensional array containing real numbers. We refer to a tensor with $N$ indices as an $N$-way tensor and we say that it is of order $N$ or that it has $N$ modes. We use bold uppercase Euler script letters (e.g., $\mathcal{X}$) to denote tensors of order 3 or greater, bold uppercase letters (e.g., $\mathbf{X}$) to denote matrices, bold lowercase letters (e.g., $\mathbf{x}$) to denote vectors, and regular lowercase letters to denote scalars (e.g., $x$). We denote entries of an object either in parentheses or with subscripts. For example, $\mathcal{X}(i, j, k) = X_{ijk}$ is the entry in position $(i, j, k)$ in the 3-way tensor $\mathcal{X}$, and $\mathbf{x}(i) = x_i$ is the $i$th entry in the vector $\mathbf{x}$. We use a colon to denote all entries along a particular mode. For example, $\mathbf{X}(i, :)$ denotes the $i$th row of the matrix $\mathbf{X}$. Superscripts in parentheses will be used to denote a sequence of objects. For example, $\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(M)}$ is a sequence of tensors and $\mathbf{e}^{(i)}$ is the $i$th canonical basis vector. The values that a tensor's indices can take are referred to as dimensions. For example, if $\mathcal{X} = (X_{ijk})_{i=1, j=1, k=1}^{I, J, K}$, then the dimensions of modes 1, 2 and 3 are $I$, $J$ and $K$, respectively. Note that dimension does not refer to the number of modes, which is 3 for $\mathcal{X}$. For a positive integer $I$, we use the notation $[I] \overset{\text{def}}{=} \{1, \ldots, I\}$. For indices $i_1 \in [I_1], \ldots, i_N \in [I_N]$, the notation $\overline{i_1 \cdots i_N} \overset{\text{def}}{=} 1 + \sum_{n=1}^{N} (i_n - 1) \prod_{j=1}^{n-1} I_j$ will be used to denote the linear index corresponding to the multi-index $(i_1, \ldots, i_N)$. The pseudoinverse of a matrix $\mathbf{A}$ is denoted by $\mathbf{A}^+$. We use $\otimes$, $\odot$ and $\circledast$ to denote the Kronecker, Khatri–Rao and Hadamard matrix products which are defined in Section A.
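As a small sanity check of the linear index convention above (an illustrative snippet of ours; `linear_index` is a hypothetical helper), note that the first index varies fastest, which matches Fortran-order raveling once the 1-based indices are shifted to 0-based ones.

```python
import numpy as np

def linear_index(multi_index, dims):
    """1-based linear index of the 1-based multi-index (i_1, ..., i_N) for mode dimensions (I_1, ..., I_N)."""
    lin, stride = 1, 1
    for i_n, I_n in zip(multi_index, dims):
        lin += (i_n - 1) * stride   # (i_n - 1) * prod_{j < n} I_j
        stride *= I_n
    return lin

dims = (3, 4, 5)                                   # I_1, I_2, I_3
assert linear_index((1, 1, 1), dims) == 1
assert linear_index((2, 1, 1), dims) == 2          # the first index varies fastest
assert linear_index((1, 2, 1), dims) == 4          # stride of mode 2 is I_1 = 3
# Agrees with NumPy's Fortran-order raveling after shifting to 0-based indices:
assert linear_index((2, 3, 4), dims) == np.ravel_multi_index((1, 2, 3), dims, order="F") + 1
```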
3.2 Tensor Networks
A tensor network (TN) consists of tensors and connections between them that indicate how they
should be contracted with each other. Since the mathematical notation easily gets unwieldy when
working with TNs, it is common practice to use graphical representations when discussing such
networks. Figure 1 shows how scalars, vectors, matrices and tensors can be represented graphically.
A circle or node is used to represent a tensor, and dangling edges are used to indicate modes of the
tensor.
Figure 1: Graphical TN representation of a (a) scalar, (b) vector, (c) matrix and (d) 4-way tensor.
Contraction between two tensors or a tensor with itself can be represented by connecting the
appropriate dangling edges. This is illustrated in Figure 2. In mathematical notation, these contractions can be written elementwise as
(a) $\sum_i A_{ii} = \operatorname{trace}(\mathbf{A}) = c$,
(b) $\sum_j A_{ij} x_j = y_i$,
(c) $\sum_j A_{ij} B_{jk} = C_{ik}$,
(d) $\sum_{\ell m n} X_{\ell m n} A_{i\ell} B_{jm} C_{kn} = Y_{ijk}$.
Figure 2: TN representation of (a) matrix trace, (b) matrix-vector multiplication, (c) matrix-
matrix multiplication and (d) a 3-way Tucker decomposition.
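The four contractions above map directly onto einsum subscript strings. The snippet below is our own illustration with arbitrarily chosen shapes; it evaluates (a)-(d) with numpy.einsum, where U, V and W play the roles of the factor matrices A, B and C in (d).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))                  # square matrix, needed for the trace in (a)
B = rng.standard_normal((4, 6))
x = rng.standard_normal(4)
X = rng.standard_normal((3, 5, 7))               # core tensor for the Tucker contraction (d)
U = rng.standard_normal((8, 3))                  # factors for (d): U_{il}, V_{jm}, W_{kn}
V = rng.standard_normal((9, 5))
W = rng.standard_normal((10, 7))

c = np.einsum("ii->", A)                         # (a) sum_i A_ii = trace(A)
y = np.einsum("ij,j->i", A, x)                   # (b) sum_j A_ij x_j = y_i
C = np.einsum("ij,jk->ik", A, B)                 # (c) sum_j A_ij B_jk = C_ik
Y = np.einsum("lmn,il,jm,kn->ijk", X, U, V, W)   # (d) sum_{lmn} X_lmn U_il V_jm W_kn = Y_ijk
```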
To reduce computational cost when contracting a TN, it is optimal to contract two tensors at
a time [26]. Determining the optimal contraction order is NP-hard, and developing heuristics for
this problem is an active research area [26,21]. There are several software packages that compute
exact or approximate optimal contraction orders, e.g., NCON [25] for Matlab and the opt_einsum package for Python (available at https://github.com/dgasmith/opt_einsum).
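As a brief illustration of such tooling, the snippet below asks opt_einsum for a pairwise contraction order for the Tucker-style network of Figure 2(d) and then evaluates it along that path. This is a sketch of ours (the shapes and the particular contraction are arbitrary choices), assuming the opt_einsum package is installed.

```python
import numpy as np
import opt_einsum as oe

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5, 7))     # core tensor
A = rng.standard_normal((20, 3))       # factor matrices
B = rng.standard_normal((20, 5))
C = rng.standard_normal((20, 7))

# contract_path chooses a sequence of pairwise contractions and reports its cost;
# contract then evaluates the network along that path.
path, info = oe.contract_path("lmn,il,jm,kn->ijk", X, A, B, C)
Y = oe.contract("lmn,il,jm,kn->ijk", X, A, B, C, optimize=path)
print(info)                            # summary of the chosen order and estimated FLOP count
```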