Minimizing low-rank models of high-order tensors:
Hardness, span, tight relaxation, and applications
Nicholas D. Sidiropoulos, Fellow, IEEE, Paris A. Karakasis, Student Member, IEEE, Aritra Konar, Member, IEEE
Abstract—We consider the problem of finding the smallest or
largest entry of a tensor of order N that is specified via its
rank decomposition. Stated in a different way, we are given N
sets of R-dimensional vectors and we wish to select one vector
from each set such that the sum of the Hadamard product
of the selected vectors is minimized or maximized. We show
that this fundamental tensor problem is NP-hard for any tensor
rank higher than one, and polynomial-time solvable in the rank-
one case. We also propose a continuous relaxation and prove
that it is tight for any rank. For low-enough ranks, the pro-
posed continuous reformulation is amenable to low-complexity
gradient-based optimization, and we propose a suite of gradient-
based optimization algorithms drawing from projected gradient
descent, Frank-Wolfe, or explicit parametrization of the relaxed
constraints. We also show that our core results remain valid no
matter what kind of polyadic tensor model is used to represent
the tensor of interest, including Tucker, HOSVD/MLSVD, tensor
train, or tensor ring. Next, we consider the class of problems
that can be posed as special instances of the problem of interest.
We show that this class includes the partition problem (and
thus all NP-complete problems via polynomial-time transforma-
tion), integer least squares, integer linear programming, integer
quadratic programming, sign retrieval (a special kind of mixed
integer programming / restricted version of phase retrieval),
and maximum likelihood decoding of parity check codes. We
demonstrate promising experimental results on a number of
hard problems, including state-of-the-art performance in decoding
low density parity check codes and general parity check codes.
Index Terms—Tensors, rank decomposition, inner product,
algorithms, complexity theory, NP-hard problems, error control
coding, maximum likelihood decoding, parity check codes, be-
lief propagation, under/over-determined linear equations, Galois
fields, sign retrieval.
I. INTRODUCTION
Finding the smallest or largest entry of a matrix or tensor
specified via its low-rank factors is a fundamental problem
with numerous applications in science and engineering. The
matrix version of the problem has received some attention
[1], [2], motivated by applications in graph mining (e.g., most
significant missing link prediction, nodes that share many
neighbors), text/speech/audio similarity search and retrieval
(e.g., using text embeddings), and recommender systems (e.g., finding the best item-context combination to recommend to a given user).

Manuscript received May 30, 2023; revised Oct. 19, 2023; accepted Nov. 21, 2023. The Associate Editor coordinating the review was Dr. Yuxin Chen. This research was not supported by any sponsor. The authors believed it is worth pursuing. N. Sidiropoulos and P. Karakasis are with the Dept. of ECE, University of Virginia, Charlottesville, VA 22904 USA; e-mail: (nikos,karakasis)@virginia.edu. A. Konar is with the Dept. of EE, KU Leuven, Belgium; e-mail: aritra.konar@kuleuven.be. This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the author. The material includes software for reproducing the linear parity check decoding experiments. This material is 806 KB in size.
The tensor version of the problem is considerably more
powerful, as it allows going beyond bipartite matching / pre-
diction which can be broadly useful in knowledge discovery.
As an example, given embeddings of patients, conditions,
drugs, clinical trials, therapies, we may be interested in finding
the best match, as measured by a function of the pointwise
product of the respective embeddings. Tensors can also be used
to model high-dimensional probability distributions, wherein
low-rank tensor models are used to break the curse of dimen-
sionality while allowing easy marginalization and computation
of conditional probabilities, which are crucial for prediction
[3]. In this context, finding the largest element corresponds
to finding the mode / maximum likelihood or maximum a-
posteriori estimate of missing variables, and it is not unusual
to encounter very high-order probability tensors indexed by
hundreds of categorical input variables. Despite the obvious
importance of the tensor version of this problem, there is
scant literature about it - we found only [4], which essentially
extends the approach in [2] to the low-order tensor case.
A. Prior work
The matrix case was considered in [1], which proposed us-
ing a power-method type algorithm that works directly with the
low-rank factors. In [2], the authors developed a randomized
“diamond sampling” approach for computing the maximum
element of the product of two matrices (which could be, e.g.,
two low-rank factors) in what they called the MAD (Maximum
All-pairs Dot-product) search. Their algorithm comes with
probabilistic performance guarantees and was demonstrated
to work well in practice using a variety of datasets. As
already mentioned above, [4] extends the randomized diamond
sampling approach in [2] to “star sampling” for the tensor case.
These randomized algorithms do not scale well to high-order
tensors, owing to the curse of dimensionality.
A very different approach for the higher-order tensor version
of the problem has been proposed in the computational physics
/ numerical algebra literature [5]. The basic idea of [5] (see
also references therein) is as follows. By vectorizing the
tensor and putting the resulting (very long) vector on the
diagonal of a matrix, the tensor elements become eigenvalues
corresponding to coordinate basis eigenvectors. This suggests
that the maximum element of the tensor can be computed
through a power iteration involving this very large matrix.
Of course power iterations implemented naively will have
prohibitive complexity (as tensor vectorization produces a very
long vector and thus a huge matrix). The idea is therefore to
employ a tensor factorization model to ease the matrix-vector
multiplication of the diagonal matrix and the interim solution
vector. This multiplication can be computed by summing the
elements of the pointwise (Hadamard) product of the two
vectors – the vectorized tensor on the matrix diagonal and
the interim solution vector. This product can be efficiently
computed for various tensor models, but the pointwise multiplication of two tensors of rank $R$ generally has rank up to $R^2$, necessitating conversion back to a rank-$R$ tensor. The natural way to do this is via rank-$R$ least-squares approximation of the higher-rank product. This is a hard and generally ill-posed computational problem, because the best approximation may not even exist. Thus the method cannot be used with a tensor rank decomposition model (known as Canonical Polyadic Decomposition or CPD). The pointwise multiplication and least-squares approximation are easier with a so-called tensor train (TT) model. The Hadamard product of two TT models is another TT model whose wagon (factor) matrices are the Kronecker products of the respective factor matrices of the two TT models. This implies that pointwise multiplication of two TT models has complexity of order $NIR^4$ for an $I^N$ tensor of TT rank $R$ for each wagon. Moreover, rank reduction for TT models is SVD-based and has complexity of order $NIR^3$. Hence, reducing the rank of the pointwise multiplication (which is $R^2$ due to the Kronecker product) induces a complexity of order $NI(R^2)^3 = NIR^6$, and as a result, only small TT ranks can be supported in practice. Another limitation of [5] is that it
is hard to pin down if and when the iteration will converge,
because of the rank reduction step. The numerical experiments
in [5] are interesting but limited in terms of exploring the
accuracy-speedup trade-offs that can be achieved.
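To make the Kronecker structure concrete, the following numpy sketch (our illustration, not code from [5]; all names are ours) forms the cores of the Hadamard product of two TT models by slice-wise Kronecker products and checks a single entry. Note that the product cores have TT rank $R^2$, which is what drives the $NIR^6$ cost of the subsequent rank reduction.

```python
import numpy as np

def tt_entry(cores, idx):
    """One entry of a TT tensor: multiply out the selected core slices."""
    out = np.ones((1, 1))
    for G, i in zip(cores, idx):          # G has shape (r_prev, I_n, r_next)
        out = out @ G[:, i, :]
    return out.item()

def tt_hadamard(cores_a, cores_b):
    """Cores of the Hadamard product of two TT tensors: slice-wise Kronecker
    products, so the TT ranks multiply (R and R give R^2)."""
    prod_cores = []
    for Ga, Gb in zip(cores_a, cores_b):
        ra0, I, ra1 = Ga.shape
        rb0, _, rb1 = Gb.shape
        H = np.empty((ra0 * rb0, I, ra1 * rb1))
        for i in range(I):
            H[:, i, :] = np.kron(Ga[:, i, :], Gb[:, i, :])
        prod_cores.append(H)
    return prod_cores

# sanity check on a small random example
rng = np.random.default_rng(0)
dims, R = [3, 4, 2, 5], 2
ranks = [1] + [R] * (len(dims) - 1) + [1]
make = lambda: [rng.standard_normal((ranks[n], dims[n], ranks[n + 1])) for n in range(len(dims))]
A, B = make(), make()
C = tt_hadamard(A, B)
idx = (1, 3, 0, 2)
assert np.isclose(tt_entry(C, idx), tt_entry(A, idx) * tt_entry(B, idx))
```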
Unlike [5], our emphasis is on the tensor rank decom-
position (CPD) model, in good part because many of the
problems that we consider herein admit closed-form tensor
rank decomposition formulations, thus bypassing the need for
computational model fitting and/or rank reduction. Another
issue is that, as we show for the first time in this paper,
the core problem is NP-hard even with a TT model, so
there is no intrinsic computational benefit to using one tensor
decomposition model over the other.
B. Contributions
Our contributions relative to the prior art are as follows:
• We focus on the higher-order tensor version of the problem, and we analyze its computational complexity. We show that the problem is easy when the tensor rank is equal to one, but NP-hard otherwise – even if the rank is as small as two.
• We consider optimization-based (as opposed to algebraic or randomized) algorithms that can be used to compute good approximate solutions in the high-order tensor case. We provide an equivalent “fluid” (continuous-variable) reformulation of what is inherently a multi-way selection problem, and a suite of relatively lightweight gradient-based approximation algorithms. The continuous reformulation is derived by showing that a certain continuous relaxation, involving a probability distribution instead of a hard selection for each mode, is actually tight. This is true for any rank, i.e., even for full-rank (unstructured) tensor models. The proposed algorithms take advantage of various optimization frameworks, from projected gradient descent to Frank-Wolfe and various ways of explicitly parametrizing the probability simplex constraints (a minimal sketch of one such scheme is given after this list). For low-enough ranks, the associated gradient computations are computationally lightweight, even for very high-order tensors.
• We show that our main results remain valid no matter what kind of polyadic tensor model is used to represent the tensor of interest, including Tucker, HOSVD/MLSVD, tensor train, or tensor ring.
• We explore the “span” of the core problem considered, i.e., the class of problems that can be posed as special instances of computing the minimum or maximum element of a tensor from its rank decomposition. We show that this class includes the partition problem (and thus all NP-complete problems via polynomial-time transformation), as well as integer least squares, integer linear programming, integer quadratic programming, sign retrieval (a special kind of mixed integer programming / restricted version of phase retrieval), maximum likelihood decoding of parity check codes, and finding the least inconsistent solution of an overdetermined system of linear equations over a Galois field.
• Finally, we demonstrate promising experimental results on a number of hard problems, including better than state-of-the-art performance in decoding both low density parity check codes and general parity check codes, and sign retrieval.
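As a concrete (if simplified) illustration of this idea, the sketch below is our own and not the authors' implementation: it assumes the relaxation replaces the hard index selection in each mode with a probability vector $\mathbf{p}_n$ over the $I_n$ candidates, so that the relaxed objective becomes $\sum_{r=1}^{R} \prod_{n=1}^{N} \mathbf{p}_n^T \mathbf{A}_n(:,r)$, and it handles the simplex constraints via a softmax parametrization with plain gradient descent (all function and variable names are ours).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def relaxed_min(A_list, steps=3000, lr=0.5, seed=0):
    """Minimize the relaxed objective sum_r prod_n (p_n^T A_n[:, r]) over simplex
    vectors p_n, using the parametrization p_n = softmax(z_n) and plain gradient
    descent; returns one hard index per mode (argmax of each p_n)."""
    rng = np.random.default_rng(seed)
    Z = [0.01 * rng.standard_normal(A.shape[0]) for A in A_list]
    for _ in range(steps):
        P = [softmax(z) for z in Z]
        Q = np.stack([p @ A for A, p in zip(A_list, P)])       # N x R, row n = A_n^T p_n
        for n, (A, p, z) in enumerate(zip(A_list, P, Z)):
            others = np.prod(np.delete(Q, n, axis=0), axis=0)  # prod over modes m != n, per r
            g_p = A @ others                                   # gradient w.r.t. p_n
            g_z = p * (g_p - p @ g_p)                          # chain rule through softmax
            z -= lr * g_z                                      # in-place descent step
    return [int(np.argmax(softmax(z))) for z in Z]

# toy comparison against brute force on a small random CPD model (N=5, I_n=4, R=3)
rng = np.random.default_rng(1)
A_list = [rng.random((4, 3)) for _ in range(5)]
idx = relaxed_min(A_list)
T = np.einsum('ar,br,cr,dr,er->abcde', *A_list)
print('value at relaxed pick:', T[tuple(idx)], '  true minimum:', T.min())
```

On such toy instances the rounded solution can be checked against the brute-force minimum; since the relaxed problem is nonconvex, gradient descent only guarantees a stationary point, so the two need not coincide in general.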
II. PRELIMINARIES
Consider a matrix $\mathbf{M} = \mathbf{A}_1 \mathbf{A}_2^T$, where $\mathbf{M}$ is $I_1 \times I_2$, $\mathbf{A}_1$ is $I_1 \times R$ and $\mathbf{A}_2$ is $I_2 \times R$, with $R \leq \min(I_1, I_2)$ ($R < \min(I_1, I_2)$ if $\mathbf{M}$ is not full-rank). Three other ways to write the same relation are $\mathbf{M} = \sum_{r=1}^{R} \mathbf{A}_1(:, r) \mathbf{A}_2(:, r)^T$, where $\mathbf{A}(:, r)$ stands for the $r$-th column of $\mathbf{A}$; $\mathbf{M} = \sum_{r=1}^{R} \mathbf{A}_1(:, r) \circ \mathbf{A}_2(:, r)$, where $\circ$ stands for the vector outer product; and $\mathbf{M}(i_1, i_2) = \sum_{r=1}^{R} \mathbf{A}_1(i_1, r) \mathbf{A}_2(i_2, r)$, where $\mathbf{M}(i_1, i_2)$ denotes an element of matrix $\mathbf{M}$, with obvious notation. From the latter, notice that element $(i_1, i_2)$ of matrix $\mathbf{M}$ is the inner product of two row vectors, namely row $\mathbf{A}_1(i_1, :)$ of matrix $\mathbf{A}_1$, and row $\mathbf{A}_2(i_2, :)$ of matrix $\mathbf{A}_2$.

We are interested in finding the smallest or largest element of matrix $\mathbf{M}$ from the factors $\mathbf{A}_1$ and $\mathbf{A}_2$. From the latter relation, we see that seeking the smallest (or largest) element of $\mathbf{M}$ corresponds to seeking a pair of row vectors, one drawn from $\mathbf{A}_1$ and the other from $\mathbf{A}_2$, which have the smallest (or largest) possible inner product. One obvious way to do this is to generate all inner products, i.e., the elements of the product $\mathbf{A}_1 \mathbf{A}_2^T$, and find the smallest or largest. This entails complexity $I_1 I_2 R$, which can be high, even when $R$ is small – e.g., consider the case where $I_1$ and $I_2$ are in the order of $10^5$ or more. Is there a way to avoid performing order $I_1 I_2$ computations? When $R = \min(I_1, I_2)$, such complexity is unavoidable, because then the elements of $\mathbf{M}$ can be freely chosen independently of each other (e.g., from an i.i.d. Gaussian distribution). When $R$ is small however, there seems to be hope.
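For concreteness, the brute-force baseline is a one-liner in numpy (our sketch; the function name is ours): form $\mathbf{A}_1 \mathbf{A}_2^T$ and scan it, at $I_1 I_2 R$ flops and $I_1 I_2$ memory.

```python
import numpy as np

def min_max_entry_bruteforce(A1, A2):
    """Smallest and largest entries of M = A1 @ A2.T, plus their (i1, i2) locations.
    Costs I1*I2*R flops and I1*I2 memory -- viable only for moderate I1, I2."""
    M = A1 @ A2.T
    i_min = np.unravel_index(np.argmin(M), M.shape)
    i_max = np.unravel_index(np.argmax(M), M.shape)
    return (M[i_min], i_min), (M[i_max], i_max)

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((1000, 5)), rng.standard_normal((800, 5))
(smallest, loc_min), (largest, loc_max) = min_max_entry_bruteforce(A1, A2)
print(smallest, loc_min, largest, loc_max)
```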
Generalizing, consider a tensor $\mathcal{T}$ of order $N$ and size $I_1 \times I_2 \times \cdots \times I_N$, for which we are given a low-rank factorization, namely
$$\mathcal{T} = \sum_{r=1}^{R} \mathbf{A}_1(:, r) \circ \mathbf{A}_2(:, r) \circ \cdots \circ \mathbf{A}_N(:, r),$$
where $\mathbf{A}_n$ is $I_n \times R$, and $R$ (when minimal for this decomposition to hold) is the rank of $\mathcal{T}$, in which case the above is known as the canonical polyadic decomposition (CPD) of $\mathcal{T}$ [6]. When the elements of $\mathcal{T}$ are non-negative and the elements of the factor matrices $\mathbf{A}_n$ are constrained to be non-negative, then the smallest $R$ for which such a decomposition exists is called the non-negative rank of $\mathcal{T}$, which can be higher than the (unconstrained) rank of $\mathcal{T}$. In the sequel, we do not assume that $R$ is minimal; any polyadic decomposition will do for our purposes. In scalar form, we get that
$$\mathcal{T}(i_1, i_2, \cdots, i_N) = \sum_{r=1}^{R} \mathbf{A}_1(i_1, r) \mathbf{A}_2(i_2, r) \cdots \mathbf{A}_N(i_N, r),$$
which reveals that every element of $\mathcal{T}$ is an inner product of $N$ row vectors, one drawn from each of the matrices $\mathbf{A}_1, \cdots, \mathbf{A}_N$. An alternative way to write this is using the Hadamard (element-wise) product, denoted by $*$, as
$$\mathcal{T}(i_1, i_2, \cdots, i_N) = \left( \mathbf{A}_1(i_1, :) * \mathbf{A}_2(i_2, :) * \cdots * \mathbf{A}_N(i_N, :) \right) \mathbf{1},$$
where $\mathbf{1}$ is the $R \times 1$ vector of all 1s, used to sum up the $R$ components of the Hadamard product. We see that finding the smallest or largest element of $\mathcal{T}$ is tantamount to finding the smallest or largest inner product of $N$ row vectors, one drawn from each $\mathbf{A}_n$ matrix. There is an obvious way to do this, by generating all $I_1 \times I_2 \times \cdots \times I_N$ elements of $\mathcal{T}$, at a complexity cost of $R(N-1)$ flops each, but this exhaustive search is much worse than it is in the matrix case. It quickly becomes intractable even for moderate $N$ and $I_n$, e.g., $N = 20$ with $I_n = 10$, $n \in \{1, \cdots, N\}$. Is there a more efficient way to do this?
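In code, the scalar formula is a cheap per-entry evaluation, roughly $NR$ multiply-adds given the factors, even when the full tensor is far too large to materialize. A minimal numpy sketch (ours; names are ours):

```python
import numpy as np

def cpd_entry(factors, idx):
    """T(i1,...,iN) from CPD factors: Hadamard product of the selected rows, summed over r."""
    rows = np.stack([A[i, :] for A, i in zip(factors, idx)])   # N x R, one row per mode
    return np.prod(rows, axis=0).sum()                         # inner product of N rows

# N = 20 modes, I_n = 10, R = 3: the full tensor has 10^20 entries, but one entry is cheap
rng = np.random.default_rng(0)
factors = [rng.standard_normal((10, 3)) for _ in range(20)]
print(cpd_entry(factors, [7] * 20))
```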
III. THEORY
When $R = 1$ and all the $\mathbf{A}_n$ matrices (column vectors $\mathbf{a}_n$ in this case) only have non-negative entries, it is easy to see that the smallest element of $\mathcal{T}$ is $\mathcal{T}(\check{i}_1, \cdots, \check{i}_N)$ with $\check{i}_n \in \arg\min_{i_n} \mathbf{a}_n(i_n)$, and likewise the largest is $\mathcal{T}(\hat{i}_1, \cdots, \hat{i}_N)$ with $\hat{i}_n \in \arg\max_{i_n} \mathbf{a}_n(i_n)$.
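A minimal sketch of this rank-one rule (our illustration), assuming the factor vectors $\mathbf{a}_n$ are non-negative:

```python
import numpy as np

def rank1_extremes(vectors):
    """For a non-negative rank-one tensor T = a_1 ∘ ... ∘ a_N, the smallest (largest)
    entry is the product of the per-mode minima (maxima)."""
    i_min = [int(np.argmin(a)) for a in vectors]
    i_max = [int(np.argmax(a)) for a in vectors]
    t_min = np.prod([a[i] for a, i in zip(vectors, i_min)])
    t_max = np.prod([a[i] for a, i in zip(vectors, i_max)])
    return (t_min, i_min), (t_max, i_max)

rng = np.random.default_rng(0)
vectors = [rng.random(6) for _ in range(8)]   # 8 modes, non-negative entries
print(rank1_extremes(vectors))
```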
For $R > 1$ however, the answer is no, in the worst case – even if $R = 2$ and the elements of all the $\mathbf{A}_n$ matrices are non-negative. We have the following result.
Theorem 1. Finding the smallest element of a tensor from its rank factorization is NP-hard, even when the tensor rank is as small as $R = 2$ and the rank-one factors are all non-negative. This means that for large $N$, the worst-case complexity is exponential in $N$.
Proof. We show that an arbitrary instance of the partition
problem, which is known to be NP-complete [7], can be
converted to a specific instance of the decision version of the
problem of interest. This means that if we could efficiently
solve any instance of the problem of interest, we would be
in position to efficiently solve an arbitrary instance of the
partition problem, which is not possible according to the
current scientific consensus, unless P=NP.
Recall the partition problem, which is as follows. We are given $N$ positive integers $w_1, \cdots, w_N$ (repetitions are allowed), and we wish to know whether there is a subset of indices $\mathcal{S}$, such that
$$\sum_{n \in \mathcal{S}} w_n = \sum_{n \notin \mathcal{S}} w_n.$$
Consider binary variables $\{i_n \in \{0,1\}\}_{n=1}^{N}$, with $i_n$ designating whether $n \in \mathcal{S}$ or not, i.e., $i_n = 1$ if $n \in \mathcal{S}$, else $i_n = 0$. Deciding whether a suitable $\mathcal{S}$ exists is equivalent to deciding whether
$$\min_{\{i_n \in \{0,1\}\}_{n=1}^{N}} \left( \sum_{n=1}^{N} w_n i_n - \sum_{n=1}^{N} w_n (1 - i_n) \right)^2 \overset{?}{>} 0.$$
We will instead use a reformulation of the above which is more conducive for our purposes, namely
$$\min_{\{i_n \in \{0,1\}\}_{n=1}^{N}} \left( e^{\sum_{n=1}^{N} w_n i_n} - e^{\sum_{n=1}^{N} w_n (1 - i_n)} \right)^2 \overset{?}{>} 0.$$
Expanding the square and noting that the product term is a constant, we obtain
$$\min_{\{i_n \in \{0,1\}\}_{n=1}^{N}} \; e^{2 \sum_{n=1}^{N} w_n i_n} + e^{2 \sum_{n=1}^{N} w_n (1 - i_n)} \overset{?}{>} 2 e^{\sum_{n=1}^{N} w_n}.$$
Each of the exponential terms is separable, i.e., a rank-one tensor comprising non-negative factors. For example, the first term is
$$e^{\sum_{n=1}^{N} 2 w_n i_n} = e^{2 w_1 i_1} e^{2 w_2 i_2} \cdots e^{2 w_N i_N}.$$
It follows that the function to be minimized is a tensor of size $2 \times \cdots \times 2 = 2^N$ and rank $R = 2$, with the following $2 \times 2$ matrix factors
$$\mathbf{A}_n = \begin{bmatrix} 1 & e^{2 w_n} \\ e^{2 w_n} & 1 \end{bmatrix}, \quad n \in \{1, \cdots, N\}.$$
Hence, the decision version of our problem with $R = 2$ and non-negative factors contains every instance of the partition problem as a special case. It follows that the decision version of our problem is NP-hard, and thus its optimization version is NP-hard as well.
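To illustrate the reduction, the following sketch (ours, for illustration only) builds the $2 \times 2$ factors $\mathbf{A}_n$ from a small weight set, evaluates the $2^N$ tensor by brute force, and tests the decision threshold $2 e^{\sum_n w_n}$: the minimum meets the threshold exactly when a balanced partition exists. Small weights keep $e^{2 w_n}$ comfortably within floating-point range.

```python
import numpy as np
from itertools import product

def partition_factors(w):
    """Rank-2, 2x2 CPD factors A_n = [[1, e^{2 w_n}], [e^{2 w_n}, 1]] from the proof."""
    return [np.array([[1.0, np.exp(2 * wn)], [np.exp(2 * wn), 1.0]]) for wn in w]

def tensor_min_bruteforce(factors):
    """Minimum entry of the 2^N tensor, by exhaustive search (only viable for small N)."""
    best = np.inf
    for idx in product(range(2), repeat=len(factors)):
        val = np.prod(np.stack([A[i, :] for A, i in zip(factors, idx)]), axis=0).sum()
        best = min(best, val)
    return best

for w in ([1, 2, 3, 4], [1, 2, 4]):   # first has a balanced partition, second does not
    threshold = 2 * np.exp(sum(w))
    t_min = tensor_min_bruteforce(partition_factors(w))
    print(w, 'partition exists:', np.isclose(t_min, threshold))
```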
In fact, the same NP-hardness result carries over to other
popular tensor models. These include the so-called Tucker
model [8], Multilinear SVD (MLSVD) [9], the Tensor Train
(TT) decomposition [10], and the Tensor Ring (TR) decom-
position [11].
Theorem 2. Finding the smallest element of a tensor from its
Tucker, MLSVD, TT, or TR factorization is NP-hard.
Proof. All these models are outer product decompositions that are related. In particular, a rank-two CPD is equivalent to a Tucker model with a diagonal $2 \times 2 \times \cdots \times 2$ core tensor.