Minimizing low-rank models of high-order tensors:
Hardness, span, tight relaxation, and applications
Nicholas D. Sidiropoulos, Fellow, IEEE, Paris A. Karakasis, Student Member, IEEE, Aritra Konar, Member, IEEE
Abstract—We consider the problem of finding the smallest or
largest entry of a tensor of order N that is specified via its
rank decomposition. Stated in a different way, we are given N
sets of R-dimensional vectors and we wish to select one vector
from each set such that the sum of the Hadamard product
of the selected vectors is minimized or maximized. We show
that this fundamental tensor problem is NP-hard for any tensor
rank higher than one, and polynomial-time solvable in the rank-
one case. We also propose a continuous relaxation and prove
that it is tight for any rank. For low-enough ranks, the pro-
posed continuous reformulation is amenable to low-complexity
gradient-based optimization, and we propose a suite of gradient-
based optimization algorithms drawing from projected gradient
descent, Frank-Wolfe, or explicit parametrization of the relaxed
constraints. We also show that our core results remain valid no
matter what kind of polyadic tensor model is used to represent
the tensor of interest, including Tucker, HOSVD/MLSVD, tensor
train, or tensor ring. Next, we consider the class of problems
that can be posed as special instances of the problem of interest.
We show that this class includes the partition problem (and
thus all NP-complete problems via polynomial-time transforma-
tion), integer least squares, integer linear programming, integer
quadratic programming, sign retrieval (a special kind of mixed
integer programming / restricted version of phase retrieval),
and maximum likelihood decoding of parity check codes. We
demonstrate promising experimental results on a number of
hard problems, including state-of-the-art performance in decoding
low density parity check codes and general parity check codes.
Index Terms—Tensors, rank decomposition, inner product,
algorithms, complexity theory, NP-hard problems, error control
coding, maximum likelihood decoding, parity check codes, be-
lief propagation, under/over-determined linear equations, Galois
fields, sign retrieval.
I. INTRODUCTION
Finding the smallest or largest entry of a matrix or tensor
specified via its low-rank factors is a fundamental problem
with numerous applications in science and engineering. The
matrix version of the problem has received some attention
[1], [2], motivated by applications in graph mining (e.g., most
significant missing link prediction, nodes that share many
neighbors), text/speech/audio similarity search and retrieval
(e.g., using text embeddings), and recommender systems (e.g., finding the best item-context combination to recommend to a given user).

Manuscript received May 30, 2023; revised Oct. 19, 2023; accepted Nov. 21, 2023. The Associate Editor coordinating the review was Dr. Yuxin Chen. This research was not supported by any sponsor. The authors believed it is worth pursuing. N. Sidiropoulos and P. Karakasis are with the Dept. of ECE, University of Virginia, Charlottesville, VA 22904 USA; e-mail: (nikos,karakasis)@virginia.edu. A. Konar is with the Dept. of EE, KU Leuven, Belgium; e-mail: aritra.konar@kuleuven.be. This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the author. The material includes software for reproducing the linear parity check decoding experiments. This material is 806 KB in size.
The tensor version of the problem is considerably more
powerful, as it allows going beyond bipartite matching / pre-
diction which can be broadly useful in knowledge discovery.
As an example, given embeddings of patients, conditions,
drugs, clinical trials, therapies, we may be interested in finding
the best match, as measured by a function of the pointwise
product of the respective embeddings. Tensors can also be used
to model high-dimensional probability distributions, wherein
low-rank tensor models are used to break the curse of dimen-
sionality while allowing easy marginalization and computation
of conditional probabilities, which are crucial for prediction
[3]. In this context, finding the largest element corresponds
to finding the mode / maximum likelihood or maximum a-
posteriori estimate of missing variables, and it is not unusual
to encounter very high-order probability tensors indexed by
hundreds of categorical input variables. Despite the obvious
importance of the tensor version of this problem, there is
scant literature about it - we found only [4], which essentially
extends the approach in [2] to the low-order tensor case.
A. Prior work
The matrix case was considered in [1], which proposed us-
ing a power-method type algorithm that works directly with the
low-rank factors. In [2], the authors developed a randomized
“diamond sampling” approach for computing the maximum
element of the product of two matrices (which could be, e.g.,
two low-rank factors) in what they called the MAD (Maximum
All-pairs Dot-product) search. Their algorithm comes with
probabilistic performance guarantees and was demonstrated
to work well in practice using a variety of datasets. As
already mentioned above, [4] extends the randomized diamond
sampling approach in [2] to “star sampling” for the tensor case.
These randomized algorithms do not scale well to high-order
tensors, owing to the curse of dimensionality.
A very different approach for the higher-order tensor version
of the problem has been proposed in the computational physics
/ numerical algebra literature [5]. The basic idea of [5] (see
also references therein) is as follows. By vectorizing the
tensor and putting the resulting (very long) vector on the
diagonal of a matrix, the tensor elements become eigenvalues
corresponding to coordinate basis eigenvectors. This suggests
that the maximum element of the tensor can be computed
through a power iteration involving this very large matrix.
Of course power iterations implemented naively will have
prohibitive complexity (as tensor vectorization produces a very
long vector and thus a huge matrix). The idea is therefore to
employ a tensor factorization model to ease the matrix-vector
multiplication of the diagonal matrix and the interim solution
vector. This multiplication can be computed by summing the
elements of the pointwise (Hadamard) product of the two
vectors – the vectorized tensor on the matrix diagonal and
the interim solution vector. This product can be efficiently
computed for various tensor models, but the pointwise multiplication of two tensors of rank $R$ generally has rank up to $R^2$, necessitating conversion back to a rank-$R$ tensor. The natural way to do this is via rank-$R$ least-squares approximation of the higher-rank product. This is a hard and generally ill-posed computational problem, because the best approximation may not even exist. Thus the method cannot be used with a tensor rank decomposition model (known as Canonical Polyadic Decomposition or CPD). The pointwise multiplication and least-squares approximation are easier with a so-called tensor train (TT) model. The Hadamard product of two TT models is another TT model whose wagon (factor) matrices are the Kronecker products of the respective factor matrices of the two TT models. This implies that pointwise multiplication of two TT models has complexity of order $NIR^4$ for an $I^N$ tensor of TT rank $R$ for each wagon. Moreover, rank reduction for TT models is SVD-based and has complexity of order $NIR^3$. Hence, reducing the rank of the pointwise multiplication (which is $R^2$ due to the Kronecker product) induces a complexity of order $NI(R^2)^3 = NIR^6$, and as a result, only small TT ranks can be supported in practice. Another limitation of [5] is that it
is hard to pin down if and when the iteration will converge,
because of the rank reduction step. The numerical experiments
in [5] are interesting but limited in terms of exploring the
accuracy-speedup trade-offs that can be achieved.
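To make the Kronecker structure concrete, the following numpy sketch (our illustration, not code from [5]; all names are ours) forms the cores of the Hadamard product of two TT models by slice-wise Kronecker products and checks a single entry. Note that the product cores have TT rank $R^2$, which is what drives the $NIR^6$ cost of the subsequent rank reduction.

```python
import numpy as np

def tt_entry(cores, idx):
    """One entry of a TT tensor: multiply out the selected core slices."""
    out = np.ones((1, 1))
    for G, i in zip(cores, idx):          # G has shape (r_prev, I_n, r_next)
        out = out @ G[:, i, :]
    return out.item()

def tt_hadamard(cores_a, cores_b):
    """Cores of the Hadamard product of two TT tensors: slice-wise Kronecker
    products, so the TT ranks multiply (R and R give R^2)."""
    prod_cores = []
    for Ga, Gb in zip(cores_a, cores_b):
        ra0, I, ra1 = Ga.shape
        rb0, _, rb1 = Gb.shape
        H = np.empty((ra0 * rb0, I, ra1 * rb1))
        for i in range(I):
            H[:, i, :] = np.kron(Ga[:, i, :], Gb[:, i, :])
        prod_cores.append(H)
    return prod_cores

# sanity check on a small random example
rng = np.random.default_rng(0)
dims, R = [3, 4, 2, 5], 2
ranks = [1] + [R] * (len(dims) - 1) + [1]
make = lambda: [rng.standard_normal((ranks[n], dims[n], ranks[n + 1])) for n in range(len(dims))]
A, B = make(), make()
C = tt_hadamard(A, B)
idx = (1, 3, 0, 2)
assert np.isclose(tt_entry(C, idx), tt_entry(A, idx) * tt_entry(B, idx))
```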
Unlike [5], our emphasis is on the tensor rank decom-
position (CPD) model, in good part because many of the
problems that we consider herein admit closed-form tensor
rank decomposition formulations, thus bypassing the need for
computational model fitting and/or rank reduction. Another
issue is that, as we show for the first time in this paper,
the core problem is NP-hard even with a TT model, so
there is no intrinsic computational benefit to using one tensor
decomposition model over the other.
B. Contributions
Our contributions relative to the prior art are as follows:
• We focus on the higher-order tensor version of the problem, and we analyze its computational complexity. We show that the problem is easy when the tensor rank is equal to one, but NP-hard otherwise – even if the rank is as small as two.
• We consider optimization-based (as opposed to algebraic or randomized) algorithms that can be used to compute good approximate solutions in the high-order tensor case. We provide an equivalent “fluid” (continuous-variable) reformulation of what is inherently a multi-way selection problem, and a suite of relatively lightweight gradient-based approximation algorithms. The continuous reformulation is derived by showing that a certain continuous relaxation, involving a probability distribution instead of a hard selection for each mode, is actually tight. This is true for any rank, i.e., even for full-rank (unstructured) tensor models. The proposed algorithms take advantage of various optimization frameworks, from projected gradient descent to Frank-Wolfe and various ways of explicitly parametrizing the probability simplex constraints (a minimal sketch of one such scheme is given after this list). For low-enough ranks, the associated gradient computations are computationally lightweight, even for very high-order tensors.
• We show that our main results remain valid no matter what kind of polyadic tensor model is used to represent the tensor of interest, including Tucker, HOSVD/MLSVD, tensor train, or tensor ring.
• We explore the “span” of the core problem considered, i.e., the class of problems that can be posed as special instances of computing the minimum or maximum element of a tensor from its rank decomposition. We show that this class includes the partition problem (and thus all NP-complete problems via polynomial-time transformation), as well as integer least squares, integer linear programming, integer quadratic programming, sign retrieval (a special kind of mixed integer programming / restricted version of phase retrieval), maximum likelihood decoding of parity check codes, and finding the least inconsistent solution of an overdetermined system of linear equations over a Galois field.
• Finally, we demonstrate promising experimental results on a number of hard problems, including better than state-of-the-art performance in decoding both low density parity check codes and general parity check codes, and sign retrieval.
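As a concrete (if simplified) illustration of this idea, the sketch below is our own and not the authors' implementation: it assumes the relaxation replaces the hard index selection in each mode with a probability vector $\mathbf{p}_n$ over the $I_n$ candidates, so that the relaxed objective becomes $\sum_{r=1}^{R} \prod_{n=1}^{N} \mathbf{p}_n^T \mathbf{A}_n(:,r)$, and it handles the simplex constraints via a softmax parametrization with plain gradient descent (all function and variable names are ours).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def relaxed_min(A_list, steps=3000, lr=0.5, seed=0):
    """Minimize the relaxed objective sum_r prod_n (p_n^T A_n[:, r]) over simplex
    vectors p_n, using the parametrization p_n = softmax(z_n) and plain gradient
    descent; returns one hard index per mode (argmax of each p_n)."""
    rng = np.random.default_rng(seed)
    Z = [0.01 * rng.standard_normal(A.shape[0]) for A in A_list]
    for _ in range(steps):
        P = [softmax(z) for z in Z]
        Q = np.stack([p @ A for A, p in zip(A_list, P)])       # N x R, row n = A_n^T p_n
        for n, (A, p, z) in enumerate(zip(A_list, P, Z)):
            others = np.prod(np.delete(Q, n, axis=0), axis=0)  # prod over modes m != n, per r
            g_p = A @ others                                   # gradient w.r.t. p_n
            g_z = p * (g_p - p @ g_p)                          # chain rule through softmax
            z -= lr * g_z                                      # in-place descent step
    return [int(np.argmax(softmax(z))) for z in Z]

# toy comparison against brute force on a small random CPD model (N=5, I_n=4, R=3)
rng = np.random.default_rng(1)
A_list = [rng.random((4, 3)) for _ in range(5)]
idx = relaxed_min(A_list)
T = np.einsum('ar,br,cr,dr,er->abcde', *A_list)
print('value at relaxed pick:', T[tuple(idx)], '  true minimum:', T.min())
```

On such toy instances the rounded solution can be checked against the brute-force minimum; since the relaxed problem is nonconvex, gradient descent only guarantees a stationary point, so the two need not coincide in general.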
II. PRELIMINARIES
Consider a matrix $\mathbf{M} = \mathbf{A}_1 \mathbf{A}_2^T$, where $\mathbf{M}$ is $I_1 \times I_2$, $\mathbf{A}_1$ is $I_1 \times R$ and $\mathbf{A}_2$ is $I_2 \times R$, with $R \leq \min(I_1, I_2)$ ($R < \min(I_1, I_2)$ if $\mathbf{M}$ is not full-rank). Three other ways to write the same relation are $\mathbf{M} = \sum_{r=1}^{R} \mathbf{A}_1(:, r) \mathbf{A}_2(:, r)^T$, where $\mathbf{A}(:, r)$ stands for the $r$-th column of $\mathbf{A}$; $\mathbf{M} = \sum_{r=1}^{R} \mathbf{A}_1(:, r) \circ \mathbf{A}_2(:, r)$, where $\circ$ stands for the vector outer product; and $\mathbf{M}(i_1, i_2) = \sum_{r=1}^{R} \mathbf{A}_1(i_1, r) \mathbf{A}_2(i_2, r)$, where $\mathbf{M}(i_1, i_2)$ denotes an element of matrix $\mathbf{M}$, with obvious notation. From the latter, notice that element $(i_1, i_2)$ of matrix $\mathbf{M}$ is the inner product of two row vectors, namely row $\mathbf{A}_1(i_1, :)$ of matrix $\mathbf{A}_1$, and row $\mathbf{A}_2(i_2, :)$ of matrix $\mathbf{A}_2$.

We are interested in finding the smallest or largest element of matrix $\mathbf{M}$ from the factors $\mathbf{A}_1$ and $\mathbf{A}_2$. From the latter relation, we see that seeking the smallest (or largest) element of $\mathbf{M}$ corresponds to seeking a pair of row vectors, one drawn from $\mathbf{A}_1$ and the other from $\mathbf{A}_2$, which have the smallest (or largest) possible inner product. One obvious way to do this is to generate all inner products, i.e., the elements of the product $\mathbf{A}_1 \mathbf{A}_2^T$, and find the smallest or largest. This entails complexity $I_1 I_2 R$, which can be high, even when $R$ is small – e.g., consider the case where $I_1$ and $I_2$ are in the order of $10^5$ or more. Is there a way to avoid performing order $I_1 I_2$ computations? When $R = \min(I_1, I_2)$, such complexity is unavoidable, because then the elements of $\mathbf{M}$ can be freely chosen independently of each other (e.g., from an i.i.d. Gaussian distribution). When $R$ is small however, there seems to be hope.
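For concreteness, the brute-force baseline is a one-liner in numpy (our sketch; the function name is ours): form $\mathbf{A}_1 \mathbf{A}_2^T$ and scan it, at $I_1 I_2 R$ flops and $I_1 I_2$ memory.

```python
import numpy as np

def min_max_entry_bruteforce(A1, A2):
    """Smallest and largest entries of M = A1 @ A2.T, plus their (i1, i2) locations.
    Costs I1*I2*R flops and I1*I2 memory -- viable only for moderate I1, I2."""
    M = A1 @ A2.T
    i_min = np.unravel_index(np.argmin(M), M.shape)
    i_max = np.unravel_index(np.argmax(M), M.shape)
    return (M[i_min], i_min), (M[i_max], i_max)

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((1000, 5)), rng.standard_normal((800, 5))
(smallest, loc_min), (largest, loc_max) = min_max_entry_bruteforce(A1, A2)
print(smallest, loc_min, largest, loc_max)
```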
Generalizing, consider a tensor $\mathcal{T}$ of order $N$ and size $I_1 \times I_2 \times \cdots \times I_N$, for which we are given a low-rank factorization, namely
$$\mathcal{T} = \sum_{r=1}^{R} \mathbf{A}_1(:, r) \circ \mathbf{A}_2(:, r) \circ \cdots \circ \mathbf{A}_N(:, r),$$
where $\mathbf{A}_n$ is $I_n \times R$, and $R$ (when minimal for this decomposition to hold) is the rank of $\mathcal{T}$, in which case the above is known as the canonical polyadic decomposition (CPD) of $\mathcal{T}$ [6]. When the elements of $\mathcal{T}$ are non-negative and the elements of the factor matrices $\mathbf{A}_n$ are constrained to be non-negative, then the smallest $R$ for which such a decomposition exists is called the non-negative rank of $\mathcal{T}$, which can be higher than the (unconstrained) rank of $\mathcal{T}$. In the sequel, we do not assume that $R$ is minimal; any polyadic decomposition will do for our purposes. In scalar form, we get that
$$\mathcal{T}(i_1, i_2, \cdots, i_N) = \sum_{r=1}^{R} \mathbf{A}_1(i_1, r) \mathbf{A}_2(i_2, r) \cdots \mathbf{A}_N(i_N, r),$$
which reveals that every element of $\mathcal{T}$ is an inner product of $N$ row vectors, one drawn from each of the matrices $\mathbf{A}_1, \cdots, \mathbf{A}_N$. An alternative way to write this is using the Hadamard (element-wise) product, denoted by $*$, as
$$\mathcal{T}(i_1, i_2, \cdots, i_N) = \left( \mathbf{A}_1(i_1, :) * \mathbf{A}_2(i_2, :) * \cdots * \mathbf{A}_N(i_N, :) \right) \mathbf{1},$$
where $\mathbf{1}$ is the $R \times 1$ vector of all 1s, used to sum up the $R$ components of the Hadamard product. We see that finding the smallest or largest element of $\mathcal{T}$ is tantamount to finding the smallest or largest inner product of $N$ row vectors, one drawn from each $\mathbf{A}_n$ matrix. There is an obvious way to do this, by generating all $I_1 \times I_2 \times \cdots \times I_N$ elements of $\mathcal{T}$, at a complexity cost of $R(N-1)$ flops each, but this exhaustive search is much worse than it is in the matrix case. It quickly becomes intractable even for moderate $N$ and $I_n$, e.g., $N = 20$ with $I_n = 10$, $n \in \{1, \cdots, N\}$. Is there a more efficient way to do this?
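In code, the scalar formula is a cheap per-entry evaluation, roughly $NR$ multiply-adds given the factors, even when the full tensor is far too large to materialize. A minimal numpy sketch (ours; names are ours):

```python
import numpy as np

def cpd_entry(factors, idx):
    """T(i1,...,iN) from CPD factors: Hadamard product of the selected rows, summed over r."""
    rows = np.stack([A[i, :] for A, i in zip(factors, idx)])   # N x R, one row per mode
    return np.prod(rows, axis=0).sum()                         # inner product of N rows

# N = 20 modes, I_n = 10, R = 3: the full tensor has 10^20 entries, but one entry is cheap
rng = np.random.default_rng(0)
factors = [rng.standard_normal((10, 3)) for _ in range(20)]
print(cpd_entry(factors, [7] * 20))
```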
III. THEORY
When $R = 1$ and all the $\mathbf{A}_n$ matrices (column vectors $\mathbf{a}_n$ in this case) only have non-negative entries, it is easy to see that the smallest element of $\mathcal{T}$ is $\mathcal{T}(\check{i}_1, \cdots, \check{i}_N)$ with $\check{i}_n \in \arg\min_{i_n} \mathbf{a}_n(i_n)$, and likewise the largest is $\mathcal{T}(\hat{i}_1, \cdots, \hat{i}_N)$ with $\hat{i}_n \in \arg\max_{i_n} \mathbf{a}_n(i_n)$.
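A minimal sketch of this rank-one rule (our illustration), assuming the factor vectors $\mathbf{a}_n$ are non-negative:

```python
import numpy as np

def rank1_extremes(vectors):
    """For a non-negative rank-one tensor T = a_1 ∘ ... ∘ a_N, the smallest (largest)
    entry is the product of the per-mode minima (maxima)."""
    i_min = [int(np.argmin(a)) for a in vectors]
    i_max = [int(np.argmax(a)) for a in vectors]
    t_min = np.prod([a[i] for a, i in zip(vectors, i_min)])
    t_max = np.prod([a[i] for a, i in zip(vectors, i_max)])
    return (t_min, i_min), (t_max, i_max)

rng = np.random.default_rng(0)
vectors = [rng.random(6) for _ in range(8)]   # 8 modes, non-negative entries
print(rank1_extremes(vectors))
```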
For $R > 1$ however, the answer is no, in the worst case – even if $R = 2$ and the elements of all the $\mathbf{A}_n$ matrices are non-negative. We have the following result.
Theorem 1. Finding the smallest element of a tensor from its rank factorization is NP-hard, even when the tensor rank is as small as $R = 2$ and the rank-one factors are all non-negative. This means that for large $N$, the worst-case complexity is exponential in $N$.
Proof. We show that an arbitrary instance of the partition
problem, which is known to be NP-complete [7], can be
converted to a specific instance of the decision version of the
problem of interest. This means that if we could efficiently
solve any instance of the problem of interest, we would be
in position to efficiently solve an arbitrary instance of the
partition problem, which is not possible according to the
current scientific consensus, unless P=NP.
Recall the partition problem, which is as follows. We are given $N$ positive integers $w_1, \cdots, w_N$ (repetitions are allowed), and we wish to know whether there is a subset of indices $\mathcal{S}$, such that
$$\sum_{n \in \mathcal{S}} w_n = \sum_{n \notin \mathcal{S}} w_n.$$
Consider binary variables $\{i_n \in \{0,1\}\}_{n=1}^{N}$, with $i_n$ designating whether $n \in \mathcal{S}$ or not, i.e., $i_n = 1$ if $n \in \mathcal{S}$, else $i_n = 0$. Deciding whether a suitable $\mathcal{S}$ exists is equivalent to deciding whether
$$\min_{\{i_n \in \{0,1\}\}_{n=1}^{N}} \left( \sum_{n=1}^{N} w_n i_n - \sum_{n=1}^{N} w_n (1 - i_n) \right)^2 \overset{?}{>} 0.$$
We will instead use a reformulation of the above which is more conducive for our purposes, namely
$$\min_{\{i_n \in \{0,1\}\}_{n=1}^{N}} \left( e^{\sum_{n=1}^{N} w_n i_n} - e^{\sum_{n=1}^{N} w_n (1 - i_n)} \right)^2 \overset{?}{>} 0.$$
Expanding the square and noting that the product term is a constant, we obtain
$$\min_{\{i_n \in \{0,1\}\}_{n=1}^{N}} \; e^{2 \sum_{n=1}^{N} w_n i_n} + e^{2 \sum_{n=1}^{N} w_n (1 - i_n)} \overset{?}{>} 2 e^{\sum_{n=1}^{N} w_n}.$$
Each of the exponential terms is separable, i.e., a rank-one tensor comprising non-negative factors. For example, the first term is
$$e^{\sum_{n=1}^{N} 2 w_n i_n} = e^{2 w_1 i_1} e^{2 w_2 i_2} \cdots e^{2 w_N i_N}.$$
It follows that the function to be minimized is a tensor of size $2 \times \cdots \times 2 = 2^N$ and rank $R = 2$, with the following $2 \times 2$ matrix factors
$$\mathbf{A}_n = \begin{bmatrix} 1 & e^{2 w_n} \\ e^{2 w_n} & 1 \end{bmatrix}, \quad n \in \{1, \cdots, N\}.$$
Hence, the decision version of our problem with $R = 2$ and non-negative factors contains every instance of the partition problem as a special case. It follows that the decision version of our problem is NP-hard, and thus its optimization version is NP-hard as well.
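To illustrate the reduction, the following sketch (ours, for illustration only) builds the $2 \times 2$ factors $\mathbf{A}_n$ from a small weight set, evaluates the $2^N$ tensor by brute force, and tests the decision threshold $2 e^{\sum_n w_n}$: the minimum meets the threshold exactly when a balanced partition exists. Small weights keep $e^{2 w_n}$ comfortably within floating-point range.

```python
import numpy as np
from itertools import product

def partition_factors(w):
    """Rank-2, 2x2 CPD factors A_n = [[1, e^{2 w_n}], [e^{2 w_n}, 1]] from the proof."""
    return [np.array([[1.0, np.exp(2 * wn)], [np.exp(2 * wn), 1.0]]) for wn in w]

def tensor_min_bruteforce(factors):
    """Minimum entry of the 2^N tensor, by exhaustive search (only viable for small N)."""
    best = np.inf
    for idx in product(range(2), repeat=len(factors)):
        val = np.prod(np.stack([A[i, :] for A, i in zip(factors, idx)]), axis=0).sum()
        best = min(best, val)
    return best

for w in ([1, 2, 3, 4], [1, 2, 4]):   # first has a balanced partition, second does not
    threshold = 2 * np.exp(sum(w))
    t_min = tensor_min_bruteforce(partition_factors(w))
    print(w, 'partition exists:', np.isclose(t_min, threshold))
```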
In fact, the same NP-hardness result carries over to other
popular tensor models. These include the so-called Tucker
model [8], Multilinear SVD (MLSVD) [9], the Tensor Train
(TT) decomposition [10], and the Tensor Ring (TR) decom-
position [11].
Theorem 2. Finding the smallest element of a tensor from its
Tucker, MLSVD, TT, or TR factorization is NP-hard.
Proof. All these models are outer product decompositions that are related. In particular, a rank-two CPD is equivalent to a Tucker model with a diagonal $2 \times 2 \times \cdots \times 2$ core tensor.