Faster Matrix Multiplication via Asymmetric Hashing
Ran Duan
Tsinghua University
Hongxun Wu
UC Berkeley
Renfei Zhou
Tsinghua University
Abstract
Fast matrix multiplication is one of the most fundamental problems in algorithm research. The exponent of the optimal time complexity of matrix multiplication is usually denoted by ω. This paper discusses new ideas for improving the laser method for fast matrix multiplication. We observe that the analysis of higher powers of the Coppersmith-Winograd tensor [Coppersmith & Winograd 1990] incurs a "combination loss", and we partially compensate for it using an asymmetric version of CW's hashing method. By analyzing the eighth power of the CW tensor, we give a new bound of ω < 2.371866, which improves the previous best bound of ω < 2.372860 [Alman & Vassilevska Williams 2020]. Our result breaks the lower bound of 2.3725 in [Ambainis, Filmus & Le Gall 2015] because of the new method for analyzing component (constituent) tensors.
Email: duanran@mail.tsinghua.edu.cn.
Email: wuhx@berkeley.edu.
Email: zhourf20@mails.tsinghua.edu.cn.
arXiv:2210.10173v5 [cs.DS] 28 Nov 2023
1 Introduction
The time complexity of multiplying two n × n matrices is usually denoted by O(n^{ω+o(1)}) for some real number ω. (It is easy to see that 2 ≤ ω ≤ 3.) Although it has been studied for more than 50 years, the exact value of ω is still unknown, and many people believe that ω = 2. The determination of the constant ω would have wide implications. Not only do many matrix operations have complexities similar to that of fast matrix multiplication (FMM), such as LUP decomposition, matrix inversion, and the determinant [AH74, BH74], but algorithms for many combinatorial problems can also be accelerated by FMM algorithms, including graph problems like transitive closure (see [AH74]), unweighted all-pairs shortest paths [Sei95, Zwi02], all-pairs bottleneck paths [DP09], and other problems such as context-free grammar parsing [Val75] and RNA-folding [BGSW19].
The first algorithm breaking the cubic time bound was given by Strassen in 1969 [Str69], who showed that ω ≤ log_2 7 < 2.8074. After that, a series of works (e.g. [Pan78, BCRL79, Sch81, Rom82, CW81, Str86, CW90, Sto10, Wil12, LG14]) improved the upper bound on ω. The current best bound is ω < 2.3728596, given by the refined laser method of Alman and Vassilevska Williams [AW21b]. Very recently, DeepMind [FBH+22] used a reinforcement learning method to improve the ranks of several small matrix multiplication tensors. They gave a time bound of O(n^{2.778}) over F_2.
The most recent improvements along this line are all based on applying Strassen's laser method [Str86] to Coppersmith-Winograd tensors [CW90], CW_q. The laser method first picks a tensor T and takes its N-th power T^{⊗N}. This tensor T is usually efficient to compute (having a small asymptotic rank). Then it forces a subset of the variables in T^{⊗N} to be zero (zeroing-out) and degenerates it into a direct sum of independent matrix multiplication tensors. This gives an efficient algorithm for computing matrix multiplications.
The original work of Coppersmith and Winograd [CW90] not only analyzed CW_q, but also its second power, CW_q^{⊗2}. By a clever hashing technique using Salem-Spencer sets [SS42, Beh46], their analysis of CW_q shows that ω < 2.38719. For the second power, they applied the laser method to CW_q^{⊗2} after merging some subtensors of CW_q^{⊗2} into larger matrix multiplications. This gives a better bound of ω < 2.375477. Stothers [Sto10] and Vassilevska Williams [Wil12] independently improved the analysis to the fourth and eighth powers of CW_q while using computer programs to automate the complicated calculations. Le Gall [LG14] further improved it to the 32nd power by formulating the analysis as a series of convex optimization problems.
Although analyzing higher and higher powers of CW_q gives improved bounds on ω, the analysis is not tight. Alman and Vassilevska Williams [AW21b] focus on the extra loss in hash moduli in the higher-power analysis, and they compensate for this loss with their refined laser method. This gives the current best algorithm. Note that these analyses are all recursive, namely, the value obtained for each power depends on the bounds established for the previous power.
In this paper, we identify a more implicit "combination loss", which is our main contribution. Such loss arises not from a single recursion level, but from the structure of adjacent levels. To demonstrate that this observation indeed leads to an improved algorithm, we compensate for this loss using an asymmetric hashing method. This improves the analysis of the second power by Coppersmith and Winograd [CW90] to ω < 2.374631. We also generalize it to higher powers and obtain the improved bound of ω < 2.371866.[1] A similar asymmetric hashing method was used in [CW90] to analyze an asymmetric tensor. Fast rectangular matrix multiplication algorithms (e.g. [Cop82, Cop97, HP98, LG12, LGU18]) also use asymmetric hashing to degenerate the tensor power T^{⊗N} in the laser method into independent rectangular matrix multiplications. In this paper, we use asymmetric hashing for a different purpose, namely to compensate for the "combination loss".

[1] This bound is slightly better than the one in the previous version of this paper due to more flexibility in optimizing parameters.
On the limitation side, Ambainis, Filmus and Le Gall [AFLG15] showed that if we analyze the powers of CW_q using previous approaches, including the refined laser method, we cannot obtain a better upper bound than ω < 2.3725, even if we go to arbitrarily high powers. Our bound of ω < 2.371866 breaks this limitation by analyzing only the 8th power. This is because our analysis improves the values for lower powers, so this limitation no longer applies. They also proved a limitation of 2.3078 for a wider class of algorithms, which includes our algorithm. There are also works that prove more general barriers [AW21a, AW18, Alm21, CVZ21, BL20]. Our approach is also subject to these barriers.
Another approach to fast matrix multiplication is the group-theoretic method of Cohn and Umans [CU03, CKSU05, CU13]. There are also works on its limitations [ASU12, BCC+16, BCC+17, BCG+22].
1.1 Organization of this Paper
In Section 2, we will first give an overview of the laser method and our improved algorithm. In Section 3, we
introduce the concepts and notations that we will use in this paper, as well as some basic building blocks, for
example, the asymmetric hashing method. To better illustrate our ideas, in Section 4, we give an improved
analysis of the second power of the CW tensor. In Sections 5, 6, and 7, we will extend the analysis to higher
powers. In Section 8, we discuss our optimization program and give the numerical results.
2 Technical Overview
Our improvement is based on the observation that a hidden “combination loss” exists. In the overview, we
will explain our ideas based on the second-power analysis of Coppersmith and Winograd [CW90]. Their analysis implies that ω < 2.375477, which is still the best upper bound via the second power of CW_q. We will point out the "combination loss" in their analysis and present our main ideas, which serve to compensate for this loss. With certain twists, the same ideas also apply to higher powers and allow us to obtain the improved bound for ω.
2.1 Coppersmith-Winograd Algorithm
It is helpful to start with a high-level description of Coppersmith-Winograd [CW90]. Our exposition here
differs a little from their original work. Specifically, their work uses the values of subtensors in a black-box
way. We open up this black box and look at the structure inside those subtensors. This change will make the
combination loss visible.
In the following, we will assume some familiarity with the Coppersmith-Winograd Algorithm. For an
exposition of their algorithm, the reader may also refer to the excellent survey by Bläser [Blä13].
The 2nd power of the CW tensor. To begin, we first set up some minimal notation. The starting point of the laser method is a tensor T that can be computed efficiently (i.e., one with a small asymptotic rank). The Coppersmith-Winograd tensor CW_q, whose asymptotic rank is q + 2, is widely used in previous works. The exact definition of CW_q is the following.[2]
\[
  CW_q = \sum_{i=1}^{q} (x_i y_i z_0 + x_i y_0 z_i + x_0 y_i z_i) + x_0 y_0 z_{q+1} + x_0 y_{q+1} z_0 + x_{q+1} y_0 z_0.
\]
[2] Readers may also refer to Section 3.4.
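For concreteness, here is the q = 2 instance written out term by term (this explicit expansion is our own illustration, obtained by substituting q = 2 into the definition above):
\[
  CW_2 = x_1 y_1 z_0 + x_1 y_0 z_1 + x_0 y_1 z_1 + x_2 y_2 z_0 + x_2 y_0 z_2 + x_0 y_2 z_2 + x_0 y_0 z_3 + x_0 y_3 z_0 + x_3 y_0 z_0.
\]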
2
We will only use several properties of CW_q. First, it is a tensor over variable sets X = {x_0, x_1, . . . , x_{q+1}}, Y = {y_0, y_1, . . . , y_{q+1}}, and Z = {z_0, z_1, . . . , z_{q+1}}. We let X_0 = {x_0}, X_1 = {x_1, x_2, . . . , x_q}, X_2 = {x_{q+1}}, and define {Y_j}_{j=0,1,2}, {Z_k}_{k=0,1,2} similarly. Second, for the partition X = X_0 ∪ X_1 ∪ X_2, Y = Y_0 ∪ Y_1 ∪ Y_2, and Z = Z_0 ∪ Z_1 ∪ Z_2, the following holds:
- For all i + j + k ≠ 2, the subtensor of CW_q over X_i, Y_j, Z_k is zero.
- For all i + j + k = 2, the subtensor of CW_q over X_i, Y_j, Z_k is a matrix multiplication tensor, denoted by T_{i,j,k}.
We call this partition the level-1 partition. CW_q is the summation of all such T_{i,j,k}'s:
\[
  CW_q = \sum_{i+j+k=2} T_{i,j,k} = T_{0,1,1} + T_{1,0,1} + T_{1,1,0} + T_{0,0,2} + T_{0,2,0} + T_{2,0,0}.
\]
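As a sanity check of the partition property above, the following minimal Python sketch (our own illustration, not code from the paper) lists the index triples of the terms of CW_q and verifies that every term falls into a block X_i × Y_j × Z_k with i + j + k = 2:

```python
# Minimal sketch: represent CW_q by the index triples (a, b, c) of its terms
# x_a * y_b * z_c and check that each term lies in a level-1 block
# X_i x Y_j x Z_k with i + j + k = 2.

def cw_support(q):
    """Index triples of the terms of CW_q."""
    terms = []
    for i in range(1, q + 1):
        terms += [(i, i, 0), (i, 0, i), (0, i, i)]
    terms += [(0, 0, q + 1), (0, q + 1, 0), (q + 1, 0, 0)]
    return terms

def level1_index(a, q):
    """Level-1 block of a variable index: 0 -> block 0, 1..q -> block 1, q+1 -> block 2."""
    if a == 0:
        return 0
    return 1 if a <= q else 2

q = 3
for (a, b, c) in cw_support(q):
    blocks = (level1_index(a, q), level1_index(b, q), level1_index(c, q))
    assert sum(blocks) == 2, (a, b, c)
print("all", len(cw_support(q)), "terms fall in blocks with i + j + k = 2")
```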
For any two sets A and B, their Cartesian product A × B is {(a, b) | a ∈ A, b ∈ B}. The second power CW_q^{⊗2} is a tensor over variable sets X̃ = X × X, Ỹ = Y × Y, and Z̃ = Z × Z:
\[
  CW_q^{\otimes 2} = \sum_{i+j+k=2}\;\sum_{i'+j'+k'=2} T_{i,j,k} \otimes T_{i',j',k'}.
\]
For X̃ = X × X, the product of two level-1 partitions gives X̃ = ⋃_{i', i''} X_{i'} × X_{i''}. We define the level-2 partition to be a coarsening of this. For each i ∈ {0, 1, 2, 3, 4}, we define
\[
  \tilde{X}_i := \bigcup_{i'+i''=i} X_{i'} \times X_{i''}, \qquad
  \tilde{Y}_j := \bigcup_{j'+j''=j} Y_{j'} \times Y_{j''}, \qquad
  \tilde{Z}_k := \bigcup_{k'+k''=k} Z_{k'} \times Z_{k''}.
\]
For example, X̃_2 = (X_0 × X_2) ∪ (X_1 × X_1) ∪ (X_2 × X_0). The level-2 partition is given by X̃ = X̃_0 ∪ X̃_1 ∪ · · · ∪ X̃_4, Ỹ = Ỹ_0 ∪ Ỹ_1 ∪ · · · ∪ Ỹ_4, and Z̃ = Z̃_0 ∪ Z̃_1 ∪ · · · ∪ Z̃_4.
For any i + j + k = 4, let T_{i,j,k} be the subtensor of CW_q^{⊗2} over X̃_i, Ỹ_j, Z̃_k. We have
\[
  T_{i,j,k} = \sum_{i'+j'+k'=2} T_{i',j',k'} \otimes T_{i-i',\, j-j',\, k-k'}.
\]
For example, T_{1,1,2} = T_{1,1,0} ⊗ T_{0,0,2} + T_{0,0,2} ⊗ T_{1,1,0} + T_{1,0,1} ⊗ T_{0,1,1} + T_{0,1,1} ⊗ T_{1,0,1}.[3]
The tensor CW_q^{⊗2} is the sum of all such T_{i,j,k}'s:
\[
\begin{aligned}
  CW_q^{\otimes 2} = \sum_{i+j+k=4} T_{i,j,k}
  &= T_{0,0,4} + T_{0,4,0} + T_{4,0,0} \\
  &\quad + T_{0,1,3} + T_{0,3,1} + T_{1,0,3} + T_{1,3,0} + T_{3,0,1} + T_{3,1,0} \\
  &\quad + T_{0,2,2} + T_{2,0,2} + T_{2,2,0} \\
  &\quad + T_{1,1,2} + T_{1,2,1} + T_{2,1,1}.
\end{aligned}
\]
Note that these T_{i,j,k}'s (i + j + k = 4) may not be matrix multiplication tensors. For the subtensors of CW_q^{⊗2}, it is true that T_{0,0,4}, T_{0,1,3}, T_{0,2,2} (and their permutations) are indeed all matrix multiplication tensors, while T_{1,1,2} (and its permutations) are not. We will need this important fact in the two-level analysis of Coppersmith and Winograd.
[3] For notational convenience, we let T_{i,j,k} = 0 if any of i, j, k is negative.
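The decomposition of each level-2 subtensor into pairs of level-1 subtensors, as in the T_{1,1,2} example above, can be enumerated mechanically. The following short Python sketch (our own illustration, not code from the paper) lists the contributing pairs for a given (i, j, k) with i + j + k = 4, using the convention that terms with a negative index vanish:

```python
# Level-1 index triples (i', j', k') with i' + j' + k' = 2.
LEVEL1 = [(a, b, c) for a in range(3) for b in range(3) for c in range(3)
          if a + b + c == 2]

def decomposition(i, j, k):
    """Pairs (i',j',k'), (i-i', j-j', k-k') contributing to the level-2 subtensor T_{i,j,k}."""
    return [((a, b, c), (i - a, j - b, k - c))
            for (a, b, c) in LEVEL1
            if (i - a, j - b, k - c) in LEVEL1]

print(decomposition(1, 1, 2))
# [((0, 0, 2), (1, 1, 0)), ((0, 1, 1), (1, 0, 1)),
#  ((1, 0, 1), (0, 1, 1)), ((1, 1, 0), (0, 0, 2))]
```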
The Laser Method. We say a set of subtensors T_1, T_2, . . . , T_ℓ of T is independent if and only if the following two conditions hold:
- These subtensors are supported on disjoint variables.
- The restriction of T over the union of the supports of the subtensors T_1, . . . , T_ℓ is exactly T_1 + T_2 + · · · + T_ℓ. (That is, there cannot be any additional term in T other than those from T_1, T_2, . . . , T_ℓ.)
As we have seen, the tensor CW_q is the sum of many matrix multiplication tensors T_{i,j,k} (i + j + k = 2). If these T_{i,j,k}'s were independent, we would have succeeded, as we would be able to efficiently compute many independent matrix multiplications using CW_q. A direct application of Schönhage's τ theorem (see Theorem 3.2) would give us an upper bound on ω.
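For orientation, the form of Schönhage's τ theorem (the asymptotic sum inequality) that such an argument relies on is roughly the following; this is our paraphrase of the standard statement rather than a quote of Theorem 3.2. Writing ⟨a, b, c⟩ for the tensor of multiplying an a × b matrix by a b × c matrix:
\[
  \underline{R}\Bigl( \bigoplus_{i=1}^{p} \langle a_i, b_i, c_i \rangle \Bigr) \le r
  \quad \Longrightarrow \quad
  \sum_{i=1}^{p} (a_i b_i c_i)^{\omega/3} \le r,
\]
where \underline{R} denotes border rank. So degenerating a tensor of small asymptotic rank into many large independent matrix multiplication tensors directly yields an upper bound on ω.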
Clearly, these T_{i,j,k}'s are not even supported on disjoint variables. The first step of the laser method is to take the n-th tensor power. Since the asymptotic rank of CW_q is q + 2, the n-th power needs about (q + 2)^n many multiplications to compute. (n will later go to infinity.) Within CW_q^{⊗n}, there are certainly many non-independent subtensors that are matrix multiplication tensors. We will carefully zero out some variables, i.e., set some of them to zero. After zeroing out, all remaining matrix multiplication tensors will be independent. Therefore, we can apply Schönhage's τ theorem. This is the single-level version of the laser method.
Such a single-level analysis gives a bound of ω < 2.38719. By considering the second power, Coppersmith and Winograd [CW90] obtain an improved bound of ω < 2.375477. Note that in the analysis above, for any two non-independent matrix multiplication tensors, only one of them will survive the zeroing-out. Looking ahead, the two-level analysis will further exploit the tensor power structure and "merge" some non-independent matrix multiplication tensors into a larger one.
Variable Blocks. In the two-level analysis, we take the 2n-th power of CW_q. After that, the level-1 partition of the X variables becomes (X_{î_1} × X_{î_2} × · · · × X_{î_{2n}})_{î_1, î_2, . . . , î_{2n} ∈ {0,1,2}} (similarly for the Y and Z variables). Let Î ∈ {0,1,2}^{2n} be a sequence. We call each X_Î := X_{î_1} × X_{î_2} × · · · × X_{î_{2n}} a level-1 variable block. Same for Y_Ĵ and Z_K̂.[4]
In level 2, we first equivalently view (CW_q)^{⊗2n} as (CW_q^{⊗2})^{⊗n}, i.e., the n-th power of CW_q^{⊗2}. For each sequence I ∈ {0,1,2,3,4}^n, we call its corresponding block X_I := X̃_{i_1} × X̃_{i_2} × · · · × X̃_{i_n} a level-2 variable block. We define Y_J, Z_K similarly. (Below, we will always state our definitions in terms of X variable blocks. The same definitions always apply to Y/Z blocks as well.)
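To make the relation between the two levels concrete, here is a minimal Python sketch (our own illustration, not code from the paper). A level-1 block of CW_q^{⊗2n} is indexed by a sequence Î ∈ {0,1,2}^{2n}; the level-2 block containing it is indexed by the sequence in {0, . . . , 4}^n obtained by summing consecutive pairs of Î, by the definition of the coarsened partition:

```python
def level2_index(level1_seq):
    """Index sequence of the level-2 block containing the level-1 block X_{level1_seq}."""
    assert len(level1_seq) % 2 == 0
    return [level1_seq[2 * t] + level1_seq[2 * t + 1]
            for t in range(len(level1_seq) // 2)]

# Example with 2n = 6: the level-1 block indexed by (1, 1, 0, 2, 2, 0)
# lies inside the level-2 block indexed by (2, 2, 2).
print(level2_index([1, 1, 0, 2, 2, 0]))  # -> [2, 2, 2]
```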
Two-level Analysis. We are now ready to describe the two-level analysis of Coppersmith-Winograd in detail.

The Analysis for Level-2. At this level, our goal is to first zero out (CW_q)^{⊗2n} into independent subtensors, each isomorphic to
\[
  T^{\alpha} := \bigotimes_{i+j+k=4} T_{i,j,k}^{\otimes \alpha(i,j,k)\, n}
\]
for some distribution α : {(i, j, k)}_{i+j+k=4} → [0, 1]. Moreover, each α(i, j, k) has to be a multiple of 1/n, so that the exponent α(i, j, k) n is always an integer. There are at most poly(n) many such α.
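As a quick check of the last claim (our own illustration, not from the paper): there are 15 triples (i, j, k) with i + j + k = 4, so a distribution α whose values are multiples of 1/n is determined by 15 nonnegative integers α(i, j, k)·n summing to n, giving at most C(n + 14, 14) = poly(n) choices:

```python
from math import comb

# The 15 triples (i, j, k) with i + j + k = 4 and i, j, k >= 0.
triples = [(i, j, k) for i in range(5) for j in range(5) for k in range(5)
           if i + j + k == 4]
print(len(triples))             # 15

# Number of distributions alpha with all values multiples of 1/n:
# ways to write n as an ordered sum of 15 nonnegative integers.
for n in (10, 100, 1000):
    print(n, comb(n + 14, 14))  # grows only polynomially in n
```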
[4] In the overview, we always let I = (i_1, · · · , i_n) and Î = (î_1, · · · , î_{2n}). Other sequences (I, Î, J, K, etc.) are defined similarly.