Faster Matrix Multiplication via Asymmetric Hashing
Ran Duan
Tsinghua University
Hongxun Wu
UC Berkeley
Renfei Zhou
Tsinghua University
Abstract
Fast matrix multiplication is one of the most fundamental problems in algorithm research. The exponent of the optimal time complexity of matrix multiplication is usually denoted by ω. This paper discusses new ideas for improving the laser method for fast matrix multiplication. We observe that the analysis of higher powers of the Coppersmith-Winograd tensor [Coppersmith & Winograd 1990] incurs a "combination loss", and we partially compensate for it using an asymmetric version of CW's hashing method. By analyzing the eighth power of the CW tensor, we give a new bound of ω < 2.371866, which improves the previous best bound of ω < 2.372860 [Alman & Vassilevska Williams 2020]. Our result breaks the lower bound of 2.3725 in [Ambainis, Filmus & Le Gall 2015] because of the new method for analyzing component (constituent) tensors.
Email: duanran@mail.tsinghua.edu.cn.
Email: wuhx@berkeley.edu.
Email: zhourf20@mails.tsinghua.edu.cn.
arXiv:2210.10173v5 [cs.DS] 28 Nov 2023
1 Introduction
The time complexity of multiplying two n × n matrices is usually denoted by O(n^{ω+o(1)}) for some real number ω. (It is easy to see that 2 ≤ ω ≤ 3.) Although it has been studied for more than 50 years, the exact value of ω is still unknown, and many people believe that ω = 2. The determination of the constant ω would have wide implications. Not only do many matrix operations have complexities similar to that of fast matrix multiplication (FMM), such as LUP decomposition, matrix inversion, and the determinant [AH74, BH74], but algorithms for many combinatorial problems can also be accelerated by FMM algorithms, including graph problems like transitive closure (see [AH74]), unweighted all-pairs shortest paths [Sei95, Zwi02], all-pairs bottleneck paths [DP09], and other problems such as context-free grammar parsing [Val75] and RNA-folding [BGSW19].
The first algorithm breaking the cubic time bound was given by Strassen in 1969 [Str69], who showed that ω ≤ log_2 7 < 2.8074. After that, a series of works (e.g. [Pan78, BCRL79, Sch81, Rom82, CW81, Str86, CW90, Sto10, Wil12, LG14]) improved the upper bound on ω. The current best bound is ω < 2.3728596, given by the refined laser method of Alman and Vassilevska Williams [AW21b]. Very recently, DeepMind [FBH+22] used a reinforcement learning method to improve the ranks of several small matrix multiplication tensors. They gave a time bound of O(n^{2.778}) over F_2.
The most recent improvements along this line are all based on applying Strassen's laser method [Str86] to Coppersmith-Winograd tensors [CW90], CW_q. The laser method first picks a tensor T and takes its N-th power T^{⊗N}. This tensor T is usually efficient to compute (having a small asymptotic rank). Then it forces a subset of the variables in T^{⊗N} to be zero (zeroing-out) and degenerates it into a direct sum of independent matrix multiplication tensors. This gives an efficient algorithm for computing matrix multiplications.
The original work of Coppersmith and Winograd [CW90] not only analyzed CW_q, but also its second power, CW_q^{⊗2}. By a clever hashing technique using Salem-Spencer sets [SS42, Beh46], their analysis of CW_q shows that ω < 2.38719. For the second power, they applied the laser method to CW_q^{⊗2} after merging some subtensors of CW_q^{⊗2} into larger matrix multiplications. This gives a better bound of ω < 2.375477. Stothers [Sto10] and Vassilevska Williams [Wil12] independently improved the analysis to the fourth and eighth powers of CW_q while using computer programs to automate the complicated calculations. Le Gall [LG14] further improved it to the 32nd power by formulating the analysis as a series of convex optimization problems.
Although analyzing higher and higher powers of CW_q gives improved bounds on ω, the analysis is not tight. Alman and Vassilevska Williams [AW21b] focus on the extra loss in hash moduli in the higher-power analysis, and they compensate for this loss with their refined laser method. This gives the current best algorithm. Note that these analyses are all recursive, namely, the value obtained for each power depends on the bounds established for the previous power.
In this paper, we identify a more implicit "combination loss", which is our main contribution. Such loss arises not from a single recursion level, but from the structure of adjacent levels. To demonstrate that this observation indeed leads to an improved algorithm, we compensate for this loss using an asymmetric hashing method. This improves the analysis of the second power by Coppersmith and Winograd [CW90] to ω < 2.374631. We also generalize it to higher powers and obtain the improved bound of ω < 2.371866.[1] A similar asymmetric hashing method was used in [CW90] to analyze an asymmetric tensor. Fast rectangular matrix multiplication algorithms (e.g. [Cop82, Cop97, HP98, LG12, LGU18]) also use asymmetric hashing to degenerate the tensor power T^{⊗N} in the laser method into independent rectangular matrix multiplications. In this paper, we use asymmetric hashing for a different purpose, namely to compensate for the "combination loss".

[1] This bound is slightly better than the one in the previous version of this paper due to more flexibility in optimizing parameters.
On the limitation side, Ambainis, Filmus and Le Gall [AFLG15] showed that if we analyze the powers of CW_q using previous approaches, including the refined laser method, we cannot obtain a better upper bound than ω < 2.3725, even if we go to arbitrarily high powers. Our bound of ω < 2.371866 breaks this limitation by analyzing only the 8th power. This is because our analysis improves the values for lower powers, so this limitation no longer applies. They also proved a limitation of 2.3078 for a wider class of algorithms, which includes our algorithm. There are also works that prove more general barriers [AW21a, AW18, Alm21, CVZ21, BL20]. Our approach is also subject to these barriers.
Another approach to fast matrix multiplication is the group-theoretic method of Cohn and Umans [CU03, CKSU05, CU13]. There are also works on its limitations [ASU12, BCC+16, BCC+17, BCG+22].
1.1 Organization of this Paper
In Section 2, we will first give an overview of the laser method and our improved algorithm. In Section 3, we
introduce the concepts and notations that we will use in this paper, as well as some basic building blocks, for
example, the asymmetric hashing method. To better illustrate our ideas, in Section 4, we give an improved
analysis of the second power of the CW tensor. In Sections 5, 6, and 7, we will extend the analysis to higher
powers. In Section 8, we discuss our optimization program and give the numerical results.
2 Technical Overview
Our improvement is based on the observation that a hidden “combination loss” exists. In the overview, we
will explain our ideas based on the second-power analysis of Coppersmith and Winograd [CW90]. Their analysis implies that ω < 2.375477, which is still the best upper bound via the second power of CW_q. We will point out the "combination loss" in their analysis and present our main ideas, which serve to compensate for this loss. With certain twists, the same ideas also apply to higher powers and allow us to obtain the improved bound for ω.
2.1 Coppersmith-Winograd Algorithm
It is helpful to start with a high-level description of Coppersmith-Winograd [CW90]. Our exposition here
differs a little from their original work. Specifically, their work uses the values of subtensors in a black-box
way. We open up this black box and look at the structure inside those subtensors. This change will make the
combination loss visible.
In the following, we will assume some familiarity with the Coppersmith-Winograd Algorithm. For an
exposition of their algorithm, the reader may also refer to the excellent survey by Bläser [Blä13].
The 2nd power of the CW tensor. To begin, we first set up some minimal notation. The starting point of the laser method is a tensor T that can be computed efficiently (i.e., one with a small asymptotic rank). The Coppersmith-Winograd tensor CW_q, whose asymptotic rank is q + 2, is widely used in previous works. The exact definition of CW_q is the following.[2]
\[
  CW_q = \sum_{i=1}^{q} (x_i y_i z_0 + x_i y_0 z_i + x_0 y_i z_i) + x_0 y_0 z_{q+1} + x_0 y_{q+1} z_0 + x_{q+1} y_0 z_0.
\]
[2] Readers may also refer to Section 3.4.
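For concreteness, here is the q = 2 instance written out term by term (this explicit expansion is our own illustration, obtained by substituting q = 2 into the definition above):
\[
  CW_2 = x_1 y_1 z_0 + x_1 y_0 z_1 + x_0 y_1 z_1 + x_2 y_2 z_0 + x_2 y_0 z_2 + x_0 y_2 z_2 + x_0 y_0 z_3 + x_0 y_3 z_0 + x_3 y_0 z_0.
\]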
2
We will only use several properties of CW_q. First, it is a tensor over variable sets X = {x_0, x_1, . . . , x_{q+1}}, Y = {y_0, y_1, . . . , y_{q+1}}, and Z = {z_0, z_1, . . . , z_{q+1}}. We let X_0 = {x_0}, X_1 = {x_1, x_2, . . . , x_q}, X_2 = {x_{q+1}}, and define {Y_j}_{j=0,1,2}, {Z_k}_{k=0,1,2} similarly. Second, for the partition X = X_0 ∪ X_1 ∪ X_2, Y = Y_0 ∪ Y_1 ∪ Y_2, and Z = Z_0 ∪ Z_1 ∪ Z_2, the following holds:
- For all i + j + k ≠ 2, the subtensor of CW_q over X_i, Y_j, Z_k is zero.
- For all i + j + k = 2, the subtensor of CW_q over X_i, Y_j, Z_k is a matrix multiplication tensor, denoted by T_{i,j,k}.
We call this partition the level-1 partition. CW_q is the summation of all such T_{i,j,k}'s:
\[
  CW_q = \sum_{i+j+k=2} T_{i,j,k} = T_{0,1,1} + T_{1,0,1} + T_{1,1,0} + T_{0,0,2} + T_{0,2,0} + T_{2,0,0}.
\]
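As a sanity check of the partition property above, the following minimal Python sketch (our own illustration, not code from the paper) lists the index triples of the terms of CW_q and verifies that every term falls into a block X_i × Y_j × Z_k with i + j + k = 2:

```python
# Minimal sketch: represent CW_q by the index triples (a, b, c) of its terms
# x_a * y_b * z_c and check that each term lies in a level-1 block
# X_i x Y_j x Z_k with i + j + k = 2.

def cw_support(q):
    """Index triples of the terms of CW_q."""
    terms = []
    for i in range(1, q + 1):
        terms += [(i, i, 0), (i, 0, i), (0, i, i)]
    terms += [(0, 0, q + 1), (0, q + 1, 0), (q + 1, 0, 0)]
    return terms

def level1_index(a, q):
    """Level-1 block of a variable index: 0 -> block 0, 1..q -> block 1, q+1 -> block 2."""
    if a == 0:
        return 0
    return 1 if a <= q else 2

q = 3
for (a, b, c) in cw_support(q):
    blocks = (level1_index(a, q), level1_index(b, q), level1_index(c, q))
    assert sum(blocks) == 2, (a, b, c)
print("all", len(cw_support(q)), "terms fall in blocks with i + j + k = 2")
```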
For any two sets A and B, their Cartesian product A × B is {(a, b) | a ∈ A, b ∈ B}. The second power CW_q^{⊗2} is a tensor over variable sets X̃ = X × X, Ỹ = Y × Y, and Z̃ = Z × Z:
\[
  CW_q^{\otimes 2} = \sum_{i+j+k=2}\;\sum_{i'+j'+k'=2} T_{i,j,k} \otimes T_{i',j',k'}.
\]
For X̃ = X × X, the product of two level-1 partitions gives X̃ = ⋃_{i', i''} X_{i'} × X_{i''}. We define the level-2 partition to be a coarsening of this. For each i ∈ {0, 1, 2, 3, 4}, we define
\[
  \tilde{X}_i := \bigcup_{i'+i''=i} X_{i'} \times X_{i''}, \qquad
  \tilde{Y}_j := \bigcup_{j'+j''=j} Y_{j'} \times Y_{j''}, \qquad
  \tilde{Z}_k := \bigcup_{k'+k''=k} Z_{k'} \times Z_{k''}.
\]
For example, X̃_2 = (X_0 × X_2) ∪ (X_1 × X_1) ∪ (X_2 × X_0). The level-2 partition is given by X̃ = X̃_0 ∪ X̃_1 ∪ · · · ∪ X̃_4, Ỹ = Ỹ_0 ∪ Ỹ_1 ∪ · · · ∪ Ỹ_4, and Z̃ = Z̃_0 ∪ Z̃_1 ∪ · · · ∪ Z̃_4.
For any i + j + k = 4, let T_{i,j,k} be the subtensor of CW_q^{⊗2} over X̃_i, Ỹ_j, Z̃_k. We have
\[
  T_{i,j,k} = \sum_{i'+j'+k'=2} T_{i',j',k'} \otimes T_{i-i',\, j-j',\, k-k'}.
\]
For example, T_{1,1,2} = T_{1,1,0} ⊗ T_{0,0,2} + T_{0,0,2} ⊗ T_{1,1,0} + T_{1,0,1} ⊗ T_{0,1,1} + T_{0,1,1} ⊗ T_{1,0,1}.[3]
The tensor CW_q^{⊗2} is the sum of all such T_{i,j,k}'s:
\[
\begin{aligned}
  CW_q^{\otimes 2} = \sum_{i+j+k=4} T_{i,j,k}
  &= T_{0,0,4} + T_{0,4,0} + T_{4,0,0} \\
  &\quad + T_{0,1,3} + T_{0,3,1} + T_{1,0,3} + T_{1,3,0} + T_{3,0,1} + T_{3,1,0} \\
  &\quad + T_{0,2,2} + T_{2,0,2} + T_{2,2,0} \\
  &\quad + T_{1,1,2} + T_{1,2,1} + T_{2,1,1}.
\end{aligned}
\]
Note that these T_{i,j,k}'s (i + j + k = 4) may not be matrix multiplication tensors. For the subtensors of CW_q^{⊗2}, it is true that T_{0,0,4}, T_{0,1,3}, T_{0,2,2} (and their permutations) are indeed all matrix multiplication tensors, while T_{1,1,2} (and its permutations) are not. We will need this important fact in the two-level analysis of Coppersmith and Winograd.
[3] For notational convenience, we let T_{i,j,k} = 0 if any of i, j, k is negative.
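The decomposition of each level-2 subtensor into pairs of level-1 subtensors, as in the T_{1,1,2} example above, can be enumerated mechanically. The following short Python sketch (our own illustration, not code from the paper) lists the contributing pairs for a given (i, j, k) with i + j + k = 4, using the convention that terms with a negative index vanish:

```python
# Level-1 index triples (i', j', k') with i' + j' + k' = 2.
LEVEL1 = [(a, b, c) for a in range(3) for b in range(3) for c in range(3)
          if a + b + c == 2]

def decomposition(i, j, k):
    """Pairs (i',j',k'), (i-i', j-j', k-k') contributing to the level-2 subtensor T_{i,j,k}."""
    return [((a, b, c), (i - a, j - b, k - c))
            for (a, b, c) in LEVEL1
            if (i - a, j - b, k - c) in LEVEL1]

print(decomposition(1, 1, 2))
# [((0, 0, 2), (1, 1, 0)), ((0, 1, 1), (1, 0, 1)),
#  ((1, 0, 1), (0, 1, 1)), ((1, 1, 0), (0, 0, 2))]
```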
The Laser Method. We say a set of subtensors T_1, T_2, . . . , T_ℓ of T is independent if and only if the following two conditions hold:
- These subtensors are supported on disjoint variables.
- The restriction of T over the union of the supports of the subtensors T_1, . . . , T_ℓ is exactly T_1 + T_2 + · · · + T_ℓ. (That is, there cannot be any additional term in T other than those from T_1, T_2, . . . , T_ℓ.)
As we have seen, the tensor CW_q is the sum of many matrix multiplication tensors T_{i,j,k} (i + j + k = 2). If these T_{i,j,k}'s were independent, we would have succeeded, as we would be able to efficiently compute many independent matrix multiplications using CW_q. A direct application of Schönhage's τ theorem (see Theorem 3.2) would give us an upper bound on ω.
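For orientation, the form of Schönhage's τ theorem (the asymptotic sum inequality) that such an argument relies on is roughly the following; this is our paraphrase of the standard statement rather than a quote of Theorem 3.2. Writing ⟨a, b, c⟩ for the tensor of multiplying an a × b matrix by a b × c matrix:
\[
  \underline{R}\Bigl( \bigoplus_{i=1}^{p} \langle a_i, b_i, c_i \rangle \Bigr) \le r
  \quad \Longrightarrow \quad
  \sum_{i=1}^{p} (a_i b_i c_i)^{\omega/3} \le r,
\]
where \underline{R} denotes border rank. So degenerating a tensor of small asymptotic rank into many large independent matrix multiplication tensors directly yields an upper bound on ω.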
Clearly, these T_{i,j,k}'s are not even supported on disjoint variables. The first step of the laser method is to take the n-th tensor power. Since the asymptotic rank of CW_q is q + 2, the n-th power needs about (q + 2)^n many multiplications to compute. (n will later go to infinity.) Within CW_q^{⊗n}, there are certainly many non-independent subtensors that are matrix multiplication tensors. We will carefully zero out some variables, i.e., set some of them to zero. After zeroing out, all remaining matrix multiplication tensors will be independent. Therefore, we can apply Schönhage's τ theorem. This is the single-level version of the laser method.
Such a single-level analysis gives a bound of ω < 2.38719. By considering the second power, Coppersmith and Winograd [CW90] obtain an improved bound of ω < 2.375477. Note that in the analysis above, for any two non-independent matrix multiplication tensors, only one of them will survive the zeroing-out. Looking ahead, the two-level analysis will further exploit the tensor power structure and "merge" some non-independent matrix multiplication tensors into a larger one.
Variable Blocks. In the two-level analysis, we take the 2n-th power of CW_q. After that, the level-1 partition of the X variables becomes (X_{î_1} × X_{î_2} × · · · × X_{î_{2n}})_{î_1, î_2, . . . , î_{2n} ∈ {0,1,2}} (similarly for the Y and Z variables). Let Î ∈ {0,1,2}^{2n} be a sequence. We call each X_Î := X_{î_1} × X_{î_2} × · · · × X_{î_{2n}} a level-1 variable block. Same for Y_Ĵ and Z_K̂.[4]
In level 2, we first equivalently view (CW_q)^{⊗2n} as (CW_q^{⊗2})^{⊗n}, i.e., the n-th power of CW_q^{⊗2}. For each sequence I ∈ {0,1,2,3,4}^n, we call its corresponding block X_I := X̃_{i_1} × X̃_{i_2} × · · · × X̃_{i_n} a level-2 variable block. We define Y_J, Z_K similarly. (Below, we will always state our definitions in terms of X variable blocks. The same definitions always apply to Y/Z blocks as well.)
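To make the relation between the two levels concrete, here is a minimal Python sketch (our own illustration, not code from the paper). A level-1 block of CW_q^{⊗2n} is indexed by a sequence Î ∈ {0,1,2}^{2n}; the level-2 block containing it is indexed by the sequence in {0, . . . , 4}^n obtained by summing consecutive pairs of Î, by the definition of the coarsened partition:

```python
def level2_index(level1_seq):
    """Index sequence of the level-2 block containing the level-1 block X_{level1_seq}."""
    assert len(level1_seq) % 2 == 0
    return [level1_seq[2 * t] + level1_seq[2 * t + 1]
            for t in range(len(level1_seq) // 2)]

# Example with 2n = 6: the level-1 block indexed by (1, 1, 0, 2, 2, 0)
# lies inside the level-2 block indexed by (2, 2, 2).
print(level2_index([1, 1, 0, 2, 2, 0]))  # -> [2, 2, 2]
```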
Two-level Analysis. We are now ready to describe the two-level analysis of Coppersmith-Winograd in detail.

The Analysis for Level-2. At this level, our goal is to first zero out (CW_q)^{⊗2n} into independent subtensors, each isomorphic to
\[
  T^{\alpha} := \bigotimes_{i+j+k=4} T_{i,j,k}^{\otimes \alpha(i,j,k)\, n}
\]
for some distribution α : {(i, j, k)}_{i+j+k=4} → [0, 1]. Moreover, each α(i, j, k) has to be a multiple of 1/n, so that the exponent α(i, j, k) n is always an integer. There are at most poly(n) many such α.
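As a quick check of the last claim (our own illustration, not from the paper): there are 15 triples (i, j, k) with i + j + k = 4, so a distribution α whose values are multiples of 1/n is determined by 15 nonnegative integers α(i, j, k)·n summing to n, giving at most C(n + 14, 14) = poly(n) choices:

```python
from math import comb

# The 15 triples (i, j, k) with i + j + k = 4 and i, j, k >= 0.
triples = [(i, j, k) for i in range(5) for j in range(5) for k in range(5)
           if i + j + k == 4]
print(len(triples))             # 15

# Number of distributions alpha with all values multiples of 1/n:
# ways to write n as an ordered sum of 15 nonnegative integers.
for n in (10, 100, 1000):
    print(n, comb(n + 14, 14))  # grows only polynomially in n
```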
[4] In the overview, we always let I = (i_1, · · · , i_n) and Î = (î_1, · · · , î_{2n}). Other sequences (I, Î, J, K, etc.) are defined similarly.