Fault-Tolerant Strassen-Like Matrix Multiplication
Osman B. Güney and Suayb S. Arslan
Abstract
In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We observe that using two different algorithms gives rise to new check relations, resulting in more local computations. These local computations are found through a computer-aided search. To improve performance, two special parity (extra) sub-matrix multiplications (PSMMs) are generated at the expense of increased communication/computation cost of the system. Our preliminary results demonstrate that the proposed method outperforms a Strassen-like algorithm with two copies and achieves performance very close to that of the three-copy version using only 2 PSMMs, while reducing the total number of compute nodes by around 24%, i.e., from 21 to 16.
Keywords
Strassen-Like Algorithms, Fault-Tolerant Computation, Coded Matrix Multiplication.
I. INTRODUCTION
Matrix Multiplication (MM) is one of the fundamental operations in many fields of engineering (machine learning, computer graphics, and robotics). Due to the high dimensions of the matrices involved, the MM operation may be the dominant factor in the total operational latency. One of the classical ways to reduce computation delay is to distribute the calculations across multiple nodes and allow parallel processing. Another is to use algorithms with lower computational complexity [1], [2]. To get the best of both worlds, algorithms with low complexity must be distributed for parallel processing.
In 1969, Strassen discovered that MM can be performed recursively with asymptotically lower computational complexity than that of the naive approach [3]. He showed that the computational complexity of MM with a $2 \times 2$ base case can be reduced from $O(n^3)$ to $O(n^{\log_2 7})$ by using 7 multiplications instead of 8. Any algorithm with this recursive structure is called a "Strassen-like algorithm" [2], [4]. After Strassen, better algorithms in terms of the number of computational steps have been developed, reducing the hidden constant term in the big-$O$ notation as well as the asymptotic computational complexity [5], [6], [7]. For instance, Winograd [6] showed that MM can be performed using 15 additions/subtractions instead of the 18 of Strassen's algorithm. Winograd's algorithm is optimal in terms of both multiplications and additions/subtractions for MM with a standard $2 \times 2$ base case, achieving the lower bounds provided by Probert [8].
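For completeness, the asymptotic claim can be made explicit. A Strassen-like algorithm with a $2 \times 2$ base case performs 7 recursive multiplications on half-size blocks plus $\Theta(n^2)$ work for the block additions/subtractions, which yields the recurrence
$$T(n) = 7\,T(n/2) + \Theta(n^2) \;\Longrightarrow\; T(n) = \Theta\big(n^{\log_2 7}\big) \approx \Theta\big(n^{2.807}\big)$$
by the master theorem, since $\log_2 7 \approx 2.807 > 2$.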
On the other hand, in many distributed settings, worker (slave) nodes may slow down or cease operation completely at random times; such nodes are called stragglers. To overcome this difficulty and speed up the parallel operation, fault-tolerance techniques such as erasure coding are utilized in different contexts [9], [10], [11]. In these schemes, the master node utilizes a subset of the fastest computations by carefully designing redundant calculations. The redundant calculations are generated by first partitioning the multiplicand and the multiplier into smaller sub-blocks. Then, based on the coding technique, linear combinations of these sub-blocks are calculated. However, unlike standard MM, Strassen-like algorithms use specific sub-blockings and specific subsets to be added/subtracted and multiplied in order to achieve their recursive complexity advantages. Hence, a direct application of the previous literature to Strassen-like MM algorithms is not possible. In addition, no previous work has considered a fault-tolerant version of Strassen-like MMs, which is the main motivation of this work.
To fill this gap, we propose fault-tolerant techniques for Strassen-like MMs (specifically, the Strassen and Winograd algorithms) through a computer-aided search, whereby each node is assumed to perform only one smaller-size MM. Furthermore, as in the original studies, additional parity computations are designed to generate computational redundancy. Finally, performance comparisons are provided through theoretical analysis and Monte Carlo simulations.
II. CLASSICAL FAULT-TOLERANT CODED MATRIX MULTIPLICATION METHODS
Applying erasure codes to speed up distributed algorithms has found a variety of applications. For instance, an approximate matrix multiplication scheme based on anytime coding is shown to improve the latency performance of the overall computation in [12]. In another study [13], gradient computations are coded to speed up the training phase of machine learning algorithms. Classical erasure codes that can be applied to matrix multiplication include Maximum Distance Separable (MDS) codes [14], Product codes [15], Sparse graph codes [16], Hierarchical codes [11], and BP-XOR codes [17].
O. B. Güney was with the Department of Electronics Engineering, Sabancı University, Istanbul, Turkey; e-mail: osmanberke@sabanciuniv.edu.
S. S. Arslan was with the Department of Computer Engineering, MEF University, Istanbul, Turkey; e-mail: arslans@mef.edu.tr.
Suppose that two large matrices are multiplied, i.e., $A^TB$ where $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times n}$. The $i$th and $j$th columns of the matrices $A$ and $B$, denoted by $a_i \in \mathbb{R}^m$ and $b_j \in \mathbb{R}^m$, respectively, are multiplied (dot product). In other words, computing $A^TB$ amounts to $n^2$ dot products, where the number of workers $N \leq n^2$. Suppose the columns of $A$ and $B$ are divided into $k$ equal-size matrices. This way, the large matrix multiplication problem $A^TB$ can be viewed as $k$ instances of smaller matrix multiplication problems distributed over independent compute nodes. The latest response from these nodes determines the overall computation time. In MDS-coded computation [14], adding redundancy leads to $n > k$ subcomputations, and the overall computation time is determined by the $k$th fastest worker among the $n$ workers in the same network. Latency analysis can then be performed through order statistics.
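To make the scheme concrete, the following minimal sketch (our illustration, not from [14]; the Vandermonde-style generator and all names are our assumptions) encodes the $k$ column blocks of $A$ into $N$ coded blocks so that the results of any $k$ finished workers suffice to recover $A^TB$:

import numpy as np

def mds_coded_matmul(A, B, k, N, finished):
    # Split the columns of A into k blocks; A^T B stacks the row blocks A_i^T B.
    blocks = np.split(A, k, axis=1)                 # assumes k divides A.shape[1]
    # Vandermonde generator: any k of its N rows form an invertible matrix
    # (distinct nodes), though it is ill-conditioned for large N; fine here.
    G = np.vander(np.arange(1, N + 1), k, increasing=True).astype(float)
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(N)]
    # Worker i computes one small product coded[i]^T B; keep the finished ones.
    results = {i: coded[i].T @ B for i in finished}
    idx = sorted(finished)[:k]                      # any k responses suffice
    Y = np.stack([results[i] for i in idx])         # shape: k x (n/k) x p
    # Invert the k x k submatrix of G to recover the uncoded products A_i^T B.
    X = np.einsum('ij,jrc->irc', np.linalg.inv(G[idx, :]), Y)
    return np.concatenate(list(X), axis=0)          # stack row blocks -> A^T B

# Toy check: 5 workers, any 3 suffice (workers 1 and 3 straggle here).
A, B = np.random.randn(8, 6), np.random.randn(8, 5)
assert np.allclose(mds_coded_matmul(A, B, k=3, N=5, finished={0, 2, 4}), A.T @ B)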
The product-coded scheme proposed in [15] encodes the computation across multiple dimensions and achieves an improved coding gain relative to the MDS-coded scheme. A product code is one way of constructing a larger code with small codes as building blocks. For instance, given an $(n, k)$ MDS code, a product code can be constructed as follows. We first arrange $k^2$ symbols in a $k$-by-$k$ array. Every row of the array is first encoded with an $(n, k)$ MDS code. Then, each column of the array is encoded with an $(n, k)$ MDS code, resulting in an $n \times n$ coded computation block (a minimal encoding sketch is given after this paragraph). Finally, columns or rows are distributed to $n$ slave nodes for performing the overall computation. Sparse graph codes are also proposed in [16], where each worker chooses a random number of input submatrices based on a given degree distribution $P$ and then computes a weighted linear combination, where the weights $W_{ij}$ are randomly drawn from a finite set $S$. When the master node receives a subset of finished tasks such that the coefficient matrix formed by the weights $W_{ij}$ is full rank, it runs a hybrid decoding algorithm, combining belief propagation decoding and Gaussian elimination, to recover the resultant matrix $C$. Hierarchical codes, on the other hand, decompose the overall matrix multiplication task into a hierarchy of heterogeneously sized subtasks. The duty to complete each subtask is shared amongst all workers, and each subtask is (generally) of a different complexity. The motivation for the hierarchical decomposition is the recognition that, under erasure coding, more workers will finish the first subtask than the second (or third, fourth, etc.). Earlier subtasks can therefore be designed to be of a higher rate than later subtasks [11].
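As an illustration of the product-code layout only (the iterative row/column decoding of [15] is omitted; the generator and helper names below are our assumptions), the $k^2$ sub-products $A_i^TB_j$ can be extended to an $n \times n$ grid of coded tasks by MDS-encoding the column blocks of $A$ and $B$ separately:

import numpy as np

def product_code_tasks(A, B, k, n):
    # The (i, j) uncoded task is A_i^T B_j for column blocks A_i, B_j.
    A_blk, B_blk = np.split(A, k, axis=1), np.split(B, k, axis=1)
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)   # (n, k) MDS-style
    A_cod = [sum(G[r, i] * A_blk[i] for i in range(k)) for r in range(n)]  # encode one dimension
    B_cod = [sum(G[c, j] * B_blk[j] for j in range(k)) for c in range(n)]  # encode the other
    # Worker (r, c) computes one coded sub-product; the n x n grid of results
    # is a product-code codeword: every row and every column is an (n, k)
    # codeword, so missing entries can be recovered row-by-row / column-by-column.
    return [[A_cod[r].T @ B_cod[c] for c in range(n)] for r in range(n)]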
In all of this previous research, matrices are partitioned into smaller sub-blocks, and combinations of these sub-blocks are selected to be linearly combined according to an erasure coding algorithm. In Strassen-like algorithms, the inherent specific sub-blocking and combining prevents us from using these standard approaches, particularly when generating redundant computations. To the best of our knowledge, this is the first attempt toward generating fault-tolerant Strassen-like matrix multiplication methodologies.
III. FAULT-TOLERANT STRASSEN-LIKE ALGORITHMS
A. Background
We consider the base $2 \times 2$ MM below, together with two Strassen-like algorithms due to Strassen himself and to Winograd [6]. The submatrix multiplications used in the Strassen algorithm are denoted by $S_i$ and those used in the Winograd algorithm by $W_i$. The base $2 \times 2$ MM $C = A^TB$ is given by
\begin{equation*}
\underbrace{\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}}_{C}
=
\underbrace{\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}}_{A^T}
\underbrace{\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}}_{B}
\end{equation*}
where the sub-matrix computations are given by the following set of equations,
\begin{align*}
S_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) & W_1 &= A_{11}B_{11}\\
S_2 &= (A_{21} + A_{22})B_{11} & W_2 &= A_{12}B_{21}\\
S_3 &= A_{11}(B_{12} - B_{22}) & W_3 &= A_{22}(B_{11} - B_{12} - B_{21} + B_{22})\\
S_4 &= A_{22}(B_{21} - B_{11}) & W_4 &= (A_{11} - A_{21})(B_{22} - B_{12})\\
S_5 &= (A_{11} + A_{12})B_{22} & W_5 &= (A_{21} + A_{22})(B_{12} - B_{11})\\
S_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) & W_6 &= (A_{11} + A_{12} - A_{21} - A_{22})B_{22}\\
S_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) & W_7 &= (A_{11} - A_{21} - A_{22})(B_{11} - B_{12} + B_{22})
\end{align*}
from which we can compute the sub-matrices of the multiplication output, namely $C_{11}, C_{12}, C_{21}, C_{22}$, as given by
\begin{align}
C_{11} &= S_1 + S_4 - S_5 + S_7 = W_1 + W_2 \tag{1}\\
C_{12} &= S_3 + S_5 = W_1 + W_5 + W_6 - W_7 \tag{2}\\
C_{21} &= S_2 + S_4 = W_1 - W_3 + W_4 - W_7 \tag{3}\\
C_{22} &= S_1 - S_2 + S_3 + S_6 = W_1 + W_4 + W_5 - W_7 \tag{4}
\end{align}
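As a sanity check, the following short numerical sketch (our illustration, not part of the paper; NumPy and the $4 \times 4$ test size are arbitrary choices) verifies that both sets of seven sub-products satisfy the recovery equations (1)-(4):

import numpy as np

def split(M):
    # Partition a matrix into its four equal-size sub-blocks.
    r, c = M.shape
    return M[:r//2, :c//2], M[:r//2, c//2:], M[r//2:, :c//2], M[r//2:, c//2:]

rng = np.random.default_rng(0)
At, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))  # At plays the role of A^T
A11, A12, A21, A22 = split(At)
B11, B12, B21, B22 = split(B)

# Strassen's seven sub-matrix products
S1 = (A11 + A22) @ (B11 + B22); S2 = (A21 + A22) @ B11
S3 = A11 @ (B12 - B22);         S4 = A22 @ (B21 - B11)
S5 = (A11 + A12) @ B22;         S6 = (A21 - A11) @ (B11 + B12)
S7 = (A12 - A22) @ (B21 + B22)

# Winograd's seven sub-matrix products
W1 = A11 @ B11;                      W2 = A12 @ B21
W3 = A22 @ (B11 - B12 - B21 + B22); W4 = (A11 - A21) @ (B22 - B12)
W5 = (A21 + A22) @ (B12 - B11);     W6 = (A11 + A12 - A21 - A22) @ B22
W7 = (A11 - A21 - A22) @ (B11 - B12 + B22)

C11, C12, C21, C22 = split(At @ B)
assert np.allclose(C11, S1 + S4 - S5 + S7) and np.allclose(C11, W1 + W2)            # (1)
assert np.allclose(C12, S3 + S5) and np.allclose(C12, W1 + W5 + W6 - W7)            # (2)
assert np.allclose(C21, S2 + S4) and np.allclose(C21, W1 - W3 + W4 - W7)            # (3)
assert np.allclose(C22, S1 - S2 + S3 + S6) and np.allclose(C22, W1 + W4 + W5 - W7)  # (4)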
If only the Strassen or the Winograd algorithm were used to calculate the matrix $C$ above in a parallel setting, it would require 7 independent compute nodes. In order to compute all the necessary sub-matrices, all computations must be completed on time. However, due to potential stragglers, the final merge (the execution of equations (1), (2), (3), and (4)) may be delayed. To overcome this difficulty, a trivial approach is replication (2-copy), which would require 14 compute nodes. Although the average performance improves with this approach, the two compute nodes calculating the same sub-matrix multiplication may both be stragglers, which would still result in a delayed operation.
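To quantify this last point, here is a toy Monte Carlo sketch (the i.i.d. straggler model and the probability p = 0.1 are our assumptions, purely illustrative): under 2-copy replication, the merge stalls exactly when both replicas of at least one of the seven sub-products straggle.

import numpy as np

rng = np.random.default_rng(1)
p, trials = 0.1, 200_000                    # assumed per-node straggling probability
straggle = rng.random((trials, 7, 2)) < p   # 7 sub-products x 2 replicas each
delayed = straggle.all(axis=2).any(axis=1)  # some product lost both of its replicas
print(f"P(delay) ~ {delayed.mean():.4f} vs analytic {1 - (1 - p**2)**7:.4f}")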