Fault-Tolerant Strassen-Like Matrix Multiplication
Osman B. Güney and Suayb S. Arslan
Abstract
In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplications. The proposed method is based on using two distinct Strassen-like algorithms instead of replicating a given one. We observe that using two different algorithms gives rise to new check relations, resulting in more local computations. These local computations are found through a computer-aided search. To improve performance, two special parity (extra) sub-matrix multiplications (PSMMs) are generated at the expense of increased communication/computation cost of the system. Our preliminary results demonstrate that the proposed method outperforms a Strassen-like algorithm with two copies and achieves performance very close to that of the three-copy version using only 2 PSMMs, while reducing the total number of compute nodes by around 24%, i.e., from 21 to 16.
Keywords
Strassen-Like Algorithms, Fault-Tolerant Computation, Coded Matrix Multiplication.
I. INTRODUCTION
Matrix Multiplication (MM) is one of the fundamental operations in many fields of engineering (machine learning, computer graphics, and robotics). Due to the high dimensions of the matrices involved, the MM operation may be the dominant factor in the total operational latency. One of the classical ways to reduce computation delay is to distribute the calculations across multiple nodes and allow parallel processing. Another is to use algorithms with lower computational complexity [1], [2]. To get the best of both worlds, algorithms with low complexity must be distributed for parallel processing.
In 1969, Strassen discovered that MM can be performed recursively with asymptotically lower computational complexity than that of the naive approach [3]. He showed that the computational complexity of MM with a $2 \times 2$ base case can be reduced from $O(n^3)$ to $O(n^{\log_2 7})$ by using 7 multiplications instead of 8. Any algorithm with this recursive structure is called a "Strassen-like algorithm" [2], [4]. After Strassen, better algorithms in terms of the number of computational steps have been developed, reducing the hidden constant term in the big-$O$ notation as well as the asymptotic computational complexity [5], [6], [7]. For instance, Winograd [6] showed that MM can be performed using 15 additions/subtractions instead of the 18 of Strassen's algorithm. Winograd's algorithm is optimal in terms of both multiplications and additions/subtractions for MM with a standard $2 \times 2$ base case, achieving the lower bounds provided by Probert [8].
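For completeness, the asymptotic claim can be made explicit. A Strassen-like algorithm with a $2 \times 2$ base case performs 7 recursive multiplications on half-size blocks plus $\Theta(n^2)$ work for the block additions/subtractions, which yields the recurrence
$$T(n) = 7\,T(n/2) + \Theta(n^2) \;\Longrightarrow\; T(n) = \Theta\big(n^{\log_2 7}\big) \approx \Theta\big(n^{2.807}\big)$$
by the master theorem, since $\log_2 7 \approx 2.807 > 2$.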
On the other hand, in many distributed settings, worker (slave) nodes may slow down or cease operation completely at random times; such nodes are called stragglers. To overcome this difficulty and speed up the parallel operation, fault-tolerance techniques such as erasure coding are utilized in different contexts [9], [10], [11]. In these schemes, the master node utilizes a subset of the fastest computations by carefully designing redundant calculations. The redundant calculations are generated by first partitioning the multiplicand and the multiplier into smaller sub-blocks. Then, based on the coding technique, linear combinations of these sub-blocks are calculated. However, unlike standard MM, Strassen-like algorithms use specific sub-blockings and specific subsets to be added/subtracted and multiplied in order to achieve their recursive complexity advantages. Hence, a direct application of the previous literature to Strassen-like MM algorithms is not possible. In addition, no previous work has considered a fault-tolerant version of Strassen-like MMs, which is the main motivation of this work.
To fill this gap, we propose fault-tolerant techniques for Strassen-like MMs (specifically, the Strassen and Winograd algorithms) through a computer-aided search, whereby each node is assumed to perform only one smaller-size MM. Furthermore, as in the original studies, additional parity computations are designed to generate computational redundancy. Finally, performance comparisons are provided through theoretical analysis and Monte Carlo simulations.
II. CLASSICAL FAULT-TOLERANT CODED MATRIX MULTIPLICATION METHODS
Applying erasure codes to speed up distributed algorithms has found a variety of applications. For instance, an approximate matrix multiplication scheme based on anytime coding is shown to improve the latency performance of the overall computation in [12]. In another study [13], gradient computations are coded to speed up the training phase of machine learning algorithms. Classical erasure codes that can be applied to matrix multiplication include Maximum Distance Separable (MDS) codes [14], Product codes [15], Sparse graph codes [16], Hierarchical codes [11], and BP-XOR codes [17].
O. B. Güney was with the Department of Electronics Engineering, Sabancı University, Istanbul, Turkey; e-mail: osmanberke@sabanciuniv.edu.
S. S. Arslan was with the Department of Computer Engineering, MEF University, Istanbul, Turkey; e-mail: arslans@mef.edu.tr.
Suppose that two large matrices are multiplied, i.e., $A^TB$ where $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times n}$. The $i$th and $j$th columns of the matrices $A$ and $B$, denoted by $a_i \in \mathbb{R}^m$ and $b_j \in \mathbb{R}^m$, respectively, are multiplied (dot product). In other words, computing $A^TB$ amounts to $n^2$ dot products, where the number of workers $N \leq n^2$. Suppose the columns of $A$ and $B$ are divided into $k$ equal-size matrices. This way, the large matrix multiplication problem $A^TB$ can be viewed as $k$ instances of smaller matrix multiplication problems distributed over independent compute nodes. The latest response from these nodes determines the overall computation time. In MDS-coded computation [14], adding redundancy leads to $n > k$ subcomputations, and the overall computation time is determined by the $k$th fastest worker among the $n$ workers in the same network. Latency analysis can then be performed through order statistics.
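To make the scheme concrete, the following minimal sketch (our illustration, not from [14]; the Vandermonde-style generator and all names are our assumptions) encodes the $k$ column blocks of $A$ into $N$ coded blocks so that the results of any $k$ finished workers suffice to recover $A^TB$:

import numpy as np

def mds_coded_matmul(A, B, k, N, finished):
    # Split the columns of A into k blocks; A^T B stacks the row blocks A_i^T B.
    blocks = np.split(A, k, axis=1)                 # assumes k divides A.shape[1]
    # Vandermonde generator: any k of its N rows form an invertible matrix
    # (distinct nodes), though it is ill-conditioned for large N; fine here.
    G = np.vander(np.arange(1, N + 1), k, increasing=True).astype(float)
    coded = [sum(G[i, j] * blocks[j] for j in range(k)) for i in range(N)]
    # Worker i computes one small product coded[i]^T B; keep the finished ones.
    results = {i: coded[i].T @ B for i in finished}
    idx = sorted(finished)[:k]                      # any k responses suffice
    Y = np.stack([results[i] for i in idx])         # shape: k x (n/k) x p
    # Invert the k x k submatrix of G to recover the uncoded products A_i^T B.
    X = np.einsum('ij,jrc->irc', np.linalg.inv(G[idx, :]), Y)
    return np.concatenate(list(X), axis=0)          # stack row blocks -> A^T B

# Toy check: 5 workers, any 3 suffice (workers 1 and 3 straggle here).
A, B = np.random.randn(8, 6), np.random.randn(8, 5)
assert np.allclose(mds_coded_matmul(A, B, k=3, N=5, finished={0, 2, 4}), A.T @ B)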
The product-coded scheme proposed in [15] encodes the computation across multiple dimensions and achieves an improved coding gain relative to the MDS-coded scheme. A product code is one way of constructing a larger code with small codes as building blocks. For instance, given an $(n, k)$ MDS code, a product code can be constructed as follows. We first arrange $k^2$ symbols in a $k$-by-$k$ array. Every row of the array is first encoded with an $(n, k)$ MDS code. Then, each column of the array is encoded with an $(n, k)$ MDS code, resulting in an $n \times n$ coded computation block (a minimal encoding sketch is given after this paragraph). Finally, columns or rows are distributed to $n$ slave nodes for performing the overall computation. Sparse graph codes are also proposed in [16], where each worker chooses a random number of input submatrices based on a given degree distribution $P$ and then computes a weighted linear combination, where the weights $W_{ij}$ are randomly drawn from a finite set $S$. When the master node receives a subset of finished tasks such that the coefficient matrix formed by the weights $W_{ij}$ is full rank, it runs a hybrid decoding algorithm, combining belief propagation decoding and Gaussian elimination, to recover the resultant matrix $C$. Hierarchical codes, on the other hand, decompose the overall matrix multiplication task into a hierarchy of heterogeneously sized subtasks. The duty to complete each subtask is shared amongst all workers, and each subtask is (generally) of a different complexity. The motivation for the hierarchical decomposition is the recognition that, under erasure coding, more workers will finish the first subtask than the second (or third, fourth, etc.). Earlier subtasks can therefore be designed to be of a higher rate than later subtasks [11].
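As an illustration of the product-code layout only (the iterative row/column decoding of [15] is omitted; the generator and helper names below are our assumptions), the $k^2$ sub-products $A_i^TB_j$ can be extended to an $n \times n$ grid of coded tasks by MDS-encoding the column blocks of $A$ and $B$ separately:

import numpy as np

def product_code_tasks(A, B, k, n):
    # The (i, j) uncoded task is A_i^T B_j for column blocks A_i, B_j.
    A_blk, B_blk = np.split(A, k, axis=1), np.split(B, k, axis=1)
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)   # (n, k) MDS-style
    A_cod = [sum(G[r, i] * A_blk[i] for i in range(k)) for r in range(n)]  # encode one dimension
    B_cod = [sum(G[c, j] * B_blk[j] for j in range(k)) for c in range(n)]  # encode the other
    # Worker (r, c) computes one coded sub-product; the n x n grid of results
    # is a product-code codeword: every row and every column is an (n, k)
    # codeword, so missing entries can be recovered row-by-row / column-by-column.
    return [[A_cod[r].T @ B_cod[c] for c in range(n)] for r in range(n)]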
In all of this previous research, matrices are partitioned into smaller sub-blocks, and combinations of these sub-blocks are selected to be linearly combined according to an erasure coding algorithm. In Strassen-like algorithms, the inherent specific sub-blocking and combining prevents us from using these standard approaches, particularly when generating redundant computations. To the best of our knowledge, this is the first attempt toward generating fault-tolerant Strassen-like matrix multiplication methodologies.
III. FAULT-TOLERANT STRASSEN-LIKE ALGORITHMS
A. Background
We consider the base $2 \times 2$ MM below, together with two Strassen-like algorithms due to Strassen himself and to Winograd [6]. The submatrix multiplications used in the Strassen algorithm are denoted by $S_i$ and those used in the Winograd algorithm by $W_i$. The base $2 \times 2$ MM $C = A^TB$ is given by
\begin{equation*}
\underbrace{\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}}_{C}
=
\underbrace{\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}}_{A^T}
\underbrace{\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}}_{B}
\end{equation*}
where the sub-matrix computations are given by the following set of equations,
\begin{align*}
S_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) & W_1 &= A_{11}B_{11}\\
S_2 &= (A_{21} + A_{22})B_{11} & W_2 &= A_{12}B_{21}\\
S_3 &= A_{11}(B_{12} - B_{22}) & W_3 &= A_{22}(B_{11} - B_{12} - B_{21} + B_{22})\\
S_4 &= A_{22}(B_{21} - B_{11}) & W_4 &= (A_{11} - A_{21})(B_{22} - B_{12})\\
S_5 &= (A_{11} + A_{12})B_{22} & W_5 &= (A_{21} + A_{22})(B_{12} - B_{11})\\
S_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) & W_6 &= (A_{11} + A_{12} - A_{21} - A_{22})B_{22}\\
S_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) & W_7 &= (A_{11} - A_{21} - A_{22})(B_{11} - B_{12} + B_{22})
\end{align*}
from which we can compute the sub-matrices of the multiplication output, namely $C_{11}, C_{12}, C_{21}, C_{22}$, as given by
\begin{align}
C_{11} &= S_1 + S_4 - S_5 + S_7 = W_1 + W_2 \tag{1}\\
C_{12} &= S_3 + S_5 = W_1 + W_5 + W_6 - W_7 \tag{2}\\
C_{21} &= S_2 + S_4 = W_1 - W_3 + W_4 - W_7 \tag{3}\\
C_{22} &= S_1 - S_2 + S_3 + S_6 = W_1 + W_4 + W_5 - W_7 \tag{4}
\end{align}
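As a sanity check, the following short numerical sketch (our illustration, not part of the paper; NumPy and the $4 \times 4$ test size are arbitrary choices) verifies that both sets of seven sub-products satisfy the recovery equations (1)-(4):

import numpy as np

def split(M):
    # Partition a matrix into its four equal-size sub-blocks.
    r, c = M.shape
    return M[:r//2, :c//2], M[:r//2, c//2:], M[r//2:, :c//2], M[r//2:, c//2:]

rng = np.random.default_rng(0)
At, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))  # At plays the role of A^T
A11, A12, A21, A22 = split(At)
B11, B12, B21, B22 = split(B)

# Strassen's seven sub-matrix products
S1 = (A11 + A22) @ (B11 + B22); S2 = (A21 + A22) @ B11
S3 = A11 @ (B12 - B22);         S4 = A22 @ (B21 - B11)
S5 = (A11 + A12) @ B22;         S6 = (A21 - A11) @ (B11 + B12)
S7 = (A12 - A22) @ (B21 + B22)

# Winograd's seven sub-matrix products
W1 = A11 @ B11;                      W2 = A12 @ B21
W3 = A22 @ (B11 - B12 - B21 + B22); W4 = (A11 - A21) @ (B22 - B12)
W5 = (A21 + A22) @ (B12 - B11);     W6 = (A11 + A12 - A21 - A22) @ B22
W7 = (A11 - A21 - A22) @ (B11 - B12 + B22)

C11, C12, C21, C22 = split(At @ B)
assert np.allclose(C11, S1 + S4 - S5 + S7) and np.allclose(C11, W1 + W2)            # (1)
assert np.allclose(C12, S3 + S5) and np.allclose(C12, W1 + W5 + W6 - W7)            # (2)
assert np.allclose(C21, S2 + S4) and np.allclose(C21, W1 - W3 + W4 - W7)            # (3)
assert np.allclose(C22, S1 - S2 + S3 + S6) and np.allclose(C22, W1 + W4 + W5 - W7)  # (4)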
If only the Strassen or the Winograd algorithm were used to calculate the matrix $C$ above in a parallel setting, it would require 7 independent compute nodes. In order to compute all the necessary sub-matrices, all computations must be completed on time. However, due to potential stragglers, the final merge (the execution of equations (1), (2), (3), and (4)) may be delayed. To overcome this difficulty, a trivial approach is replication (2-copy), which would require 14 compute nodes. Although the average performance improves with this approach, the two compute nodes calculating the same sub-matrix multiplication may both be stragglers, which would still result in a delayed operation.
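To quantify this last point, here is a toy Monte Carlo sketch (the i.i.d. straggler model and the probability p = 0.1 are our assumptions, purely illustrative): under 2-copy replication, the merge stalls exactly when both replicas of at least one of the seven sub-products straggle.

import numpy as np

rng = np.random.default_rng(1)
p, trials = 0.1, 200_000                    # assumed per-node straggling probability
straggle = rng.random((trials, 7, 2)) < p   # 7 sub-products x 2 replicas each
delayed = straggle.all(axis=2).any(axis=1)  # some product lost both of its replicas
print(f"P(delay) ~ {delayed.mean():.4f} vs analytic {1 - (1 - p**2)**7:.4f}")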