Fault-Tolerant Strassen-Like Matrix Multiplication
Osman B. Güney and Suayb S. Arslan
Abstract
In this study, we propose a simple method for fault-tolerant Strassen-like matrix multiplication. The proposed method is based
on using two distinct Strassen-like algorithms instead of replicating a given one. We observe that when two different algorithms
are used, new check relations arise, resulting in more local computations. These local computations are found through a
computer-aided search. To improve performance, two special parity (extra) sub-matrix multiplications (PSMMs) are generated at the
expense of increased communication/computation cost. Our preliminary results demonstrate that the proposed method outperforms
a Strassen-like algorithm with two copies and achieves performance very close to the three-copy version using only 2 PSMMs, reducing
the total number of compute nodes by around 24%, i.e., from 21 to 16.
Keywords
Strassen-Like Algorithms, Fault-Tolerant Computation, Coded Matrix Multiplication.
I. INTRODUCTION
Matrix Multiplication (MM) is one of the fundamental operations in many fields of engineering (machine learning, computer
graphics, and robotics). Due to the high dimensions of the matrices involved, the MM operation may be the dominant factor in the
total operational latency. One classical way to reduce computation delay is to distribute the calculations over multiple nodes
and allow parallel processing. Other approaches use algorithms with lower computational complexity [1], [2]. To get
the best of both worlds, algorithms with low complexity must be distributed for parallel processing.
In 1969, Strassen discovered that MM can be performed recursively with asymptotically lower computational complexity than
that of the naive method [3]. He showed that the computational complexity of MM with a 2×2 base case can be reduced from O(n^3)
to O(n^(log2 7)), owing to the use of 7 multiplications instead of 8. Any algorithm with this recursive structure is called a
"Strassen-like algorithm" [2], [4]. After Strassen, algorithms with fewer computational steps were developed to reduce
the hidden constant term in the big-O notation, which in turn reduces the asymptotic computational complexity [5],
[6], [7]. For instance, Winograd [6] has shown that MM can be performed using 15 additions/subtractions instead of the 18 of
Strassen's algorithm. Winograd's algorithm is optimal in terms of both multiplications and additions/subtractions for MM with
a 2×2 standard base case, achieving the lower bounds provided by Probert [8].
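As a concrete illustration (a minimal sketch, not taken from the paper), one level of Strassen's 2×2 block recursion can be written in Python as follows; the product names M1–M7 follow the common textbook presentation of Strassen's scheme:

```python
import numpy as np

def strassen_2x2(A, B):
    """One level of Strassen's algorithm on square matrices of even size.

    Uses 7 block multiplications instead of the naive 8; applied
    recursively down to scalars, this yields O(n^(log2 7)) complexity.
    """
    h = A.shape[0] // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Strassen's 7 products (each a block multiplication)
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)

    # Recombine the products into the four blocks of C = A @ B
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

In a distributed setting, each of the 7 products M1–M7 can be assigned to a separate worker node, which is the granularity assumed throughout this paper.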
On the other hand, in many distributed settings, worker (slave) nodes may become slow or cease operation completely at
random times; such nodes are called stragglers. To overcome this difficulty and speed up the parallel operation, fault-tolerance
techniques such as erasure coding have been utilized in different contexts [9], [10], [11]. In these schemes, the master node utilizes a subset
of the fastest computations by carefully designing redundant calculations. The redundant calculations are generated by first
partitioning the multiplicand and the multiplier into smaller sub-blocks; then, based on the coding technique, linear combinations
of these sub-blocks are calculated. However, unlike standard MM, Strassen-like algorithms use specific sub-blockings and specific
subsets to be added/subtracted and multiplied to achieve their recursive complexity advantages. Hence, a direct application of the previous
literature to Strassen-like MM algorithms is not possible. In addition, no previous work has considered a fault-tolerant version
of Strassen-like MM, which is the main motivation of this work.
To fill this gap, we propose fault-tolerance techniques for Strassen-like MM (specifically, the Strassen and Winograd
algorithms) through a computer-aided search, whereby each node is assumed to perform only one smaller-size MM.
Furthermore, just as in the original studies, additional parity computations are designed to generate computational redundancy. Finally,
performance comparisons are provided through theoretical analysis and Monte Carlo simulations.
II. CLASSICAL FAULT-TOLERANT CODED MATRIX MULTIPLICATION METHODS
The idea of applying erasure codes to speed up distributed algorithms has found a variety of applications. For instance,
an approximate matrix multiplication scheme based on anytime coding is shown to improve the latency performance of the overall
computation in [12]. In yet another study [13], gradient computations are coded to speed up the training phase of machine learning
algorithms. Classical erasure codes that can be applied to matrix multiplication include Maximum Distance
Separable (MDS) codes [14], Product codes [15], Sparse graph codes [16], Hierarchical codes [11], and BP-XOR codes [17].
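To make the coded-computation idea concrete, the following sketch (a hypothetical (3, 2) MDS example of the author's own construction, not drawn from the cited works) splits A row-wise into two blocks and adds one parity task, so that any 2 of the 3 worker results suffice to recover A·B, tolerating one straggler:

```python
import numpy as np

def encode_tasks(A, B):
    """Split A row-wise and add one parity block: a (3, 2) MDS code.

    Worker i computes tasks[i]; the master can recover A @ B from
    any 2 of the 3 results.
    """
    h = A.shape[0] // 2
    A1, A2 = A[:h], A[h:]
    return [A1 @ B, A2 @ B, (A1 + A2) @ B]  # tasks for 3 workers

def decode(results):
    """Recover A @ B from any two completed worker results.

    `results` maps a worker index (0, 1, or 2) to its computed block.
    Uses the parity relation tasks[2] = tasks[0] + tasks[1].
    """
    if 0 in results and 1 in results:       # no straggler among 0, 1
        return np.vstack([results[0], results[1]])
    if 0 in results:                        # worker 1 straggled
        return np.vstack([results[0], results[2] - results[0]])
    return np.vstack([results[2] - results[1], results[1]])  # worker 0 straggled
```

Note that this construction relies on the workers multiplying arbitrary linear combinations of sub-blocks by the same B; Strassen-like algorithms prescribe their own specific combinations, which is exactly why such codes do not apply directly.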
O. B. Güney was with the Department of Electronics Engineering, Sabancı University, Istanbul, Turkey e-mail: osmanberke@sabanciuniv.edu.
S. S. Arslan was with the Department of Computer Engineering, MEF University, Istanbul, Turkey. email: arslans@mef.edu.tr.
978-1-7281-7206-4/20/$31.00 © 2020 IEEE
arXiv:2210.04441v1 [cs.DC] 10 Oct 2022