Parallelizing Strassen's method for matrix multiplication on distributed-memory MIMD architectures

C. C. Chou, Y. F. Deng, G. Li, Y. Wang

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based on Strassen's method. Our timing tests, performed on a 56-node Intel Paragon, demonstrate that the potential of Strassen's method, with a complexity of 4.7M^{2.807}, can be realized at the system level rather than at the node level, on which several earlier works have focused. The parallel efficiency is nearly perfect when the number of processors is a power of 7. At the system level, the parallelized Strassen's method appears to be always faster than traditional matrix multiplication methods of complexity 2M^{3} coupled with the BMR method and the Ring method. The speed gain depends on the matrix order M: 20% for M ≈ 1000 and more than 100% for M ≈ 5000.
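For orientation, the following is a minimal serial sketch of the textbook Strassen recursion in pure Python. It is not the authors' distributed-memory scheme, which the abstract does not describe; the function names (`strassen`, `naive`, `split`) and the `cutoff` parameter are illustrative assumptions. It shows the seven independent half-size products created at each level of recursion, which is one plausible reading of why processor counts that are powers of 7 fit the method well, and it shows that the exponent 2.807 is log2(7).

```python
import math

def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def naive(X, Y):
    """Conventional triple-loop product of two square matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def split(X):
    """Return the four quadrants X11, X12, X21, X22 of an even-order matrix."""
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])

def strassen(A, B, cutoff=2):
    """Serial Strassen product; falls back to naive() for small or odd orders."""
    n = len(A)
    if n <= cutoff or n % 2:
        return naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven half-size products instead of eight; they do not depend on each other.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # Recombine the products into the four quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom

# The exponent in the quoted complexity is log2(7), rounded to 2.807.
EXPONENT = math.log2(7)

if __name__ == "__main__":
    A = [[i * 4 + j for j in range(4)] for i in range(4)]
    B = [[(i + 1) * (j + 2) for j in range(4)] for i in range(4)]
    print(strassen(A, B) == naive(A, B))
    print(round(EXPONENT, 3))
```

Because the seven products at a given level are independent, two levels of recursion yield 49 independent subproblems. How these are mapped onto the 56 nodes used in the timing tests is not stated in the abstract, so the sketch makes no attempt to model it.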

Original language: English
Pages (from-to): 49-69
Number of pages: 21
Journal: Computers and Mathematics with Applications
Volume: 30
Issue number: 2
DOI: 10.1016/0898-1221(95)00077-C
Publication status: Published - Jul 1995

Fingerprint

Memory architecture
Matrix multiplication
Distributed Memory
Parallel Methods
Vertex of a graph
Architecture
Timing
Ring
Demonstrate

All Science Journal Classification (ASJC) codes

  • Modelling and Simulation
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

@article{22b5d77f88fa46b8b152907be731e703,
title = "Parallelizing Strassen's method for matrix multiplication on distributed-memory MIMD architectures",
abstract = "We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based on Strassen's method. Our timing tests, performed on a 56-node Intel Paragon, demonstrate that the potential of Strassen's method, with a complexity of $4.7M^{2.807}$, can be realized at the system level rather than at the node level, on which several earlier works have focused. The parallel efficiency is nearly perfect when the number of processors is a power of 7. At the system level, the parallelized Strassen's method appears to be always faster than traditional matrix multiplication methods of complexity $2M^{3}$ coupled with the BMR method and the Ring method. The speed gain depends on the matrix order M: 20{\%} for M ≈ 1000 and more than 100{\%} for M ≈ 5000.",
author = "Chou, {C. C.} and Deng, {Y. F.} and Li, G. and Wang, Y.",
year = "1995",
month = "7",
doi = "10.1016/0898-1221(95)00077-C",
language = "English",
volume = "30",
pages = "49--69",
journal = "Computers and Mathematics with Applications",
issn = "0898-1221",
publisher = "Elsevier Limited",
number = "2",

}

Parallelizing Strassen's method for matrix multiplication on distributed-memory MIMD architectures. / Chou, C. C.; Deng, Y. F.; Li, G.; Wang, Y.

In: Computers and Mathematics with Applications, Vol. 30, No. 2, 07.1995, p. 49-69.

TY - JOUR

T1 - Parallelizing Strassen's method for matrix multiplication on distributed-memory MIMD architectures

AU - Chou, C. C.

AU - Deng, Y. F.

AU - Li, G.

AU - Wang, Y.

PY - 1995/7

Y1 - 1995/7

AB - We present a parallel method for matrix multiplication on distributed-memory MIMD architectures based on Strassen's method. Our timing tests, performed on a 56-node Intel Paragon, demonstrate that the potential of Strassen's method, with a complexity of 4.7M^{2.807}, can be realized at the system level rather than at the node level, on which several earlier works have focused. The parallel efficiency is nearly perfect when the number of processors is a power of 7. At the system level, the parallelized Strassen's method appears to be always faster than traditional matrix multiplication methods of complexity 2M^{3} coupled with the BMR method and the Ring method. The speed gain depends on the matrix order M: 20% for M ≈ 1000 and more than 100% for M ≈ 5000.

UR - http://www.scopus.com/inward/record.url?scp=0029342667&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029342667&partnerID=8YFLogxK

U2 - 10.1016/0898-1221(95)00077-C

DO - 10.1016/0898-1221(95)00077-C

M3 - Article

AN - SCOPUS:0029342667

VL - 30

SP - 49

EP - 69

JO - Computers and Mathematics with Applications

JF - Computers and Mathematics with Applications

SN - 0898-1221

IS - 2

ER -