Learning objectives
After this class, you should be able to:
- Given a matrix, a vector, and the number of processors, show the steps involved in the 1-d and 2-d parallel algorithms, and derive the time complexity of these algorithms.
- Explain why the 2-d decomposition is more scalable than the 1-d one for matrix-vector multiplication.
- Explain why we require b and c to have the same data decomposition in the matrix-vector multiplication c = Ab.
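The 1-d and 2-d algorithms named in the objectives can be sketched as a sequential simulation. This is only an illustration, not the version in the handout: it assumes NumPy, a row-wise 1-d decomposition, a square q x q processor grid for the 2-d case, and a matrix size divisible by the number of processors. The function names `matvec_1d` and `matvec_2d` are hypothetical.

```python
import numpy as np

def matvec_1d(A, b, p):
    """Simulate a row-wise 1-d decomposition on p processors (assumes p divides n)."""
    n = A.shape[0]
    rows = n // p
    # All-gather step: each processor starts with n/p entries of b
    # and ends up holding the full vector.
    b_full = np.concatenate([b[r * rows:(r + 1) * rows] for r in range(p)])
    # Each processor multiplies its block of rows by the full vector,
    # producing its own n/p entries of c.
    c_blocks = [A[r * rows:(r + 1) * rows, :] @ b_full for r in range(p)]
    return np.concatenate(c_blocks)

def matvec_2d(A, b, q):
    """Simulate a 2-d decomposition on a q x q processor grid (assumes q divides n)."""
    n = A.shape[0]
    m = n // q
    c = np.zeros(n)
    for i in range(q):
        # Processor (i, j) holds block (i, j) of A and needs only block j of b.
        partial = [A[i * m:(i + 1) * m, j * m:(j + 1) * m] @ b[j * m:(j + 1) * m]
                   for j in range(q)]
        # Reduction across processor row i sums the partial products
        # into block i of c.
        c[i * m:(i + 1) * m] = np.sum(partial, axis=0)
    return c

rng = np.random.default_rng(0)
A = rng.random((8, 8))
b = rng.random(8)

c_1d = matvec_1d(A, b, 4)   # 4 processors, 2 rows each
c_2d = matvec_2d(A, b, 2)   # 2 x 2 grid, 4 x 4 blocks
print(np.allclose(c_1d, A @ b), np.allclose(c_2d, A @ b))
```

In the 1-d sketch every processor needs all n entries of b, while in the 2-d sketch a processor touches only n/q of them, which is the intuition behind the scalability question above.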
Reading assignment
- Handout on Parallel algorithms: Slides 27 - 30.
- Refer to an online parallel computing book. For example, section 2.3 discusses reduction, section 3.3 discusses performance metrics, and section 4.6 discusses matrix multiplication (this is more complicated than matrix-vector multiplication, but does show the benefits of a 2-d decomposition).
Exercises and review questions
- Questions on current lecture's material
  - Give an example of a 4x4 matrix and a vector, and show the steps involved in the 2-d parallel algorithm for performing matrix-vector multiplication on four processors.
- Questions on next lecture's material
  - Write a simple matrix multiplication code and compare its performance (Gflop/s) against the theoretical peak performance of the machine for a large matrix (around 1000x1000).
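As a starting point for the measurement, the arithmetic can be sketched as follows. This assumes the usual count of 2n^3 floating-point operations for an n x n matrix product and estimates peak as cores x clock rate x flops per cycle; the helper names and the machine parameters in the example are hypothetical, and the tiny pure-Python loop is there only to show the timing pattern, not to stand in for the exercise itself.

```python
import time

def gflops(n, seconds):
    """Achieved Gflop/s for an n x n matrix product, counting 2*n^3 flops."""
    return 2.0 * n**3 / seconds / 1e9

def peak_gflops(cores, ghz, flops_per_cycle):
    """Rough theoretical peak; all three inputs are machine-specific assumptions."""
    return cores * ghz * flops_per_cycle

def matmul(A, B, n):
    """Simple triple-loop product of two n x n matrices stored as lists of lists."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a_ik = A[i][k]
            for j in range(n):
                C[i][j] += a_ik * B[k][j]
    return C

n = 64  # kept small here; the exercise asks for around 1000
A = [[1.0] * n for _ in range(n)]
B = [[2.0] * n for _ in range(n)]

start = time.perf_counter()
C = matmul(A, B, n)
elapsed = time.perf_counter() - start

achieved = gflops(n, elapsed)
peak = peak_gflops(cores=4, ghz=3.0, flops_per_cycle=16)  # hypothetical machine
print(f"achieved {achieved:.4f} Gflop/s, {100 * achieved / peak:.4f}% of assumed peak")
```

The flops-per-cycle figure depends on the vector width and fused multiply-add support of the processor, so it should be looked up for the actual machine rather than taken from this sketch.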