Lecture 21

Learning objectives

After this class, you should be able to:

Given a matrix, a vector, and the number of processors, show the steps involved in the 1-d and 2-d parallel algorithms for matrix-vector multiplication, and derive the time complexity of each algorithm.
Explain why the 2-d decomposition is more scalable than the 1-d one for matrix-vector multiplication.
Explain why we require b and c to have the same data decomposition in the matrix-vector multiplication c = Ab.
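
To make the first and third objectives concrete, here is a minimal sketch of the 1-d (row-wise) algorithm, simulated sequentially in Python. It is an illustration under stated assumptions, not the version from the handout: the function name matvec_1d is hypothetical, the "processes" are just list indices, the all-gather is simulated by concatenating blocks, and p is assumed to divide n.

```python
# Sequential simulation of a 1-d (row-wise) parallel algorithm for c = A b.
# Assumption: "process" k is only an index into a list; no real messages are sent.

def matvec_1d(A, b, p):
    n = len(A)
    assert n % p == 0, "this sketch assumes p divides n"
    rows = n // p

    # Initial distribution: process k owns rows [k*rows, (k+1)*rows) of A
    # and the matching block of b.
    A_local = [A[k * rows:(k + 1) * rows] for k in range(p)]
    b_local = [b[k * rows:(k + 1) * rows] for k in range(p)]

    # Step 1 (communication): all-gather, simulated by concatenation.
    # Afterwards every process would hold the whole vector b.
    b_full = [x for block in b_local for x in block]

    # Step 2 (computation): each process multiplies its own rows by b,
    # which costs rows * n multiply-adds per process.
    c_local = [
        [sum(a_ij * b_j for a_ij, b_j in zip(row, b_full)) for row in A_local[k]]
        for k in range(p)
    ]

    # c_local[k] covers the same index range as b_local[k].
    return c_local


A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
b = [1, 0, 2, 1]
c_blocks = matvec_1d(A, b, p=2)
print(c_blocks)  # [[11, 27], [43, 59]]
```

Two points are worth noticing in the sketch. The result block on each process covers the same indices as its block of b, so c could serve as the input vector of a further multiplication without redistributing data, which is one common reason for requiring b and c to share a decomposition. Also, the local work shrinks as p grows (n*n/p multiply-adds), but every process still has to receive nearly all n entries of b in the all-gather, which limits how well the 1-d decomposition scales.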

Reading assignment

Handout on Parallel algorithms: Slides 27 - 30.
Refer to an online parallel computing book. For example, section 2.3 discusses reduction, section 3.3 discusses performance metrics, and section 4.6 discusses matrix multiplication (which is more complicated than matrix-vector multiplication, but does show the benefits of a 2-d decomposition).

Exercises and review questions

Questions on current lecture's material

Give an example of a 4x4 matrix and a vector, and show the steps involved in the 2-d parallel algorithm for performing matrix-vector multiplication on four processors.

Questions on next lecture's material

Write a simple matrix multiplication code and, for a large matrix (around 1000x1000), compare its measured performance (Gflop/s) against the theoretical peak performance of the machine.

Last modified: 17 Nov 2011