Learning objectives and review

Lecture 13

Learning objectives

After this class, you should be able to:

Given an algorithm, derive its cache complexity under the ideal cache model.
Given nested loops, reorder them to improve cache performance.
Given a problem, develop a cache-aware algorithm for it.
Given a problem, develop a cache-oblivious algorithm for it.
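
To make the last two objectives concrete, here is a minimal sketch that contrasts a cache-aware (tiled) matrix multiplication with a cache-oblivious (recursive) one. It is not taken from the course code: the names mm_kernel, mm_tiled, and mm_recursive, the tile-size parameter bs, and the base-case cutoff of 64 are assumptions chosen for illustration. In the usual ideal cache model analysis, with cache size Z and line size L (notation assumed here), both approaches aim for roughly n^3/(L*sqrt(Z)) cache misses on n x n matrices. The tiled version gets there by choosing bs from Z, while the recursive version never refers to Z or L.

```c
#include <assert.h>
#include <stddef.h>

/* All matrices are row-major with leading dimension ld (elements per
   stored row). Every routine computes C += A * B. */

/* Base kernel: C (m x p) += A (m x n) * B (n x p), in i-k-j order. */
void mm_kernel(size_t m, size_t n, size_t p, size_t ld,
               const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < m; i++)
        for (size_t k = 0; k < n; k++) {
            const float a = A[i * ld + k];
            for (size_t j = 0; j < p; j++)
                C[i * ld + j] += a * B[k * ld + j];
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-aware: bs is a tuning parameter picked from the cache size so
   that three bs x bs tiles fit in cache together (an assumption the
   caller has to satisfy). Edge tiles may be smaller than bs. */
void mm_tiled(size_t n, size_t bs, const float *A, const float *B, float *C)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs)
                mm_kernel(min_sz(bs, n - ii), min_sz(bs, n - kk),
                          min_sz(bs, n - jj), n,
                          A + ii * n + kk, B + kk * n + jj, C + ii * n + jj);
}

/* Cache-oblivious: halve the largest dimension until the subproblem is
   small. The cutoff (64) is a fixed constant that does not depend on
   any cache parameter. */
void mm_recursive(size_t m, size_t n, size_t p, size_t ld,
                  const float *A, const float *B, float *C)
{
    if (m * n * p <= 64) {
        mm_kernel(m, n, p, ld, A, B, C);
        return;
    }
    if (m >= n && m >= p) {            /* split the rows of A and C */
        size_t h = m / 2;
        mm_recursive(h, n, p, ld, A, B, C);
        mm_recursive(m - h, n, p, ld, A + h * ld, B, C + h * ld);
    } else if (n >= p) {               /* split the inner dimension */
        size_t h = n / 2;
        mm_recursive(m, h, p, ld, A, B, C);
        mm_recursive(m, n - h, p, ld, A + h, B + h * ld, C);
    } else {                           /* split the columns of B and C */
        size_t h = p / 2;
        mm_recursive(m, n, h, ld, A, B, C);
        mm_recursive(m, n, p - h, ld, A, B + h, C + h);
    }
}
```

For square n x n matrices stored contiguously, the recursive version would be called as mm_recursive(n, n, n, n, A, B, C) with C initialized by the caller. The tiled version needs bs retuned for each machine, which is the practical difference between the two design styles.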

Reading assignment

Read the document on cache-aware and cache-oblivious algorithms under the course library tab on Blackboard.

Exercises and review questions

Exercises and review questions on the current lecture's material

Note the small difference in loop order, and some related optimizations, between Lec13/CAMM1D.c and Lec13/CAMMRowMaj.c. Make other changes to your matrix multiplication code and report the performance on the discussion board.
Search the following assembly codes: Lec13/CAMM1D.s, Lec13/CAMMRowMaj.s, and Lec13/MMBLAS.s for the SSE SIMD instruction for single precision multiplication: mulps. Explain why the first code has worse performance than the second one and why the second one has worse performance than the third. (Note: You can disassemble executables on Linux using objdump -d, in case you wish to perform similar analysis on other executables.)
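
As a point of reference for the loop-order exercise, the sketch below shows the kind of change involved. It is not the content of Lec13/CAMM1D.c or Lec13/CAMMRowMaj.c. The functions mm_ijk and mm_ikj are hypothetical, and they assume n x n row-major matrices stored in 1D arrays. In the i-j-k order the inner loop reads B with stride n, while in the i-k-j order it reads B and C with stride 1. A unit-stride inner loop uses every element of each cache line it loads, and it is typically easier for a compiler to turn into packed SIMD instructions such as mulps.

```c
#include <assert.h>
#include <stddef.h>

/* Both functions compute C += A * B for n x n row-major matrices. */

/* i-j-k order: the inner loop walks down a column of B (stride n),
   so consecutive iterations touch different cache lines of B. */
void mm_ijk(size_t n, const float *A, const float *B, float *C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* i-k-j order: the inner loop walks along a row of B and a row of C
   (stride 1), reusing a single element of A throughout. */
void mm_ikj(size_t n, const float *A, const float *B, float *C)
{
    assert(A && B && C);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            const float a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}
```

Both functions perform the same arithmetic, so any timing difference between them on large matrices comes from the memory access pattern and from the code the compiler generates for the inner loop.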

Preparation for the next lecture

Change the number of threads in the Lec13/CAMM_omp.c code and report, on the discussion board, the performance as the number of threads and the matrix size change.
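
For orientation, here is a minimal sketch of how a thread count can be set for an OpenMP matrix multiplication. It is not the content of Lec13/CAMM_omp.c. The function mm_omp and its nthreads parameter are hypothetical, and the course code may set the thread count in a different way. The sketch uses the standard OpenMP call omp_set_num_threads. The OMP_NUM_THREADS environment variable is an alternative that needs no code change.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* C += A * B for n x n row-major matrices. The rows of C are divided
   among the threads, so no two threads write the same element. */
void mm_omp(size_t n, int nthreads, const float *A, const float *B, float *C)
{
    assert(nthreads >= 1);
#ifdef _OPENMP
    omp_set_num_threads(nthreads);   /* applies to the parallel region below */
#else
    (void)nthreads;                  /* built without OpenMP: runs serially */
#endif

    /* A signed loop index keeps the loop valid for older OpenMP versions. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        for (size_t k = 0; k < n; k++) {
            const float a = A[(size_t)i * n + k];
            for (size_t j = 0; j < n; j++)
                C[(size_t)i * n + j] += a * B[k * n + j];
        }
}
```

With GCC, OpenMP is enabled by compiling with -fopenmp. Without that flag the pragma is ignored and the function runs on one thread, which gives a serial baseline for the comparison.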

Last modified: 22 Feb 2010