Lecture 13
Learning objectives
After this class, you should be able to:
- Given an algorithm, derive its cache complexity under the ideal cache model.
- Given nested loops, reorder them to improve cache performance.
- Given a problem, develop a cache-aware algorithm for it.
- Given a problem, develop a cache-oblivious algorithm for it.
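To make the loop-reordering and cache-aware objectives concrete, here is a minimal sketch in C. It assumes square single-precision matrices stored in row-major order in 1D arrays. It is not the course's Lec13 code: the function names (mm_ijk, mm_ikj, mm_tiled) and the tile parameter are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical helpers: all matrices are n x n, row-major, in 1D arrays. */

/* i-j-k order: the inner loop reads B down a column (stride n), so for
   large n nearly every access to B lands on a different cache line. */
void mm_ijk(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* i-k-j order: same arithmetic, but the inner loop now walks a row of B
   and a row of C with stride 1, which uses each cache line fully. */
void mm_ikj(size_t n, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            float a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-aware (tiled) variant: works on tile x tile blocks so that the
   blocks of A, B, and C in use can stay in cache together. The tile size
   must be positive and is a tuning parameter that depends on the cache. */
void mm_tiled(size_t n, size_t tile, const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0f;
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile)
                for (size_t i = ii; i < min_sz(ii + tile, n); i++)
                    for (size_t k = kk; k < min_sz(kk + tile, n); k++) {
                        float a = A[i * n + k];
                        for (size_t j = jj; j < min_sz(jj + tile, n); j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

All three functions compute the same product; only the order in which memory is touched differs. A cache-oblivious version would instead split the matrices recursively, so that no tile size needs to be chosen.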
Reading assignment
- Read the document on cache-aware and cache-oblivious algorithms under the course library tab on Blackboard.
Exercises and review questions
- Exercises and review questions on the current lecture's material
  - Note the small difference in loop order, and some related optimizations, between Lec13/CAMM1D.c and Lec13/CAMMRowMaj.c. Make other changes to your matrix multiplication code and report the performance on the discussion board.
  - Search the following assembly codes: Lec13/CAMM1D.s, Lec13/CAMMRowMaj.s, and Lec13/MMBLAS.s for the SSE SIMD instruction for single-precision multiplication: mulps. Explain why the first code has worse performance than the second one and why the second one has worse performance than the third. (Note: You can disassemble executables on Linux using objdump -d, in case you wish to perform similar analysis on other executables.)
- Preparation for the next lecture
  - Change the number of threads in the Lec13/CAMM_omp.c code and report, on the discussion board, the performance as the number of threads and the matrix size change.
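For the thread-count experiment, the following is a minimal sketch of one way to vary the number of OpenMP threads. It is not the contents of Lec13/CAMM_omp.c: the function name mm_omp and its nthreads parameter are hypothetical, and the matrices are assumed to be square, single precision, and row-major. It needs to be compiled with OpenMP enabled (for example, -fopenmp with GCC); without that flag the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Hypothetical helper: C = A * B for n x n row-major matrices, with the
   rows of C divided among nthreads OpenMP threads (nthreads must be > 0). */
void mm_omp(int n, int nthreads, const float *A, const float *B, float *C)
{
    /* Each iteration of the i loop writes only row i of C, so threads
       never write to the same element and no synchronization is needed. */
    #pragma omp parallel for num_threads(nthreads)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            C[(size_t)i * n + j] = 0.0f;
        for (int k = 0; k < n; k++) {
            float a = A[(size_t)i * n + k];
            for (int j = 0; j < n; j++)
                C[(size_t)i * n + j] += a * B[(size_t)k * n + j];
        }
    }
}
```

If the num_threads clause is omitted, the thread count can be set from the shell with the OMP_NUM_THREADS environment variable instead, which avoids recompiling between runs.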
Last modified: 22 Feb 2010