Lecture 5
Learning objectives
After this class, you should be able to:
- Explain how use of shared memory can ameliorate the memory bandwidth bottleneck.
- Explain the purpose of the following CUDA keywords and API call: __shared__, __constant__, __syncthreads().
- Give the approximate latencies for accessing data in (i) registers, (ii) shared memory, and (iii) global memory. Also give their lifetime and scope.
- Give the typical sizes of shared memory and L1 cache.
- Use shared memory to reduce the data transfer overhead in GPU code.
- Given a problem, calculate limits on performance based on the memory bandwidth bottleneck.
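As an illustration of the last objective, here is a rough worked bound for single-precision matrix multiplication. The figures are assumptions chosen for easy arithmetic, not measurements of any particular GPU: a global memory bandwidth B of 150 GB/s and a tile width T of 16. In the naive kernel, each inner-loop iteration loads two 4-byte floats from global memory and performs one multiply and one add. In the tiled kernel, each pair of values a thread loads into shared memory contributes to T multiply-add steps before the next pair is fetched.

```latex
\begin{align*}
\text{bound} &= B \times \frac{\text{floating-point operations}}{\text{bytes loaded from global memory}} \\
\text{naive:}\quad & 150\ \text{GB/s} \times \frac{2\ \text{FLOP}}{8\ \text{B}} = 37.5\ \text{GFLOP/s} \\
\text{tiled, } T = 16:\quad & 150\ \text{GB/s} \times \frac{2T\ \text{FLOP}}{8\ \text{B}} = 600\ \text{GFLOP/s}
\end{align*}
```

The tiled figure is only an upper limit set by memory traffic. If it exceeds the peak arithmetic throughput of the device, the kernel is limited by computation instead.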
Reading assignment
- Read the UIUC Lec-5 slides.
- Chapter 5 of text.
Exercises and review questions
- Exercises and review questions on the current lecture's material
  - Given the time complexity of an algorithm, how can you estimate whether tiled algorithms may be useful?
  - Write code for matrix multiplication using shared memory and compare its performance with CPU code.
- Preparation for the next lecture
  - None.
Last modified: 24 Jan 2013