Learning objectives and review

Lecture 5

Learning objectives

After this class, you should be able to:

Explain how use of shared memory can ameliorate the memory bandwidth bottleneck.
Explain the purpose of the following CUDA keywords and API call: __shared__, __constant__, and __syncthreads().
Give the approximate latencies for accessing data in (i) registers, (ii) shared memory, and (iii) global memory. Also give the lifetime and scope of data in each.
Give the typical sizes of shared memory and L1 cache.
Use shared memory to reduce the data transfer overhead in GPU code.
Given a problem, calculate limits on performance based on the memory bandwidth bottleneck.
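
As a minimal sketch of how the three names in the objectives above fit together, the following CUDA program reverses a small array through a shared-memory tile and multiplies each element by a factor held in constant memory. The kernel name scaledReverse, the helper runScaledReverse, the length N = 64, and the factor 3 are all hypothetical choices made for illustration; they are not taken from the slides or the text.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 64  // hypothetical array length; the kernel runs as one block of N threads

// __constant__: read-only on the device, visible to every thread in the grid,
// set from the host with cudaMemcpyToSymbol.
__constant__ int scale;

__global__ void scaledReverse(int *d)
{
    // __shared__: one copy per thread block, visible to all threads of that
    // block, and alive only while the block is executing.
    __shared__ int tile[N];

    int t = threadIdx.x;
    tile[t] = d[t];      // each thread copies one element from global memory

    // Barrier: no thread continues until every thread of the block has
    // finished writing its element of tile.
    __syncthreads();

    // Each thread now reads an element that a different thread loaded.
    d[t] = scale * tile[N - 1 - t];
}

// Runs the kernel once and returns true if the result matches a host check.
bool runScaledReverse()
{
    int h[N];
    int factor = 3;  // hypothetical scale factor
    for (int i = 0; i < N; i++) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, N * sizeof(int)) != cudaSuccess) return false;

    cudaMemcpy(d, h, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(scale, &factor, sizeof(int));

    scaledReverse<<<1, N>>>(d);

    // This copy waits for the kernel to finish before reading the result.
    cudaError_t err = cudaMemcpy(h, d, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; i++)
        if (h[i] != factor * (N - 1 - i)) return false;
    return true;
}

int main()
{
    printf("%s\n", runScaledReverse() ? "result verified" : "check failed");
    return 0;
}
```

Without the __syncthreads() call, a thread could read tile[N - 1 - t] before the thread responsible for that element had written it. The sketch uses a single block only to stay short; it is not meant as a performance example.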

Reading assignment

Read the UIUC Lec-5 slides.
Read Chapter 5 of the text.

Exercises and review questions

Exercises and review questions on the current lecture's material

Given the time complexity of an algorithm, how can you estimate if tiled algorithms may be useful or not?
Write code for matrix multiplication using shared memory and compare its performance with CPU code.
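
One possible way to reason about a memory-bandwidth bound, offered as a sketch rather than the expected answer: compare the floating-point work W(n) with the amount of distinct data D(n), then bound performance by the compute-to-memory ratio times the bandwidth. The symbols R, B, P, and T, the bandwidth figure of 150 GB/s, and the tile width of 16 are assumptions chosen for illustration, not values from the slides or the text.

```latex
% Potential data reuse: work divided by distinct data.
% Matrix multiplication: W(n) = O(n^3), D(n) = O(n^2), so W/D = O(n)  (tiling can help).
% Vector addition:       W(n) = O(n),   D(n) = O(n),   so W/D = O(1)  (no reuse to exploit).

% Bandwidth bound: R = flops per byte of global memory traffic, B = memory bandwidth.
P \le R \cdot B

% Naive single-precision matrix multiplication: each inner-loop step performs
% 2 flops (one multiply, one add) and loads 2 floats = 8 bytes.
R_{\text{naive}} = \frac{2\ \text{flop}}{8\ \text{B}} = 0.25\ \text{flop/B},
\qquad
P_{\text{naive}} \le 0.25 \times 150\ \text{GB/s} = 37.5\ \text{GFLOP/s}

% With T x T tiles in shared memory, each loaded element is reused T times,
% so global traffic drops by a factor of T.
R_{\text{tiled}} = T \cdot R_{\text{naive}},
\qquad
T = 16:\quad P_{\text{tiled}} \le 4 \times 150\ \text{GB/s} = 600\ \text{GFLOP/s}
```

These are upper bounds set by memory traffic alone; the achieved rate is also capped by the peak arithmetic throughput of the device.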

Preparation for the next lecture

None.

Last modified: 24 Jan 2013