Lecture 6
Learning objectives
After this class, you should be able to:
- Explain the different types of memories in the CUDA memory model, and how they map to a real NVIDIA GPU.
- Give typical latencies for each of the memories, their scope, their lifetime, and explain which ones are cached and which ones are read only.
- Use shared memory to write efficient code.
Reading assignment
- Read Chapter 4 of Kirk and Hwu's GPU book (GPU-4 on Blackboard -- course library).
Exercises and review questions
- Exercises and review questions on the current lecture's material
- Write a CUDA program that reverses the contents of an array using shared memory. Here, each block of threads will load a chunk of data from DRAM into shared memory, reverse it in shared memory, and write the chunk back from shared memory to DRAM. Let the number of threads in a block be n. The threads can write data back from shared memory to DRAM in one of the following manners: (i) thread i writes the data in location i to its appropriate location in DRAM, or (ii) thread i writes the data in location n-i-1. Compare the performance of the two alternatives for different numbers of threads per block and numbers of blocks. Compare their performance against your code for a similar review question from Lecture 5. Report your performance results on the discussion board, as a reply to the Lecture 6 thread.
- Preparation for the next lecture
- How many SPEs does the Cell processor have? How many of those can you use on the PS3?
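As a starting point for the shared-memory array-reversal exercise above, here is a minimal sketch of the staging pattern it describes: load a chunk into shared memory, synchronize, then write it back reversed. The names `reverse_chunks` and `reverse_on_gpu` are hypothetical, the element type `int` and the default of 256 threads per block are assumptions, and error checking is omitted for brevity. The sketch shows only one possible write-back scheme (thread i reads shared location valid-1-i); the other alternative and all timing measurements are left to you.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each block stages one chunk of the input in shared
// memory and writes it, reversed, to the mirrored position in the output.
__global__ void reverse_chunks(const int *in, int *out, int len)
{
    extern __shared__ int tile[];            // one int per thread, sized at launch

    int n = blockDim.x;                      // threads per block
    int t = threadIdx.x;
    int base = blockIdx.x * n;               // first element of this block's chunk
    int remaining = len - base;
    int valid = remaining < n ? remaining : n;  // the last chunk may be partial

    if (t < valid)
        tile[t] = in[base + t];              // DRAM -> shared memory
    __syncthreads();                         // whole chunk must be loaded first

    if (t < valid)                           // shared memory -> DRAM, reversed
        out[len - base - valid + t] = tile[valid - 1 - t];
}

// Hypothetical host helper: copies the data to the GPU, runs the kernel,
// and returns the reversed array. Error checking is omitted in this sketch.
std::vector<int> reverse_on_gpu(const std::vector<int> &host_in,
                                int threads_per_block = 256)
{
    int len = static_cast<int>(host_in.size());
    std::vector<int> host_out(len);
    if (len == 0)
        return host_out;

    size_t bytes = len * sizeof(int);
    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (len + threads_per_block - 1) / threads_per_block;
    size_t shared_bytes = threads_per_block * sizeof(int);
    reverse_chunks<<<blocks, threads_per_block, shared_bytes>>>(d_in, d_out, len);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

Calling `reverse_on_gpu(data, n)` with different values of `n` varies the number of threads per block, and the number of blocks follows from the array length. For the performance comparison you would still need to add timing (for example with CUDA events) and a second kernel for the other write-back scheme.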
Last modified: 29 Jan 2010