Learning objectives and review

Lecture 6

Learning objectives

After this class, you should be able to:

Explain the different types of memory in the CUDA memory model and how they map onto a real NVIDIA GPU.
Give the typical latency, scope, and lifetime of each type of memory, and explain which ones are cached and which ones are read-only.
Use shared memory to write efficient code.
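
As a rough orientation for the objectives above, the following sketch shows how the main memory spaces are declared in CUDA C. It is a minimal illustration, not taken from the lecture: the names scale, offset_table, tile, memory_spaces_demo, and run_memory_spaces_demo, and the size N, are hypothetical, and the comments describe the usual scope and lifetime of each space rather than measured latencies.

```cuda
#include <cuda_runtime.h>

#define N 256  // hypothetical size: a single block of 256 threads

// Constant memory: read-only inside kernels, cached, visible to every
// thread in the grid, lives for the duration of the application.
__constant__ float scale;

// Global memory (statically declared): readable and writable by every
// thread, resides in off-chip DRAM, lives for the duration of the application.
__device__ float offset_table[N];

__global__ void memory_spaces_demo(const float *in, float *out)
{
    // Shared memory: on-chip, one copy per block, visible to all threads
    // of that block, lives only as long as the block.
    __shared__ float tile[N];

    // Automatic scalar variables are normally placed in registers:
    // private to one thread, live only as long as the thread.
    int i = threadIdx.x;

    tile[i] = in[i];   // global memory -> shared memory
    __syncthreads();   // wait until every thread in the block has written

    // Read an element loaded by a different thread from shared memory.
    float neighbour = tile[(i + 1) % N];

    out[i] = scale * neighbour + offset_table[i];
}

// Host helper (hypothetical): runs the kernel once and returns the number
// of wrong results, or -1 if a CUDA call failed.
int run_memory_spaces_demo()
{
    float h_in[N], h_out[N], h_offset[N];
    for (int i = 0; i < N; ++i) { h_in[i] = (float)i; h_offset[i] = 1.0f; }
    float h_scale = 2.0f;

    float *d_in = NULL, *d_out = NULL;
    if (cudaMalloc(&d_in, N * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return -1; }

    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(float));            // fill constant memory
    cudaMemcpyToSymbol(offset_table, h_offset, sizeof(h_offset));  // fill static global memory

    memory_spaces_demo<<<1, N>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        float expected = 2.0f * (float)((i + 1) % N) + 1.0f;
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}
```

Calling run_memory_spaces_demo() from main should return 0 on a working CUDA device. The __syncthreads() call matters here because each thread reads a shared-memory element that a different thread wrote.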

Reading assignment

Read Chapter 4 of Kirk and Hwu's GPU book (GPU-4 on Blackboard -- course library).

Exercises and review questions

Exercises and review questions on the current lecture's material

Write a CUDA program that reverses the contents of an array using shared memory. Each block of threads should load a chunk of data from DRAM into shared memory, reverse the chunk in shared memory, and write it back from shared memory to DRAM. Let the number of threads in a block be n. The threads can write the data back from shared memory to DRAM in one of two ways: (i) thread i writes the data in shared memory location i to its appropriate location in DRAM, or (ii) thread i writes the data in shared memory location n-i-1 to its appropriate location in DRAM. Compare the performance of the two alternatives for different numbers of threads per block and different numbers of blocks. Also compare their performance against your code for a similar review question from Lecture 5. Report your performance results on the discussion board, as a reply to the Lecture 6 thread.
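
For the performance comparison, one possible way to time a kernel is with CUDA events, sketched below. This is only a measurement harness under stated assumptions: kernel_under_test is a hypothetical placeholder that you would replace with your own reversal kernel, and time_kernel_ms and its parameters are illustrative names, not part of the assignment.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical): stands in for the reversal kernel
// written for the exercise. It only increments each element.
__global__ void kernel_under_test(float *data, int len)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < len) data[i] += 1.0f;
}

// Returns the average time per launch in milliseconds,
// or a negative value if a CUDA call failed.
float time_kernel_ms(int num_blocks, int threads_per_block, int repeats)
{
    int len = num_blocks * threads_per_block;
    float *d_data = NULL;
    if (cudaMalloc(&d_data, len * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, len * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time start-up costs are not included in the timing.
    kernel_under_test<<<num_blocks, threads_per_block>>>(d_data, len);
    cudaDeviceSynchronize();

    cudaEventRecord(start, 0);
    for (int r = 0; r < repeats; ++r)
        kernel_under_test<<<num_blocks, threads_per_block>>>(d_data, len);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // block the host until the stop event has occurred

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);
    cudaError_t err = cudaGetLastError();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);

    return (err == cudaSuccess) ? total_ms / repeats : -1.0f;
}
```

A driver could call time_kernel_ms for several block sizes (for example 64, 128, 256, and 512 threads) and several block counts, and print the results as a table. Averaging over several launches after a warm-up tends to give steadier numbers than timing a single launch.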

Preparation for the next lecture

How many SPEs does the Cell processor have? How many of those can you use on the PS3?

Last modified: 29 Jan 2010