Learning objectives and review

Lecture 4

Learning objectives

After this class, you should be able to:

Explain how GPU threads are organized into blocks and warps, and how they are scheduled.
Explain how resources are allocated to the threads.
Explain how knowledge of the above can be used to optimize code performance, for example, by choosing a suitable number of threads per block.
Explain row-major ordering.
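
The last two objectives come down to a little arithmetic, sketched below in Python. The hardware limits in the sketch (768 resident threads and 8 resident blocks per multiprocessor, 512 threads per block) are assumed example values, not figures taken from the lecture; substitute the limits of your own GPU. The function names are hypothetical, and the sketch ignores register and shared-memory limits, which can lower the counts further.

```python
# Sketch under stated assumptions: the three limits below are example values
# chosen for illustration, not numbers taken from the lecture slides.
MAX_THREADS_PER_SM = 768     # assumed: resident threads per multiprocessor
MAX_BLOCKS_PER_SM = 8        # assumed: resident blocks per multiprocessor
MAX_THREADS_PER_BLOCK = 512  # assumed: threads allowed in a single block
WARP_SIZE = 32               # threads per warp on NVIDIA GPUs


def resident_threads(block_dim_x, block_dim_y):
    """Return (blocks, threads, warps) resident on one multiprocessor.

    Only the thread and block limits are considered; register and
    shared-memory usage are ignored in this sketch.
    """
    threads_per_block = block_dim_x * block_dim_y
    if threads_per_block > MAX_THREADS_PER_BLOCK:
        return (0, 0, 0)  # a block this large cannot be launched at all
    blocks = min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    return (blocks, blocks * threads_per_block, blocks * warps_per_block)


def row_major_offset(row, col, num_cols):
    """Offset of element (row, col) in a matrix stored row after row."""
    return row * num_cols + col


if __name__ == "__main__":
    for dim in (8, 16, 32):
        print(dim, "x", dim, "->", resident_threads(dim, dim))
    # Element (1, 2) of a matrix with 4 columns sits at offset 1*4 + 2.
    print(row_major_offset(1, 2, 4))
```

Under these assumed limits, 8 x 8 blocks are capped by the block limit (8 blocks, 512 threads), 16 x 16 blocks fill the multiprocessor (3 blocks, 768 threads), and 32 x 32 blocks exceed the per-block limit.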

Reading assignment

Read the UIUC Lec-4 slides.

Exercises and review questions

Exercises and review questions on the current lecture's material

What is a warp? What is the common characteristic of all threads in a warp with respect to scheduling?
Why does branch divergence between warps not create the performance penalty that branch divergence within a warp does?
How does the SIMT implementation on a GPU combine features of both SIMD instructions and simultaneous multi-threading on a conventional processor?
Implement matrix multiplication on the CPU and on the GPU using the algorithms presented in Lecture 4, and compare their performance in GFlop/s. Report your results on the discussion board.
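
For the last exercise, the GFlop/s figure follows from the operation count: multiplying two n x n matrices takes n^3 multiplications and n^3 additions, so 2n^3 floating-point operations in total. The Python sketch below shows one way to time a multiply and convert the time to GFlop/s. It uses a plain triple loop over row-major arrays, which may differ from the algorithms in the lecture slides, and the function names are hypothetical. It is meant as a reference for checking results and for the timing arithmetic, not as a performance baseline.

```python
import time


def matmul_row_major(a, b, n):
    """Multiply two n x n matrices stored as flat row-major lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                # a[i][k] is at i*n + k and b[k][j] is at k*n + j.
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c


def gflops(n, seconds):
    """GFlop/s of an n x n multiply: 2*n^3 operations over the elapsed time."""
    return 2.0 * n ** 3 / seconds / 1e9


if __name__ == "__main__":
    n = 64  # small size so the interpreted loop finishes quickly
    a = [float(i % 7) for i in range(n * n)]
    b = [float(i % 5) for i in range(n * n)]
    start = time.perf_counter()
    c = matmul_row_major(a, b, n)
    elapsed = time.perf_counter() - start
    print(f"n={n}: {elapsed:.4f} s, {gflops(n, elapsed):.4f} GFlop/s")
```

Interpreted Python is far slower than compiled C, so the number this prints is not a fair CPU figure to set against a GPU kernel. The same 2n^3 operation count applies when timing the C and CUDA versions, and the small reference multiply can be used to check their output.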

Preparation for the next lecture

None.

Last modified: 15 Jan 2013