Lecture 4
Learning objectives
After this class, you should be able to:
- Explain how the threads are organized and scheduled.
- Explain how resources are allocated to the threads.
- Explain how this knowledge can be used to optimize code performance, for example, by choosing a suitable number of threads per block.
- Explain row-major ordering.
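The first three objectives can be made concrete with some host-side arithmetic. The sketch below assumes a warp size of 32 and forms warps from consecutive threads, with `threadIdx.x` varying fastest. It then estimates how many blocks fit on one SM. The per-SM limits (512 threads per block, 768 threads and 8 blocks per SM) are hypothetical example values in the style of early G80-class GPUs, not the limits of any particular device. Real limits should be read from `cudaGetDeviceProperties()`, and register and shared-memory usage, which this sketch ignores, can lower them further. All function names are illustrative.

```c
/* ASSUMED example limits, for illustration only; query your own device. */
#define WARP_SIZE             32
#define MAX_THREADS_PER_BLOCK 512
#define MAX_THREADS_PER_SM    768
#define MAX_BLOCKS_PER_SM     8

/* Warps used by one block: threads are grouped into warps of WARP_SIZE
 * consecutive threads, and a partially filled last warp still occupies a
 * whole warp. */
int warps_per_block(int threads_per_block)
{
    return (threads_per_block + WARP_SIZE - 1) / WARP_SIZE;
}

/* Warp index of thread (tx, ty) in a 2D block that is bdx threads wide.
 * The block is linearized with x varying fastest, then split into warps. */
int warp_of_thread(int tx, int ty, int bdx)
{
    int linear = ty * bdx + tx;
    return linear / WARP_SIZE;
}

/* Blocks of the given size that one SM can hold at once, considering only
 * the thread and block limits above (registers and shared memory ignored). */
int resident_blocks_per_sm(int threads_per_block)
{
    if (threads_per_block <= 0 || threads_per_block > MAX_THREADS_PER_BLOCK)
        return 0; /* block too large to launch under the assumed limits */
    int by_threads = MAX_THREADS_PER_SM / threads_per_block;
    return by_threads < MAX_BLOCKS_PER_SM ? by_threads : MAX_BLOCKS_PER_SM;
}
```

Under these assumed limits, 8x8 blocks (64 threads) are capped at 8 resident blocks, which is only 512 threads per SM. 16x16 blocks (256 threads) fit 3 blocks and fill all 768 thread slots. 32x32 blocks (1024 threads) exceed the per-block limit and cannot be launched at all. Choosing a block size that is a multiple of 32 also avoids partially filled warps.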
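In row-major ordering, each row is stored contiguously and rows follow one another, so element (row, col) of a matrix with `width` columns sits at offset `row * width + col` of a flat array. The minimal sketch below shows the mapping in both directions. It also copies a small 2D C array, which C itself lays out row-major, into the flat form in which matrices are typically passed to a kernel. The function names are hypothetical.

```c
#include <stddef.h>

/* Flat offset of element (row, col) in a row-major matrix of `width` columns. */
size_t row_major_index(size_t row, size_t col, size_t width)
{
    return row * width + col;
}

/* Inverse mapping: recover (row, col) from a flat row-major offset. */
void row_major_coords(size_t index, size_t width, size_t *row, size_t *col)
{
    *row = index / width;
    *col = index % width;
}

/* Copy a 3x4 C array into a flat buffer using the row-major mapping. */
void flatten_3x4(float m[3][4], float flat[12])
{
    for (size_t r = 0; r < 3; ++r)
        for (size_t c = 0; c < 4; ++c)
            flat[row_major_index(r, c, 4)] = m[r][c];
}
```

For a 3x4 matrix, element (2, 1) lands at offset 2 * 4 + 1 = 9.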
Reading assignment
- Read the UIUC Lec-4 slides.
Exercises and review questions
- Exercises and review questions on the current lecture's material
  - What is a warp? What is the common characteristic of all threads in a warp with respect to scheduling?
  - Why does branch divergence between warps not create the performance penalty that branch divergence within a warp does?
  - How does the SIMT implementation on a GPU combine features of both SIMD instructions and simultaneous multithreading on a conventional processor?
  - Implement matrix multiplication on the CPU and GPU using the algorithms presented in Lecture 4, and compare their performance in GFlop/s. Report your performance results on the discussion board.
- Preparation for the next lecture
  - None.
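The question on branch divergence can be explored with a small host-side model rather than a kernel. The sketch below assumes a warp size of 32 and a 1D block, and counts how many warps contain threads that disagree on a branch condition. Such a warp has to execute both paths one after the other, whereas warps that each agree internally can follow different paths at no extra cost. The two conditions and all function names are hypothetical examples.

```c
#define WARP_SIZE 32

/* A branch condition evaluated per thread index (e.g. threadIdx.x). */
typedef int (*branch_cond)(int tid);

/* Splits every warp: even and odd lanes take different paths. */
int cond_even_thread(int tid) { return tid % 2 == 0; }

/* Warp-aligned: all threads of a warp agree, although different warps
 * take different paths. */
int cond_even_warp(int tid) { return (tid / WARP_SIZE) % 2 == 0; }

/* Count warps in a 1D block whose threads do not all agree on `cond`. */
int divergent_warps(int threads_per_block, branch_cond cond)
{
    int count = 0;
    for (int first = 0; first < threads_per_block; first += WARP_SIZE) {
        int last = first + WARP_SIZE;
        if (last > threads_per_block)
            last = threads_per_block;
        int taken = cond(first);
        for (int t = first + 1; t < last; ++t) {
            if (cond(t) != taken) {
                ++count; /* this warp must run both paths */
                break;
            }
        }
    }
    return count;
}
```

With 256 threads (8 warps), `tid % 2 == 0` makes all 8 warps divergent, while `(tid / 32) % 2 == 0` makes none of them divergent.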
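For the matrix-multiplication exercise, a CPU reference and a GFlop/s calculation might look like the sketch below. It uses a straightforward triple loop over square, row-major matrices, which may differ in detail from the version in the lecture. Each of the width² output elements needs `width` multiplies and `width` adds, giving 2 · width³ floating-point operations in total. Timing uses the standard C `clock()`, which measures processor time and is adequate for a single-threaded reference. On the GPU side, kernel time is commonly measured with CUDA events (`cudaEventRecord`, `cudaEventElapsedTime`). The function names here are hypothetical.

```c
#include <time.h>

/* P = M * N for square width x width matrices stored in row-major order. */
void matmul_cpu(const float *M, const float *N, float *P, int width)
{
    for (int row = 0; row < width; ++row) {
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < width; ++k)
                sum += M[row * width + k] * N[k * width + col];
            P[row * width + col] = sum;
        }
    }
}

/* GFlop/s for one width x width multiplication that took `seconds`. */
double matmul_gflops(int width, double seconds)
{
    double flops = 2.0 * (double)width * width * width;
    return flops / seconds / 1e9;
}

/* Processor time, in seconds, of one CPU multiplication. */
double time_matmul_cpu(const float *M, const float *N, float *P, int width)
{
    clock_t start = clock();
    matmul_cpu(M, N, P, width);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, a 1000 x 1000 multiplication (2 × 10⁹ operations) that takes 2 seconds runs at 1 GFlop/s. Very small matrices finish faster than `clock()` can resolve, so timing larger sizes, or repeating the multiplication, gives more meaningful figures.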
Last modified: 15 Jan 2013