Lecture 20
Learning objectives
After this class, you should be able to:
- Explain stream scheduling in different generations of Nvidia GPUs.
- Use the above knowledge to improve performance (i) when overlapping computation and data movement and (ii) when trying to execute kernels concurrently.
Reading assignment
- UIUC Lecture 21b.
- Read the Nvidia webinar: StreamsAndConcurrencyWebinar.pdf.
- Look up online resources to learn about timing with CUDA Events.
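As a starting point for the CUDA Events reading, here is a minimal sketch of event-based timing. The kernel scaleKernel and the helper timeScaleKernelMs are hypothetical names made up for this illustration and do not come from the course materials; only the cudaEvent* calls belong to the CUDA runtime API. The sketch assumes a single kernel launched in the default stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so that there is some GPU work to time.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: times one launch of scaleKernel on n floats.
// Returns the elapsed GPU time in milliseconds, or -1.0f on a CUDA error.
float timeScaleKernelMs(int n) {
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Events are recorded into the stream (here the default stream, 0),
    // so they mark points in the GPU's own timeline, not the host's.
    cudaEventRecord(start, 0);
    scaleKernel<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);

    // The launch is asynchronous; wait until the stop event has completed
    // before asking for the elapsed time.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return (err == cudaSuccess) ? ms : -1.0f;
}
```

Calling timeScaleKernelMs(1 << 20) from main and printing the result gives a GPU-side time that can be compared against a host-side timer. The two may differ somewhat, since a host timer also includes launch and synchronization overhead.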
Exercises and review questions
- Exercises and review questions on the current lecture's material
- Analyze your Assignment 3 code, which used double and triple buffering, for potential bottlenecks caused by Fermi's queues. Modify your code to improve performance, and report the old and new performance results on Blackboard.
- Preparation for the next lecture
- Modify one of your earlier timing codes to perform its timing with CUDA Events. Is the time obtained close to your previous measurement?
Last modified: 26 Mar 2013