Lecture 19
Learning objectives
After this class, you should be able to:
- Explain the position of the GPU as a part of the PC architecture.
- Use the above knowledge to identify performance bottlenecks for applications.
- Explain the purpose of the cudaHostAlloc call and how it can be used for zero copy, which has the potential to improve performance.
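As a starting point for the last objective, the following is a minimal sketch of zero copy for vector addition, not a reference solution. It assumes a device that supports mapped pinned memory. The kernel name vecAdd, the CHECK macro, the vector length, and the block size are illustrative choices, not part of the lecture material. The host buffers are allocated with cudaHostAlloc using the cudaHostAllocMapped flag, and the kernel accesses them through device pointers obtained from cudaHostGetDevicePointer, so no cudaMemcpy call appears.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper (not from the lecture): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err));                         \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Element-wise vector addition; the pointers refer to mapped host memory.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                  // assumed problem size
    const size_t bytes = n * sizeof(float);

    // Enable mapping of pinned host memory; call before other CUDA work.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Pinned host buffers that are also mapped into the device address space.
    float *hA, *hB, *hC;
    CHECK(cudaHostAlloc((void **)&hA, bytes, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&hB, bytes, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&hC, bytes, cudaHostAllocMapped));

    for (int i = 0; i < n; ++i) {
        hA[i] = (float)i;
        hB[i] = 2.0f * (float)i;
    }

    // Device-side aliases of the host buffers; no cudaMemcpy is needed.
    float *dA, *dB, *dC;
    CHECK(cudaHostGetDevicePointer((void **)&dA, hA, 0));
    CHECK(cudaHostGetDevicePointer((void **)&dB, hB, 0));
    CHECK(cudaHostGetDevicePointer((void **)&dC, hC, 0));

    const int threads = 256;                // assumed block size
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    CHECK(cudaGetLastError());

    // The kernel writes directly into hC; wait for it before reading.
    CHECK(cudaDeviceSynchronize());

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hC[i] != hA[i] + hB[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaFreeHost(hA));
    CHECK(cudaFreeHost(hB));
    CHECK(cudaFreeHost(hC));
    return 0;
}
```

In this sketch every kernel access to the vectors crosses the PCIe bus, so whether it beats an explicit copy depends on the access pattern and the hardware; timing both versions is the only reliable way to tell.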
Reading assignment
- UIUC Lecture 20.
- Read the article on PCIe available at: arstechnica.com/features/2004/07/pcie.
- Look up internet resources to learn about zero copy.
Exercises and review questions
- Exercises and review questions on the current lecture's material
  - Which of the following three factors is most likely to limit vector addition performance: (i) PCIe bandwidth, (ii) device DRAM bandwidth, or (iii) GPU computing speed?
  - Implement zero copy for vector addition and compare its performance with that of a version without zero copy. Report the performance results on Blackboard.
- Preparation for the next lecture
  - Review double buffering.
Last modified: 14 Mar 2013