Lecture 20

Learning objectives

After this class, you should be able to:

  1. Identify what limits thread occupancy and latency hiding, given the resource limits of the GPU and the resources used by each thread.
  2. Optimize memory performance of CUDA code by enabling coalescing, avoiding contention for memory banks, and making effective use of the constant cache.
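
As a rough illustration of the first objective, the host-side sketch below estimates how many thread blocks can be resident on one multiprocessor by taking the minimum over each resource limit. The struct and function names (SmLimits, KernelUsage, resident_blocks, occupancy) are hypothetical and not part of CUDA, and the sketch ignores the allocation granularity that real hardware applies to registers and shared memory. Any limits passed in are assumptions; take actual values from the documentation for your GPU.

```cpp
#include <algorithm>

// Hypothetical per-multiprocessor resource limits (illustrative only).
struct SmLimits {
    int max_threads;       // maximum resident threads
    int max_blocks;        // maximum resident blocks
    int registers;         // total registers available
    int shared_mem_bytes;  // total shared memory in bytes
};

// Hypothetical description of one kernel's resource usage.
// threads_per_block and registers_per_thread are assumed to be positive.
struct KernelUsage {
    int threads_per_block;
    int registers_per_thread;
    int shared_mem_per_block;  // bytes; may be 0
};

// Blocks that fit on one multiprocessor: the tightest of the four limits.
int resident_blocks(const SmLimits& sm, const KernelUsage& k) {
    int by_threads = sm.max_threads / k.threads_per_block;
    int by_regs = sm.registers / (k.registers_per_thread * k.threads_per_block);
    // A kernel that uses no shared memory is not limited by it.
    int by_smem = k.shared_mem_per_block > 0
                      ? sm.shared_mem_bytes / k.shared_mem_per_block
                      : sm.max_blocks;
    return std::min({sm.max_blocks, by_threads, by_regs, by_smem});
}

// Occupancy: resident threads as a fraction of the maximum resident threads.
double occupancy(const SmLimits& sm, const KernelUsage& k) {
    return static_cast<double>(resident_blocks(sm, k) * k.threads_per_block) /
           sm.max_threads;
}
```

For example, with assumed limits of 1024 threads, 8 blocks, 16384 registers, and 16384 bytes of shared memory, a kernel using 256 threads per block, 20 registers per thread, and 4096 bytes of shared memory per block is limited by registers to 3 resident blocks, which gives an occupancy of 0.75.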

Reading assignment

  1. GPU-5 on Blackboard, under the "course library" tab, except sections 5.1 and 5.3.

Exercises and review questions


Last modified: 9 Apr 2010