Learning objectives and review

Lecture 5

Learning objectives

After this class, you should be able to:

Write simple, though possibly inefficient, CUDA programs, compile them with nvcc, and run them on a GPU-accelerated system. In particular, you should be able to use the following features: (i) cudaGetDeviceCount, (ii) cudaSetDevice, (iii) cudaGetDevice, (iv) __global__, (v) blockIdx, (vi) threadIdx, (vii) cudaMalloc, (viii) cudaThreadSynchronize (deprecated in later CUDA releases in favor of cudaDeviceSynchronize), (ix) cudaMemcpy, (x) cudaFree, and (xi) kernel launches.
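
As a rough illustration of how the features listed above fit together, here is a minimal sketch, not a reference solution. The kernel name addOne, the array size, the block size of 256, and the choice of device 0 are all assumptions made for this example. It calls cudaDeviceSynchronize, the replacement that later CUDA toolkits provide for cudaThreadSynchronize. It also uses blockDim, which is not in the list above, to compute a global index.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds 1 to one array element.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last, partial block
        data[i] += 1;
}

int main()
{
    // Find out how many CUDA devices exist, then select one (device 0 is an assumption).
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA device found\n");
        return 1;
    }
    cudaSetDevice(0);
    int dev = -1;
    cudaGetDevice(&dev);
    printf("Using device %d of %d\n", dev, count);

    // Host data.
    const int n = 1000;
    const size_t bytes = n * sizeof(int);
    int *h = (int *)malloc(bytes);
    for (int i = 0; i < n; i++)
        h[i] = i;

    // Device allocation and host-to-device copy.
    int *d = NULL;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    addOne<<<blocks, threads>>>(d, n);

    // Wait for the kernel to finish (older toolkits: cudaThreadSynchronize).
    cudaDeviceSynchronize();

    // Device-to-host copy and verification.
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    int errors = 0;
    for (int i = 0; i < n; i++)
        if (h[i] != i + 1)
            errors++;
    printf("%s\n", errors == 0 ? "PASSED" : "FAILED");

    cudaFree(d);
    free(h);
    return errors != 0;
}
```

Assuming the file is saved as example.cu (a hypothetical name), it can be compiled with "nvcc example.cu -o example". For brevity the sketch checks only the return value of cudaGetDeviceCount; a more careful program would check every CUDA call.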

Reading assignment

Read Chapters 2 and 3 of Kirk and Hwu's GPU book (GPU-2, GPU-3 on Blackboard -- course library).
Reference material: (i) CUDA Programming Guide and (ii) CUDA Reference Manual (both available on Blackboard -- course library).

Exercises and review questions

Exercises and review questions on the current lecture's material

Write a CUDA program that launches a kernel which does nothing. Determine the time taken as a function of the number of threads and number of blocks specified.
Write a CUDA program that first copies data from host to device, then launches a kernel that does nothing, then copies data from device to host, and finally checks if the data read from the device is identical to that written to the device. Time the data transfer in each direction, and report the bandwidth as a function of the data size.
Repeat the above, but allocate the host buffer with cudaMallocHost (which returns page-locked host memory) instead of malloc, and use the corresponding routine to free the memory.
Use cudaMemcpy to transfer data from device to device, and report the bandwidth as a function of the data size.
Write a CUDA program that reverses the contents of an array. Report the performance as a function of the data size, the number of threads, and the number of blocks, and compare with the performance on the CPU.
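
The exercises above all require timing GPU work. One possible approach, sketched below under stated assumptions, is to bracket the work with CUDA events. The kernel name emptyKernel and the ranges of block and thread counts are hypothetical choices for illustration. A single launch is timed per configuration, so averaging over many launches would give steadier numbers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that does nothing.
__global__ void emptyKernel() {}

int main()
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization cost is not included in the timings.
    emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    // Sweep over a few launch configurations (ranges chosen arbitrarily).
    for (int blocks = 1; blocks <= 1024; blocks *= 4) {
        for (int threads = 32; threads <= 512; threads *= 2) {
            cudaEventRecord(start, 0);
            emptyKernel<<<blocks, threads>>>();
            cudaEventRecord(stop, 0);
            cudaEventSynchronize(stop);  // wait until the stop event has been reached

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
            printf("blocks=%4d threads=%3d time=%.4f ms\n", blocks, threads, ms);
        }
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```

The same event pair can bracket a cudaMemcpy call. If a transfer of a given number of bytes takes ms milliseconds, the bandwidth in GB/s is bytes / (ms * 1.0e6).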

Preparation for the next lecture

How much total shared memory is available over the entire GPU on gpu.cs.fsu.edu?

Last modified: 25 Jan 2010