Lecture 5
Learning objectives
After this class, you should be able to:
- Write simple, but possibly inefficient, programs using CUDA, compile them using nvcc, and run them on a GPU-accelerated system. In particular, you should be able to use the following features: (i) cudaGetDeviceCount, (ii) cudaSetDevice, (iii) cudaGetDevice, (iv) __global__, (v) blockIdx, (vi) threadIdx, (vii) cudaMalloc, (viii) cudaThreadSynchronize, (ix) cudaMemcpy, (x) cudaFree, and (xi) kernel launches.
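As a rough illustration of how the features listed above fit together, here is a minimal sketch of a complete program. The names fill_index and run_fill_index, the array size, and the block size are hypothetical choices made for this example only. The sketch also uses blockDim, which is not in the list, and calls cudaDeviceSynchronize, the successor to cudaThreadSynchronize, which is deprecated in later CUDA releases.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread writes its global index into one array element.
__global__ void fill_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // the last block may have more threads than elements
        out[i] = i;
}

// Illustrative helper: runs the kernel on n elements and checks the result.
// Returns the number of wrong elements (0 on success), or -1 on a CUDA error.
int run_fill_index(int n, int threads_per_block)
{
    size_t bytes = n * sizeof(int);
    int *h_out = (int *)malloc(bytes);
    int *d_out = NULL;
    if (h_out == NULL)
        return -1;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
        free(h_out);
        return -1;
    }

    // Round up so that every element is covered by some thread.
    int blocks = (n + threads_per_block - 1) / threads_per_block;
    fill_index<<<blocks, threads_per_block>>>(d_out, n);

    cudaError_t err = cudaGetLastError();      // launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // wait for the kernel to finish
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    int wrong = 0;
    if (err != cudaSuccess) {
        wrong = -1;
    } else {
        for (int i = 0; i < n; ++i)
            if (h_out[i] != i)
                ++wrong;
    }

    cudaFree(d_out);
    free(h_out);
    return wrong;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    cudaSetDevice(0);                          // select the first device
    int dev = -1;
    cudaGetDevice(&dev);                       // confirm which device is active
    printf("Using device %d of %d\n", dev, count);

    int wrong = run_fill_index(1000, 128);
    printf("Wrong elements: %d\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

Assuming the file is saved as fill_index.cu (a hypothetical name), it can be compiled with "nvcc fill_index.cu -o fill_index" and run on a machine with a CUDA-capable GPU.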
Reading assignment
- Read Chapters 2 and 3 of Kirk and Hwu's GPU book (GPU-2, GPU-3 on Blackboard -- course library).
- Reference material: (i) CUDA Programming Guide and (ii) CUDA Reference Manual (both available on Blackboard -- course library).
Exercises and review questions
- Exercises and review questions on the current lecture's material
  - Write a CUDA program that launches a kernel which does nothing. Determine the time taken as a function of the number of threads and the number of blocks specified.
  - Write a CUDA program that first copies data from host to device, then launches a kernel that does nothing, then copies data from device to host, and finally checks if the data read from the device is identical to that written to the device. Time the data transfer in each direction, and report the bandwidth as a function of the data size.
  - Repeat the above, but use cudaMallocHost instead of cudaMalloc, and use a corresponding routine to free the memory.
  - Use cudaMemcpy to transfer from device to device, and report the bandwidth as a function of the data size.
  - Write a CUDA program that reverses the contents of an array. Report the performance as a function of the data size, the number of threads, and the number of blocks, and compare with the performance on the CPU.
- Preparation for the next lecture
  - How much total shared memory is available over the entire GPU on gpu.cs.fsu.edu?
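Several of the exercises above ask for timings and bandwidths. One possible way to measure them is with CUDA events, sketched below for the host-to-device direction only. The helper names time_h2d_ms and bandwidth_gb_per_s and the range of data sizes are assumptions made for this example; other timers can be used as well.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Illustrative helper: converts a byte count and a time in milliseconds
// to GB/s (1 GB = 1e9 bytes).
double bandwidth_gb_per_s(size_t bytes, float ms)
{
    return (double)bytes / ((double)ms * 1.0e6);
}

// Illustrative helper: times one host-to-device cudaMemcpy with CUDA events.
// Returns the elapsed time in milliseconds, or -1 on error.
float time_h2d_ms(void *d_dst, const void *h_src, size_t bytes)
{
    cudaEvent_t start, stop;
    float ms = -1.0f;

    if (cudaEventCreate(&start) != cudaSuccess)
        return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start, 0);
    cudaError_t err = cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);        // wait until the stop event has occurred

    if (err == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    // Data sizes from 1 KB to 64 MB, growing by a factor of 16 each step.
    for (size_t bytes = (size_t)1 << 10; bytes <= (size_t)1 << 26; bytes <<= 4) {
        char *h_buf = (char *)malloc(bytes);
        char *d_buf = NULL;
        if (h_buf == NULL || cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) {
            printf("Allocation of %lu bytes failed\n", (unsigned long)bytes);
            free(h_buf);
            return 1;
        }
        memset(h_buf, 1, bytes);

        float ms = time_h2d_ms(d_buf, h_buf, bytes);
        if (ms > 0.0f)
            printf("%10lu bytes: %8.3f ms, %6.2f GB/s\n",
                   (unsigned long)bytes, ms, bandwidth_gb_per_s(bytes, ms));

        cudaFree(d_buf);
        free(h_buf);
    }
    return 0;
}
```

A single measurement per size is noisy, especially for small transfers, so repeating each transfer several times and averaging is advisable. The device-to-host direction can be timed in the same way by changing the copy direction and swapping the buffer arguments.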
Last modified: 25 Jan 2010