Lecture 2
Learning objectives
After this class, you should be able to:
- Explain how a CUDA program is compiled.
- Explain the purpose of the following variables, APIs, and keywords: blockIdx, blockDim, threadIdx, cudaMalloc, cudaFree, cudaMemcpy, __global__, __device__, __host__, cudaThreadSynchronize.
- Explain how multiple threads on the GPU are partitioned into blocks and used to perform data parallel computation.
- Use the above features to write simple CUDA programs (a minimal sketch follows this list).
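A minimal sketch of how these features fit together is given below (the kernel name square and the array size n are illustrative, not taken from the slides): __global__ marks the function that runs on the device, cudaMalloc/cudaMemcpy/cudaFree manage device memory, and blockIdx, blockDim, and threadIdx let each thread locate its own element.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // __global__ marks a kernel: it runs on the device and is launched from the host.
    // (__device__ marks a function callable only from device code, __host__ one
    // callable only from host code.)
    __global__ void square(const int *in, int *out, int n) {
        // Each thread handles one element; blockIdx, blockDim, and threadIdx
        // combine to give the thread's global index.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] * in[i];
    }

    int main(void) {
        const int n = 1024;
        size_t bytes = n * sizeof(int);

        int *h_in = (int *)malloc(bytes), *h_out = (int *)malloc(bytes);
        for (int i = 0; i < n; ++i) h_in[i] = i + 1;

        // Allocate device memory and copy the input host -> device.
        int *d_in, *d_out;
        cudaMalloc((void **)&d_in, bytes);
        cudaMalloc((void **)&d_out, bytes);
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        square<<<blocks, threadsPerBlock>>>(d_in, d_out, n);
        cudaThreadSynchronize();  // wait for the kernel (cudaDeviceSynchronize in newer CUDA)

        // Copy the results device -> host and release device memory.
        cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_in);
        cudaFree(d_out);

        printf("%d squared is %d\n", n, h_out[n - 1]);
        free(h_in);
        free(h_out);
        return 0;
    }

Compiling this with nvcc (e.g., nvcc square.cu) separates the host code, which goes through the regular C/C++ compiler, from the device code, which nvcc compiles for the GPU.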
Reading assignment
- Read the UIUC Lec-2 slides.
- Search online for information on cudaMallocHost and cudaFreeHost (a brief usage sketch follows).
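As a preview of those two calls, here is a brief sketch (buffer size and names are arbitrary): cudaMallocHost allocates page-locked (pinned) host memory, which is used with an ordinary cudaMemcpy and must be released with cudaFreeHost rather than free.

    #include <cuda_runtime.h>

    int main(void) {
        const size_t bytes = 1 << 20;               // 1 MB buffer, size chosen arbitrarily
        float *h_pinned;
        cudaMallocHost((void **)&h_pinned, bytes);  // page-locked (pinned) host allocation

        float *d_buf;
        cudaMalloc((void **)&d_buf, bytes);

        // Copies from pinned memory can be DMA'd directly, so they are typically
        // faster than copies from pageable memory.
        cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);

        cudaFree(d_buf);
        cudaFreeHost(h_pinned);                     // pinned memory is freed with cudaFreeHost
        return 0;
    }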
Exercises and review questions
- Exercises and review questions on the current lecture's material
  - Write CUDA code to compute the squares of the first N integers.
  - Write CUDA code to determine the following: (i) data transfer bandwidth from host to device, (ii) data transfer bandwidth from device to host, (iii) data transfer bandwidth from host to device using pinned memory, (iv) data transfer bandwidth from device to host using pinned memory, and (v) kernel creation overhead. (A timing sketch appears after this list.)
- Preparation for the next lecture
  - What is pinned memory?
  - Give an example of two 3 x 3 matrices and show their product. (Post your answer on the discussion board.)
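For the bandwidth exercises, one possible skeleton is sketched below: it times a single host-to-device copy with CUDA events and divides bytes by elapsed time (buffer size and variable names are illustrative, and a real measurement would average over many repetitions). The device-to-host and pinned-memory variants follow by reversing the copy direction and by replacing malloc/free with cudaMallocHost/cudaFreeHost.

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <cuda_runtime.h>

    int main(void) {
        const size_t bytes = 64 << 20;          // 64 MB test buffer (illustrative size)
        float *h_buf = (float *)malloc(bytes);  // pageable host memory; use cudaMallocHost for the pinned case
        memset(h_buf, 0, bytes);

        float *d_buf;
        cudaMalloc((void **)&d_buf, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Time one host -> device copy; repeat and average for a steadier estimate.
        cudaEventRecord(start, 0);
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
        printf("host -> device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(d_buf);
        free(h_buf);
        return 0;
    }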
Last modified: 10 Jan 2013