CIS5930-02 High performance computing for scientific applications
Lectures
Homeworks
Paper presentation
www.cs.fsu.edu/~asriniva/courses/hpcsa/papers.html
Lectures
Lecture 6: 27 Jan 2003
- Reading assignment:
- Read Chapter 1: sections 1.5 and 1.8.
- Read Chapter 2: sections 2.2.1 - 2.2.4, 2.2.7, 2.4 - 2.7, table 2.1.
- Review questions:
- How are computers classified, based on Flynn's taxonomy?
- Do you know what the following mean: SISD, SIMD, and MIMD?
- In which of the following classes does a traditional sequential computer fall: SISD, SIMD, or MIMD?
- In which of the following classes do most current commercial parallel computers fall: SISD, SIMD, or MIMD?
- In which of the following classes does a super-scalar, pipelined sequential computer fall: SISD, SIMD, or MIMD?
- How are MIMD computers classified, based on their address space organization?
- Do you know the following terminology: distributed memory, shared memory, distributed shared memory, SMP, UMA, NUMA, multicomputer, multiprocessor, centralized multiprocessor, distributed multiprocessor?
Lecture 7: 29 Jan 2003
- Reading assignment:
- Read Chapter 4: section 4.2.
- Read Chapter 7: sections 7.1-7.4.
- Read Chapter 17: sections 17.1-17.4.
- Review questions:
- What is "false-sharing"?
- Determine the diameter, bisection width, and cost for each of the following network topologies, as a function of the number of nodes, n: 2-D torus, 3-D torus, and hypercube.
- What are some parallel programming paradigms that we discussed in class?
- What is SPMD?
- What are the definitions of efficiency and speedup? What do they attempt to measure?
- What is a thread?
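- Aside on the false-sharing question above: a small C/OpenMP sketch (my own illustration, not from the course examples; the thread count, array size, and compile command are assumptions). Each thread updates only its own counter, yet the counters sit in the same cache line, so the line ping-pongs between processors.

/* false_sharing.c -- illustrative sketch of false sharing.
   Possible compilation (an assumption): cc -fopenmp false_sharing.c        */
#include <omp.h>
#include <stdio.h>

#define N 100000000
#define MAX_THREADS 16

int main(void)
{
    /* Adjacent elements of count[] are likely to share a cache line.       */
    long count[MAX_THREADS] = {0};

    omp_set_num_threads(4);               /* keep within MAX_THREADS        */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        long i;
        for (i = 0; i < N; i++)
            count[id]++;                  /* invalidates the other copies   */
    }

    /* Padding each counter out to a full cache line, or accumulating in a
       scalar local variable, removes the false sharing.                    */
    long total = 0;
    for (int t = 0; t < 4; t++)
        total += count[t];
    printf("total = %ld\n", total);
    return 0;
}
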
Lecture 8: 3 Feb 2003
- Reading assignment:
- Read Chapter 17: sections 17.1-17.7.
- Read the OpenMP standard: www.openmp.org.
- Review questions:
- What are Amdahl's law and Gustafson's law?
- Given the sequential fraction, what are the limits on speedup obtained using Amdahl's law and Gustafson's law? Why do the two laws yield different results? (A small worked example follows this lecture's questions.)
- Do you know the following? The concept of threads, the OpenMP execution model, compiling an OpenMP program on the SGI Origin 2000, the compiler directives for creating a parallel region and work-sharing a for loop, reduction, and the library calls to set the number of threads and to get the thread number.
- Can you give examples to demonstrate errors that can occur in a program when multiple threads execute a piece of code? (A short sketch follows this lecture's questions.)
- What do you think is the most likely reason for the restrictions OpenMP places on the type of for loops?
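- Worked example for the Amdahl/Gustafson question above (the numbers are illustrative): with serial fraction f = 0.1 and p = 16 processors, Amdahl's law bounds the speedup by 1/(0.1 + 0.9/16) = 6.4, while Gustafson's scaled speedup is 16 - 0.1(16 - 1) = 14.5. The results differ because Amdahl's law fixes the problem size, whereas Gustafson's law lets the problem grow with p (and measures the serial fraction on the parallel run).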
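- Sketch for the multi-threading errors question above (my own example, not part of the course code; the loop body is arbitrary): the first loop has a data race on sum, so the answer changes from run to run; the reduction clause in the second loop is one correct fix.

/* race_vs_reduction.c -- a data race and its OpenMP reduction fix.         */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    double sum = 0.0;

    /* Buggy version: the unprotected update of sum is a data race.         */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        sum += 1.0 / (i + 1);             /* race on sum                    */
    printf("racy sum    = %f\n", sum);

    /* Correct version: each thread accumulates a private partial sum that
       is combined when the work-shared loop ends.                          */
    sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += 1.0 / (i + 1);
    printf("reduced sum = %f\n", sum);
    printf("max threads = %d\n", omp_get_max_threads());
    return 0;
}
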
Lecture 9: 5 Feb 2003
- Reading assignment:
- Read Chapter 17: sections 17.1-17.7.
- Read the OpenMP standard: www.openmp.org.
- Review questions:
- Do you know the following? OpenMP constructs for reduction, avoiding barriers at the end of for loops, setting the scheduling policy, defining critical sections, declaring variables private; and library calls for setting the number of threads and obtaining the number of processors.
- Do you know the semantics of reduction, private, firstprivate, and lastprivate?
- Can you give an example to demonstrate how changing the order of loops may enable more effective parallelization with OpenMP (see page 510 of the textbook, and the sketch below)?
- Examples: www.cs.fsu.edu/~asriniva/courses/hpcsa/examples/lec9.tar
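- Sketch for the loop-order question above (my own illustration; the textbook's page 510 example may differ in its details): the i loop carries a dependence, the j loop does not, so interchanging them lets the parallel work-shared loop be the outermost one instead of forking and joining once per i iteration.

/* loop_interchange.c -- exposing parallelism by interchanging loops.       */
#include <omp.h>
#define N 1024
double a[N][N], b[N][N];

void sweep(void)
{
    /* Original order:                                                       */
    /*   for (i = 1; i < N; i++)         i loop: a[i][j] needs a[i-1][j]     */
    /*     for (j = 0; j < N; j++)       j loop: independent iterations      */
    /*       a[i][j] = a[i-1][j] + b[i][j];                                  */
    /* Interchanged order, with the independent j loop on the outside:       */
    #pragma omp parallel for
    for (int j = 0; j < N; j++)
        for (int i = 1; i < N; i++)
            a[i][j] = a[i - 1][j] + b[i][j];
}
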
Lecture 12: 17 Feb 2003
- Reading assignment:
- Read Chapter 3: section 3.5.
- Read the class notes.
- Review questions:
- Do you know the following? The communication model that we are using, and the parallel algorithms for reduction, prefix, and solving linear recurrences. (A prefix sketch follows this lecture's questions.)
- Our communication model does not take into account the distance between processors. Is this justified? If we need to take it into account, how might you model the communication cost? Would the algorithms we discussed need to be modified to make them more effective? Would you choose certain architectures over others, in order to permit efficient implementation?
- What would you need to do to handle n that is not a power of two in the reduction and prefix algorithms? How would you change them if the number of processors is much greater than n? What are the speedup and efficiency of these algorithms?
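- Sketch of one standard O(log p) prefix (scan) algorithm, recursive doubling, which may differ in details from the formulation in the class notes; MPI (introduced only in the next lecture) and MPI_Sendrecv/MPI_PROC_NULL are used here purely for concreteness. Each process holds one value and, after ceil(log2 p) steps, holds the inclusive prefix sum over ranks 0..rank.

/* prefix_scan.c -- recursive-doubling inclusive prefix sum, one value per
   rank.  At step d, rank i sends its running sum to rank i+d and adds in
   the sum received from rank i-d.                                           */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double v = rank + 1.0;   /* this rank's input value                      */
    double sum = v;          /* running inclusive prefix                     */

    for (int d = 1; d < p; d <<= 1) {
        int dest = (rank + d < p)  ? rank + d : MPI_PROC_NULL;
        int src  = (rank - d >= 0) ? rank - d : MPI_PROC_NULL;
        double incoming = 0.0;        /* unchanged if src is MPI_PROC_NULL   */
        MPI_Sendrecv(&sum, 1, MPI_DOUBLE, dest, 0,
                     &incoming, 1, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        sum += incoming;
    }
    printf("rank %d: prefix sum = %g\n", rank, sum);
    MPI_Finalize();
    return 0;
}
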
Lecture 13: 19 Feb 2003
- Reading assignment:
- Read Chapter 4.
- Read the class notes, Gropp's and Pacheco's tutorials, and relevant topics from the MPI standard.
- Review questions:
- Do you know the following? The six "basic" MPI calls, nonblocking (Isend/Irecv) sends and receives, the collective communication calls, the semantics of the communication calls, the potential for deadlock, and how it can be avoided. (A deadlock sketch follows below.)
- Examples: www.cs.fsu.edu/~asriniva/courses/hpcsa/examples/lec13.tar
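- Sketch of the deadlock question above (my own example, not part of lec13.tar): every rank exchanges a value with a partner. If both partners call a blocking MPI_Send first, each can wait forever for the other's receive; MPI_Sendrecv (or ordering the calls by rank parity, or using MPI_Isend) avoids this.

/* exchange.c -- pairwise exchange without deadlock.                         */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int partner = rank ^ 1;                /* pair ranks 0-1, 2-3, ...       */
    double mine = (double) rank, theirs;

    if (partner < p) {
        /* Deadlock-prone pattern (avoided here):
             MPI_Send(&mine,   ..., partner, ...);
             MPI_Recv(&theirs, ..., partner, ...);
           If MPI_Send blocks until a matching receive is posted, both
           partners sit in MPI_Send.  MPI_Sendrecv posts both together.      */
        MPI_Sendrecv(&mine, 1, MPI_DOUBLE, partner, 0,
                     &theirs, 1, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d received %g from rank %d\n", rank, theirs, partner);
    }
    MPI_Finalize();
    return 0;
}
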
Lecture 15: 26 Feb 2003
- Reading assignment:
- Read the class notes, Gropp's and Pacheco's tutorials, and relevant topics from the MPI standard.
- Review questions:
- Do you know the following (in MPI)? duplicating communicators, splitting communicators, defining topologies, defining derived data types -- vectors and structures.
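- Sketch of a derived datatype (my own example, not from the course notes): an MPI_Type_vector that selects one column of a row-major matrix, so the whole column can be sent in a single call. Run with at least two processes.

/* column_type.c -- strided vector type describing a matrix column.          */
#include <mpi.h>
#define N 8

int main(int argc, char *argv[])
{
    double a[N][N];
    MPI_Datatype column;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* N blocks of 1 double, successive blocks N doubles apart: a column.    */
    MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = i * N + j;
        MPI_Send(&a[0][2], 1, column, 1, 0, MPI_COMM_WORLD);  /* column 2    */
    } else if (rank == 1) {
        MPI_Recv(&a[0][2], 1, column, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}
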
Lecture 16: 3 Mar 2003
- Reading assignment:
- Sections: 8.2, 8.3, 8.4.1, and 8.6.1 from the text.
- Review questions:
- Do you know the following? Sequential matrix-vector multiplication, different ways of distributing matrices onto processors (terminology: striped, checkerboard, block, cyclic), and parallel matrix-vector multiplication with striped and checkerboard distributions. (A sketch of the striped case follows below.)
- Miscellaneous:
- Please select project and paper presentation topics, and get my approval.
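- Sketch of parallel matrix-vector multiplication with a row-striped (block-row) distribution; the use of MPI_Allgather, the assumption that n is divisible by the number of processes, and the replication of x on every process are my simplifications and need not match the text's formulation.

/* matvec_rowstriped.c -- y = A*x with A distributed by block rows.          */
#include <mpi.h>
#include <stdlib.h>

void matvec(int n, const double *a_local, const double *x, double *y,
            MPI_Comm comm)
{
    int p, rank;
    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &rank);
    int rows = n / p;                     /* rows owned by this process      */

    double *y_local = malloc(rows * sizeof(double));

    /* Local part: multiply this process's stripe of A by the full x.        */
    for (int i = 0; i < rows; i++) {
        y_local[i] = 0.0;
        for (int j = 0; j < n; j++)
            y_local[i] += a_local[i * n + j] * x[j];
    }

    /* Gather the pieces of y everywhere, so y can serve as the input
       vector of a subsequent multiplication.                                */
    MPI_Allgather(y_local, rows, MPI_DOUBLE, y, rows, MPI_DOUBLE, comm);
    free(y_local);
}
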
Lecture 17: 5 Mar 2003
- Reading assignment:
- Sections: 8.4.1, 8.6.1, 11.1, and 11.2 from the text.
- Review questions:
- Do you know the following? how to multiply two matrices, what the time complexity of sequential matrix multiplication is, the cache model that we used, how the loop order can affect the number of cache misses in matrix multiplication.
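- Sketch for the loop-order/cache question above (the matrix size and the simple one-level cache model are illustrative assumptions): with row-major storage, the ijk order walks B down a column (stride N) in the innermost loop, while the ikj order walks B and C along a row (stride 1), so far fewer cache lines are wasted.

/* loop_order_mm.c -- effect of loop order on cache misses in C += A*B.      */
#define N 512
double A[N][N], B[N][N], C[N][N];

/* ijk order: innermost loop strides through B by column.                    */
void mm_ijk(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* ikj order: innermost loop strides through B and C by row, so each cache
   line that is brought in is used completely before being evicted.          */
void mm_ikj(void)
{
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];
}
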
Lecture 18: 17 Mar 2003
- Reading assignment:
- Sections: 11.3 and 11.4 from the text.
- Review questions:
- Do you know the following? Cache-aware and cache-oblivious algorithms for sequential matrix multiplication (a blocked sketch follows below), how to multiply two matrices using one-dimensional and two-dimensional decompositions, and why we change the initial distribution of matrix blocks in Cannon's algorithm.
- If the matrices are initially distributed in a 1-D manner, then is it worthwhile changing them to 2-D in order to perform the multiplication more efficiently? Determine the time taken for the redistribution and determine the conditions under which the savings from the matrix multiplication make the re-distribution worthwhile.
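- Sketch of a cache-aware (blocked/tiled) sequential matrix multiplication, one concrete instance of the cache-aware algorithms mentioned above; the matrix size and the block size BS are illustrative tuning assumptions (chosen so that three BS x BS blocks fit in cache, and so that BS divides N).

/* mm_blocked.c -- blocked (tiled) C += A*B.                                 */
#define N  1024
#define BS 64
double A[N][N], B[N][N], C[N][N];

void mm_blocked(void)
{
    for (int ii = 0; ii < N; ii += BS)
        for (int kk = 0; kk < N; kk += BS)
            for (int jj = 0; jj < N; jj += BS)
                /* multiply one BS x BS block of A by one block of B         */
                for (int i = ii; i < ii + BS; i++)
                    for (int k = kk; k < kk + BS; k++)
                        for (int j = jj; j < jj + BS; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
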
Lecture 19: 19 Mar 2003
- Reading assignment:
- Read the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/karypischapter.ps.
- Review questions:
- Do you know the following? The basic aims of domain decomposition, how the domain decomposition issue is modeled as a graph partitioning problem, the justification for the edge-cut metric as a measure of the communication cost and its shortfalls, whether the graph partitioning problem is NP-complete, and the three geometric partitioning techniques that we studied, with their advantages and disadvantages.
- Can you define a measure of communication cost that is better than the edge-cut metric?
- Can you define a metric to measure the communication cost so that the domain decomposition problem has a polynomial time algorithm?
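- Sketch of the edge-cut metric (my own code; the xadj/adjncy CSR arrays follow the convention used in the Karypis chapter, each undirected edge is stored from both endpoints, and edges are given unit weight for simplicity).

/* edgecut.c -- number of edges whose endpoints lie in different parts.      */
long edge_cut(int nvtxs, const int *xadj, const int *adjncy, const int *part)
{
    long cut = 0;
    for (int v = 0; v < nvtxs; v++)
        for (int e = xadj[v]; e < xadj[v + 1]; e++)
            if (part[v] != part[adjncy[e]])
                cut++;
    return cut / 2;       /* each cut edge was counted from both endpoints   */
}
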
Lecture 20: 24 Mar 2003
- Reading assignment:
- Read the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/karypischapter.ps (same as in Lec 19).
- Review questions:
- Do you know the following? The two combinatorial partitioning techniques that we studied, with their advantages and disadvantages; the fundamental difference between geometric and combinatorial techniques for graph partitioning; which of the two combinatorial techniques is used to refine an existing partition, and which can be used to create a partition from scratch; and the spectral method.
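- Sketch of the gain computation at the heart of Kernighan-Lin style refinement, the combinatorial technique used to refine an existing partition (unit edge weights, the CSR layout, and the function name are my simplifications).

/* gain.c -- reduction in edge cut obtained by moving vertex v to the
   other part: external degree minus internal degree.                        */
int move_gain(int v, const int *xadj, const int *adjncy, const int *part)
{
    int internal = 0, external = 0;
    for (int e = xadj[v]; e < xadj[v + 1]; e++) {
        if (part[adjncy[e]] == part[v]) internal++;
        else                            external++;
    }
    return external - internal;
}
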
Lecture 21: 26 Mar 2003
- Reading assignment:
- Read the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/karypischapter.ps (same as in Lec 20).
- Review questions:
- Do you know the following? The three basic steps in multilevel methods, algorithms that you can use in each of those steps, and the effectiveness of multilevel methods, relative to the other methods that we have studied, in terms of computational effort and quality of the partitioning.
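- Sketch of one coarsening pass by heavy-edge matching, one possible algorithm for the first of the three multilevel steps (the CSR representation, the simple visit order, and matching a vertex with itself when no free neighbor exists are my assumptions).

/* hem.c -- heavy-edge matching: match each unmatched vertex with the
   unmatched neighbor joined to it by the heaviest edge.                     */
void heavy_edge_matching(int nvtxs, const int *xadj, const int *adjncy,
                         const int *adjwgt, int *match)
{
    for (int v = 0; v < nvtxs; v++)
        match[v] = -1;                         /* -1 means not yet matched   */

    for (int v = 0; v < nvtxs; v++) {
        if (match[v] != -1) continue;
        int best = -1, bestwgt = -1;
        for (int e = xadj[v]; e < xadj[v + 1]; e++) {
            int u = adjncy[e];
            if (match[u] == -1 && adjwgt[e] > bestwgt) {
                best    = u;
                bestwgt = adjwgt[e];
            }
        }
        if (best == -1) best = v;              /* no free neighbor           */
        match[v] = best;
        match[best] = v;
    }
}
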
Lecture 22: 31 Mar 2003
Lecture 23: 2 Apr 2003
- Reading assignment:
- Read sections 10.1, 10.2, and 10.3 from the text, and class notes.
- Review questions:
- Do you know the following? What is the fundamental idea behind Monte Carlo methods? What is the basic idea behind Monte Carlo integration? What advantages does Monte Carlo integration have over traditional numerical quadrature? How does the error of Monte Carlo integration decrease with the number of samples? How is Monte Carlo traditionally parallelized, and how does it scale up with the number of processes?
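- Sketch of Monte Carlo integration, estimating pi by integrating the quarter-circle indicator over the unit square (my own toy example; rand() and the sample count are placeholders, not a recommended generator). The statistical error decreases like 1/sqrt(n), and the usual parallelization gives each process an independent stream and reduces only the final counts, which is why it scales so well.

/* mc_pi.c -- Monte Carlo estimate of pi.                                    */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    long n = 1000000, hits = 0;
    srand(12345);
    for (long i = 0; i < n; i++) {
        double x = (double) rand() / RAND_MAX;
        double y = (double) rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;                   /* point fell inside quarter circle    */
    }
    /* Estimate = 4 * (fraction of samples inside the quarter circle).       */
    printf("pi approx %f with n = %ld samples\n", 4.0 * hits / n, n);
    return 0;
}
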
Lecture 24: 7 Apr 2003
- Reading assignment:
- Read sections 10.1, 10.2, and 10.3 from the text, class notes, and the following paper: www.cs.fsu.edu/~asriniva/courses/hpcsa/pprng.ps.
- Review questions:
- Do you know the terminology used in random number generation, such as 'cycle', 'seed', 'iteration function', and 'period'? What are low-discrepancy sequences (quasi-random numbers), how do they differ from pseudo-random numbers, and how does the error of integration decrease with the number of samples when they are used? What are the two broad classes of random number parallelization techniques? Can you mention two methods under each of the two classes, and their relative advantages and disadvantages?
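- Sketch of the leapfrog technique, one method in the "split a single stream" class of parallelization techniques (the other broad class gives each process an independently parameterized generator); the linear congruential parameters here are toy values, not a recommended generator.

/* leapfrog.c -- process r of p takes terms r, r+p, r+2p, ... of the stream
   x_{k+1} = (A*x_k + C) mod M by iterating the p-fold composed map.
   Because M divides 2^64, the wraparound of unsigned arithmetic in the
   products below is harmless.                                               */
#include <stdio.h>

#define A 25214903917ULL
#define C 11ULL
#define M (1ULL << 48)

int main(void)
{
    int p = 4;                            /* number of simulated processes   */

    /* Compose the recurrence with itself p times: x -> (Ap*x + Cp) mod M.   */
    unsigned long long Ap = 1, Cp = 0;
    for (int i = 0; i < p; i++) {
        Cp = (A * Cp + C) % M;
        Ap = (A * Ap) % M;
    }

    for (int r = 0; r < p; r++) {
        unsigned long long x = 1;
        for (int i = 0; i < r; i++)       /* advance the seed to term r      */
            x = (A * x + C) % M;
        printf("process %d:", r);
        for (int k = 0; k < 4; k++) {     /* each step jumps p terms         */
            printf(" %llu", x);
            x = (Ap * x + Cp) % M;
        }
        printf("\n");
    }
    return 0;
}
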
Lecture 25: 9 Apr 2003
Lecture 26: 14 Apr 2003
Lecture 27: 16 Apr 2003
Lecture 28: 21 Apr 2003
Lecture 29: 23 Apr 2003
Last modified: 28 April 2003