Dates | Topic
28 Aug - 31 Aug | Chapter 1: Parallel computers and computation
Review questions:
- Do you know the following terminology? SISD, SIMD, MIMD, SMP,
UMA, shared memory machine, distributed memory machine, NUMA, cc-NUMA,
distributed shared memory, SPMD, message passing, shared memory
programming, task/channel model, multicomputer, multiprocessor. (We
have discussed some of these topics in class already, but you should
read Chapter 1.)
- Give an example to demonstrate why a lack of cache coherence can lead to incorrect results. (A sketch follows these questions.)
- Suggest some methods of ensuring cache coherence efficiently.
- Can you simulate, through software, a distributed memory machine on
a shared memory machine, or a shared memory machine on a distributed
memory machine?
- Can you efficiently simulate, through software, a
distributed memory machine on a shared memory machine, or a shared
memory machine on a distributed memory machine?
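For the cache-coherence question above, a minimal sketch (the use of pthreads and the variable names are illustrative assumptions, and a formally correct C version would also need atomics or memory fences): two threads communicate through the shared variables flag and data. On a cache-coherent machine the reader eventually observes the writer's updates; if each processor's cache were not kept consistent, the reader could spin forever on a stale copy of flag, or see flag change yet still read the stale value 0 for data.

#include <pthread.h>
#include <stdio.h>

/* Shared variables: each processor may hold its own cached copy. */
volatile int flag = 0;
volatile int data = 0;

void *writer(void *arg) {
    data = 42;              /* produce the value...               */
    flag = 1;               /* ...then signal that it is ready    */
    return NULL;
}

void *reader(void *arg) {
    while (flag == 0)       /* without coherence, the cached copy */
        ;                   /* of flag may never change           */
    printf("data = %d\n", data);  /* or data may still be stale   */
    return NULL;
}

int main(void) {
    pthread_t w, r;
    pthread_create(&r, NULL, reader, NULL);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}

On real coherent hardware this example normally works; the point is only to illustrate what could go wrong if caches were not kept consistent.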
22 Oct - 26 Oct | Discussion on homework, projects and papers
4 Sep - 7 Sep | Chapter 2: Designing parallel algorithms
Review questions:
- Do you understand the issues of: partitioning, domain
decomposition, functional decomposition, communication, local vs
global communication patterns, structured vs unstructured
communication patterns, static vs dynamic communication patterns,
synchronous vs asynchronous communication patterns, agglomeration,
mapping, use of graph partitioning in mapping, divide and conquer
paradigm.
- For the first prefix scheme we discussed in class, can you
show that the time complexity is O(log₂ N)?
- Can you come up with an example where you may be willing to
sacrifice the load-balance requirement to improve the total computation
time?
- We considered parallel prefix in a message passing
paradigm. How would you implement it in a shared memory paradigm? What
would the time complexity be? What assumptions on memory access did you
use to derive the time complexity (for example, can multiple
processors read the same memory location simultaneously? Can they
write simultaneously?)? (A sketch follows these questions.)
- Design a parallel algorithm for matrix-vector multiplication and matrix-matrix multiplication.
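For the shared-memory prefix question above, a minimal sketch of the log-step (recursive doubling) scheme; N, the input values, and the power-of-two assumption are illustrative, and the parallel work of each round is simulated here by an ordinary loop. In a CREW-style shared-memory model (concurrent reads allowed, writes to distinct locations), each of the log2(N) rounds could be executed by N processors in constant time, which is how the O(log₂ N) bound arises.

#include <stdio.h>
#include <string.h>

#define N 8                               /* assumed power of two */

int main(void) {
    int x[N] = {1, 2, 3, 4, 5, 6, 7, 8};  /* placeholder input    */
    int next[N];

    for (int d = 1; d < N; d *= 2) {      /* log2(N) rounds        */
        memcpy(next, x, sizeof x);
        for (int i = d; i < N; i++)       /* conceptually parallel */
            next[i] = x[i] + x[i - d];
        memcpy(x, next, sizeof next);
    }

    for (int i = 0; i < N; i++)
        printf("%d ", x[i]);              /* inclusive prefix sums */
    printf("\n");
    return 0;
}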
29 Oct - 2 Nov | Discussions on the projects
10 Sep - 14 Sep | Chapter 3: A quantitative basis for design, sections 3.1 - 3.4
Check HW 1
5 Nov - 9 Nov | Midterm review and midterm
17 Sep - 21 Sep | Section 3.7
Check the partial list of papers, from which you will present later in the semester.
Review questions:
- Do you understand the issues of: the various factors to
consider in performance (such as execution time, memory requirement,
software development cost, etc.), Amdahl's law and its limitations,
the limitations of extrapolating from observations, asymptotic analysis,
modeling execution time, the communication model we consider and
improvements to account for contention, efficiency, speed-up,
scalability analysis, the iso-efficiency function, and the different
network topologies. (A worked example of Amdahl's law follows these
questions.)
- Can you perform a scalability analysis for the parallel prefix algorithms we discussed in class?
- Can you create a torus using only "short" wires (that is, wires of constant length)?
- Can you create a hypercube of dimension greater than three in
the 3-dimensional world in which we live? Can you create it such that
wires do not cross? Can you embed an arbitrary graph in 3-D space so
that wires do not cross?
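As a quick worked illustration of Amdahl's law for the questions above (the 90% figure is an assumed example, not from the text): if a fraction p = 0.9 of the running time is parallelizable, the speedup on P processors is bounded by S(P) = 1 / ((1 - p) + p/P). With P = 16 this gives 1 / (0.1 + 0.9/16) = 6.4, and even as P grows without bound the speedup can never exceed 1 / (1 - p) = 10.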
13 Nov - 16 Nov | Paper presentations
24 Sep - 28 Sep | Chapter 4: Putting components together, sections 4.1 - 4.3
19 Nov - 21 Nov | Paper presentations
1 Oct - 5 Oct | Chapter 4: Putting components together, section 4.6
Review questions:
- Do you know the following: modularity issues in sequential and parallel software, the three composition techniques, their advantages and disadvantages, different matrix distribution schemes (block versus cyclic, striped versus checkerboard, one dimensional versus two dimensional)?
- Given a problem, can you suggest a suitable parallel algorithm
and data distribution, and discuss the trade-offs involved with
different composition techniques? For example, give the total memory
required, discuss factors that may change the total execution time,
and analyze the communication and computation costs.
- For an example of the above, try to analyze the following two
problems: (i) addition of N numbers, and (ii) matrix-vector
multiplication, where the matrix is distributed in a checkerboard
manner, while you can assume any suitable distribution for the vector,
with each processor having N/P elements of the vector. (A sketch of
the analysis for (i) follows these questions.)
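For problem (i) above, one possible way to set up the analysis, as a sketch only (the symbols are assumptions: t_c is the time for one addition, t_s and t_w are the message startup and per-word costs of the communication model from Chapter 3, and P is taken to divide N evenly): each processor first sums its N/P local elements, taking about (N/P - 1) t_c, and the P partial sums are then combined up a binary tree in log2(P) steps, each step costing one short message and one addition, for roughly log2(P) (t_s + t_w + t_c). The total is therefore about (N/P) t_c + log2(P) (t_s + t_w + t_c); the memory requirement is N/P elements per processor plus a constant number of temporaries, and the log2(P) communication term is what limits efficiency once N/P becomes small.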
26 Nov - 30 Nov | Project presentations
8 Oct - 12 Oct | Chapter 8: MPI
Pacheco's tutorial
Gropp's tutorial
Review questions:
- Do you know the following? The 6 basic MPI calls, need for tags
and communicators, buffering issues and deadlock, how to prevent
deadlocks, immediate sends and receives, duplicating and splitting
communicators, collective communication, topologies, derived data
types (vectors and structures).
- Given a desired topology (for example, a hypercube), can you give
suitable arguments to create a Cartesian mesh that is identical to the
desired topology?
- Can you give an example to demonstrate how send/recv can cause
deadlocks? Or, given an example, can you determine whether deadlock can
occur, under what conditions, and how it can be prevented using
facilities provided by MPI? (A sketch follows these questions.)
- How can you implement reduction using only sends and receives?
- Given a sequential algorithm, you should be able to write an
efficient MPI program for it. For example, try to do this for
matrix-vector multiplication.
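For the send/recv deadlock question above, a minimal sketch (the message length, tag value, and the assumption of exactly two processes are illustrative): both ranks post a blocking MPI_Send before their MPI_Recv, so if the messages are too large for the MPI implementation to buffer, each send blocks waiting for a receive that is never posted and the program hangs. Having one rank reverse the order of its calls, using MPI_Sendrecv (shown in the comment), or switching to immediate sends and receives all avoid the deadlock.

#include <mpi.h>
#include <stdio.h>

#define COUNT (1 << 20)          /* large enough to defeat buffering */

int main(int argc, char *argv[]) {
    int rank, other;
    static double sendbuf[COUNT], recvbuf[COUNT];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;            /* assumes exactly 2 processes */

    /* Potential deadlock: both ranks block in MPI_Send. */
    MPI_Send(sendbuf, COUNT, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, COUNT, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    /* Safe alternative:
     * MPI_Sendrecv(sendbuf, COUNT, MPI_DOUBLE, other, 0,
     *              recvbuf, COUNT, MPI_DOUBLE, other, 0,
     *              MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     */

    printf("rank %d done\n", rank);
    MPI_Finalize();
    return 0;
}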
3 Dec - 7 Dec | Project presentations
15 Oct - 19 Oct | OpenMP
Review questions:
- Do you know the following? The concept of threads, the OpenMP execution
model, compiling an OpenMP program on the SGI Origin 2000, compiler
directives for creating a parallel region and work-sharing a for loop,
data scope attribute clauses (private, lastprivate, and firstprivate),
how private variables are created, reduction, and the library calls to
set the number of threads and get the thread number.
- Can you give examples (other than those discussed in class) to
demonstrate errors that can occur in a program when multiple threads
execute a piece of code?
- Given a piece of sequential code, you should be able to
parallelize it with OpenMP directives. For example, try parallelizing
matrix-matrix multiplication. (A sketch follows these questions.)
- What do you think is the most likely reason for the loop variable in a work-shared construct to be private by default?
- What do you think is the most likely reason for the restrictions OpenMP places on the type of
for loops?
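For the matrix-matrix multiplication exercise above, a minimal OpenMP sketch (the matrix size N, the initialization values, and the choice to parallelize over rows are illustrative assumptions): the outermost loop is work-shared across threads, the loop indices and the local accumulator are private because of where they are declared, and A, B, and C are shared.

#include <omp.h>
#include <stdio.h>

#define N 512

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    /* placeholder initialization */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
        }

    #pragma omp parallel for
    for (int i = 0; i < N; i++)       /* rows distributed over threads */
        for (int j = 0; j < N; j++) {
            double sum = 0.0;         /* private by virtue of scope    */
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }

    printf("C[0][0] = %f (max threads: %d)\n", C[0][0], omp_get_max_threads());
    return 0;
}

Compiling requires the compiler's OpenMP flag (for example, -fopenmp with gcc), and OMP_NUM_THREADS or the corresponding library call selects the number of threads.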