CIS5930 Advanced Topics in Parallel and Distributed Systems, Spring 2014
This course is supported in part by the NVIDIA CUDA Teaching Center program
(see the NVIDIA press release of May 11, 2011).
Syllabus, Example Programs
Lecture 1
Lecture 2
Lecture 3
Lecture 4
Lecture 5
Lecture 6
Lecture 7
Programming assignment: MPI and CUDA implementations of Jacobi
(jacobi.c, input.jacobi). Due date: Feb 14.
Lecture 8
Lecture 9
Lecture 10
Lecture 11: Recent topology and routing proposals for extreme scale systems
(Peyman Faizian), slides
- J. Kim, et al., "Technology-Driven, Highly Scalable Dragonfly Topology",
ACM ISCA 2008.
- N. Jiang, et al., "Indirect Adaptive Routing on Large Scale Interconnection
Networks," ACM ISCA 2009.
Lecture 12: Recent topology and routing proposals for extreme scale systems
(Atiqul Mollah and Gaurish Nayak)
- A. Singla, et al., "Jellyfish: Networking Data Centers Randomly," USENIX
NSDI 2012.
- X. Yuan, et al., "A new routing scheme for Jellyfish and its performance
with HPC workloads," ACM SC'13, 2013.
- A. Singla, et al., "High Throughput Data Center Topology Design," USENIX NSDI 2014.
- Reading: Tianhe-1A Interconnect and Message-Passing Services.
Homework 2: Comments on jellyfish topology and routing (summary, advantages,
and drawbacks, 1 page max). Due Feb 25.
Lecture 13
Lecture 14: Ethernet development (Jordan Nowlin): 10-Gigabit Ethernet (10GE),
40GE, 100GE, 400GE, RDMA over Converged Ethernet
Lecture 15: SDN and OpenFlow
Lecture 16: Interconnect Simulation
Lecture 17: Interconnect Modeling
Homework 3: A flow-level, event-driven network simulator,
Software package, due date: April 3.
Term Project Information
Lecture 18 (03/18): Implementation and Optimization of MPI collective
communications
- P. Patarasuk, A. Faraj, and X. Yuan, "Pipelined Broadcast on Ethernet Switched Clusters," Journal of Parallel and Distributed Computing, 68(6):809-824, June 2008.
Lecture 19 (03/21): Implementation and Optimization of MPI point-to-point
communications
- M. Small, Z. Gu, and X. Yuan, "Near-optimal Rendezvous Protocols for
RDMA-enabled Clusters," International Conference on Parallel Processing
(ICPP), Sept. 2010.
- M. Small and X. Yuan, "A New Design of RDMA-based Small Message
Channels for InfiniBand Clusters," IEEE International Conference on
Cluster Computing (CLUSTER), Sept. 23-27, 2013.
Lecture 20 (03/25): A new method for evaluating interconnect design
- X. Yuan, S. Mahapatra, S. Pakin, and M. Lang, "LFTI: A New Performance
Metric for Assessing Interconnect Designs for Extreme-Scale HPC Systems,"
the 28th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Phoenix, Arizona, May 19-23, 2014.
Lecture 21 (03/27, Nkem Dockery): Challenges in large-scale graph processing
on HPC platforms and the Graph500 benchmark
- B. Hendrickson and J. Berry, "Graph analysis with high-performance
computing," Computing in Science & Engineering, 10(2), March 2008.
- Richard C. Murphy, Kyle B. Wheeler, James A. Ang, Brian W. Barrett,
"Introducing the Graph 500," Cray User Group 2010.
- Koji Ueno and Toyotaro Suzumura, "Highly Scalable Graph Search for the
Graph500 Benchmark," The 21st International ACM Symposium on
High-Performance Parallel and Distributed Computing (HPDC), Delft,
Netherlands, June 2012.
- http://www.graph500.org
Lecture 22 (04/01, Carlos Sanchez and Soheila): HPC and Cloud Computing,
presentation 1, presentation 2
- A. Marathe, D. K. Lowenthal, B. Rountree, M. Schulz, B. de Supinski,
and X. Yuan, "A Comparative Study of High-Performance Computing on
the Cloud," ACM Symposium on High-Performance Parallel and Distributed
Computing (HPDC), June 2013.
- A. Gupta et al., "The Who, What, Why, and How of High Performance
Computing Applications in the Cloud," HP Labs, Tech. Rep., 2013.
Available at http://www.hpl.hp.com/techreports/2013/HPL-2013-49.pdf.
Lecture 23 (04/03, Shafayat and Zach): Architecture-aware communication
optimizations, presentation 1, presentation 2
- Yufei Ren, Tan Li, Dantong Yu, Shudong Jin, Thomas G. Robertazzi,
"Design and performance evaluation of NUMA-aware RDMA-based end-to-end
data transfer systems." SC 2013.
- Shigang Li, Torsten Hoefler, Marc Snir, "NUMA-aware shared-memory
collective communication for MPI." HPDC 2013.
Lecture 24 (04/08, Caitlin): Power, Resilience, and exascale computing 1
- "Technical Challenges of Exascale Computing", Available at:
http://institutes.lanl.gov/resilience/docs/JSR-12-310-Challenges_of_exascaleFINAL.pdf.
Lecture 25 (04/10, Ryan): Power, Resilience, and exascale computing 2
- M. Snir, et al., "Addressing Failures in Exascale Computing,"
Available at http://www.mcs.anl.gov/uploads/cels/papers/ANL:MCS-TM-322.pdf.
Lecture 26 (04/15, Tong and Abdullah): Power, Resilience, and exascale
computing 3
- Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic,
Alex Ramirez, Mateo Valero, "Supercomputing with Commodity CPUs: Are
Mobile SoCs Ready for HPC?" SC 2013.
- Osman Sarood, Esteban Meneses, and L. V. Kale, "A Cool Way of
Improving the Reliability of HPC Machines," SC 2013.
Lecture 27 (04/17) Term project presentation
- Peyman/Soheila: Survey of topology mapping and process allocation
on large scale interconnect networks.
- Jordan Nowlin: Survey of security in Software-Defined Networking
- Ryan Baird/Carlos Sanchez: A New Broadcast Algorithm
Lecture 28 (04/22) Term project presentation
- Caitlin Carnahan: MPI implementation of coalescing and augmentation
algorithm for clustering labelled profiles.
- Nkemdirim Dockery: Survey of issues of graph databases in HPC.
- Gaurish Nayak: Parallel Sparse Matrix Multiplication
Lecture 29 (04/24) Term project presentation
- Zhou Tong/Shafayat Rahman: Survey of HPC using mobile CPUs
- Zach Yannes: MPI implementation of General Number Field Sieve.
- Atiqul Mollah/Abdullah Raiaan: Computing multi-commodity flow rates with
max-min fairness on fat-tree topologies.
Final exam (take home, open everything, no discussion), due April 30, 11:59am. Place a hard copy in my office by the due time.
Final Demo Schedule, May 1, Majors lab