CS532 - High Performance Distributed Computing

Tuesday/Thursday 11:00am-12:15pm, Streibel Hall Room 115

This course introduces the basics of cluster and supercomputing with MPI (the Message Passing Interface) and the basics of GPU computing with CUDA and OpenCL. It also reviews current research in high performance distributed computing and GPU computing.


  • 01/19/2016 - Most account issues can be resolved by visiting the Computer Science Department Wiki.
  • 01/14/2016 - You can find updated instructions for using the CS cluster and setting up your own MPI daemon ring here.
  • 01/14/2016 - You can get to the Computational Research Center (CRC)'s website here. It has instructions for compiling, transferring files and running things on the large campus clusters.

Contact Information



Office Hours

Streibel Hall 220
To Be Decided
Or by appointment.


Email is the best and most reliable way to contact me.


Department of Computer Science
University of North Dakota
Streibel Hall Room 220
3950 Campus Road Stop 9015
Grand Forks, North Dakota 58202-9015
Date | Topic | Lecture Notes/Lab Files | Reading
01/12/2016 | No Class - Travel | |
01/14/2016 | Course Introduction | |
01/19/2016 | Parallel Hardware - Part 1 | | Chapters 2.1-2.3 (Pacheco)
01/21/2016 | Lab | Parallel Hardware [pdf] |
01/26/2016 | Parallel Hardware - Part 1 | | Chapters 2.3, 2.6 (Pacheco)
01/28/2016 | Lab | |
02/02/2016 | MPI - Part 1 | MPI [pdf] |
02/04/2016 | Lab | |
02/09/2016 | MPI - Part 2 | mpi_reduce_example.cxx |
02/11/2016 | Lab | |
02/16/2016 | MPI - Part 3 | mpi_process_ring.cxx |
02/18/2016 | Lab | |
02/23/2016 | GPU Computing with CUDA - Part 1 (Basics) | CUDA 1 [pdf] |
02/25/2016 | Lab | |
03/01/2016 | GPU Computing with CUDA - Part 2 (Grids, Blocks, Warps, Threads) | CUDA 2 [pdf] |
03/03/2016 | Lab | |
03/08/2016 | GPU Computing with CUDA - Part 3 (Synchronization) | CUDA 3 [pdf] |
03/10/2016 | Lab | |
03/15/2016 | No Class - Spring Break | |
03/17/2016 | No Class - Spring Break | |
03/22/2016 | Project Proposal Presentations | |
03/24/2016 | Lab | |
03/29/2016 | Lab | |
03/31/2016 | Lab | |
04/05/2016 | Lab | |
04/07/2016 | Class will be held at the UND Big Data Summit in the River Valley Room in the Memorial Union. | |
04/12/2016 | Lab | |
04/14/2016 | Related Work Presentations: Travis Desell - Example; Marshall Mattingly - TBD | |
04/19/2016 | Related Work Presentations: AbdelRahman ElSaid - A recurrent neural network implementation using the graphics processing unit; Debesh Adhikari - A Parallel Algorithm for UAV Flight Route Planning on GPU | |
04/21/2016 | Related Work Presentations: Mohammed Mahmoud - TBD; Steven Buettner - TBD | |
04/26/2016 | Related Work Presentations: Fatima El Jamiy - Parallelization of genetic algorithms using Hadoop Map/Reduce; Run Li - Parallelization of MRCI based on hole-particle symmetry | |
04/28/2016 | No Class - Travel | |
05/03/2016 | Exam Review | |
05/05/2016 | Exam | |

Course Description

This course introduces the basics of cluster and supercomputing with MPI (the Message Passing Interface) and the basics of GPU computing with CUDA and OpenCL. It also reviews current research in high performance distributed computing and GPU computing.


At the completion of this course, students should be proficient in GPU computing and MPI programming, and knowledgeable about current research in high performance distributed computing and GPU computing. The main objectives are to:

  1. Introduce super/cluster computing with MPI.
  2. Introduce GPU computing with CUDA.
  3. Examine a wide range of current topics in high performance distributed computing.
  4. Participate in a research project to gain expertise in HPDC.


Students will gain knowledge/understanding of the following:

  1. MPI and super/cluster computing.
  2. GPU computing with CUDA.
  3. Cluster and GPU architectures.
  4. Current research in high performance distributed computing.
  5. The process of performing research in a computer science topic and writing an academic research paper.

Students will acquire the ability to do the following:

  1. Create and debug programs in MPI and CUDA.
  2. Run applications on clusters and GPUs.
  3. Write a research paper using LaTeX.
  4. Present research topics in front of a group.




The following texts are suggested for this course:

  • An Introduction to Parallel Programming, Peter S. Pacheco.
  • Programming Massively Parallel Processors, David B. Kirk and Wen-mei W. Hwu.


The course grade will consist of one group research project, one research paper presentation, four programming assignments and one test. The grade will be calculated as follows:

  • 40% - Group Research Project:
    • 10% - Initial Proposal and Related Work Survey Paper
    • 10% - Group In-Class Presentation
    • 20% - Final Paper
  • 40% - Programming Assignments:
    • 10% - MPI Programming Assignment 1
    • 10% - MPI Programming Assignment 2
    • 10% - CUDA Programming Assignment 1
    • 10% - CUDA Programming Assignment 2
  • 10% - In Class Research Paper Presentation
  • 10% - In Class Participation
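
As a rough illustration of how the weights above combine, the following sketch computes a course total from per-component scores. The component names and the assumption that every component is scored on a 0-100 scale are made up for this example and are not part of the syllabus.

```python
# Weights in percent, copied from the grading breakdown above.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "proposal_and_survey": 10,
    "group_presentation": 10,
    "final_paper": 20,
    "mpi_assignment_1": 10,
    "mpi_assignment_2": 10,
    "cuda_assignment_1": 10,
    "cuda_assignment_2": 10,
    "paper_presentation": 10,
    "participation": 10,
}


def weighted_total(scores):
    """Return the course total (0-100), assuming each score is on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Example: full marks everywhere except the final paper.
    scores = {name: 100 for name in WEIGHTS}
    scores["final_paper"] = 50
    print(weighted_total(scores))
```

Keeping the weights as whole percentages and dividing once at the end avoids accumulating floating-point error across components.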

There will be the following grade distribution:

  • [90 - 100]: A
  • [80 - 89.9999...]: B
  • [70 - 79.9999...]: C
  • [65 - 69.9999...]: D
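
The ranges above can be read as half-open intervals, for example B covers 80 up to but not including 90. A minimal sketch of that mapping follows. The function name is hypothetical, and because the list above does not name a grade for totals below 65, the sketch returns None in that case rather than guessing.

```python
def letter_grade(total):
    """Map a course total (0-100) to a letter using the ranges listed above.

    Returns None for totals below 65, since the listed distribution
    does not name a grade for that range.
    """
    if total >= 90:
        return "A"
    if total >= 80:
        return "B"
    if total >= 70:
        return "C"
    if total >= 65:
        return "D"
    return None


if __name__ == "__main__":
    for total in (95, 89.9999, 70, 65, 64.9):
        print(total, letter_grade(total))
```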

Academic Integrity

Developing individual problem-solving skills for computer programming is one of the major objectives of this course. Students are to work independently of each other when completing the programming assignments. Any exception to this rule will require documentation signed by me allowing the collaborative work.

If you need help, you are welcome to consult with your instructor, your teaching assistant, or the staff of the department's Instructional Help Desk in 109A Streibel Hall. Submitting source code that you did not develop, or homework that is not your own writing, will be treated as plagiarism. Such assignments will receive zero points, and you may be referred to the Associate Dean of Student Life as a case of Scholastic Dishonesty.


Class attendance and lab attendance are required. Any student who misses more than 6 classes without a doctor's excuse will fail the course. The classroom is the primary venue for course material, announcements, and other information relevant to the course. An online course management system may be used to make some information available to students, but this is intended to enhance, not replace, classroom interaction.

Homework and Lab Submission

Code for homework and labs must be commented and properly formatted, or points will be taken away. Several coding styles are acceptable; I prefer 1TBS. The final version of each homework must be submitted through Moodle. Each homework will list its grading criteria.

Late assignments will be penalized 15% if submitted one day late and 30% if submitted two days late. No assignments will be accepted more than two days late. Homework is to be done individually (see Academic Integrity) and may involve a significant amount of programming, so start early.
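
A small sketch of the late policy's arithmetic is given here for illustration. It assumes the penalty is taken as a percentage of the earned score and that lateness is counted in whole days. Neither detail is spelled out above, and the function name is hypothetical.

```python
# Penalty in percent for each number of whole days late, per the policy above.
LATE_PENALTY_PERCENT = {0: 0, 1: 15, 2: 30}


def late_adjusted_score(score, days_late):
    """Return the score after the late penalty.

    Assumption: the penalty is a percentage of the earned score.
    Submissions more than two days late are not accepted, so they score 0.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late not in LATE_PENALTY_PERCENT:
        return 0.0
    penalty = LATE_PENALTY_PERCENT[days_late]
    return score * (100 - penalty) / 100


if __name__ == "__main__":
    for days in range(4):
        print(days, late_adjusted_score(80, days))
```

Under these assumptions, an assignment that earned 80 points would be recorded as 68 if one day late and 56 if two days late.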

Lab Policies

All lab assignments must be completed by the end of the lab session. Any exception requires a valid excuse, with permission granted before the end of the lab session. Partial credit may be given for incomplete work.

Any issues related to the machines in the computer labs can be sent to cslabs@cs.und.edu.

Students with Disabilities

Upon request, the Computer Science Department will provide reasonable accommodations for students with disabilities as specified in the policies of the UND office of Disability Services for Students (DSS). You must contact your instructor to request and arrange accommodations.