SC21 Tutorial: Efficient Distributed GPU Programming for Exascale

Coordinates

  • Date: 14 November 2021
  • Occasion: SC21 Tutorial
  • Tutors: Simon Garcia (BSC), Andreas Herten (JSC), Markus Hrywniak (NVIDIA), Jiri Kraus (NVIDIA), Lena Oden (Uni Hagen)

Setup

This is an interactive tutorial combining introductory lectures with practical exercises to apply the presented knowledge. The exercises are derived from the Jacobi solver implementations available in NVIDIA/multi-gpu-programming-models.
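
To give a flavour of what the hands-on parts build towards, here is a minimal sketch of such a multi-GPU Jacobi solver: the 2D domain is decomposed by rows across MPI ranks with one GPU per rank, and the halo rows are exchanged after every update. The sketch assumes a CUDA-aware MPI (so device pointers can be passed to MPI calls directly) and 4 GPUs per node; it is an illustration only, not the tutorial's reference code.

    #include <mpi.h>
    #include <cuda_runtime.h>

    // 2D Jacobi update for rows [iy_start, iy_end) of a row-major, nx-wide grid.
    __global__ void jacobi_step(const double* in, double* out, int nx,
                                int iy_start, int iy_end) {
        int ix = blockIdx.x * blockDim.x + threadIdx.x + 1;
        int iy = blockIdx.y * blockDim.y + threadIdx.y + iy_start;
        if (ix < nx - 1 && iy < iy_end)
            out[iy * nx + ix] = 0.25 * (in[iy * nx + ix - 1] + in[iy * nx + ix + 1]
                                      + in[(iy - 1) * nx + ix] + in[(iy + 1) * nx + ix]);
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        cudaSetDevice(rank % 4);                        // assumption: 4 GPUs per node

        const int nx = 1024;
        const int ny_local = 1024 / size + 2;           // local interior rows + 2 halo rows
        double *a, *a_new;
        cudaMalloc(&a, nx * ny_local * sizeof(double));
        cudaMalloc(&a_new, nx * ny_local * sizeof(double));
        cudaMemset(a, 0, nx * ny_local * sizeof(double));
        cudaMemset(a_new, 0, nx * ny_local * sizeof(double));

        const int top    = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        const int bottom = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        dim3 block(32, 8);
        dim3 grid((nx + block.x - 1) / block.x, (ny_local + block.y - 1) / block.y);

        for (int iter = 0; iter < 1000; ++iter) {
            // Update all interior rows, then exchange the halo rows with the neighbours.
            jacobi_step<<<grid, block>>>(a, a_new, nx, 1, ny_local - 1);
            cudaDeviceSynchronize();
            // Device pointers are passed directly to MPI (requires CUDA-aware MPI).
            MPI_Sendrecv(a_new + nx,                  nx, MPI_DOUBLE, top,    0,
                         a_new + (ny_local - 1) * nx, nx, MPI_DOUBLE, bottom, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(a_new + (ny_local - 2) * nx, nx, MPI_DOUBLE, bottom, 0,
                         a_new,                       nx, MPI_DOUBLE, top,    0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            double* tmp = a; a = a_new; a_new = tmp;    // swap input/output buffers
        }

        cudaFree(a); cudaFree(a_new);
        MPI_Finalize();
        return 0;
    }

Such a program would typically be compiled with nvcc, using the MPI compiler wrapper as the host compiler; the exercise material provides ready-made Makefiles for the course system.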

Curriculum:

  1. Lecture: Tutorial Overview, Introduction to System + Onboarding (Andreas)
  2. Lecture: MPI-Distributed Computing with GPUs (Lena)
  3. Hands-on: Multi-GPU Parallelization
  4. Lecture: Performance / Debugging Tools (Markus)
  5. Lecture: Optimization Techniques for Multi-GPU Applications (Jiri)
  6. Hands-on: Overlap Communication and Computation with MPI (a brief sketch of the pattern follows after this list)
  7. Lecture: Overview of NCCL and NVSHMEM in MPI (Lena)
  8. Hands-on: Using NCCL and NVSHMEM
  9. Lecture: Device-initiated Communication with NVSHMEM (Jiri)
  10. Hands-on: Using Device-Initiated Communication with NVSHMEM
  11. Lecture: Conclusion and Outline of Advanced Topics (Andreas)
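
The overlap exercise (hands-on 6) revolves around the pattern sketched below: the rows adjacent to the halos are updated first on a dedicated stream, the halo exchange starts as soon as those rows are done, and the bulk of the domain is computed concurrently on another stream. The snippet reuses the kernel, buffers, block size, and neighbour ranks from the sketch above and is, again, only an approximation of the exercise, not its reference solution.

    cudaStream_t compute_stream, boundary_stream;
    cudaStreamCreate(&compute_stream);
    cudaStreamCreate(&boundary_stream);
    dim3 grid_edge((nx + block.x - 1) / block.x, 1);
    dim3 grid_interior((nx + block.x - 1) / block.x,
                       (ny_local - 4 + block.y - 1) / block.y);

    for (int iter = 0; iter < 1000; ++iter) {
        // Update the two rows next to the halos first, on their own stream ...
        jacobi_step<<<grid_edge, block, 0, boundary_stream>>>(a, a_new, nx, 1, 2);
        jacobi_step<<<grid_edge, block, 0, boundary_stream>>>(a, a_new, nx,
                                                              ny_local - 2, ny_local - 1);
        // ... while the bulk of the domain runs concurrently on the compute stream.
        jacobi_step<<<grid_interior, block, 0, compute_stream>>>(a, a_new, nx,
                                                                 2, ny_local - 2);

        // Only the boundary rows must be ready before the halo exchange starts,
        // so the MPI communication overlaps with the interior kernel.
        cudaStreamSynchronize(boundary_stream);
        MPI_Sendrecv(a_new + nx,                  nx, MPI_DOUBLE, top,    0,
                     a_new + (ny_local - 1) * nx, nx, MPI_DOUBLE, bottom, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(a_new + (ny_local - 2) * nx, nx, MPI_DOUBLE, bottom, 0,
                     a_new,                       nx, MPI_DOUBLE, top,    0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaStreamSynchronize(compute_stream);
        double* tmp = a; a = a_new; a_new = tmp;    // swap input/output buffers
    }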

Onboarding

The supercomputer used for the exercises is JUWELS Booster, a system located at Jülich Supercomputing Centre (Germany) with about 3700 NVIDIA A100 GPUs.

Visual onboarding instructions can be found in the subfolder of the corresponding lecture, 01b-H-Onboarding/. The textual description follows:

  • Register for an account at JuDoor
  • Sign up for the training2125 project
  • Accept the Usage Agreement of JUWELS
  • Wait for your account information to propagate through the systems (about 15 minutes)
  • Access JUWELS Booster via JSC's Jupyter portal
  • Create a Jupyter v2 instance using LoginNodeBooster and the training2125 allocation on JUWELS
  • When started, launch a browser-based Shell in Jupyter
  • Source the course environment to make the course commands and helper scripts available:
    source $PROJECT_training2125/env.sh
    
  • Sync course material to your home directory with jsc-material-sync.

You can also access JSC's facilities via SSH. In that case, you need to add your SSH key through JuDoor and restrict access to certain IPs/IP ranges via the from clause, as explained in the documentation. We recommend using Jupyter JSC for its simplicity, especially on a day as short as the tutorial day.
