layout | title |
---|---|
page | Running LAMMPS on HPC Systems - Lesson Outline |
The following list is a guide to what content belongs where in this repo, and to where you can contribute. If a bullet point is prefixed by a file name, that file is the lesson the listed content should go into. This document is a concept map converted into a flow of learning goals and questions.
-
index.md: Prelude: Why should I take this course?
- Why should I bother about software performance?
- What can I expect to learn from this course?
-
01-why-bother-with-performance.md: Brief notes on software performance
- What is software performance?
- Why is software performance important?
- How can performance be measured?
- What is meant by flops, walltime and CPU hours?
- What can affect performance?
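
As a minimal sketch of how walltime and CPU hours relate, the snippet below assumes the common convention that CPU hours are the number of cores used multiplied by the walltime in hours. The function name and the job size are hypothetical and only serve as an illustration; individual HPC sites may account for usage differently.

```python
def cpu_hours(cores: int, walltime_seconds: float) -> float:
    """CPU hours consumed by a job: cores multiplied by walltime in hours.

    Assumption: every allocated core is charged for the full walltime.
    """
    return cores * walltime_seconds / 3600.0


# Hypothetical job: 40 cores running for 30 minutes (1800 s) of walltime.
hours = cpu_hours(cores=40, walltime_seconds=1800)
print(f"{hours:.1f} CPU hours")  # prints "20.0 CPU hours"
```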
-
02-benchmark-and-scaling.md: How do I benchmark software performance in HPC?: about benchmarking and scaling
- What is benchmarking?
- What are the factors that can affect a benchmark?
- Case study 1: A simple benchmarking example of LAMMPS on an HPC system
- Hands-on 1: Can you do it on your own?
- What is scaling?
- How do I perform scaling analysis?
- Quantifying speedup: t1/tp
- Am I wasting my resources?
- Case Study 2: Get scaling data for a LAMMPS run
- Hands-on 2: Do a scaling analysis
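
The speedup t1/tp listed above can be sketched in a few lines. Here t1 is the walltime on one core and tp the walltime on p cores. The parallel efficiency (speedup divided by p) is an addition not named in the outline; it is one common way to approach the question of wasted resources. The walltimes below are made up for illustration and are not LAMMPS measurements.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup of a run taking tp seconds relative to a one-core run taking t1."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores p.

    A value near 1.0 means the extra cores are used well; a low value
    suggests resources are being wasted.
    """
    return speedup(t1, tp) / p


# Hypothetical walltimes in seconds, keyed by core count (illustration only).
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 20.0}

t1 = timings[1]
for p, tp in timings.items():
    print(f"p={p}: speedup={speedup(t1, tp):.2f}, "
          f"efficiency={efficiency(t1, tp, p):.2f}")
```

With these made-up numbers the speedup keeps growing while the efficiency drops as cores are added, which is the typical signal for deciding where to stop scaling up.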
-
03-acceleration.md: Can I accelerate performance?: brief discussion of various aspects of speeding up software performance
-
Hardware acceleration and software acceleration
- multi-core CPU
- GPU
-
Can I use specialised code to get the best out of the available hardware?
-
Multi-threading via OpenMP: parallel processing on shared-memory platforms
- Thread based parallelism
- Important run-time environment variables
- bottlenecks in OpenMP applications
- hyperthreading
- CPU affinity
-
Multi-threading via CUDA: host-device relationship
- bottlenecks in host-device architectures
-
What if I need more workers than are available in a single node?
- How can we achieve this using MPI?
- What is the bottleneck here?
- communication overhead
- domain decomposition
-
Is it possible to use optimized libraries/code to get acceleration?
- Brief mention of various optimized libraries like MKL, FFTW
-
04-lammps-bottlenecks.md: Identifying bottlenecks in LAMMPS: learn to analyze timing data in LAMMPS
- Case study 3: Understand the task timing breakdown of LAMMPS output
- Hands-on 3: Understand the task timing breakdown of LAMMPS output of a different problem
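
As a rough sketch of what analysing the task timing breakdown could look like, the snippet below parses a table in the layout that LAMMPS prints under the heading "MPI task timing breakdown" and reports which section dominates the run. The sample text is a made-up illustration, not output from a real run, and the helper name `parse_timing_breakdown` is hypothetical. The exact layout may differ between LAMMPS versions, so treat the parsing rules as assumptions.

```python
# Made-up sample in the layout of a LAMMPS timing table (illustration only).
SAMPLE_LOG = """\
MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 8.0        | 8.0        | 8.0        |   0.0 | 80.00
Neigh   | 1.0        | 1.0        | 1.0        |   0.0 | 10.00
Comm    | 0.5        | 0.5        | 0.5        |   0.0 |  5.00
Output  | 0.01       | 0.01       | 0.01       |   0.0 |  0.10
Modify  | 0.4        | 0.4        | 0.4        |   0.0 |  4.00
Other   |            | 0.09       |            |       |  0.90
"""


def parse_timing_breakdown(log_text: str) -> dict:
    """Return {section name: percent of total time} from a timing table."""
    percentages = {}
    in_table = False
    for line in log_text.splitlines():
        if line.startswith("MPI task timing breakdown"):
            in_table = True
            continue
        if not in_table:
            continue
        fields = [field.strip() for field in line.split("|")]
        # Data rows have six columns; skip the header and the dashed rule.
        if len(fields) != 6 or fields[0] == "Section":
            if percentages:  # a non-table line after data rows ends the table
                break
            continue
        percentages[fields[0]] = float(fields[5])
    return percentages


breakdown = parse_timing_breakdown(SAMPLE_LOG)
dominant = max(breakdown, key=breakdown.get)
print(f"Dominant section: {dominant} ({breakdown[dominant]:.2f}% of total)")
```

In this made-up sample the pair computation dominates, so acceleration effort would be best spent on the pair style rather than on communication.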
-
05-accelerating-lammps.md: How can I accelerate LAMMPS performance?: various options to accelerate LAMMPS
-
Knowing what hardware LAMMPS can be used on
-
How can I enable architecture support at runtime?
- Accelerator packages in LAMMPS
- Which packages for which architecture?
- OPT
- USER-OMP
- USER-INTEL
- GPU
- KOKKOS
-
Why KOKKOS?
- What is Kokkos?
- Important features of LAMMPS Kokkos package
- Fixes that support KOKKOS in LAMMPS
- Package options
-
06-invoking-kokkos.md: How do I invoke KOKKOS in LAMMPS?: technical aspects of using KOKKOS with LAMMPS
- Transition from regular LAMMPS call to accelerated call
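
To illustrate the transition from a regular call to an accelerated one, the sketch below builds both command lines side by side. It uses the LAMMPS command-line switches `-in`, `-k on t N` (enable Kokkos with N OpenMP threads) and `-sf kk` (apply the `kk` suffix to supported styles). The executable name `lmp`, the input file `in.lj`, and the helper `lammps_command` are assumptions for illustration; the actual executable name depends on how LAMMPS was built on your system.

```python
def lammps_command(input_file, executable="lmp", kokkos_threads=None):
    """Build a LAMMPS command line as a list of arguments.

    With kokkos_threads=None this is a regular call; otherwise the
    Kokkos switches are added in front of the input file.
    """
    cmd = [executable]
    if kokkos_threads is not None:
        # -k on t N : switch Kokkos on with N OpenMP threads per MPI task
        # -sf kk    : use the Kokkos ("kk") variants of styles where available
        cmd += ["-k", "on", "t", str(kokkos_threads), "-sf", "kk"]
    cmd += ["-in", input_file]
    return cmd


# "in.lj" is a hypothetical input file name.
regular = lammps_command("in.lj")
accelerated = lammps_command("in.lj", kokkos_threads=4)

print(" ".join(regular))      # prints "lmp -in in.lj"
print(" ".join(accelerated))  # prints "lmp -k on t 4 -sf kk -in in.lj"
```

The resulting lists could be passed to `subprocess.run` or pasted into a job script; only the extra switches differ, and the input file itself stays unchanged.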
-
07-kokkos-openmp.md: Compare KOKKOS/OpenMP performance with regular LAMMPS/OpenMP performance: learn to use OpenMP with KOKKOS
- Case study 4: using OpenMP+KOKKOS for Skylake AVX-512 architecture
- Comparing LAMMPS performance between runs with and without KOKKOS
- Exercise 4: Similar study with slightly different problem
-
08-kokkos-gpu.md: Compare KOKKOS/GPU performance with regular LAMMPS/GPU performance: learn to use GPU with KOKKOS
- Case study 5: using OpenMP+KOKKOS for NVIDIA Tesla V100 architecture
- Comparing LAMMPS performance between runs with and without KOKKOS
- Exercise 5: Similar study with slightly different problem
-
09-limitations.md: What are the limitations of different accelerator packages?: discuss the limitations of KOKKOS and other accelerator packages