layout: page
title: Running LAMMPS on HPC Systems - Lesson Outline

How to use this outline

The following list is a guide to what content belongs where in this repo, and to where you can contribute. If an entry is prefixed by a file name, that file is the lesson where the listed content should go. This document is a concept map converted into a flow of learning goals and questions.

Accelerating LAMMPS on an HPC system

index.md: Prelude: Why should I take this course?
- Why should I bother about software performance?
- What can I expect to learn from this course?
01-why-bother-with-performance.md: Brief notes on software performance
- What is software performance?
- Why is software performance important?
- How can performance be measured?
- What is meant by FLOPS, walltime and CPU hours?
- What can affect performance?
02-benchmark-and-scaling.md: How do I benchmark software performance on HPC systems?: about benchmarking and scaling
- What is benchmarking?
- What are the factors that can affect a benchmark?
  - Case study 1: A simple benchmarking example of LAMMPS on an HPC system
  - Hands-on 1: Can you do it on your own?
- What is scaling?
- How do I perform scaling analysis?
- Quantifying speedup: t₁/t_p (run time on one core divided by run time on p cores)
- Am I wasting my resources?
  - Case Study 2: Get scaling data for a LAMMPS run
  - Hands-on 2: Do a scaling analysis
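The speedup ratio t₁/t_p listed above, together with the parallel efficiency that answers "Am I wasting my resources?", could be computed along the following lines. This is a minimal sketch, not part of the lesson material: the function names and the walltime values are hypothetical and chosen only for illustration.

```python
def speedup(t1, tp):
    """Speedup on p cores: serial walltime t1 divided by parallel walltime tp."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Parallel efficiency: speedup divided by the number of cores p (1.0 is ideal)."""
    return speedup(t1, tp) / p


def scaling_table(walltimes):
    """Turn {cores: walltime} into a list of (cores, speedup, efficiency) rows.

    The dictionary must contain an entry for 1 core, used as the baseline t1.
    """
    t1 = walltimes[1]
    return [
        (p, speedup(t1, walltimes[p]), efficiency(t1, walltimes[p], p))
        for p in sorted(walltimes)
    ]


# Hypothetical walltimes in seconds, keyed by core count (illustrative values only).
walltimes = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}

for p, s, e in scaling_table(walltimes):
    print(f"{p:>3d} cores  speedup {s:5.2f}  efficiency {e:6.1%}")
```

An efficiency that falls well below 1.0 as cores are added suggests that the extra cores are not being used productively for that problem size.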
03-acceleration.md: Can I accelerate performance?: brief discussion of various aspects of speeding up software
- Hardware acceleration and software acceleration
  - multi-core CPU
  - GPU
- Can I use specialised code to get the best out of the available hardware?
  - Multi-threading via OpenMP: parallel processing on a shared-memory platform
    - Thread-based parallelism
    - Important run-time environment variables
    - bottlenecks in OpenMP applications
      - hyperthreading
      - CPU affinity
  - Multi-threading via CUDA: host-device relationship
    - bottlenecks in host-device architectures
- What if I need more workers than are available in a single node?
  - How can we achieve this using MPI?
  - What is the bottleneck here?
    - communication overhead
    - domain decomposition
- Is it possible to use optimized libraries or code to get acceleration?
  - Brief mention of optimized libraries such as MKL and FFTW
04-lammps-bottlenecks.md: Identifying bottlenecks in LAMMPS: learn to analyze timing data in LAMMPS
- Case study 3: Understand the task timing breakdown in LAMMPS output
- Hands-on 3: Understand the task timing breakdown in the LAMMPS output for a different problem
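As a possible aid for the timing analysis above, the sketch below extracts the `%total` column from the "MPI task timing breakdown" table that LAMMPS prints at the end of a run. It assumes the log follows the layout of the embedded sample; the numbers in the sample are illustrative only, and `parse_timing_breakdown` is a hypothetical helper, not part of LAMMPS.

```python
# Illustrative excerpt in the layout of a LAMMPS log; the values are made up.
SAMPLE_LOG = """\
MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 1.5244     | 1.5244     | 1.5244     |   0.0 | 86.34
Neigh   | 0.19543    | 0.19543    | 0.19543    |   0.0 | 11.07
Comm    | 0.016556   | 0.016556   | 0.016556   |   0.0 |  0.94
Output  | 7.2241e-05 | 7.2241e-05 | 7.2241e-05 |   0.0 |  0.00
Modify  | 0.023852   | 0.023852   | 0.023852   |   0.0 |  1.35
Other   |            | 0.005199   |            |       |  0.29

Nlocal:    4000 ave 4000 max 4000 min
"""


def parse_timing_breakdown(log_text):
    """Return {section: percent_of_total} for the first timing table in log_text."""
    percentages = {}
    in_table = False
    for line in log_text.splitlines():
        if line.startswith("MPI task timing breakdown"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # a blank line marks the end of the table
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 6 or fields[0] == "Section":
            continue  # skip the column header and the dashed rule
        percentages[fields[0]] = float(fields[5])
    return percentages


breakdown = parse_timing_breakdown(SAMPLE_LOG)
# List sections from most to least expensive to spot the likely bottleneck.
for section, pct in sorted(breakdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{section:<8s}{pct:6.2f}%")
```

In this made-up sample the `Pair` section dominates, which would point to the pair-style computation as the first place to look for acceleration.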
05-accelerating-lammps.md: How can I accelerate LAMMPS performance?: various options to accelerate LAMMPS
- Knowing what hardware LAMMPS can run on
- How can I enable architecture support at runtime?
  - Accelerator packages in LAMMPS
    - Which packages for which architecture?
      - OPT
      - USER-OMP
      - USER-INTEL
      - GPU
      - KOKKOS
- Why KOKKOS?
  - What is Kokkos?
  - Important features of LAMMPS Kokkos package
  - Fixes that support KOKKOS in LAMMPS
  - Package options
06-invoking-kokkos.md: How do I invoke KOKKOS in LAMMPS?: technical aspects of using KOKKOS with LAMMPS
- Transition from regular LAMMPS call to accelerated call
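To illustrate the transition from a regular call to an accelerated one, the sketch below builds both command lines side by side. It assumes a LAMMPS executable named `lmp` launched through `mpirun` and an input file `in.lj`; all three names are placeholders, and `lammps_command` is a hypothetical helper. The `-k on t N` and `-sf kk` switches are the LAMMPS command-line options for enabling Kokkos with N threads and for applying the `kk` suffix to styles.

```python
def lammps_command(input_file, mpi_ranks, executable="lmp", kokkos_threads=None):
    """Build a LAMMPS launch command as a list of arguments.

    With kokkos_threads=None the regular call is returned; otherwise the
    Kokkos switches are added for an OpenMP-threaded run.
    """
    cmd = ["mpirun", "-np", str(mpi_ranks), executable]
    if kokkos_threads is not None:
        # -k on t N : switch Kokkos on with N threads per MPI rank
        # -sf kk    : use the Kokkos ("kk") variant of each style where one exists
        cmd += ["-k", "on", "t", str(kokkos_threads), "-sf", "kk"]
    cmd += ["-in", input_file]
    return cmd


# Regular call versus accelerated call for the same (placeholder) input file.
print(" ".join(lammps_command("in.lj", 4)))
print(" ".join(lammps_command("in.lj", 4, kokkos_threads=2)))
```

The list form can be passed directly to `subprocess.run` on a system where the executable and MPI launcher are actually available under those names.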
07-kokkos-openmp.md: Compare KOKKOS/OpenMP performance with regular LAMMPS/OpenMP performance: learn to use OpenMP with KOKKOS
- Case study 4: using OpenMP+KOKKOS for Skylake AVX-512 architecture
- Comparing LAMMPS performance between runs with and without KOKKOS
- Exercise 4: Similar study with slightly different problem
08-kokkos-gpu.md: Compare KOKKOS/GPU performance with regular LAMMPS/GPU performance: learn to use GPUs with KOKKOS
- Case study 5: using OpenMP+KOKKOS for NVIDIA Tesla V100 architecture
- Comparing LAMMPS performance between runs with and without KOKKOS
- Exercise 5: Similar study with slightly different problem
09-limitations.md: What are the limitations of different accelerator packages?: discuss the limitations of KOKKOS and other accelerator packages
