- How to make your benchmark reproducible?
- Everything has to be documented
- Everything has to be controlled
- Reduce influences
- CPU: type, name, model, frequencies, CoD/SNC mode, ...
- Memory
- Vendors
- IO subsystem
- pinning
- OS
- relevant OS settings (numa_balancing)
- Environment Variables
- Compiler
- Version with all options specified
- Libraries
- Version / download source (original, patched)
- BIOS settings
- Tools: likwid-topology, likwid-powermeter, likwid-setFrequencies
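Where possible, record settings programmatically together with the results. A minimal sketch, assuming Linux, where automatic NUMA balancing is exposed at /proc/sys/kernel/numa_balancing:

```c
#include <stdio.h>

/* Log the kernel's automatic NUMA balancing state alongside the results.
 * Assumes Linux: the switch lives at /proc/sys/kernel/numa_balancing
 * (0 = off, 1 = on). */
int main(void) {
    FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
    if (!f) {
        perror("numa_balancing not available");
        return 1;
    }
    int state = -1;
    if (fscanf(f, "%d", &state) == 1)
        printf("kernel.numa_balancing = %d\n", state);
    fclose(f);
    return 0;
}
```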
- Reliable timer/timing granularity
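A reliable choice on POSIX systems is clock_gettime() with CLOCK_MONOTONIC; clock_getres() reports the timer granularity. A minimal sketch:

```c
#include <stdio.h>
#include <time.h>

/* Use a monotonic clock for timing and report its resolution, so timer
 * granularity problems are visible up front. POSIX; older glibc may
 * require linking with -lrt. */
int main(void) {
    struct timespec res, t0, t1;
    clock_getres(CLOCK_MONOTONIC, &res);
    printf("timer resolution: %ld ns\n", res.tv_nsec);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... code under test ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double walltime = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("walltime: %.9f s\n", walltime);
    return 0;
}
```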
- if possible: Establish a (basic) performance model (roofline, ECM, ...)
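For the roofline variant, attainable performance is bounded by the peak performance and by computational intensity times memory bandwidth:

$$ P = \min\left(P_\text{peak},\; I \cdot b_S\right) $$

where \(I\) is the computational intensity in flops/byte and \(b_S\) the attainable main memory bandwidth. As an illustrative calculation (made-up numbers, not from any specific machine): with \(I = 1/12\) flop/byte and \(b_S = 50\) GB/s, the memory-bound limit is about 4.2 Gflop/s.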
- get some reference numbers to decide if your results are reasonable
- micro benchmarks: likwid-bench
- documentation of the hardware vendor
- simple performance metric: time to solution, 1/walltime
...
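For a fixed amount of work \(W\), the two metrics are equivalent:

$$ P = \frac{W}{T_\text{wall}} \;\propto\; \frac{1}{T_\text{wall}} $$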
- no/wrong task placement (-> pinning)
- eliminate performance variation
- make use of architectural features
- avoid resource contention
- Tools: likwid-pin, numactl, sched.h, taskset, OpenMP/MPI-specific settings (pinning sketch below)
- Did you set the correct thread count?
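A minimal pinning sketch using sched.h (Linux-specific; core 0 is an arbitrary example choice, and likwid-pin or taskset achieve the same from the command line):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling process to a single core via the sched.h interface.
 * Core 0 is an arbitrary example; pick the core that matches the
 * intended placement. */
int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);               /* run only on core 0 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("running on core %d\n", sched_getcpu());
    /* ... benchmark ... */
    return 0;
}
```

For threaded or MPI codes, prefer the runtime's own mechanisms (e.g. OMP_PLACES/OMP_PROC_BIND for OpenMP) so that all threads get placed, not just the initial process.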
- Too short runtime
- depends on the working set size
- should be at least a second
- timer granularity problems
- too few repetitions
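A common fix for both problems is to repeat the kernel until the total walltime passes a threshold and report the time per call. A minimal sketch (kernel() is a placeholder for the code under test):

```c
#include <stdio.h>
#include <time.h>

/* Repeat the kernel until the total runtime exceeds ~1 s, so the
 * measurement is dominated neither by timer granularity nor by too
 * few repetitions. */
static volatile double sink = 0.0;          /* keeps the kernel from being
                                               optimized away */
static void kernel(void) { sink += 1.0; }   /* stand-in for the real code */

int main(void) {
    struct timespec t0, t1;
    long reps = 0;
    double walltime = 0.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    do {
        kernel();
        reps++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        walltime = (t1.tv_sec - t0.tv_sec)
                 + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    } while (walltime < 1.0);

    printf("%ld repetitions, %.3f s total, %.3e s per call\n",
           reps, walltime, walltime / reps);
    return 0;
}
```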