- How to make your benchmark reproducible?
- Everything has to be documented
- Everything has to be controlled
- Reduce influences
- CPU: type, name, model, frequencies, CoD/SNC mode, ...
- Memory
- Vendors
- IO subsystem
- pinning
- OS
- relevant OS settings (numa_balancing)
- Environment Variables
- Compiler
- Version with all options specified
- Libraries
- Version / download source (original, patched)
- BIOS settings
- Tools: likwid-topology, likwid-powermeter, likwid-setFrequencies
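Where possible, record settings programmatically together with the results. A minimal sketch, assuming Linux, where automatic NUMA balancing is exposed at /proc/sys/kernel/numa_balancing:

```c
#include <stdio.h>

/* Log the kernel's automatic NUMA balancing state alongside the results.
 * Assumes Linux: the switch lives at /proc/sys/kernel/numa_balancing
 * (0 = off, 1 = on). */
int main(void) {
    FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
    if (!f) {
        perror("numa_balancing not available");
        return 1;
    }
    int state = -1;
    if (fscanf(f, "%d", &state) == 1)
        printf("kernel.numa_balancing = %d\n", state);
    fclose(f);
    return 0;
}
```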
- Reliable timer/timing granularity
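A reliable choice on POSIX systems is clock_gettime() with CLOCK_MONOTONIC; clock_getres() reports the timer granularity. A minimal sketch:

```c
#include <stdio.h>
#include <time.h>

/* Use a monotonic clock for timing and report its resolution, so timer
 * granularity problems are visible up front. POSIX; older glibc may
 * require linking with -lrt. */
int main(void) {
    struct timespec res, t0, t1;
    clock_getres(CLOCK_MONOTONIC, &res);
    printf("timer resolution: %ld ns\n", res.tv_nsec);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... code under test ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double walltime = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("walltime: %.9f s\n", walltime);
    return 0;
}
```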
- if possible: Establish a (basic) performance model (roofline, ECM, ...)
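For the roofline variant, attainable performance is bounded by the peak performance and by computational intensity times memory bandwidth:

$$ P = \min\left(P_\text{peak},\; I \cdot b_S\right) $$

where \(I\) is the computational intensity in flops/byte and \(b_S\) the attainable main memory bandwidth. As an illustrative calculation (made-up numbers, not from any specific machine): with \(I = 1/12\) flop/byte and \(b_S = 50\) GB/s, the memory-bound limit is about 4.2 Gflop/s.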
- get some reference numbers to decide if your results are reasonable
- micro benchmarks: likwid-bench
- documentation of the hardware vendor
- simple performance metric: time to solution, 1/walltime
...
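For a fixed amount of work \(W\), the two metrics are equivalent:

$$ P = \frac{W}{T_\text{wall}} \;\propto\; \frac{1}{T_\text{wall}} $$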
- no/wrong task placement (-> pinning)
- eliminate performance variation
- make use of architectural features
- avoid resource contention
- Tools: likwid-pin, numactl, sched.h, taskset, OpenMP/MPI-specific settings (pinning sketch below)
- Did you set the correct thread count?
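A minimal pinning sketch using sched.h (Linux-specific; core 0 is an arbitrary example choice, and likwid-pin or taskset achieve the same from the command line):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling process to a single core via the sched.h interface.
 * Core 0 is an arbitrary example; pick the core that matches the
 * intended placement. */
int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);               /* run only on core 0 */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("running on core %d\n", sched_getcpu());
    /* ... benchmark ... */
    return 0;
}
```

For threaded or MPI codes, prefer the runtime's own mechanisms (e.g. OMP_PLACES/OMP_PROC_BIND for OpenMP) so that all threads get placed, not just the initial process.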
- Too short runtime
- depends on the working set size
- should be at least a second
- timer granularity problems
- too few repetitions
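A common fix for both problems is to repeat the kernel until the total walltime passes a threshold and report the time per call. A minimal sketch (kernel() is a placeholder for the code under test):

```c
#include <stdio.h>
#include <time.h>

/* Repeat the kernel until the total runtime exceeds ~1 s, so the
 * measurement is dominated neither by timer granularity nor by too
 * few repetitions. */
static volatile double sink = 0.0;          /* keeps the kernel from being
                                               optimized away */
static void kernel(void) { sink += 1.0; }   /* stand-in for the real code */

int main(void) {
    struct timespec t0, t1;
    long reps = 0;
    double walltime = 0.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    do {
        kernel();
        reps++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        walltime = (t1.tv_sec - t0.tv_sec)
                 + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    } while (walltime < 1.0);

    printf("%ld repetitions, %.3f s total, %.3e s per call\n",
           reps, walltime, walltime / reps);
    return 0;
}
```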