
A benchmark comparison of different BLAS backends for NumPy.


kostrykin/blas-benchmark


blas-benchmark

Tasks are defined in tasks/*.py. The Conda environments, which specify the different BLAS configurations, are defined in results/*/environment.yml. Python versions and numbers of threads are defined in profiles.yml.

To determine the runtime of a task, the task is run repeatedly for at least 10 seconds and the average runtime per run is computed. This repeat-and-average procedure is performed 3 times, and the best of the 3 averages is used.
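
The timing procedure described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function name `measure_runtime`, its parameters, and the use of `time.perf_counter` are assumptions made for this example, and "best" is taken to mean the lowest average.

```python
import time


def measure_runtime(task, min_duration=10.0, rounds=3):
    """Return the best (lowest) average runtime of `task`, in seconds.

    Hypothetical helper: `task` is any zero-argument callable.
    """
    averages = []
    for _ in range(rounds):
        count = 0
        elapsed = 0.0
        start = time.perf_counter()
        # Repeat the task until at least `min_duration` seconds have passed.
        while elapsed < min_duration:
            task()
            count += 1
            elapsed = time.perf_counter() - start
        # Average runtime of a single run within this round.
        averages.append(elapsed / count)
    # Keep the best of the per-round averages.
    return min(averages)


# Usage with a toy task and a short duration so the example finishes quickly.
toy_runtime = measure_runtime(lambda: sum(range(1000)), min_duration=0.05)
print(f"best average runtime: {toy_runtime:.3e} s")
```

Taking the best of several averages, rather than their mean, reduces the influence of transient disturbances such as other processes competing for the CPU.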

Main results:

The configurations mkl2020.0_debug and mkl2020.1_fakeintel perform best overall (scores per CPU and thread count):

                      AMD Ryzen Threadripper 3970X    AMD EPYC 7763
                      2 threads     16 threads        2 threads
openblas              1.003829      1.019895          1.016262
mkl2024.0             1.128423      1.213864          1.065984
mkl2020.0_debug       1.156737      1.261223          1.162273
mkl2020.1_fakeintel   1.144065      1.281782          1.164156

The score of a configuration is the geometric mean of the best possible speed-up in comparison to the other configurations. See reports/*.ipynb for details.
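
As an illustration of how such a score could be computed, the sketch below follows one possible reading of the definition above: for each task, take the largest ratio of another configuration's runtime to this configuration's runtime, then take the geometric mean of these ratios over all tasks. The function name `score`, the data layout, and the runtimes are made up for this example; the authoritative computation is the one in reports/*.ipynb and may differ.

```python
import math


def score(runtimes, config):
    """Geometric mean over tasks of the best speed-up of `config`
    relative to any other configuration (assumed interpretation).

    `runtimes` maps configuration name -> {task name: runtime in seconds}.
    """
    others = [name for name in runtimes if name != config]
    log_sum = 0.0
    tasks = runtimes[config]
    for task, own_runtime in tasks.items():
        # Best speed-up on this task compared to any other configuration.
        best_speedup = max(runtimes[o][task] / own_runtime for o in others)
        log_sum += math.log(best_speedup)
    # Geometric mean, computed via the mean of the logarithms.
    return math.exp(log_sum / len(tasks))


# Made-up runtimes (seconds) for two hypothetical configurations and tasks.
example_runtimes = {
    "config_a": {"task_1": 1.0, "task_2": 2.0},
    "config_b": {"task_1": 4.0, "task_2": 2.0},
}
for name in example_runtimes:
    print(name, score(example_runtimes, name))
```

With these made-up numbers, config_a is 4 times faster on task_1 and equally fast on task_2, so its score is about 2.0, while config_b scores about 0.5.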

Benchmark CLI:

Run the benchmark on your CPU:

python -m benchmark.cli --profiles py38_2threads py38_16threads --run

Or only update the reports:

python -m benchmark.cli

Acknowledgements:
