[OpenBLAS] update multithreading cutoff (#7189)
* update multithreading cutoff

400 is a much better cutoff than 50 for most modern machines. Note that 100 is far too small even for modern 4-core machines (I think the 50 limit was found pre-AVX2, and possibly pre-FMA). 400 is probably a bit larger than optimal on small machines, but it only gives up ~13% performance single-core compared to 8-core (and on laptops it will probably be better because a single core can turbo higher). It also mitigates the horrible performance cliff of using 16 or more threads on medium-sized matrices (between roughly 400 and 1600). Of course, the better answer would be to integrate BLAS's threading with Julia's (and use an appropriate number of threads based on the matrix size), but for now this is a pretty noticeable improvement.
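The measurements that follow were taken by hand at the REPL. As a rough sketch, the same sweep could be scripted as below. The loop structure, the `results` dictionary, and the extra warm-up call are illustrative assumptions, not part of the original commit; only `BLAS.set_num_threads` and `peakflops` come from the session shown here.

```julia
using LinearAlgebra  # provides BLAS.set_num_threads and peakflops

# Thread counts and matrix sizes are the ones quoted in this commit message.
thread_counts = (32, 16, 8, 4, 1)
sizes = (400, 100)

# results[(n, t)] holds the measured rate for matrix size n with t BLAS threads.
# (Hypothetical container, introduced only for this sketch.)
results = Dict{Tuple{Int,Int},Float64}()

for n in sizes, t in thread_counts
    BLAS.set_num_threads(t)
    peakflops(n)                     # discard one run as a warm-up
    results[(n, t)] = peakflops(n)
    println("n = $n, threads = $t: ",
            round(results[(n, t)] / 1e9; digits=2), " GFLOPS")
end
```

Absolute numbers will differ from machine to machine; the point of the sweep is to see at which matrix size extra threads stop hurting.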
```
julia> BLAS.set_num_threads(32)

julia> peakflops(400)
1.1644410982935661e10

julia> BLAS.set_num_threads(16)

julia> peakflops(400)
1.5580026746524042e10

julia> BLAS.set_num_threads(8)

julia> peakflops(400)
2.210268354206555e10

julia> BLAS.set_num_threads(4)

julia> peakflops(400)
1.937951340161483e10

julia> BLAS.set_num_threads(1)

julia> peakflops(400)
1.740427478902416e10

julia> BLAS.set_num_threads(32)

julia> peakflops(100)
1.9949726688744364e9

julia> BLAS.set_num_threads(16)

julia> peakflops(100)
2.9579541605843735e9

julia> BLAS.set_num_threads(8)

julia> peakflops(100)
4.373630506947512e9

julia> BLAS.set_num_threads(4)

julia> peakflops(100)
3.924300248211991e9

julia> BLAS.set_num_threads(1)

julia> peakflops(100)
1.0693014253788e10
```

* rebuild

* Update build_tarballs.jl
oscardssmith authored Aug 8, 2023
1 parent 470613b commit b02a6e7
Showing 3 changed files with 3 additions and 2 deletions.
1 change: 1 addition & 0 deletions O/OpenBLAS/OpenBLAS32@0.3.23/build_tarballs.jl
@@ -15,3 +15,4 @@ dependencies = openblas_dependencies(platforms)
# Build the tarballs
build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies;
preferred_gcc_version=v"6", lock_microarchitecture=false, julia_compat="1.10")

2 changes: 1 addition & 1 deletion O/OpenBLAS/OpenBLAS@0.3.23/build_tarballs.jl
@@ -17,4 +17,4 @@ dependencies = openblas_dependencies(platforms)
build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies;
preferred_gcc_version=v"6", lock_microarchitecture=false, julia_compat="1.10")

-# Build trigger: 2
+# Build trigger: 3
2 changes: 1 addition & 1 deletion O/OpenBLAS/common.jl
@@ -92,7 +92,7 @@ function openblas_script(;num_64bit_threads::Integer=32, openblas32::Bool=false,
fi
# We always want threading
-flags=(USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=50 NO_AFFINITY=1)
+flags=(USE_THREAD=1 GEMM_MULTITHREADING_THRESHOLD=400 NO_AFFINITY=1)
if [[ "${CONSISTENT_FPCSR}" == "true" ]]; then
flags+=(CONSISTENT_FPCSR=1)
fi