Commit
[OpenBLAS] update multithreading cutoff (#7189)
* update multithreading cutoff

400 is a much better cutoff than 50 for most modern machines. Note that 100 is way too small even for modern 4-core machines (I think the 50 limit was found pre-AVX2, and possibly pre-FMA). 400 is probably a bit larger than optimal on small machines, but it only gives up ~13% performance on a single core compared to 8 cores (and on laptops it will probably do better, because a single core can turbo higher). It also mitigates the horrible performance cliff of using 16 or more threads on medium-sized matrices (between roughly 400 and 1600).

Of course, the better answer would be to integrate BLAS's threading with Julia's (and to use an appropriate number of threads based on the matrix size), but for now this is a pretty noticeable improvement.

```
julia> BLAS.set_num_threads(32)

julia> peakflops(400)
1.1644410982935661e10

julia> BLAS.set_num_threads(16)

julia> peakflops(400)
1.5580026746524042e10

julia> BLAS.set_num_threads(8)

julia> peakflops(400)
2.210268354206555e10

julia> BLAS.set_num_threads(4)

julia> peakflops(400)
1.937951340161483e10

julia> BLAS.set_num_threads(1)

julia> peakflops(400)
1.740427478902416e10

julia> BLAS.set_num_threads(32)

julia> peakflops(100)
1.9949726688744364e9

julia> BLAS.set_num_threads(16)

julia> peakflops(100)
2.9579541605843735e9

julia> BLAS.set_num_threads(8)

julia> peakflops(100)
4.373630506947512e9

julia> BLAS.set_num_threads(4)

julia> peakflops(100)
3.924300248211991e9

julia> BLAS.set_num_threads(1)

julia> peakflops(100)
1.0693014253788e10
```

* rebuild

* Update build_tarballs.jl
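For anyone who wants to repeat measurements like the ones in the message on their own machine, here is a minimal sketch. It is not part of the commit. `sweep_threads` is a hypothetical helper name, and it assumes Julia 1.6 or later so that `BLAS.get_num_threads` is available. Results will vary with hardware and the OpenBLAS build.

```julia
using LinearAlgebra

# Hypothetical helper (not from the commit): run peakflops for one matrix
# size under several BLAS thread counts, then restore the original setting.
function sweep_threads(n::Integer, thread_counts)
    original = BLAS.get_num_threads()   # assumes Julia >= 1.6
    results = Dict{Int,Float64}()
    try
        for t in thread_counts
            BLAS.set_num_threads(t)
            results[t] = peakflops(n)   # flop rate reported by LinearAlgebra.peakflops
        end
    finally
        BLAS.set_num_threads(original)  # restore even if a measurement throws
    end
    return results
end

results = sweep_threads(400, (1, 4, 8))
for t in sort(collect(keys(results)))
    println(t, " threads: ", results[t])
end
```

Calling `sweep_threads(100, (1, 4, 8))` as well gives the small-matrix comparison shown in the message.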