Simplify the `_contract_scalar!` logic by using LoopVectorization.jl instead of BLAS calls. I believe that, in general, LoopVectorization.jl is as fast as MKL for BLAS level-1 and level-2 operations (and even level-3 for smaller matrices), and it can be used very simply through broadcast syntax like `@avx Rᵈ .= α .* Tᵈ .+ Rᵈ` instead of the ugly `BLAS.axpy!(α, Tᵈ, Rᵈ)`.
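As a concrete sketch of what that change looks like (the array names and sizes below are just illustrative stand-ins for the dense tensor storage, and in later LoopVectorization versions the macro is named `@turbo`):

```julia
using LinearAlgebra: BLAS
using LoopVectorization

# Illustrative data standing in for the dense storage of R and T.
α  = 2.0
Tᵈ = rand(1_000)
Rᵈ = rand(1_000)

# Current style: BLAS level-1 call computing Rᵈ ← α*Tᵈ + Rᵈ.
BLAS.axpy!(α, Tᵈ, Rᵈ)

# Proposed style: the same update written as an @avx broadcast.
@avx Rᵈ .= α .* Tᵈ .+ Rᵈ
```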
This should simplify that logic a lot and allow us to merge `_contract_scalar!` with `permutedims!!`, which could also use LoopVectorization.jl for the trivial-permutation branch. In principle, `_contract_scalar!` should be able to just call `permutedims!!` (see the rough sketch below).
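A rough sketch of how the merged logic might look; the function name `scalar_contract!` and its signature are hypothetical and not the actual NDTensors API:

```julia
using LoopVectorization

# Hypothetical helper (not the actual NDTensors API) illustrating how a scalar
# contraction Rᵈ .= α .* permutedims(Tᵈ, perm) could share one permutation kernel.
function scalar_contract!(R::AbstractArray, T::AbstractArray, α::Number, perm)
    if perm == ntuple(identity, ndims(T))
        # Trivial permutation: a plain vectorized broadcast is enough.
        @avx R .= α .* T
    else
        # Nontrivial permutation: permute first, then scale in place.
        permutedims!(R, T, perm)
        @avx R .= α .* R
    end
    return R
end
```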
EDIT: Note that one main disadvantage of changing from BLAS calls to `@avx` calls is that they use different kinds of multithreading, so to get both matrix multiplications (which currently use `BLAS.gemm!`) and vectorized operations to use multithreading, a user would need to have both BLAS and Julia multithreading enabled. I have found that even when they are not nested, they can still interfere with each other, so that needs to be investigated. Looking forward to a world where this is all in Julia!
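For reference, enabling and checking both kinds of multithreading looks something like this (the thread counts are placeholders):

```julia
using LinearAlgebra

# Julia threads (used by the Julia-side multithreading) must be set at startup,
# e.g. `JULIA_NUM_THREADS=4 julia` or `julia --threads=4`.
@show Threads.nthreads()

# BLAS threads (used by BLAS.gemm! for the matrix multiplications) can be set
# at runtime:
BLAS.set_num_threads(4)
@show BLAS.get_num_threads()   # available in Julia 1.6 and later
```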