Simplify the `_contract_scalar!` logic by using LoopVectorization.jl instead of BLAS calls. I believe that, in general, LoopVectorization.jl is as fast as MKL for BLAS level-1 and level-2 operations (and even level-3 for smaller matrices), and it can be used very simply through broadcast syntax like `@avx Rᵈ .= α .* Tᵈ .+ Rᵈ` instead of the ugly `BLAS.axpy!(α, Tᵈ, Rᵈ)`.
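As a concrete sketch of what that change looks like (the array names and sizes below are just illustrative stand-ins for the dense tensor storage, and in later LoopVectorization versions the macro is named `@turbo`):

```julia
using LinearAlgebra: BLAS
using LoopVectorization

# Illustrative data standing in for the dense storage of R and T.
α  = 2.0
Tᵈ = rand(1_000)
Rᵈ = rand(1_000)

# Current style: BLAS level-1 call computing Rᵈ ← α*Tᵈ + Rᵈ.
BLAS.axpy!(α, Tᵈ, Rᵈ)

# Proposed style: the same update written as an @avx broadcast.
@avx Rᵈ .= α .* Tᵈ .+ Rᵈ
```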
This should simplify that logic a lot and allow us to merge `_contract_scalar!` with `permutedims!!`, which could also use LoopVectorization.jl for the trivial-permutation branch. In principle, `_contract_scalar!` should be able to just call `permutedims!!` (see the rough sketch below).
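A rough sketch of how the merged logic might look; the function name `scalar_contract!` and its signature are hypothetical and not the actual NDTensors API:

```julia
using LoopVectorization

# Hypothetical helper (not the actual NDTensors API) illustrating how a scalar
# contraction Rᵈ .= α .* permutedims(Tᵈ, perm) could share one permutation kernel.
function scalar_contract!(R::AbstractArray, T::AbstractArray, α::Number, perm)
    if perm == ntuple(identity, ndims(T))
        # Trivial permutation: a plain vectorized broadcast is enough.
        @avx R .= α .* T
    else
        # Nontrivial permutation: permute first, then scale in place.
        permutedims!(R, T, perm)
        @avx R .= α .* R
    end
    return R
end
```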
EDIT: Note that one main disadvantage of changing from BLAS calls to `@avx` calls is that they use different kinds of multithreading, so to get both matrix multiplications (which currently use `BLAS.gemm!`) and vectorized operations to use multithreading, a user would need to have both BLAS and Julia multithreading enabled. I have found that even when they are not nested, they can still interfere with each other, so that needs to be investigated. Looking forward to a world where this is all in Julia!
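For reference, enabling and checking both kinds of multithreading looks something like this (the thread counts are placeholders):

```julia
using LinearAlgebra

# Julia threads (used by the Julia-side multithreading) must be set at startup,
# e.g. `JULIA_NUM_THREADS=4 julia` or `julia --threads=4`.
@show Threads.nthreads()

# BLAS threads (used by BLAS.gemm! for the matrix multiplications) can be set
# at runtime:
BLAS.set_num_threads(4)
@show BLAS.get_num_threads()   # available in Julia 1.6 and later
```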