This repository was archived by the owner on Dec 7, 2021. It is now read-only.

Simplify _contract_scalar! logic by using LoopVectorization.jl #83

Open
mtfishman opened this issue Apr 30, 2021 · 0 comments

Comments

mtfishman (Member) commented Apr 30, 2021

Simplify the `_contract_scalar!` logic by using LoopVectorization.jl instead of BLAS calls. I believe that in general LoopVectorization.jl is as fast as MKL for BLAS level-1 and level-2 operations (and even level-3 for smaller matrices), and it can be used very simply through broadcast syntax like `@avx Rᵈ .= α .* Tᵈ .+ Rᵈ` instead of the ugly `BLAS.axpy!(α, Tᵈ, Rᵈ)`.
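As a minimal sketch of the equivalence (array names `Rᵈ`, `Tᵈ` are just illustrative, and this assumes LoopVectorization.jl is installed), the two forms compute the same update:

```julia
using LinearAlgebra
using LoopVectorization

α = 2.0
Tᵈ = rand(100)
Rᵈ = rand(100)

# BLAS level-1 call: R ← α*T + R
R_blas = copy(Rᵈ)
BLAS.axpy!(α, Tᵈ, R_blas)

# Equivalent LoopVectorization broadcast
R_avx = copy(Rᵈ)
@avx R_avx .= α .* Tᵈ .+ R_avx

R_blas ≈ R_avx  # the two approaches agree
```

The broadcast form also generalizes directly to other fused element-wise updates that have no single BLAS routine.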

This should simplify that logic considerably and allow us to merge the logic of `_contract_scalar!` and `permutedims!!`, which could also use LoopVectorization.jl for the trivial-permutation branch. In principle, `_contract_scalar!` should be able to just call `permutedims!!`.
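A hypothetical sketch of what the trivial-permutation branch could look like (the function name `trivial_perm_scale!` is invented for illustration and is not the actual `permutedims!!` API): when the permutation is the identity, the "permutation" reduces to a scaled copy, which `@avx` vectorizes directly.

```julia
using LoopVectorization

# Hypothetical sketch: identity-permutation branch reduces to R .= α .* T
function trivial_perm_scale!(R::AbstractArray, T::AbstractArray, α::Number)
    @avx R .= α .* T
    return R
end
```

For a nontrivial permutation, the same macro can be applied to the explicit loop nest instead of falling back to BLAS.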

EDIT: Note that a main disadvantage of switching from BLAS calls to `@avx` calls is that they use different kinds of multithreading. To get both matrix multiplications (which of course right now use `BLAS.gemm!`) and vectorized operations to run multithreaded, a user would need both BLAS and Julia multithreading enabled. I have found that even when they are not nested, the two thread pools can still interfere with each other, so that needs to be investigated. Looking forward to a world where this is all in Julia!
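To make the two thread pools concrete, here is a small sketch showing that they are configured independently (assumes a recent Julia where `BLAS.get_num_threads` is available; Julia threads are set via `julia -t N` or `JULIA_NUM_THREADS`, BLAS threads via `BLAS.set_num_threads`):

```julia
using LinearAlgebra

# The two thread pools are independent settings:
julia_threads = Threads.nthreads()      # used by Julia-level threading (e.g. LoopVectorization)
blas_threads  = BLAS.get_num_threads()  # used by BLAS calls such as gemm!

println("Julia threads: ", julia_threads)
println("BLAS threads:  ", blas_threads)
```

Enabling one does not enable the other, which is why a user would need to configure both to get multithreading across all operations.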
