Fix performance issue with diagonal multiplication #44651
The regression fix is good. But vectorization should not speed things up (that much) at the matrix sizes used by the tests (about 10~20?).
Are these test failures related?
Yes, and there are more. I think I'll need to write out the alpha, beta = 0 and != 0 cases manually. This will look scary, but I don't see another option.
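To make the case split concrete, here is a minimal sketch of what writing out the branches by hand could look like for `C = alpha*A*D + beta*C`. It is not the code from this PR: `scaled_mul_sketch!` is a hypothetical name, and the sketch assumes a dense `A` and element types with commutative multiplication (e.g. `Float64`).

```julia
using LinearAlgebra

# Hypothetical helper (not the PR's implementation): C = alpha*A*D + beta*C,
# assuming commutative scalar element types such as Float64.
function scaled_mul_sketch!(C::AbstractMatrix, A::AbstractMatrix, D::Diagonal,
                            alpha::Number, beta::Number)
    d = D.diag
    size(A, 2) == length(d) || throw(DimensionMismatch("columns of A must match size of D"))
    size(C) == size(A) || throw(DimensionMismatch("C and A must have the same size"))
    if iszero(alpha)
        # Only the beta*C term survives.
        iszero(beta) ? fill!(C, zero(eltype(C))) : rmul!(C, beta)
    elseif iszero(beta)
        # Overwrite C without reading it, so stale NaN/Inf entries cannot leak in.
        @inbounds for j in axes(A, 2)
            dj = alpha * d[j]
            @simd for i in axes(A, 1)
                C[i, j] = A[i, j] * dj
            end
        end
    else
        # General case: scale column j of A and accumulate into beta*C.
        @inbounds for j in axes(A, 2)
            dj = alpha * d[j]
            @simd for i in axes(A, 1)
                C[i, j] = A[i, j] * dj + beta * C[i, j]
            end
        end
    end
    return C
end
```

Hoisting the `iszero` checks out of the loops keeps each inner loop branch-free, which is what gives the compiler a chance to vectorize it.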
Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
@staticfloat After spending quite some time, I realized I don't really know what I'm after, actually. The
For the record, this PR does improve the speed of
EDIT:
This PR
julia> using LinearAlgebra
julia> A = rand(500, 500);
julia> C = copy(A);
julia> D = Diagonal(rand(500));
julia> using BenchmarkTools
julia> @btime rmul!($A, $D);
900.866 μs (0 allocations: 0 bytes)
julia> @btime lmul!($D, $A); # ca. 1 ms
1.053 ms (0 allocations: 0 bytes)
julia> @btime mul!($C, $A, $D);
1.085 ms (0 allocations: 0 bytes)
julia> @btime mul!($C, $D, $A);
1.094 ms (0 allocations: 0 bytes)
One day old master
julia> using LinearAlgebra
julia> A = rand(500, 500);
julia> C = copy(A);
julia> D = Diagonal(rand(500));
julia> using BenchmarkTools
julia> @btime rmul!($A, $D); # ca. 1 ms
2.874 ms (2 allocations: 96 bytes)
julia> @btime lmul!($D, $A); # ca. 1 ms
2.402 ms (0 allocations: 0 bytes)
julia> @btime mul!($C, $A, $D);
1.222 ms (2 allocations: 96 bytes)
julia> @btime mul!($C, $D, $A);
1.215 ms (0 allocations: 0 bytes)
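For readers less familiar with the four calls benchmarked above, the following small sketch shows what each one computes. The matrices are illustrative values chosen here, not taken from the PR.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
D = Diagonal([10.0, 100.0])

# In-place right multiplication: column j of A is scaled by D.diag[j].
R = rmul!(copy(A), D)          # [10.0 200.0; 30.0 400.0]

# In-place left multiplication: row i of A is scaled by D.diag[i].
L = lmul!(D, copy(A))          # [10.0 20.0; 300.0 400.0]

# Out-of-place variants write the same products into a preallocated C.
C1 = mul!(similar(A), A, D)    # equals A * D
C2 = mul!(similar(A), D, A)    # equals D * A
```

Each call amounts to one pass over the matrix, so the four timings can reasonably be expected to be close to each other, as they are in the "This PR" numbers.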
Test failures are unrelated. Let's go!
(cherry picked from commit 03af781)
This is an attempt to fix a major performance regression in diagonal multiplication, see #42321 (comment), following suggestions by @N5N3. @staticfloat may want to keep an eye on CI runtime here.
On my machine, in a complete test-LinearAlgebra run, the Diagonal test alone took nearly 1000 seconds (of course including compilation time). Locally, my make test-LinearAlgebra is currently broken, but when I ran the Diagonal test suite on its own, it finished in less than 300 seconds. Not sure if this is a fair comparison, but let's see what the online CI says.