Performs triangular solves (rdiv! with UpperTriangular and ldiv! with LowerTriangular matrices), single threaded or multithreaded. For example:
julia> using TriangularSolve, LinearAlgebra, MKL, BenchmarkTools;
julia> BLAS.set_num_threads(1)
julia> BLAS.get_config().loaded_libs
1-element Vector{LinearAlgebra.BLAS.LBTLibraryInfo}:
LBTLibraryInfo(libmkl_rt.so, ilp64)
julia> N = 100;
julia> A = rand(N,N); B = rand(N,N); C = similar(A);
julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 15.909 μs … 41.524 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 17.916 μs ┊ GC (median): 0.00%
Time (mean ± σ): 17.751 μs ± 697.786 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
15.9 μs Histogram: log(frequency) by time 19.9 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 17.578 μs … 75.835 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.852 μs ┊ GC (median): 0.00%
Time (mean ± σ): 19.827 μs ± 1.342 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
17.6 μs Histogram: log(frequency) by time 22.4 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 19.102 μs … 69.966 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 21.561 μs ┊ GC (median): 0.00%
Time (mean ± σ): 21.565 μs ± 890.952 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
19.1 μs Histogram: log(frequency) by time 23.4 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 19.082 μs … 39.078 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.694 μs ┊ GC (median): 0.00%
Time (mean ± σ): 19.765 μs ± 774.848 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
19.1 μs Histogram: frequency by time 22.1 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
Multithreaded benchmarks:
julia> BLAS.set_num_threads(min(Threads.nthreads(), TriangularSolve.VectorizationBase.num_cores()))
julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
Range (min … max): 8.309 μs … 24.357 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 8.769 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.812 μs ± 382.702 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
8.31 μs Histogram: frequency by time 9.7 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 11.996 μs … 151.147 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 14.163 μs ┊ GC (median): 0.00%
Time (mean ± σ): 14.281 μs ± 2.372 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
12 μs Histogram: frequency by time 17.3 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
Range (min … max): 7.903 μs … 22.442 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.871 μs ┊ GC (median): 0.00%
Time (mean ± σ): 9.789 μs ± 864.957 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
7.9 μs Histogram: log(frequency) by time 11.8 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 13.507 μs … 142.574 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 15.258 μs ┊ GC (median): 0.00%
Time (mean ± σ): 15.319 μs ± 2.045 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
13.5 μs Histogram: frequency by time 18.5 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> versioninfo()
Julia Version 1.8.0-DEV.438
Commit 88a6376e99* (2021-08-28 11:03 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
Environment:
JULIA_NUM_THREADS = 8
Single-threaded benchmarks on an M1 Mac:
julia> N = 100;
julia> A = rand(N,N); B = rand(N,N); C = similar(A);
julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 21.416 μs … 34.458 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 21.624 μs ┊ GC (median): 0.00%
Time (mean ± σ): 21.767 μs ± 491.788 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
21.4 μs Histogram: log(frequency) by time 23.2 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 39.124 μs … 57.749 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 46.166 μs ┊ GC (median): 0.00%
Time (mean ± σ): 46.274 μs ± 1.766 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
39.1 μs Histogram: frequency by time 50.2 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 48.291 μs … 57.833 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 49.124 μs ┊ GC (median): 0.00%
Time (mean ± σ): 49.306 μs ± 802.143 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
48.3 μs Histogram: log(frequency) by time 53 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 34.249 μs … 40.208 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 34.375 μs ┊ GC (median): 0.00%
Time (mean ± σ): 34.748 μs ± 774.675 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
34.2 μs Histogram: log(frequency) by time 37.1 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
Or, from another measurement of the same call:
julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 23.750 μs … 30.541 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 23.875 μs ┊ GC (median): 0.00%
Time (mean ± σ): 23.948 μs ± 316.293 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
23.8 μs Histogram: log(frequency) by time 25 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
For editing convenience, here are the same commands without prompts or outputs. (You can copy/paste the transcripts above into a REPL, which should automatically strip the julia> prompts and the outputs, but they are less convenient to edit if you want to try changing the benchmarks.)
using TriangularSolve, LinearAlgebra, MKL, BenchmarkTools;
BLAS.set_num_threads(1) # single-threaded BLAS, matching the Val(false) runs below
BLAS.get_config().loaded_libs
N = 100;
A = rand(N,N); B = rand(N,N); C = similar(A);
@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false))
@benchmark ldiv!($C, LowerTriangular($B), $A)
BLAS.set_num_threads(min(Threads.nthreads(), TriangularSolve.VectorizationBase.num_cores()))
@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
@benchmark ldiv!($C, LowerTriangular($B), $A)
versioninfo()
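If you want to vary the problem size rather than edit N by hand, the single-threaded rdiv! comparison can be wrapped in a loop. This is only a sketch: the helper name sweep_rdiv, the chosen sizes, and the one-second time budget are assumptions made for illustration, and MKL is left out, so the default BLAS is used unless you load it. It uses @belapsed from BenchmarkTools, which returns the minimum time in seconds.

```julia
using TriangularSolve, LinearAlgebra, BenchmarkTools

# Hypothetical helper (not part of TriangularSolve): time both rdiv!
# implementations, single threaded, for each matrix size in `sizes`.
function sweep_rdiv(sizes)
    results = NamedTuple[]
    for N in sizes
        A = rand(N, N); B = rand(N, N); C = similar(A)
        U = UpperTriangular(B)
        # Val(false) selects the single-threaded TriangularSolve path
        t_ts = @belapsed TriangularSolve.rdiv!($C, $A, $U, Val(false)) seconds=1
        # LinearAlgebra's rdiv! works in place, so copy A into C first
        t_la = @belapsed LinearAlgebra.rdiv!(copyto!($C, $A), $U) seconds=1
        push!(results, (N = N, triangularsolve = t_ts, linearalgebra = t_la))
    end
    return results
end

BLAS.set_num_threads(1)
results = sweep_rdiv((16, 64, 100))
for r in results
    println("N = ", r.N,
            ": TriangularSolve ", round(r.triangularsolve * 1e6; digits = 2), " μs",
            ", LinearAlgebra ", round(r.linearalgebra * 1e6; digits = 2), " μs")
end
```

Timings depend on the machine and BLAS library, so no expected output is shown.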
Currently, rdiv! with UpperTriangular and ldiv! with LowerTriangular matrices are the only supported configurations.
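As a quick sanity check of the two supported configurations, the results can be compared against LinearAlgebra's `/` and `\`. This is a minimal sketch, assuming the four-argument call forms used in the benchmarks above. The diagonal shift `N * I` is an assumption added here so that the random triangular factors are well conditioned and the `≈` comparison is meaningful.

```julia
using TriangularSolve, LinearAlgebra

N = 50
A = rand(N, N)
# Assumption: shift the diagonal so the triangular factors are well conditioned
B = rand(N, N) + N * I
C = similar(A)

# Supported: right division by an UpperTriangular matrix, C = A / U
TriangularSolve.rdiv!(C, A, UpperTriangular(B), Val(false))
rdiv_ok = C ≈ A / UpperTriangular(B)

# Supported: left division by a LowerTriangular matrix, C = L \ A
TriangularSolve.ldiv!(C, LowerTriangular(B), A, Val(false))
ldiv_ok = C ≈ LowerTriangular(B) \ A

println("rdiv! matches LinearAlgebra: ", rdiv_ok)
println("ldiv! matches LinearAlgebra: ", ldiv_ok)
```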