Skip to content

rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)

License

Notifications You must be signed in to change notification settings

JuliaSIMD/TriangularSolve.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

TriangularSolve

Stable Dev Build Status Coverage

Performs some triangular solves. For example:

julia> using TriangularSolve, LinearAlgebra, MKL;

julia> BLAS.set_num_threads(1)

julia> BLAS.get_config().loaded_libs
1-element Vector{LinearAlgebra.BLAS.LBTLibraryInfo}:
 LBTLibraryInfo(libmkl_rt.so, ilp64)

julia> N = 100;

julia> A = rand(N,N); B = rand(N,N); C = similar(A);

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  15.909 ΞΌs …  41.524 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     17.916 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   17.751 ΞΌs Β± 697.786 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  ▃▁    ▁    ▁     ▄▁    β–‡β–†    β–†β–ˆβ–ƒ                             β–‚
  β–ˆβ–ˆβ–ƒβ–β–β–ˆβ–ˆβ–β–β–β–β–ˆβ–†β–β–β–ƒβ–‡β–ˆβ–ˆβ–„β–ƒβ–β–ˆβ–ˆβ–ˆβ–†β–β–„β–„β–ˆβ–ˆβ–ˆβ–„β–„β–…β–…β–†β–‡β–ˆβ–‡β–„β–…β–†β–‡β–ˆβ–ˆβ–‡β–ˆβ–‡β–‡β–†β–„β–…β–„β–β–„β–β–„β–„β–‡ β–ˆ
  15.9 ΞΌs       Histogram: log(frequency) by time      19.9 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  17.578 ΞΌs … 75.835 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     19.852 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   19.827 ΞΌs Β±  1.342 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–„β–‚              β–‚    β–†β–…   β–β–ˆβ–‡β–‚   β–…β–ƒ     β–‚                   β–‚
  β–ˆβ–ˆβ–β–β–ƒβ–ˆβ–‡β–β–β–β–ˆβ–‡β–„β–„β–β–ˆβ–ˆβ–‡β–„β–„β–„β–ˆβ–ˆβ–†β–…β–„β–ˆβ–ˆβ–ˆβ–ˆβ–…β–„β–†β–ˆβ–ˆβ–†β–†β–†β–†β–‡β–ˆβ–ˆβ–‡β–‡β–†β–†β–‡β–†β–…β–†β–„β–…β–…β–†β–„β–…β–„β–…β–… β–ˆ
  17.6 ΞΌs      Histogram: log(frequency) by time      22.4 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  19.102 ΞΌs …  69.966 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     21.561 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   21.565 ΞΌs Β± 890.952 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–‚β–‚                 β–‚β–ƒ     β–„β–„     β–†β–ˆβ–„     β–…β–…                  β–‚
  β–ˆβ–ˆβ–ƒβ–β–β–β–‡β–ˆβ–β–β–β–β–…β–ˆβ–β–β–β–β–β–ˆβ–ˆβ–…β–β–β–β–…β–ˆβ–ˆβ–†β–β–β–β–†β–ˆβ–ˆβ–ˆβ–†β–…β–ƒβ–…β–ˆβ–ˆβ–ˆβ–ˆβ–ƒβ–„β–…β–ˆβ–ˆβ–‡β–‡β–…β–†β–†β–‡β–‡β–ˆβ–‡β–†β–† β–ˆ
  19.1 ΞΌs       Histogram: log(frequency) by time      23.4 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  19.082 ΞΌs …  39.078 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     19.694 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   19.765 ΞΌs Β± 774.848 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

    β–ƒ        β–„β–ˆ         ▁
  β–‚β–‡β–ˆβ–ˆβ–„β–‚β–β–β–‚β–‚β–ƒβ–ˆβ–ˆβ–ˆβ–ƒβ–‚β–β–‚β–β–‚β–‚β–…β–ˆβ–‡β–ƒβ–‚β–‚β–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–‚β–‚β–‚ β–ƒ
  19.1 ΞΌs         Histogram: frequency by time         22.1 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Multithreaded benchmarks:

julia> BLAS.set_num_threads(min(Threads.nthreads(), TriangularSolve.VectorizationBase.num_cores()))

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):  8.309 ΞΌs …  24.357 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     8.769 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   8.812 ΞΌs Β± 382.702 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

               β–β–ƒβ–„β–†β–†β–ˆβ–ˆβ–‡β–†β–…β–ƒβ–
  β–‚β–β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–ƒβ–„β–…β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚ β–„
  8.31 ΞΌs         Histogram: frequency by time         9.7 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  11.996 ΞΌs … 151.147 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     14.163 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   14.281 ΞΌs Β±   2.372 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

          β–‚β–„β–‡β–ˆβ–ˆβ–ˆβ–‡β–†β–…β–ƒβ–‚ ▁   β–‚β–„β–„β–…β–…β–…β–†β–ƒβ–ƒ         ▁
  β–β–β–β–‚β–‚β–ƒβ–„β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–„β–…β–†β–‡β–ˆβ–ˆβ–ˆβ–†β–…β–…β–ƒβ–„β–‚β–‚β–‚β–β–β–β–β–β–β–β– β–…
  12 ΞΌs           Histogram: frequency by time         17.3 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  7.903 ΞΌs …  22.442 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     9.871 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   9.789 ΞΌs Β± 864.957 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–‚β–ƒ  β–„β–ƒ  β–ƒβ–…   β–…β–ƒ   β–†β–‚   β–†β–„   β–‚β–‡β–„   β–ƒβ–ˆβ–…β–‚β–‚β–β–β–„β–†β–ƒβ– ▁             β–‚
  β–ˆβ–ˆβ–…β–‚β–ˆβ–ˆβ–†β–…β–ˆβ–ˆβ–†β–†β–†β–ˆβ–ˆβ–‡β–‡β–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–ˆβ–‡β–‡β–†β–‡β–†β–…β–† β–ˆ
  7.9 ΞΌs       Histogram: log(frequency) by time      11.8 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  13.507 ΞΌs … 142.574 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     15.258 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   15.319 ΞΌs Β±   2.045 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

     ▁▃   ▁▂   ▁▃▅▁  ▁▄▄▁  β–‚β–†β–ˆβ–†β–ƒ
  β–β–‚β–…β–ˆβ–ˆβ–ˆβ–†β–‡β–ˆβ–ˆβ–ˆβ–†β–…β–ˆβ–ˆβ–ˆβ–ˆβ–†β–…β–ˆβ–ˆβ–ˆβ–ˆβ–†β–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–„β–„β–†β–†β–…β–„β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–β–β– β–„
  13.5 ΞΌs         Histogram: frequency by time         18.5 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.8.0-DEV.438
Commit 88a6376e99* (2021-08-28 11:03 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
Environment:
  JULIA_NUM_THREADS = 8

Single-threaded benchmarks on an M1 mac:

julia> N = 100;

julia> A = rand(N,N); B = rand(N,N); C = similar(A);

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  21.416 ΞΌs …  34.458 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     21.624 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   21.767 ΞΌs Β± 491.788 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

    β–ƒ β–†β–ˆβ–ˆ β–†β–„ ▁                 β–ƒβ–„ β–„β–‚          ▁            ▂▃▁ β–‚
  β–ƒβ–‡β–ˆβ–β–ˆβ–ˆβ–ˆβ–β–ˆβ–ˆβ–β–ˆβ–†β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–ˆβ–β–ˆβ–ˆβ–β–ˆβ–ˆβ–ˆβ–β–†β–ƒβ–β–β–†β–‡β–β–ˆβ–ˆβ–β–ˆβ–†β–…β–β–„β–ƒβ–β–ƒβ–ƒβ–‡β–β–ˆβ–ˆβ–ˆ β–ˆ
  21.4 ΞΌs       Histogram: log(frequency) by time      23.2 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  39.124 ΞΌs … 57.749 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     46.166 ΞΌs              β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   46.274 ΞΌs Β±  1.766 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

                              β–β–β–„β–‚β–†β–ƒβ–ˆβ–…β–‡β–„β–‡β–…β–ƒβ–ƒβ–β–ƒβ–β–‚               
  β–‚β–β–β–‚β–‚β–‚β–‚β–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–„β–„β–…β–…β–†β–…β–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–‡β–†β–†β–…β–†β–…β–…β–„β–ƒβ–ƒ β–…
  39.1 ΞΌs         Histogram: frequency by time        50.2 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  48.291 ΞΌs …  57.833 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     49.124 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   49.306 ΞΌs Β± 802.143 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

    β–β–ƒβ–…β–†β–‡β–ˆβ–ˆβ–‡β–ˆβ–ˆβ–‡β–‡β–†β–…β–„β–‚β–‚β–β–β–β–‚β–β–β–β–β–β–β– ▁▁▁                           β–ƒ
  β–ƒβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–„β–‚β–„β–ƒβ–‚β–ƒβ–ƒβ–„β–„β–ƒβ–†β–…β–‡β–‡β–‡β–ˆβ–ˆβ–‡β–ˆβ–‡β–‡ β–ˆ
  48.3 ΞΌs       Histogram: log(frequency) by time        53 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  34.249 ΞΌs …  40.208 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     34.375 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   34.748 ΞΌs Β± 774.675 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

  β–†β–ˆβ–ˆβ–†β–ƒβ–„β–…β–ƒ                ▁▁▄▅▅▃▂▁                     β–‚β–ƒβ–‚  ▁▂ β–‚
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–β–β–ƒβ–β–β–β–β–β–ƒβ–„β–ƒβ–β–β–ƒβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–…β–„β–…β–…β–†β–„β–„β–„β–„β–„β–…β–„β–„β–ƒβ–…β–ƒβ–„β–ƒβ–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆ β–ˆ
  34.2 ΞΌs       Histogram: log(frequency) by time      37.1 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Or

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  23.750 ΞΌs …  30.541 ΞΌs  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     23.875 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   23.948 ΞΌs Β± 316.293 ns  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

   ▃▁▆ β–ˆ β–‡β–†β–† β–„ ▁                               ▁ ▁         ▁ ▁ β–‚
  β–…β–ˆβ–ˆβ–ˆβ–†β–ˆβ–β–ˆβ–ˆβ–ˆβ–„β–ˆβ–β–ˆβ–ˆβ–‡β–β–„β–β–β–β–β–β–ƒβ–β–β–β–β–β–β–β–ƒβ–β–β–β–ƒβ–β–β–β–β–β–†β–β–‡β–†β–ˆβ–β–ˆβ–β–‡β–†β–…β–β–…β–β–‡β–†β–ˆβ–β–ˆ β–ˆ
  23.8 ΞΌs       Histogram: log(frequency) by time        25 ΞΌs <

 Memory estimate: 0 bytes, allocs estimate: 0.

For editing convenience (you can copy/paste the above into a REPL and it should automatically strip julia> s and outputs, but the above is less convenient to edit if you want to try changing the benchmarks):

using TriangularSolve, LinearAlgebra, MKL;
BLAS.set_num_threads(Threads.nthreads())
BLAS.get_config().loaded_libs
N = 100;

A = rand(N,N); B = rand(N,N); C = similar(A);

@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))

@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false))
@benchmark ldiv!($C, LowerTriangular($B), $A)

BLAS.set_num_threads(TriangularSolve.VectorizationBase.num_cores())
@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B))
@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))

@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A)
@benchmark ldiv!($C, LowerTriangular($B), $A)

versioninfo()

Currently, rdiv! with UpperTriangular and ldiv! with LowerTriangulra matrices are the only supported configurations.

About

rdiv!(::AbstractMatrix, ::UpperTriangular) and ldiv!(::LowerTriangular, ::AbstractMatrix)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages