more efficient getindex methods for unit triangular types #14493

Sacha0 · 2015-12-27T23:04:42Z

This pull request provides behaviorally-identical but more efficient getindex methods for Unit(Lower|Upper)Triangular types. See JuliaLang/LinearAlgebra.jl#294 for details. I've benchmarked these methods on Intel Ivy Bridge and AMD Piledriver machines; benchmarking on additional architectures might be wise. The benchmarks below, a subset of those in JuliaLang/LinearAlgebra.jl#294,

import Base.LinAlg.UnitLowerTriangular
import Base.LinAlg.UnitUpperTriangular

# Define this PR's getindex methods
altgetindex{T,S}(A::UnitLowerTriangular{T,S}, i::Integer, j::Integer) = i > j ? A.data[i,j] : ifelse(i == j, one(T), zero(A.data[j,i]))
altgetindex{T,S}(A::UnitUpperTriangular{T,S}, i::Integer, j::Integer) = i < j ? A.data[i,j] : ifelse(i == j, one(T), zero(A.data[j,i]))

# Benchmark functions using master getindex
function master_sumlt(A::UnitLowerTriangular, n, s)
    @inbounds for j in 1:n, i in j:n
        s += getindex(A, i, j)
    end
    s
end
function master_sumall(A::UnitLowerTriangular, n, s)
    @inbounds for j in 1:n, i in 1:n
        s += getindex(A, i, j)
    end
    s
end
# Benchmark functions using this PR's getindex
function alt_sumlt(A::UnitLowerTriangular, n, s)
    @inbounds for j in 1:n, i in j:n
        s += altgetindex(A, i, j)
    end
    s
end
function alt_sumall(A::UnitLowerTriangular, n, s)
    @inbounds for j in 1:n, i in 1:n
        s += altgetindex(A, i, j)
    end
    s
end

# Benchmarks proper
using Benchmarks

function prettytimes(res)
    # based on Benchmarks.pretty_time_string
    stats = Benchmarks.SummaryStatistics(res)
    timecenter = stats.elapsed_time_center
    timelower = get(stats.elapsed_time_lower)
    timeupper = get(stats.elapsed_time_upper)
    timecenter < 1_000.0 ? (scalefactor = 1.0; units = "ns") :
        timecenter < 1_000_000.0 ? (scalefactor = 1_000.0; units = "μs") :
            timecenter < 1_000_000_000.0 ? (scalefactor = 1_000_000.0; units = "ms") :
                (scalefactor = 1_000_000_000.0; units = " s")
    @sprintf("%6.2f %s [%6.2f,%6.2f]", timecenter/scalefactor, units, timelower/scalefactor, timeupper/scalefactor)
end

matsize = 1000;
ultdensemat = UnitLowerTriangular(rand(matsize, matsize));
ultsparsemat = UnitLowerTriangular(sprand(matsize, matsize, 0.05));

wm, wt = 10, 25
println(" $(lpad("[branch]", wm)) $(lpad("sumlt/dense", wt)), $(lpad("sumall/dense", wt)) | $(lpad("sumlt/sparse", wt)), $(lpad("sumall/sparse", wt))")
@printf("%s: %s, %s | %s, %s\n",
    lpad("master", wm),
    prettytimes(@benchmark master_sumlt(ultdensemat, matsize, 0.0)),
    prettytimes(@benchmark master_sumall(ultdensemat, matsize, 0.0)),
    prettytimes(@benchmark master_sumlt(ultsparsemat, matsize, 0.0)),
    prettytimes(@benchmark master_sumall(ultsparsemat, matsize, 0.0)) )
@printf("%s: %s, %s | %s, %s\n",
    lpad("this PR", wm),
    prettytimes(@benchmark alt_sumlt(ultdensemat, matsize, 0.0)),
    prettytimes(@benchmark alt_sumall(ultdensemat, matsize, 0.0)),
    prettytimes(@benchmark alt_sumlt(ultsparsemat, matsize, 0.0)),
    prettytimes(@benchmark alt_sumall(ultsparsemat, matsize, 0.0)) )

yield

[branch]               sumlt/dense,              sumall/dense |              sumlt/sparse,             sumall/sparse
 master:   2.58 ms [  2.33,  2.83],   5.24 ms [  4.86,  5.61] |  12.29 ms [ 11.55, 13.04],  31.49 ms [ 30.51, 32.47]
this PR: 739.79 μs [699.44,780.14],   1.39 ms [  1.30,  1.48] |  10.64 ms [  9.99, 11.28],  28.13 ms [ 26.94, 29.33]

on the Ivy Bridge machine, and

[branch]               sumlt/dense,              sumall/dense |              sumlt/sparse,             sumall/sparse
 master:   4.17 ms [  4.13,  4.22],   8.28 ms [  8.19,  8.36] |  15.84 ms [ 15.74, 15.93],  38.27 ms [ 38.03, 38.52]
this PR:   1.13 ms [  1.10,  1.16],   2.15 ms [  2.10,  2.20] |  13.00 ms [ 12.91, 13.08],  33.55 ms [ 33.41, 33.68]

on the Piledriver machine. Best! (Edit: xor -> sum)

…ower|Upper)Triangular types. See #14490.

tkelman · 2015-12-28T07:58:28Z

The error on appveyor looks real, an UndefRefError in linalg/triangular? https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.12133/job/v9kpyew4v3g1fmr0

andreasnoack · 2015-12-28T10:44:29Z

Great. Thanks for the detailed timings. My rule of thumb is that a memory load is more expensive than the branch, but I wasn't aware that the memory load in zero(A.data[i,j]) was optimized away.

I'm not sure what is causing the error on AppVeyor.

Sacha0 · 2015-12-28T18:38:41Z

I wasn't aware that the memory load in zero(A.data[i,j]) was optimized away.

That optimization surprised me --- pretty clever!

I'm not sure what is causing the error on AppVeyor.

Neither I. Everything passes locally and the offending code (b1a2d54/test/linalg/triangular.jl#L155) looks innocuous. Is debug-play in the build environment possible, without repeatedly pushing new branches? Does MPFR behave differently on Windows?

tkelman · 2015-12-28T18:44:05Z

There's a way you can access build workers remotely over rdp, see http://www.appveyor.com/docs/how-to/rdp-to-build-worker - and everything should run equivalently if you turn appveyor on for your fork, may have to temporarily add branch names to the whitelist in appveyor.yml if you do it that way though.

Does MPFR behave differently on Windows?

Possibly. C long is 32 bits on Win64, unlike most other 64 bit systems.

Also note your links to specific lines of code aren't coming out quite right - if you paste the entire url, as in

julia/test/linalg/triangular.jl

Line 155 in b1a2d54

@test B == A1

, ~~github will hide the https://github.com/JuliaLang/julia/blob/ part automatically.~~ apparently not? strange

Sacha0 · 2015-12-28T20:54:44Z

There's a way you can access build workers remotely over rdp, see http://www.appveyor.com/docs/how-to/rdp-to-build-worker - and everything should run equivalently if you turn appveyor on for your fork, may have to temporarily add branch names to the whitelist in appveyor.yml if you do it that way though.

Great, thanks! I'll check that out if the potential issue with this PR described in JuliaLang/LinearAlgebra.jl#294 gets resolved.

more efficient getindex methods for unit triangular types

Behaviorally-identical but more efficient getindex methods for Unit(L…

e71538a

…ower|Upper)Triangular types. See #14490.

kshyatt added performance Must go faster linear algebra Linear algebra labels Dec 27, 2015

andreasnoack mentioned this pull request Dec 28, 2015

more efficient and generic indexing through triangular types? JuliaLang/LinearAlgebra.jl#294

Open

Sacha0 mentioned this pull request Jan 27, 2016

fix promotion in bidiagonal backslash, At_ldiv_B, and Ac_ldiv_B. add missing bidiagonal matrix conversions. #14506

Merged

andreasnoack added a commit that referenced this pull request Mar 7, 2016

Merge pull request #14493 from Sacha0/utgetindex

ffa7e7b

more efficient getindex methods for unit triangular types

andreasnoack merged commit ffa7e7b into JuliaLang:master Mar 7, 2016

vtjnash mentioned this pull request Mar 14, 2016

PSA: don't use ifelse unless you are careful that you have a leaf bitstype #15494

Closed

Sacha0 deleted the utgetindex branch June 14, 2016 18:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

more efficient getindex methods for unit triangular types #14493

more efficient getindex methods for unit triangular types #14493

Sacha0 commented Dec 27, 2015

tkelman commented Dec 28, 2015

andreasnoack commented Dec 28, 2015

Sacha0 commented Dec 28, 2015

tkelman commented Dec 28, 2015

Sacha0 commented Dec 28, 2015

more efficient getindex methods for unit triangular types #14493

more efficient getindex methods for unit triangular types #14493

Conversation

Sacha0 commented Dec 27, 2015

tkelman commented Dec 28, 2015

andreasnoack commented Dec 28, 2015

Sacha0 commented Dec 28, 2015

tkelman commented Dec 28, 2015

Sacha0 commented Dec 28, 2015