
Non-inlining of function arguments causing slow creation of sparse matrices from IJV format #10694

Closed · KristofferC opened this issue on Mar 31, 2015 · 16 comments · Fixed by #10719
Labels: performance (Must go faster), sparse (Sparse arrays)

@KristofferC (Member)

The default combine functions passed into the sparse matrix constructors from IJV vectors, for example:

sparse(I,J,V::AbstractVector,m,n) = sparse(I, J, V, Int(m), Int(n), +)

are not inlined, which means that significant time is spent on type inference and dispatch when the combine function is called many times.
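
To illustrate where the time goes, here is a toy sketch (not Base's actual sparse code) contrasting a generic combine argument with a hard-coded +:

# Generic version: `combine` arrives as an untyped Function value,
# so each call goes through dynamic dispatch.
function accumulate_generic(vals, combine::Function)
    acc = vals[1]
    for i in 2:length(vals)
        acc = combine(acc, vals[i])
    end
    acc
end

# Special-cased version: + is hard-coded, so the call can be inlined.
# This is effectively what a +-only sparse path would do internally.
function accumulate_plus(vals)
    acc = vals[1]
    for i in 2:length(vals)
        acc = acc + vals[i]
    end
    acc
end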

In my code, for a medium-sized FEM problem, the difference between inlining the + or not is about 4x:

julia> @time sparse_inline(dof_rows, dof_cols, k_values, fp.n_eqs, fp.n_eqs)
elapsed time: 0.096662539 seconds (61 MB allocated, 3.20% gc time in 2 pauses with 0 full sweep)
80400x80400 sparse matrix with 1119192 Float64 entries:
.
.

julia> @time Base.sparse(dof_rows, dof_cols, k_values, fp.n_eqs, fp.n_eqs)
elapsed time: 0.393187843 seconds (257 MB allocated, 27.98% gc time in 11 pauses with 1 full sweep)
80400x80400 sparse matrix with 1119192 Float64 entries:
.
.

Since + is the most common combine function, maybe a special version should be made for it.

I realize that inlining of function arguments is coming, so this will eventually be fixed by itself; however, if that is far off, it might be worth looking at this now.

@JeffBezanson added the performance (Must go faster) and sparse (Sparse arrays) labels on Mar 31, 2015
@JeffBezanson (Member)

Yes, the default should probably be special-cased. My sense is that wanting a function other than + is quite rare; is that right?

@KristofferC (Member, Author)

I can't say how common it is to use functions other than +, but for assembling the stiffness matrix in finite elements you want +, so it would be nice to have a fast version for that.

@simonster (Member)

Maybe just replace + with AddFun() (and modify the method signatures)?

@andreasnoack (Member)

Or make + a type and overload call(::Type{+},x,y).
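
Both suggestions boil down to giving the combine operation a concrete type that dispatch can specialize on. A minimal sketch of the idea, assuming Julia 0.4-style call overloading (the AddFun definition here is illustrative, not Base's exact code):

immutable AddFun end
Base.call(::AddFun, x, y) = x + y

f = AddFun()
f(1.0, 2.0)  # 3.0, via the call overload

# The kernel would then take `combine::AddFun` instead of a generic Function,
# and the default constructor would pass AddFun() instead of +, e.g.:
#   sparse(I, J, V, m, n) = sparse(I, J, V, Int(m), Int(n), AddFun())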

@mauro3 (Contributor) commented Apr 2, 2015

Matlab only implements + for (fast) accumulation, so presumably that is what is used the most by far. Certainly the case for me.

@mlubin (Member) commented Apr 2, 2015

+ is certainly the most common. I've also used funky combine functions like vcat, but in that case there's a diminished expectation of performance.

@ViralBShah (Member)

+ is certainly the most common. So far the performance was OK and nobody complained, and I didn't want to have a totally different implementation just for +. The functor approach is perfect.

I often use other combine functions when using sparse matrices as graph data structures and doing graph algorithms. Often, just overwriting rather than combining is useful to have. Matlab created accumarray as a generalization, but it doesn't work for sparse, even though it is incredibly useful.
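
For instance, overwriting can already be expressed through the combine argument (a sketch; which duplicate "wins" depends on the order in which the duplicates are visited):

# Resolve duplicate (i, j) pairs by keeping one value instead of summing.
I = [1, 1]; J = [2, 2]; V = [1.0, 3.0]
S = sparse(I, J, V, 2, 2, (x, y) -> y)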

@jiahao (Member) commented Apr 2, 2015

I think sparse accumarray exists now.

@ViralBShah (Member)

Perhaps it was not there 5 years back (or maybe it was), when I used Matlab a lot. We've come a long way. :-)

@ViralBShah (Member)

@KristofferC Can you try and see how the latest master performs now? I merged @mauro3's PR.

@ViralBShah (Member)

BTW, I didn't mean to close this issue.

@KristofferC Is your code that generates the I, J, and V self-contained? It would be great if you could share it in that case. It would make a nice perf test.

@ViralBShah reopened this on Apr 4, 2015
@KristofferC (Member, Author)

It is not self-contained, but it shouldn't be too hard to generate a small benchmark with the same characteristics as the one I get in my FE code. I am currently in a place where I am unable to get the latest master, but on Monday I should be able to test the functor commit and write the benchmark.

@pao (Member) commented Apr 4, 2015

I feel like we've piled a lot of requests on @KristofferC...you're going to have a busy Monday at this rate!

@KristofferC (Member, Author)

This is an example that shows the difference between the current functor method and the previous approach that passed + as a function argument:

function IJV_bench(sparsity::FloatingPoint, n::Int, accums::Int)
    # Draw random (i, j) indices and repeat them `accums` times, so each
    # index pair occurs `accums` times and the combine function is exercised.
    nzs_1 = repmat(rand(1:n, Int(sparsity * n^2)), accums)
    nzs_2 = repmat(rand(1:n, Int(sparsity * n^2)), accums)
    vals = rand(Float64, length(nzs_1))
    @time Base.sparse(nzs_1, nzs_2, vals, n, n, +)  # + passed as a function argument
    @time Base.sparse(nzs_1, nzs_2, vals, n, n)     # default path (functor)
    return
end

sparsity = 0.0001
n = 10^5
accums = 20
IJV_bench(sparsity, n, accums)

accums sets how many times the combine function runs for each nonzero index pair. Increasing this value naturally widens the difference between the two methods.

I set accums unnaturally high here to show the difference more clearly. For example, in finite elements on a structured 3D grid of hexahedra, accums would be 8, since each node is then surrounded by 8 elements.

@mauro3 (Contributor) commented Apr 7, 2015

Here are the results of the benchmark:

julia> IJV_bench(sparsity, n, accums)
elapsed time: 2.79138178 seconds (2643 MB allocated, 5.64% gc time in 98 pauses with 4 full sweep)
elapsed time: 0.634323741 seconds (324 MB allocated, 0.16% gc time in 2 pauses with 0 full sweep)

and with accums = 8:

julia> IJV_bench(sparsity, n, accums)
elapsed time: 1.122919613 seconds (995 MB allocated, 7.65% gc time in 41 pauses with 3 full sweep)
elapsed time: 0.279827421 seconds (141 MB allocated, 0.29% gc time in 2 pauses with 0 full sweep)

In both cases the default (functor) path is a bit more than 4x faster.

@KristofferC (Member, Author)

Is there anything more to do here? If not, maybe close this issue.
