
Non-inlining of function arguments causing slow creation of sparse matrices from IJV format #10694

Closed · KristofferC opened this issue on Mar 31, 2015 · 16 comments · Fixed by #10719
Labels: performance (Must go faster), sparse (Sparse arrays)

@KristofferC (Member)

The default combine functions passed into the sparse matrix constructors from IJV vectors, for example:

sparse(I,J,V::AbstractVector,m,n) = sparse(I, J, V, Int(m), Int(n), +)

are not inlined, which means that significant time is spent on type inference and dispatch when the combine function is called many times.
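
To illustrate where the time goes, here is a toy sketch (not Base's actual sparse code) contrasting a generic combine argument with a hard-coded +:

# Generic version: `combine` arrives as an untyped Function value,
# so each call goes through dynamic dispatch.
function accumulate_generic(vals, combine::Function)
    acc = vals[1]
    for i in 2:length(vals)
        acc = combine(acc, vals[i])
    end
    acc
end

# Special-cased version: + is hard-coded, so the call can be inlined.
# This is effectively what a +-only sparse path would do internally.
function accumulate_plus(vals)
    acc = vals[1]
    for i in 2:length(vals)
        acc = acc + vals[i]
    end
    acc
end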

In my code, for a medium-sized FEM problem, the difference between inlining the + or not is about 4x:

julia> @time sparse_inline(dof_rows, dof_cols, k_values, fp.n_eqs, fp.n_eqs)
elapsed time: 0.096662539 seconds (61 MB allocated, 3.20% gc time in 2 pauses with 0 full sweep)
80400x80400 sparse matrix with 1119192 Float64 entries:
.
.

julia> @time Base.sparse(dof_rows, dof_cols, k_values, fp.n_eqs, fp.n_eqs)
elapsed time: 0.393187843 seconds (257 MB allocated, 27.98% gc time in 11 pauses with 1 full sweep)
80400x80400 sparse matrix with 1119192 Float64 entries:
.
.

Since + is the most common combine function, maybe a special version should be made for it.

I realize that inlining of function arguments is coming, so this will eventually be fixed by itself; however, if that is far off, it might be worth looking at this now.

@JeffBezanson added the performance (Must go faster) and sparse (Sparse arrays) labels on Mar 31, 2015
@JeffBezanson (Member)

Yes, the default should probably be special-cased. My sense is that wanting a function other than + is quite rare; is that right?

@KristofferC (Member, Author)

I can't say how common it is to use functions other than +, but for assembling the stiffness matrix in finite elements you want +, so it would be nice to have a fast version for that.

@simonster (Member)

Maybe just replace + with AddFun() (and modify the method signatures)?

@andreasnoack (Member)

Or make + a type and overload call(::Type{+},x,y).
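
Both suggestions boil down to giving the combine operation a concrete type that dispatch can specialize on. A minimal sketch of the idea, assuming Julia 0.4-style call overloading (the AddFun definition here is illustrative, not Base's exact code):

immutable AddFun end
Base.call(::AddFun, x, y) = x + y

f = AddFun()
f(1.0, 2.0)  # 3.0, via the call overload

# The kernel would then take `combine::AddFun` instead of a generic Function,
# and the default constructor would pass AddFun() instead of +, e.g.:
#   sparse(I, J, V, m, n) = sparse(I, J, V, Int(m), Int(n), AddFun())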

@mauro3 (Contributor) commented Apr 2, 2015

Matlab only implements + for (fast) accumulation, so presumably that is what is used the most by far. Certainly the case for me.

@mlubin (Member) commented Apr 2, 2015

+ is certainly the most common. I've also used funky combine functions like vcat, but in that case there's a diminished expectation of performance.

@ViralBShah (Member)

+ is certainly the most common. So far the performance was OK and nobody complained, and I didn't want to have a totally different implementation just for +. The functor approach is perfect.

I often use other combine functions when using sparse matrices as graph data structures and doing graph algorithms. Often, just overwriting rather than combining is useful to have. Matlab created accumarray as a generalization, but it doesn't work for sparse, even though it is incredibly useful.
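
For instance, overwriting can already be expressed through the combine argument (a sketch; which duplicate "wins" depends on the order in which the duplicates are visited):

# Resolve duplicate (i, j) pairs by keeping one value instead of summing.
I = [1, 1]; J = [2, 2]; V = [1.0, 3.0]
S = sparse(I, J, V, 2, 2, (x, y) -> y)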

@jiahao (Member) commented Apr 2, 2015

I think sparse accumarray exists now.

@ViralBShah (Member)

Perhaps it was not there 5 years back (or maybe it was), when I used Matlab a lot. We've come a long way. :-)

@ViralBShah (Member)

@KristofferC Can you try and see how the latest master performs now? I merged @mauro3's PR.

@ViralBShah (Member)

BTW, I didn't mean to close this issue.

@KristofferC Is your code that generates the I, J, and V self-contained? It would be great if you could share it in that case. It would make a nice perf test.

@ViralBShah reopened this on Apr 4, 2015
@KristofferC (Member, Author)

It is not self-contained, but it shouldn't be too hard to generate a small benchmark with the same characteristics as the one I get in my FE code. I am currently in a place where I am unable to get the latest master, but on Monday I should be able to test the functor commit and write the benchmark.

@pao (Member) commented Apr 4, 2015

I feel like we've piled a lot of requests on @KristofferC...you're going to have a busy Monday at this rate!

@KristofferC (Member, Author)

This is an example that shows the difference between the current functor method and the previous approach that passed + as a function argument:

function IJV_bench(sparsity::FloatingPoint, n::Int, accums::Int)
    # Draw random (i, j) indices and repeat them `accums` times, so each
    # index pair occurs `accums` times and the combine function is exercised.
    nzs_1 = repmat(rand(1:n, Int(sparsity * n^2)), accums)
    nzs_2 = repmat(rand(1:n, Int(sparsity * n^2)), accums)
    vals = rand(Float64, length(nzs_1))
    @time Base.sparse(nzs_1, nzs_2, vals, n, n, +)  # + passed as a function argument
    @time Base.sparse(nzs_1, nzs_2, vals, n, n)     # default path (functor)
    return
end

sparsity = 0.0001
n = 10^5
accums = 20
IJV_bench(sparsity, n, accums)

accums sets how many times the combine function runs for each nonzero index pair. Increasing this value naturally widens the difference between the two methods.

I set accums unnaturally high here to show the difference more clearly. For example, in finite elements on a structured 3D grid of hexahedra, accums would be 8, since each node is then surrounded by 8 elements.

@mauro3 (Contributor) commented Apr 7, 2015

Here are the results of the benchmark:

julia> IJV_bench(sparsity, n, accums)
elapsed time: 2.79138178 seconds (2643 MB allocated, 5.64% gc time in 98 pauses with 4 full sweep)
elapsed time: 0.634323741 seconds (324 MB allocated, 0.16% gc time in 2 pauses with 0 full sweep)

and with accums = 8:

julia> IJV_bench(sparsity, n, accums)
elapsed time: 1.122919613 seconds (995 MB allocated, 7.65% gc time in 41 pauses with 3 full sweep)
elapsed time: 0.279827421 seconds (141 MB allocated, 0.29% gc time in 2 pauses with 0 full sweep)

In both cases the default (functor) path is a bit more than 4x faster.

@KristofferC (Member, Author)

Is there anything more to do here? If not, maybe close this issue.
