Speed of sprand() #6708
On quick checks, …
Cc: @rwgardner
On my machine, almost all of the time in your … I'm not sure we can do much about the first, short of speeding up … Basically, to make this faster it seems like we should compute the entries row by row to start with.
Is that code trying to get …
@StefanKarpinski, I think we should probably avoid computing random elements in …
Yes, I guess when something's very sparse that's likely to be much better. Of course, when the number of non-zeros is substantial, regardless of how sparse things are, it's still a lot of work.
I don't think that large matrices with a large fraction of nonzeros are a practical concern. You shouldn't be using sparse formats at all in that case.
I meant huge matrices with a small fraction of non-zeros where the number of non-zeros is still large.
Of course the work will always be Ω(# nonzeros) (right now it is Θ(# log #)). This is mostly about improving the constant factor (although my proposal also reduces the complexity by a factor log(#/n)/log(#)).
@ViralBShah, this is all about speeding up …
@StefanKarpinski, your solution of generating a random N-bit integer is not practical here, because it requires Θ(N) work and storage, which could be many orders of magnitude greater than the number of nonzeros. In any case, working by columns should make the log factor in the …
No, I see that. I'm trying to figure out where this is slow. Maybe `ind2sub` can be sped up, which is a good thing whether we manage to avoid it here or not.
I suspect it's because our div and rem operations are so horribly pessimized, but I haven't narrowed it down to that yet. Inlining is making precise profiling a bit difficult.
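For context on the `div`/`rem` cost being discussed, the 2-d case of `ind2sub` reduces to one integer `divrem` per linear index. A minimal sketch (an editorial illustration in modern Julia syntax, not code from this thread):

```julia
# Column-major linear index -> (row, col) for an m-row matrix.
# One % and one ÷ per element: this is the per-index division cost at issue.
ind2rc(m::Int, ind::Int) = ((ind - 1) % m + 1, (ind - 1) ÷ m + 1)
```

For example, `ind2rc(3, 6)` gives `(3, 2)`: linear index 6 in a 3-row matrix is the last row of the second column.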
Originally, I meant this to track down the performance issues raised by @rwgardner in …
The recent history behind … Yes, …
I still think it's crazy for our integer division to be so damned slow. We do the fast, efficient, but slightly unsafe thing with integers everywhere else in the language. I don't think `div` should be an exception. We could provide a `checked_div` operation like `checked_add`, but that seems silly since, unlike overflow, division by zero is easy to explicitly check for. There's also the trick where you can implement `div` by a fixed divisor with a multiply and a shift by factors computed in advance, which would be applicable to vectorized `ind2sub`, for example.
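A sketch of that multiply-and-shift trick (an editorial illustration, not code from this thread): precompute a "magic" multiplier for a fixed divisor `d`, then replace each division with a wide multiply and a shift. This is the round-up method of Granlund and Montgomery; a 128-bit intermediate keeps the sketch simple.

```julia
# Precompute (m, sh) so that x ÷ d == (x * m) >> sh for every UInt32 x.
# m = ⌈2^sh / d⌉ with sh = 32 + ⌈log2 d⌉ (Granlund–Montgomery round-up).
function magic(d::UInt32)
    ℓ = 64 - leading_zeros(UInt64(d) - 1)   # ⌈log2(d)⌉ (0 when d == 1)
    sh = 32 + ℓ
    m = cld(UInt128(1) << sh, d)            # round-up magic multiplier
    return m, sh
end

# One wide multiply and one shift instead of a hardware divide.
divfix(x::UInt32, m::UInt128, sh::Int) = UInt32((UInt128(x) * m) >> sh)
```

In production code the multiplier fits in N+1 bits and the product is done with an N×N high-half multiply plus a fixup, which is what compilers emit for division by a constant.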
There's a big difference between overflow and undefined behavior. Results that overflow are still well-defined, and can sometimes even be exploited to get the answer you want. The undefined answers, however, are random numbers, or possibly a trap if you're lucky. Getting different results for no clear reason is just not ok.
It seems like generating each sparse column separately would be a big win here. The expected density of each column is the same as the density of the matrix as a whole, so that's pretty simple. The hard part is the patching up after the fact if the fraction isn't close. It may make more sense to allocate non-zeros to the columns up front and then just choose that many unique indices per column and sort those.
So we're just not going to have a way to do integer division efficiently?
We can if LLVM adds intrinsics for x86 division, or something similar. We can also add our own unsafe division intrinsics, and a compiler flag to turn off the checks globally. We could also decide that the typemin/-1 case is not important, and at least remove the check for that.
I really like the idea of generating the columns separately; I should have thought of that myself. The big win is not needing to use …
Certainly for linear indexing of array views, having fast integer division would be huge, and is presumably a good example of a case where the unsafe algorithm should be fine?
Possibly relevant? Proposed addition of LLVM intrinsics for safe division: http://article.gmane.org/gmane.comp.compilers.llvm.devel/72466
This implementation is much faster:

```julia
using Distributions

function spr(m,n,p)
    M = Multinomial(iround(m*n*p), n)
    c = cumsum([1; rand(M)])
    r = rand(1:m, c[end]-1)
    for i = 1:length(c)-1
        lo, hi = c[i], c[i+1]-1
        sort!(r, lo, hi, QuickSort, Base.Order.Forward)
    end
    v = rand(length(r))
    S = SparseMatrixCSC(m,n,c,r,v)
end
```

```
julia> @time spr(70000, 70000, 0.001)
elapsed time: 0.254208626 seconds (89577536 bytes allocated)
70000x70000 sparse matrix with 4900000 Float64 entries:
  [312  ,     1]  =  0.759103
  [1498 ,     1]  =  0.197246
  [5473 ,     1]  =  0.192934
  [7402 ,     1]  =  0.742496
  [9401 ,     1]  =  0.538901
  [9555 ,     1]  =  0.150737
  [9859 ,     1]  =  0.880329
  ⋮
  [60202, 70000]  =  0.290513
  [61760, 70000]  =  0.828304
  [62626, 70000]  =  0.123417
  [64031, 70000]  =  0.272032
  [64734, 70000]  =  0.451746
  [65528, 70000]  =  0.205945
  [66508, 70000]  =  0.0673513
  [66696, 70000]  =  0.938025
```

As compared to this:

```
julia> @time sprand(70000, 70000, 0.001);
elapsed time: 1.817448442 seconds (548796520 bytes allocated)
```

Of course, it's not quite correct since it allows duplicate row indices in the same column. When I insert some code to try to handle that, it slows back down to the same as what we have now:

```julia
function spr(m,n,p)
    M = Multinomial(iround(m*n*p), n)
    c = cumsum([1; rand(M)])
    r = rand(1:m, c[end]-1)
    for i = 1:length(c)-1
        lo, hi = c[i], c[i+1]-1
        sort!(r, lo, hi, QuickSort, Base.Order.Forward)
        while true
            dups = false
            for j = lo:hi-1
                r[j] == r[j+1] || continue
                r[j] = rand(1:m)
                dups = true
            end
            dups || break
            sort!(r, lo, hi, InsertionSort, Base.Order.Forward)
        end
    end
    v = rand(length(r))
    S = SparseMatrixCSC(m,n,c,r,v)
end
```

```
julia> @time spr(70000, 70000, 0.001)
elapsed time: 1.94048937 seconds (652454080 bytes allocated)
70000x70000 sparse matrix with 4900000 Float64 entries:
  [200  ,     1]  =  0.870909
  [316  ,     1]  =  0.675384
  [2708 ,     1]  =  0.561914
  [3678 ,     1]  =  0.919305
  [3717 ,     1]  =  0.777615
  [4848 ,     1]  =  0.628789
  [6701 ,     1]  =  0.00253897
  ⋮
```

If we can figure out a faster way to handle duplicates, this is a better algorithm by far. Of course, we'd need to copy the multinomial sampling code out of Distributions. Another nice feature of this implementation is that it tends to give exactly the requested sparsity.
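On the "faster way to handle duplicates": one standard option is Robert Floyd's sampling algorithm, which draws exactly k distinct values from 1:m using exactly k random draws and no retry loop. An editorial sketch in modern Julia, not code from this thread:

```julia
# Floyd's algorithm: a uniform random k-subset of 1:m using exactly k draws.
# When t collides with an earlier draw, j itself is inserted instead; j cannot
# already be in the set, because earlier iterations only saw values below j.
function floyd_sample(k::Int, m::Int)
    s = Set{Int}()
    for j in (m - k + 1):m
        t = rand(1:j)
        push!(s, t in s ? j : t)
    end
    return sort!(collect(s))  # CSC storage wants sorted row indices anyway
end
```

Applied per column, this would replace the rejection loop with a fixed amount of work per nonzero, at the cost of the set lookups and a final sort.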
Won't this implementation slow down exponentially as the density increases (…)?
This implementation, which fills in the matrix by columns as I suggested, is about four times faster than the original on my machine:

```julia
# Return a random sorted subset of 1:N, where each element is included with probability p.
function randomsubset!(p,N,ind,uind)
    if p == 1
        resize!(uind, N)
        uind[:] = 1:N
        return uind
    end
    resize!(uind, 0)
    p == 0 && return uind
    if N < 100 # O(N) algorithm for small N
        for i = 1:N
            rand() < p && push!(uind, i)
        end
    else
        K = p <= 0.5 ? N*p : N*(1-p)
        # Use Newton's method to invert the birthday problem
        L = log1p(-1/N)
        KN = K/N
        k = K
        k = k + (expm1(-k*L) - exp(-k*L)*KN)/L
        k = k + (expm1(-k*L) - exp(-k*L)*KN)/L # for K<N/2, 2 iterations suffice
        ik = ifloor(k)
        if rand() < k - ik
            ik += 1
        end
        if ik == 0
            if p <= 0.5
                return uind
            else
                resize!(uind, N)
                uind[:] = 1:N
                return uind
            end
        end
        if ik > length(ind)
            resize!(ind, ik)
        end
        for i = 1:ik
            ind[i] = rand(1:N)
        end
        sort!(ind, 1, ik, QuickSort, Base.Order.Forward)
        if p <= 0.5
            j = ind[1]
            push!(uind, j)
            uj = j
            for i = 2:ik
                j = ind[i]
                if j != uj
                    push!(uind, j)
                    uj = j
                end
            end
        else
            push!(ind, N+1) # sentinel
            ii = 1
            for i = 1:N
                if i != ind[ii]
                    push!(uind, i)
                else
                    while i == ind[ii]
                        ii += 1
                    end
                end
            end
        end
    end
    uind
end

function randomsubset(p, N)
    uind = Array(Int, int(N*p))
    sizehint(uind, int(N*p))
    randomsubset!(p,N,Int[],uind)
end

import Base.SparseMatrix.sparse_IJ_sorted!

function sprand2{T}(m::Integer, n::Integer, density::FloatingPoint, rng::Function, ::Type{T}=eltype(rng(1)))
    0 <= density <= 1 || throw(ArgumentError("density must be between 0 and 1"))
    N = n*m
    N == 0 && return spzeros(T,m,n)
    N == 1 && return rand() <= density ? sparse(rng(1)) : spzeros(T,1,1)
    I, J = Array(Int, 0), Array(Int, 0) # indices of nonzero elements
    sizehint(I, int(N*density))
    sizehint(J, int(N*density))
    # coldensity = density of nonzero columns
    if density*m > 1
        coldensity = 1 - (1-density)^m
    else # use Taylor series (= binomial expansion) to avoid cancellation errors
        x = -(coldensity = density * m)
        for k = 2:m
            x *= (k-m-1)*density/k
            oldcoldensity = coldensity
            coldensity -= x
            if oldcoldensity == coldensity
                break
            end
        end
    end
    rows = Array(Int, 0)
    sizehint(rows, int(m*density))
    ind = Int[]
    for j in randomsubset(coldensity, n)
        randomsubset!(density, m, ind, rows)
        append!(I, rows)
        nrows = length(rows)
        Jlen = length(J)
        resize!(J, Jlen+nrows)
        for i = Jlen+1:length(J)
            J[i] = j
        end
    end
    return sparse_IJ_sorted!(I, J, rng(length(I)), m, n, +) # it will never need to combine
end

sprand2(m::Integer, n::Integer, density::FloatingPoint) = sprand2(m,n,density,rand,Float64)
```
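For reference, the Newton iteration inside `randomsubset!` inverts the birthday-problem relation: after k uniform draws from 1:N, the expected number of distinct values is K = N(1 − (1 − 1/N)^k), and the code solves this for k given the target K. A standalone numeric check (an editorial sketch in modern Julia syntax):

```julia
# Expected number of distinct values after k uniform draws from 1:N.
expected_distinct(k, N) = N * (1 - (1 - 1/N)^k)

# Two Newton steps on f(k) = 1 - e^(k*L) - K/N with L = log1p(-1/N),
# mirroring the update used in randomsubset! above.
function invert_birthday(K, N)
    L = log1p(-1/N)
    k = float(K)
    for _ in 1:2
        k += (expm1(-k * L) - exp(-k * L) * K / N) / L
    end
    return k
end
```

Starting from k = K, two steps converge to high precision whenever K is well below N, which matches the comment in the original code.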
@timholy – yes, and the duplicate elimination is way too slow anyway, so it's not really meant as a real proposal, but a starting point. The multinomial sampling approach for choosing column counts is quite good though. I had some thoughts about how to deduplicate more efficiently, although perhaps @stevengj's algorithm is better than that.
A 4x speedup is great. And I like the idea of splitting that function out on its own, as it's surely useful in other contexts.
Note, however, that my solution in its current form is probably not what we want: its distribution of nonzero entries is not equivalent to the original uniform sampling. This is easily fixed by randomizing the number of entries per column properly.
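The fix being described can be sketched as follows (an editorial illustration in modern Julia, not the merged change): build the matrix column by column with each entry nonzero independently with probability p, so the nonzero pattern matches uniform Bernoulli sampling exactly. As written it is O(mn), so it illustrates the target distribution rather than the fast path:

```julia
using SparseArrays

# Column-by-column construction with the exact Bernoulli(p) pattern.
# findall returns sorted, distinct row indices, as CSC storage requires.
function sprand_bycol(m::Int, n::Int, p::Float64)
    colptr = Vector{Int}(undef, n + 1)
    colptr[1] = 1
    rowval = Int[]
    for j in 1:n
        rows = findall(<(p), rand(m))   # each row kept independently w.p. p
        append!(rowval, rows)
        colptr[j + 1] = colptr[j] + length(rows)
    end
    return SparseMatrixCSC(m, n, colptr, rowval, rand(length(rowval)))
end
```

Under this model each column's nonzero count is Binomial(m, p), which is the per-column randomization the comment above calls for.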
(Fix merged.)
Based on the discussion here, we need to investigate if `sparse()` can be sped up: https://groups.google.com/forum/#!topic/julia-users/X8Jca4SZMxo

Potential opportunities include experimenting with different ways of sorting inputs, avoiding the use of the generic `combine` function, `@inbounds`, etc. The following example takes 1.7 seconds for me: