fix sparse getindex regression #7162
Conversation
We should also update NEWS.md with the sparse indexing improvements.
Here are some results from a comparison between We can probably move
Testing for equality after I don't have time to investigate just now though.
Cc: @kmsquire
We should probably also test for small inputs. It is quite likely that for columns with fewer than 10 elements, a linear search may be faster.
@ViralBShah I did a comparison when I wrote the indexing and found no advantage of using linear search, but that was wrong: binary and linear search are about equally fast for a haystack of size 15.
(But this is not what is found in http://schani.wordpress.com/2010/04/30/linear-vs-binary-search/ for C, where linear search is better up to array lengths of 200.)
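To make the comparison concrete, here is a minimal sketch (not code from this PR) of a hybrid lookup over a sorted haystack: a plain linear scan below a small cutoff, and otherwise the lo/hi form of searchsortedfirst from sort.jl that the later commits switch to. The function name hybridsearch and the cutoff of 16 are illustrative assumptions, not measured values.

# Hypothetical hybrid lookup over the sorted slice haystack[lo:hi].
# Returns the index of needle in that slice, or 0 when it is absent.
const LINEAR_CUTOFF = 16  # illustrative threshold, not a tuned value

function hybridsearch(haystack::AbstractVector, needle, lo::Int, hi::Int)
    if hi - lo + 1 <= LINEAR_CUTOFF
        for i in lo:hi                      # short range: scan linearly
            haystack[i] == needle && return i
        end
        return 0
    else
        i = searchsortedfirst(haystack, needle, lo, hi, Base.Order.Forward)
        return (i <= hi && haystack[i] == needle) ? i : 0
    end
end

For example, hybridsearch([2, 4, 6, 8], 6, 1, 4) returns 3, while hybridsearch([2, 4, 6, 8], 5, 1, 4) returns 0.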
It would be nice to bake the linear search into the various
That blog post is good. I would be curious to see how close we can get, since a bunch of work on SIMD is going on now.
Made PR with performance tests for sparse indexing: #7177.
Updated and optimized the tests at https://gist.github.com/tanmaykm/7a1e3d56ff7445f0aa03 further. Now times reported by all three methods The bulk of the difference in test results was because of importing I think we can use
@tanmaykm good detective work, but I'm still puzzled why There are also a
Updated. @mauro3 yes, the slower
Is this good to merge now?
ind = binarysearch(A.rowval, i0, A.colptr[i1], A.colptr[i1+1]-1)
ind > -1 ? A.nzval[ind] : zero(T)
r1 = int(A.colptr[i1])
r2 = A.colptr[i1+1]-1
Shouldn't this also be converted to int?
r2 is an int because of the subtraction.
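For context, this is roughly what the scalar path looks like once the hand-rolled binarysearch is swapped for searchsortedfirst over the column's stored range; a sketch in current Julia syntax (SparseArrays stdlib, placeholder name spgetindex), not the exact code in the diff.

using SparseArrays

# Illustrative scalar lookup in a CSC matrix: binary-search the stored row
# indices of column i1 within that column's slice of rowval.
function spgetindex(A::SparseMatrixCSC{Tv}, i0::Integer, i1::Integer) where Tv
    r1 = Int(A.colptr[i1])          # first stored position of column i1
    r2 = Int(A.colptr[i1 + 1]) - 1  # last stored position; an Int after the subtraction
    ind = searchsortedfirst(A.rowval, i0, r1, r2, Base.Order.Forward)
    (ind <= r2 && A.rowval[ind] == i0) ? A.nzval[ind] : zero(Tv)
end

The explicit Int conversion mirrors the r1 line under review: it keeps the search bounds in the machine integer type even when the matrix stores its indices in a narrower type.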
Thanks for the clarifications, and sorry to bother you with two more things!
function rangesearch(haystack::Range, needle)
    # number of steps from the start of the range to the needle
    di = (needle - first(haystack)) / step(haystack)
    i = ifloor(di)
    # hit only if the needle lies exactly on the range's grid and within bounds
    (di - i == 0 && 1 <= i + 1 <= length(haystack)) ? i + 1 : -1
end
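For what it is worth, the point of this suggestion is that membership in a Range can be decided with O(1) arithmetic instead of any search at all. A quick illustration of the intended behaviour, with arbitrary values (on current Julia the annotation would be AbstractRange and ifloor(di) would be spelled floor(Int, di)):

r = 3:2:11           # the "haystack": 3, 5, 7, 9, 11

rangesearch(r, 7)    # 3, because r[3] == 7
rangesearch(r, 8)    # -1: 8 is not on the range's grid
rangesearch(r, 13)   # -1: past the end of the range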
Sorry, no, the second suggestion is wrong; ignore it.
On my machine (last run only):

elapsed time: 0.06766474 seconds (0 bytes allocated)
elapsed time: 0.032284962 seconds (0 bytes allocated)
elapsed time: 0.070093474 seconds (0 bytes allocated)

I actually find that somewhat surprising as well. I'm also not well versed at
One possible reason has nothing to do with the search function: the timing does include an extra 10^6 array lookups that aren't in the
More detective work: This shows that for

This suggests that the difference is caused by caching(?). Trying slightly different tests, the extra array lookups didn't seem to make much of a difference.
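To make the caching question testable, here is a rough sketch of the kind of harness one could use: build the index vectors once, run an untimed warm-up pass, and only then time the lookups so every variant sees warm caches. The sizes, density, and the name time_getindex are assumptions, not the gist's actual script.

using SparseArrays, Random

# Time n random scalar lookups into a sparse matrix, after a warm-up pass
# over the same indices so caching affects all tested variants alike.
function time_getindex(A::SparseMatrixCSC, n::Int)
    rows = rand(1:size(A, 1), n)
    cols = rand(1:size(A, 2), n)
    s = zero(eltype(A))
    for k in 1:n                 # warm-up: touch the same memory untimed
        s += A[rows[k], cols[k]]
    end
    @time for k in 1:n           # timed pass over the same indices
        s += A[rows[k], cols[k]]
    end
    return s                     # return the sum so the loops are not optimized away
end

A = sprand(10^4, 10^4, 1e-3)
time_getindex(A, 10^6)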
also replaced binarysearch with methods from sort.jl ref JuliaLang#7131, JuliaLang#7047
Caching does seem to be the likely explanation. Changed
I think this is good to merge.
fix sparse getindex regression
Thanks. Merged.
I now see:
Yes. Raised an issue #7197 for that.
This PR aims to fix broken getindex for sparse matrices with index types other than Int and adds some related tests. Ref #7131, #7047.
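As a reminder of what an index type other than Int looks like, a minimal sketch (current Julia, SparseArrays stdlib; values arbitrary) of a SparseMatrixCSC whose colptr and rowval are Int32, which scalar getindex has to handle correctly:

using SparseArrays

# 2x2 CSC matrix with Int32 index vectors: column 1 stores 10.0 in row 1,
# column 2 stores 20.0 in row 2.
colptr = Int32[1, 2, 3]
rowval = Int32[1, 2]
nzval  = [10.0, 20.0]
A = SparseMatrixCSC(2, 2, colptr, rowval, nzval)

A[1, 1]  # 10.0: a stored entry
A[2, 1]  # 0.0: a structural zero, found without error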