-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rand(::Set) #16231
Comments
Note that my implementation above is efficient for the usual case that the set's internal hash table is at most a few times larger than the size of the set. You can defeat this by deleting the contents of the set and then re-inserting a much smaller number of elements. For example, my code will be extremely inefficient for: s = Set(1:10^7)
empty!(s)
push!(s, 1)
length(s.dict.slots) / length(s) # gives 1.7e7!
@time rand(s) takes almost 1 second for I suppose you could test whether |
I'd like to take a look at this, if you don't mind. |
@PythonNut, that would be great. |
I'm new to the Julia scene, but this strikes me as something that has performance implications outside this issue. You mention iteration, and I wouldn't be surprised if other operations would also be impeded by large numbers of empty bins. One possible solution is to Does this sound like a better path forward? |
I think it would be good to fix I don't know about automating this; the need for that seems pretty rare. Typical uses of (We should also check what e.g. Python does in this case.) Anyway, that should be a separate issue and a separate PR. |
@stevengj that sounds quite reasonable. I've run some profiling, the results of which are visualized here:
v0.4.5 Fedora x86_64 i5-2520Mv0.4.5 Fedora i386 Core2Duo T2500ConclusionAs you can see, the crossover points vary with the size of the set (in addition to the number of bins), and also vary by machine. In addition, the |
@PythonNut, nice job. In principle, one could do a bit better than But I doubt it will change the crossover point by orders of magnitude. With such a high crossover point, I would only bother to implement the O(1) algorithm. |
Note also that rather than |
@stevengj yes, looks like the effect isn't that significant. v0.4.5 Arch Linux x86_64 i7-4712HQI understand that your technique has the additional advantage of short circuiting the collection once the value is found, so it performs better on average. However, they have almost identical worst-case times, which I can't explain since we should be saving a bit of allocation. ╮(╯_╰)╭
Thanks. I was looking for something like that, but couldn't find it for some reason. I guess I'll transition to copying your implementation over and writing tests. It might be a while, since I'm new to the |
Your thoroughness is inspiring. |
@stevengj would you also like a |
|
@stevengj Yup, that's what I was thinking. Sorry, I'm having some trouble building my development Julia since I don't have much disk space. :/ Don't worry, I'll think of something. |
Sorry guys, it looks like I'll need a much bigger VM to build Julia, and I haven't got around to making it yet—my job is making it hard. Hopefully, I get a spare moment to build it and start work on the tests for this. |
Performant random implementation for Dict and Set types.
As mentioned on the mailing list, it might be nice to have a
rand(::Set)
function. Since this is hard to implement efficiently without access to theSet
internals, it is something that almost has to be in Base. Here is a sample O(1) implementation:If someone wants to put together a test case and documentation, it could make a good intro PR.
Probably also add a
rand(::Set, dims)
method (etcetera) to supplementrand(::Array, dims)
, which could be accomplished just by convertingrand
arguments fromAbstractArray
toUnion{AbstractArray,AbstractSet}
in a few places.The text was updated successfully, but these errors were encountered: