RFC: Implement unique consistently #29038
I managed to use a single function body for |
If I make all uniquifying algorithms explicit and accept the code duplication (https://gist.github.com/laborg/f9af0f5c2def4edfa410c5522afbff1d) the benchmarks look slightly better: unique
unique!
I'm not sure which version is better, but I feel the consistency between implementations for |
What is the advantage of the new |
@stevengj: you mean as an implementation or a way of calling this? |
Both. My suggestion involves no extra copies of the data, and I’m skeptical that it is much slower. But also, if that works as an implementation it’s not clear why we need a new API. |
Well, given that we have |
Not all functions on arrays need to accept a function argument. We don't have |
We don't have |
is not equivalent to
Nevertheless I don't think that |
Oh, I see. This sounds more analogous to |
Yes |
What else would the function mean do? In the case of |
What I initially thought when I read the description was that |
Is there anything useful here now with #30141 merged? |
Well, although I haven't redone the benchmarks, I believe this PR should have faster implementations for all unique functions (unique(), unique!(), unique(f,), unique!(f,)), because it implements proper widening (similar to collect). Btw.: Benchmarks of unique should include all kinds of data, not just sorted Int64, as was used as a demonstration in #30141... @andyferris was talking about a faster unique implementation - I would be interested to benchmark it against this one here. |
unique and Julia have changed since this attempt so closing seems the best solution. |
Sorry it didn't get used. I feel bad every time I come across a PR like this that didn't get merged 😞 |
In this case I can clearly say you shouldn't. This was more an exercise to get to know Julia better and in this part it was a success! The recent implementation by Tim Holy is nicer too. |
This pull request updates the unique[!]([f], itr) implementation in sets.jl and achieves a couple of things:
unique!(f,itr)
unique(f,itr) instead of unique(itr), meaning that unique(f, itr) will behave as unique(itr) regarding returned types.
unique[!]([f], itr)
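The point about returned types can be illustrated with a minimal sketch. It is not the code from this PR or from Base: `unique_by_widening` is a hypothetical name, and the widening is done with a plain loop and `typejoin`, which is simpler (and less type-stable) than what sets.jl does. It only shows the idea of keeping the result's element type as narrow as the kept elements allow.

```julia
# Hypothetical helper (not part of Base or of this PR): keeps the first element
# of `itr` for each distinct value of `f(x)` and widens the element type of the
# result only when a kept element does not fit the current one.
function unique_by_widening(f, itr)
    out = Vector{Union{}}()   # start with the narrowest possible element type
    seen = Set{Any}()         # keys f(x) encountered so far
    for x in itr
        key = f(x)
        key in seen && continue
        push!(seen, key)
        if !(x isa eltype(out))
            # Widen: allocate a vector whose element type also admits `x`.
            T = typejoin(eltype(out), typeof(x))
            widened = Vector{T}(undef, length(out))
            copyto!(widened, out)
            out = widened
        end
        push!(out, x)
    end
    return out
end

narrow = unique_by_widening(abs, Any[1, -1, 2, -2])     # elements 1 and 2
mixed  = unique_by_widening(identity, Any[1, 2.0, 1])   # elements 1 and 2.0
```

With an `Any[...]` input that only contains `Int` values, the result of this sketch has element type `Int`; with `Int` and `Float64` values it is widened to their common supertype `Real`.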
Remarks:
<:AbstractString, <:Real, <:Symbol). Using hasmethod to assert sortability would be slow...
Benchmarks
Datasets
unique(itr) and unique(f,itr)
unique!(itr)
Discussion
unique! is understandable, as the default element type for the seen Set is currently Any
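To make the remark about the seen Set concrete, here is a small sketch of an in-place variant whose Set is typed with the vector's own element type instead of Any. `unique_typed!` is a hypothetical name, the function is restricted to `Vector` for simplicity, and it is not the implementation benchmarked above; whether the element type of the Set explains the measured difference is an assumption this sketch does not test.

```julia
# Hypothetical in-place variant (not Base.unique! and not the PR's code).
function unique_typed!(A::Vector)
    seen = Set{eltype(A)}()   # typed by the input instead of Set{Any}
    kept = 0                  # number of unique elements written so far
    for i in 1:length(A)
        x = A[i]
        if !(x in seen)
            push!(seen, x)
            kept += 1
            A[kept] = x       # kept <= i, so no unread element is overwritten
        end
    end
    resize!(A, kept)          # drop the leftover tail
    return A
end

v = [3, 1, 3, 2, 1]
w = unique_typed!(v)          # v now holds 3, 1, 2
```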