Make `unique(f, itr)` and `unique!(f, itr)` faster #30286
base/set.jl
Outdated
    else
        seen2 = convert(Set{promote_typejoin(eltype(seen), typeof(x))}, seen)
        push!(seen2, t)
        return _unique!(out, f, C, seen2, i)
That is really clever.
Avoid creation of a `Set{Any}`.
c4decee to cacbb7f
How much does the manual specialisation on
Where's this?
    push!(out, x)
    if y isa eltype(seen)
        push!(seen, y)
    else
Can we do

    else
        TT = promote_typejoin(eltype(seen), typeof(y))
        if eltype(seen) === TT
            push!(seen, y)
        else
            seen2 = ...
        end
    end

Idea is that we don't grow the stack if `f` is e.g. type-unstable between `Float32` and `Float64`.
Eh, maybe disregard. I misunderstood `promote_typejoin`, which apparently doesn't do that. The point remains: do we want collect-style widening here (integer and float merge to `Float64`), or do we want exact widening as you did?
Wait, is this calling `_unique!` recursively each time a new value is encountered?! @chethega is right that this is bad; a call should only happen when encountering a new type.
No - it simply pushes if the value is of type eltype of seen. If not, and only if not, then it constructs seen2 and makes a new call.
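A minimal sketch of the control flow described here, for illustration only. It is not the code from `base/set.jl`: the names `unique_by_sketch` and `_unique_from!` are hypothetical, it only handles `AbstractVector` input, and it uses the `Set{T}(itr)` constructor for widening.

```julia
# Hypothetical helper, not part of Base: keep the first element for each distinct f(x).
function unique_by_sketch(f, A::AbstractVector)
    out = eltype(A)[]
    isempty(A) && return out
    x = first(A)
    y = f(x)
    push!(out, x)
    seen = Set{typeof(y)}()   # start with a concretely typed Set
    push!(seen, y)
    return _unique_from!(out, f, A, seen, firstindex(A) + 1)
end

# Hypothetical inner loop: specialised on the type of `seen`.
function _unique_from!(out, f, A, seen::Set, i)
    while i <= lastindex(A)
        x = A[i]
        y = f(x)
        i += 1
        if !(y in seen)
            push!(out, x)
            if y isa eltype(seen)
                # Common case: same key type, push and keep looping.
                push!(seen, y)
            else
                # New key type: widen the Set once and continue in a new call.
                T = Base.promote_typejoin(eltype(seen), typeof(y))
                seen2 = Set{T}(seen)
                push!(seen2, y)
                return _unique_from!(out, f, A, seen2, i)
            end
        end
    end
    return out
end
```

For example, `unique_by_sketch(abs, [-1, 1, 2, -2, 3])` returns `[-1, 2, 3]` without ever widening, since every key is an `Int`. A new call happens only when `f` returns a key whose type does not fit the current `Set`.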
Nevermind, I misread the code.
Adding to 1.1 since this fixes a perf regression introduced on master.
I’m happy for this to merge now, unless someone else wants to review?
Can a test be added to make sure the behaviour of the following has not changed?
It is not the behaviour I expect, but it is the behaviour of 1.0.
What didn’t you expect? The eltype, or the length 1 result?
The length 1 result. (On further thought, it would actually be hard to not maintain that behaviour given all such numbers both
The checks are always done via
The hidden fact here is that
Ok let’s merge - no point holding up 1.1.
* Make `unique(f, itr)` and `unique!(f, itr)` faster
  Avoid creation of a `Set{Any}`.
* Fix unique! for resizable OffsetVector
(cherry picked from commit c2fb1dc)
Avoid creation of a `Set{Any}` to make the function-providing forms faster, and address the regression of `unique(itr)` performance from #30141.
cc @raghav9-97 @oxinabox @fredrikekre
Setup:
Before #30141:
After #30141:
After this PR: