Added unique!(f, itr) function #30141
Commits: 2799812, 35eabac, 43bc5c3, e562c06, c2535b3
````diff
@@ -178,16 +178,41 @@ function unique(f, C)
     out
 end
 
-# If A is not grouped, then we will need to keep track of all of the elements that we have
-# seen so far.
-function _unique!(A::AbstractVector)
-    seen = Set{eltype(A)}()
+"""
+    unique!(f, A::AbstractVector)
+
+Selects one value from `A` for each unique value produced by `f` applied to
+elements of `A` , then return the modified A.
+
+# Examples
+```jldoctest
+julia> unique!(x -> x^2, [1, -1, 3, -3, 4])
+3-element Array{Int64,1}:
+ 1
+ 3
+ 4
+
+julia> unique!(n -> n%3, [5, 1, 8, 9, 3, 4, 10, 7, 2, 6])
+3-element Array{Int64,1}:
+ 5
+ 1
+ 9
+
+julia> unique!(iseven, [2, 3, 5, 7, 9])
+2-element Array{Int64,1}:
+ 2
+ 3
+```
+"""
+function unique!(f, A::AbstractVector)
+    seen = Set()
     idxs = eachindex(A)
     y = iterate(idxs)
     count = 0
     for x in A
-        if x ∉ seen
-            push!(seen, x)
+        t = f(x)
+        if t ∉ seen
+            push!(seen,t)
             count += 1
             A[y[1]] = x
             y = iterate(idxs, y[2])
@@ -196,6 +221,10 @@ function _unique!(A::AbstractVector)
     resize!(A, count)
 end
 
+# If A is not grouped, then we will need to keep track of all of the elements that we have
+# seen so far.
+_unique!(A::AbstractVector) = unique!(identity, A::AbstractVector)
````

Since the call to …

Okay, I will try to build the inner `_unique` function as suggested by @andyferris. In the meantime I will benchmark the existing solution and present the results here.

@fredrikekre When I run …

It's because …

Some benchmarking results for `unique!(f, iter)` and `unique!(iter)`:

Sorry, I led you down the wrong path with my comment on #30141 (comment). Replacing … Can you also report benchmarks on Julia master for the old performance of …

I have a branch, … @fredrikekre I was thinking it would be better to have small PRs with separate concerns, as Lyndon suggests. Let's simply merge this feature PR from @raghav9-97 (his first Julia PR, if I understand correctly) and I'll follow up straight away with the performance PR that addresses both …

Ok.

But yeah, this particular line does leave us in a bad intermediate state... oh well.
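
The "bad intermediate state" discussed above comes from routing `unique!(A)` through `unique!(identity, A)`, which swaps the typed `Set{eltype(A)}()` for an untyped `Set()`. A minimal check of that difference, assuming a Julia release whose Base already includes `unique!(f, A)` (1.1 or later), might look like this. It shows the results agree while the set element types differ; an `Any`-typed set is typically slower because its elements are not stored with a concrete type.

```julia
# Compare the element types of the two `seen` sets involved in this change.
A = [3, 1, 3, 2, 1]

typed_seen   = Set{eltype(A)}()  # what the old _unique!(A) allocated
untyped_seen = Set()             # what unique!(f, A) in this PR allocates

@show eltype(typed_seen)         # Int (Int64 on 64-bit platforms)
@show eltype(untyped_seen)       # Any: elements are stored without a concrete type

# Both paths keep the first occurrence of each value, so the results match;
# any difference between them is in performance, not in the output.
@show unique!(copy(A)) == unique!(identity, copy(A))
```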
```diff
+
 # If A is grouped, so that each unique element is in a contiguous group, then we only
 # need to keep track of one element at a time. We replace the elements of A with the
 # unique elements that we see in the order that we see them. Once we have iterated
```
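
The context comment above describes the grouped case: when equal elements sit in contiguous runs, only the most recently kept element has to be remembered, so no `Set` is needed. A minimal sketch of that idea follows. `grouped_unique!` is a hypothetical name, not the Base internal the comment belongs to, and the sketch assumes `A` is grouped and has `AbstractUnitRange` indices.

```julia
# Hypothetical helper illustrating the grouped case; not Base's internal function.
# Assumes equal elements of A are contiguous (e.g. A is sorted) and that A has
# AbstractUnitRange indices, so integer arithmetic on indices is valid.
function grouped_unique!(A::AbstractVector)
    isempty(A) && return A
    current = firstindex(A)                # index of the most recently kept element
    for i in firstindex(A)+1:lastindex(A)
        if !isequal(A[i], A[current])      # a new group starts at i
            current += 1
            A[current] = A[i]              # compact the kept elements to the front
        end
    end
    return resize!(A, current - firstindex(A) + 1)
end
```

If the grouping assumption does not hold, a value that reappears later is kept again: `[1, 1, 2, 1]` becomes `[1, 2, 1]`. That is why the ungrouped path has to track every value seen so far.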

Unfortunately this won't be efficient, because this means `Set{Any}()`. Since relying on inference is naughty, what should probably happen here is a widening pattern. Start with a `Set{typeof(first(A))}()` and then keep pushing to this set until the element type isn't wide enough. Someone else might know better, but I think the most efficient pattern is then to dispatch again to more-or-less the same function. This is, unfortunately, a little bit of work to implement, sorry.

You'll want an inner function like `_unique!(f, A, seen, i)`, where `seen` is the current `Set` and `i` is the current index. The outer `unique!` function creates the initial `Set` and dispatches to the inner function.

i.e. something like

and the other function approximately

I haven't tested the above at all, but I hope it demonstrates the pattern at least. The compiler will elide all the complexity in the standard case that `f(first(A))` is inferrable. Also note that I've assumed all `AbstractVector`s have `AbstractUnitRange` indices in the above to avoid the need to use `iterate`, but the other pattern is fine too.
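
The comment above describes the widening pattern in prose. As an illustration only (not the reviewer's code and not the eventual Base implementation), here is a minimal sketch of it. The names `widening_unique!` and `_widening_unique!` are hypothetical; the set is seeded with the type of `f` applied to the first element, since it stores values of `f`; `typejoin` is an assumed choice of widening rule; and, like the comment, the sketch assumes `AbstractUnitRange` indices.

```julia
# Hypothetical names; a sketch of the widening pattern, not Base's implementation.
# Assumes A has AbstractUnitRange indices, so integer arithmetic on indices is valid.
function widening_unique!(f, A::AbstractVector)
    length(A) <= 1 && return A
    i = firstindex(A)
    y = f(A[i])
    seen = Set{typeof(y)}()  # narrow, concrete element type taken from the first f-value
    push!(seen, y)
    # A[i] is already in place, so the write cursor starts at i.
    return _widening_unique!(f, A, seen, i, i + 1)
end

# `current` is the index of the last kept element; `i` is the next element to inspect.
function _widening_unique!(f, A::AbstractVector, seen::Set, current::Integer, i::Integer)
    while i <= lastindex(A)
        x = A[i]
        y = f(x)
        if y ∉ seen
            current += 1
            A[current] = x
            if y isa eltype(seen)
                push!(seen, y)
            else
                # The set's element type is too narrow: copy into a wider Set and
                # re-dispatch, so the loop is compiled again for the new Set type.
                wider = Set{typejoin(eltype(seen), typeof(y))}(seen)
                push!(wider, y)
                return _widening_unique!(f, A, wider, current, i + 1)
            end
        end
        i += 1
    end
    return resize!(A, current - firstindex(A) + 1)
end
```

When `f` always returns the same concrete type, the widening branch is never taken and the loop runs on a concretely typed `Set` throughout; a mixed-type `f` such as `x -> x > 2 ? float(x) : x` triggers exactly one widening.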

While I agree, I also think that can be in a separate PR. Right now, the existing `unique(f, iter)` (no bang) does the same thing, and it can be optimised in the same way. This PR adds a naive implementation that can be improved later.
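
To make the comparison concrete: `unique(f, itr)` selects the same elements as the new `unique!(f, A)` but returns a fresh vector, whereas the bang version overwrites and shrinks `A` in place. A small check, assuming a Julia release that ships `unique!(f, A)` (1.1 or later):

```julia
A = [1, -1, 3, -3, 4]

B = unique(x -> x^2, A)                   # allocates and returns a new vector
A_unchanged = (A == [1, -1, 3, -3, 4])    # unique leaves A alone

unique!(x -> x^2, A)                      # same selection, but A itself is overwritten and resized
@show B A A_unchanged
```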

I also think this could be implemented in a different PR, as both the bang and non-bang versions of `unique(f, iter)` use the naive method.

@andyferris I can try to implement the inner function `_unique` within this PR if that is what is needed to merge it, but I will need some help.

Sorry, I didn't mean to block this; iterating later is of course fine.

@andyferris you might want to dismiss your review, however that is done (possibly by re-reviewing?).

@andyferris Please re-review this PR.