-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replacement implementation for groupslices and associated functions #15503
Conversation
This commit adds updated argument signatures for the functionality discussed in JuliaLang#14142 wherein each of these functions only returns one single output argument containing index vectors of various purposes related to the unique slices currently returned from the unique function taking two input arguments. Still needs tests and doc integration.
This commit alters the current groupslices function to return the vector uniquerow that was originally calculated within the existing unique function. The values contained within uniquerow for cases where there are no hash collisions are actually equal to what I was calculating in array ic. As @simonster pointed out in comment JuliaLang#14142 (comment) the previous commit was not taking into account hash collisions for the values in ic. As uniquerow within unique was already calculating the values in ic, taking into account hash collisons, and updating its values accordingly, we can just return uniquerow from groupslices. For continuity with the conversation in JuliaLang#14142, I currently have assigned ic as an alias for uniquerow, but that can certainly be removed.
This commit adds updated argument signatures for the functionality discussed in JuliaLang#14142 wherein each of these functions only returns one single output argument containing index vectors of various purposes related to the unique slices currently returned from the unique function taking two input arguments. Still needs tests and doc integration.
This commit alters the current groupslices function to return the vector uniquerow that was originally calculated within the existing unique function. The values contained within uniquerow for cases where there are no hash collisions are actually equal to what I was calculating in array ic. As @simonster pointed out in comment JuliaLang#14142 (comment) the previous commit was not taking into account hash collisions for the values in ic. As uniquerow within unique was already calculating the values in ic, taking into account hash collisons, and updating its values accordingly, we can just return uniquerow from groupslices. For continuity with the conversation in JuliaLang#14142, I currently have assigned ic as an alias for uniquerow, but that can certainly be removed.
… functions This commit adds some initial tests for the functions groupslices, groupinds, firstinds, and lastinds. The initial tests are very basic and do not currently include a test for a multidimensional input array that would have hash collisions between slices.
This commit adds updated argument signatures for the functionality discussed in JuliaLang#14142 wherein each of these functions only returns one single output argument containing index vectors of various purposes related to the unique slices currently returned from the unique function taking two input arguments. Still needs tests and doc integration.
This commit alters the current groupslices function to return the vector uniquerow that was originally calculated within the existing unique function. The values contained within uniquerow for cases where there are no hash collisions are actually equal to what I was calculating in array ic. As @simonster pointed out in comment JuliaLang#14142 (comment) the previous commit was not taking into account hash collisions for the values in ic. As uniquerow within unique was already calculating the values in ic, taking into account hash collisons, and updating its values accordingly, we can just return uniquerow from groupslices. For continuity with the conversation in JuliaLang#14142, I currently have assigned ic as an alias for uniquerow, but that can certainly be removed. Change error to ArgumentError uniquerow is ic, and seems to handle hash collisions already Add initial tests for multidimensional.jl for groupslices and related functions This commit adds some initial tests for the functions groupslices, groupinds, firstinds, and lastinds. The initial tests are very basic and do not currently include a test for a multidimensional input array that would have hash collisions between slices.
@@ -0,0 +1,30 @@ | |||
# This file is a part of Julia. License is MIT: http://julialang.org/license | |||
|
|||
using Base.Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add this file to the list in test/choosetests.jl
for it to run
This commit addresses the removal of trailing whitespace from prior commit, and also adds an entry for multidimensional to test/choosetests.jl
that are present in the array `C` as returned from `unique(A, dim)`. | ||
""" | ||
|
||
function groupinds(ic::Vector{Int}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to work in my use case. (I haven't switched to your branch or updated to 0.5. I just copied your code changes)
A = [1.0 1.0 2.0 2.0 1.0 1.0 1.0 2.0 2.0 2.0
1.0 1.0 2.0 2.0 2.0 1.0 2.0 2.0 2.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0]
ic = groupslices(A, 2)
ib = groupinds(ic)
ERROR: BoundsError: attempt to access 4-element Array{Array{Int64,1},1}:
[1,2]
Int64[]
[3,4]
Int64[]
at index [5]
in groupinds at 1046
My fix:
function groupinds(ic::Vector{Int})
d = Dict{Int, Int}()
ia = unique(ic)
n = length(ia)
for i = 1:n
d[ia[i]]= i
end
ib = Array(Vector{Int},n)
for k = 1:n
ib[k] = Int[]
end
for h = 1:length(ic)
push!(ib[d[ic[h]]], h)
end
return ib
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit incorporates the fix for groupinds provided by @JoshChristie, and adds the provided test case from that line note to test/multidimensional.jl
all(A .== C[:,:,ic]) | ||
end | ||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the empty line between the docstring and the function it is documenting
The functionality from this PR is now included in the following repository: https://github.com/AndyGreenwell/GroupSlices.jl That repo has been tagged, registered, and submitted as a PR to METADATA.jl. Once that PR is merged in METADATA, then I will close this PR. |
This PR replaces the original functionality proposed in issue #14142.
The main groupslices function returns the vector
uniquerow
(also referred to asic
in the prior discussion) as originally calculated by theunique(A,dim)
implementation in multidimensional.jl. A series of wrapper functions (groupinds
,firstinds
,lastinds
) are included for returning the vectoria
and vector of vectorsib
, as discussed within #14142.