Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacement implementation for groupslices and associated functions #15503

Closed
wants to merge 16 commits into from

Conversation

AndyGreenwell
Copy link
Contributor

This PR replaces the original functionality proposed in issue #14142.

The main groupslices function returns the vector uniquerow (also referred to as ic in the prior discussion) as originally calculated by the unique(A,dim) implementation in multidimensional.jl. A series of wrapper functions (groupinds, firstinds, lastinds) are included for returning the vector ia and vector of vectors ib, as discussed within #14142.

This commit adds updated argument signatures for the functionality discussed in JuliaLang#14142 wherein each of these functions only returns one single output argument containing index vectors of various purposes related to the unique slices currently returned from the unique function taking two input arguments.  Still needs tests and doc integration.
This commit alters the current groupslices function to return the vector uniquerow that was originally calculated within the existing unique function.  The values contained within uniquerow for cases where there are no hash collisions are actually equal to what I was calculating in array ic.  As @simonster pointed out in comment JuliaLang#14142 (comment) the previous commit was not taking into account hash collisions for the values in ic.  As uniquerow within unique was already calculating the values in ic, taking into account hash collisons, and updating its values accordingly, we can just return uniquerow from groupslices.  For continuity with the conversation in JuliaLang#14142, I currently have assigned ic as an alias for uniquerow, but that can certainly be removed.
This commit adds updated argument signatures for the functionality discussed in JuliaLang#14142 wherein each of these functions only returns one single output argument containing index vectors of various purposes related to the unique slices currently returned from the unique function taking two input arguments.  Still needs tests and doc integration.
This commit alters the current groupslices function to return the vector uniquerow that was originally calculated within the existing unique function.  The values contained within uniquerow for cases where there are no hash collisions are actually equal to what I was calculating in array ic.  As @simonster pointed out in comment JuliaLang#14142 (comment) the previous commit was not taking into account hash collisions for the values in ic.  As uniquerow within unique was already calculating the values in ic, taking into account hash collisons, and updating its values accordingly, we can just return uniquerow from groupslices.  For continuity with the conversation in JuliaLang#14142, I currently have assigned ic as an alias for uniquerow, but that can certainly be removed.
… functions

This commit adds some initial tests for the functions groupslices, groupinds, firstinds, and lastinds.  The initial tests are very basic and do not currently include a test for a multidimensional input array that would have hash collisions between slices.
This commit adds updated argument signatures for the functionality discussed in JuliaLang#14142 wherein each of these functions only returns one single output argument containing index vectors of various purposes related to the unique slices currently returned from the unique function taking two input arguments.  Still needs tests and doc integration.
This commit alters the current groupslices function to return the vector uniquerow that was originally calculated within the existing unique function.  The values contained within uniquerow for cases where there are no hash collisions are actually equal to what I was calculating in array ic.  As @simonster pointed out in comment JuliaLang#14142 (comment) the previous commit was not taking into account hash collisions for the values in ic.  As uniquerow within unique was already calculating the values in ic, taking into account hash collisons, and updating its values accordingly, we can just return uniquerow from groupslices.  For continuity with the conversation in JuliaLang#14142, I currently have assigned ic as an alias for uniquerow, but that can certainly be removed.

Change error to ArgumentError

uniquerow is ic, and seems to handle hash collisions already

Add initial tests for multidimensional.jl for groupslices and related functions

This commit adds some initial tests for the functions groupslices, groupinds, firstinds, and lastinds.  The initial tests are very basic and do not currently include a test for a multidimensional input array that would have hash collisions between slices.
@@ -0,0 +1,30 @@
# This file is a part of Julia. License is MIT: http://julialang.org/license

using Base.Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add this file to the list in test/choosetests.jl for it to run

@StefanKarpinski StefanKarpinski self-assigned this May 5, 2016
that are present in the array `C` as returned from `unique(A, dim)`.
"""

function groupinds(ic::Vector{Int})
Copy link

@JoshChristie JoshChristie May 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to work in my use case. (I haven't switched to your branch or updated to 0.5. I just copied your code changes)

A =  [1.0  1.0  2.0  2.0  1.0  1.0  1.0  2.0  2.0  2.0
      1.0  1.0  2.0  2.0  2.0  1.0  2.0  2.0  2.0  1.0
      1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0]
ic = groupslices(A, 2)
ib = groupinds(ic)
ERROR: BoundsError: attempt to access 4-element Array{Array{Int64,1},1}:
 [1,2]  
 Int64[]
 [3,4]  
 Int64[]
  at index [5]
in groupinds at 1046

My fix:

function groupinds(ic::Vector{Int})
    d = Dict{Int, Int}()
    ia = unique(ic)
    n = length(ia)
    for i = 1:n
        d[ia[i]]= i
    end

    ib = Array(Vector{Int},n)
    for k = 1:n
        ib[k] = Int[]
    end

    for h = 1:length(ic)
        push!(ib[d[ic[h]]], h)
    end
    return ib
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, using a Dict similar to d above is along the lines of what was in the original proposal here and here. I'll incorporate your changes.

This commit incorporates the fix for groupinds provided by @JoshChristie, and adds the provided test case from that line note to test/multidimensional.jl
all(A .== C[:,:,ic])
end
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the empty line between the docstring and the function it is documenting

@AndyGreenwell
Copy link
Contributor Author

The functionality from this PR is now included in the following repository:

https://github.com/AndyGreenwell/GroupSlices.jl

That repo has been tagged, registered, and submitted as a PR to METADATA.jl. Once that PR is merged in METADATA, then I will close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants