Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacement implementation for groupslices and associated functions #15503

Closed
wants to merge 16 commits into from
Closed
Show file tree
Hide file tree
Changes from 15 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions base/exports.jl
Original file line number Diff line number Diff line change
Expand Up @@ -532,8 +532,11 @@ export
findprev,
findnz,
first,
firstinds,
flipdim,
gradient,
groupinds,
groupslices,
hcat,
hvcat,
ind2sub,
Expand All @@ -547,6 +550,7 @@ export
isperm,
issorted,
last,
lastinds,
levicivita,
linspace,
logspace,
Expand Down
168 changes: 168 additions & 0 deletions base/multidimensional.jl
Original file line number Diff line number Diff line change
Expand Up @@ -931,3 +931,171 @@ If `dim` is specified, returns unique regions of the array `itr` along `dim`.
@nref $N A d->d == dim ? sort!(uniquerows) : (1:size(A, d))
end
end

"""
groupslices(A, dim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see CONTRIBUTING.md, the signature line needs to be added to the rst docs, then run make docs to populate the docstring content

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which *.rst file should the signature line for this function live? At present, there is no multidimensional.rst file.


Returns a vector of integers where each integer element of the returned vector
is a group number corresponding to the unique slices along dimension `dim` as
returned from `unique(A, dim)`, where `A` can be a multidimensional array.

Example usage:

If `C = unique(A, dim)`, `ic = groupslices(A, dim)`, and
rank(A) == rank(C) == 3, then:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should have backticks, and I think you mean ndims here not linear algebra rank

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update.


if dim == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code examples should be indented so the doc systen renders them as such

all(A .== C[ic,:,:])
elseif dim == 2
all(A .== C[:,ic,:])
elseif dim == 3
all(A .== C[:,:,ic])
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example usage fails for me.

A =  [1.0  1.0  2.0  2.0  1.0  1.0  1.0  2.0  2.0  2.0
      1.0  1.0  2.0  2.0  2.0  1.0  2.0  2.0  2.0  1.0
      1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0]
dim = 2
C = unique(A, dim)
ic = groupslices(A, dim)

if dim == 1
   all(A .== C[ic,:,:])
elseif dim == 2
   all(A .== C[:,ic,:])
elseif dim == 3
   all(A .== C[:,:,ic])
end

ERROR: BoundsError: attempt to access 3x4 Array{Float64,2}:
 1.0  2.0  1.0  2.0
 1.0  2.0  2.0  1.0
 1.0  1.0  1.0  2.0
  at index [Colon(),[1,1,3,3,5,1,5,3,3,10],Colon()]
 in throw_boundserror at abstractarray.jl:156
 in _internal_checkbounds at abstractarray.jl:176
 in checkbounds at abstractarray.jl:159
 in _getindex at multidimensional.jl:185
 in getindex at abstractarray.jl:488

Copy link
Contributor Author

@AndyGreenwell AndyGreenwell May 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I make another commit, I need to work get the documentation strings integrated properly as @tkelman points out, but the changes in this gist should allow your example to work appropriately. In one of the changes made in the course of discussing #14142 related to detecting hash collisions, ic being directly assigned the values of uniquerow was improper. While the values of uniquerow do contain the indices of unique slices of A, they are not the indices corresponding to slices in C that allow for reconstruction of A.

The changes in this gist assign the appropriate values to ic that allow one to reconstruct A by indexing C with ic, as had been the case in the original uniqueslices submission (but had not been handling hash collisions properly).

Thank you for testing.

"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the empty line between the docstring and the function it is documenting

@generated function groupslices{T,N}(A::AbstractArray{T,N}, dim::Int)
quote
if !(1 <= dim <= $N)
ArgumentError("Input argument dim must be 1 <= dim <= $N, but is currently $dim")
end
hashes = zeros(UInt, size(A, dim))

# Compute hash for each row
k = 0
@nloops $N i A d->(if d == dim; k = i_d; end) begin
@inbounds hashes[k] = hash(hashes[k], hash((@nref $N A i)))
end

# Collect index of first row for each hash
uniquerow = Array(Int, size(A, dim))
firstrow = Dict{Prehashed,Int}()
for k = 1:size(A, dim)
uniquerow[k] = get!(firstrow, Prehashed(hashes[k]), k)
end
uniquerows = collect(values(firstrow))

# Check for collisions
collided = falses(size(A, dim))
@inbounds begin
@nloops $N i A d->(if d == dim
k = i_d
j_d = uniquerow[k]
else
j_d = i_d
end) begin
if (@nref $N A j) != (@nref $N A i)
collided[k] = true
end
end
end

if any(collided)
nowcollided = BitArray(size(A, dim))
while any(collided)
# Collect index of first row for each collided hash
empty!(firstrow)
for j = 1:size(A, dim)
collided[j] || continue
uniquerow[j] = get!(firstrow, Prehashed(hashes[j]), j)
end
for v in values(firstrow)
push!(uniquerows, v)
end

# Check for collisions
fill!(nowcollided, false)
@nloops $N i A d->begin
if d == dim
k = i_d
j_d = uniquerow[k]
(!collided[k] || j_d == k) && continue
else
j_d = i_d
end
end begin
if (@nref $N A j) != (@nref $N A i)
nowcollided[k] = true
end
end
(collided, nowcollided) = (nowcollided, collided)
end
end
ic = uniquerow
return ic
end
end

"""
groupinds(ic)

Returns a vector of vectors of integers wherein the vector of group slice
index integers as returned from `groupslices(A, dim)` is converted into a
grouped vector of vectors. Each vector entry in the returned vector of
vectors contains all of the positional indices of slices in the original
input array `A` that correspond to the unique slices along dimension `dim`
that are present in the array `C` as returned from `unique(A, dim)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the wording here is tough to follow - should probably be rephrased

"""

function groupinds(ic::Vector{Int})
Copy link

@JoshChristie JoshChristie May 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to work in my use case. (I haven't switched to your branch or updated to 0.5. I just copied your code changes)

A =  [1.0  1.0  2.0  2.0  1.0  1.0  1.0  2.0  2.0  2.0
      1.0  1.0  2.0  2.0  2.0  1.0  2.0  2.0  2.0  1.0
      1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  2.0]
ic = groupslices(A, 2)
ib = groupinds(ic)
ERROR: BoundsError: attempt to access 4-element Array{Array{Int64,1},1}:
 [1,2]  
 Int64[]
 [3,4]  
 Int64[]
  at index [5]
in groupinds at 1046

My fix:

function groupinds(ic::Vector{Int})
    d = Dict{Int, Int}()
    ia = unique(ic)
    n = length(ia)
    for i = 1:n
        d[ia[i]]= i
    end

    ib = Array(Vector{Int},n)
    for k = 1:n
        ib[k] = Int[]
    end

    for h = 1:length(ic)
        push!(ib[d[ic[h]]], h)
    end
    return ib
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, using a Dict similar to d above is along the lines of what was in the original proposal here and here. I'll incorporate your changes.

n = length(unique(ic))
ib = Array(Vector{Int},n)
for k = 1:n
ib[k] = Int[]
end
for h = 1:length(ic)
push!(ib[ic[h]], h)
end
return ib
end

"""
firstinds(ic::Vector{Int})
firstinds(ib::Vector{Vector{Int}})

Returns a vector of integers containing the first index position of each unique
value in the input integer vector `ic`, or the first index position of each
entry in the input vector of integer vectors `ib`.

When operating on the output returned from `unique(A, dim)`, the returned
vector of integers correspond to the positions of the first of each unique slice
present in the original input multidimensional array `A` along dimension `dim`.

The implementation of `firstinds` accepting a vector of integers operates on the
output returned from `groupslices(A, dim)`.

The implementation of `firstinds` accepting a vector of vector of integers
operates on the output returned from `groupinds(ic::Vector{Int})`
"""

function firstinds(ic::Vector{Int})
id = unique(ic)
n = length(id)
ia = Array(Int,n)
for i = 1:n
ia[i] = findfirst(ic, id[i])
end
return ia
end

function firstinds(ib::Vector{Vector{Int}})
ia = map(first, ib)
end

"""
lastinds(ic::Vector{Int})

Returns a vector of integers containing the last index position of each unique
value in the input integer vector `ic`.

When operating on the output returned from `groupinds(unique(A, dim))`, the
returned vector of integers correspond to the positions of the last of each
unique slice present in the original input multidimensional array `A` along
dimension `dim`.

The implementation of `firstinds` accepting a vector of vector of integers
operates on the output returned from `groupinds(ic::Vector{Int})`
"""

function lastinds(ib::Vector{Vector{Int}})
ia = map(last, ib)
end
2 changes: 1 addition & 1 deletion test/choosetests.jl
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ function choosetests(choices = [])
"markdown", "base64", "serialize", "misc", "threads",
"enums", "cmdlineargs", "i18n", "workspace", "libdl", "int",
"checked", "intset", "floatfuncs", "compile", "parallel", "inline",
"boundscheck", "error"
"boundscheck", "error", "multidimensional"
]

if Base.USE_GPL_LIBS
Expand Down
30 changes: 30 additions & 0 deletions test/multidimensional.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# This file is a part of Julia. License is MIT: http://julialang.org/license

using Base.Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add this file to the list in test/choosetests.jl for it to run


# Tests for groupslices, groupinds, firstinds, lastinds
A = [1 2 3 ; 4 5 6 ; 7 8 9]
B = [11 12 13 ; 14 15 16 ; 17 18 19]
C = [21 22 23 ; 24 25 26 ; 27 28 29]
D = cat(3, A, B, C, C, B, C, A)

ic = [1;2;3;3;2;3;1]
ib = Vector{Int}[]
push!(ib,[1;7])
push!(ib,[2;5])
push!(ib,[3;4;6])
ia = [1;2;3]
ia1 = [1;2;3]
ia2 = [7;5;6]

ic_test = groupslices(D,3)
ib_test = groupinds(ic_test)
ia_test = firstinds(ic_test)
ia1_test = firstinds(ib_test)
ia2_test = lastinds(ib_test)

@test isequal(ic, ic_test)
@test isequal(ib, ib_test)
@test isequal(ia, ia_test)
@test isequal(ia1, ia1_test)
@test isequal(ia2, ia2_test)