-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replacement implementation for groupslices and associated functions #15503
Changes from 15 commits
51f0321
a44a6fc
a3e1a58
39c341b
eb3e800
45f67d9
3f7c868
e4df809
8531de3
9e3e4dc
cc260ea
ba29479
c2ccc64
e43c6ae
1d5da95
30ab66d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -931,3 +931,171 @@ If `dim` is specified, returns unique regions of the array `itr` along `dim`. | |
@nref $N A d->d == dim ? sort!(uniquerows) : (1:size(A, d)) | ||
end | ||
end | ||
|
||
""" | ||
groupslices(A, dim) | ||
|
||
Returns a vector of integers where each integer element of the returned vector | ||
is a group number corresponding to the unique slices along dimension `dim` as | ||
returned from `unique(A, dim)`, where `A` can be a multidimensional array. | ||
|
||
Example usage: | ||
|
||
If `C = unique(A, dim)`, `ic = groupslices(A, dim)`, and | ||
rank(A) == rank(C) == 3, then: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should have backticks, and I think you mean There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will update. |
||
|
||
if dim == 1 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. code examples should be indented so the doc systen renders them as such |
||
all(A .== C[ic,:,:]) | ||
elseif dim == 2 | ||
all(A .== C[:,ic,:]) | ||
elseif dim == 3 | ||
all(A .== C[:,:,ic]) | ||
end | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This example usage fails for me. A = [1.0 1.0 2.0 2.0 1.0 1.0 1.0 2.0 2.0 2.0
1.0 1.0 2.0 2.0 2.0 1.0 2.0 2.0 2.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0]
dim = 2
C = unique(A, dim)
ic = groupslices(A, dim)
if dim == 1
all(A .== C[ic,:,:])
elseif dim == 2
all(A .== C[:,ic,:])
elseif dim == 3
all(A .== C[:,:,ic])
end
ERROR: BoundsError: attempt to access 3x4 Array{Float64,2}:
1.0 2.0 1.0 2.0
1.0 2.0 2.0 1.0
1.0 1.0 1.0 2.0
at index [Colon(),[1,1,3,3,5,1,5,3,3,10],Colon()]
in throw_boundserror at abstractarray.jl:156
in _internal_checkbounds at abstractarray.jl:176
in checkbounds at abstractarray.jl:159
in _getindex at multidimensional.jl:185
in getindex at abstractarray.jl:488 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before I make another commit, I need to work get the documentation strings integrated properly as @tkelman points out, but the changes in this gist should allow your example to work appropriately. In one of the changes made in the course of discussing #14142 related to detecting hash collisions, The changes in this gist assign the appropriate values to Thank you for testing. |
||
""" | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove the empty line between the docstring and the function it is documenting |
||
@generated function groupslices{T,N}(A::AbstractArray{T,N}, dim::Int) | ||
quote | ||
if !(1 <= dim <= $N) | ||
ArgumentError("Input argument dim must be 1 <= dim <= $N, but is currently $dim") | ||
end | ||
hashes = zeros(UInt, size(A, dim)) | ||
|
||
# Compute hash for each row | ||
k = 0 | ||
@nloops $N i A d->(if d == dim; k = i_d; end) begin | ||
@inbounds hashes[k] = hash(hashes[k], hash((@nref $N A i))) | ||
end | ||
|
||
# Collect index of first row for each hash | ||
uniquerow = Array(Int, size(A, dim)) | ||
firstrow = Dict{Prehashed,Int}() | ||
for k = 1:size(A, dim) | ||
uniquerow[k] = get!(firstrow, Prehashed(hashes[k]), k) | ||
end | ||
uniquerows = collect(values(firstrow)) | ||
|
||
# Check for collisions | ||
collided = falses(size(A, dim)) | ||
@inbounds begin | ||
@nloops $N i A d->(if d == dim | ||
k = i_d | ||
j_d = uniquerow[k] | ||
else | ||
j_d = i_d | ||
end) begin | ||
if (@nref $N A j) != (@nref $N A i) | ||
collided[k] = true | ||
end | ||
end | ||
end | ||
|
||
if any(collided) | ||
nowcollided = BitArray(size(A, dim)) | ||
while any(collided) | ||
# Collect index of first row for each collided hash | ||
empty!(firstrow) | ||
for j = 1:size(A, dim) | ||
collided[j] || continue | ||
uniquerow[j] = get!(firstrow, Prehashed(hashes[j]), j) | ||
end | ||
for v in values(firstrow) | ||
push!(uniquerows, v) | ||
end | ||
|
||
# Check for collisions | ||
fill!(nowcollided, false) | ||
@nloops $N i A d->begin | ||
if d == dim | ||
k = i_d | ||
j_d = uniquerow[k] | ||
(!collided[k] || j_d == k) && continue | ||
else | ||
j_d = i_d | ||
end | ||
end begin | ||
if (@nref $N A j) != (@nref $N A i) | ||
nowcollided[k] = true | ||
end | ||
end | ||
(collided, nowcollided) = (nowcollided, collided) | ||
end | ||
end | ||
ic = uniquerow | ||
return ic | ||
end | ||
end | ||
|
||
""" | ||
groupinds(ic) | ||
|
||
Returns a vector of vectors of integers wherein the vector of group slice | ||
index integers as returned from `groupslices(A, dim)` is converted into a | ||
grouped vector of vectors. Each vector entry in the returned vector of | ||
vectors contains all of the positional indices of slices in the original | ||
input array `A` that correspond to the unique slices along dimension `dim` | ||
that are present in the array `C` as returned from `unique(A, dim)`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the wording here is tough to follow - should probably be rephrased |
||
""" | ||
|
||
function groupinds(ic::Vector{Int}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This doesn't seem to work in my use case. (I haven't switched to your branch or updated to 0.5. I just copied your code changes) A = [1.0 1.0 2.0 2.0 1.0 1.0 1.0 2.0 2.0 2.0
1.0 1.0 2.0 2.0 2.0 1.0 2.0 2.0 2.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0]
ic = groupslices(A, 2)
ib = groupinds(ic)
ERROR: BoundsError: attempt to access 4-element Array{Array{Int64,1},1}:
[1,2]
Int64[]
[3,4]
Int64[]
at index [5]
in groupinds at 1046 My fix: function groupinds(ic::Vector{Int})
d = Dict{Int, Int}()
ia = unique(ic)
n = length(ia)
for i = 1:n
d[ia[i]]= i
end
ib = Array(Vector{Int},n)
for k = 1:n
ib[k] = Int[]
end
for h = 1:length(ic)
push!(ib[d[ic[h]]], h)
end
return ib
end There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
n = length(unique(ic)) | ||
ib = Array(Vector{Int},n) | ||
for k = 1:n | ||
ib[k] = Int[] | ||
end | ||
for h = 1:length(ic) | ||
push!(ib[ic[h]], h) | ||
end | ||
return ib | ||
end | ||
|
||
""" | ||
firstinds(ic::Vector{Int}) | ||
firstinds(ib::Vector{Vector{Int}}) | ||
|
||
Returns a vector of integers containing the first index position of each unique | ||
value in the input integer vector `ic`, or the first index position of each | ||
entry in the input vector of integer vectors `ib`. | ||
|
||
When operating on the output returned from `unique(A, dim)`, the returned | ||
vector of integers correspond to the positions of the first of each unique slice | ||
present in the original input multidimensional array `A` along dimension `dim`. | ||
|
||
The implementation of `firstinds` accepting a vector of integers operates on the | ||
output returned from `groupslices(A, dim)`. | ||
|
||
The implementation of `firstinds` accepting a vector of vector of integers | ||
operates on the output returned from `groupinds(ic::Vector{Int})` | ||
""" | ||
|
||
function firstinds(ic::Vector{Int}) | ||
id = unique(ic) | ||
n = length(id) | ||
ia = Array(Int,n) | ||
for i = 1:n | ||
ia[i] = findfirst(ic, id[i]) | ||
end | ||
return ia | ||
end | ||
|
||
function firstinds(ib::Vector{Vector{Int}}) | ||
ia = map(first, ib) | ||
end | ||
|
||
""" | ||
lastinds(ic::Vector{Int}) | ||
|
||
Returns a vector of integers containing the last index position of each unique | ||
value in the input integer vector `ic`. | ||
|
||
When operating on the output returned from `groupinds(unique(A, dim))`, the | ||
returned vector of integers correspond to the positions of the last of each | ||
unique slice present in the original input multidimensional array `A` along | ||
dimension `dim`. | ||
|
||
The implementation of `firstinds` accepting a vector of vector of integers | ||
operates on the output returned from `groupinds(ic::Vector{Int})` | ||
""" | ||
|
||
function lastinds(ib::Vector{Vector{Int}}) | ||
ia = map(last, ib) | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# This file is a part of Julia. License is MIT: http://julialang.org/license | ||
|
||
using Base.Test | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. need to add this file to the list in |
||
|
||
# Tests for groupslices, groupinds, firstinds, lastinds | ||
A = [1 2 3 ; 4 5 6 ; 7 8 9] | ||
B = [11 12 13 ; 14 15 16 ; 17 18 19] | ||
C = [21 22 23 ; 24 25 26 ; 27 28 29] | ||
D = cat(3, A, B, C, C, B, C, A) | ||
|
||
ic = [1;2;3;3;2;3;1] | ||
ib = Vector{Int}[] | ||
push!(ib,[1;7]) | ||
push!(ib,[2;5]) | ||
push!(ib,[3;4;6]) | ||
ia = [1;2;3] | ||
ia1 = [1;2;3] | ||
ia2 = [7;5;6] | ||
|
||
ic_test = groupslices(D,3) | ||
ib_test = groupinds(ic_test) | ||
ia_test = firstinds(ic_test) | ||
ia1_test = firstinds(ib_test) | ||
ia2_test = lastinds(ib_test) | ||
|
||
@test isequal(ic, ic_test) | ||
@test isequal(ib, ib_test) | ||
@test isequal(ia, ia_test) | ||
@test isequal(ia1, ia1_test) | ||
@test isequal(ia2, ia2_test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see CONTRIBUTING.md, the signature line needs to be added to the rst docs, then run
make docs
to populate the docstring contentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which *.rst file should the signature line for this function live? At present, there is no multidimensional.rst file.