Proposal for function uniqueslices #14142
@@ -863,3 +863,193 @@ If `dim` is specified, returns unique regions of the array `itr` along `dim`.
        @nref $N A d->d == dim ? sort!(uniquerows) : (1:size(A, d))
    end
end

"""
    C, ia, ib, ic = uniqueind(itr, dim)

A function that operates similarly to `unique(itr,dim)` but returns multiple
output arguments having the following properties:

C - the unique elements of the array `itr` along the selected dimension `dim`
ia - a Vector{Int} of indices such that:
    `itr[ia] == C` returns `true` if `itr` is a one-dimensional array and `dim == 1`
Review comment: I would write this more simply as "
Review comment: Updated as suggested.
    `itr[ia,:] == C` returns `true` if `itr` is a two-dimensional array and `dim == 1`
    `itr[:,ia] == C` returns `true` if `itr` is a two-dimensional array and `dim == 2`
    `itr[:,:,ia] == C` returns `true` if `itr` is a three-dimensional array and `dim == 3`
    and so forth for higher dimensional arrays.
ib - a Vector{Vector{Int}} where each Vector{Int} contains the indices associated with the
    individual entries of `C` along the dimension `dim`
ic - a Vector{Int} of indices such that:
    `C[ic] == itr` returns `true` if `itr` is a one-dimensional array and `dim == 1`
    `C[ic,:] == itr` returns `true` if `itr` is a two-dimensional array and `dim == 1`
    `C[:,ic] == itr` returns `true` if `itr` is a two-dimensional array and `dim == 2`
    `C[:,:,ic] == itr` returns `true` if `itr` is a three-dimensional array and `dim == 3`
    and so forth for higher dimensional arrays.

Examples:
Review comment: These should probably be formatted as doctests via a `jldoctest` block.

julia> itr = [1;2;3;2;3;5;6;5;7;1];
julia> C, ia, ib, ic = uniqueind(itr,1)
([1,2,3,5,6,7],[1,2,3,6,7,9],[[1,10],[2,4],[3,5],[6,8],[7],[9]],[1,2,3,2,3,4,5,4,6,1])
julia> C[ic] == itr
true
julia> itr[ia] == C
true
julia> ib
6-element Array{Array{Int64,1},1}:
 [1,10]
 [2,4]
 [3,5]
 [6,8]
 [7]
 [9]

julia> D = rand(Int8,2,3,1);
Review comment: Better always use the same values for doctests I think. That will allow running them as tests.
Review comment: I will add a call to
Review comment: I'd simply create arrays from fixed inputs. This has the advantage of allowing the use of small positive integers, making the output easier to read.
Review comment: Updated as suggested.
julia> E = rand(Int8,2,3,1);
julia> F = rand(Int8,2,3,1);
julia> A = cat(3, D, F, E, E, D)
2x3x5 Array{Int8,3}:
[:, :, 1] =
   2  -95  -25
 -94  -60  -74

[:, :, 2] =
 -125   71  -58
   31  -71   -1

[:, :, 3] =
 -79  -33  -46
  80   76  -85

[:, :, 4] =
 -79  -33  -46
  80   76  -85

[:, :, 5] =
   2  -95  -25
 -94  -60  -74

julia> C, ia, ib, ic = uniqueind(A,3)
(
2x3x3 Array{Int8,3}:
[:, :, 1] =
   2  -95  -25
 -94  -60  -74

[:, :, 2] =
 -125   71  -58
   31  -71   -1

[:, :, 3] =
 -79  -33  -46
  80   76  -85,

[1,2,3],[[1,5],[2],[3,4]],[1,2,3,3,1])

julia> A[:,:,ia] == C
true
julia> C[:,:,ic] == A
true
julia> ib
3-element Array{Array{Int64,1},1}:
 [1,5]
 [2]
 [3,4]

"""
@generated function uniqueind{T,N}(A::AbstractArray{T,N}, dim::Int)
    quote
        1 <= dim <= $N || return copy(A)
        hashes = zeros(UInt, size(A, dim))

        # Compute hash for each row
        k = 0
        @nloops $N i A d->(if d == dim; k = i_d; end) begin
            @inbounds hashes[k] = hash(hashes[k], hash((@nref $N A i)))
        end

        # Collect index of first row for each hash
        uniquerow = Array(Int, size(A, dim))
        ic = Array(Int, size(A, dim))
        ia = Int[]
        firstrow = Dict{Prehashed,Int}()
        icdict = Dict{Int,Int}()
        iadict = Dict{UInt,Int}()
        h = 0
        for k = 1:size(A, dim)
            tmp = get!(firstrow, Prehashed(hashes[k]), k)
            uniquerow[k] = tmp
            if !haskey(icdict,tmp)
                h += 1
                icdict[tmp] = h
                ic[k] = h
            else
                ic[k] = icdict[tmp]
            end
            if !haskey(iadict,hashes[k])
                iadict[hashes[k]] = k
                push!(ia,k)
            end
        end
        uniquerows = collect(values(firstrow))

        # Check for collisions
        collided = falses(size(A, dim))
        @inbounds begin
            @nloops $N i A d->(if d == dim
                    k = i_d
                    j_d = uniquerow[k]
                else
                    j_d = i_d
                end) begin
                if (@nref $N A j) != (@nref $N A i)
                    collided[k] = true
                end
            end
        end

        if any(collided)
            nowcollided = BitArray(size(A, dim))
            while any(collided)
                # Collect index of first row for each collided hash
                empty!(firstrow)
                for j = 1:size(A, dim)
                    collided[j] || continue
                    uniquerow[j] = get!(firstrow, Prehashed(hashes[j]), j)
                end
                for v in values(firstrow)
                    push!(uniquerows, v)
                end

                # Check for collisions
                fill!(nowcollided, false)
                @nloops $N i A d->begin
                    if d == dim
                        k = i_d
                        j_d = uniquerow[k]
                        (!collided[k] || j_d == k) && continue
                    else
                        j_d = i_d
                    end
                end begin
                    if (@nref $N A j) != (@nref $N A i)
                        nowcollided[k] = true
                    end
                end
                (collided, nowcollided) = (nowcollided, collided)
            end
        end

        C = @nref $N A d->d == dim ? sort!(uniquerows) : (1:size(A, d))

        ib = Array(Vector{Int},length(ia))
        for k = 1:length(ia)
            ib[k] = Int[]
        end
        for h = 1:length(ic)
            push!(ib[ic[h]], h)
        end

        return C, ia, ib, ic
    end
end

Review comment: Could you provide a general description of the meaning of this vector? Then the examples below will make it more concrete. Without this, "and so forth for higher dimensional arrays." is a bit imprecise.
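
One dimension-independent way to state the contract is through `selectdim`, which indexes a single dimension and leaves the others untouched. The sketch below is only an illustration under stated assumptions: `uniqueind_sketch` is a hypothetical, quadratic-time reference written in current Julia syntax, not the hashing implementation in the diff, and it omits `ib`.

```julia
# Hypothetical reference (not the PR's code): compare slices pairwise.
# Assumes a 1-based array and 1 <= dim <= ndims(A).
function uniqueind_sketch(A::AbstractArray, dim::Integer)
    slices = [selectdim(A, dim, k) for k in 1:size(A, dim)]
    ia = Int[]   # index in A of the first occurrence of each distinct slice
    ic = Int[]   # for every slice of A, its position in C
    for (k, s) in enumerate(slices)
        j = findfirst(i -> isequal(slices[i], s), ia)
        if j === nothing
            push!(ia, k)
            push!(ic, length(ia))
        else
            push!(ic, j)
        end
    end
    C = copy(selectdim(A, dim, ia))
    return C, ia, ic
end

A = cat([1 2; 3 4], [5 6; 7 8], [1 2; 3 4]; dims=3)
C, ia, ic = uniqueind_sketch(A, 3)

# The two properties from the docstring, written once for any `dim`:
@assert selectdim(A, 3, ia) == C   # picking the first occurrences yields C
@assert selectdim(C, 3, ic) == A   # expanding C through ic rebuilds A
```

With this phrasing, `ia` reads as "indices into `A` that select `C`" and `ic` as "indices into `C` that reconstruct `A`", both along `dim`.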

Review comment: Also, I wonder whether the names of the arguments couldn't be improved. Does `ia` stand for "indices in A" and `ic` for "indices in C"? In that case, as in MATLAB, you'd better rename `itr` to `a`, and state this meaning explicitly in parentheses.

It also means `ib` doesn't stand for anything. Actually, I think you'd better not return it at all, as it can trivially be computed from `ic`, IIUC. At the very least, it should have a more explicit name and be returned as the last result, as it is quite different from `ia` and `ic`.

Review comment: The names `ia` and `ic` were based on the MATLAB(R) names but can certainly be changed as desired by the community.

The choice of `itr` was just based on making minimal changes from the existing `unique` function, but `A` would be a closer choice to the existing MATLAB name. EDIT: `itr` has been changed to `A` in the working copy of the docstring.

Along with `C`, the vector of vectors `ib` is essentially part of what is returned from the `group` function in q, also known as the left (=) operator in K. Returning this output is functionality that was specifically requested of me to implement, but it is also why I mentioned above that it might be desirable to have multiple functions returning different outputs instead of just one returning the four outputs included here.

Review comment: In Julia too, `itr` is usually reserved for general iterables, and `a` or `A` used for arrays.

Maybe we can find a short way of computing it from `ic`, which could be given in the docs for those who need it?
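
As a rough sketch of what such a docs snippet might look like, the grouping can be rebuilt from `ic` in a single pass. The helper name `groupindices` is hypothetical, the code uses current Julia syntax rather than the 0.5-era syntax of the diff, and it assumes `ic` numbers the unique entries `1:n` as in the docstring examples.

```julia
# Hypothetical helper (not part of the PR): rebuild the grouping `ib` from `ic`.
# Assumes every value in 1:maximum(ic) occurs in `ic`.
function groupindices(ic::AbstractVector{<:Integer})
    n = isempty(ic) ? 0 : maximum(ic)
    ib = [Int[] for _ in 1:n]      # one bucket per unique entry of C
    for (k, c) in enumerate(ic)
        push!(ib[c], k)            # position k of A belongs to unique entry c
    end
    return ib
end

ic = [1, 2, 3, 2, 3, 4, 5, 4, 6, 1]   # `ic` from the first docstring example
ib = groupindices(ic)
@assert ib == [[1, 10], [2, 4], [3, 5], [6, 8], [7], [9]]
```

A one-line alternative is `[findall(ic .== c) for c in 1:maximum(ic)]`, at the cost of one pass over `ic` per unique entry.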

Review comment: For people coming to Julia from K/q, having to execute two functions to get back the results of what is a single character operator (`=A`) applied to an array is incredibly verbose. For that part of the audience, a single function call will likely be preferred.