Proposal for function uniqueslices #14142
Closed
7 commits
6873024
Proposal for function uniqueind
AndyGreenwell 4d0a8c1
Remove trailing whitespace
AndyGreenwell d0b09ab
Rest of the trailing whitespaces
AndyGreenwell cdfd617
Last one...hopefully.
AndyGreenwell 547600a
Simple doc updates
AndyGreenwell 0779baf
Whitespace removal from last update.
AndyGreenwell 7fe7997
Changing function name to `uniqueslices`
AndyGreenwell
@@ -863,3 +863,193 @@ If `dim` is specified, returns unique regions of the array `itr` along `dim`.
        @nref $N A d->d == dim ? sort!(uniquerows) : (1:size(A, d))
    end
end

"""
    C, ia, ib, ic = uniqueslices(A, dim)

A function that operates similarly to `unique(A, dim)` but returns multiple
output arguments with the following properties:

C  - the unique slices of the array `A` along dimension `dim`, in order of first
     occurrence
ia - a Vector{Int} whose k-th entry is the index along `dim` of the first occurrence
     of the k-th slice of `C` in `A`, so that indexing `A` with `ia` in position `dim`
     (and `:` in every other position) gives `C`:
     `A[ia] == C` if `A` is a one-dimensional array and `dim == 1`
     `A[ia,:] == C` if `A` is a two-dimensional array and `dim == 1`
     `A[:,ia] == C` if `A` is a two-dimensional array and `dim == 2`
     `A[:,:,ia] == C` if `A` is a three-dimensional array and `dim == 3`
     and so forth for higher-dimensional arrays.
ib - a Vector{Vector{Int}} whose k-th entry lists every index along `dim` at which
     the k-th slice of `C` occurs in `A`
ic - a Vector{Int} whose j-th entry is the position in `C` of the j-th slice of `A`
     along `dim`, so that indexing `C` with `ic` in position `dim` (and `:` in every
     other position) gives back `A`:
     `C[ic] == A` if `A` is a one-dimensional array and `dim == 1`
     `C[ic,:] == A` if `A` is a two-dimensional array and `dim == 1`
     `C[:,ic] == A` if `A` is a two-dimensional array and `dim == 2`
     `C[:,:,ic] == A` if `A` is a three-dimensional array and `dim == 3`
     and so forth for higher-dimensional arrays.

Examples:
Review comment: these should probably be formatted as doctests via ```jldoctest

julia> A = [1;2;3;2;3;5;6;5;7;1];
julia> C, ia, ib, ic = uniqueslices(A,1)
([1,2,3,5,6,7],[1,2,3,6,7,9],[[1,10],[2,4],[3,5],[6,8],[7],[9]],[1,2,3,2,3,4,5,4,6,1])
julia> C[ic] == A
true
julia> A[ia] == C
true
julia> ib
6-element Array{Array{Int64,1},1}:
 [1,10]
 [2,4]
 [3,5]
 [6,8]
 [7]
 [9]

julia> D = [1 2 3 ; 4 5 6];
julia> E = [11 12 13; 14 15 16];
julia> F = [21 22 23; 24 25 26];
julia> A = cat(3, D, F, E, E, D)
2x3x5 Array{Int64,3}:
[:, :, 1] =
 1  2  3
 4  5  6

[:, :, 2] =
 21  22  23
 24  25  26

[:, :, 3] =
 11  12  13
 14  15  16

[:, :, 4] =
 11  12  13
 14  15  16

[:, :, 5] =
 1  2  3
 4  5  6

julia> C, ia, ib, ic = uniqueslices(A,3)
(
2x3x3 Array{Int64,3}:
[:, :, 1] =
 1  2  3
 4  5  6

[:, :, 2] =
 21  22  23
 24  25  26

[:, :, 3] =
 11  12  13
 14  15  16,

[1,2,3],[[1,5],[2],[3,4]],[1,2,3,3,1])

julia> A[:,:,ia] == C
true
julia> C[:,:,ic] == A
true
julia> ib
3-element Array{Array{Int64,1},1}:
 [1,5]
 [2]
 [3,4]

"""
@generated function uniqueslices{T,N}(A::AbstractArray{T,N}, dim::Int)
    quote
        # Out-of-range `dim`: mirror `unique` and return `A` unchanged, but as a
        # 4-tuple so that `C, ia, ib, ic = uniqueslices(A, dim)` still destructures
        1 <= dim <= $N || return (copy(A), [1], Vector{Int}[[1]], [1])
        hashes = zeros(UInt, size(A, dim))

        # Compute a hash for each slice along `dim`
        k = 0
        @nloops $N i A d->(if d == dim; k = i_d; end) begin
            @inbounds hashes[k] = hash(hashes[k], hash((@nref $N A i)))
        end

        # Collect the index of the first slice for each hash
        uniquerow = Array(Int, size(A, dim))
        ic = Array(Int, size(A, dim))
        ia = Int[]
        firstrow = Dict{Prehashed,Int}()
        icdict = Dict{Int,Int}()
        iadict = Dict{UInt,Int}()
        h = 0
        for k = 1:size(A, dim)
            tmp = get!(firstrow, Prehashed(hashes[k]), k)
            uniquerow[k] = tmp
            if !haskey(icdict,tmp)
                h += 1
                icdict[tmp] = h
                ic[k] = h
            else
                ic[k] = icdict[tmp]
            end
            if !haskey(iadict,hashes[k])
                iadict[hashes[k]] = k
                push!(ia,k)
            end
        end
        uniquerows = collect(values(firstrow))

        # Check for collisions
        collided = falses(size(A, dim))
        @inbounds begin
            @nloops $N i A d->(if d == dim
                k = i_d
                j_d = uniquerow[k]
            else
                j_d = i_d
            end) begin
                if (@nref $N A j) != (@nref $N A i)
                    collided[k] = true
                end
            end
        end

        if any(collided)
            nowcollided = BitArray(size(A, dim))
            while any(collided)
                # Collect the index of the first slice for each collided hash
                empty!(firstrow)
                for j = 1:size(A, dim)
                    collided[j] || continue
                    uniquerow[j] = get!(firstrow, Prehashed(hashes[j]), j)
                end
                for v in values(firstrow)
                    push!(uniquerows, v)
                end

                # Check for collisions
                fill!(nowcollided, false)
                @nloops $N i A d->begin
                    if d == dim
                        k = i_d
                        j_d = uniquerow[k]
                        (!collided[k] || j_d == k) && continue
                    else
                        j_d = i_d
                    end
                end begin
                    if (@nref $N A j) != (@nref $N A i)
                        nowcollided[k] = true
                    end
                end
                (collided, nowcollided) = (nowcollided, collided)
            end

            # ia and ic were built from the hashes alone, so colliding but distinct
            # slices were merged; rebuild both from the resolved first-slice indices
            empty!(ia)
            for k = 1:size(A, dim)
                uniquerow[k] == k && push!(ia, k)
                ic[k] = searchsortedfirst(ia, uniquerow[k])
            end
        end

        C = @nref $N A d->d == dim ? sort!(uniquerows) : (1:size(A, d))

        ib = Array(Vector{Int},length(ia))
        for k = 1:length(ia)
            ib[k] = Int[]
        end
        for h = 1:length(ic)
            push!(ib[ic[h]], h)
        end

        return C, ia, ib, ic
    end
end
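For comparison, here is a minimal sketch of the same contract in present-day Julia (1.x), assuming Base's `selectdim` (Julia 0.7 and later). It swaps the PR's hash-then-verify passes for a `Dict` keyed on copies of the slices themselves, so hash collisions are resolved by `isequal` inside the `Dict`. `uniqueslices_sketch` is a hypothetical name, and unlike the PR it throws for an out-of-range `dim`:

```julia
# Hypothetical Julia 1.x sketch of the uniqueslices contract; this is NOT the
# PR's hash-and-verify algorithm. Requires `selectdim` (Julia >= 0.7).
function uniqueslices_sketch(A::AbstractArray, dim::Integer)
    1 <= dim <= ndims(A) || throw(ArgumentError("dim must be in 1:ndims(A)"))
    n = size(A, dim)
    # Keys are copies of the slices; the Dict resolves hash collisions via isequal
    slot = Dict{Array{eltype(A)},Int}()
    ia = Int[]                    # first occurrence of each unique slice
    ib = Vector{Int}[]            # all occurrences of each unique slice
    ic = Vector{Int}(undef, n)    # position in C of each slice of A
    for k in 1:n
        s = copy(selectdim(A, dim, k))
        j = get!(slot, s) do
            push!(ia, k)          # first time this slice is seen
            push!(ib, Int[])
            length(ia)
        end
        push!(ib[j], k)
        ic[k] = j
    end
    C = copy(selectdim(A, dim, ia))   # unique slices in first-occurrence order
    return C, ia, ib, ic
end
```

Because keys are compared with `isequal` rather than `==`, floating-point slices containing `NaN` or `-0.0` may be grouped differently than by an `==`-based check. Copying every slice also costs extra memory, which the PR's index-based approach avoids.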
Could you provide a general description of the meaning of this vector? Then the examples below will make it more concrete. Without this "and so forth for higher dimensional arrays." is a bit imprecise.
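One way to state these relations precisely for any number of dimensions is with Base's `selectdim`, which selects slices along a given dimension (it appeared in Julia 0.7, so it is newer than this PR). `check_uniqueslices` is a hypothetical helper, sketched only to pin down what `ia`, `ib` and `ic` mean:

```julia
# Hypothetical helper (not part of the PR): checks the documented relations
# between A and the outputs C, ia, ib, ic for an array of any dimension.
function check_uniqueslices(A::AbstractArray, dim::Integer, C, ia, ib, ic)
    # ia: selecting slices ia of A along `dim` gives C
    selectdim(A, dim, ia) == C || return false
    # ic: selecting slices ic of C along `dim` rebuilds A
    selectdim(C, dim, ic) == A || return false
    # ib: taken together, the groups list every index along `dim` exactly once...
    sort!(vcat(Int[], ib...)) == collect(axes(A, dim)) || return false
    # ...and every index in ib[j] points at a copy of the j-th slice of C
    return all(selectdim(A, dim, k) == selectdim(C, dim, j)
               for (j, ks) in enumerate(ib) for k in ks)
end
```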
Also, I wonder whether the names of the arguments couldn't be improved. Does `ia` stand for "indices in A" and `ic` for "indices in C"? In that case, as in MATLAB, you'd better rename `itr` to `a`, and make this meaning explicit in parentheses.

It also means `ib` doesn't stand for anything. Actually, I think you'd better not return it at all, as it can trivially be computed from `ic` IIUC. At the very least, it should have a more explicit name and be returned as the last result, as it is quite different from `ia` and `ic`.
The names `ia` and `ic` were chosen based on the MATLAB(R) names but can certainly be changed as desired by the community. The choice of `itr` was just based on making minimal changes from the existing `unique` function, but `A` would be a closer match to the existing MATLAB name. EDIT: `itr` has been changed to `A` in the working copy of the doc string.

Along with `C`, the vector of vectors `ib` is essentially part of what is returned by the `group` function in q, also known as the left (=) operator in K. Returning this output was specifically requested of me, and it is also why I mentioned above that it might be preferable to have multiple functions returning different outputs instead of just one function returning the four outputs included here.
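For a one-dimensional array, the q-style grouping described above (each distinct value mapped to the indices where it occurs) takes only a few lines of Julia. This is a sketch: `group_indices` is a hypothetical name, its indices are 1-based rather than q's 0-based, and a plain `Dict` does not keep its keys in order of first appearance the way q's `group` does:

```julia
# Hypothetical helper: map each distinct value of v to the (1-based) indices
# where it occurs, in the spirit of q's `group`.
function group_indices(v::AbstractVector)
    groups = Dict{eltype(v),Vector{Int}}()
    for (k, x) in pairs(v)
        # get! creates an empty index list the first time x is seen
        push!(get!(Vector{Int}, groups, x), k)
    end
    return groups
end
```

If the first-appearance order matters, `[g[x] for x in unique(v)]` (with `g = group_indices(v)`) lists the groups in that order, which matches `ib` for a vector.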
In Julia too, `itr` is usually reserved for general iterables, and `a` or `A` is used for arrays.

Maybe we can find a short way of computing it from `ic`, which could be given in the docs for those who need it?
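A short way to compute `ib` from `ic` is a single pass that appends each slice index to the bucket of its unique slice. This sketch assumes `ic` takes every value in `1:maximum(ic)`, as it does in the docstring examples; `ib_from_ic` is a hypothetical name:

```julia
# Hypothetical helper: rebuild ib from ic in one O(length(ic)) pass.
# Assumes ic[k] is the position in C of the k-th slice of A along `dim`.
function ib_from_ic(ic::AbstractVector{<:Integer})
    nunique = isempty(ic) ? 0 : maximum(ic)
    ib = [Int[] for _ in 1:nunique]
    for (k, j) in enumerate(ic)
        push!(ib[j], k)   # slice k of A is a copy of slice j of C
    end
    return ib
end
```

For the docs, the one-liner `[findall(==(j), ic) for j in 1:maximum(ic)]` gives the same result for non-empty `ic`, at the cost of one scan of `ic` per unique slice.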
For people coming to Julia from K/q, having to execute two functions to get back the results of what is a single-character operator (`=A`) applied to an array is incredibly verbose. For that part of the audience, a single function call will likely be preferred.