Proposal for function uniqueslices #14142

Closed
wants to merge 7 commits into from

AndyGreenwell
Contributor

This commit introduces a function uniqueind that is a slight modification of the existing unique function accepting a dim argument but returns an additional three index vectors.

The outputs, as described in the docstring, are as follows:

C - an array of the unique elements/rows/columns/hyperplanes of the input AbstractArray itr along the dimension dim

ia - a vector of index values such that slice indexing into itr along dimension dim re-produces C.
For example, if itr is a vector, itr[ia] == C returns true

ib - a vector of vectors of integers, where each inner vector contains the index positions within itr at which the corresponding unique element/row/column/hyperplane of C occurs

ic - a vector of index values such that slice indexing into C along dimension dim re-produces itr
For example, if itr is a vector, C[ic] == itr returns true

The presence of ia, ib, and ic is meant to mimic the behaviors (but with potentially different output order) of various unique and group functions found in a few other technical computing languages. This function provides all three of those outputs from a single function, while some other languages might only produce one or two arrays of index vectors from a given function.

It might be decided upon review that instead of a single function returning all four outputs, there should be multiple functions that return separate output arguments and more closely mimic similar functions from other languages.

This PR is in relation to issue #1845.
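
For readers following along, here is a minimal sketch of the four outputs for the vector case only. The helper name `unique_with_inds` is hypothetical and this is not the PR's implementation; it is written in current Julia 1.x syntax rather than the 0.4-era syntax used elsewhere in this thread, and it assumes unique values are reported in order of first appearance.

```julia
# Hypothetical illustration for plain vectors; not the code from this PR.
function unique_with_inds(v::Vector{T}) where {T}
    C  = T[]                             # unique values, in order of first appearance
    ia = Int[]                           # index in v of the first occurrence of each unique value
    ib = Vector{Int}[]                   # for each unique value, every index in v where it occurs
    ic = Vector{Int}(undef, length(v))   # for each element of v, its position in C
    seen = Dict{T,Int}()                 # value => position in C
    for (i, x) in enumerate(v)
        k = get!(seen, x) do
            # First time this value is seen: register a new group.
            push!(C, x)
            push!(ia, i)
            push!(ib, Int[])
            length(C)
        end
        push!(ib[k], i)
        ic[i] = k
    end
    return C, ia, ib, ic
end

v = [3, 1, 3, 2, 1]
C, ia, ib, ic = unique_with_inds(v)
@assert v[ia] == C   # ia selects one representative per unique value
@assert C[ic] == v   # ic rebuilds the input from the unique values
```

In this sketch `ib` comes out as `[[1, 3], [2, 5], [4]]`, so `ia` is simply the first entry of each inner vector.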

This commit introduces a function `uniqueind` that is a slight modification of the existing `unique` function accepting a `dim` argument but returns an additional three index vectors.

The outputs, as described in the docstring, are as follows:

`C` - an array of the unique elements/rows/columns/hyperplanes of the input AbstractArray `itr` along the dimension `dim`
`ia` - a vector of index values such that slice indexing into `itr` along dimension `dim` reproduces `C`.
        For example, if `itr` is a vector, `itr[ia] == C` returns `true`
`ib` - a vector of vectors of integers, where each inner vector contains the index positions within `itr` at which the corresponding unique element/row/column/plane of `C` occurs
`ic` - a vector of index values such that slice indexing into `C` along dimension `dim` reproduces `itr`
        For example, if `itr` is a vector, `C[ic] == itr` returns `true`

The presence of `ia`, `ib`, and `ic` is meant to mimic the behaviors (but with potentially different output order) of various unique and group functions found in a few other technical computing languages. This function provides all three of those outputs from a single function, while some other languages might only produce one or two arrays of index vectors from a given function.

It might be decided upon review that instead of a single function returning all four outputs, there should be multiple functions that return separate output arguments and more closely mimic similar functions from other languages.

This PR is in relation to issue #1845.
`C[:,:,ic] == itr` returns `true` if `itr` is a three-dimensional array and `dim == 3`
and so forth for higher dimensional arrays.

Examples:
Contributor

these should probably be formatted as doctests via ```jldoctest

Removing a trailing whitespace that caused a build failure.
Missed a few last time.
One more trailing whitespace.

C - the unique elements of the array `itr` along the selected dimension `dim`
ia - a Vector{Int} of indices such that:
`itr[ia] == C` returns `true` if `itr` is a one-dimensional array and `dim == 1`
Member

I would write this more simply as "itr[ia] == C", without " returns true"

Contributor Author

Updated as suggested.

@nalimilan
Member

Thanks for this. Here are a few general comments.

One is about the naming. I would be fine with uniqueind, but it is very similar to findmin/findmax, which are called quite differently (we also have indmin and indmax, which are a bit different). In #1845 it was mentioned that adding a suffix is a better solution since it makes it easier to find the function from unique thanks to autocompletion. It is also more consistent with sortperm. If we go that route, we need to deprecate findmin and findmax in favor of maxind and minind, or maybe better maximumind and minimumind (to avoid the confusion with max and min, which are different functions than maximum and minimum).

A more complex question concerns composability, as noted by @StefanKarpinski here: #1845 (comment) Do you have any idea about whether it would be doable to base unique on uniqueind without affecting performance?

Another point is that it would be great to have a generic implementation for iterables, which wouldn't support the dims argument, but would take an optional function like unique.

Finally, you should definitely add tests for this. You can take inspiration from those existing for unique.

@AndyGreenwell
Contributor Author

In response to the four points from @nalimilan above:

  • Regarding function naming, I just went with a version of the first name that @ViralBShah suggested in Return index vectors from unique #1845 and figured that would trigger this discussion to continue once again. Suggestions are welcome, and should be discussed in light of the suggested deprecations.
  • Regarding composability, it seems to me as if basing unique off whatever is decided upon for this function (or multiple functions that might come from this work) would affect performance of unique negatively, because this new function is performing extra work to return the additional outputs, but also because the type of the output is different between unique (an array) and this function (a tuple of multiple arrays).
  • I'll have a go at determining how this function might be structured for iterables, but that might take a little while to get right.
  • Tests will be added, and the examples converted to doctests this coming week. It is Thanksgiving weekend here in the US. I finished the initial implementation at the end of a workday a couple of hours before needing to go to the airport, and just wanted to get the code out into the world so people could review and comment. Thank you for doing so.

@nalimilan
Member

Regarding composability, it seems to me as if basing unique off whatever is decided upon for this function (or multiple functions that might come from this work) would affect performance of unique negatively, because this new function is performing extra work to return the additional outputs, but also because the type of the output is different between unique (an array) and this function (a tuple of multiple arrays).

I hadn't noticed this difference with unique. Any reason why you chose a different behavior? I think it would be better to be consistent, whether unique or your code is changed. It sounds a bit weird to me that unique returns an array, as this can be quite wasteful if the number of unique elements varies greatly across dimensions.

@nalimilan
Member

CC: @simonster, who wrote unique(a, dims).

@AndyGreenwell
Contributor Author

The output of unique when the dim argument is passed is essentially C from this function. I just gave it a name so that I could return other outputs.

@nalimilan
Member

OK, I misunderstood your reply, I thought you said C was different.

@ViralBShah ViralBShah added this to the 0.5.0 milestone Nov 29, 2015
@AndyGreenwell
Contributor Author

It sounds a bit weird to me that unique returns an array, as this can be quite wasteful if the number of unique elements varies greatly across dimensions.

When the dim argument is applied in unique on a 2-dimensional or higher array (matrix, 3D-array, ND-array, etc) the unique entries returned as part of C are the unique rows/columns/planes/hyperplanes of the input array along dim, not unique individual scalar elements across the entire array nor individual unique scalar elements across individual rows/columns/planes/hyperplanes. This is similar behavior to both the MATLAB(R) unique function and the K/q group(=) function.
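
A small example of the distinction, as a sketch in current Julia 1.x, where the dimension is passed as the `dims` keyword rather than the positional `unique(A, dim)` form used in this thread:

```julia
A = [1 2;
     3 4;
     1 2]

# Unique rows (slices along dimension 1): the repeated row [1 2] is kept once.
rows = unique(A, dims=1)
@assert rows == [1 2; 3 4]

# For contrast, unique scalar elements over the whole array (column-major order).
@assert unique(A) == [1, 3, 2, 4]
```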

@tkelman
Contributor

tkelman commented Nov 29, 2015

That seems to have more in common with sortrows. In Matlab you have to specifically ask for unique(A, 'rows') and it changes behavior enough that I'm wondering whether we should give it a separate name.

@AndyGreenwell
Contributor Author

Suggestions for separate names (and separate functions with different combinations of outputs) are welcome and encouraged.

This issue provides a set of possible output arguments that have been found useful in different technical computing languages. Defining the appropriate function names and api(s) for each possible set of output arguments that might need to be returned is what will hopefully be the result that closes this issue and #1845.

This commit makes the simple doc updates suggested in #14142 to change `itr` to `A` and use a fixed set of integers, instead of random integers, when constructing the multidimensional array examples.
@AndyGreenwell
Contributor Author

Performing a search for synonyms of "unique", here are a few possible terms:

synonyms: distinctive, distinct, individual, special, idiosyncratic; single, sole, lone, unrepeated, unrepeatable, solitary, exclusive, rare, uncommon, unusual, sui generis;
informal: one-off, one-of-a-kind, once-in-a-lifetime, one-shot

What might people think of naming the function(s) distinct? Or since this two argument function operates on hyperplanes, distinctplanes? The other term in that list that (to me) most clearly describes the intent would be unrepeated, but I guess I like the idea of a name in the affirmative as opposed to a negative.

@tkelman
Contributor

tkelman commented Nov 30, 2015

uniqueslices, uniqueind, or distinct could all work for different pieces of this. distinct has precedent in SQL, right? And isn't your ib just groupby ? edit: there's a groupby in Iterators.jl, worth looking at?

@AndyGreenwell
Contributor Author

Along with C, ib is essentially group (=) from K/q, but instead of returning a dict/table, it's an array and a vector of vectors.

Since no separate conditionals or functions to apply are being passed here, I would guess that this is a bit less generic than most groupby implementations.

The groupby in Iterators.jl is described as performing the following operation "Group consecutive values that share the same result of applying f." What is returned with C and ib in the code for this PR does not concern itself with consecutive values, so it's different behavior.

Since all of the English words are taken, why not try using a similar word from another language (according to Google Translate):

English (or French or Latin) "unique" to:

Spanish - único
Italian - unico
German - einzigartig
Dutch - uniek
Filipino - kakaiba
Malay, Albanian - unik
Esperanto - solaj
Catalan - únic
Croatian - jedinstvena
Azerbaijani - unikal
Basque - berezia
Swahili - kipekee
Welsh - unigryw
Lithuanian - unikalas

@nalimilan
Member

I like único in that people with a non-Latin 1 keyboard won't likely know how to type it, but kakaiba wins (ex æquo with some others) regarding non-discoverability. :-)

Seriously, uniqueslices sounds like the best choice to me. It starts with the same prefix, which makes it easy to find both unique and uniqueslices when looking for unique-like behavior, and it's quite explicit about what it does. sortrows and sortcols could later be renamed to sortslices for full generality.

(groupby serves a different purpose, as it applies a reduction to rows based on some variables. distinct could be used, but isn't immediately obvious how it differs from unique, and actually some SQL implementations call it unique...)

@tkelman tkelman added the needs tests Unit tests are required for this change label Dec 2, 2015
@ViralBShah
Member

@StefanKarpinski Thoughts on uniqueslices?

Needs tests and also travis to pass.

Updating function name to `uniqueslices` primarily just to kick off a new round of Travis and AppVeyor tests, because the Travis tests failed last time for reasons seemingly unrelated to the contents of this file.
@AndyGreenwell
Contributor Author

Updated the name of the function to uniqueslices to kick-off a new Travis build. The last one failed for reasons seemingly unrelated to my previous changes.

I'll add tests once the name is approved (and more so once I wrap up some other work today).

@AndyGreenwell AndyGreenwell changed the title Proposal for function uniqueind Proposal for function uniqueslices Dec 3, 2015
@AndyGreenwell
Contributor Author

The current Travis failure is due to the test for spawn, not the code associated with the function from this PR. Looks like I need to update this branch to reflect the merge of #14123.

@tkelman
Contributor

tkelman commented Dec 3, 2015

more likely that error is the fault of recent libuv changes, not anything about needing a rebase

From worker 3:       * spawn                        [stdio passthrough ok]



signal (2): Interrupt: 2

while loading /private/tmp/julia/share/julia/test/spawn.jl, in expression starting on line 99

_sigtramp at /usr/lib/system/libsystem_platform.dylib (unknown line)

unknown function (ip: 0x0)

Assertion failed: (Val && "isa<> used on a null pointer"), function doit, file /private/tmp/llvm33-julia-IYf1/llvm-3.3.src/include/llvm/Support/Casting.h, line 97.

Assertion failed: (Val && "isa<> used on a null pointer"), function doit, file /usr/local/Cellar/llvm33-julia/3.3_1/lib/llvm-3.3/include/llvm/Support/Casting.h, line 97.

Worker 3 terminated.

ERROR (unhandled task failure): EOFError: read end of file

@StefanKarpinski
Member

I can see the utility of this function but the signature just seems so fiddly and specific, I'm not sure...

@StefanKarpinski
Member

I'm wondering if we can return one thing that can be used to easily compute the other things. For example, if what's returned is the vector of vectors of indices where each vector corresponds to a unique slice and gives the indices where that slice occurs. Is this ib? In that case you can construct ic as map(first, ib); you may also be interested in things like map(last, ib) which constructs the same C but using the last representative instead of the first. Since C can be constructed as A[:, map(first,ib)] it seems unclear if we really need to return it – isn't that likely to be just as efficient in the case where you need C while it's much more efficient not to construct C if you don't need it? (You also have the option to construct it from the last representative easily, for example.) I'm not sure how to turn ib into something from which you can reconstruct A from C easily, but it's clearly possible. Maybe the function could return just ib and ia?

@AndyGreenwell
Contributor Author

if what's returned is the vector of vectors of indices where each vector corresponds to a unique slice and gives the indices where that slice occurs. Is this ib?

Yes, that is ib

There are use cases where the output information one wants to use could be:

  • C - this is the current two argument unique(A,dim)
  • C, ib - this is the group case from K/q, but returning a vector of vectors instead of a dict/table
  • C, ia, ic - this is like the unique(A,'rows') case from MATLAB(R) (although there it is not multidimensional, as the two-argument Julia version unique(A,dim) currently allows).
  • possibly just using ia, ib or ic on their own without later needing C.
  • possibly other combinations...

The inspiration for me doing this work is code I am currently working on that needs either C on its own or C and ib at separate locations in the application. But I knew there would be others more interested in using ia and ic given the usage of unique in other languages.

What @StefanKarpinski states here:

In that case you can construct ic as map(first, ib); you may also be interested in things like map(last, ib) which constructs the same C but using the last representative instead of the first.

is actually how to construct ia from ib:

ia = map(first,ib) # Equivalent to what I am currently doing in this PR
ia = map(last,ib)  # Definitely something that could be useful as well

And C can be calculated from A and ia as follows:

C == A[:,:,ia,:]  # For whatever dimension `dim` to which `ia` should correspond (in this example dim == 3)

In looking at the code again, there is a way to construct ic from ib as well.

A method for doing so could be the following (which I am sure could be optimized from the below):

n = 0
for i = 1:length(ib)
    n += length(ib[i])
end
ic = Array(Int,n)
for i = 1:length(ib)
    for k = 1:length(ib[i])
        ic[ib[i][k]] = i
    end
end

All in all, in terms of returned index vectors, one only needs to return ib and then ia and ic can be calculated from ib. This implies (to me anyway) that the K/q-like group function (returning ib) is the fundamental operation. Other operations for returning C, ia and ic could be wrapper functions on top of a function that returns ib.
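
To make the "everything follows from ib" point concrete, here is a round-trip sketch in current Julia 1.x syntax. The helper name `ic_from_ib` is hypothetical, and `ib` is written out by hand for a small 3-D array instead of being computed, so this only illustrates the derivations of ia, C, and ic, not the grouping itself.

```julia
# Hypothetical helper: rebuild ic from ib, where ib[k] lists every index
# (along dim) at which the k-th unique slice occurs.
function ic_from_ib(ib::Vector{Vector{Int}})
    n = mapreduce(length, +, ib; init=0)   # total number of slices
    ic = Vector{Int}(undef, n)
    for (k, inds) in enumerate(ib), i in inds
        ic[i] = k
    end
    return ic
end

# Three 2x2 slices along dim 3; the first and third are identical.
A = cat([1 2; 3 4], [5 6; 7 8], [1 2; 3 4]; dims=3)
ib = [[1, 3], [2]]        # written by hand for this example

ia = map(first, ib)       # first representative of each group
C  = A[:, :, ia]          # the unique slices
ic = ic_from_ib(ib)

@assert ic == [1, 2, 1]
@assert C[:, :, ic] == A  # C and ic reproduce the original array
```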

Defining the APIs for the wrapper functions is a useful exercise, because there is no need for people to copy code around to calculate C, ia, or ic themselves since we have code here for obtaining each of these values. Convenience wrapper functions will be appreciated.

So I guess I am now back to my main question from earlier in this thread. What would be appropriate function names and argument signatures for calculating and returning different sets of these outputs?

@AndyGreenwell
Contributor Author

I created two separate gists with possibilities for the core function and various wrappers for function signatures having various outputs:

The included function names are currently foo, bar, baz, qux, norf, bletch and grunt and will probably need to be replaced with appropriate function names.

The difference between the two gists is mainly related to which outputs the core function foo returns and then what operations are needed in the wrappers to return the other outputs.

Comments and function name suggestions greatly appreciated.

@ViralBShah
Member

Bump. What do we need to do to get this merged now?

@AndyGreenwell
Contributor Author

Same tasks as in December...

  1. Decide which of those two gists to use for the core function and wrappers
  2. Provide appropriate names for those functions...unless everyone likes those names. :)
  3. Add tests based on the decisions for 1 and 2.

@gajomi
Contributor

gajomi commented Feb 14, 2016

Some comments on the task list:

Decide which of those two gists to use for the core function and wrappers
Provide appropriate names for those functions...unless everyone likes those names. :)

The proposal for the core function in the first gist makes sense to me, in that everything can be computed from ib given dim and A. One possible name for this function (called foo in your proposal) would be finduniqueslices or findalluniqueslices, but I don't have a strong opinion here about the naming. With respect to the various wrappers, I would advocate including one called uniqueslices that returns the slice data itself, as in the current unique(A, dim) method. I don't really see a strong need for the other wrappers being added to Base. If someone really needs these, they have all the information in the core function foo. If there is an outcry down the road for additional functionality, your implementation makes it easy to add.

I should like to note that if the current unique(A, dim) becomes uniqueslices one should not copy the doc string, which seems to be incorrect at this time. I was going to fix it up as part of WIP at #15009, but perhaps this is the better place for that work to get done.

Add tests based on the decisions for 1 and 2.

While the exact tests might have to wait for the implementation, we can talk about test arrays beforehand. One of the main use cases I see for unique-slice-like methods (I recall people writing similar Python functions) is in a network theory context, where A is a matrix of Bools describing connections in a graph with labeled nodes. In this case there is an interest in finding identical rows or columns as part of identifying symmetries in the network. So I would submit that there needs to be an AbstractArray{Bool,2} test case.

@gajomi
Contributor

gajomi commented Feb 14, 2016

Thinking along these lines, I also wanted to ask a question (to @AndyGreenwell and maybe also to @simonster) about the algorithm design and potential performance consequences. Rather than eagerly computing a hash for an entire row and then checking for collisions, would it make sense to continuously accumulate partial hashes only while they are uncollided? For example, in the case of the array of Ints:

julia> A 
4x5 Array{Int8,2}:
 107  28  22  121  106
   8  51   5   60  116
  36  41  57  106   49
  86  62  97   13    8

One can see that each row/column slice is unique just by inspecting the first element in each column/row. In general, whenever an array is large and has "sufficiently diverse" entries it would be preferable to verify the absence of collisions earlier rather than later.

Now in the case of a matrix of Bools of size m by m, which is perhaps the more important one, one expects there to be collisions on partial hashes for at least log_2(m) iterations, so the strategy of checking at every column iteration isn't so wise. But it seems to me that checking at systematic points (perhaps logarithmically spaced from 1 to m) in a way that didn't incur too much checking overhead could improve performance.

Is this too ambitious or does it seem sensible?
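
A rough sketch of the early-exit idea, to make the question concrete. Note the swap: it refines groups by comparing actual values one column at a time instead of accumulating partial hashes, and it checks after every column instead of at logarithmically spaced points. The function name `grouprows_incremental` is hypothetical, the code is current Julia 1.x, and nothing here is benchmarked.

```julia
# Hypothetical sketch: refine row groups column by column and stop as soon
# as every row is already known to be distinct.
function grouprows_incremental(A::Matrix{T}) where {T}
    m, n = size(A)
    ic = ones(Int, m)            # all rows start in a single group
    ngroups = m == 0 ? 0 : 1
    used = 0                     # number of columns actually examined
    for j in 1:n
        ngroups == m && break    # every row is its own group: nothing left to split
        seen = Dict{Tuple{Int,T},Int}()   # (old group, value in column j) => new group
        for i in 1:m
            ic[i] = get!(seen, (ic[i], A[i, j]), length(seen) + 1)
        end
        ngroups = length(seen)
        used = j
    end
    return ic, used
end

A = Int8[107 28 22 121 106;
           8 51  5  60 116;
          36 41 57 106  49;
          86 62 97  13   8]
ic, used = grouprows_incremental(A)
@assert ic == [1, 2, 3, 4]
@assert used == 1                # the first column already separates all rows
```

Building a fresh dictionary per column is exactly the kind of checking overhead mentioned above, so whether this wins in practice would need measuring.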

@nalimilan
Member

I agree with @gajomi that merging the core function under the name finduniqueslices or uniquesliceinds, as well as a wrapper uniqueslices, is the priority. Then other wrappers, which are quite short, can be added later if people ask for them. Anyway, it's always better to keep PRs simple to merge them as fast as possible and keep them focused, leaving extensions for other PRs.

Implementation-wise, do you really need generated functions? Cc: @timholy

@StefanKarpinski
Member

My suspicion is that the most fundamental operation here is assigning group numbers to each slice – i.e. computing ic. Then ic can be turned into ib or ia with fairly little work – and are operations we might want standard library functions for. Going from ic to ib is some kind of groupinds operation: turn a vector of values into a vector of vectors of indices into the original array. Going from ic to ia is something like a firstinds operation: take a vector of values and return a vector of the indices of the first occurrence of each value. The unique function could be expressed as slicing into the original collection with the result of applying firstinds. The operation that assigns group numbers to slices could be called groupslices or something like that. Then you could write

ic = groupslices(A, dim)
ib = groupinds(ic)
ia1 = firstinds(ic)
ia2 = map(first,ib)
ia3 = map(last,ib)

My hunch (unverified) is that computing ic this way (by hashing each slice and assigning group numbers) and then computing ib from ic is not slower than computing ib and ic together. Computing ia from ib is easy and computing it from ic isn't hard either.
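
A sketch of what the two derived operations could look like, assuming ic holds group numbers 1..k assigned in order of first appearance. The bodies below are hypothetical illustrations of the names proposed above, written in current Julia 1.x, and ic is supplied by hand rather than computed by groupslices.

```julia
# Hypothetical: turn group numbers into a vector of index vectors (ic -> ib).
function groupinds(ic::Vector{Int})
    k = isempty(ic) ? 0 : maximum(ic)
    ib = [Int[] for _ in 1:k]
    for (i, g) in enumerate(ic)
        push!(ib[g], i)
    end
    return ib
end

# Hypothetical: index of the first occurrence of each group number (ic -> ia).
function firstinds(ic::Vector{Int})
    k = isempty(ic) ? 0 : maximum(ic)
    ia = zeros(Int, k)
    for (i, g) in enumerate(ic)
        ia[g] == 0 && (ia[g] = i)   # record only the first index seen per group
    end
    return ia
end

ic = [1, 2, 1, 3, 2]                # assumed output of a groupslices-like function
ib = groupinds(ic)
@assert ib == [[1, 3], [2, 5], [4]]
@assert firstinds(ic) == map(first, ib) == [1, 2, 4]
@assert map(last, ib) == [3, 5, 4]
```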

AndyGreenwell added a commit to AndyGreenwell/julia that referenced this pull request Mar 6, 2016
This commit adds updated argument signatures for the functionality discussed in JuliaLang#14142 wherein each of these functions only returns one single output argument containing index vectors of various purposes related to the unique slices currently returned from the unique function taking two input arguments.  Still needs tests and doc integration.
@AndyGreenwell
Contributor Author

The function signatures suggested by @StefanKarpinski are implemented in the following commit in my current fork of Julia.

The only modifications to the API he provides above are that I have two methods for firstinds, one that accepts a Vector{Int} corresponding to ia1 = firstinds(ic) and a second that accepts a Vector{Vector{Int}} corresponding to ia2 = map(first, ib).

The line ia3 = map(last,ib) is implemented as lastinds(ib) which accepts a Vector{Vector{Int}}.

The actual implementations of these functions cut out many of the operations used for calculating the output array (C) from unique(A,dim). Instead, only vectors of indices (or vectors of vectors of indices) are calculated and returned in these functions.

Please review and comment, and then I will create an updated PR based on any requested changes.

@StefanKarpinski
Member

@AndyGreenwell, did you happen to benchmark to see if my hunches about combined vs separated speed were correct?

@simonster
Member

I'm not sure that the code in that commit is right, since it doesn't seem to handle hash collisions.
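
For illustration, one way collision handling can be sketched for the row case: bucket rows by hash, then confirm with `isequal` before treating two rows as the same. This is not the code from the commit or from unique(A, dim). The name `grouprows_checked` and the `hashfn` keyword are hypothetical, and each row is labeled with the index of the first row equal to it, which is an assumption about the labeling convention.

```julia
# Hypothetical sketch: group rows by hash, but verify equality so that two
# different rows sharing a hash are never merged.
function grouprows_checked(A::Matrix; hashfn = hash)
    m = size(A, 1)
    ic = Vector{Int}(undef, m)
    buckets = Dict{UInt,Vector{Int}}()   # hash => first-occurrence rows with that hash
    for i in 1:m
        row = view(A, i, :)
        candidates = get!(() -> Int[], buckets, hashfn(row))
        j = findfirst(r -> isequal(view(A, r, :), row), candidates)
        if j === nothing
            push!(candidates, i)         # new distinct row, even if the hash collided
            ic[i] = i
        else
            ic[i] = candidates[j]        # same content as an earlier row
        end
    end
    return ic
end

A = [1 2; 3 4; 1 2; 3 5]
@assert grouprows_checked(A) == [1, 2, 1, 4]
# Force every row into the same bucket to exercise the collision path.
@assert grouprows_checked(A; hashfn = _ -> UInt(0)) == [1, 2, 1, 4]
```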

@AndyGreenwell
Contributor Author

@StefanKarpinski I have not performed comparison timings as of yet, but will do so.

@simonster The code in the commit to my current fork does not change the method I used for determining the entries in output vector ic from either my original files modified in this PR or the two gists referenced above.

Consequently, if modifications need to be made to the determination of the entries populating ic, then I would definitely appreciate any specific suggestions on line modifications that would incorporate the existing hash collision detection code from your original unique(A,dim) implementation. I will work on determining the appropriate modifications, but any specific assistance/suggestions that might be provided are greatly appreciated. Thanks.

AndyGreenwell added a commit to AndyGreenwell/julia that referenced this pull request Mar 11, 2016
This commit alters the current groupslices function to return the vector uniquerow that was originally calculated within the existing unique function.  The values contained within uniquerow for cases where there are no hash collisions are actually equal to what I was calculating in array ic.  As @simonster pointed out in comment JuliaLang#14142 (comment) the previous commit was not taking into account hash collisions for the values in ic.  As uniquerow within unique was already calculating the values in ic, taking into account hash collisions, and updating its values accordingly, we can just return uniquerow from groupslices.  For continuity with the conversation in JuliaLang#14142, I currently have assigned ic as an alias for uniquerow, but that can certainly be removed.
@AndyGreenwell
Contributor Author

So it seems that uniquerow within the original unique(A,dim) implementation is equivalent to what I was calculating as ic, but was already taking into account hash collision detection.

I have updated the version of groupslices within my current Julia fork in this next commit. With this update, groupslices(A,dim) essentially performs the exact same operations as unique(A,dim), but with a couple of differences:

Please review that commit and let me know if there are any comments.

If approved, then I will bring my fork up to date and submit a PR.

@tkelman
Contributor

tkelman commented Mar 12, 2016

Looks like it will need to be a new PR, did you delete your fork then recreate it or something? It now says "from unknown repository" here. Generally a better idea to create a new branch for any PR rather than working on your fork's master.


Change error to ArgumentError

uniquerow is ic, and seems to handle hash collisions already

Add initial tests for multidimensional.jl for groupslices and related functions

This commit adds some initial tests for the functions groupslices, groupinds, firstinds, and lastinds.  The initial tests are very basic and do not currently include a test for a multidimensional input array that would have hash collisions between slices.
@AndyGreenwell
Contributor Author

New PR #15503 has been created to replace functionality in this PR. Closing this PR.

@tkelman tkelman removed the needs tests Unit tests are required for this change label Mar 15, 2016
7 participants