Return index vectors from unique #1845

Closed
ViralBShah opened this issue Dec 28, 2012 · 27 comments
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@ViralBShah
Member

In Matlab, unique can return optional index vectors, so that one can construct the unique version from the input and vice versa. This is quite handy and would be nice to have in Julia.
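
To make the two index vectors concrete, here is a small hand-worked illustration in Julia. The names `ia` and `ic` follow Matlab's documentation and are used only for illustration; both vectors are written out by hand rather than computed by any existing Julia function. Note also that Julia's `unique` keeps values in order of first appearance, whereas Matlab's sorts them by default.

```julia
x  = [30, 10, 30, 20, 10]
u  = unique(x)            # unique values, in order of first appearance

# Written by hand for this example (not returned by any Base function):
ia = [1, 2, 4]            # ia[k] is the first position of u[k] in x
ic = [1, 2, 1, 3, 2]      # ic[j] is the position of x[j] in u

from_input  = x[ia]       # rebuilds the unique values from the input
from_unique = u[ic]       # rebuilds the input from the unique values
```

Here `from_input == u` and `from_unique == x`, which is the "and vice versa" property described above.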

http://www.mathworks.com/help/matlab/ref/unique.html

We would probably need the function to be called unique_ind or something, until we have keyword args.

@johnmyleswhite
Member

This is closely linked to the sortperm and order discussion we had some time back. I've since come to wonder if maybe we should call these sorts of things indunique and indsort to be consistent with indmax and indmin. Kind of ugly names, but it has the benefit of being very consistent.

@ViralBShah
Member Author

I prefer the consistency, but I would prefer ind to be a suffix, making the functions discoverable through auto-completion when you press tab.

@StefanKarpinski
Member

A more composable solution would be better here. Consider the way that sortperm is implemented in terms of applying sort! to an index vector and using the Perm ordering type. The same approach could be used to generically allow using the exact same code for all of these functions and getting back indices instead of the sorted data.
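
As a rough illustration of that idea (a sketch using the `by` keyword as a stand-in, not how Base actually implements `sortperm`): sorting a vector of indices while comparing the values they point to gives the same result as `sortperm`. The example values are distinct, so tie-breaking does not come into play.

```julia
x = [3.5, 1.0, 2.25, 0.5]

# Library function: returns the permutation that sorts x.
p = sortperm(x)

# Same result from the ordinary sort code path, applied to an index vector
# and comparing indices by the values they refer to.
q = sort(collect(eachindex(x)); by = i -> x[i])

sorted_via_indices = x[q]
```

The point is that one generic sorting routine serves both purposes: it returns data when given data, and indices when given an index vector with an ordering that looks through to the data.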

@ViralBShah
Member Author

+1 to following a similar approach here as well.

@ViralBShah
Member Author

What we need is uniqueperm, and then we can implement unique using that.

@tkelman
Contributor

tkelman commented Apr 17, 2015

Would uniqueperm return the second output from the Matlab version, or the third output? Both are quite useful in different applications.

Side note, one of the most aggravating things Mathworks ever did IMO was changing the default result ordering of the index outputs from unique from "last match" to "first match." That broke so much of my code...

@StefanKarpinski
Member

Unique perm wouldn't actually return a permutation, would it?

@tkelman
Contributor

tkelman commented Apr 17, 2015

ah yeah it's not really the same thing, you get a many-to-few index map and/or a few-to-many index map

@StefanKarpinski
Member

Let's not call it "perm" then.

@ViralBShah
Member Author

Yes, it is not strictly a permutation. Perhaps a selection or something.

@johnmyleswhite
Member

Hi @drgar, I'm really glad you're excited about Julia. Unfortunately, posting GPL code in a public forum puts us in legal danger, so I deleted all of the code you posted without reading it. It's a frustrating limitation, but you cannot safely post code to GitHub that you don't own the copyrights to.

@johnmyleswhite
Member

Don't worry about it. I caught it fast enough that it's not an issue. I just wanted to make sure you understood why I edited your comment.

@nalimilan
Member

@johnmyleswhite I think your statement is a bit too strong here. Posting the GPL code in public does not create any legal danger. Please don't spread the idea that GPL is something dangerous.

The problem is only about Julia developers being suspected of having read and taken inspiration from this code when writing BSD-licensed code in Julia. (So I agree it's better to remove the code from this issue.)

@johnmyleswhite
Member

@nalimilan: You're right. The correct summary is that posting GPL code in a public forum read by Julia developers is dangerous. Posting GPL code on the internet is harmless. It's only when you post the code somewhere that could be construed as influencing Julia's development that Julia is put in jeopardy.

@StefanKarpinski
Member

We probably err on the side of being more vigilant than necessary, but we really do not want to accidentally create a situation – or the perception of one – where Julia is violating the copyrights of other projects, GPL or otherwise. If our GitHub repositories are littered with GPL code and even vaguely similar code – written independently – ends up in Julia, that could cause some very unfortunate legal doubts. This is why we're very careful not to post any GPL or proprietary code around here.

@AndyGreenwell
Contributor

Please review the PR in #14142 and provide any comments related to implementation (choice of output arguments, efficiency of algorithmic choices, etc.) as well as documentation, and function name. I do still need to add a few tests for automated testing purposes, but there are a few examples of usage in the doc string.

@ViralBShah
Member Author

ViralBShah commented Jul 17, 2017

This functionality was further discussed in #15503 and implemented in https://github.com/AndyGreenwell/GroupSlices.jl. We can decide to refactor it or bring it into Base as necessary.

@tribut

tribut commented May 27, 2019

Given that the package is abandoned and unusable on current Julia, is there any interest in bringing this to Base eventually?

@kmsquire
Member

@ueliwechsler

GroupSlices does not seem to be maintained. Maybe this functionality, like other Julia implementations of Matlab functions, could be added to the package https://github.com/ChrisRackauckas/VectorizedRoutines.jl.

@StefanKarpinski
Member

The way forward with GroupSlices (Andy seems unresponsive—he probably never looks at GitHub anymore), is to figure out if there's a clean and relatively simple API to be extracted from it that covers the common use cases and then maybe import that into Base or put it into a new package. As I recall, there was a PR to put it into Base but it was just a little too sprawling of an API to be satisfying.

@biona001

Has anything been done to include the functionality of GroupSlices.jl in Base?

@StefanKarpinski
Member

That’s what my last comment was addressing: what was attempted, why it wasn’t satisfactory, and what would need to be done to make forward progress.

@juliohm
Contributor

juliohm commented May 17, 2020

From the discussion in this thread, it is not clear what the current status of this more general interface is (i.e. sort/sortperm, unique/uniqueind, ...). Is there a package where the ideas discussed here are being explored?

@stevengj
Member

Note that since unique nowadays accepts a function argument (#13622), you can simply implement uniqueind by:

uniqueind(x) = unique(i -> x[i], eachindex(x))

as suggested here.

Since this is a one-liner, there's not a pressing need to provide a library function for this. On the other hand, this trick is not obvious to new users and it seems to be requested a lot.
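
For readers who have not seen the trick, a short usage example of the one-liner above may help. The input vector is made up for illustration; `uniqueind` is the user-defined helper from this comment, not a Base function.

```julia
# User-defined helper from the comment above (not part of Base).
uniqueind(x) = unique(i -> x[i], eachindex(x))

x   = ["b", "a", "b", "c", "a"]
idx = uniqueind(x)        # index of the first occurrence of each distinct value

firsts = x[idx]           # indexing with idx recovers the unique values
```

It works because `unique(f, itr)` keeps the first element of `itr` for each distinct value of `f`; here `itr` is the set of indices and `f` looks up the value at each index, so `firsts == unique(x)`.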

@nalimilan
Member

Maybe the trick could be mentioned in the docstring?

@stevengj
Member

stevengj commented May 12, 2022

Note that uniqueind(x) = unique(i -> x[i], eachindex(x)) is equivalent to the second output of Matlab's unique function, but not to the third output.
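
For the third output (the map from each input element back to its position among the unique values), one possible sketch is below. The name `uniqueinv` is hypothetical and not part of Base, and the sketch assumes a plain 1-based `Vector` with hashable elements. It numbers values in order of first appearance, matching Julia's `unique` rather than Matlab's sorted default.

```julia
# Hypothetical helper (not in Base): for each element of x, return the
# position of that value within unique(x).
function uniqueinv(x::Vector)
    pos = Dict{eltype(x),Int}()   # value => position among the unique values
    # get! returns the stored position if v was already seen; otherwise it
    # stores and returns the next free position, length(pos) + 1.
    return [get!(pos, v, length(pos) + 1) for v in x]
end

x  = ["b", "a", "b", "c", "a"]
ic = uniqueinv(x)

rebuilt = unique(x)[ic]           # indexing the unique values by ic rebuilds x
```

With this, `rebuilt == x`, which is the reconstruction property the third Matlab output provides.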
