
Implement "nlargest()", "nsmallest()" methods. #579

Open
garciparedes opened this issue Feb 5, 2020 · 3 comments

Comments

@garciparedes

It would be interesting to implement these methods on the DataFrame class. I think the memory needed to compute these values should be small, since only the n largest (or smallest) rows have to be kept at any time, and they could provide really insightful indicators.
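To illustrate why the memory cost stays small, here is a minimal sketch in plain Python, assuming the column arrives as a sequence of chunks (roughly how an out-of-core library streams data). `chunked_nlargest` is a hypothetical helper, not a vaex or pandas API. It keeps a min-heap of at most `n` entries, so memory is O(n) no matter how many rows are scanned. Ties are broken in favour of the earlier row, similar to pandas' default `keep="first"`.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple


def chunked_nlargest(chunks: Iterable[Sequence[float]], n: int) -> List[Tuple[float, int]]:
    """Hypothetical helper: return the n largest (value, row_index) pairs, largest first.

    Only a heap of at most n entries is held in memory, independent of row count.
    """
    if n <= 0:
        return []
    # Min-heap of (value, -row_index); heap[0] is the weakest candidate kept so far.
    # Negating the index makes an earlier row compare as "larger" on equal values,
    # so earlier rows win ties.
    heap: List[Tuple[float, int]] = []
    offset = 0  # global row index of the first element in the current chunk
    for chunk in chunks:
        for i, value in enumerate(chunk):
            item = (value, -(offset + i))
            if len(heap) < n:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                # Evict the weakest kept candidate in favour of this row.
                heapq.heapreplace(heap, item)
        offset += len(chunk)
    # Sort descending by value (then ascending row index) and undo the negation.
    return [(value, -neg_index) for value, neg_index in sorted(heap, reverse=True)]


chunks = [[5, 1, 9], [3, 9, 7], [2, 8]]
print(chunked_nlargest(chunks, 3))  # [(9, 2), (9, 4), (8, 7)]
```

An `nsmallest` counterpart could reuse the same idea with the comparison reversed, for example by pushing negated values.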

Here is the pandas counterpart documentation:

@maartenbreddels
Member

Hi Sergio,

yes, I think we can do this. It's a bit tricky given how vaex works internally, but I'll keep this in mind when doing some refactoring work. Let's keep this issue open as a reminder.

cheers,

Maarten

@garciparedes
Author

Hi @maartenbreddels,

Thank you so much for accepting this feature request. 🙂

If you don't mind, I have a question: why would implementing this kind of statistic in vaex be a bit tricky? Is the reason related to the "vectorial" nature of the output?

@maartenbreddels
Member

It has to do with how vaex filters. Vaex always works with the unfiltered raw data, which makes it tricky to map between an unfiltered index (say, the 6th element of the unfiltered array) and the corresponding filtered index.
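A minimal sketch of the bookkeeping involved, assuming the filter is represented as a boolean mask over the raw rows. The helpers `build_prefix`, `unfiltered_to_filtered` and `filtered_to_unfiltered` are hypothetical illustrations, not vaex internals. A prefix count of passing rows lets a raw row index be translated into its position in the filtered view, and a binary search over that count goes the other way. Note that the prefix array is as long as the data itself, which hints at why this mapping is awkward for a library that avoids materializing per-row state.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Optional, Sequence


def build_prefix(mask: Sequence[bool]) -> List[int]:
    """prefix[i] = number of rows that pass the filter among mask[0:i]."""
    return [0] + list(accumulate(int(m) for m in mask))


def unfiltered_to_filtered(prefix: List[int], mask: Sequence[bool], i: int) -> Optional[int]:
    """Map a raw row index to its filtered index, or None if the row is filtered out."""
    if not mask[i]:
        return None
    # Rows passing the filter before row i give its position in the filtered view.
    return prefix[i]


def filtered_to_unfiltered(prefix: List[int], k: int) -> int:
    """Map filtered index k back to the raw row index of the (k+1)-th passing row."""
    if k < 0:
        raise IndexError("filtered index out of range")
    # prefix is non-decreasing; the first j with prefix[j] >= k + 1 means row j - 1
    # is the (k+1)-th row that passes the filter.
    j = bisect_left(prefix, k + 1)
    if j >= len(prefix):
        raise IndexError("filtered index out of range")
    return j - 1


mask = [False, True, True, False, True]
prefix = build_prefix(mask)
print(unfiltered_to_filtered(prefix, mask, 4))  # 2
print(filtered_to_unfiltered(prefix, 0))        # 1
```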

Projects
None yet
Development

No branches or pull requests

2 participants