WIP Roundup #4
Comments
I've spent some time going through the libraries I mentioned and wrote up a quick introduction to and analysis of each. I've uploaded it as PR #5. You can also view it rendered here. The TL;DR is: none of the libraries really covers (or would be able to cover) all the use cases we mention in #3, but I feel we should be able to come up with a design that hits everything we're looking for by pulling ideas from each.
Nice work! Thank you for taking the time to go through all of them! I'm the maintainer of black-jack, and the DataFrame side of things does indeed need more love. I've focused a lot on the functionality of Series to start with, and now, with your write-up, I'm a bit confused myself as to why it doesn't support arbitrary data types... Regarding the statement:
Can you point me to where this is? I interpreted this as attempting to get a column from the dataframe without knowing its type. If this is the case, it's not correct. DataFrame::get_column returns an Anyhow, good work again. I'm excited for how we can combine our efforts! 👍
You are correct; I was looking at the groupby implementation that does an
Ah ok, thanks! That should be addressed anyway. Thanks again!
Of interest to this group might be that the C++ community within Apache Arrow is kicking off a data frame project within the larger Arrow project. Some of the leaders in the Arrow C++ community have significant experience building such libraries, so we could build our data frame solution within Arrow and leverage this knowledge. Just a thought...
Hi @jblondin, great work. Can you complete the page with the Rust channel (stable or nightly) each library requires? In my case, nightly is a "no-go" for using a crate.
The main issue with the pandas DataFrame is that it is not distributed. It is a great library for a single node. Anaconda did try to make it distributed through Dask, but adoption is still quite poor.
I'm starting to work on a 'WIP Roundup' document which will provide an initial introduction to and analysis of the existing DataFrame WIPs.
Currently, I'm looking at:
Are there any more we'd like to add to this list?