Read and write compressed CSVs to S3 #359
Comments
@JasonSanchez good to know about this PR; it is definitely something we are interested in. We will add support for it as soon Pandas P.S. If we have a new Wrangler release before then, we will update/fix the comment mentioned. Thanks!
Now that pandas 1.2.0 has been released, have there been any updates on when
We've added support for it in the PR above 👆. pip install git+https://github.com/awslabs/aws-data-wrangler.git@write-compressed-text
Hey, @igorborgest. Thanks for the quick response! I'll give this a go later this afternoon when I get
It appears to be working if I just read/write a gzip-compressed CSV directly to S3. However, I was following this notebook you authored to attempt to do a CSV partition (literally replaced I'm not sure if this latter bit is expected behavior or not. I have not used CSV partitions before. EDIT: For completeness, when I attempt to read in the partition with the filter
Hi @gvermillion, thanks for testing it. Actually, it is not related to the compression itself; it was a limitation in the CSV Datasets implementation, which didn't support CSV headers. I've just updated the branch to add support for that.
Could you give it another try? P.S. Uninstall the previous installation explicitly (
Hello @igorborgest, the updates worked as expected for me. Is there any estimate for when this update will be released? Thanks for the quick turnaround and for addressing my follow-up question!
It should be released on version
Great. Thanks so much!
Released on version 2.3.0 🚀
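For anyone landing here later, below is a minimal sketch of the pandas behaviour this thread depends on: writing a gzip-compressed CSV to a binary handle and reading it back. It runs entirely in memory and does not touch S3. The helper names `to_gzip_csv_bytes` and `from_gzip_csv_bytes` and the sample data are made up for illustration, and the sketch assumes pandas 1.2.0 or later.

```python
import io

import pandas as pd


def to_gzip_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to gzip-compressed CSV bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    # Assumption: pandas >= 1.2.0, where to_csv can compress into a binary handle.
    df.to_csv(buffer, index=False, compression="gzip")
    return buffer.getvalue()


def from_gzip_csv_bytes(payload: bytes) -> pd.DataFrame:
    """Parse gzip-compressed CSV bytes back into a DataFrame (hypothetical helper)."""
    return pd.read_csv(io.BytesIO(payload), compression="gzip")


# Sample data invented for this sketch.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
payload = to_gzip_csv_bytes(df)
restored = from_gzip_csv_bytes(payload)
```

With Wrangler 2.3.0 or later, the S3 version should look similar, with `compression="gzip"` passed to `wr.s3.to_csv` and a path ending in `.csv.gz`. Treat that as an untested assumption here and check the documentation for your installed version.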
pandas-dev/pandas#35129 was recently merged into pandas-dev:master.
The comment below is unclear and makes it seem as if the feature was removed from pandas, rather than that it will soon be available in pandas (and, hopefully, in Wrangler as a result):
Anyways, hoping this would be on your roadmap. Thanks for the great tool!