ENH: add gzip/bz2 compression to read_pickle() (and perhaps other read_*() methods) #11666
Comments
yeh, this wouldn't be hard for gzip/bz2
xref #5924
Yes please. Especially for
I like xz/lzma2 format for pickle format 😄
@goldenbull pull-requests are welcome! (this is not very difficult, more of a bit of code reorg to share the compression code)
Fantastic! @jreback, are there plans by chance to implement compression support across all
@gfairchild what do you think is useful? keeping in mind maybe
ahh I c, @gfairchild do you want to open a new issue (xref this one and the PR). for
After considering more closely,
Since this issue does say "perhaps other
yeah let's just create a new issue
Not a problem. I'll do that in a few hours.
Sorry for the delay, but I just created the issue: #15644
@jreback I think
closes pandas-dev#11666

Author: goldenbull <goldenbull@gmail.com>
Author: Chen Jinniu <goldenbull@users.noreply.github.com>

Closes pandas-dev#13317 from goldenbull/pickle_io_compression and squashes the following commits:

e9c5fd2 [goldenbull] docs update
d50e430 [goldenbull] update docs. re-write all tests to avoid round-trip read/write comparison.
86afd25 [goldenbull] change test to new pytest parameterized style
945e7bb [goldenbull] Merge remote-tracking branch 'origin/master' into pickle_io_compression
ccbeaa9 [goldenbull] move pickle compression tests into a new class
9a07250 [goldenbull] Remove prepared compressed data. _get_handle will take care of compressed I/O
1cb810b [goldenbull] add zip decompression support. refactor using lambda.
b8c4175 [goldenbull] add compressed pickle data file to io/tests
6df6611 [goldenbull] pickle compression code update
81d55a0 [Chen Jinniu] Merge branch 'master' into pickle_io_compression
025a0cd [goldenbull] add compression support for pickle
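
For readers landing here later, a rough sketch of how the merged feature can be used. It assumes a pandas version that includes this change (0.20 or newer), where to_pickle() and read_pickle() accept a compression argument. The DataFrame contents and the temporary directory are made up for illustration; only the file name comes from the issue.

```python
import os
import tempfile

import pandas as pd

# Illustrative data only; any DataFrame would do.
df = pd.DataFrame({"a": range(1000), "b": ["x"] * 1000})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mydataframe.pickle.gz")

    # Write a gzip-compressed pickle.
    df.to_pickle(path, compression="gzip")

    # With the default compression="infer", the codec is chosen
    # from the ".gz" file extension.
    restored = pd.read_pickle(path)

    # The codec can also be named explicitly.
    explicit = pd.read_pickle(path, compression="gzip")
```

Relying on the file extension keeps call sites short, while passing compression explicitly may be safer when file names do not follow the usual suffix conventions.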
Right now, read_csv() has a compression option, which allows the user to pass a gzipped or bz2-compressed CSV file directly into Pandas to be read. It would be great if read_pickle() supported the same option. Pickles actually compress surprisingly well; I have a 567M Pandas pickle (resulting from DataFrame.to_pickle()) that packs down to 45M with pigz --best. An order of magnitude difference in size is pretty significant. This makes storing static pickles long-term as gzipped archives a very attractive option. Workflow would be made easier if Pandas could natively handle mydataframe.pickle.gz files in the same way it does compressed CSV files.

More generally, a compression option should probably be allowed for most read_* methods. Many of the read_* methods involve formats that compress very well.
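
Without native support, the workflow described above can be approximated using only the standard library. The sketch below is an illustration, not pandas code: it pickles a plain Python object (made-up, deliberately repetitive data) both uncompressed and through gzip, compares the file sizes, and reads the compressed file back. The file names are hypothetical.

```python
import gzip
import os
import pickle
import tempfile

# Made-up, repetitive data so the size difference is easy to see.
data = {"values": [i % 100 for i in range(100_000)], "label": "example"}

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "data.pickle")
    gz_path = raw_path + ".gz"

    # Plain pickle on disk.
    with open(raw_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Same pickle written through a gzip stream at the highest level.
    with gzip.open(gz_path, "wb", compresslevel=9) as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

    raw_size = os.path.getsize(raw_path)
    gz_size = os.path.getsize(gz_path)

    # Read the compressed pickle back by decompressing on the fly.
    with gzip.open(gz_path, "rb") as f:
        restored = pickle.load(f)

print(f"raw: {raw_size} bytes, gzipped: {gz_size} bytes")
```

The actual ratio depends heavily on the data, so the numbers printed here say nothing about the 567M to 45M case reported above. The point is only that every read needs the extra gzip.open wrapper, which a compression option on read_pickle() would make unnecessary.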