Unpickling of pre 0.15 gzip-compressed objects not possible #10966

Closed
jphme opened this issue Sep 1, 2015 · 6 comments
Labels
Compat pandas objects compatability with Numpy or Python functions

Comments

@jphme

jphme commented Sep 1, 2015

I pickled and gzip-compressed DataFrames in a tuple pre-0.15:

with gzip.open('/tmp/test.pklz', 'wb') as datafile:
    pickle.dump((df1,df2), datafile)

Pre-0.15, I would load them again like this:

with gzip.open('/tmp/test.pklz', 'rb') as datafile:
    df1, df2 = pickle.load(datafile)

Now, after updating pandas, I can't load them back anymore:

>> TypeError: _reconstruct: First argument must be a sub-type of ndarray

If I try pd.read_pickle directly, it can't decompress the file properly:

df1, df2 = pd.read_pickle('/tmp/test.pklz')
>> KeyError: '\x1f'

Reading the decompressed file does not work either:

with gzip.open('/tmp/test.pklz', 'rb') as datafile:
    df1, df2 = pd.read_pickle(datafile)
>> TypeError: coercing to Unicode: need string or buffer, GzipFile found

Is there any solution for this? This seems to be a serious backwards-compatibility issue: without a backup in another format, you have to downgrade pandas again to get the data back.
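
A likely reading of the KeyError: '\x1f' above: 0x1f is the first byte of the gzip magic number (0x1f 0x8b), so read_pickle appears to have been handed the still-compressed bytes and to have tried to interpret the first one as a pickle opcode. A minimal sketch for checking whether a file is gzip-compressed before deciding how to load it. The helper name looks_gzipped is hypothetical and not part of pandas:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number.

    Hypothetical helper for illustration; it only inspects the header and
    does not verify that the rest of the file is a valid gzip stream.
    """
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC
```

Checking the header rather than the file extension matters here because a name like test.pklz does not follow the usual .gz convention.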

@jreback
Contributor

jreback commented Sep 2, 2015

http://pandas.pydata.org/pandas-docs/stable/io.html#io-pickle

you would need to show what you are actually pickling and what versions

this is backwards compatible back to 0.11 so not sure what the problem is

@shoyer
Member

shoyer commented Sep 2, 2015

You should be able to write the unzipped pickle to a file and use read_pickle to read it. But pickle is really not a very good file format for data, for reasons exactly like this...
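
The suggestion above can be sketched roughly as follows: decompress the gzip file to a temporary plain file, then hand that path to a loader. This is only an illustration under assumptions. The function name load_gzipped_pickle and its loader parameter are hypothetical; the loader is expected to be a callable that takes a filesystem path, such as pd.read_pickle.

```python
import gzip
import os
import shutil
import tempfile


def load_gzipped_pickle(gz_path, loader):
    """Decompress `gz_path` to a temporary file and call `loader` on its path.

    Hypothetical helper: `loader` is any callable that accepts a filesystem
    path (for example pd.read_pickle) and returns the loaded object.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".pkl")
    try:
        with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
            # Stream in chunks so the whole file is never held in memory.
            shutil.copyfileobj(src, dst)
        return loader(tmp_path)
    finally:
        # Always remove the temporary file, even if loading fails.
        os.remove(tmp_path)


# Intended usage with pandas (assumes `import pandas as pd`):
#   df1, df2 = load_gzipped_pickle('/tmp/test.pklz', pd.read_pickle)
```

Passing the loader in as an argument keeps the sketch independent of any particular pandas version; the decompression step itself only needs the standard library.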

@jphme
Author

jphme commented Sep 2, 2015

@jreback: Python 2.7.10, pandas 0.14.1 (before the update), pandas 0.16.2 (after the update). The main problem is that compatibility breaks if gzip compression is used. The same example with just open('/tmp/test.pklz', 'wb') works without any problems; I can then unpickle it with pd.read_pickle() after the update.

@shoyer You are totally right, it was just a quick hack. Unzipping manually and reading the file afterwards does work and is apparently the only solution.

Overall not a huge problem (and another reminder that one should not use pickle for data storage), but with this combination you need quite a bit of code and error handling to work around it when updating. The optimal solution would be for pd.read_pickle to just work on GzipFile objects too (but I see that this is probably not widely used and therefore unnecessary).

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Sep 2, 2015
@jreback
Contributor

jreback commented Sep 2, 2015

The issue is this one: #5924

pd.read_pickle expects an actual file (and not a bytes-like object). Supporting that would be pretty easy to do, if you are interested.

@jreback jreback closed this as completed Sep 2, 2015
@jphme
Author

jphme commented Sep 2, 2015

Well, I have zero experience in that area, but it could be a good start. I will have a look if I find some time!

@jreback
Contributor

jreback commented Sep 2, 2015

Great! The contribution docs are here
