Unpickling models across python3 and python2 #853

Closed
jayantj opened this issue Sep 11, 2016 · 8 comments
Labels
difficulty medium Medium issue: required good gensim understanding & python skills wishlist Feature request

Comments

@jayantj
Contributor

jayantj commented Sep 11, 2016

Training and saving the model with python3 -

from gensim.test.test_word2vec import LeeCorpus
from gensim.models.word2vec import Word2Vec
model = Word2Vec(LeeCorpus(), size=10)
model.save('temp-py3')

Loading it with python2.6 or 2.7 -

from gensim.models.word2vec import Word2Vec
model = Word2Vec.load('temp-py3')

Raises this error -

Traceback (most recent call last):
  File "<input>", line 1, in <module>
    model = Word2Vec.load('temp-py3')
  File "gensim/models/word2vec.py", line 1684, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "gensim/utils.py", line 248, in load
    obj = unpickle(fname)
  File "gensim/utils.py", line 911, in unpickle
    return _pickle.loads(f.read())
AttributeError: 'module' object has no attribute 'defaultdict'

This goes the other way around too (loading models saved with python2.7 in python3), although with a different error -

Traceback (most recent call last):
  File "<input>", line 1, in <module>
    m = Word2Vec.load('w2v_py2.7')
  File "/home/jayant/Projects/gensim/gensim/models/word2vec.py", line 1684, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/home/jayant/Projects/gensim/gensim/utils.py", line 252, in load
    obj = unpickle(fname)
  File "/home/jayant/Projects/gensim/gensim/utils.py", line 915, in unpickle
    return _pickle.loads(f.read())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfb in position 0: ordinal not in range(128)
@jayantj jayantj changed the title Unpickling modes trained with python3 in python2 Unpickling models trained with python3 in python2 Sep 11, 2016
@jayantj jayantj changed the title Unpickling models trained with python3 in python2 Unpickling models across python3 and python2 Sep 11, 2016
@jayantj
Contributor Author

jayantj commented Sep 12, 2016

Unpickling a model in Python 2.7 that was pickled in Python 3 fails because of a bug in pickle, which incorrectly maps collections to UserList/UserString.
Here are some details - https://bugs.python.org/issue18473
The bug has been fixed in later versions of pickle, but Python 3.4 users are likely to run into issues.
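
One way to check whether a given interpreter is affected is to look at the module names pickle actually writes for a defaultdict at protocol 2. This is only a minimal sketch: the helper name `global_refs` is made up for illustration, and it inspects the pickle stream rather than anything gensim-specific.

```python
import pickle
import pickletools
from collections import defaultdict


def global_refs(obj, protocol=2):
    """Return the 'module name' pairs referenced by a pickle of obj.

    Hypothetical helper for illustration; not part of gensim.
    """
    # fix_imports=True asks Python 3 to write Python 2 compatible names.
    data = pickle.dumps(obj, protocol=protocol, fix_imports=True)
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name == "GLOBAL"]


# A Python 2 reader can only resolve the entry for defaultdict
# if it is written as living in the 'collections' module.
print(global_refs(defaultdict(int)))
```

On an interpreter with the fix, the defaultdict entry should name `collections`. Going by the linked bug report, an affected interpreter would write UserList or UserString as the module instead, which would explain the AttributeError that Python 2 raises.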

Unpickling a Python 2.7 model in Python 3 fails because of encoding issues with numpy arrays - http://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3
Decoding with 'latin1' seems to work, though it could cause issues if the pickle contains both numpy arrays and non-ISO-8859-1 characters.
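
The 'latin1' workaround can be sketched with the standard library alone. The bytes below are a hand-written protocol 2 pickle of a Python 2 byte string, standing in for the raw data a numpy array would carry. The helper name `load_py2_pickle` is an assumption for illustration, and this is not how gensim itself loads models.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\xfb':
# PROTO 2, SHORT_BINSTRING of length 1, BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x01\xfbq\x00."


def load_py2_pickle(data, encoding="latin1"):
    """Load a Python 2 pickle in Python 3 (hypothetical helper).

    'latin1' maps every byte to a code point, so decoding never fails;
    'bytes' keeps Python 2 str objects as bytes instead.
    """
    return pickle.loads(data, encoding=encoding)


# The default encoding is ASCII, so byte 0xfb cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
except UnicodeDecodeError as err:
    print("default load failed:", err)

print(repr(load_py2_pickle(PY2_PICKLE)))
print(repr(load_py2_pickle(PY2_PICKLE, encoding="bytes")))
```

The caveat above still applies: 'latin1' never raises, so text that was really UTF-8 comes back silently mis-decoded rather than failing loudly.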

@tmylk
Contributor

tmylk commented Sep 16, 2016

Same question on the mailing list https://groups.google.com/d/msg/gensim/LNS8X84B0yE/B6PZKmDnBgAJ

@tmylk
Contributor

tmylk commented Sep 22, 2016

Strange that it is a different error to #860

@tmylk tmylk added wishlist Feature request difficulty medium Medium issue: required good gensim understanding & python skills labels Sep 22, 2016
@tmylk
Contributor

tmylk commented Sep 28, 2016

The answer to this question is in the Gensim FAQ.
@jayantj is this answer sufficient to resolve the issue?

@jayantj
Contributor Author

jayantj commented Sep 28, 2016

That sounds good, I'll add a small section for word2vec too (right now, it mentions only LDA models)

But that only addresses loading models in Python 3 that were trained using Python 2.

The other bug is loading models in Python 2 that were trained using Python 3.

That results from a bug in versions of pickle pre-Python 3.5. It is fixable, but in case this isn't a common use case, we can possibly skip it. (and considering that I had to dig rather deep to uncover the bug and fix, I don't think a lot of people have faced this issue)

@tmylk
Contributor

tmylk commented Dec 22, 2016

Fixed in #1039

@tmylk tmylk closed this as completed Dec 22, 2016
@yaoyao0830

Hi, can I know the steps to solve this error?

@harshaks23

There is hickle, which is faster and easier than pickle.
I tried saving my data with pickle dump, but reading it back caused a lot of problems. I wasted an hour and still didn't find a solution, even though I was working on my own data to create a chat bot.

vec_x and vec_y are numpy arrays

import hickle as hkl  # third-party package

data = [vec_x, vec_y]
hkl.dump(data, 'new_data_file.hkl')

Then you just read it and perform the operations

data2 = hkl.load( 'new_data_file.hkl' )
