Saving/loading Word2Vec as described in the tutorial fails #2170

Closed
jbayardo opened this issue Aug 30, 2018 · 4 comments
Labels
need info Not enough information for reproduce an issue, need more info from author

Comments

@jbayardo

Description

I trained a Word2Vec model as shown in the documentation, with a callback that saves it to disk after each epoch. When I try to reload one of the checkpoints, I get the following error:

'module' object has no attribute 'Word2VecVocab'

Steps/Code/Corpus to Reproduce

from XXX import get_output_path
from gensim.models.callbacks import CallbackAny2Vec
from gensim.models import Word2Vec
from datetime import datetime, timedelta
import os

class EpochSaver(CallbackAny2Vec):
    '''Callback to save model after each epoch.'''
    def __init__(self, start_date, end_date, identifier=None):
        self.epoch = 0
        
        self.base_path = get_output_path(...)
        
        try:
            os.makedirs(self.base_path)
        except OSError:
            pass
        
        
    def on_epoch_end(self, model):
        relpath = get_output_path(...)
        output_path = os.path.join(self.base_path, relpath)
        
        try:
            os.remove(output_path)
        except OSError:
            pass
        
        model.save(output_path)

        self.epoch += 1

epoch_saver = EpochSaver(START_DATE, END_DATE, 'v0_1')

sentences = ...

model = Word2Vec(sentences,
                 size=128,
                 window=40,
                 min_count=10,
                 sg=1,
                 hs=0,
                 negative=20,
                 ns_exponent=-0.5,
                 sample=1e-4,
                 iter=10,
                 workers=10,
                 callbacks=[epoch_saver])

And then, from a different notebook, with the same environment:

from gensim.models import Word2Vec

model = Word2Vec.load(...)

Expected Results

Model loaded

Actual Results

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-aebad99cda1c> in <module>()
----> 1 model = Word2Vec.load(...)

/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/word2vec.pyc in load(cls, *args, **kwargs)
   1285             logger.info('Model saved using code from earlier Gensim Version. Re-loading old model in a compatible way.')
   1286             from gensim.models.deprecated.word2vec import load_old_word2vec
-> 1287             return load_old_word2vec(*args, **kwargs)
   1288 
   1289 

/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/deprecated/word2vec.pyc in load_old_word2vec(*args, **kwargs)
    151 
    152 def load_old_word2vec(*args, **kwargs):
--> 153     old_model = Word2Vec.load(*args, **kwargs)
    154     vector_size = getattr(old_model, 'vector_size', old_model.layer1_size)
    155     params = {

/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/deprecated/word2vec.pyc in load(cls, *args, **kwargs)
   1615     @classmethod
   1616     def load(cls, *args, **kwargs):
-> 1617         model = super(Word2Vec, cls).load(*args, **kwargs)
   1618         # update older models
   1619         if hasattr(model, 'table'):

/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/deprecated/old_saveload.pyc in load(cls, fname, mmap)
     85         compress, subname = SaveLoad._adapt_by_suffix(fname)
     86 
---> 87         obj = unpickle(fname)
     88         obj._load_specials(fname, mmap, compress, subname)
     89         logger.info("loaded %s", fname)

/home/ubuntu/workspace/machine_learning_tools-recommendations/mpozzer_env/local/lib/python2.7/site-packages/gensim/models/deprecated/old_saveload.pyc in unpickle(fname)
    380             return _pickle.loads(file_bytes, encoding='latin1')
    381         else:
--> 382             return _pickle.loads(file_bytes)
    383 
    384 

AttributeError: 'module' object has no attribute 'Word2VecVocab'

Versions

Linux-4.4.0-1062-aws-x86_64-with-Ubuntu-16.04-xenial
('Python', '2.7.12 (default, Dec  4 2017, 14:50:18) \n[GCC 5.4.0 20160609]')
('NumPy', '1.13.1')
('SciPy', '0.19.1')
('gensim', '3.5.0')
('FAST_VERSION', 1)
@piskvorky piskvorky added the bug Issue described a bug label Aug 30, 2018
@piskvorky
Owner

piskvorky commented Aug 30, 2018

Thanks for the report. Related to #1882 and #2136.

@menshikh-iv
Contributor

I could not reproduce this. Here is the code I used:

from gensim.test.utils import common_corpus, common_texts, get_tmpfile
from gensim.models.callbacks import CallbackAny2Vec
from gensim.models import Word2Vec


class EpochSaver(CallbackAny2Vec):
    '''Callback to save model after each epoch.'''
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0

    def on_epoch_end(self, model):
        output_path = get_tmpfile('{}_epoch{}.model'.format(self.path_prefix, self.epoch))
        model.save(output_path)
        self.epoch += 1




saver = EpochSaver("my_w2v")
model = Word2Vec(common_texts, iter=5, size=10, min_count=0, seed=42, callbacks=[saver])
# In a separate interpreter session:

from gensim.models import Word2Vec

loaded_model = Word2Vec.load("/tmp/my_w2v_epoch3.model")

@jbayardo please

  • make your code executable (simplify it and fill in the missing pieces, such as sentences), because I need it to reproduce the issue (in my example everything works as expected)
  • double-check that you are using the same environment in both notebooks
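One way to do the environment check is to run the same small script in both notebooks and compare the output. The sketch below is only an illustration: the helper name `describe_install` is hypothetical and not part of gensim. It relies only on the standard library plus the `__version__` and `__file__` attributes of the imported package.

```python
import importlib
import sys


def describe_install(module_name="gensim"):
    """Report which interpreter and which copy of a package this session uses.

    Hypothetical helper for comparing two sessions; not part of gensim.
    """
    info = {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "installed": False,
        "version": None,
        "location": None,
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not importable from this interpreter at all.
        return info
    info["installed"] = True
    info["version"] = getattr(module, "__version__", None)
    info["location"] = getattr(module, "__file__", None)
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_install().items()):
        print("{}: {}".format(key, value))
```

If the `version` or `location` lines differ between the notebook that saved the model and the one that loads it, the two sessions are not importing the same gensim installation.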

@menshikh-iv menshikh-iv added need info Not enough information for reproduce an issue, need more info from author and removed bug Issue described a bug labels Sep 4, 2018
@jbayardo
Author

I reinstalled gensim and everything worked as expected
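A reinstall fixing the problem would be consistent with a mismatch between the code that saved the model and the code that loaded it, although the thread does not confirm the cause. Pickle stores a class by module path and name rather than by definition, so loading fails with an AttributeError when the named class is missing from the module at load time. The sketch below reproduces that general mechanism with a throwaway module; `fake_models` and `Vocab` are made-up names and have nothing to do with gensim's internals.

```python
import pickle
import sys
import types


def load_with_missing_class():
    """Pickle an instance, remove its class from the module, then unpickle.

    Returns the error message raised by pickle, or None if loading succeeded.
    """
    # Hypothetical stand-in module; it is not part of gensim.
    fake = types.ModuleType("fake_models")
    exec("class Vocab(object):\n    pass\n", fake.__dict__)
    sys.modules["fake_models"] = fake
    try:
        # The payload records only the reference "fake_models.Vocab".
        payload = pickle.dumps(fake.Vocab())

        # Simulate loading with a copy of the module that lacks the class.
        del fake.Vocab

        try:
            pickle.loads(payload)
        except AttributeError as err:
            return str(err)
        return None
    finally:
        # Clean up the temporary module registration.
        sys.modules.pop("fake_models", None)


if __name__ == "__main__":
    print(load_with_missing_class())
```

The exact wording of the message differs between Python versions, but in each case it names the class that could not be found.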

@menshikh-iv
Contributor

Great @jbayardo, thanks for the clarification.
