
AttributeError: 'Doc2Vec' object has no attribute 'syn0' when call infer_vector #785

Closed
xchangcheng opened this issue Jul 12, 2016 · 5 comments


xchangcheng commented Jul 12, 2016

I trained a model on my corpus and saved it to disk with:
model.save(filename)

Then I loaded the model and tried to call 'infer_vector' to compute the vector for a new sentence:
model = Doc2Vec.load(filename)
words = ['This', 'is', 'an', 'example']
model.infer_vector(words)

However, I get an exception:
AttributeError: 'Doc2Vec' object has no attribute 'syn0'

How can I fix this? Is it the same cause as #483?

Thanks

@xchangcheng xchangcheng changed the title How to calculate the vector of a new sentence? AttributeError: 'Doc2Vec' object has no attribute 'syn0' when call infer_vector Jul 12, 2016
@piskvorky
Owner

@xchangcheng can you explain your fix / reason for closing? Other people may google up this issue in the future.

@xchangcheng
Author

@piskvorky Sorry for closing it without an explanation.

As in #483, I found that syn0 & syn1 had not been loaded successfully when I loaded my model. I suspect the model I trained earlier was corrupted.

So I retrained it and the problem is gone. The following is the output of a successful load :)

2016-07-12 17:22:28,782 - gensim.utils - INFO - loading Doc2Vec object from ./imdb.d2v
2016-07-12 17:22:29,587 - gensim.utils - INFO - loading docvecs recursively from ./imdb.d2v.docvecs.* with mmap=None
2016-07-12 17:22:29,587 - gensim.utils - INFO - loading syn1neg from ./imdb.d2v.syn1neg.npy with mmap=None
2016-07-12 17:22:29,596 - gensim.utils - INFO - loading syn0 from ./imdb.d2v.syn0.npy with mmap=None
2016-07-12 17:22:29,604 - gensim.utils - INFO - loading syn1 from ./imdb.d2v.syn1.npy with mmap=None
2016-07-12 17:22:29,612 - gensim.utils - INFO - setting ignored attribute syn0norm to None
2016-07-12 17:22:29,612 - gensim.utils - INFO - setting ignored attribute cum_table to None


bradhackinen commented Jul 20, 2016

I'm having a similar problem. I'm not sure exactly what steps are required to reproduce it because it doesn't seem to happen every time.

I have a script which trains a model on about 700,000 paragraphs, with a vocabulary of about 100,000 words, and then immediately saves the trained model using model.save(). When I run just one epoch, everything works fine: the syn0 and syn1 matrices are saved and I can load the model and compute similarities.

But every time I have trained the model with a larger number of epochs (I'm trying 20; this takes a while, so I have only done it a handful of times), the syn0 and syn1 matrices are not saved. Furthermore, after the attempted save, the model object no longer has syn0 or syn1 properties, so if I try to train it again I get "RuntimeError: you must first finalize vocabulary before training the model".

I don't know if the number of epochs is making a difference or if it is just a coincidence...

This is the most relevant part of my code:

import os
import random

from gensim.models import Doc2Vec

epochs = 20
max_alpha = 0.025
min_alpha = 0.0001
modelSettings = {'size':300,'min_count':5,'window':8,'workers':3,'dm_concat':0,'alpha':max_alpha,'min_alpha':max_alpha}

dm = 1  # set elsewhere in my script; 1 = PV-DM, 0 = PV-DBOW
modelDir = 'models'  # likewise set elsewhere
modelName = 'model_dm'
modelSettings['dm'] = dm

# paragraphs and getTaggedParagraphs() are defined earlier in the script
print 'Initializing', modelName
model = Doc2Vec(getTaggedParagraphs(paragraphs), **modelSettings)

for i in range(epochs):
    # linear decay from max_alpha down to min_alpha across the epochs
    alpha = (max_alpha - min_alpha) * (epochs - i - 1) / (epochs - 1) + min_alpha
    print 'Training %s epoch %2d, alpha: %.4f' % (modelName, i, alpha)

    model.alpha = alpha
    model.min_alpha = alpha

    random.shuffle(paragraphs)
    model.train(getTaggedParagraphs(paragraphs))

print 'Saving', modelName
if not os.path.exists(modelDir):
    os.makedirs(modelDir)
model.save(os.path.join(modelDir, modelName))

(My paragraphs object contains both a string and a tag for each paragraph, so the shuffle isn't mixing those up.)

Collaborator

gojomo commented Jul 20, 2016

The number of epochs shouldn't affect saving at all: the structures have the same size/shape no matter how much training has occurred.

If a save() is both failing, and leaving the model damaged, perhaps something odd caused a mid-save failure. But, that should be obvious from a thrown error, logging output, or both.

I suggest making sure you're using the latest gensim, enabling INFO-level logging, and extending your code example to confirm that syn0 etc. exists as expected before the save and is absent after.
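
A sketch of that before/after check (hypothetical helper name; syn0/syn1 are the gensim 0.x attribute names from this thread):

```python
import logging

logging.basicConfig(level=logging.INFO)

def check_arrays(model, when):
    """Raise if the trained weight arrays are absent, logging what was seen."""
    for name in ('syn0', 'syn1'):
        present = hasattr(model, name)
        logging.info('%s save(): %s present=%s', when, name, present)
        if not present:
            raise AttributeError('%s is missing %s save()' % (name, when))

# Usage around the save in the script above:
# check_arrays(model, 'before')
# model.save(os.path.join(modelDir, modelName))
# check_arrays(model, 'after')
```

That way a mid-save failure is caught at the point it happens, not later at load time.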

Unrelated notes about your code: supplying a corpus to the Doc2Vec constructor causes training to happen automatically. And, because of the default iter value inherited from Word2Vec, each training call makes 5 iterations over the supplied corpus. So your code is in fact doing (1+20) trains of 5 iterations each: 105 passes over your corpus.
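
The arithmetic, spelled out with the settings from the code above (iter=5 is the gensim 0.x default, inherited from Word2Vec):

```python
epochs = 20
default_iter = 5  # gensim 0.x default, inherited from Word2Vec

# One train from the constructor plus 20 explicit train() calls,
# each making default_iter passes over the corpus:
total_passes = (1 + epochs) * default_iter
print(total_passes)  # 105

# The intended one-pass-per-epoch linear alpha decay, for comparison:
max_alpha, min_alpha = 0.025, 0.0001
alphas = [(max_alpha - min_alpha) * (epochs - i - 1) / (epochs - 1) + min_alpha
          for i in range(epochs)]
# alphas runs from max_alpha down to min_alpha over 20 values
```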

Contributor

tmylk commented Sep 25, 2016

Closing as abandoned

@tmylk tmylk closed this as completed Sep 25, 2016