
AttributeError: 'Doc2Vec' object has no attribute 'syn0' when call infer_vector #785

Closed
xchangcheng opened this issue Jul 12, 2016 · 5 comments


xchangcheng commented Jul 12, 2016

I trained a model on my corpus and saved it to disk with:
model.save(filename)

Then I loaded the model and tried to call 'infer_vector' to compute the vector for a new sentence:
model = Doc2Vec.load(filename)
words = ['This', 'is', 'an', 'example']
model.infer_vector(words)

However, I get an exception:
AttributeError: 'Doc2Vec' object has no attribute 'syn0'

How can I fix this? Is it the same cause as #483?

Thanks

@xchangcheng xchangcheng changed the title How to calculate the vector of a new sentence? AttributeError: 'Doc2Vec' object has no attribute 'syn0' when call infer_vector Jul 12, 2016
@piskvorky
Owner

@xchangcheng can you explain your fix / reason for closing? Other people may google up this issue in the future.

@xchangcheng
Author

@piskvorky Sorry for closing it without an explanation.

As in #483, I found that syn0 & syn1 had not been loaded successfully when I loaded my model. I suspect the model I trained earlier was corrupted.

So I retrained it and the problem is gone. The following is the output of a successful load :)

2016-07-12 17:22:28,782 - gensim.utils - INFO - loading Doc2Vec object from ./imdb.d2v
2016-07-12 17:22:29,587 - gensim.utils - INFO - loading docvecs recursively from ./imdb.d2v.docvecs.* with mmap=None
2016-07-12 17:22:29,587 - gensim.utils - INFO - loading syn1neg from ./imdb.d2v.syn1neg.npy with mmap=None
2016-07-12 17:22:29,596 - gensim.utils - INFO - loading syn0 from ./imdb.d2v.syn0.npy with mmap=None
2016-07-12 17:22:29,604 - gensim.utils - INFO - loading syn1 from ./imdb.d2v.syn1.npy with mmap=None
2016-07-12 17:22:29,612 - gensim.utils - INFO - setting ignored attribute syn0norm to None
2016-07-12 17:22:29,612 - gensim.utils - INFO - setting ignored attribute cum_table to None


bradhackinen commented Jul 20, 2016

I'm having a similar problem. I'm not sure exactly what steps are required to reproduce it because it doesn't seem to happen every time.

I have a script which trains a model on about 700,000 paragraphs, with a vocabulary of about 100,000 words, and then immediately saves the trained model using model.save(). When I run just one epoch, everything works fine: the syn0 and syn1 matrices are saved and I can load the model and compute similarities.

But every time I have trained the model with a larger number of epochs (I'm trying 20; this takes a while, so I have only done it a handful of times), the syn0 and syn1 matrices are not saved. Furthermore, after the attempted save, the model object no longer has syn0 or syn1 properties, so if I try to train it again I get "RuntimeError: you must first finalize vocabulary before training the model".

I don't know if the number of epochs is making a difference or if it is just a coincidence...

This is the most relevant part of my code:

import os
import random

from gensim.models import Doc2Vec

epochs = 20
max_alpha = 0.025
min_alpha = 0.0001
modelSettings = {'size':300,'min_count':5,'window':8,'workers':3,'dm_concat':0,'alpha':max_alpha,'min_alpha':max_alpha}

dm = 1  # set elsewhere in my script; 1 = PV-DM, 0 = PV-DBOW
modelDir = 'models'  # likewise set elsewhere
modelName = 'model_dm'
modelSettings['dm'] = dm

# paragraphs and getTaggedParagraphs() are defined earlier in the script
print 'Initializing', modelName
model = Doc2Vec(getTaggedParagraphs(paragraphs), **modelSettings)

for i in range(epochs):
    # linear decay from max_alpha down to min_alpha across the epochs
    alpha = (max_alpha - min_alpha) * (epochs - i - 1) / (epochs - 1) + min_alpha
    print 'Training %s epoch %2d, alpha: %.4f' % (modelName, i, alpha)

    model.alpha = alpha
    model.min_alpha = alpha

    random.shuffle(paragraphs)
    model.train(getTaggedParagraphs(paragraphs))

print 'Saving', modelName
if not os.path.exists(modelDir):
    os.makedirs(modelDir)
model.save(os.path.join(modelDir, modelName))

(My paragraphs object contains both a string and a tag for each paragraph, so the shuffle isn't mixing those up.)

Collaborator

gojomo commented Jul 20, 2016

The number of epochs shouldn't affect saving at all: the structures have the same size/shape no matter how much training has occurred.

If a save() is both failing, and leaving the model damaged, perhaps something odd caused a mid-save failure. But, that should be obvious from a thrown error, logging output, or both.

I suggest making sure you're using the latest gensim, enabling INFO-level logging, and extending your code example to confirm that syn0 etc. exists as expected before the save and is absent after.
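
A sketch of that before/after check (hypothetical helper name; syn0/syn1 are the gensim 0.x attribute names from this thread):

```python
import logging

logging.basicConfig(level=logging.INFO)

def check_arrays(model, when):
    """Raise if the trained weight arrays are absent, logging what was seen."""
    for name in ('syn0', 'syn1'):
        present = hasattr(model, name)
        logging.info('%s save(): %s present=%s', when, name, present)
        if not present:
            raise AttributeError('%s is missing %s save()' % (name, when))

# Usage around the save in the script above:
# check_arrays(model, 'before')
# model.save(os.path.join(modelDir, modelName))
# check_arrays(model, 'after')
```

That way a mid-save failure is caught at the point it happens, not later at load time.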

Unrelated notes about your code: supplying a corpus to the Doc2Vec constructor causes training to happen automatically. And, because of the default iter value inherited from Word2Vec, each training call makes 5 iterations over the supplied corpus. So your code is in fact doing (1+20) trains of 5 iterations each: 105 passes over your corpus.
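
The arithmetic, spelled out with the settings from the code above (iter=5 is the gensim 0.x default, inherited from Word2Vec):

```python
epochs = 20
default_iter = 5  # gensim 0.x default, inherited from Word2Vec

# One train from the constructor plus 20 explicit train() calls,
# each making default_iter passes over the corpus:
total_passes = (1 + epochs) * default_iter
print(total_passes)  # 105

# The intended one-pass-per-epoch linear alpha decay, for comparison:
max_alpha, min_alpha = 0.025, 0.0001
alphas = [(max_alpha - min_alpha) * (epochs - i - 1) / (epochs - 1) + min_alpha
          for i in range(epochs)]
# alphas runs from max_alpha down to min_alpha over 20 values
```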

Contributor

tmylk commented Sep 25, 2016

Closing as abandoned

@tmylk tmylk closed this as completed Sep 25, 2016